20 Gc Log Reading and Analysis Case Analysis Part 3

20 GC Log Reading and Analysis Case Analysis Part 3 #

Review: G1 stands for Garbage-First, which means it prioritizes cleaning up the region with the most garbage.

For G1-related tuning parameters, you can refer to:

https://www.oracle.com/technical-resources/articles/java/g1gc.html

Example of using G1:

# Note that there is no line break when starting from the command line
java -XX:+UseG1GC
-Xms512m
-Xmx512m
-Xloggc:gc.demo.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
demo.jvm0204.GCLogAnalysis

After running the program, let’s take a look at what the G1 log looks like:

Java HotSpot(TM) 64-Bit Server VM (25.162-b12) ......
Memory: 4k page,physical 16777216k(709304k free)

CommandLine flags: -XX:InitialHeapSize=536870912
  -XX:MaxHeapSize=536870912
  -XX:+PrintGC -XX:+PrintGCDateStamps
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
  -XX:+UseCompressedClassPointers -XX:+UseCompressedOops
  -XX:+UseG1GC

2019-12-23T01:45:40.605-0800: 0.181:
 [GC pause (G1 Evacuation Pause) (young),0.0038577 secs]
   [Parallel Time: 3.1 ms,GC Workers: 8]
   ...... Multiple lines omitted
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 0.6 ms]
     ...... Multiple lines omitted
   [Eden: 25.0M(25.0M)->0.0B(25.0M)
    Survivors: 0.0B->4096.0K Heap: 28.2M(512.0M)->9162.7K(512.0M)]
 [Times: user=0.01 sys=0.01,real=0.00 secs]

2019-12-23T01:45:40.881-0800: 0.456:
 [GC pause (G1 Evacuation Pause) (young) (to-space exhausted),0.0147955 secs]
   [Parallel Time: 12.3 ms,GC Workers: 8]
   ...... Multiple lines omitted
   [Eden: 298.0M(298.0M)->0.0B(63.0M)
    Survivors: 9216.0K->26.0M
    Heap: 434.1M(512.0M)->344.2M(512.0M)]
 [Times: user=0.02 sys=0.05,real=0.02 secs]

2019-12-23T01:45:41.563-0800: 1.139:
 [GC pause (G1 Evacuation Pause) (mixed),0.0042371 secs]
   [Parallel Time: 3.7 ms,GC Workers: 8]
   ...... Multiple lines omitted
   [Eden: 20.0M(20.0M)->0.0B(34.0M) Survivors: 5120.0K->4096.0K Heap: 393.7M(512.0M)->358.5M(512.0M)]
 [Times: user=0.02 sys=0.00,real=0.00 secs]

2019-12-23T01:45:41.568-0800: 1.144: [GC pause (G1 Humongous Allocation) (young) (initial-mark),0.0012116 secs]
   [Parallel Time: 0.7 ms,GC Workers: 8]
      ...... Multiple lines omitted
   [Other: 0.4 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
   [Eden: 2048.0K(34.0M)->0.0B(33.0M)
    Survivors: 4096.0K->1024.0K
    Heap: 359.5M(512.0M)->359.0M(512.0M)]
 [Times: user=0.01 sys=0.00,real=0.00 secs]
2019-12-23T01:45:41.569-0800: 1.145: [GC concurrent-root-region-scan-start]
2019-12-23T01:45:41.569-0800: 1.145: [GC concurrent-root-region-scan-end,0.0000360 secs]
2019-12-23T01:45:41.569-0800: 1.145: [GC concurrent-mark-start]
2019-12-23T01:45:41.571-0800: 1.146: [GC concurrent-mark-end,0.0015209 secs]
2019-12-23T01:45:41.571-0800: 1.146: [GC remark
  2019-12-23T01:45:41.571-0800: 1.147: [Finalize Marking,0.0002456 secs]
  2019-12-23T01:45:41.571-0800: 1.147: [GC ref-proc,0.0000504 secs]
  2019-12-23T01:45:41.571-0800: 1.147: [Unloading,0.0007297 secs],
 0.0021658 secs]
 [Times: user=0.01 sys=0.00,real=0.00 secs]
2019-12-23T01:45:41.573-0800: 1.149: [GC cleanup 366M->366M(512M),0.0006795 secs]
 [Times: user=0.00 sys=0.00,real=0.00 secs]

Heap
 garbage-first heap total 524288K,used 381470K [......
  region size 1024K,12 young (12288K),1 survivors (1024K)
 Metaspace used 3331K,capacity 4494K,committed 4864K,reserved 1056768K
  class space used 364K,capacity 386K,committed 512K,reserved 1048576K

The above is an excerpt from the GC log information. In actual execution of our sample program for one second, it may generate thousands of lines of GC log information.

Evacuation Pause: young (Evacuation Pause for Pure Young Generation) #

When the young generation space is full, the application threads will be paused, and the live objects in the young generation memory block will be copied to the survivor space. If there is no survivor space available, a portion of the free memory blocks will be selected as the survivor space.

This copying process is called evacuation, and it works in the same way as other young generation collectors introduced earlier.

The log information for the evacuation pause is quite long, so for simplicity, we have removed some unimportant information. We will provide a detailed explanation after the concurrent phase. Additionally, because the logs are extensive, the logs for the parallel and “other” phases will be divided into multiple parts for explanation.

Let’s extract some key information from the GC log:

2019-12-23T01:45:40.605-0800: 0.181:
 [GC pause (G1 Evacuation Pause) (young), 0.0038577 secs]
   [Parallel Time: 3.1 ms, GC Workers: 8]
      ...... details of worker threads, will be explained separately
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.2 ms]
   [Other: 0.6 ms]
      ...... other miscellaneous tasks, will be explained separately
   [Eden: 25.0M(25.0M)->0.0B(25.0M)
    Survivors: 0.0B->4096.0K Heap: 28.2M(512.0M)->9162.7K(512.0M)]
 [Times: user=0.01 sys=0.01, real=0.00 secs]

Let’s analyze together:

  1. [GC pause (G1 Evacuation Pause) (young), 0.0038577 secs]: G1 evacuation pause, in pure young generation mode; only cleans up the young generation space. This pause starts at 181ms after JVM startup and lasts for 0.0038577 seconds, which is 3.8ms.

  2. [Parallel Time: 3.1 ms, GC Workers: 8]: Indicates that the following activities are carried out by 8 worker threads in parallel, taking 3.1 milliseconds of real time; worker is a mode similar to a boss directing multiple workers.

  3. ......: For ease of reading, some content has been omitted, which can be referred to in the previous logs and will be explained immediately below.

  4. [Code Root Fixup: 0.0 ms]: Releases internal data used for managing parallel activities, generally close to zero. This process is executed serially.

  5. [Code Root Purge: 0.0 ms]: Cleans up other parts of data, which is also very fast and usually close to zero. This process is executed serially.

  6. [Other: 0.6 ms]: Time consumed by other activities, most of which are executed in parallel.

  7. ...: Please refer to the following text.

  8. [Eden: 25.0M(25.0M)->0.0B(25.0M)]: Usage/capacity of the Eden region before and after the pause.

  9. Survivors: 0.0B->4096.0K: Usage of the survivor space before and after the GC pause. Heap: 28.2M(512.0M)->9162.7K(512.0M)]: Usage of the entire heap memory before and after the pause.

  10. [Times: user=0.01 sys=0.01, real=0.00 secs]: Duration of the GC event.

Note: Real time, also known as wall clock time or elapsed time, refers to the time the system clock takes from the start of a program to its termination. Generally, the real time is slightly longer than CPU time.

The most intensive GC tasks are performed by multiple dedicated worker threads. The following log describes their behaviors:

[Parallel Time: 3.1 ms, GC Workers: 8]
 [GC Worker Start (ms): Min: 180.6, Avg: 180.6, Max: 180.7, Diff: 0.1]
 [Ext Root Scanning (ms): Min: 0.1, Avg: 0.3, Max: 0.6, Diff: 0.4, Sum: 2.1]
 [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
   [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
 [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
 [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
 [Object Copy (ms): Min: 2.2, Avg: 2.5, Max: 2.7, Diff: 0.4, Sum: 19.8]
 [Termination (ms): Min: 0.0, Avg: 0.2, Max: 0.4, Diff: 0.4, Sum: 1.5]
   [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 8]
 [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
 [GC Worker Total (ms): Min: 2.9, Avg: 3.0, Max: 3.0, Diff: 0.1, Sum: 23.7]
 [GC Worker End (ms): Min: 183.6, Avg: 183.6, Max: 183.6, Diff: 0.0]

Interpreting the log information for worker threads:

  1. [Parallel Time: 3.1 ms, GC Workers: 8]: As introduced earlier, this indicates that the following activities were executed in parallel by 8 threads, and it took 3.1ms of real time.
  2. GC Worker Start (ms): The number of milliseconds relative to the start of the pause when the GC worker threads started. If there is a large difference between Min and Max, it means that other processes on the machine are using too many threads, which would occupy the available CPU time for GC.
  3. Ext Root Scanning (ms): The time it took to scan the non-heap GC ROOT, such as classloaders, JNI references, JVM system roots, etc. The runtime is shown later, and “Sum” refers to CPU time.
  4. Update RS, Processed Buffers, and Scan RS are similar parts, where RS refers to the abbreviation of Remembered Set which can be referred to in previous chapters.
  5. Code Root Scanning (ms): The time it took to scan the roots in the actual code, such as local variables in thread stacks.
  6. Object Copy (ms): The time it took to copy the live objects in the collection set.
  7. Termination (ms): The time it took for the GC worker threads to ensure that they can safely terminate. During this time, they do nothing, and once completed, the GC threads stop running. This is called the termination wait time.
  8. Termination Attempts: How many times the GC worker threads tried to try and terminate. If a worker thread finds that there are still some tasks that haven’t been completed, this attempt is considered a failure, and termination cannot be done temporarily.
  9. GC Worker Other (ms): Other short tasks that are grouped together in the GC log because they have very short durations.
  10. GC Worker Total (ms): The total working time of the GC worker threads.
  11. [GC Worker End (ms): The time when the GC worker threads finished their work, in milliseconds relative to the start of this GC pause. Normally, these numbers should be roughly equal. If not, it means that too many threads were suspended, possibly due to the “noisy neighbor effect”.

In addition, during the evacuation pause, there are some miscellaneous small tasks.

[Other: 0.6 ms]
  [Choose CSet: 0.0 ms]
  [Ref Proc: 0.3 ms]
  [Ref Enq: 0.0 ms]
  [Redirty Cards: 0.1 ms]
  [Humongous Register: 0.0 ms]
  [Humongous Reclaim: 0.0 ms]
  [Free CSet: 0.0 ms]

Other miscellaneous tasks, with only part of them introduced here:

  1. [Other: 0.6 ms]: The time consumed by other activities, many of which are executed in parallel.
  2. Choose CSet: The time taken to select the Collection Set (CSet).
  3. [Ref Proc: 0.3 ms]: The time taken to process non-strong references, mainly for cleaning up or deciding whether to clean up.
  4. [Ref Enq: 0.0 ms]: Used to arrange the remaining non-strong references into the appropriate ReferenceQueue.
  5. Humongous Register, Humongous Reclaim: Parts related to large objects, which will be explained later.
  6. [Free CSet: 0.0 ms]:The time it takes to return the small heaps that have been released in the central recycling to be used for allocating new objects.

The diagram corresponding to this Young GC is as follows:

58726143.png

Concurrent Marking #

When the overall usage of heap memory reaches a certain threshold, concurrent marking is triggered. The default threshold is 45%, but it can also be set using the JVM parameter InitiatingHeapOccupancyPercent. Similar to CMS, G1’s concurrent marking consists of multiple phases, with some phases being completely concurrent and others pausing application threads.

Phase 1: Initial Mark #

You can see the pause for initial mark in the Evacuation Pause log, like this:

2019-12-23T01:45:41.568-0800: 1.144:
  [GC pause (G1 Humongous Allocation) (young) (initial-mark),
  0.0012116 secs]

Of course, the reason for triggering the GC could be due to large object allocations or other reasons such as to-space exhausted or default GC reasons.

Phase 2: Root Region Scan #

This phase marks all live objects reachable from “root regions”.

The root regions include non-empty regions as well as regions that need to be collected during the marking process.

Corresponding logs:

2019-12-23T01:45:41.569-0800: 1.145:
  [GC concurrent-root-region-scan-start]
2019-12-23T01:45:41.569-0800: 1.145:
  [GC concurrent-root-region-scan-end, 0.0000360 secs]

Phase 3: Concurrent Mark #

Corresponding logs:

2019-12-23T01:45:41.569-0800: 1.145:
  [GC concurrent-mark-start]
2019-12-23T01:45:41.571-0800: 1.146:
  [GC concurrent-mark-end, 0.0015209 secs]

Phase 4: Remark #

Corresponding logs:

2019-12-23T01:45:41.571-0800: 1.146:
  [GC remark
    2019-12-23T01:45:41.571-0800: 1.147:
      [Finalize Marking, 0.0002456 secs]
    2019-12-23T01:45:41.571-0800: 1.147:
      [GC ref-proc, 0.0000504 secs]
    2019-12-23T01:45:41.571-0800: 1.147:
      [Unloading, 0.0007297 secs], 0.0021658 secs]
 [Times: user=0.01 sys=0.00, real=0.00 secs]

Phase 5: Cleanup #

This final cleanup phase prepares for the upcoming evacuation phase, by counting all live objects in the small heap regions and sorting them to improve the efficiency of GC. This phase also performs all necessary housekeeping activities for the next marking cycle - maintaining the internal state of the concurrent marking.

It is important to note that all small heap regions that do not contain any live objects are reclaimed in this phase. Some tasks are concurrent, such as the reclamation of empty heap regions and the calculation of the majority of live object ratios. This phase also requires a short STW pause to complete its work without interference from the application threads.

The corresponding log for this STW pause is as follows:

2019-12-23T01:45:41.573-0800: 1.149:
 [GC cleanup 366M->366M(512M), 0.0006795 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]

If it is found that some small heap regions only contain garbage, the log format may be slightly different, such as:

2019-12-23T21:26:42.411-0800: 0.689:
 [GC cleanup 247M->242M(512M), 0.0005349 secs]
 [Times: user=0.00 sys=0.00, real=0.00 secs]
2019-12-23T21:26:42.412-0800: 0.689:
 [GC concurrent-cleanup-start]
2019-12-23T21:26:42.412-0800: 0.689:
 [GC concurrent-cleanup-end, 0.0000134 secs]

If you don’t see the corresponding GC log after running the sample program, you can try running it multiple times. After all, GC and memory allocation are runtime dynamic, and each run may have some differences.

The size of arrays generated in the sample program and which objects are cached are all using random numbers, so the results may vary each time.

Please think about what method we have learned in the Java random number API to make the generated random numbers consistent each time.

If there are unfamiliar students, please search for: random number seed.

The marking cycle is generally only triggered when there is no live object in a region, and memory is not released in most cases.

The schematic diagram is shown below:

52452256.png

Evacuation Pause (mixed) #

After the concurrent marking phase is completed, G1 will perform a mixed collection, which not only cleans up the young generation, but also includes some old generation regions in the collection set.

The mixed mode evacuation pause does not always follow the concurrent marking phase.

Between the concurrent marking and the mixed evacuation pause, there may be multiple young mode evacuation pauses.

“Mixed mode” means that this GC event involves processing regions in both the young and old generations. This is also a feature of G1 and other incremental garbage collectors.

However, the latest garbage collectors like ZGC do not use generational algorithms. Of course, generational algorithms may still be implemented in the future, as performance can be improved through generation separation.

In the logs of the mixed mode, compared with the pure young generation mode, some interesting points can be observed:

2019-12-23T21:26:42.383-0800: 0.661:
 [GC pause (G1 Evacuation Pause) (mixed),0.0029192 secs]
   [Parallel Time: 2.2 ms,GC Workers: 8]
      ......
      [Update RS (ms): Min: 0.1,Avg: 0.2,Max: 0.3,Diff: 0.2,Sum: 1.4]
         [Processed Buffers: Min: 0,Avg: 1.8,Max: 3,Diff: 3,Sum: 14]
      [Scan RS (ms): Min: 0.0,Avg: 0.0,Max: 0.0,Diff: 0.0,Sum: 0.1]
      ......
   [Clear CT: 0.4 ms]
   [Other: 0.4 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.1 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Free CSet: 0.1 ms]
   [Eden: 21.0M(21.0M)->0.0B(21.0M)
    Survivors: 4096.0K->4096.0K
    Heap: 337.7M(512.0M)->274.3M(512.0M)]
 [Times: user=0.01 sys=0.00,real=0.00 secs]

Simple Interpretation (some concepts and names, refer to chapter G1):

1. [Update RS (ms)]: Because Remembered Sets are processed concurrently, it is necessary to ensure that cards in the buffer are processed before the actual garbage collection. If there are a large number of cards, the workload of the GC concurrent threads may be high. This could be due to a large number of modified fields or limited CPU resources.
2. Processed Buffers: Indicates the number of local buffers processed by each worker thread.
3. Scan RS (ms): Shows the time taken to scan references from the RSet.
4. [Clear CT: 0.4 ms]: Indicates the time taken to clear cards in the card table. The clear operation simply removes the "dirty" state, which is used to identify whether a field has been updated, for use by the Remembered Sets.
5. [Redirty Cards: 0.1 ms]: Indicates the time taken to mark appropriate positions in the card table as "dirty". The positions deemed "appropriate" are determined by heap memory changes made by the GC itself, such as reference enqueuing.

Full GC (Allocation Failure):

G1 is an adaptive incremental garbage collector. Generally, a Full GC occurs only when there is severe memory shortage, such as insufficient heap space or insufficient to-space.

In the example program, increasing the number of cached objects can simulate a Full GC.

The example log is as follows:

2020-03-02T18:44:17.814-0800: 2.826: [Full GC (Allocation Failure) 403M->401M(512M), 0.0046647 secs] [Eden: 0.0B(25.0M)->0.0B(25.0M) Survivors: 0.0B->0.0B Heap: 403.6M(512.0M)->401.5M(512.0M)], [Metaspace: 2789K->2789K(1056768K)] [Times: user=0.00 sys=0.00, real=0.00 secs]


Because the heap memory space is small and the number of live objects is not large, the pause time for this Full GC is short.

The diagram below illustrates this Full GC:

![59111406.png](../images/1b6e2e60-6866-11ea-a490-d3e65769b9bf)

In situations where the heap memory is larger (8G+), if G1 performs a Full GC, the pause time may degrade and reach several tens of seconds or even more. The following image shows an example:

![5b03ee3d-1e0a-4375-a5f6-aab17f4d1184.jpg](../images/2bce1360-6866-11ea-9d64-851af22d0044)

From the OldGen section, it can be seen that 118 Full GCs took 31 minutes, with an average of 20 seconds per GC. Based on the image scale, a throughput of less than 30% can be inferred.

This scenario occurred during load testing when performing aggregation calculations in Flink using time windows. The main reason was that there were too many objects and insufficient heap memory. The issue was partially resolved by changing the object types to primitive data types. Increasing the heap memory to meet the requirements of batch processing/streaming computing also resolved the GC issue.

When a Full GC pause time is long-lasting, it is necessary to investigate and analyze whether GC configuration needs modification, or whether memory needs to be increased, or if the program's business logic needs to be changed. We will discuss G1 tuning further in the optimization section.

This concludes the analysis of G1 logs. In the next section, we will explore how to use visualization tools to view and analyze GC logs.