
29 Investigating and Analyzing Difficult GC Problems (Part 1) #

This chapter introduces typical situations that can cause GC performance issues. The examples are derived from production environments and have been simplified for demonstration purposes.

Terminology note: Allocation Rate here denotes a speed — the amount of memory allocated per unit of time — not a percentage; in Chinese it is best rendered as “分配速率” rather than “分配率”. Similarly, Promotion Rate denotes the amount of memory promoted per unit of time (“提升速率”).

High Allocation Rate #

Allocation Rate represents the amount of memory allocated per unit of time. It is typically measured in MB/sec, but other units such as PB/year can also be used. A high allocation rate can significantly impact program performance and may result in substantial GC overhead in the JVM.

How to Measure Allocation Rate? #

Allocation Rate can be calculated from the GC log by specifying the following JVM options: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps. The GC log shown below can be used to calculate allocation rate:

    0.291: [GC (Allocation Failure)
            [PSYoungGen: 33280K->5088K(38400K)]
            33280K->24360K(125952K), 0.0365286 secs]
        [Times: user=0.11 sys=0.02, real=0.04 secs]
    0.446: [GC (Allocation Failure)
            [PSYoungGen: 38368K->5120K(71680K)]
            57640K->46240K(159232K), 0.0456796 secs]
        [Times: user=0.15 sys=0.02, real=0.04 secs]
    0.829: [GC (Allocation Failure)
            [PSYoungGen: 71680K->5120K(71680K)]
            112800K->81912K(159232K), 0.0861795 secs]
        [Times: user=0.23 sys=0.03, real=0.09 secs]

To calculate the allocation rate, we need to compare the usage of the young generation after the previous garbage collection with the usage before the next GC. The difference in usage divided by the time interval will give us the allocation rate. Based on the log above, we can calculate the following information:

  • At 291ms after JVM startup, 33,280KB of objects had been allocated. After the first Minor GC, 5,088KB of objects survived in the young generation.
  • At 446ms after startup, young generation usage had grown to 38,368KB, triggering the second GC, which reduced it to 5,120KB.
  • At 829ms after startup, young generation usage was 71,680KB, and the GC reduced it to 5,120KB.

The allocation rate can be calculated using the young generation usage, as shown in the table below:

| Event  | Time  | Young before | Young after | Allocated during | Allocation rate |
| ------ | ----- | ------------ | ----------- | ---------------- | --------------- |
| 1st GC | 291ms | 33,280KB     | 5,088KB     | 33,280KB         | 114MB/sec       |
| 2nd GC | 446ms | 38,368KB     | 5,120KB     | 33,280KB         | 215MB/sec       |
| 3rd GC | 829ms | 71,680KB     | 5,120KB     | 66,560KB         | 174MB/sec       |
| Total  | 829ms | N/A          | N/A         | 133,120KB        | 161MB/sec       |
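
Taking the second GC as a worked example: after the first collection the young generation held 5,088KB, and just before the second collection it held 38,368KB, so 38,368KB − 5,088KB = 33,280KB was allocated in 446ms − 291ms = 155ms. That works out to 33,280KB / 0.155s ≈ 215MB/sec (treating 1MB as 1,000KB, as the table does).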

From this information, we can determine that the program had an average allocation rate of 161MB/sec during this period.

Significance of Allocation Rate #

Changes in allocation rate can increase or decrease the frequency of GC pauses, which in turn affects throughput. However, only Minor GCs are directly affected by allocation rate, while the frequency and duration of Full GCs are generally not affected by allocation rate (can you think of why?). Instead, they are influenced by the Promotion Rate, which will be discussed later.

For now, let’s focus on Minor GC pauses and examine the three memory pools in the young generation. Since objects are allocated in the Eden space, let’s look at the relationship between Eden space size and allocation rate. Is it possible to reduce the number of Minor GC pauses by increasing the size of the Eden space, thereby allowing the program to maintain a higher allocation rate?

In our experiments, when running the same program with different Eden space sizes using the -XX:NewSize, -XX:MaxNewSize, and -XX:SurvivorRatio parameters, we observed the following:

  • When the Eden space was set to 100MB, the allocation rate was below 100MB/sec.
  • Increasing the Eden space to 1GB resulted in a higher allocation rate of approximately 200MB/sec.

Why does this happen?

Fewer GC pauses mean application threads are suspended less often, so they get more work done and create more objects in the same amount of time. For the same application, a higher allocation rate is therefore the preferable outcome.

Before concluding that a larger Eden space is always better, we should also note that a higher allocation rate may or may not translate into higher actual throughput.

In summary, throughput and allocation rate are related, because the allocation rate influences how often Minor GC pauses occur. However, the overall impact on throughput also depends on other factors, such as the pauses caused by Full GC.

Example #

Refer to the Demo program. Let’s assume that the system is connected to an external numeric sensor. The application continuously retrieves values from the sensor through a dedicated thread, and other threads call the processSensorValue() method to perform certain operations with the sensor values.

public class BoxingFailure {
  // Boxed wrapper type so that "no value yet" can be represented as null.
  private static volatile Double sensorValue;

  private static void readSensor() {
    // Math.random() returns a primitive double, which is autoboxed into a
    // new Double object on every iteration of this tight loop.
    while(true) sensorValue = Math.random();
  }

  private static void processSensorValue(Double value) {
    if(value != null) {
      //...
    }
  }
}

As the class name suggests, this demo simulates boxing. To handle null value checks, it uses the wrapper type Double. The program performs calculations based on the latest sensor value. Retrieving a value from the sensor is a time-consuming operation, so an asynchronous approach is used: one thread continuously retrieves new values, while the calculation thread uses the latest stored value directly, avoiding synchronization delays.
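
To make the asynchronous setup concrete, here is a hedged sketch of how such a program might be driven if a main method were added inside BoxingFailure (the actual Demo program may wire things up differently):

public static void main(String[] args) {
  // Dedicated daemon thread that keeps overwriting sensorValue.
  Thread reader = new Thread(BoxingFailure::readSensor);
  reader.setDaemon(true);
  reader.start();

  // Other threads (here just the main thread) use the latest stored value
  // directly, without waiting for the sensor.
  while (true) {
    processSensorValue(sensorValue);
  }
}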

While the demo program runs, its high allocation rate puts pressure on the garbage collector. Next, we will confirm the problem and work out a solution.

Impact of High Allocation Rates on the JVM #

First, we should check whether the program’s throughput is decreasing. If too many temporary objects are created, Minor GCs become much more frequent, and under high concurrency GC can severely impact throughput.

In such cases, the GC log generated by the example program looks like this:

JVM startup parameters: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xmx32m

2.808: [GC (Allocation Failure)
        [PSYoungGen: 9760K->32K(10240K)], 0.0003076 secs]
2.819: [GC (Allocation Failure)
        [PSYoungGen: 9760K->32K(10240K)], 0.0003079 secs]
2.830: [GC (Allocation Failure)
        [PSYoungGen: 9760K->32K(10240K)], 0.0002968 secs]
2.842: [GC (Allocation Failure)
        [PSYoungGen: 9760K->32K(10240K)], 0.0003374 secs]
2.853: [GC (Allocation Failure)
        [PSYoungGen: 9760K->32K(10240K)], 0.0004672 secs]
2.864: [GC (Allocation Failure)
        [PSYoungGen: 9760K->32K(10240K)], 0.0003371 secs]
2.875: [GC (Allocation Failure)
        [PSYoungGen: 9760K->32K(10240K)], 0.0003214 secs]
2.886: [GC (Allocation Failure)
        [PSYoungGen: 9760K->32K(10240K)], 0.0003374 secs]
2.896: [GC (Allocation Failure)
        [PSYoungGen: 9760K->32K(10240K)], 0.0003588 secs]

Clearly, the frequency of Minor GC is too high. This indicates the creation of a large number of objects. Additionally, the young generation usage after GC is very low, and there is no Full GC occurring. All these signs indicate that GC is severely impacting throughput.

Solution #

In some cases, increasing the size of the young generation can reduce the impact of high allocation rates. Increasing the young generation space will not slow down the allocation rate but will reduce the frequency of GC. If only a small number of objects survive each GC, the pause time during Minor GC will not significantly increase.

When running the example program, increase the heap memory size (which also increases the size of the young generation) with the JVM parameter -Xmx64m.

2.808: [GC (Allocation Failure)
        [PSYoungGen: 20512K->32K(20992K)], 0.0003748 secs]
2.831: [GC (Allocation Failure)
        [PSYoungGen: 20512K->32K(20992K)], 0.0004538 secs]
2.855: [GC (Allocation Failure)
        [PSYoungGen: 20512K->32K(20992K)], 0.0003355 secs]
2.879: [GC (Allocation Failure)
        [PSYoungGen: 20512K->32K(20992K)], 0.0005592 secs]

However, sometimes increasing the heap size does not solve the problem. With what we have learned so far, we can use an allocation profiler to identify where most of the garbage is generated. In fact, in this example, 99% of the objects are instances of the Double wrapper class created in the readSensor method.

The simplest optimization is to replace the created Double objects with the primitive type double. For checking null values, we can use Double.NaN.

Since primitive types are not considered objects, they do not generate garbage and do not cause GC events.

After the optimization, instead of allocating a new object on the heap for every reading, we simply overwrite a single field. With this small modification to the example program (see diff), the GC pauses are virtually eliminated.
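
A minimal sketch of what this change could look like, assuming the field and parameter simply switch to the primitive type (the class name here is illustrative; the actual diff may differ):

public class PrimitiveSensor {
  // A primitive double field: every sensor reading overwrites the value in
  // place, so no Double object is allocated per reading.
  private static volatile double sensorValue = Double.NaN;

  private static void readSensor() {
    while(true) sensorValue = Math.random();
  }

  private static void processSensorValue(double value) {
    // Double.NaN plays the role of "no value yet" instead of null.
    if(!Double.isNaN(value)) {
      //...
    }
  }
}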

Sometimes the JVM is smart enough to use escape analysis to avoid excessive allocation.

In simple terms, the JIT compiler can analyze and determine that some objects created in a method will never “escape” the scope of that method. In such cases, there is no need to allocate these objects on the heap, which means no garbage is generated. This is one of the optimization methods used by the JIT compiler: eliminating heap memory allocation (see benchmark).
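
As a hedged illustration of this idea (this is not the benchmark linked above): in the sketch below, the Point instance never leaves the sum() method, so the JIT compiler may prove that it does not escape and replace the allocation with plain local variables (scalar replacement):

class EscapeAnalysisSketch {
  static final class Point {
    final int x;
    final int y;
    Point(int x, int y) { this.x = x; this.y = y; }
  }

  // The Point created here is never stored in a field, passed to another
  // thread, or returned, so it does not "escape" this method.
  static long sum(int n) {
    long total = 0;
    for (int i = 0; i < n; i++) {
      Point p = new Point(i, i + 1); // candidate for scalar replacement
      total += p.x + p.y;
    }
    return total;
  }
}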

Premature Promotion #

The promotion rate measures the amount of data promoted from the young generation to the old generation per unit of time. It is generally measured in MB/sec, similar to the allocation rate.

The JVM promotes objects that have lived long enough from the young generation to the old generation. According to the generational hypothesis, the old generation should hold mostly long-lived objects; in practice, however, short-lived objects can also end up there. This situation is called premature promotion: objects are promoted to the old generation before they have had a chance to die young.

Major GC is not designed for frequent collections, but it now has to clean up these short-lived objects, which can lead to excessively long GC pauses. This can significantly affect the system’s throughput.

How to Measure the Promotion Rate #

You can specify JVM arguments -XX:+PrintGCDetails -XX:+PrintGCTimeStamps to measure the promotion rate through the GC logs. The GC log records the following information:

    0.291: [GC (Allocation Failure)
            [PSYoungGen: 33280K->5088K(38400K)]
            33280K->24360K(125952K), 0.0365286 secs]
        [Times: user=0.11 sys=0.02, real=0.04 secs]
    0.446: [GC (Allocation Failure)
            [PSYoungGen: 38368K->5120K(71680K)]
            57640K->46240K(159232K), 0.0456796 secs]
        [Times: user=0.15 sys=0.02, real=0.04 secs]
    0.829: [GC (Allocation Failure)
            [PSYoungGen: 71680K->5120K(71680K)]
            112800K->81912K(159232K), 0.0861795 secs]
        [Times: user=0.23 sys=0.03, real=0.09 secs]

From the logs above, we can read the young generation usage and the total heap usage before and after each GC; the difference between the two decreases tells us how much was promoted to the old generation. The information in the GC log can be summarized as follows:

| Event  | Time  | Young decreased | Total decreased | Promoted | Promotion rate |
| ------ | ----- | --------------- | --------------- | -------- | -------------- |
| 1st GC | 291ms | 28,192KB        | 8,920KB         | 19,272KB | 66.2MB/sec     |
| 2nd GC | 446ms | 33,248KB        | 11,400KB        | 21,848KB | 140.95MB/sec   |
| 3rd GC | 829ms | 66,560KB        | 30,888KB        | 35,672KB | 93.14MB/sec    |
| Total  | 829ms | N/A             | N/A             | 76,792KB | 92.63MB/sec    |
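
Taking the first GC as a worked example: the young generation shrank by 33,280KB − 5,088KB = 28,192KB, while the whole heap shrank by only 33,280KB − 24,360KB = 8,920KB. The difference, 28,192KB − 8,920KB = 19,272KB, must have been promoted to the old generation; spread over the 291ms since startup, that is roughly 66.2MB/sec.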

Based on this information, we can calculate the promotion rate for the observation period: the average promotion rate is 92MB/sec, with a peak of 140.95MB/sec.

Please note that the promotion rate can only be calculated from Minor GC events. Full GC logs cannot be used for this purpose, because a Major GC also reclaims part of the old generation, so the change in old generation usage no longer reflects promotions alone.

Significance of the Promotion Rate #

Similar to the allocation rate, the promotion rate also affects the frequency of GC pauses. However, while the allocation rate primarily affects minor GC, the promotion rate affects the frequency of major GC. With a large number of object promotions, the old generation will be filled quickly, resulting in a higher frequency of Major GC events.

[Figure: how Java garbage collection works]

As mentioned earlier, Full GC usually takes more time because it needs to process more objects and perform additional complex processes such as defragmentation.

Example #

Let’s consider an example of premature promotion. The program creates or receives a large number of objects, accumulates them in a collection, and processes them in one batch once a certain count is reached (the MAX_CHUNKS value and the processBatch stub in the listing below are filled in for completeness and are illustrative, not the exact demo code):

import java.util.ArrayList;
import java.util.Collection;

public class PrematurePromotion {

  // Batch size threshold; the concrete value here is illustrative.
  private static final int MAX_CHUNKS = 10_000;

  private static final Collection<byte[]> accumulatedChunks = new ArrayList<>();

  private static void onNewChunk(byte[] bytes) {
    accumulatedChunks.add(bytes);

    // Once enough chunks have accumulated, process them in one batch
    // and release them all at once.
    if(accumulatedChunks.size() > MAX_CHUNKS) {
      processBatch(accumulatedChunks);
      accumulatedChunks.clear();
    }
  }

  private static void processBatch(Collection<byte[]> batch) {
    //...
  }
}

This [Demo program](https://github.com/gvsmirnov/java-perv/blob/master/labs-8/src/main/java/ru/gvsmirnov/perv/labs/gc/PrematurePromotion.java) suffers from premature promotion. The following sections verify the problem and provide a solution.

Effects of Premature Promotion #

Generally, the symptoms of premature promotion are:

  • Frequent Full GCs in a short period of time
  • After every Full GC, the usage of the old generation is low, around 10-20% or below
  • Promotion rate close to allocation rate

To demonstrate this, we need to use special means to promote objects to the old generation at a younger age than the default. Specify the GC parameters `-Xmx24m -XX:NewSize=16m -XX:MaxTenuringThreshold=1`, and after running the program, the following GC logs can be seen:

```console
2.176: [Full GC (Ergonomics)
        [PSYoungGen: 9216K->0K(10752K)]
        [ParOldGen: 10020K->9042K(12288K)]
        19236K->9042K(23040K), 0.0036840 secs]
2.394: [Full GC (Ergonomics)
        [PSYoungGen: 9216K->0K(10752K)]
        [ParOldGen: 9042K->8064K(12288K)]
        18258K->8064K(23040K), 0.0032855 secs]
2.611: [Full GC (Ergonomics)
        [PSYoungGen: 9216K->0K(10752K)]
        [ParOldGen: 8064K->7085K(12288K)]
        17280K->7085K(23040K), 0.0031675 secs]
2.817: [Full GC (Ergonomics)
        [PSYoungGen: 9216K->0K(10752K)]
        [ParOldGen: 7085K->6107K(12288K)]
        16301K->6107K(23040K), 0.0030652 secs]
```

At first glance, it doesn’t seem to be a case of premature promotion, as the usage of the old generation seems to decrease after each GC. But if you think about it, if there were no object promotions or the promotion rate was very low, there wouldn’t be so many Full GCs.

A simple explanation of the GC behavior here is that many objects are promoted to the old generation, while at the same time many objects in the old generation are being reclaimed, giving the illusion of a decrease in old generation usage. But the fact is that a large number of objects are constantly being promoted to the old generation and triggering Full GCs.

Solution #

In simple terms, to solve this type of issue, we need to make sure the young generation can hold the temporary data. There are two simple methods:

The first is to increase the size of the young generation by setting JVM startup parameters, for example: -Xmx64m -XX:NewSize=32m. With these settings, the number of Full GCs drops dramatically, and only the duration of Minor GC pauses is affected:

2.251: [GC (Allocation Failure)
        [PSYoungGen: 28672K->3872K(28672K)]
        37126K->12358K(61440K), 0.0008543 secs]
2.776: [GC (Allocation Failure)
        [PSYoungGen: 28448K->4096K(28672K)]
        36934K->16974K(61440K), 0.0033022 secs]

The second method is to reduce the size of each processed batch, which yields a similar result.

As for which solution to choose, it depends on the business requirements.

In some cases, the business logic does not allow for a reduction in the quantity of batch processing, so the only option is to increase the heap memory or re-specify the size of the young generation.

If neither is feasible, the only option is to optimize the data structure and reduce memory consumption. But the overall goal remains the same - to ensure that temporary data can fit in the young generation.