28 Discuss Your GC Tuning Thoughts

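GC (garbage collection) is one of the most important areas to look at when optimizing application performance. Here is my personal approach to GC tuning:

Monitor GC information: First, understand the application’s current GC behavior. Monitoring GC logs, GC pause times, memory usage, and similar metrics reveals where the GC bottlenecks are and whether there are problems such as memory leaks.
Choose a suitable GC algorithm: Pick a collector that matches the application’s characteristics and requirements. Before JDK 9, Parallel GC (the default) and CMS were the main choices. Since JDK 9, the default is G1, which adapts well to a wide range of scenarios, especially applications with large heaps.
Adjust the heap size: Size the heap according to the application’s memory needs. A heap that is too small causes frequent GC pauses, while one that is too large may waste memory.
Set GC-related parameters: Adjust GC parameters based on actual behavior. For example, tuning the number of GC threads, the ratio between heap areas, and the conditions that trigger GC helps balance throughput and latency.
Reduce object creation: Avoid creating temporary objects frequently, especially large ones. Object pools and object reuse can reduce GC pressure.
Trigger GC manually: In a few specific scenarios, triggering GC manually gives more control over when collection happens, for example right after an important business operation completes, so that later operations are affected less. Note that System.gc() is only a request and typically results in a full stop-the-world collection, so use it sparingly.
Optimize code logic: Good code logic and design reduce unnecessary object creation and retention, which lightens the load on GC. Examples include using local variables instead of fields and using caches sensibly.

Overall, GC tuning is a process of weighing many factors, and the right strategy depends on the actual situation. Be careful not to over-tune: adjust only when there is a real need, so that you do not introduce new problems.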

Typical Answer

Tuning is always specific to a particular scenario and purpose, so the first step in GC tuning is to clarify the goal. From a performance perspective, there are usually three aspects to consider: memory usage (footprint), latency, and throughput. Most tuning efforts focus on one or two of these; it is rare to optimize for all three at once. Other GC-related concerns may also come into play. For example, an Out Of Memory (OOM) error may be related to inappropriate GC parameters, and application startup speed can also be affected by GC.

The basic approach to tuning can be summarized as follows:

Understand the application’s requirements and problems, and determine the tuning goal. For example, suppose we have developed an application service that shows occasional performance fluctuations with long pauses. After evaluating the response time users can accept and the business volume, we simplify the goal to keeping GC pauses within 200 ms while maintaining a certain level of throughput.
Understand the state of the JVM and GC, locate the specific problem, and determine whether GC tuning is really necessary. There are many ways to do this, such as using jstat to view GC statistics, enabling GC logs, or using diagnostic tools provided by the operating system. For example, GC logs can show whether a long GC pause at a specific time caused the application to respond slowly.
Consider whether the selected GC type suits the application’s characteristics. If it does, pin down where the problem lies, for example an excessively long Minor GC or abnormal pauses during Mixed GC. If it does not, consider switching to another collector; CMS and G1, for example, both focus more on low latency.
Determine the specific parameters or software and hardware configurations to adjust through analysis.
Verify whether the tuning goals have been achieved. If the goals have been met, consider ending the tuning process; otherwise, repeat the process of analysis, adjustment, and verification.
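As a rough illustration of the "understand the status of the JVM and GC" step, the following minimal sketch reads collection counts and accumulated collection time from inside the running JVM through the standard java.lang.management API. The class name GcSnapshot is made up for this example. The accumulated time is not the same thing as the worst-case pause, so for a goal such as "pauses within 200 ms" the GC log remains the more reliable source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: a coarse, in-process view similar in spirit to jstat. */
class GcSnapshot {

    /** Returns one line per collector: name, collection count, accumulated time. */
    static List<String> collect() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();  // -1 if the collector does not report it
            long millis = gc.getCollectionTime();  // approximate accumulated time, -1 if undefined
            double avgMs = count > 0 ? (double) millis / count : 0.0;
            lines.add(String.format("%s: count=%d, totalMs=%d, avgMs=%.1f",
                    gc.getName(), count, millis, avgMs));
        }
        return lines;
    }

    public static void main(String[] args) {
        collect().forEach(System.out::println);
    }
}
```

Calling collect() before and after a load test and comparing the two snapshots gives a first hint of whether GC activity lines up with the slow periods.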

Analysis of Topics

Today’s interview question on GC tuning covers a fundamental part of JVM tuning. Many JVM tuning needs ultimately come down to GC tuning or are related to it. The answer above gives a common line of thinking.

Locating and solving specific problems quickly still requires solid knowledge of the JVM and GC, together with accumulated tuning experience, and sometimes even intuition built on that experience. The interviewer may go on to ask about real problems you have met in projects. If you can describe the context clearly and concisely, and then explain your diagnostic reasoning and tuning practice, it will count strongly in your favor.

Although this column cannot provide specific project experience, it can help you master common tuning ideas and methods, which is helpful both in interviews and in actual work. In addition, I will also supplement from the following perspectives:

In the previous lecture, I already mentioned that when it comes to specific GC types, the actual performance of the JVM is more complex. Currently, G1 has become the default choice for new versions of JDK, so it is worth understanding it in depth.
Because G1 GC is constantly evolving, I will focus on its evolutionary changes, especially changes related to behavior and configuration. And because of the rapid development of JVM, even aspects such as collecting GC logs have undergone significant improvements. This is why the exercise I gave you in the previous lecture was about log-related options. I believe you will be surprised after reading the explanation.
From the perspective of GC tuning practices, understand the tuning ideas and methods for common problems.

Knowledge Expansion

First, let’s get an overall understanding of the internal structure and main mechanisms of G1 GC.

From the perspective of memory regions, G1 also has the concept of generations, but it is very different from the memory structure I introduced earlier. Its internal structure is composed of a chessboard-like arrangement of regions, as shown in the diagram below.

All regions have the same size, which is a power of 2 between 1 MB and 32 MB. The JVM tries to divide the heap into approximately 2048 regions of equal size. The region size (and therefore the region count) can be set manually; otherwise G1 chooses it automatically based on the heap size.
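The sizing rule described above can be sketched as simple arithmetic. This is a simplified model based only on the numbers in the text (a target of about 2048 regions, power-of-2 sizes between 1 MB and 32 MB); the real HotSpot heuristic takes more inputs, and its limits differ between JDK versions. The class and method names are made up for this example.

```java
/** Simplified model of how a default G1 region size could follow from the heap size. */
class G1RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;          // lower bound from the text
    static final long MAX_REGION = 32 * MB;         // upper bound from the text
    static final long TARGET_REGION_COUNT = 2048;   // "approximately 2048 regions"

    static long regionSizeBytes(long heapBytes) {
        long ideal = heapBytes / TARGET_REGION_COUNT;
        if (ideal <= MIN_REGION) return MIN_REGION;
        if (ideal >= MAX_REGION) return MAX_REGION;
        return Long.highestOneBit(ideal);            // round down to a power of 2
    }

    static long regionCount(long heapBytes) {
        return heapBytes / regionSizeBytes(heapBytes);
    }

    public static void main(String[] args) {
        long[] heapsInGb = {1, 4, 6, 64, 128};
        for (long gb : heapsInGb) {
            long heap = gb * 1024 * MB;
            System.out.println(gb + " GB heap: region = " + regionSizeBytes(heap) / MB
                    + " MB, regions = " + regionCount(heap));
        }
    }
}
```

Under this model a 4 GB heap gets 2 MB regions, while a 128 GB heap hits the 32 MB cap and ends up with more than 2048 regions.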

In G1, generations are a logical concept: some regions serve as Eden, some as Survivor, and some as Old. In addition, G1 treats objects larger than 50% of a region’s size (usually byte or char arrays) as Humongous objects and places them in dedicated Humongous regions. Logically, Humongous regions are considered part of the old generation, because copying such large objects is expensive and does not suit the copying algorithm used by young generation GC.

You can think about the side effects of the region design.

For example, the sizes of large objects rarely line up with the region size, which leads to wasted space. In my diagram, some regions are in the Humongous color but carry no label; this indicates that a very large object can occupy more than one region. Furthermore, if regions are too small, it becomes harder to find contiguous space when allocating large objects, which has been a long-standing issue. You can refer to the discussion in the OpenJDK community for more information. This can arguably be seen as a JVM bug, although the workaround is simple: set a larger region size with the following parameter:

-XX:G1HeapRegionSize=<N, for example 16>M
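To make the wasted-space argument concrete, here is a small back-of-the-envelope sketch. It assumes only what the text states (objects larger than half a region are Humongous and occupy whole, contiguous regions) and ignores object headers and alignment. The class and method names are illustrative.

```java
/** Illustrative arithmetic for Humongous allocations; not HotSpot code. */
class HumongousFootprint {
    static final long MB = 1024L * 1024L;

    /** Per the text: larger than 50% of a region means Humongous. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Whole regions needed when the object is allocated as Humongous. */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;  // ceiling division
    }

    /** Space left unused at the tail of the last Humongous region. */
    static long wastedBytes(long objectBytes, long regionBytes) {
        if (!isHumongous(objectBytes, regionBytes)) return 0;
        return regionsNeeded(objectBytes, regionBytes) * regionBytes - objectBytes;
    }

    public static void main(String[] args) {
        long array = 5 * MB;  // e.g. a large byte[] buffer
        long[] regionSizesMb = {2, 4, 16};
        for (long regionMb : regionSizesMb) {
            long region = regionMb * MB;
            System.out.println(regionMb + " MB regions: humongous=" + isHumongous(array, region)
                    + ", regions=" + regionsNeeded(array, region)
                    + ", wastedMB=" + wastedBytes(array, region) / (double) MB);
        }
    }
}
```

With 2 MB regions, a 5 MB array needs three contiguous regions and leaves 1 MB unused; with 16 MB regions the same array is no longer Humongous and is allocated like any other object.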

From the perspective of GC algorithms, G1 uses a composite algorithm, which can be simplified as follows:

In the young generation, G1 still uses parallel copying algorithm, so it still involves Stop-The-World pauses.
In the old generation, most of the work is concurrent marking. Compaction is piggybacked on young generation GC, and it is done incrementally on parts of the old generation rather than on the whole at once.

As I mentioned in the previous lesson, conventionally people like to call young generation GC “Minor GC” and old generation GC “Major GC”, distinguishing them from the whole heap “Full GC”. But in modern GCs, this concept is no longer accurate. For G1, the following holds:

Minor GC still exists, although the specific process is different and involves handling of Remembered Set and other related processing.
Old generation collection relies on Mixed GC. After concurrent marking ends, the JVM has enough information for garbage collection. Mixed GC not only cleans up the Eden and Survivor regions, but also cleans up some of the Old regions. You can specify the liveness threshold below which an old region becomes a candidate, and the maximum percentage of regions to include in a single Mixed GC, using the following parameters:

-XX:G1MixedGCLiveThresholdPercent
-XX:G1OldCSetRegionThresholdPercent

From the perspective of G1’s internal operation, the following diagram describes the state transitions during normal operation of G1. Of course, in case of events like evacuation failure, a Full GC will be triggered.

There are many concepts related to G1, and one major focus is the Remembered Set, which is used to record and maintain the reference relationships between regions. Why is this necessary? Consider that the young generation GC uses a copying algorithm, which means that objects are “copied” from Eden or Survivor to the “to” area, effectively creating a new object. In this process, it is essential to ensure that the cross-region references from the old generation to the young generation remain valid. The following diagram illustrates the related design.
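The idea can be sketched with a toy model: each region keeps a set of "cards" (small fixed-size chunks of memory elsewhere in the heap) that contain references pointing into it, and a write barrier updates that set whenever a cross-region reference is stored. Everything here is a deliberate simplification. The class name, the plain long addresses, and the 512-byte card size are assumptions for illustration, and real G1 filters and processes these updates in far more sophisticated ways.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model: each region remembers which cards elsewhere hold references into it. */
class ToyRememberedSets {
    static final int CARD_SHIFT = 9;   // 512-byte cards (assumed for this sketch)
    private final int regionShift;     // log2 of the region size
    private final Map<Long, Set<Long>> cardsByTargetRegion = new HashMap<>();

    /** regionBytes must be a power of 2, as G1 region sizes are. */
    ToyRememberedSets(long regionBytes) {
        this.regionShift = Long.numberOfTrailingZeros(regionBytes);
    }

    long regionOf(long address) {
        return address >>> regionShift;
    }

    /** Stand-in for the write barrier: a field at fromAddr now refers to the object at toAddr. */
    void recordWrite(long fromAddr, long toAddr) {
        long targetRegion = regionOf(toAddr);
        if (regionOf(fromAddr) == targetRegion) {
            return;  // references within one region need no tracking
        }
        cardsByTargetRegion
                .computeIfAbsent(targetRegion, r -> new TreeSet<>())
                .add(fromAddr >>> CARD_SHIFT);
    }

    /** Cards that must be scanned as extra roots when the given region is evacuated. */
    Set<Long> cardsPointingInto(long region) {
        return cardsByTargetRegion.getOrDefault(region, Collections.emptySet());
    }
}
```

When a region is collected, only the cards returned by cardsPointingInto need scanning rather than the whole heap, which is the benefit; keeping the sets up to date on every reference store is the cost discussed next.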

Many of G1’s overheads are derived from the Remembered Set. For example, it typically occupies around 20% or more of the heap size, which is a significant proportion. Additionally, during object copying, the speed is affected by the need to scan and modify the Card Table’s information, which in turn affects the pause time.

There is a wealth of information available on the internals of G1, and I won’t repeat it here. If you want to learn more about internal structures and algorithms, I recommend referring to some specific introductions, such as this article. In terms of books, I recommend “Java Performance Companion” by Charlie Hunt and others.

Next, I will introduce some changes in G1’s behavior that you may not be familiar with yet. These changes partly address concerns raised in other columns, such as delayed class unloading.

As mentioned above, the allocation and reclamation of Humongous objects are the source of many memory issues, and newer versions of G1 reclaim Humongous objects more aggressively. Because Humongous regions are part of the old generation, they were originally considered for reclamation only after concurrent marking.
We know that G1 records references between old generation regions. Since the number of Humongous objects is limited, G1 can quickly determine whether any old generation object references one. If none does, the only thing that can keep it alive is a reference from the young generation, and that information is available during Young GC. Humongous objects can therefore be reclaimed during Young GC, without waiting for concurrent marking like other old generation objects.
In column 5, I mentioned the string deduplication feature introduced in 8u20. During garbage collection, G1 puts newly created String objects into a queue; after Young GC, it concurrently (without STW) deduplicates strings whose internal data (char arrays, or byte arrays since JDK 9) is identical, so that these strings reference the same array. You can enable this feature with the following option:

-XX:+UseStringDeduplication

Note that while deduplication can save a significant amount of memory, the concurrent work consumes some CPU and may slightly slow down Young GC.
Class unloading has been a long-standing problem for some Java applications. In column 25, I explained that a class can only be unloaded when the custom class loader that loaded it is reclaimed. Although replacing the permanent generation with the metadata area (Metaspace) improved the situation, problems may still arise. Has class unloading improved in G1? Many sources say that G1 only unloads classes during Full GC, which is clearly not what we want. You can enable the following option to observe class unloading:

-XX:+TraceClassUnloading

Fortunately, modern G1 no longer behaves that way. Since 8u40, G1 has added the following option and enabled it by default:

-XX:+ClassUnloadingWithConcurrentMark

This means that the JVM unloads classes after the concurrent marking phase.
We know that reclaiming old generation objects usually has to wait for concurrent marking to complete. If concurrent marking finishes too late, the heap may fill up before old generation space has been reclaimed, and a Full GC is triggered. The timing of concurrent marking is therefore crucial. In early G1 tuning, the following parameter was commonly set, but it is hard to give a universally applicable value, and it often had to be adjusted based on actual runtime results:

-XX:InitiatingHeapOccupancyPercent

In G1 since JDK 9, such adjustments are needed less often, because the JVM only uses this parameter as an initial value. It samples and collects statistics at runtime and dynamically adjusts when concurrent marking starts. The corresponding JVM parameter is as follows, and it is enabled by default:

-XX:+G1UseAdaptiveIHOP
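Because these defaults changed across JDK releases, it can help to check what the running JVM actually uses instead of relying on older articles. The sketch below queries a few of the flags discussed here through com.sun.management.HotSpotDiagnosticMXBean, which is available on HotSpot-based JDKs. The class name G1FlagCheck and the choice of flags are mine; ClassUnloadingWithConcurrentMark is, to my knowledge, the option that controls class unloading after concurrent marking. A flag that a given JVM build does not know is reported as "n/a" rather than failing.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that reports the current values of a few G1-related flags. */
class G1FlagCheck {
    static final String[] FLAGS = {
        "UseG1GC",
        "UseStringDeduplication",
        "ClassUnloadingWithConcurrentMark",
        "InitiatingHeapOccupancyPercent",
        "G1UseAdaptiveIHOP"
    };

    static Map<String, String> currentValues() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map<String, String> values = new LinkedHashMap<>();
        for (String flag : FLAGS) {
            try {
                values.put(flag, hotspot.getVMOption(flag).getValue());
            } catch (RuntimeException e) {
                // For example, IllegalArgumentException if this JVM does not have the flag.
                values.put(flag, "n/a");
            }
        }
        return values;
    }

    public static void main(String[] args) {
        currentValues().forEach((flag, value) -> System.out.println(flag + " = " + value));
    }
}
```

Running it under different JDK versions, or with explicit -XX options on the command line, shows which values are defaults and which have been overridden.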

Most existing material points out that G1’s Full GC is its worst case: a single-threaded, serial collection. However, with recent JDKs (JDK 10 and later), G1’s Full GC is parallelized, and in general scenarios it even performs better than Parallel GC’s Full GC implementation.

Of course, there are many other changes, such as faster Card Table scanning, but I won’t go into detail here because they do not bring behavioral changes and do not fundamentally affect tuning choices.

Earlier, I introduced G1’s internal mechanism and interspersed it with some tuning advice. Now, I will provide some overall tuning recommendations.

First, I recommend upgrading to a newer version of JDK. As you can see from the improvements mentioned earlier, many issues that people often discuss can be resolved by upgrading JDK.

Second, master the ways to collect information for GC tuning. Comprehensive, detailed, and accurate information is the basis of all tuning, not just GC tuning. Take enabling GC logging as an example. It seems like a simple thing, but are you sure you truly understand it?

In addition to the two commonly used options,

-XX:+PrintGCDetails -XX:+PrintGCDateStamps

there are some very useful log options that many specific diagnostic issues rely on:

-XX:+PrintAdaptiveSizePolicy // Print G1 Ergonomics related information

We know that some GC behaviors are triggered adaptively. PrintAdaptiveSizePolicy helps us understand why the JVM took actions that we may not have wanted. For example, a basic recommendation for G1 tuning is to avoid large numbers of Humongous object allocations. If the Ergonomics information indicates that this has happened, we can consider either increasing the heap size or directly increasing the region size.

If there is suspicion of delayed reference cleanup, the following option can be activated to understand where the accumulation occurred:

-XX:+PrintReferenceGC

In addition, I recommend enabling parallel reference processing with the following option:

-XX:+ParallelRefProcEnabled

One thing to note is that the JVM and GC logging framework was restructured in JDK 9. The PrintGCDetails option mentioned earlier has been marked as deprecated, and PrintGCDateStamps has been removed; specifying it causes the JVM to fail to start. You can query the new logging configuration options with java -Xlog:help.
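As a memory aid, the legacy options discussed in this section can be mapped to unified logging (-Xlog) selectors roughly as follows. The mapping reflects my reading of the java launcher documentation and should be verified against the JDK version in use; the class and method names are made up for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Lookup table from legacy GC logging flags to approximate -Xlog equivalents. */
class LegacyGcLogFlags {
    static final Map<String, String> UNIFIED = new LinkedHashMap<>();
    static {
        UNIFIED.put("PrintGCDetails", "-Xlog:gc*");
        // Timestamps are decorators of the logging framework, not a GC option.
        UNIFIED.put("PrintGCDateStamps", "-Xlog:gc*:stdout:time");
        UNIFIED.put("PrintAdaptiveSizePolicy", "-Xlog:gc+ergo*=debug");
        UNIFIED.put("PrintReferenceGC", "-Xlog:gc+ref*=debug");
    }

    /** Accepts "PrintGCDetails" as well as "-XX:+PrintGCDetails". */
    static String suggest(String legacyFlag) {
        String name = legacyFlag.replaceFirst("^-XX:[+-]?", "");
        return UNIFIED.getOrDefault(name, "no mapping recorded in this sketch");
    }

    public static void main(String[] args) {
        for (String flag : UNIFIED.keySet()) {
            System.out.println("-XX:+" + flag + "  ->  " + suggest(flag));
        }
    }
}
```

Several selectors can be combined in one option, for example -Xlog:gc*,gc+ref*=debug:file=gc.log:time,uptime,level,tags, where gc.log is just a placeholder file name.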

Finally, let’s look at some general practices. Once you understand the internal structure and mechanism I introduced earlier, many conclusions will become clear. For example:

If Young GC takes a long time, the young generation is probably too large. You can consider reducing the minimum percentage of the heap used for the young generation:

-XX:G1NewSizePercent

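Reducing its maximum value (G1MaxNewSizePercent) also helps reduce Young GC latency. Both are experimental options, so they also require -XX:+UnlockExperimentalVMOptions.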

If Mixed GC has a long delay, what should we do?

Recall that Mixed GC includes some old regions. Reducing the number of old regions processed in each collection is the most direct option.

I have already mentioned G1OldCSetRegionThresholdPercent, which controls the maximum. You can also use the following parameter to increase the number of Mixed GCs; its current default value is 8. Spreading the work over more Mixed GCs means that fewer regions are included in each one.

-XX:G1MixedGCCountTarget
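The effect of these two parameters can be approximated with a little arithmetic. This is a simplified sketch, not the actual policy code: G1 also takes the pause time goal into account when deciding how many old regions to add between these bounds. The class and method names are illustrative.

```java
/** Rough bounds on how many old regions one Mixed GC may include. */
class MixedGcSizing {

    /** Lower bound: spread the candidate regions over the targeted number of Mixed GCs. */
    static long minOldRegionsPerGc(long candidateOldRegions, int mixedGcCountTarget) {
        long perGc = (candidateOldRegions + mixedGcCountTarget - 1) / mixedGcCountTarget;
        return Math.max(1, perGc);
    }

    /** Upper bound: a percentage of all heap regions (G1OldCSetRegionThresholdPercent). */
    static long maxOldRegionsPerGc(long totalRegions, int oldCSetThresholdPercent) {
        long cap = (totalRegions * oldCSetThresholdPercent + 99) / 100;
        return Math.max(1, cap);
    }

    public static void main(String[] args) {
        long candidates = 400;  // old regions selected after concurrent marking (example value)
        System.out.println("CountTarget=8:  at least " + minOldRegionsPerGc(candidates, 8));
        System.out.println("CountTarget=16: at least " + minOldRegionsPerGc(candidates, 16));
        System.out.println("2048 regions, 10% cap: at most " + maxOldRegionsPerGc(2048, 10));
    }
}
```

With 400 candidate regions, raising G1MixedGCCountTarget from 8 to 16 halves the per-collection minimum from 50 to 25 regions, at the price of more Mixed GC cycles.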

Today’s content is only a starting point; GC tuning cannot be summed up in a few sentences. You can refer to the G1 Tuning Guide and other resources for more details. Also remember to avoid excessive tuning. G1 works well with large heaps, and its mechanisms inherently need some spare space. Sometimes giving the heap a little more room is more practical than meticulous tuning.

Today, I have outlined the basic ideas for GC tuning and explained the G1 internal structure and the latest behavioral changes in detail. Overall, G1 tuning is relatively simple and intuitive because you can directly set pause time goals, and it introduces various intelligent adaptive mechanisms. I hope all these efforts can make you more efficient in daily application development.

Practice

Have you grasped the topic we discussed today? Today’s question for contemplation is: what are the reasons for Full GC occurring, and what are the ways to locate it?
