15 Java 11 ZGC and Java 12 Shenandoah Introduction: The Bamboo That Renews Itself Daily Is Ever New #

With the rapid development of the Internet and the iterative updates of computer hardware, more and more business systems are using large amounts of memory. Moreover, these real-time online businesses are sensitive to response time. For example, in payment businesses that require real-time response messages, if the GC pause time of the JVM reaches 10 seconds, it will obviously exhaust the patience of customers.

There are also systems that are particularly sensitive to latency, generally requiring a response time within 100 ms. For example, high-frequency trading systems have computationally expensive tasks inherent to their business; if GC pauses take up more than half of that budget (>50 ms), certain trading strategies are likely to be invalidated and the specified performance targets cannot be met.

In this context, the resources consumed by GC (such as CPU and memory) matter relatively less, and a slightly lower throughput is acceptable. In these kinds of systems hardware resources are generally over-provisioned, and single-machine throughput can be kept within a certain range through rate limiting, routing, and clustering. In other words, low latency is the core non-functional requirement of these systems.

How to keep a system running stably for a long time under high concurrency, high throughput, and large heap memory (e.g., 64 GB/128 GB or more), while reducing GC pause latency to around 10 ms, has become a problem well worth thinking about and one the industry urgently needs to solve.

Pauseless GC Overview #

As early as 2005, three engineers from Azul Systems proposed a great solution in their paper “The Pauseless GC Algorithm”, which introduced the Pauseless GC design. They found that the secret to low latency mainly lies in two aspects:

  • Use read barriers
  • Use incremental concurrent garbage collection

After more than ten years of research and development, JDK 11 officially introduced the ZGC garbage collector, which essentially implements the algorithm and ideas proposed in this paper. The Shenandoah GC introduced in JDK 12 follows similar design ideas.

Earlier implementations of the various GC algorithms inserted “write barriers” into the code executed by business threads, both to intercept modifications to heap memory and to track references across memory regions. This approach gives generational/partitioned GC algorithms excellent performance and is widely used in production-grade JVMs. “Read barriers”, by contrast, were rarely used in production before, mainly because the theory and implementations were immature and offered no advantage.

A good GC algorithm certainly needs to ensure that memory is reclaimed faster than it is allocated. Beyond that, the Pauseless GC algorithm does not require any particular phase to finish quickly: no phase has to compete with business threads for CPU resources, and no phase must complete before subsequent business operations can proceed.

The Pauseless GC algorithm mainly consists of three stages: marking, relocation, and remapping. Each stage is completely parallel and executed concurrently with business threads.

JDK 11 Download and Installation #

JDK 11 is a long-term support (LTS) version, which is the long-term maintenance version after JDK 8, and can be downloaded and installed from the official website.

Oracle official website:

https://www.oracle.com/technetwork/java/javase/downloads/index.html

It can also be directly downloaded from my Baidu Cloud Drive, link:

https://pan.baidu.com/s/1SwEcrPI3srrwEKtb-y5dfQ#list/path=%2F

After installation, configure the JAVA_HOME and PATH environment variables according to the approach described in the first lesson, then verify that the installation succeeded:

$ java -version

java version "11.0.5" 2019-10-15 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.5+10-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.5+10-LTS, mixed mode)

If there is similar output, the installation is successful and it can be used.

During development, JDK 11 also provides some convenient improvements. For example, in previous versions of JDK, if you want to compile and execute Java files, you need two steps:

$ javac Hello.java   # compile the source to a .class file first

$ java Hello         # then run the class

And in JDK 11, only one step is needed:

$ java Hello.java    # detects the .java extension, compiles the source in memory, and runs it
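
Here Hello.java can be any ordinary source file with a main method, for example (the file contents below are purely illustrative):

// Hello.java
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello, JDK 11!");
    }
}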

Introduction to ZGC: Focus on Low-Latency Garbage Collector #

JDK 11 inherits many excellent features from JDK 9 and JDK 10, such as the modularization introduced in JDK 9 and the jhsdb debugging tool.

If you have to choose the most exciting feature in JDK 11, it would be none other than ZGC.

ZGC, which stands for Z Garbage Collector (Z meaning Zero), is a low-pause, high-concurrency, region-based, non-generational incremental compacting garbage collector. The average GC pause time is less than 2 milliseconds, and the worst-case pause time does not exceed 10 milliseconds.

Note:

The ZGC garbage collector is supported starting from JDK 11, but as of this writing (February 2020) it is only available on the Linux/x64 platform; in other words, ZGC can be used with JDK 11 or later on Linux x64.

Looking into the progress of the ports, the macOS port has been completed but has not yet been integrated into the JDK. According to the official plan it will be integrated in JDK 14 (it is not yet included in the current early-access builds of JDK 14).

JDK 13 was released in September 2019; following the six-month release cadence, JDK 14 is expected in March 2020.

If ZGC is used with JDK 11 installed on macOS, an error will occur:

$ java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC Hello

Error occurred during initialization of VM
Option -XX:+UseZGC not supported

On Linux system, after JDK 11 is installed, ZGC can be enabled using the following parameters:

-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx16g

Note that the heap size is also specified here (this parameter is quite important). As its name suggests, -XX:+UnlockExperimentalVMOptions unlocks experimental VM options.
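
Putting it together, a complete startup command might look like the following (MyApp is just a placeholder for your main class or jar):

$ java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx16g MyApp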

Introduction to ZGC Features #

The main features of ZGC include:

  • Maximum pause time of GC does not exceed 10ms
  • Support for a wide range of heap memory sizes, from a few hundred MB to super large heap memory of 4TB (increased to 16TB in JDK 13)
  • Compared to G1, application throughput decreases by no more than 15%
  • Currently only supported on Linux/x64 platform, expected to support macOS and Windows systems after JDK 14

Some places in this lesson say “GC pause” and others say “GC pause time”; they mean the same thing, with the wording varied only for readability. To be precise, a pause refers to business threads being stopped, and pause time refers to how long the application as a whole is stopped.

The official documentation states that the pause time is below 10ms, but this value is a very conservative estimate. According to the benchmark test (see the PDF link in the reference material), in a large heap of 128G, the maximum pause time is only 1.68ms, much lower than 10ms. Compared with the G1 algorithm, the improvement is very significant.

Please refer to the following graph:

![GC pause time comparison: ZGC vs. Parallel vs. G1](9324cf7d-ab45-4620-9661-2035e3f1b3d2.png)

The graph on the left uses a linear scale, and the graph on the right uses a logarithmic scale.

As can be seen, whether it is the average, 95th percentile, 99th percentile, or maximum pause time, ZGC outperforms the G1 and Parallel GC algorithms.

According to our monitoring data in the production environment (16G~64G heap memory), each pause does not exceed 3ms.

For example, the following graph shows monitoring information for a low-latency gateway system. In an environment with tens of GB of heap memory, ZGC performs without any pressure and the pause time is very stable.

![ZGC pause times observed for a low-latency gateway system](68469069.png)

Modern GC algorithms like G1 and ZGC generally do not trigger Full GC as long as there is enough idle heap memory.

Therefore, many times, adding memory is the most effective solution as long as conditions permit.

Since low latency is the key point of ZGC, and the key to low latency in JVM is the pause time of GC, let’s take a look at the methods to reduce GC pause time:

  • Use multi-threaded “parallel” heap memory cleaning to fully utilize the resources of multi-core CPUs;
  • Run GC tasks in “stages” to distribute the pause time;
  • Process in an “incremental” manner, with each GC only processing a portion of the heap memory (small heap blocks, regions);
  • Execute GC concurrently with business threads, such as adding concurrent marking and concurrent sweeping stages, so that the pause time is controlled within a very short range (currently still requiring a small amount of STW pause, such as scanning root objects, final marking stages);
  • Skip heap memory compaction entirely, as Golang’s GC does (mentioned here only as an aside).

In the previous lesson, we introduced the basic GC algorithms. It can be seen that the “Mark and Sweep” algorithm does not perform memory compaction, so it will cause memory fragmentation. The “Copy” and “Mark and Compact” algorithms perform memory compaction to eliminate memory fragmentation, but in order to keep the heap structure unchanged, this compaction process requires all user threads to be paused, which is what we call STW (Stop-The-World) phenomenon. Only after the STW ends can the program continue to run. This pause time is generally a few hundred milliseconds, and in some cases it can be as long as several seconds or even tens of seconds. For real-time applications and low-latency business systems today, this is a major challenge.

With G1, the heap is divided into many small regions (we call them “regions” rather than “blocks” to avoid confusion with other uses of that word). Just as ConcurrentHashMap divides its hash table into many segments to support finer-grained locking for better performance, finer-grained memory regions make incremental garbage collection possible, which means the pause time of each collection is shorter.

Of course, practice has shown that the maximum pause time configured in G1 (-XX:MaxGCPauseMillis) is only a target and is often missed by a wide margin. In worse cases there may be long Full GC pauses (before JDK 10 the G1 Full GC ran single-threaded; from JDK 10 onward it runs multi-threaded).

ZGC, as well as the Shenandoah GC that will be introduced later, and Azul’s C4 garbage collector, have similar approaches. They focus on reducing pause time while also performing heap memory compaction.

Unlike the GC algorithms mentioned earlier, ZGC runs concurrently with application threads in almost every phase, leaving only very short STW pauses, chiefly for scanning the root object set. ZGC’s pause time is therefore essentially the cost of root scanning, which is very short and does not grow with the size of the heap or the number of live objects.

The process of memory compaction, or relocation, is performed concurrently and uses the “read barrier” that we mentioned earlier. The read barrier is the key weapon of ZGC, and the specific implementation principles will be described in the following sections.

Note: Azul Systems is a company that provides high-performance commercial JVMs/GCs and is a leader in the concept of pauseless GC. They are well-known for Azul VM’s Pauseless GC and the C4 GC in the Zing VM.

Principles of ZGC #

The ZGC cycle is shown in the following graph:

![The ZGC cycle](37037772.png)

Each GC cycle is divided into 6 small stages:

  1. Pause Mark Start: the first pause; marks the objects directly referenced by the root object set;
  2. Concurrent Mark/Remap: traverses the object graph and marks objects, running concurrently with the application;
  3. Pause Mark End: the second pause; a synchronization point, including weak-root cleanup;
  4. Concurrent Prepare for Relocation: reference processing, weak-reference cleanup, selection of the relocation set, and so on;
  5. Pause Relocate Start: the third pause; relocates the root references that point into the relocation set;
  6. Concurrent Relocation: relocates the objects in the relocation set.

These six stages run concurrently with business threads for the vast majority of the time, so their impact on application pauses is minimal.
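
If you want to observe these phases yourself, you can enable unified GC logging at startup (the application name is a placeholder; -Xlog:gc prints one summary line per GC, while -Xlog:gc* prints detailed per-phase logs):

$ java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx16g -Xlog:gc MyApp
$ java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx16g -Xlog:gc* MyApp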

ZGC adopts a concurrent design approach, which is highly technically challenging:

  • When an object needs to be copied to another address, another thread may read or modify the original old object.
  • Even after the copy is successful, there may still be many references in the heap pointing to the old address, so these references need to be updated to the new address.

To solve these problems, ZGC introduces two key technologies: “colored pointers” and “read barriers”.

Colored Pointers #

ZGC uses colored pointers to mark the GC stage it is in.

Colored pointers take a few bits out of each 64-bit pointer to represent the states Marked0, Marked1, Remapped, and Finalizable. Because these bits are needed, ZGC supports neither 32-bit platforms nor compressed pointers, and the heap size in JDK 11 is capped at 4 TB.

From these markings, the current state of objects can be determined and operations like cleanup and compression can be performed.

![Colored pointer layout](0.21570593705169117.png)
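
As a rough sketch of the JDK 11 implementation (the exact bit assignment is an internal detail and may change between versions), the 64-bit pointer is used roughly as follows; the 42 address bits are what limit the heap to 2^42 bytes = 4 TB:

bits  0-41 : object address (2^42 bytes = 4 TB of addressable heap)
bit     42 : Marked0
bit     43 : Marked1
bit     44 : Remapped
bit     45 : Finalizable
bits 46-63 : unused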

Read Barriers #

When the GC thread and user threads are executing concurrently, the operations of user threads modifying objects may cause inconsistency issues. ZGC uses read barriers, which differ from write barriers used by other GCs.

With read barriers, the other phases can quickly decide how to handle an object based on the pointer’s color. Not every read requires a barrier: in the example below, only the first statement (which loads an object reference from the heap) needs a read barrier; the following three statements do not, nor do operations on primitive types.

![Which loads require a read barrier](73cd6f97-8730-4aff-ae48-b8a30f3eaae0.jpg)
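
As a sketch of the kind of code the figure illustrates (the field and method names here are purely illustrative), only a load of an object reference from the heap goes through the barrier:

Object o = obj.fieldA;   // loads an object reference from the heap: read barrier applied
Object p = o;            // copies a local variable, no heap load: no barrier
o.doSomething();         // a call, not a reference load from the heap: no barrier
int i = obj.fieldB;      // loads a primitive field: no barrier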

The renowned JVM expert RednaxelaFX has pointed out that ZGC’s load value barrier differs from the barrier used by Red Hat’s Shenandoah collector: the latter chose the simpler Brooks pointer from the 1970s, while ZGC builds on the classic Baker-style barrier and adds a self-healing capability.

The “read barrier” can be understood as a piece of code or an instruction with the corresponding processing function attached to it.

For example, in the following code:

Object a = obj.x;
Object b = obj.x;

A read barrier is inserted for both load operations. When the first read barrier fires, ZGC ensures that ‘a’ refers to the object’s current address, and through the self-healing mechanism the pointer stored in obj.x is corrected as well. When the second load happens, the barrier goes straight down the fast path with almost no performance overhead. Shenandoah, by contrast, does not correct the value stored in obj.x, so the second read has to take the slow path again.

Any problem in the field of computer science can be solved by adding an indirect intermediate layer.

Colored pointers and read barriers act as an intermediate layer between memory management and application code, enabling the implementation of more functionalities. However, it can be seen that the algorithm itself carries certain overhead and complexity.

Introducing ZGC Parameters #

In addition to the aforementioned -XX:+UnlockExperimentalVMOptions -XX:+UseZGC parameters for enabling ZGC, the following table lists the parameters available for ZGC:

![ZGC command-line parameters](62790527.png)

Here are some commonly used parameters:

  • -XX:ZCollectionInterval: Triggers a GC at a fixed time interval (in seconds); the default value is 0, which disables interval-based GC.
  • -XX:ZAllocationSpikeTolerance: A correction factor for the estimated memory allocation rate; the default value is 2 and generally does not need to be changed.
  • -XX:ZProactive: Enables the proactive reclamation strategy; the default value is true, and keeping it enabled is recommended.
  • -XX:ZUncommit: Returns unused memory to the operating system; available from JDK 13 onwards. The JVM will not shrink the heap below -Xms, so this parameter has no effect if -Xmx and -Xms are set to the same value.
  • -XX:+UseLargePages and -XX:ZPath: Use large memory pages (called Huge Pages on Linux). Configuring ZGC with Huge Pages can improve throughput, latency, and startup time; they are generally configured together with ZPath. The configuration steps are described at https://wiki.openjdk.java.net/display/zgc/Main.
  • -XX:+UseNUMA: Enables NUMA support (for machines with multiple CPU sockets, each with its own local memory). ZGC enables NUMA support by default, meaning it tries to allocate heap memory from NUMA-local memory when possible; it can be switched explicitly with -XX:+UseNUMA or -XX:-UseNUMA.
  • -XX:ZFragmentationLimit: If a region’s fragmentation exceeds this percentage, it becomes a candidate for relocation; the default value is 25.
  • -XX:ZStatisticsInterval: Sets the interval for printing ZStat statistics (logs of CPU, memory, and so on).

In addition, there is the previously mentioned concurrent thread parameter -XX:ConcGCThreads=<number>, which is important for all concurrent GC strategies. It needs to be considered based on the number of CPU cores. Configuring too many threads will incur high thread context switch overhead, while configuring too few threads will result in garbage collection speed not keeping up with the system usage.
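
As an illustration, a ZGC startup command combining several of these options might look like the following (the values and the application name are only examples and should be tuned for your system):

$ java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
       -Xms16g -Xmx16g \
       -XX:ConcGCThreads=4 \
       -XX:ZAllocationSpikeTolerance=5 \
       -Xlog:gc:gc.log \
       MyApp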

Improvements in ZGC in Java 13 #

In ZGC in Java 11, unused memory was not actively released to the operating system, unlike G1 in this version.

This means that after memory is reclaimed, it is not returned to the operating system and is still managed internally.

For most applications, CPU and memory are limited and scarce resources, so this is not conducive to maximizing resource utilization (this has little impact on systems that deploy only one Java application and have dedicated access to all memory, especially when Xmx and Xms are set to the same value).

In Java 13, ZGC will release pages that have been identified as unused for a long time back to the operating system, allowing them to be used by other processes (consider scenarios where multiple batch processing jobs are executed in rotation, etc.). Returning these unused pages to the operating system will not cause the heap size to shrink below the initial value set by the parameters. If the minimum and maximum heap memory are set to the same value, no memory will be released to the operating system.

The improvements in ZGC in Java 13 are mainly reflected in the following points:

  • Ability to release unused memory to the operating system
  • Maximum heap memory support increased from 4TB to 16TB
  • Added parameter -XX:SoftMaxHeapSize to soft limit the heap size

Note that SoftMaxHeapSize is not the same kind of limit as Xmx or Xms. By default the GC tries to keep the heap from exceeding this size, but it may still do so under certain circumstances, so it can be seen as a softer, more flexible limit. It is mainly used in the following situations:

  • When you want to reduce heap memory usage under normal circumstances while maintaining the ability to handle temporary increases in heap space,
  • Or when you want to keep sufficient memory space to handle memory allocations without getting stuck due to unexpected increases in memory allocations.

Note that SoftMaxHeapSize should not be set to a value greater than Xmx: Xmx is the hard maximum, so a soft limit above it could never be reached.

In Java 13, the feature of ZGC releasing memory back to the operating system is enabled by default and can be disabled using the parameter -XX:-ZUncommit. The release of memory can also be configured to have a delay using the parameter -XX:ZUncommitDelay=<seconds>, with the default value being 300 seconds.
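
For example, on JDK 13 the uncommit behavior and the soft heap limit might be combined like this (the values and application name are only examples):

$ java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC \
       -Xms4g -Xmx16g \
       -XX:SoftMaxHeapSize=8g \
       -XX:ZUncommitDelay=120 \
       MyApp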

Introduction to Shenandoah GC #

Java 12 was officially released on March 19, 2019, and this version introduced a new garbage collector: Shenandoah (named after a place in the United States). Please refer to the WIKI for more information:

https://wiki.openjdk.java.net/display/shenandoah/Main

As another option besides ZGC, Shenandoah is a low-pause-time garbage collector designed for managing large heaps on large multi-core servers. It lets GC threads run concurrently with application threads, resulting in very short pause times.

Features of Shenandoah #

Shenandoah GC started earlier than ZGC. Red Hat announced the start of this project as early as 2014 to achieve the low pause time requirement for GC on the JVM.

It is designed so that GC threads run concurrently with application threads; by doing the bulk of the garbage collection work (marking, evacuation, and compaction) while business threads keep running, it eliminates most of the pause time.

The Shenandoah team claims that the pause time of Shenandoah GC is independent of heap size. It guarantees low pause times for both 200 MB and 200 GB heap memories (note: it does not guarantee pause times within 10ms like ZGC).

Introduction to Shenandoah GC Principles #

The principles of Shenandoah GC are very similar to those of ZGC.

![Shenandoah GC working cycle](28583d44-89ad-4196-b96c-dd747dc43c42.png)

Some of the log contents are as follows:

GC(3) Pause Init Mark 0.771ms
GC(3) Concurrent marking 76480M->77212M(102400M) 633.213ms
GC(3) Pause Final Mark 1.821ms
GC(3) Concurrent cleanup 77224M->66592M(102400M) 3.112ms
GC(3) Concurrent evacuation 66592M->75640M(102400M) 405.312ms
GC(3) Pause Init Update Refs 0.084ms
GC(3) Concurrent update references 75700M->76424M(102400M) 354.341ms
GC(3) Pause Final Update Refs 0.409ms
GC(3) Concurrent cleanup 76244M->56620M(102400M) 12.242ms

The corresponding working cycle is as follows:

  1. Initial Mark Phase: Prepare for concurrent marking of the heap and the application, then scan the root object set. This is the first pause in the GC cycle and its duration depends on the size of the root object set. Since the root object set is small, the speed is fast and the pause is very short.
  2. Concurrent Mark Phase: The concurrent marking traverses the heap and tracks reachable objects. This phase runs simultaneously with the application and its duration depends on the number of surviving objects and the structure of the object graph in the heap. As the application can freely allocate new data during this phase, the heap occupancy rate will increase during the concurrent marking.
  3. Final Mark Phase: Empty all pending marking/update queues and rescan the root object set to complete the concurrent marking. This is the second pause in the GC cycle and the main time consumption is in emptying the queues and scanning the root object set.
  4. Concurrent Cleanup Phase: Concurrent cleanup reclaims immediate garbage regions, i.e., regions in which concurrent marking found no live objects at all.
  5. Concurrent Evacuation Phase: Concurrent evacuation copies objects out of the regions to be collected into other regions. This is the main difference from other OpenJDK GCs. This phase runs alongside the application; its duration depends on the size of the collection set to be copied, and it does not pause the program.
  6. Init Update Refs Phase: This phase ensures that all GC and application threads have finished evacuation and prepares for the next GC phase. This is the third pause in the cycle and the shortest of all the pauses.
  7. Concurrent Update References Phase: Traverse the heap and update references concurrently, updating references to objects moved during the concurrent transfer period. This is the main difference compared to other OpenJDK GCs. Its duration depends on the number of objects in the heap and does not care about the object graph structure because it linearly scans the heap. This phase runs simultaneously with the application.
  8. Final Update Refs Phase: Complete the update references phase by updating the existing root object set again. This is the last pause in the GC cycle and its duration depends on the size of the root object set.
  9. Concurrent Cleanup Phase: Recycle regions without references in the current phase.

For detailed principles and what happens in each step, you can refer to the presentation slides by Gu Zhengyu, a member of the development team, at the 2019 Beijing QCon:

[《Shenandoah:Your Next Garbage Collector-古政宇》.pdf](https://gitee.com/kimmking/QConBeijing2019/raw/master/Java生态系统/Shenandoah:Your Next Garbage Collector-古政宇.pdf)

It is worth mentioning that GC pauses are not the only cause of long application response times. Besides long GC pauses, factors such as message queue backlogs, network delays, complex computation logic, slow external services, and jitter in the operating system’s scheduler can also slow down responses.

When using Shenandoah, it is necessary to have a comprehensive understanding of the system’s operating conditions and analyze the system’s response time. The following figure shows the comparison of various GC workloads provided by the official documentation:

![Latency comparison of Shenandoah, CMS, G1, and Parallel GC as load increases](b319f998-e955-4a1e-8091-9371866633e1.jpg)

As seen in the figure, compared to CMS, G1, and Parallel GC, Shenandoah maintains a very low level of latency even as the system load increases, while the other GCs quickly increase in latency.

Common Parameters Introduction #

Here are some recommended configurations or debugging JVM parameters for Shenandoah:

  • -XX:+AlwaysPreTouch: Touch all heap pages at JVM startup so that memory is committed up front, avoiding page-commit pauses and performance loss at runtime.
  • Set -Xmx equal to -Xms: Making the initial heap size equal to the maximum avoids the cost of heap resizing. Used together with AlwaysPreTouch, this requests and touches all memory at startup so those pauses do not happen during normal operation.
  • -XX:+UseTransparentHugePages: Significantly improves performance for large heaps.
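
Putting these recommendations together, a Shenandoah startup command might look like the following (the heap size and application name are only examples; the unlock flag is needed because Shenandoah is still experimental in JDK 12):

$ java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
       -Xms32g -Xmx32g \
       -XX:+AlwaysPreTouch \
       -XX:+UseTransparentHugePages \
       MyApp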

Heuristics Parameters #

Heuristics tell Shenandoah GC when to start a GC cycle and which regions to collect. You can use -XX:ShenandoahGCHeuristics=<name> to select a heuristic mode, and some modes have additional configurable parameters to help tune the GC. The available heuristic modes are as follows:

  1. Adaptive Mode (adaptive): This is the default. By observing previous GC cycles, it tries to start the next cycle before the heap is exhausted.
  • -XX:ShenandoahInitFreeThreshold=#: Sets the initial threshold for triggering “learning” collection.
  • -XX:ShenandoahMinFreeThreshold=#: Sets the threshold for triggering GC unconditionally due to available space.
  • -XX:ShenandoahAllocSpikeFactor=#: Determines how much heap to retain to handle memory allocation spikes.
  • -XX:ShenandoahGarbageThreshold=#: Sets the percentage of garbage required before marking a region for collection.

2. Static Mode

In this mode, GC cycles are initiated based on heap utilization and memory allocation pressure.

  • -XX:ShenandoahFreeThreshold=#: Sets the percentage of free heap below which a GC cycle is initiated.
  • -XX:ShenandoahAllocationThreshold=#: Sets the percentage of memory allocated since the previous GC cycle before a new cycle is started.
  • -XX:ShenandoahGarbageThreshold=#: Sets the percentage of garbage required in a region before it is marked for collection.

3. Compact Mode

In this mode, GC runs continuously whenever there is memory allocation and immediately starts the next cycle after the previous one ends. This mode usually incurs throughput overhead but provides the quickest memory space reclamation.

  • -XX:ConcGCThreads=#: Sets the number of concurrent GC threads. Reducing the number of concurrent GC threads can free up more space for the application to run.
  • -XX:ShenandoahAllocationThreshold=#: Sets the threshold percentage of memory allocation before starting a new GC cycle from the previous cycle.

4. Passive Mode

In this mode, GC runs only when memory is exhausted, as a stop-the-world (STW) collection; it is intended for system diagnostics and functional testing.

5. Aggressive Mode

In this mode, a new GC cycle is started as soon as the previous cycle completes (similar to “compact” mode) and all surviving objects are gathered into one place. This severely impacts performance but can be used to test the GC itself.

Sometimes the heuristics decide, after evaluation, to merge the update-references phase with the concurrent marking phase. This behavior can be forced on or off with the -XX:ShenandoahUpdateRefsEarly=[on|off] option.

Additionally, the behavior when memory allocation outpaces collection can be tuned with the ShenandoahPacing and ShenandoahDegeneratedGC parameters. If there is still not enough memory, a Full GC is triggered in the worst case so that the system retains enough memory to avoid an OOM.
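
For example, to try the compact heuristic with fewer concurrent GC threads, the startup command might look like this (the values and application name are only examples):

$ java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
       -XX:ShenandoahGCHeuristics=compact \
       -XX:ConcGCThreads=2 \
       MyApp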

For more information on how to configure and debug Shenandoah’s parameters, please refer to the official Shenandoah Wiki page.

Shenandoah Integration in JDK Versions #

The graph below shows the integration status of the Shenandoah GC in various JDK versions. It indicates that Shenandoah GC is available in OpenJDK 12 and 13.

![Shenandoah availability across JDK versions](9b62765c-d0ac-436e-999c-53d964e1ca33.png)

Shenandoah GC can be used in JDK 8 and JDK 11 in Red Hat Enterprise Linux and Fedora systems (both Linux distributions are from Red Hat, as this GC is also developed and maintained by Red Hat).

  • By default, Shenandoah is typically included in released versions of OpenJDK 12+.
  • Shenandoah is included in OpenJDK 8+ released versions in Fedora 24+.
  • Shenandoah is included as a tech preview in OpenJDK 8+ released versions in RHEL 7.4+.
  • Shenandoah may also be enabled in distributions based on RHEL/Fedora or other distributions using their packaging (such as CentOS, Oracle Linux, Amazon Linux).

Epsilon GC in Java 11 Version #

In fact, Java 11 also introduced a new collector called Epsilon GC, which can be enabled with the -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC flags.

The goal of Epsilon GC is to only allocate memory and never perform garbage collection, rather like the mythical beast Pixiu that only eats and never excretes. Epsilon GC is suitable for performance analysis but not for production environments. It is rarely mentioned, but its purpose is easy to understand.

Since there is no garbage collection, there is no GC overhead during program execution, making performance testing more accurate!
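
For example, a benchmark run with Epsilon GC might be started as follows (a fixed, pre-touched heap is usually combined with it so that allocation behavior is as predictable as possible; the class name is a placeholder):

$ java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC \
       -Xms4g -Xmx4g -XX:+AlwaysPreTouch \
       MyBenchmark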

Summary of GC Usage #

So far, we have learned about all the GC algorithms supported by Java, which can be divided into 7 categories:

  • Serial GC: Single-threaded execution with application pauses.
  • Parallel GC (ParNew, Parallel Scavenge, Parallel Old): Multi-threaded parallel garbage collection focused on high throughput.
  • CMS (Concurrent Mark-Sweep): Concurrent marking and sweeping with multiple threads, focused on reducing latency.
  • G1 (Garbage-First): Incremental compaction and collection by dividing memory into regions, further reducing latency.
  • ZGC (Z Garbage Collector): Almost all concurrent execution using colored pointers and read barriers, with latency at the millisecond level and linear scalability.
  • Epsilon: An experimental GC for performance analysis.
  • Shenandoah: An improved version of G1, similar to ZGC.
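
For reference, the command-line options that select each of these collectors are roughly as follows (some are experimental or only present in certain JDK versions or builds):

-XX:+UseSerialGC                                        # Serial GC
-XX:+UseParallelGC                                      # Parallel GC
-XX:+UseConcMarkSweepGC                                 # CMS (deprecated in JDK 9, removed in JDK 14)
-XX:+UseG1GC                                            # G1 (the default since JDK 9)
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC            # ZGC (JDK 11+)
-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC      # Epsilon (JDK 11+)
-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC   # Shenandoah (builds that include it)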

From this list, we can see the evolution of GC algorithms and implementations:

  • Serial -> Parallel: Leveraging the advantages of multi-core CPUs to significantly reduce GC pause time and increase throughput.
  • Parallel -> Concurrent: In addition to parallel GC threads, GC operations are split into multiple steps, allowing heavy tasks to be executed concurrently with application threads, reducing the duration of each GC pause and effectively decreasing system latency.
  • CMS -> G1: G1 can be considered an iteration and optimization of CMS. It addresses some of the issues with CMS and makes significant improvements in GC concepts. By dividing the heap into smaller regions for incremental collection, the duration of each GC pause is further reduced. It is evident that with the increase in hardware performance, the demand for reduced latency is becoming increasingly urgent.
  • G1 -> ZGC: ZGC claims to be a pauseless garbage collector, which is another major improvement. Although ZGC has some similarities to G1, it has made significant breakthroughs in underlying algorithms and concepts. By triggering trap handlers through read barriers for part of the GC work, business threads can also help with garbage collection. This introduces a slight workload for business threads, but they do not have to wait, and the latency is greatly reduced.

At the same time, we should note that while concurrent compaction looks like the best way to reduce pause time, experience tells us that discussing performance without the actual scenario is meaningless.

Currently, for the majority of Java application systems the heap is not large, typically in the range of 2 GB to 4 GB, and ultra-low GC pauses such as 10 ms are not critical. In other words, if a business operation is allowed to take a few hundred milliseconds, it makes little difference whether a GC pause is 100 ms or 10 ms; throughput is usually the main concern, so Parallel GC or CMS should be considered first. For larger heaps, G1 or ZGC can be considered. When memory is very large (e.g., over 16 GB, or even 64 GB or 128 GB) or latency is critical (e.g., high-frequency quantitative trading systems), the new GC implementations introduced in this lesson should be considered.

With the increasing demands of businesses, the evolution of Java ecosystems, advancements in hardware technologies, and the progress of GC theory research, we have an increasing number of GC algorithm options that are suitable for different scenarios. Each GC algorithm has its own range of adaptability; it just depends on how broad that range is.

We are currently living in an era of technological explosions. With the continuous emergence of various business scenarios, new technologies are constantly being developed. Technological upgrades are generally aimed at solving certain problems in existing technologies. So, as long as we can proactively research new technologies, we can apply them to appropriate scenarios within the shortest period of time. This allows us to enjoy the benefits of technology as the primary productivity, enhance our technical skills, and improve our ability to support and serve business development.

References #