27 What Are the Common Garbage Collectors in Java

27 What are the common garbage collectors in Java #

Garbage collection mechanism is a signature feature of Java that greatly improves development efficiency. Nowadays, garbage collection has become a standard feature in modern languages. Despite years of development, Java’s garbage collection mechanism is still evolving. Different sizes of devices and different application scenarios present new challenges to garbage collection. This is also a hot topic in interviews.

Today, I’m going to ask you the question, What are the common garbage collectors in Java?

Typical answer #

In fact, the garbage collector (GC) is closely related to the specific JVM implementation, and different vendors (IBM, Oracle) provide different choices for different versions of JVM. Next, I will talk about the most mainstream Oracle JDK.

  • Serial GC is the oldest garbage collector. “Serial” means that its collection work is single-threaded, and it enters the notorious “Stop-The-World” state during the garbage collection process. Of course, its single-threaded design also means a simplified GC implementation, which does not require the maintenance of complex data structures and has a simple initialization. Therefore, it has always been the default option for the JVM in client mode.

From a historical perspective, the implementation of its old generation is usually referred to as Serial Old separately. It uses the mark-compact algorithm, which is different from the copying algorithm used in the nursery generation.

The corresponding JVM parameter for Serial GC is:

-XX:+UseSerialGC
  • ParNew GC is obviously a nursery GC implementation. It is actually a multi-threaded version of Serial GC and is most commonly used in conjunction with the CMS GC for the old generation. The corresponding parameters are as follows:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
  • CMS (Concurrent Mark Sweep) GC is based on the mark-sweep algorithm. Its design goal is to minimize pause time, which is very important for time-sensitive applications such as the web. Even today, many systems still use CMS GC. However, the mark-sweep algorithm used in CMS leads to memory fragmentation, so it is difficult to avoid full GCs during long-term running, resulting in severe pauses. In addition, since concurrency is emphasized, CMS consumes more CPU resources and competes with user threads.

  • Parallel GC, in the early versions of JDK 8 and other versions, was the default GC choice for server-mode JVM and is also known as throughput-oriented GC. Its algorithm is quite similar to Serial GC although the implementation is much more complex. Its characteristic is that both the nursery and old generation GC are performed in parallel, making it more efficient in common server environments.

The enabling option is:

-XX:+UseParallelGC

In addition, Parallel GC introduces developer-friendly configuration options. We can directly set pause time or throughput as targets, and the JVM will automatically adjust adaptively. For example, the following parameters:

-XX:MaxGCPauseMillis=value
-XX:GCTimeRatio=N // GC time to user time ratio = 1 / (N+1)
  • G1 GC is a GC implementation that balances throughput and pause time. It is the default GC option after Oracle JDK 9. G1 allows us to intuitively set the target pause time. Compared to CMS GC, G1 may not achieve as low pause time as CMS in the best case, but it performs much better in the worst case.

G1 GC still has the concept of generations, but its memory structure is not simply divided into stripes, but consists of regions similar to a chessboard. The algorithm used between regions is copying, but overall it can be considered as the mark-compact algorithm, which can effectively avoid memory fragmentation, especially when the Java heap is very large, the advantages of G1 become more apparent.

G1 has good throughput and pause performance, and it is still being continuously improved. At the same time, CMS has been deprecated in JDK 9, so it is worth your deep understanding of G1 GC.

Analysis of Exam Focus #

Today’s question is to test your understanding of GC. GC is a common topic in interviews for Java programmers, but not everyone has the opportunity or the necessity to have an in-depth understanding of JVM and GC. The summary I provided earlier is an overall impression for students who are not familiar with this part of the content.

Regarding garbage collection, the interviewer can gradually delve into it from various perspectives such as theory and practice, and may not necessarily require the interviewee to know everything. However, if you understand the principles, it will definitely be a bonus point in the interview. In today’s explanation, I focus on introducing the more general and fundamental parts:

  • What are the garbage collection algorithms? How to determine whether an object can be collected?
  • The basic process of garbage collector’s work.

In addition, Java has been in a rapidly developing state. In the latest JDK implementations, there are also multiple new GC algorithms. I will provide an additional supplement at the end to see what other choices are worth paying attention to besides the garbage collectors mentioned earlier.

Knowledge Expansion #

Principles and Basic Concepts of Garbage Collection

First, the prerequisite for automatic garbage collection is to determine which memory can be released. This can be considered in conjunction with my previous analysis of Java class loading and memory structure.

There are mainly two aspects: the primary one is object instances, which are all stored in the heap; the other is metadata and other information in the method area, such as unloading a Java class that is no longer used, which seems reasonable.

For object instance collection, there are two basic algorithms: reference counting and reachability analysis.

  • The reference counting algorithm, as the name suggests, adds a reference count to the object to record its references. If the count is 0, it means the object can be reclaimed. This is a resource reclamation option in many languages, such as Python, which is becoming increasingly popular due to artificial intelligence. The choice between these two depends on the scenario. In large-scale practical implementations, some have chosen to retain only the reference counting mechanism to improve throughput.

Java did not choose reference counting because of a fundamental challenge, which is dealing with cyclic reference relationships.

  • Another reason Java chose reachability analysis is that Java’s various reference relationships, to some extent, further complicate the reachability problem. Please refer to the 4th lecture for more details. This type of garbage collection is usually called tracing garbage collection. In simple terms, it treats objects and their reference relationships as a graph, designates active objects as GC Roots, and then traces reference chains. If an object is not reachable from the GC Roots, meaning there is no reference chain, it can be considered for reclamation. JVM treats objects that are referenced by the virtual machine stack and native method stack, objects referenced by static attributes, and constants as GC Roots.

The collection of unused metadata in the method area is more complicated. Let me briefly summarize it. Do you still remember the classification of class loaders I mentioned before? Generally, types loaded by an initializing class loader are not subject to unloading. On the other hand, unloading ordinary types often requires the corresponding custom class loader itself to be reclaimed. So, in scenarios where dynamic types are heavily used, you need to prevent the metadata area (or the old permanent generation) from encountering OOM. In JDK 8u40 and later, the following parameter is already the default:

-XX:+ClassUnloadingWithConcurrentMark

Second, you only need to have a general understanding of the common garbage collection algorithms. By understanding their principles and advantages and disadvantages, it is sufficient. They can be broadly divided into three categories:

  • Copying Algorithm: Most young generation garbage collectors are based on the copying algorithm, as I introduced in the previous lecture. The process involves copying live objects to the “to” space, placing them sequentially to avoid fragmentation.

The cost of doing this is that since copying is required, some memory space needs to be reserved in advance, resulting in waste. In addition, for garbage collectors like G1, which split into a large number of regions, copying instead of moving means that the collector needs to maintain object reference relationships between regions. This incurs additional overhead, whether it be in terms of memory consumption or time consumption.

  • Mark-Sweep Algorithm: It first performs the mark phase to identify all objects to be reclaimed, and then performs the sweep phase to clear them. Apart from the limited efficiency of marking and sweeping, another drawback is the inevitable fragmentation problem, making it unsuitable for very large heaps. Otherwise, once a Full GC occurs, the pause time may be unacceptable.

  • Mark-Compact Algorithm: Similar to the Mark-Sweep algorithm, but to avoid memory fragmentation, it moves objects during the cleaning process to ensure that the moved objects occupy continuous memory space.

Please note that these are just basic algorithm ideas. The actual garbage collection implementation process is much more complex. The cutting-edge garbage collectors currently in development are composite algorithms that combine parallelism and concurrency.

If you are interested in these algorithms, you can refer to an interesting book called “The Garbage Collection Handbook”. Although it does not focus on Java garbage collection, it provides a vivid explanation of general algorithms.

Understanding the Garbage Collection Process In my previous column, I provided a detailed overview of the heap structure. During the garbage collection process, what changes occur in the Eden, Survivor, and Tenured regions?

The specific changes depend on the chosen garbage collection method. Let’s first familiarize ourselves with the typical garbage collection process. I have created a series of diagrams to help you understand this process.

First, objects created by a Java application are usually allocated in the Eden region. When the space occupied in Eden reaches a certain threshold, a minor GC is triggered. Objects that are still referenced (green blocks) survive and are copied to the Survivor region chosen by the JVM, while unreferenced objects (yellow blocks) are reclaimed. Note that I have marked the surviving objects with a “1” to indicate their survival time.

Second, after a minor GC, Eden becomes empty and remains so until it reaches the threshold to trigger another minor GC. At this point, the other Survivor region becomes the “to” region. The surviving objects from Eden and the “from” region are both copied to the “to” region, and the age count of the surviving objects is incremented by 1.

Third, this process repeats several times until an object’s age count reaches a threshold. This triggers a process called promotion, as shown in the following diagram. Objects that exceed the threshold are promoted to the Tenured generation. The threshold can be specified with the parameter:

-XX:MaxTenuringThreshold= **& lt;**N & gt;

The next step is the garbage collection of the Tenured generation, which depends on the chosen GC options and algorithms. The following diagram illustrates a simple mark-compact algorithm. After removing unused objects in the Tenured generation, the GC compacts the objects to prevent memory fragmentation.

Usually, we refer to garbage collection in the Tenured generation as major GC, and the cleanup of the entire heap as full GC. However, this is not always absolute because different algorithms for Tenured GC have significant differences in performance. For example, in Concurrent Mark Sweep (CMS) GC, the “concurrent” aspect means that the cleanup work is done concurrently with the working threads.

Advancements in GC

GC is still rapidly evolving. The default option, G1 GC, has been continuously improved. Many perceived drawbacks, such as serial full GC and inefficiencies in Card Table scanning, have been significantly addressed. For instance, starting from JDK 10, full GC runs in parallel, and in many scenarios, it performs slightly better than the parallel full GC implementation of the Parallel GC.

Even the Serial GC, although relatively old, may not be considered obsolete because its simple design and implementation mean it incurs minimal overhead. This applies to both GC-related data structures and thread overhead. With the rise of cloud computing and new application scenarios like Serverless, the Serial GC has found a new stage.

Unfortunately, the CMS GC has been marked as deprecated due to theoretical flaws in its algorithm, among other reasons. Although it still has a substantial user base, unless an organization voluntarily maintains CMS, it is likely to be removed in future versions.

If you have been following the development of JDK 11, which is still in progress, you will find that two new GC methods have been added:

  • Epsilon GC is a GC implementation that does not perform garbage collection. This might seem strange, but in certain situations, such as performance testing, it may be necessary to accurately measure the overhead caused by GC. This is its typical use case.

  • ZGC is a super GC implementation that Oracle has released as open source. It has astonishing scalability, supporting heap sizes of T-bytes level and guaranteeing that latency will not exceed 10 ms in most cases. Although it is currently in the experimental stage and only supports 64-bit Linux platforms, it has demonstrated significant capabilities and potential.

Of course, other vendors also provide various unique GC implementations. For example, well-known low-latency GC implementations include Zing and Shenandoah. If you are interested, please refer to the provided links.

Today, as the first lecture in the GC series, I have provided an overview of the current mainstream GC implementations, including their basic principles and algorithms. I have also introduced the garbage collection process briefly, considering the memory structure I discussed earlier. I hope this information will be helpful in your practical applications.

Practice Exercise #

Do you have a clear understanding of the topic we discussed today? We talked about a lot of theories today, but let’s think about a practical question. What parameters do you usually use to enable GC logging? Are there any additional options you add?

Please write your thoughts on this question in the comments section. I will select a well-thought-out comment and reward you with a learning voucher. Feel free to discuss it with me.

Are your friends also preparing for interviews? You can “Ask a Friend” and share today’s topic with them. Maybe you can help them out.