25 Q&A Session: Module Four Hot Questions Answered #

Hello, I’m Liu Chao.

This week, we have finished our study on “JVM Performance Monitoring and Tuning”. For this Q&A session, I have selected 11 comments from Module 4 to provide targeted answers. I hope this will be helpful to you. Additionally, I would like to give a thumbs-up to those who have been following along until now. I look forward to more technical exchanges and mutual growth.

[Lecture 20] #

Many students have asked questions similar to the one from “The Cat in the Dark Night,” so I will reply to them collectively. The JVM memory layout is only a specification, and the method area is part of that specification: a logical partition, not a physical space. When we say that string constants are stored in the heap, we are referring to the physical space where they actually live.

Wen Hao’s question is similar to the previous one, so I will answer it along the same lines. The method area is just a logical partition, while the metaspace is its concrete implementation. Therefore, class metadata is physically stored in the metaspace, which logically belongs to the method area.

[Lecture 21] #

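Liam, the HotSpot virtual machine currently does not support allocating objects on the stack. What escape analysis enables instead is scalar replacement, where an object that does not escape is broken down into its individual fields. W.LI’s comment is worth considering, so I’m including it here.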
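To make the idea of a non-escaping object concrete, here is a minimal sketch. The class and method names are made up for illustration. The Point created in the loop never leaves the method, so with escape analysis enabled (-XX:+DoEscapeAnalysis and -XX:+EliminateAllocations, both on by default in recent HotSpot versions) the JIT compiler may replace it with its two int fields instead of allocating it on the heap. Whether that actually happens depends on the JIT, so treat this as an illustration rather than a guarantee.

```java
final class EscapeDemo {
    // Hypothetical value holder, used only for this illustration.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // The Point is created, read, and dropped inside the loop body.
    // It is never stored in a field, returned, or passed elsewhere,
    // so it does not escape this method.
    static long sumOfCoordinates(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            sum += p.x + p.y;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfCoordinates(1_000_000));
    }
}
```

Running this with and without -XX:-DoEscapeAnalysis while watching jstat -gc is one way to see whether the allocations were eliminated on your JVM.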

[Lecture 22] #

This is really great! Jxin’s explanation of the Region part was very accurate. Here, I will summarize some of the key points about CMS and G1.

The CMS (Concurrent Mark Sweep) garbage collector is based on the mark-sweep algorithm and mainly collects the old generation. A CMS GC cycle consists of seven phases. Two of them, initial mark and final remark, are stop-the-world, while the other phases run concurrently with the application.

The G1 (Garbage-First) garbage collector is based on the mark-compact algorithm. It is a generational collector that handles garbage collection in both the young and the old generation.

Earlier collectors give each generation one contiguous range of virtual memory addresses. G1 instead divides the heap into Regions. It still has a young generation and an old generation, but each generation is made up of N non-contiguous Regions, and each Region occupies a contiguous range of virtual memory addresses.

In G1, there is also a special region called the Humongous region, which is used to store particularly large objects. G1 internally optimizes this by directly reclaiming the Humongous region in the Young GC when there are no references pointing to it.

G1 collections come in three kinds: Young GC, Mixed GC, and Full GC.

G1 Young GC mainly collects the Eden regions and is triggered when Eden runs out of space. When live objects are moved from Eden to the Survivor space and the Survivor space is too small, they are promoted directly to the old generation, and the objects already in the Survivor space are promoted to the old generation as well. Young GC runs in parallel on multiple threads, and it is stop-the-world for its whole duration.

When heap occupancy reaches a certain threshold, G1 Mixed GC is triggered (the threshold is set by the command-line parameter -XX:InitiatingHeapOccupancyPercent, with a default value of 45). Mixed GC mainly consists of four phases. Only the concurrent marking phase runs without a stop-the-world pause; the other phases are stop-the-world.

The main differences between G1 and CMS are:

CMS mainly collects the old generation, while G1 is a generational collector that performs both Young GC in the young generation and Mixed GC, which also covers the old generation.
G1 divides the heap into Regions and is based on the mark-compact algorithm, which reduces memory fragmentation overall.
During the initial marking phase, both CMS and G1 consult the Card Table used for reachability traversal, but they implement it differently.

Let me briefly explain the Card Table. Reachability analysis starts from the GC roots, and references can cross generations: an object in the old generation may reference an object in the young generation. So during a Young GC, in addition to scanning the roots that point into the young generation, the collector also has to find the old-generation objects that reference young-generation objects.

Scanning across generations like this is expensive. To avoid scanning the entire old generation during a young-generation collection, both CMS and G1 use the Card Table to record these cross-generation references. G1 additionally introduces the RSet (Remembered Set) on top of the Card Table. Each Region gets its own RSet when it is initialized, which records which objects in other Regions reference objects in this Region.
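The Card Table idea can be sketched in a few lines. This is a simplified model, not HotSpot's implementation: it assumes 512-byte cards, uses plain long values as stand-ins for heap addresses, and the class and method names are hypothetical. Every reference write marks the card that covers the written field, so a Young GC only needs to scan the dirty cards instead of the whole old generation.

```java
final class CardTableSketch {
    // Assumption: each card covers 2^9 = 512 bytes of heap.
    static final int CARD_SHIFT = 9;
    // Simplified encoding for this sketch: 0 = clean, 1 = dirty.
    static final byte DIRTY = 1;

    private final byte[] cards;

    CardTableSketch(long heapBytes) {
        long cardSize = 1L << CARD_SHIFT;
        // Round up so that a partial last card is still covered.
        cards = new byte[(int) ((heapBytes + cardSize - 1) >>> CARD_SHIFT)];
    }

    // Write barrier: called after a reference field at fieldAddress is updated.
    void onReferenceWrite(long fieldAddress) {
        cards[(int) (fieldAddress >>> CARD_SHIFT)] = DIRTY;
    }

    boolean isDirty(int cardIndex) {
        return cards[cardIndex] == DIRTY;
    }

    // Number of cards a Young GC would have to scan in this model.
    int dirtyCount() {
        int n = 0;
        for (byte c : cards) {
            if (c == DIRTY) {
                n++;
            }
        }
        return n;
    }

    int cardCount() {
        return cards.length;
    }
}
```

In this model, two writes that land in the same 512-byte range dirty only one card, which is why the table stays small compared with tracking every reference individually.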

In addition, CMS and G1 use different methods to address the issue of missed marks during concurrent marking. CMS uses the Incremental Update algorithm, while G1 uses the SATB (Snapshot-At-The-Beginning) algorithm. First of all, we need to understand that both G1 and CMS are based on the three-color marking algorithm in concurrent marking:

Black: Root objects, or objects that have been scanned together with all of their child objects.
Grey: The object itself has been scanned, but its child objects have not all been scanned.
White: The object has not been scanned yet. Any object that is still white when marking finishes is unreachable.

When marking runs concurrently with the application, this scheme has a missed-marking problem. If the application stores a reference to a white object into an object that is already black and removes every other path to it, the marker never visits that white object. It stays white and is reclaimed as garbage, so the object that still references it loses it.

To avoid this problem, CMS uses the Incremental Update algorithm: whenever a write barrier sees a reference to a white object being assigned to a field of a black object, the white object is turned grey so that it gets scanned. G1 uses the SATB algorithm instead. SATB treats every object that was reachable in the snapshot taken when marking started as alive, so when a reference is overwritten during marking, a write barrier records the old target and it is still marked.
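The difference between the two barriers is easier to see in a toy simulation. The sketch below is a deliberately simplified model with hypothetical names, not how CMS or G1 are implemented internally: real collectors record barrier hits in card tables or SATB buffers and process them later, while this model shades the object immediately. The scenario is root -> b -> c, where the application moves c from b to the already-black root while marking is in progress.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class TriColorSketch {
    enum Color { WHITE, GREY, BLACK }
    enum Barrier { NONE, INCREMENTAL_UPDATE, SATB }

    static final class Node {
        Color color = Color.WHITE;
        final List<Node> refs = new ArrayList<>();
    }

    private final Deque<Node> greyQueue = new ArrayDeque<>();
    private final Barrier barrier;

    TriColorSketch(Barrier barrier) {
        this.barrier = barrier;
    }

    // WHITE -> GREY: remember the object so the marker scans it later.
    void shade(Node n) {
        if (n.color == Color.WHITE) {
            n.color = Color.GREY;
            greyQueue.add(n);
        }
    }

    // Scan one grey object: shade its children, then turn it black.
    boolean markStep() {
        Node n = greyQueue.poll();
        if (n == null) {
            return false;
        }
        for (Node child : n.refs) {
            shade(child);
        }
        n.color = Color.BLACK;
        return true;
    }

    // Application stores a reference. Incremental update shades the new
    // target when the holder has already been fully scanned.
    void addRef(Node holder, Node target) {
        holder.refs.add(target);
        if (barrier == Barrier.INCREMENTAL_UPDATE && holder.color == Color.BLACK) {
            shade(target);
        }
    }

    // Application deletes a reference. SATB shades the old target so that
    // everything reachable at the start of marking stays alive this cycle.
    void removeRef(Node holder, Node target) {
        if (barrier == Barrier.SATB) {
            shade(target);
        }
        holder.refs.remove(target);
    }

    // Returns the final color of c; WHITE means it would be wrongly reclaimed.
    static Color colorOfMovedObject(Barrier barrier) {
        TriColorSketch gc = new TriColorSketch(barrier);
        Node root = new Node(), b = new Node(), c = new Node();
        root.refs.add(b);
        b.refs.add(c);
        gc.shade(root);
        gc.markStep();             // root becomes BLACK, b becomes GREY
        gc.addRef(root, c);        // a black -> white reference appears
        gc.removeRef(b, c);        // the only grey path to c disappears
        while (gc.markStep()) { }  // finish marking
        return c.color;
    }

    public static void main(String[] args) {
        for (Barrier barrier : Barrier.values()) {
            System.out.println(barrier + " -> " + colorOfMovedObject(barrier));
        }
    }
}
```

Without a barrier, c ends up white even though root still references it. Either barrier keeps it alive, but for different reasons: incremental update reacts to the new reference, SATB reacts to the deleted one.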

G1 has a Pause Prediction Model, which allows users to set the expected pause time for the entire GC process. The parameter -XX:MaxGCPauseMillis can be used to specify the target pause time for a G1 collection, with a default value of 200ms.

G1 uses the historical data collected by this model to predict how many Regions it can collect within the target, and it keeps to the pause target by controlling the number of Regions in each collection.

Liam raised two great questions, which I answer here.

Every collector incurs stop-the-world pauses; the difference lies in how long they last. The Serial, ParNew, and Parallel Scavenge collectors, whether serial or parallel, suspend user threads for the whole collection. CMS and G1 do not suspend user threads during concurrent marking. They still suspend user threads in their other phases, but those stop-the-world pauses are relatively short.
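As a rough illustration of how a pause target can bound the number of Regions, here is a greedy sketch. It is not G1's actual algorithm: the Region class, the predicted cost per Region, and the selection rule are all assumptions made for this example. It picks the Regions with the most reclaimable space per predicted millisecond until the pause budget is used up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CollectionSetSketch {
    // Hypothetical per-Region statistics; a real collector derives them from history.
    static final class Region {
        final String id;
        final long reclaimableBytes;
        final double predictedMs;

        Region(String id, long reclaimableBytes, double predictedMs) {
            this.id = id;
            this.reclaimableBytes = reclaimableBytes;
            this.predictedMs = predictedMs;
        }
    }

    // Choose Regions in "garbage first" order while staying within the pause target.
    static List<Region> choose(List<Region> candidates, double pauseTargetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        // Highest reclaimable bytes per predicted millisecond first.
        sorted.sort(Comparator.comparingDouble(
                (Region r) -> r.reclaimableBytes / r.predictedMs).reversed());

        List<Region> chosen = new ArrayList<>();
        double totalMs = 0;
        for (Region r : sorted) {
            if (totalMs + r.predictedMs <= pauseTargetMs) {
                chosen.add(r);
                totalMs += r.predictedMs;
            }
        }
        return chosen;
    }
}
```

A smaller -XX:MaxGCPauseMillis value corresponds to a smaller pauseTargetMs in this model, so fewer Regions fit into each collection.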

In many reference materials, Major GC is equivalent to Full GC. We can also see that many performance monitoring tools only distinguish between Minor GC and Full GC. In general, a Full GC will perform garbage collection on the young generation, the old generation, the metaspace, and off-heap memory. There are several triggers for a Full GC:

When the size of objects promoted from the young generation to the old generation is larger than the remaining space in the old generation, a Full GC is triggered.
When the space utilization of the old generation exceeds a certain threshold, a Full GC is triggered.
When the metaspace is insufficient (in JDK 1.7 and earlier, when the permanent generation is insufficient), a Full GC is triggered.
When System.gc() is called, a Full GC is scheduled.

Next, let’s answer nightmare’s question. We can use the command jstat -gc pid interval to view the memory utilization of each partition after each GC. We can also check which garbage collector is configured by looking at the JVM parameters. There are many ways to do this; for example, the command jcmd pid VM.flags shows the flags in effect for a running JVM.
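The same information is also available from inside a running JVM. As a minimal sketch, the standard java.lang.management API can list the active collectors and the startup arguments; the class name GcInfo is just for illustration, and the collector names printed depend on which collector your JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

final class GcInfo {
    // One MXBean per collector, for example one for the young generation
    // and one for the old generation.
    static List<GarbageCollectorMXBean> collectors() {
        return ManagementFactory.getGarbageCollectorMXBeans();
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : collectors()) {
            System.out.printf("%s: count=%d, time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        // JVM arguments passed at startup, such as -XX:+UseG1GC or -Xmx.
        System.out.println(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```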

Here is the summary of the garbage collector charts corresponding to various setting parameters from Lecture 22.

[Lecture 23] #

Once again, I want to emphasize that the comments from my classmates are really on point, and they have a good grasp of the details!

The premise is that the old generation has enough space to accommodate these objects. If the remaining space in the old generation is less than the average size promoted to the old generation by previous Minor GCs, a Full GC is triggered.

I notice that some students keep coming back with questions, which is really encouraging. Technology requires communication, and I welcome any questions you may have. Feel free to leave a message anytime, and I will do my best to answer.

Now, let me answer the question from student W.LI. The JVM sizes the memory areas based on how the objects we create actually use memory. It tries to allocate memory reasonably, taking into account factors such as object promotion and the overall impact on garbage collection pause time. For certain special scenarios, we can fine-tune the configuration manually.

[Lecture 24] #

Below is the answer to Geek_75b4cd’s question.

We know that ThreadLocal is implemented on top of ThreadLocalMap, and the Entry class in this map extends WeakReference. The key of an Entry, which is the ThreadLocal object itself, is held through the WeakReference, while the value is held through an ordinary strong reference. An object that is reachable only through weak references survives only until the next garbage collection.

If a thread calls the set method on a ThreadLocal, a new record is added to that thread’s ThreadLocalMap. Once no strong reference to the ThreadLocal remains, a garbage collection reclaims the key, but the value still exists in memory. Because the current thread is still alive, the value stays referenced.

After the key has been collected, the reference chain Thread -> ThreadLocalMap -> Entry -> Value still exists. This chain prevents both the Entry and the Value from being garbage collected, even though the key is gone and the value can no longer be reached through the ThreadLocal. The result is a memory leak.

To prevent this leak, we only need to call the remove method once we have finished using the ThreadLocal.
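A common way to make sure remove is always called is a try/finally block. The sketch below uses a hypothetical RequestContext class; the point is only the set / use / remove pattern, which matters most on pooled threads that live much longer than a single task.

```java
final class RequestContext {
    // Hypothetical per-thread value, for example the user of the current request.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    static String handle(String user) {
        CURRENT_USER.set(user);
        try {
            // Any code on this thread can read the value while the task runs.
            return "hello, " + CURRENT_USER.get();
        } finally {
            // Always clear the entry, even if the task throws, so the value
            // does not stay reachable through the thread's ThreadLocalMap.
            CURRENT_USER.remove();
        }
    }

    static String currentUser() {
        return CURRENT_USER.get();
    }
}
```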

The last question is from WL.

Memory leak refers to the situation where objects that are no longer used cannot be garbage collected in a timely manner, leading to waste of memory space. For example, as I mentioned in [Lecture 03], the substring method in Java 6 can cause memory leaks.

In Java 6, the substring method calls a String constructor that reuses the char array of the original string. If the original string is very large and we only take a small part of it, the substring still points to the whole char array. As long as the substring is referenced, that array cannot be garbage collected, resulting in a memory leak.

Memory overflow refers to the occurrence of an OutOfMemoryError. There are many situations that can cause memory overflow, such as insufficient heap memory, insufficient stack space, or insufficient method area space.
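The sharing behavior can be modeled with a small stand-in class. SharedString below is hypothetical and only imitates the value/offset/count layout that java.lang.String used in Java 6; it is not the JDK source. One method shares the backing array the way Java 6 did, the other copies only the needed characters, which is what substring does from Java 7 update 6 onward.

```java
import java.util.Arrays;

final class SharedString {
    final char[] value;  // backing array, possibly much larger than the string
    final int offset;    // index of the first character in value
    final int count;     // number of characters

    SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SharedString of(String s) {
        char[] chars = s.toCharArray();
        return new SharedString(chars, 0, chars.length);
    }

    // Java 6 style: the result keeps a reference to the whole original array.
    SharedString substringShared(int begin, int end) {
        return new SharedString(value, offset + begin, end - begin);
    }

    // Copying style: the result owns a small array of its own.
    SharedString substringCopied(int begin, int end) {
        char[] copy = Arrays.copyOfRange(value, offset + begin, offset + end);
        return new SharedString(copy, 0, copy.length);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

On Java 6 itself, the usual workaround was new String(big.substring(0, 2)), which forces a copy of just the characters that are needed.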

The relationship between memory leaks and memory overflow: memory leaks can easily cause memory overflow, but memory overflow is not necessarily caused by memory leaks.