19 Advanced Progressive JVM- Common Optimization Parameters #

Currently, the most widely used Java version is Java 8. If your company is more conservative, then the most commonly used garbage collector is CMS. However, CMS was deprecated in Java 9 and removed in Java 14, and with the emergence of ZGC and the maturity of G1, CMS will eventually become a thing of the past.

After Java 9, Java versions have entered a phase of rapid release, with approximately one release every six months. Java 8 and Java 11 are currently supported LTS versions.

Since the JVM is constantly changing, the configuration of some parameters may not always be effective. Sometimes you add a parameter and “feel” that the running speed has increased, but when you check using -XX:+PrintFlagsFinal, you find that this parameter is already set by default.

Therefore, on different JVM versions and different garbage collectors, you need to first see what the default setting of the parameter is and not blindly trust other people’s suggestions. The command line example is as follows:

java -XX:+PrintFlagsFinal -XX:+UseG1GC -version 2>&1 | grep UseAdaptiveSizePolicy
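The same check can be made from inside a running application. The following is a minimal sketch, assuming a HotSpot-based JVM that exposes HotSpotDiagnosticMXBean; the class name FlagCheck is only illustrative. Besides the value, it prints the origin of the flag, which shows whether the value is a default or was set on the command line.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Illustrative helper (hypothetical name): looks up one HotSpot flag at runtime.
final class FlagCheck {
    /** Returns "name=value (origin)"; the origin is e.g. DEFAULT, ERGONOMIC or VM_CREATION. */
    static String describe(String flagName) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag does not exist.
        VMOption option = bean.getVMOption(flagName);
        return option.getName() + "=" + option.getValue() + " (" + option.getOrigin() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("UseAdaptiveSizePolicy"));
    }
}
```

If the origin is DEFAULT, adding the flag to the startup script with the same value changes nothing.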

There is also a similar parameter called PrintCommandLineFlags, through which you can view the current garbage collector being used and some default values.

You can see that the default garbage collector used by the JVM below is the parallel collector:

# java -XX:+PrintCommandLineFlags -version
-XX:InitialHeapSize=127905216 -XX:MaxHeapSize=2046483456 -XX:+PrintCommandLineFlags -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
openjdk version "1.8.0_41"
OpenJDK Runtime Environment (build 1.8.0_41-b04)
OpenJDK 64-Bit Server VM (build 25.40-b25, mixed mode)
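To confirm which collectors a process is really running with, we can also ask the JVM itself. This is a small sketch using the standard GarbageCollectorMXBean API; the class name ActiveCollectors is an illustrative choice, and the collector names in the comment are what HotSpot typically reports, not a guarantee.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (hypothetical name): lists the collectors active in this JVM.
final class ActiveCollectors {
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // With -XX:+UseParallelGC HotSpot typically reports "PS Scavenge" and "PS MarkSweep";
        // with -XX:+UseG1GC, "G1 Young Generation" and "G1 Old Generation".
        System.out.println(names());
    }
}
```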

There are many JVM parameter configurations, but most of them do not need our attention. Next, we will analyze the JVM parameters of the ES service to see some common optimization points.

Elasticsearch (abbreviated as ES) is a high-performance open-source distributed search engine. ES is developed in Java, and under its config directory there is a file called jvm.options where the JVM configuration is placed.

Heap Space Configuration #

Below is the configuration of heap space size for ES.

-Xms1g
-Xmx1g

We discussed in “17 | Advanced: How does JVM perform garbage collection?” that the largest portion of space in the JVM is the heap, and garbage collection mainly targets this area. The Xmx parameter specifies the maximum value of the heap, and the Xms parameter specifies the initial size of the heap. We usually set these two parameters to the same size to avoid the time overhead of dynamic expansion of the heap space.
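Whether the two values really match in a running process can be verified with a few lines of Java. The sketch below assumes a HotSpot JVM and reads the InitialHeapSize and MaxHeapSize flags, the same names that appear in the PrintCommandLineFlags output above; HeapFlags is a hypothetical helper name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper (hypothetical name): compares the effective -Xms and -Xmx values.
final class HeapFlags {
    /** Reads a numeric HotSpot flag such as InitialHeapSize or MaxHeapSize, in bytes. */
    static long flagBytes(String flagName) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Long.parseLong(bean.getVMOption(flagName).getValue());
    }

    /** True when the heap starts at its maximum size and therefore never has to grow. */
    static boolean isFixedSize(long initialBytes, long maxBytes) {
        return initialBytes == maxBytes;
    }

    public static void main(String[] args) {
        long initial = flagBytes("InitialHeapSize");
        long max = flagBytes("MaxHeapSize");
        System.out.println("initial=" + initial + " max=" + max
                + " fixed=" + isFixedSize(initial, max));
    }
}
```

Started with -Xms1g -Xmx1g, the program should report a fixed-size heap; started without them, the initial size is normally smaller than the maximum.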

There is also a parameter called AlwaysPreTouch in the configuration file.

-XX:+AlwaysPreTouch

In fact, when the JVM starts, the operating system only reserves the address space for the heap specified by Xmx; physical memory pages are committed lazily, the first time the JVM actually touches them. With this parameter, the JVM touches every heap page at startup, so all of the memory is physically allocated up front. When the heap is relatively large, this increases the startup time, but it removes the cost of page faults during later allocation and improves runtime speed.

As shown in the following figure, the JVM memory is divided into the heap and off-heap memory, and the size of the heap can be configured through Xmx and Xms.

[Figure: JVM memory layout, divided into heap and off-heap memory]

However, when configuring the heap memory for ES, we usually set the initial size of the heap to half of the physical memory. This is because ES is a storage-type service, and we need to reserve half of the memory for file caching (theoretical reference: “07 | Case Study: Ubiquitous Caching, the Magic Weapon of High-Concurrency Systems”). This area is generally referred to as the PageCache and occupies a large space. For computational nodes, such as normal web services, the heap memory is usually set to 2/3 of the physical memory, and the remaining 1/3 is reserved for off-heap memory.
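The two rules of thumb above are simple arithmetic. The following sketch only restates them in code; the class and method names are hypothetical, and the 64 GiB machine in main is an assumed example, not a recommendation.

```java
// Illustrative helper (hypothetical name): applies the sizing rules of thumb from the text.
final class HeapSizing {
    /**
     * Storage-type nodes (such as ES) get half of physical memory, leaving the rest
     * for the page cache; computational nodes get two thirds.
     */
    static long suggestedHeapBytes(long physicalBytes, boolean storageNode) {
        return storageNode ? physicalBytes / 2 : physicalBytes / 3 * 2;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long physical = 64 * gib; // assumed machine size for the example
        System.out.println("storage node heap (GiB): " + suggestedHeapBytes(physical, true) / gib);
        System.out.println("compute node heap (GiB): " + suggestedHeapBytes(physical, false) / gib);
    }
}
```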

In this diagram, we have made a detailed division of off-heap memory, which is explained as follows:

  • Metaspace: The parameters -XX:MaxMetaspaceSize and -XX:MetaspaceSize specify the maximum and initial memory of the Metaspace, respectively. Since the Metaspace has no upper limit by default, in extreme cases, it can occupy all the remaining memory of the operating system.
  • JIT-compiled code storage (CodeCache): This area stores the native code generated by the Just-In-Time (JIT) compiler, and its maximum size is set with -XX:ReservedCodeCacheSize. The JNI (Java Native Interface) code is also stored here.
  • Native memory: Native memory is the collective term for other memory regions that are attached to the JVM process, such as memory occupied by network connections and thread creation. In highly concurrent applications, due to the large number of connections and threads, this part of the memory can still be considerable when accumulated.
  • Direct memory: Direct memory deserves a separate mention because it is the only region in the native memory that can be controlled by parameters. The parameter -XX:MaxDirectMemorySize sets the upper limit of the memory that direct ByteBuffer objects (those created with ByteBuffer.allocateDirect) can allocate.
  • JNI memory: As mentioned earlier, JNI memory refers to the specific memory allocated by the JNI code stored in the CodeCache. Unfortunately, the JVM cannot control the usage of this memory, as it depends on the specific implementation of the JNI code.
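Some of these off-heap regions can be observed from Java code. The sketch below, which assumes a HotSpot JVM, prints the non-heap memory pools (Metaspace and the code cache segments usually appear here) and measures how much memory direct ByteBuffers hold. OffHeapReport is a hypothetical name, and the "direct" pool name is what HotSpot exposes as far as I know.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.nio.ByteBuffer;

// Illustrative helper (hypothetical name): reports off-heap usage visible through JMX.
final class OffHeapReport {
    /** Bytes currently held by direct ByteBuffers, or -1 if the pool is not exposed. */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    /** Allocates a direct buffer and returns the pool usage while the buffer is still alive. */
    static long usedAfterAllocating(int bytes) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes);
        long used = directMemoryUsed();
        buffer.put(0, (byte) 1); // keeps the buffer reachable until after the measurement
        return used;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP && pool.isValid()) {
                System.out.println(pool.getName() + " used=" + pool.getUsage().getUsed());
            }
        }
        System.out.println("direct used=" + usedAfterAllocating(1 << 20));
    }
}
```

Memory allocated by JNI code or by thread stacks does not show up in either report, which is why it is hard to account for.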

Log parameter configuration #

Below is the log parameter configuration for Elasticsearch (ES), which is divided into two parts for Java 8 and Java 9.

Java 8:

-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-Xloggc:logs/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=32
-XX:GCLogFileSize=64m

Java 9:

-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m

The above configuration parameters are explained as follows, taking Java 8 as an example:

  • PrintGCDetails: Prints detailed GC logs.
  • PrintGCDateStamps: Prints the current system time for better readability. In contrast, PrintGCTimeStamps prints relative time since JVM startup, which is less readable.
  • PrintTenuringDistribution: Prints the age distribution of objects, which is helpful for tuning the MaxTenuringThreshold parameter.
  • PrintGCApplicationStoppedTime: Prints the stop-the-world (STW) time.
  • The next few log parameters configure rolling logging similar to Logback, which are relatively simple and will not be explained in detail.

Starting from Java 9, more than 40 GC log-related parameters have been removed by the JVM, as described in JEP 158. Therefore, there have been significant changes to the log configuration, and the printing format of GC logs has been completely transformed, making the log parameters more structured.
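Because the two flag families are incompatible, a startup script has to choose one of them based on the Java version. The sketch below shows one way to do that; GcLogFlags is a hypothetical helper, and the -Xlog string is the unified logging line that, to my knowledge, ES ships for Java 9 and later, so treat it as an assumption and compare it with your own jvm.options.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative helper (hypothetical name): picks GC logging flags for a Java version.
final class GcLogFlags {
    /** Parses the java.specification.version format: "1.8" gives 8, "11" gives 11. */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    static List<String> forVersion(int major) {
        if (major <= 8) {
            return Arrays.asList(
                    "-XX:+PrintGCDetails",
                    "-XX:+PrintGCDateStamps",
                    "-XX:+PrintTenuringDistribution",
                    "-XX:+PrintGCApplicationStoppedTime",
                    "-Xloggc:logs/gc.log",
                    "-XX:+UseGCLogFileRotation",
                    "-XX:NumberOfGCLogFiles=32",
                    "-XX:GCLogFileSize=64m");
        }
        // Assumed Java 9+ equivalent: tags, output file, decorators, rotation settings.
        return Collections.singletonList(
                "-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log"
                        + ":utctime,pid,tags:filecount=32,filesize=64m");
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        forVersion(major).forEach(System.out::println);
    }
}
```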

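On Java 9 and later, ES replaces all of the Java 8 logging flags with a single unified logging option of the form -Xlog:<tags>:<output>:<decorators>:<output-options>. ES selects the gc*, gc+age=trace, and safepoint tags, writes to logs/gc.log, decorates each line with utctime, pid, and tags, and keeps 32 rotated files of 64 MB each, which matches the Java 8 configuration.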

Let’s now take a look at ES’s configuration parameters in case of exceptional situations, such as OutOfMemoryError:

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=data
-XX:ErrorFile=logs/hs_err_pid%p.log

HeapDumpOnOutOfMemoryError, HeapDumpPath, and ErrorFile are parameters that should be configured for every Java application. Under normal circumstances, we use jmap to obtain heap information for an application. In exceptional cases such as an OutOfMemoryError, the first two parameters make the JVM automatically write a heap dump to the specified directory, while ErrorFile sets the location of the fatal error log (hs_err_pid<pid>.log) that the JVM writes when it crashes. Once we have the heap dump, we can use tools like MAT to analyze it in detail and find the specific cause of the OOM (Out of Memory) error.
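If a service was started without these flags, the heap dump options can still be switched on while it runs, because HotSpot marks HeapDumpOnOutOfMemoryError and HeapDumpPath as manageable flags. The following is a sketch under that assumption; OomDumpSwitch is a hypothetical name, and ErrorFile is not covered because, as far as I know, it cannot be changed at runtime.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper (hypothetical name): enables heap dumps on OOM at runtime.
final class OomDumpSwitch {
    private static HotSpotDiagnosticMXBean bean() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    /** Turns on heap dumps for OutOfMemoryError and sets the dump location. */
    static void enable(String dumpPath) {
        bean().setVMOption("HeapDumpOnOutOfMemoryError", "true");
        bean().setVMOption("HeapDumpPath", dumpPath);
    }

    static boolean isEnabled() {
        return Boolean.parseBoolean(bean().getVMOption("HeapDumpOnOutOfMemoryError").getValue());
    }

    public static void main(String[] args) {
        enable("data"); // same relative directory as in the ES configuration
        System.out.println("heap dump on OOM enabled: " + isEnabled());
    }
}
```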

Garbage Collector Configuration #

By default, ES (Elasticsearch) uses the CMS (Concurrent Mark Sweep) garbage collector, which has the following three main configurations:

-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

Let’s go over these parameters:

  • UseConcMarkSweepGC indicates that the young generation uses ParNew, while the old generation uses the CMS garbage collector.
  • CMSInitiatingOccupancyFraction - Since CMS keeps the user threads running while it collects, it must leave enough free memory for them to keep allocating. If the collection only starts when the old generation is almost full, objects promoted during the concurrent cycle may no longer fit, and a “Concurrent Mode Failure” occurs. In such cases, the Serial Old collector is temporarily used for garbage collection in the old generation, resulting in a lengthy pause time (STW).

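It is generally recommended to reserve around 30% of this space, which leaves approximately 70% for use; ES starts the CMS cycle at 75% occupancy. The parameter -XX:CMSInitiatingOccupancyFraction configures this proportion, but it only acts as a fixed threshold when -XX:+UseCMSInitiatingOccupancyOnly is also set. Without that flag, the JVM uses the fraction for the first cycle only and then falls back to its own heuristics.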
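To make the percentage concrete, the sketch below computes the old generation occupancy at which a CMS cycle would start and the headroom left for promotions during the cycle. It is plain arithmetic under the assumption that the fraction applies to the old generation capacity; CmsTrigger and the 700 MB capacity are illustrative.

```java
// Illustrative helper (hypothetical name): arithmetic behind CMSInitiatingOccupancyFraction.
final class CmsTrigger {
    /** Occupancy, in bytes, at which the concurrent cycle is expected to start. */
    static long triggerBytes(long oldGenCapacityBytes, int occupancyFractionPercent) {
        if (occupancyFractionPercent < 0 || occupancyFractionPercent > 100) {
            throw new IllegalArgumentException("fraction must be between 0 and 100");
        }
        return oldGenCapacityBytes * occupancyFractionPercent / 100;
    }

    /** Space still free for promoted objects while the concurrent cycle is running. */
    static long headroomBytes(long oldGenCapacityBytes, int occupancyFractionPercent) {
        return oldGenCapacityBytes - triggerBytes(oldGenCapacityBytes, occupancyFractionPercent);
    }

    public static void main(String[] args) {
        long oldGen = 700L * 1024 * 1024; // assumed old generation capacity
        System.out.println("cycle starts at " + triggerBytes(oldGen, 75) + " bytes");
        System.out.println("headroom " + headroomBytes(oldGen, 75) + " bytes");
    }
}
```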

Furthermore, for the CMS garbage collector, the commonly used configuration parameters are as follows:

  • -XX:+ExplicitGCInvokesConcurrent - When System.gc() is explicitly invoked in the code to trigger a FullGC, this parameter makes the collection run concurrently instead. It is recommended to use this parameter.
  • -XX:CMSFullGCsBeforeCompaction - Defaults to 0, which means that every FullGC compacts the old generation to eliminate fragmentation. It is recommended to keep the default value.
  • -XX:+CMSScavengeBeforeRemark - Runs a young generation collection (YGC) before the CMS remark phase. This can reduce the remark time, so it is recommended to enable it.
  • -XX:+ParallelRefProcEnabled - Enables parallel reference processing to speed up processing and reduce time consumption.

Note that the CMS garbage collector has been removed in Java 14. Due to its uncontrolled GC time, it should be avoided whenever possible.

From Java 10 onward, ES can use the G1 garbage collector (ordinary Java applications can already use G1 on Java 8). We have discussed G1 in detail in “17 | Advanced: How Does JVM Perform Garbage Collection?”. G1 is relatively easy to configure: we specify the desired pause time using the MaxGCPauseMillis parameter.

Here are the main configuration parameters:

  • -XX:MaxGCPauseMillis - Sets the target pause time that G1 tries to achieve.
  • -XX:G1HeapRegionSize - Sets the size of each region that G1 divides the heap into (the same region size is used for the young and old generations). This value should be a power of 2 and not too large or too small. If unsure, it is recommended to keep the default value.
  • -XX:InitiatingHeapOccupancyPercent - Once the overall heap memory usage reaches a certain percentage (default is 45%), the concurrent mark phase is triggered.
  • -XX:ConcGCThreads - Sets the number of threads used by the concurrent garbage collector. The default value varies depending on the platform the JVM is running on and it is not recommended to modify it.
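The constraint on G1HeapRegionSize is easy to check before a value goes into a startup script. The sketch below assumes the 1 MB to 32 MB range documented for Java 8; newer JDKs may accept larger regions, so the bounds are an assumption, and G1RegionSize is a hypothetical name.

```java
// Illustrative helper (hypothetical name): checks a candidate -XX:G1HeapRegionSize value.
final class G1RegionSize {
    static final long MIN_BYTES = 1L << 20;  // 1 MB, assumed lower bound
    static final long MAX_BYTES = 32L << 20; // 32 MB, assumed upper bound (Java 8 era)

    /** True when the size is a power of two inside the assumed range. */
    static boolean matchesDocumentedConstraints(long bytes) {
        boolean powerOfTwo = bytes > 0 && (bytes & (bytes - 1)) == 0;
        return powerOfTwo && bytes >= MIN_BYTES && bytes <= MAX_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(matchesDocumentedConstraints(4L << 20)); // 4 MB
        System.out.println(matchesDocumentedConstraints(3L << 20)); // 3 MB, not a power of two
    }
}
```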

JVM supports many garbage collectors, and here are a few commonly used ones along with their configuration parameters:

  • -XX:+UseSerialGC - Uses the Serial garbage collector for both the young and old generations.
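  • -XX:+UseParallelGC - Uses the Parallel Scavenge collector for the young generation. In early versions the old generation was collected by Serial Old; since JDK 7u4 this flag also enables Parallel Old by default.
  • -XX:+UseParallelOldGC - Uses the Parallel garbage collector for both the young and old generations.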
  • -XX:+UseG1GC - Uses the G1 garbage collector.
  • -XX:+UseZGC - Uses the ZGC garbage collector. Before Java 15, ZGC is experimental and also requires -XX:+UnlockExperimentalVMOptions.

Additional Configuration #

Let’s take a look at a few additional configurations.

-Xss1m

-Xss sets the stack size of each Java thread to 1 MB. It is equivalent to -XX:ThreadStackSize (which is specified in KB), and 1 MB is already the default on 64-bit Linux.
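The effect of the stack size can be observed by measuring how deep a recursion gets before a StackOverflowError. The sketch below requests a stack size for a single thread through the Thread constructor instead of -Xss; StackDepthProbe is a hypothetical name, and the JVM is free to round or ignore the requested size, so the numbers are only indicative.

```java
// Illustrative helper (hypothetical name): measures recursion depth for a requested stack size.
final class StackDepthProbe {
    /** Runs an unbounded recursion on a thread with the requested stack size. */
    static int maxDepth(long stackSizeBytes) {
        int[] depth = new int[1];
        Thread probe = new Thread(null, () -> {
            try {
                recurse(depth);
            } catch (StackOverflowError expected) {
                // The stack is exhausted; depth[0] holds the depth that was reached.
            }
        }, "stack-probe", stackSizeBytes);
        probe.start();
        try {
            probe.join(); // join makes the final depth visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while probing", e);
        }
        return depth[0];
    }

    private static void recurse(int[] depth) {
        depth[0]++;
        recurse(depth);
    }

    public static void main(String[] args) {
        System.out.println("512 KB stack: depth " + maxDepth(512L * 1024));
        System.out.println("4 MB stack:   depth " + maxDepth(4L * 1024 * 1024));
    }
}
```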

-XX:-OmitStackTraceInFastThrow

This line turns the feature off. With ‘+’ (the default), once the JIT compiler sees the same built-in exception, such as a NullPointerException, thrown repeatedly from the same place, it throws a preallocated exception object that carries no stack trace. This significantly improves performance when exceptions occur frequently, but it makes it harder to track down the source of an exception. To facilitate problem solving, ES disables this feature.
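The behavior is easy to observe. The sketch below throws the same implicit NullPointerException in a loop and reports the first iteration whose exception arrives without a stack trace. FastThrowProbe is a hypothetical name; when, or whether, the trace disappears depends on the JIT compiler, so no particular result is guaranteed.

```java
// Illustrative helper (hypothetical name): detects when stack traces start being omitted.
final class FastThrowProbe {
    /** Returns the first iteration whose NPE has an empty stack trace, or -1 if none. */
    static int firstTracelessThrow(int iterations) {
        for (int i = 0; i < iterations; i++) {
            try {
                lengthOf(null);
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    return i;
                }
            }
        }
        return -1;
    }

    private static int lengthOf(String s) {
        return s.length(); // implicit NPE raised by the JVM when s is null
    }

    public static void main(String[] args) {
        // Run once with default flags and once with -XX:-OmitStackTraceInFastThrow to compare.
        System.out.println(firstTracelessThrow(1_000_000));
    }
}
```

With -XX:-OmitStackTraceInFastThrow, as in the ES configuration, the method is expected to return -1 because every exception keeps its stack trace.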

-Djava.awt.headless=true

Headless mode is a system configuration mode in which the system lacks a display device, keyboard, or mouse. Since servers typically do not have these devices, this parameter tells the virtual machine not to rely on them, so that code depending on AWT (for example, image processing) can still run using software only.

9-:-Djava.locale.providers=COMPAT
-Dfile.encoding=UTF-8
-Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Djava.io.tmpdir=${ES_TMPDIR}
-Djna.nosys=true

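These parameters, specified using the ‘-D’ option, set system property values when starting a Java program. These values can be read with System.getProperty(name), or all at once with System.getProperties(). The ‘9-:’ prefix on the first line is jvm.options syntax rather than a JVM flag: it applies the line only on Java 9 and later.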
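Reading such properties from application code usually comes with a fallback for the case where the flag was not passed. The sketch below uses standard JDK methods only; PropertyReader is a hypothetical name, and the fallback values are arbitrary examples.

```java
// Illustrative helper (hypothetical name): reads -D system properties with fallbacks.
final class PropertyReader {
    /** Returns the integer value of the property, or the fallback if it is absent or invalid. */
    static int intProperty(String key, int fallback) {
        return Integer.getInteger(key, fallback);
    }

    /** Returns true only if the property is set to "true" (case-insensitive). */
    static boolean flag(String key) {
        return Boolean.getBoolean(key);
    }

    public static void main(String[] args) {
        // Start with -Des.networkaddress.cache.ttl=60 -Dio.netty.noUnsafe=true to see the values.
        System.out.println(intProperty("es.networkaddress.cache.ttl", -1));
        System.out.println(flag("io.netty.noUnsafe"));
        System.out.println(System.getProperty("file.encoding"));
    }
}
```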

This part has a high degree of customization and will not be discussed in much detail.

Other optimizations #

The above is the default JVM parameter configuration for ES, which is mostly basic. In normal application services, we hope to have more fine-grained control, and one commonly used approach is to adjust the ratio between various generations.

  • -Xmn sets the size of the young generation, which by default takes up one-third of the heap size. For highly concurrent scenarios with fast dying objects, it is possible to increase this area by half or more. However, for G1, this value does not need to be set anymore, as it will adjust automatically.
  • -XX:SurvivorRatio defaults to 8, indicating the ratio between the eden region and a single survivor region, so eden and the two survivor regions split the young generation 8:1:1.
  • -XX:MaxTenuringThreshold defaults to 6 in CMS and 15 in G1. This value is related to the object promotion we mentioned earlier, and changing it can have a noticeable effect. The age distribution of objects can be printed using -XX:+PrintTenuringDistribution. If the sizes of the older age groups are always similar, objects that survive to a certain age almost always end up promoted to the old generation, and the promotion threshold can be set lower.
  • -XX:PretenureSizeThreshold allocates objects directly in the old generation when they exceed a certain size. However, this parameter is not commonly used.
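The relationship between -Xmn and -XX:SurvivorRatio can be worked out with simple arithmetic. The sketch below assumes the young generation is split into one eden and two equally sized survivor regions and ignores the alignment and adaptive resizing a real JVM applies; YoungGenLayout is a hypothetical name.

```java
// Illustrative helper (hypothetical name): splits the young generation by SurvivorRatio.
final class YoungGenLayout {
    /** Size of one survivor region: young / (ratio + 2), since there are two survivors. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden gets whatever the two survivor regions do not use. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 300L * 1024 * 1024; // e.g. -Xmn300m
        System.out.println("eden=" + edenBytes(young, 8) + " survivor=" + survivorBytes(young, 8));
    }
}
```

With -Xmn300m and the default ratio of 8, this gives 240 MB of eden and two 30 MB survivor regions.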

Exercise: Cassandra Configuration #

Now that you understand the configuration parameters we discussed above, you can analyze Cassandra’s configuration file. Cassandra is a high-speed column storage database that uses gossip for cluster maintenance. Its JVM parameter configuration is also located in the jvm.options file.

To facilitate your analysis, I have uploaded the configuration files for ES and Cassandra to the repository. Feel free to practice and discuss any questions you may have in the comments section below.