35 What does JVM do when optimizing Java code #

In the previous lesson, I introduced micro-benchmarking and the precautions it requires; the core goal there was to avoid distortions caused by JVM optimizations while Java code executes. A systematic understanding of how Java code is executed is therefore helpful for further optimization work in practice.

Today, I want to ask you a question: What does JVM do when optimizing Java code?

Unlike my usual approach of giving a typical answer, today I have invited Dr. Zheng Yudi, the author of the neighboring column “Deep Dive into Java Virtual Machine” and also an expert from Oracle, to think about and answer this question from the perspective of a JVM expert.

Answer from Dr. Zheng Yudi, author of the JVM section #

JVM’s optimizations for code execution can be divided into runtime optimizations and just-in-time (JIT) compiler optimizations. Runtime optimizations mainly involve general mechanisms such as interpreted execution and dynamic compilation, including locking mechanisms (such as biased locking) and memory allocation mechanisms (such as TLAB). In addition, there are some optimizations specifically designed to improve the efficiency of interpreted execution, such as template interpreters and inline caches (used to optimize dynamic binding of virtual method calls).

JVM’s JIT compiler optimization refers to converting hot code into machine code on a method-by-method basis, running directly on the underlying hardware. It employs various optimization techniques, including those available to static compilers, such as method inlining and escape analysis, as well as speculative/optimistic optimizations based on program runtime profiles. How should we understand this? For example, if I have an “instanceof” instruction and the class of the tested object remains the same throughout the execution process before compilation, the JIT compiler can assume that the class will still be the same after compilation and directly return the result of “instanceof” based on that class. If another class appears, the compiled machine code will be discarded and the execution will switch back to interpretation.

Of course, JVM optimizations only take effect when running application code. If the application code itself blocks, such as waiting for the result of another thread during concurrency, it is not within the scope of JVM’s optimizations.

Analysis of the Test Points #

I would like to thank Dr. Zheng Yudi for answering from the JVM's perspective. This question has been raised by many students in my column, and it is also a topic that interviewers often probe in depth. Dr. Zheng's answer is already comprehensive and thorough.

Most Java engineers are not JVM engineers, but this knowledge still needs to be put into practice. Interviewers are likely to approach the topic from a practical angle, such as how you interact with JVM components like the JIT in production, and how you actually tune performance.

In today’s lesson, I will start from the perspective of everyday Java engineers and focus on:

  • Understanding the overall process by which Java code is compiled and executed, so that you have an intuitive picture of the basic mechanisms and can follow the logic behind optimization choices.
  • From the perspective of production system performance optimization, I will discuss how to apply JIT knowledge to practical work. This includes two parts: how to collect information related to JIT, and specific optimization methods.

Knowledge Expansion #

Firstly, let’s take a look at the entire lifecycle of Java code from a holistic perspective, which you can refer to in the diagram I provided.

As I mentioned in Lesson 1 of this column, Java uses bytecode as an intermediate representation to abstract the differences between different hardware platforms, and the JVM is responsible for converting bytecode into machine code.

The term “compilation phase” typically refers to the process of converting source code into bytecode using compilers such as javac or related APIs. During this phase, some optimizations such as constant folding may also be performed, and you can directly view the details using decompilation tools.
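
To make the javac-side optimization concrete, here is a minimal example of constant folding. The class and field names are mine, for illustration only; decompiling shows the arithmetic has already been reduced to a literal in the bytecode.

    // ConstantFolding.java: the whole expression is evaluated by javac, so the
    // class file stores the literal 86400 instead of the multiplications.
    // Verify with: javac ConstantFolding.java && javap -c ConstantFolding
    public class ConstantFolding {
        static final int SECONDS_PER_DAY = 60 * 60 * 24; // folded at compile time

        public static void main(String[] args) {
            System.out.println(SECONDS_PER_DAY);
        }
    }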

The optimizations performed by javac are related to those performed inside the JVM, since javac is responsible for generating the bytecode. For example, starting with Java 9, javac compiles string concatenation into calls to StringConcatFactory, which gives the JVM a unified entry point for optimizing string concatenation. In practice, you can also intervene in this process by choosing among different concatenation strategies.
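
As a hedged illustration of this point (the class name is mine, and the strategy property is an internal JDK detail that may change between releases):

    // ConcatDemo.java: on Java 9+, "javap -c ConcatDemo" shows that the
    // concatenation below compiles to a single invokedynamic instruction
    // bootstrapped by StringConcatFactory, not to a StringBuilder chain.
    // The runtime strategy can be switched with the internal property
    // -Djava.lang.invoke.stringConcat=<strategy>.
    public class ConcatDemo {
        public static void main(String[] args) {
            String who = args.length > 0 ? args[0] : "world";
            System.out.println("hello, " + who + "!"); // becomes invokedynamic
        }
    }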

Today, I will focus on JVM runtime optimization. In general, the compiler and interpreter work together. You can refer to the following diagram for the specific process.

Based on runtime statistics, the JVM dynamically decides which methods should be compiled and which should be interpreted. Even code that has already been compiled may stop being a hot spot at a later stage of the run, and because the Code Cache has a limited size, the JVM then needs to evict such code from it.

As Dr. Zheng answered, the interpreter and compiler also perform some common optimizations, such as:

  • Lock optimizations, which you can refer to in the runtime analysis of the interpreter I provided in Lesson 16 of this column.
  • The intrinsics mechanism, also known as built-in methods: custom implementations that the JDK team provides directly for important basic methods. They are written in assembly or in the compiler's intermediate representation, and the JVM substitutes them directly at runtime.

There are many reasons for doing this. For example, CPUs of different architectures differ at the instruction level, and hand customization can take full advantage of the hardware's capabilities. HotSpot provides built-in implementations for typical string operations, array copying, and other basic methods that we use in everyday work.
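
Here is a small, hedged example of methods that are typical intrinsic candidates in HotSpot (the class name is mine; exactly which methods are intrinsified varies by JDK version and platform):

    // IntrinsicDemo.java: HotSpot replaces calls like these with hand-tuned
    // machine code for the current CPU instead of compiling their Java bodies.
    // You can list which calls were intrinsified with the diagnostic flags:
    //   -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics
    public class IntrinsicDemo {
        public static void main(String[] args) {
            int[] src = new int[1024];
            int[] dst = new int[1024];
            System.arraycopy(src, 0, dst, 0, src.length); // intrinsified copy
            System.out.println(Integer.bitCount(0b1011)); // POPCNT-style instruction
            System.out.println(Math.sqrt(2.0));           // hardware square root
        }
    }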

On the other hand, the just-in-time (JIT) compiler is responsible for most of the heavier optimization work. The JIT treats a whole method as the basic unit of compilation: it identifies hot methods by counting their invocations and compiles them into native code. A second scenario targets so-called hot loops: if a method's invocation count never meets the compilation threshold but the method contains a large loop inside, compiling it is still worthwhile, and the JVM uses on-stack replacement (OSR) to switch to the compiled code while the loop is still executing.
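
A minimal sketch of the OSR case (the class name is mine): main() is invoked only once, so the invocation counter alone would never make it hot, but its loop's back edges do.

    // OsrDemo.java: run with -XX:+PrintCompilation and look for a compilation
    // entry marked with '%', HotSpot's marker for an on-stack replacement.
    public class OsrDemo {
        public static void main(String[] args) {
            long sum = 0;
            for (int i = 0; i < 100_000_000; i++) { // hot loop in a cold method
                sum += i;
            }
            System.out.println(sum);
        }
    }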

In theory, the JIT can be viewed as driven by two counters: a method invocation counter and a back-edge counter, which supply the JVM with the statistics it needs to locate hot code. In practice, the JIT machinery is far more complex. Dr. Zheng mentioned escape analysis, loop unrolling, method inlining, and other optimizations that also take place during the JIT phase, alongside common mechanisms such as the intrinsics mentioned above.
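
To make the speculative optimization Dr. Zheng described concrete, here is a minimal sketch (the class names are mine, and the exact compilation-log output depends on JDK version and flags):

    // SpeculationDemo.java: while only Circle flows through test(), the JIT can
    // speculate on the type and compile the instanceof check into a cheap
    // class-pointer comparison. The first Square invalidates the speculation,
    // and the compiled code is deoptimized back to the interpreter. With
    // -XX:+PrintCompilation you can see the method marked "made not entrant".
    public class SpeculationDemo {
        static int hits;

        static void test(Object o) {
            if (o instanceof Circle) { // profiled as always-true during warm-up
                hits++;
            }
        }

        public static void main(String[] args) {
            Object circle = new Circle();
            for (int i = 0; i < 1_000_000; i++) {
                test(circle);          // warm up with a single receiver type
            }
            test(new Square());        // a new type triggers deoptimization
            System.out.println(hits);
        }

        static class Circle {}
        static class Square {}
    }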

Secondly, what methods can be used to investigate the specific occurrence of these optimizations?

Some of these methods have been introduced in this column, and I will briefly summarize them and provide additional details.

  • Printing detailed information about compilations.

    -XX:+PrintCompilation

  • Outputting more details about compilations.

    -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=<your_file_path>

The JVM generates an XML file, and the LogFile option is optional. If not specified, the output will be written to

hotspot_pid<pid>.log

For the specific format, you can refer to the JITWatch analysis tool and guide from Ben Evans.

  • Printing occurrences of method inlining. To use this diagnostic option, the unlocking operation is also required:

    -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining
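
Putting these together, a typical diagnostic run might look like the following (the application name and log file name are placeholders):

    java -XX:+PrintCompilation \
         -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining \
         -XX:+LogCompilation -XX:LogFile=compile.log \
         MyApp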

Many tools already provide related statistics, such as JMC, JConsole, and others; I have also introduced using NMT to monitor related memory usage.

Thirdly, as application developers, what tuning perspectives and methods are within our reach?

  • Adjusting the hot-spot threshold. I have previously introduced the default JIT compilation thresholds: 10,000 invocations in server mode and 1,500 in client mode. The threshold can be fine-tuned with the parameter below; lowering it can also indirectly shorten warm-up time.

-XX:CompileThreshold=N

You may wonder: if a method really is a hot spot, won't it reach the threshold eventually anyway? Not necessarily, because the JVM periodically decays the counter values, so the invocation counter of a moderately hot method may never reach the threshold. Besides adjusting CompileThreshold, another option is to disable counter decay.

-XX:-UseCounterDecay

If you are using a debug version of JDK, you can experiment with the following parameter. However, this option is not supported in production versions.

-XX:CounterHalfLifeTime

  • Adjusting the Code Cache size

We know that the JIT-compiled code is stored in the Code Cache. It is important to note that the Code Cache has a size limit and does not dynamically adjust. This means that if the Code Cache is too small, only a small portion of the code can be JIT-compiled, while the remaining code can only be interpreted. Therefore, a potential tuning point is to adjust its size limit.

-XX:ReservedCodeCacheSize=<SIZE>

Of course, you can also adjust its initial size.

-XX:InitialCodeCacheSize=<SIZE>

Note that in relatively recent versions of Java, due to the presence of tiered compilation, the space requirements for Code Cache have significantly increased, and its default size has also been raised.
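
If you want to check how full the Code Cache actually is, here is a hedged sketch using the standard MemoryPoolMXBean API (the class name is mine; on JDK 9+ the segmented code cache reports several pools named "CodeHeap '...'", so we match any pool whose name contains "Code"):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;

    // CodeCacheUsage.java: prints the used and maximum sizes of the
    // code-cache-related memory pools exposed by the JVM.
    public class CodeCacheUsage {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getName().contains("Code")) {
                    System.out.printf("%s: used=%d KB, max=%d KB%n",
                            pool.getName(),
                            pool.getUsage().getUsed() / 1024,
                            pool.getUsage().getMax() / 1024);
                }
            }
        }
    }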

  • Adjusting the number of compiler threads or selecting the appropriate compiler mode

The number of compiler threads in the JVM depends on the chosen mode: client mode defaults to one compiler thread and server mode to two. If the commonly used tiered compilation mode is selected, the numbers of C1 and C2 compiler threads are calculated from the number of CPU cores. You can specify the thread count explicitly with the following parameter.

-XX:CICompilerCount=N

Increasing the number of compiler threads in a powerful multi-processor environment may better utilize CPU resources and make the warm-up process faster. However, it may also lead to excessive competition for resources among compiler threads, especially when the system is very busy. For example, when multiple Java application instances are deployed, reducing the number of compiler threads can be considered.

In production practice, some people also recommend disabling tiered compilation on servers and using the server compiler directly. Although this may result in a slightly slower warm-up speed, it may improve throughput on specific workloads.
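
The switch in question is the one below; note that tiered compilation is the default on recent JDKs, so this explicitly turns it off.

-XX:-TieredCompilation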

  • Some other so-called “optimizations” that are relatively confusing

For example, reducing how often the application enters safepoints. Strictly speaking, safepoints are not specific to dynamic compilation; they occur even more frequently during GC phases. You can use the following options to diagnose their impact.

-XX:+PrintSafepointStatistics -XX:+PrintGCApplicationStoppedTime

Note that starting from JDK 9, PrintGCApplicationStoppedTime has been removed, and you need to use something like “-Xlog:safepoint” to specify it.

Many optimization stages may be related to safepoints, such as:

  • During the JIT process, scenarios like deoptimization require the insertion of safepoints.
  • Safepoints may also be triggered during common lock optimizations. For example, biased locking is meant to avoid synchronization overhead when there is no contention, but once contention actually occurs, revoking the biased lock triggers a safepoint, which is a heavyweight operation. For this reason, the value of biased locking in genuinely concurrent scenarios is often questioned, and it is frequently recommended to disable it explicitly.

-XX:-UseBiasedLocking

These are the main optimization methods that ordinary Java developers can use. If you want to have a deeper understanding of JVM optimization methods, I recommend subscribing to Dr. Zheng Yudi’s column on JVM.

Practice Questions #

Have you grasped the topic we discussed today? Please think about a question: How can we programmatically verify whether the final keyword will affect performance?
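
As one possible starting point, here is a minimal JMH sketch (it assumes the JMH dependencies are on the classpath, and the class and field names are mine): two otherwise identical benchmarks, one reading a static final field and one reading a plain static field. Comparing their scores, ideally alongside -XX:+PrintCompilation or JITWatch output, shows whether HotSpot treats the two differently.

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    // FinalFieldBenchmark: the final field is a compile-time constant the JIT
    // can fold away, while the plain field must be reloaded on every call.
    @State(Scope.Thread)
    public class FinalFieldBenchmark {
        static final int FINAL_VALUE = 42;
        static int plainValue = 42;

        @Benchmark
        public int readFinal() {
            return FINAL_VALUE * 2; // candidate for constant folding
        }

        @Benchmark
        public int readPlain() {
            return plainValue * 2;  // cannot be folded; the field may change
        }
    }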

Please write your thoughts on this question in the comments section. I will select a well-considered comment and reward you with a learning voucher. Welcome to discuss it with me.

Are your friends also preparing for interviews? You can “ask a friend to read” and share today’s topic with them. Maybe you can help them.