
21 Dive into the JVM's Just-In-Time (JIT) Compiler: Optimization of Java Compilation #

Hello, I’m Liu Chao.

When it comes to compilation, you probably immediately think of the process in which .java files are compiled into .class files, commonly referred to as front-end compilation. But the compilation and execution process of Java is more complex than that: in addition to front-end compilation, there is also runtime compilation. Since machines cannot directly execute the bytecode generated by Java, at runtime the JIT compiler or the interpreter converts the bytecode into machine code; this process is called runtime compilation.

Class files are compiled further at runtime and can be transformed into highly optimized machine code. Unlike C/C++ compilers, which complete all of their optimizations at compile time, Java can base its optimizations on runtime performance monitoring, for example call frequency prediction, branch frequency prediction, and pruning of branches that are never taken. In other words, recompiling at runtime lets Java apply optimizations that a static compiler cannot, which makes the JIT compiler one of the most important parts of runtime compilation in the JVM.

However, many Java developers know little about the JIT compiler: they neither dig into its working principles nor investigate how to observe just-in-time compilation in their own applications, which makes it hard to stay calm when problems occur in production. Today, we will learn how to optimize Java code through runtime compilation.

Class Compilation and Execution Process #

Before we delve into the topic, let’s first understand the entire process from compilation to execution in Java, which lays a foundation for our further learning. Please take a look at the diagram below:

img

Class Compilation #

After writing the code, we need to compile the .java files into .class files in order to run the code on the virtual machine. The compilation of files is usually done by the Javac tool that comes with the JDK. For a simple .java file, we can use the javac command to generate the corresponding .class file.

Now let’s decompile a class file using the javap command (as mentioned in [Lesson 12]) to see what information is mainly included:

img
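The commands involved look like this (a minimal sketch; Demo.java is a hypothetical source file in the current directory):

javac Demo.java   # front-end compilation: .java -> .class
javap -v Demo     # print the constant pool, method table, and bytecode of Demo.class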

Although executing this seemingly simple command is actually a very complex process, we don't need to dwell on the details. It is enough to know from the diagram above that the compiled bytecode file mainly consists of the constant pool and the method table.

The constant pool mainly records the literals and symbolic references that appear in the class file. Literals include string constants (e.g., in String str = "abc", the "abc" part is a literal), fields declared as final, and values of some basic types (e.g., integers in the range -128 to 127). Symbolic references include the fully qualified names of classes and interfaces, class references, method references, and field references (e.g., in String str = "abc", str is a field reference), etc.

The method table mainly includes each method's bytecode, access flags (public, protected, private, etc.), name index (pointing to the method reference in the constant pool), descriptor index, JVM execution instructions, and attribute collection.

Class Loading #

When a class is instantiated or referenced by other objects, the virtual machine loads the bytecode file into memory if the class has not been loaded before.

Different classes are loaded by different class loaders. The core classes of the JDK are generally loaded by the Bootstrap ClassLoader, the extension classes of the JDK are generally loaded by the ExtClassLoader, and the class files of the application itself are loaded by the system class loader (AppClassLoader).
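We can observe this hierarchy directly (a minimal sketch for JDK 8; the class name LoaderDemo is only for illustration):

public class LoaderDemo {
    public static void main(String[] args) {
        // Core classes are loaded by the Bootstrap loader, which prints as null
        System.out.println(String.class.getClassLoader());
        // The application's own classes are loaded by the AppClassLoader
        System.out.println(LoaderDemo.class.getClassLoader());
        // Its parent is the ExtClassLoader on JDK 8
        System.out.println(LoaderDemo.class.getClassLoader().getParent());
    }
}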

After loading the class, the constant pool information and other data from the class file are saved to the method area in the JVM memory.

Class Linking #

After a class is loaded, it undergoes a linking and initialization process before it can be used. The linking process includes three parts: verification, preparation, and resolution.

Verification: Verify that the class complies with the Java language specification and the JVM specification, and ensure that it does not jeopardize the security of the virtual machine.

Preparation: Allocate memory for the static variables of the class and initialize them to the system's default values. Variables that are both final and static (compile-time constants) are assigned their user-defined values directly. For example, with private final static int value = 123, memory is allocated and value is initialized to 123 during the preparation phase, while with private static int value = 123, value is still 0 at this stage (a code sketch follows below).

Resolution: The process of converting symbolic references into direct references. During compilation, a Java class does not know the actual addresses of the classes it references, so symbolic references are used in their place. The constant pool in the class file stores these symbolic references, including fully qualified names of classes and interfaces, class references, method references, and field references. To use these classes and methods, the references must be converted into memory addresses or pointers that the JVM can access directly, i.e., direct references.
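Here is a small code sketch of the preparation phase described above (the class name PrepDemo is illustrative):

public class PrepDemo {
    // A final static compile-time constant: already 123 after the preparation phase
    private static final int CONST_VALUE = 123;
    // A plain static variable: 0 after preparation; becomes 123 only when <clinit>() runs
    private static int value = 123;
}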

Class Initialization #

The initialization phase is the final stage of the class loading process. In this phase, the JVM executes the class constructor, the <clinit>() method. The compiler collects all of the class initialization code, i.e., the static variable assignment statements and static code blocks, and merges them into the <clinit>() method.

Static variables and static code blocks in the class are initialized with user-defined values. The order of initialization is the same as the order of the Java source code from top to bottom. For example:

public class InitOrderDemo {
  private static int i = 1;
  static {
    i = 0;
  }
  public static void main(String[] args) {
    System.out.println(i);
  }
}

The result of running this code is:

0

Now let’s look at the following code:

public class InitOrderDemo2 {
  static {
    i = 0;
  }
  private static int i = 1;
  public static void main(String[] args) {
    System.out.println(i);
  }
}

The result of running this code is:

1

When a subclass is initialized, the <clinit>() method of the superclass is called first, followed by the <clinit>() method of the subclass. Let's run the following code:

public class Parent{
  public static String parentStr = "parent static string";
  static{
    System.out.println("parent static fields");
    System.out.println(parentStr);
  }
  public Parent(){
    System.out.println("parent instance initialization");
  }
}
 
public class Sub extends Parent{
  public static String subStr = "sub static string";
  static{
    System.out.println("sub static fields");
    System.out.println(subStr);
  }
 
  public Sub(){
    System.out.println("sub instance initialization");
  }
 
  public static void main(String[] args){
    System.out.println("sub main");
    new Sub();
  }
}

The result is:

parent static fields
parent static string
sub static fields
sub static string
sub main
parent instance initialization
sub instance initialization

The JVM guarantees the thread safety of the <clinit>() method, ensuring that only one thread executes it at a time.

When the initialization code runs and a new object is instantiated, the instance variables are initialized by the <init>() method, and then the code in the corresponding constructor is executed.
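A small sketch of this order (the class name InitDemo is illustrative):

public class InitDemo {
    private int x = 1; // instance field assignments are collected into <init>()

    public InitDemo() {
        // The field assignment above runs before the constructor body, so this prints 1
        System.out.println(x);
        x = 2;
    }

    public static void main(String[] args) {
        new InitDemo();
    }
}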

Just-In-Time Compilation #

After initialization, when a class is invoked and executed, the execution engine converts its bytecode into machine code, which can then be executed by the operating system. During this conversion from bytecode to machine code, another kind of compilation takes place inside the virtual machine: just-in-time compilation.

Initially, the bytecode in the virtual machine is executed by the interpreter. When the virtual machine finds that certain methods or code blocks run especially frequently, it identifies them as "hot spot code".

To improve the execution efficiency of hot spot code, the just-in-time (JIT) compiler compiles it at runtime into machine code for the local platform, performs optimizations at various levels, and caches the result in memory.

Types of Just-In-Time Compilers #

In the HotSpot virtual machine, there are two built-in JITs: the C1 compiler and the C2 compiler. These two compilers have different compilation processes.

The C1 compiler is a simple and fast compiler that focuses on local optimization. It is suitable for programs with short execution times or requiring startup performance, such as GUI applications that have certain requirements for interface startup speed.

The C2 compiler is a compiler for performance tuning of long-running server applications. It is suitable for programs with long execution times or requiring peak performance. Based on their respective adaptability, these two types of just-in-time compilers are also referred to as the Client Compiler and Server Compiler.

Before Java 7, it was necessary to choose the corresponding JIT based on the characteristics of the program. The virtual machine defaults to working with the interpreter and one of the compilers.

In Java 7, tiered compilation was introduced, which combines the startup performance advantage of C1 and the peak performance advantage of C2. We can also specify the just-in-time compilation mode of the virtual machine by using the parameters “-client” or “-server”. Tiered compilation divides the execution status of the JVM into 5 levels:

  • Level 0: Interpreted execution. Performance monitoring (profiling) is enabled by default; if it is not enabled, level 2 compilation can be triggered.
  • Level 1: Simple C1 compilation. Bytecode is compiled into native code with simple, reliable optimizations, and profiling is not enabled.
  • Level 2: Limited C1 compilation. Profiling is enabled, but only method invocation counts and loop back edge execution counts are collected.
  • Level 3: Full C1 compilation. All C1 compilation with full profiling is performed.
  • Level 4: C2 compilation. Bytecode is compiled into native code with optimizations that take longer to run, including some unreliable aggressive optimizations based on the collected profiling information.

In Java 8, tiered compilation is enabled by default, and the -client and -server settings no longer take effect. If you only want to use C2, you can turn off tiered compilation (-XX:-TieredCompilation). If you only want to use C1, keep tiered compilation on and add the parameter -XX:TieredStopAtLevel=1. In addition to this default mixed compilation mode, we can use the -Xint option to force the virtual machine to run in interpreter-only mode, where the JIT is completely excluded, or the -Xcomp option to force it to run in JIT-only (compile-only) mode.
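As a quick reference, these options can be summarized as follows (all are standard HotSpot flags on JDK 8):

-XX:-TieredCompilation // Turn off tiered compilation: interpreter plus C2 only
-XX:TieredStopAtLevel=1 // Stop tiered compilation at level 1: interpreter plus C1 only
-Xint // Interpreter-only mode; the JIT is completely excluded
-Xcomp // JIT-only mode; methods are compiled rather than interpreted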

You can directly view the current compilation mode used by the system by using the java -version command, as shown in the figure below:

img

HotSpot Compilation Detection #

HotSpot's JIT optimization relies on hot spot detection, specifically counter-based hot spot detection: the virtual machine establishes counters for each method to count how many times it is executed, and if the count exceeds a certain threshold, the method is considered a "hot method".

The virtual machine maintains two kinds of counters for each method: an invocation counter and a back edge counter. For a given set of virtual machine runtime parameters, both counters have fixed thresholds; when a counter exceeds its threshold, JIT compilation is triggered.

Invocation Counter: Used to count the number of times a method is invoked. The default threshold for the invocation counter is 1500 times in C1 mode and 10000 times in C2 mode. This can be adjusted using -XX:CompileThreshold. In the case of tiered compilation, the specified threshold using -XX:CompileThreshold will be ignored, and it will be dynamically adjusted based on the number of methods waiting to be compiled and the number of compilation threads. When the sum of the method invocation counter and the back edge counter exceeds the threshold of the invocation counter, JIT compilation will be triggered.

Back Edge Counter: Used to count the number of times a loop body code is executed. In bytecode, a backward jump instruction in the control flow is called a “back edge”. This value is used to calculate the threshold for triggering C1 compilation. In non-tiered compilation, the default thresholds are 13995 for C1 and 10700 for C2. This can be adjusted using -XX:OnStackReplacePercentage. In tiered compilation, the specified threshold using -XX:OnStackReplacePercentage will also be ignored, and it will be dynamically adjusted based on the number of methods waiting to be compiled and the number of compilation threads.

The main purpose of the back edge counter is to trigger On Stack Replacement (OSR) compilation. In code segments with long loop cycles, when the loop reaches the back edge counter threshold, the JVM considers this segment as hot code, and the JIT compiler will compile this segment into machine code and cache it. During this loop period, the execution code will be directly replaced with the cached machine code.
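A minimal sketch of a loop that can trigger OSR (the class name OsrDemo is illustrative; run with -XX:+PrintCompilation, where OSR compilations are marked with a % sign):

public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        // Every iteration is a back edge. Once the back edge counter crosses its
        // threshold, the still-running main method is OSR-compiled, and the
        // interpreted frame is replaced on the stack by the compiled version.
        for (int i = 0; i < 100_000_000; i++) {
            sum += i;
        }
        System.out.println(sum);
    }
}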

Compilation Optimization Techniques #

JIT compilation applies a number of classic compiler optimization techniques to generate the best-performing code it can at runtime. Today, we will focus on the following two optimization techniques:

1. Method Inlining

Calling a method usually involves pushing to and popping from the stack: invoking a method transfers execution to the memory address where the method is stored, and after the method body finishes, execution returns to the original saved address.

This kind of operation requires saving the execution context and return address before the call and restoring them afterwards, so every method invocation incurs a certain time and space overhead. For methods with small bodies that are called frequently, this overhead adds up significantly. Method inlining optimizes this away by copying the code of the target method into the calling method, avoiding an actual method call.

For example, the following methods:

private static int add1(int x1, int x2, int x3, int x4) {
    return add2(x1, x2) + add2(x3, x4);
}
private static int add2(int x1, int x2) {
    return x1 + x2;
}

Will ultimately be optimized as:

private static int add1(int x1, int x2, int x3, int x4) {
    return x1 + x2 + x3 + x4;
}

The JVM automatically identifies hot methods and optimizes them with method inlining. We can set the invocation threshold for hot methods using -XX:CompileThreshold. It should be emphasized, however, that a hot method may not be inlined if its method body is too large. The size threshold for the method body can be adjusted with the following parameters:

  • For frequently executed methods, by default, methods with a body size less than 325 bytes will be inlined. We can set the size value using -XX:MaxFreqInlineSize=N.
  • For methods that are not frequently executed, by default, methods with a size less than 35 bytes will be inlined. We can reset the size value using -XX:MaxInlineSize=N.

Afterwards, we can use JVM parameters to view the inlining of methods:

-XX:+PrintCompilation // Print compilation information to the console
-XX:+UnlockDiagnosticVMOptions // Unlock diagnostic options for the JVM. It is disabled by default, and when enabled, it supports diagnosing the JVM using some specific parameters.
-XX:+PrintInlining // Print inlined methods

When we set the VM parameters -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining and run the following code:

public static void main(String[] args) {
    for(int i=0; i<1000000; i++) { // The default method invocation counter threshold is 1500 times in C1 mode and 10000 times in C2 mode. Here, we iterate more than the required threshold.
        add1(1,2,3,4);
    }
}

We can see in the output that the logs of method inlining are displayed:

img

Optimizing hot methods can effectively improve system performance. Generally, we can improve method inlining in the following ways:

  • Set JVM parameters to lower the hot method threshold or raise the method body threshold so that more methods can be inlined; however, this approach means the generated native code occupies more memory.
  • In programming, avoid writing a large amount of code in a single method and use smaller method bodies.
  • Use the final, private, and static keywords to modify methods where possible: methods that can be overridden need an additional type check before they can be inlined. (A short sketch follows below.)
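A minimal sketch of these tips (the class name InlineFriendly is illustrative):

public class InlineFriendly {
    // Small, static, no virtual dispatch: an easy inlining candidate.
    private static int clamp(int v, int lo, int hi) {
        return Math.max(lo, Math.min(hi, v));
    }

    // final rules out overriding, so the JIT can inline the call without
    // guarding against a subclass providing a different implementation.
    public final int scale(int v) {
        return clamp(v, 0, 255) * 2;
    }
}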

2. Escape Analysis

Escape analysis is a technique used to determine whether an object is referenced by external methods or accessed by external threads. The compiler optimizes the code based on the results of the escape analysis.

Stack Allocation

By default, objects in Java are allocated on the heap, and when they are no longer in use they must be reclaimed by the garbage collector, a process that costs more time and performance than creating and destroying objects allocated on the stack. So if escape analysis finds that an object is used only within a method, it allocates the object on the stack.

Let's take the example of looping over student ages: a student object is created inside the method. Now let's compare the number of objects created on the heap with escape analysis enabled versus disabled.

public static void main(String[] args) {
    for (int i = 0; i < 200000 ; i++) {
        getAge();
    }
}

public static int getAge(){
    Student person = new Student("小明", 18);
    return person.getAge();
}

static class Student {
    private String name;
    private int age;

    public Student(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }
}

Then, we can set the VM parameters as follows: -Xmx1000m -Xms1000m -XX:-DoEscapeAnalysis -XX:+PrintGC and -Xmx1000m -Xms1000m -XX:+DoEscapeAnalysis -XX:+PrintGC. By using the VisualVM tool mentioned previously, we can check the number of objects created in the heap.

However, the results did not achieve the optimization effect we expected. You may suspect that it is due to the JDK version, but I have tested it on versions 1.6 to 1.8, and the results are the same:

(-server -Xmx1000m -Xms1000m -XX:-DoEscapeAnalysis -XX:+PrintGC)

img

(-server -Xmx1000m -Xms1000m -XX:+DoEscapeAnalysis -XX:+PrintGC)

img

This is actually because the implementation of stack allocation in the HotSpot virtual machine is quite complex, and this optimization has not been implemented there yet. As just-in-time compilers evolve and escape analysis technology matures, HotSpot may well implement this optimization in the near future.

Lock Elimination

When thread safety is not required, it is best to avoid thread-safe containers such as StringBuffer. Because StringBuffer's append method is synchronized, it acquires a lock, which can lead to performance degradation.

However, in the following code test, there is basically no difference in performance between StringBuffer and StringBuilder. This is because objects created in a local method can only be accessed by the current thread and cannot be accessed by other threads. This variable will not have any contention in terms of read and write operations, so the JIT compiler will eliminate the lock for this object’s methods.

public static String getString(String s1, String s2) {
    StringBuffer sb = new StringBuffer();
    sb.append(s1);
    sb.append(s2);
    return sb.toString();
}
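For completeness, a rough harness in the spirit of the test mentioned above (a sketch, not a rigorous benchmark; timings are noisy, and swapping StringBuffer for StringBuilder should yield roughly the same result):

public class LockEliminationTest {
    public static void main(String[] args) {
        // Warm up so the JIT has a chance to compile and optimize getString
        for (int i = 0; i < 100_000; i++) {
            getString("hello", "world");
        }
        long start = System.currentTimeMillis();
        for (int i = 0; i < 1_000_000; i++) {
            getString("hello", "world");
        }
        System.out.println("elapsed: " + (System.currentTimeMillis() - start) + " ms");
    }

    public static String getString(String s1, String s2) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        return sb.toString();
    }
}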

Scalar Replacement

Escape analysis can prove that an object will never be accessed outside a method. If such an object can be split up, the object itself may never be created at all when the program executes; instead, its member variables are created directly. After the object is split, those member variables can be allocated on the stack or in registers, so no memory needs to be allocated for the original object. This compilation optimization is called scalar replacement.

Let’s validate this with the following code:

public void foo() {
    TestInfo info = new TestInfo();
    info.id = 1;
    info.count = 99;
    ...//to do something
}

After escape analysis, the code will be optimized as:

public void foo() {
    id = 1;
    count = 99;
    ...//to do something
}

We can enable or disable escape analysis, lock elimination, and scalar replacement separately by setting JVM parameters. In JDK 1.8, these operations are enabled by default.

  • -XX:+DoEscapeAnalysis enables escape analysis (enabled by default in JDK 1.8, not tested in other versions)

  • -XX:-DoEscapeAnalysis disables escape analysis

  • -XX:+EliminateLocks enables lock elimination (enabled by default in JDK 1.8, not tested in other versions)

  • -XX:-EliminateLocks disables lock elimination

  • -XX:+EliminateAllocations enables scalar replacement (enabled by default in JDK 1.8, not tested in other versions)

  • -XX:-EliminateAllocations disables scalar replacement

Summary #

Today we mainly learned about the compilation and loading process of classes in JDK 1.8 and earlier. Java source programs are compiled into .class files by the Javac compiler, and the code format contained in those files is called Java bytecode.

This code format cannot be directly executed, but it can be interpreted and executed by the interpreter in different platform JVMs. Due to the low efficiency of the interpreter, the JIT in the JVM selectively compiles methods with high runtime frequency into binary code, which runs directly on the underlying hardware.

Before Java 8, HotSpot integrated two JITs, C1 and C2, to complete the just-in-time compilation in the JVM. Although JIT optimizes the code, collecting monitoring information consumes runtime performance, and the compilation process takes up the program’s execution time.

In Java 9, the AOT compiler was introduced. Unlike JIT, AOT performs static compilation before the program runs, which avoids the runtime cost and memory consumption of just-in-time compilation; .class files can be compiled into binary .so files by the AOT compiler.

In Java 10, a new JIT compiler called Graal was introduced. Graal is a compiler predominantly written in Java and it is targeted at Java bytecode. Compared to C1 and C2 implemented in C++, Graal is more modular and easier to maintain. Graal can be used as both a dynamic compiler to compile hot methods at runtime, and as a static compiler to achieve AOT compilation.

Reflection Question #

We know that both Class.forName and ClassLoader.loadClass can load classes. Do you know the difference between these two when loading classes?