20 To understand JVM optimization, start with the JVM’s memory model #

Hello, I’m Liu Chao.

Starting today, I will explore Java Virtual Machine (JVM) performance optimization with you. The JVM is a frequent interview topic, and interviewers often ask: Can you explain the JVM memory model, and have you done any JVM performance tuning?

Why is JVM so important in Java? #

First of all, to run a Java application we must install a JDK or JRE. Java source code is compiled into bytecode, which is then executed by the JVM, and the JVM is a core component of the JRE.

The JVM not only interprets, compiles (JIT compiler), and executes (runtime) Java bytecode, but also provides automatic memory allocation and management. This greatly reduces the risk of the memory leaks and memory overflows that manual allocation and deallocation can cause. As a result, Java developers do not need to manage the allocation and release of each object and can focus on the business logic.

Starting with Understanding the Memory Model #

The JVM’s automatic memory management has many benefits, but it is a double-edged sword. While it improves development efficiency, it also makes Java developers overly reliant on automation and weakens their memory management skills. This can easily lead to JVM heap memory exceptions, poorly chosen garbage collection (GC) strategies, and overly frequent GC, all of which directly affect application performance.

Therefore, JVM-level optimization requires a solid understanding of how the JVM allocates memory and collects garbage. With that understanding, we can quickly locate problems through log analysis when they occur, and we can tune the JVM when the system hits a performance bottleneck. This is the focus of Module 4. Today, we will start with the JVM memory model, laying a solid foundation for the lectures that follow.

Specific Design of JVM Memory Model #

Let’s go through the specific design of the JVM memory model with a diagram. In Java, the JVM memory model (what the JVM specification calls the run-time data areas, not to be confused with the Java Memory Model for concurrency) consists mainly of the heap, the program counter register, the method area, the virtual machine stack, and the native method stack.

How are the 5 partitions of the JVM implemented? Let’s analyze them one by one.

1. Heap #

The heap is the largest memory space in the JVM and is shared by all threads. Almost all objects and arrays are allocated in the heap memory. The heap is divided into the young generation and the old generation. The young generation is further divided into the Eden space and the Survivor space, with the Survivor space consisting of the “From Survivor” and “To Survivor” spaces.
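The heap layout described above can be observed from inside a running program. The following is a minimal sketch, assuming a HotSpot-style JVM; the class name `MemoryPoolLister` is made up for illustration, and the pool names it prints depend on the JVM and on the garbage collector in use, so they may not literally say "Eden" or "Survivor".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only for this illustration.
class MemoryPoolLister {

    /** Returns one line per memory pool reported by the running JVM: its type and its name. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(pool.getType() + ": " + pool.getName());
        }
        return lines;
    }

    /** Counts the pools that belong to the heap (for example Eden, Survivor, old generation). */
    static long heapPoolCount() {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(pool -> pool.getType() == MemoryType.HEAP)
                .count();
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
        System.out.println("heap pools: " + heapPoolCount());
    }
}
```

Running it with different collectors should show the same heap versus non-heap split under different pool names.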

In Java 6, the permanent generation (PermGen) sits in the non-heap memory area. In Java 7, static variables and the string constant pool (interned strings) were moved out of the permanent generation and into the heap. In Java 8, the permanent generation was replaced by the metaspace. The structure is shown in the following diagram:

2. Program Counter Register #

The program counter register is a small memory space used to record the address of the bytecode being executed by each thread. Branching, looping, jumping, exceptions, and thread restoration all rely on the program counter.

Since Java is a multithreaded language, when the number of running threads exceeds the number of CPUs, the threads compete for CPU time through time slicing. If a thread’s time slice runs out, or the CPU is preempted for some other reason, the suspended thread needs its own program counter to record the next instruction to execute when it resumes.

3. Method Area #

Many developers are used to calling the method area the “permanent generation”, but they are not equivalent.

HotSpot JVM uses the permanent generation to implement the method area, but in other JVMs such as Oracle’s JRockit and IBM’s J9, the concept of the permanent generation does not exist. Therefore, the method area is just part of the JVM specification. It can be said that in the HotSpot JVM, the designers used the permanent generation to implement the method area specified in the JVM specification.

The method area is mainly used to store information about the classes that have been loaded by the JVM, including class information, runtime constant pool, and string constant pool. Class information includes version, fields, methods, interfaces, and the superclass, among other information. JVM must go through the stages of loading, linking, and initialization when executing a class, and linking includes the three stages of verification, preparation, and resolution. When loading a class, the JVM first loads the class file. In addition to the description information of the class version, fields, methods, and interfaces, the class file also contains a constant pool table, which is used to store various literals and symbolic references generated during compilation.

Literals include strings (String a = "b") and constants of primitive types (variables declared final), while symbolic references include the fully qualified names of classes and interfaces (for example, the fully qualified name of the String class is java/lang/String), the names and descriptors of fields, and the names and descriptors of methods.

After the class is loaded into memory, the JVM will place the contents of the constant pool in the runtime constant pool. During the resolution stage, the JVM replaces the symbolic references with direct references (object index values).

For example, a string literal in a class is stored in the constant pool of the class file. After the JVM loads the class, it places this entry in the class’s runtime constant pool and, during the resolution stage, replaces it with a reference to the string object. Each class has its own runtime constant pool, while the string constant pool is shared globally, so identical string literals from the constant pools of different class files exist only once in the string constant pool.
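The sharing of identical string literals can be checked with reference comparison. This is a minimal sketch; the class names `StringPoolDemo`, `First`, and `Second` are made up for illustration. It relies only on the language guarantee that equal string literals refer to the same `String` instance and on the documented behavior of `String.intern`.

```java
// Hypothetical demo class, used only for this illustration.
class StringPoolDemo {

    // Two separate classes, each with its own constant pool entry for "hello".
    static class First  { static String greeting() { return "hello"; } }
    static class Second { static String greeting() { return "hello"; } }

    /** The same literal in two different classes resolves to one shared String object. */
    static boolean literalsShared() {
        return First.greeting() == Second.greeting();
    }

    /** A string built at run time is equal in content but is a different object. */
    static boolean runtimeStringIsSeparate() {
        String built = new StringBuilder("hel").append("lo").toString();
        return built != First.greeting() && built.equals(First.greeting());
    }

    /** intern() returns the canonical pooled instance, which is the one the literal refers to. */
    static boolean internReturnsPooled() {
        String built = new StringBuilder("hel").append("lo").toString();
        return built.intern() == First.greeting();
    }

    public static void main(String[] args) {
        System.out.println("literals shared: " + literalsShared());
        System.out.println("runtime string is separate: " + runtimeStringIsSeparate());
        System.out.println("intern returns pooled: " + internReturnsPooled());
    }
}
```

All three checks should report `true`. The sketch only shows object identity; it does not reveal which memory area holds the objects.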

Like the heap, the method area is a memory area shared by all threads. If two threads try to access the same class information in the method area and the class has not yet been loaded into the JVM, only one thread is allowed to load it, and the other thread must wait.
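A related effect is easy to observe: when two threads touch the same uninitialized class at the same time, its static initializer still runs only once, and the second thread waits for the first. The sketch below is an illustration of that one-time class initialization rather than of the loading step itself; `ClassInitOnceDemo` and `Heavy` are made-up names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, used only for this illustration.
class ClassInitOnceDemo {

    // Counts how many times Heavy's static initializer has run.
    static final AtomicInteger INIT_RUNS = new AtomicInteger();

    static class Heavy {
        static {
            INIT_RUNS.incrementAndGet();
            try {
                Thread.sleep(100); // widen the window in which the second thread arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        static void touch() {
            // Calling a static method triggers initialization of Heavy on first use.
        }
    }

    /** Starts two threads that both use Heavy, then reports how often its initializer ran. */
    static int runTwoThreads() {
        Thread t1 = new Thread(() -> Heavy.touch());
        Thread t2 = new Thread(() -> Heavy.touch());
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return INIT_RUNS.get();
    }

    public static void main(String[] args) {
        System.out.println("static initializer runs: " + runTwoThreads());
    }
}
```

The count stays at 1 no matter which thread wins the race, because the JVM serializes initialization of a class.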

In the HotSpot virtual machine, Java 7 moved static variables and the string constant pool out of the permanent generation and into the heap; the rest stayed in the JVM’s non-heap memory. Java 8 removed the permanent generation implementation of the method area and replaced it with metaspace, which is stored in native memory. The class metadata that used to live in the permanent generation is now stored in metaspace, while class static variables and the string constant pool remain in the heap, as in Java 7.

You may have a question again: why did Java 8 replace the permanent generation with metaspace, and what are the benefits of doing so?

The official explanation is:

*  Removing the permanent generation is part of the effort to merge the HotSpot JVM and the JRockit VM. JRockit has no permanent generation, so its users never had to configure one.
*  The permanent generation often runs short of memory or overflows, producing the exception java.lang.OutOfMemoryError: PermGen space. This is because in JDK 1.7 the specified size of the PermGen area is 8M, and although class metadata in PermGen may be collected during each Full GC, the recycling rate is low and the results are rarely satisfactory. In addition, it is hard to decide how much space to allocate to PermGen. A suitable PermSize depends on many factors, such as the total number of classes loaded by the JVM, the size of the constant pool, and the size of the methods.
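To see how metaspace differs from a fixed-size PermGen, its usage can be read at run time. This is a minimal sketch, assuming a HotSpot JVM on Java 8 or later that exposes a memory pool named "Metaspace"; on other JVMs the pool may be absent, in which case the method returns an empty result. The class name `MetaspaceProbe` is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;

// Hypothetical helper class, used only for this illustration.
class MetaspaceProbe {

    /** Looks up the pool named "Metaspace" (an assumption about HotSpot's pool naming). */
    static Optional<MemoryUsage> metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().equals("Metaspace")) {
                return Optional.ofNullable(pool.getUsage());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<MemoryUsage> usage = metaspaceUsage();
        if (usage.isPresent()) {
            System.out.println("metaspace used (bytes): " + usage.get().getUsed());
            // MemoryUsage.getMax() returns -1 when no maximum is defined.
            System.out.println("metaspace max (bytes, -1 = undefined): " + usage.get().getMax());
        } else {
            System.out.println("no pool named Metaspace on this JVM");
        }
    }
}
```

On HotSpot, the upper bound can be capped with the `-XX:MaxMetaspaceSize` option if unbounded growth in native memory is a concern.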

4. Virtual Machine Stack #

The Java virtual machine stack is a thread-private memory space that is created together with a Java thread. When a thread is created, it is given its own stack, which is made up of stack frames. Each frame stores a method’s local variables, operand stack, dynamic linking information, and return address, and takes part in method invocation and return. Every method call pushes a stack frame, and every method return pops one.
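The push and pop of stack frames becomes visible when a thread runs out of stack. The sketch below recurses without a base case until the JVM throws `StackOverflowError` and reports how many frames were pushed; `StackDepthDemo` is a made-up name, and the depth reached depends on the JVM, the platform, and the thread stack size (on HotSpot, the `-Xss` option).

```java
// Hypothetical demo class, used only for this illustration.
class StackDepthDemo {

    private static int depth;

    /** Each call pushes one more frame onto the current thread's stack. */
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until the stack is exhausted and returns the number of calls made. */
    static int maxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames are popped as the error propagates back to this method.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames pushed before overflow: " + maxDepth());
    }
}
```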

5. Native Method Stack #

The native method stack works much like the Java virtual machine stack. The Java virtual machine stack manages calls to Java methods, while the native method stack manages calls to native methods. Native methods are not implemented in Java; they are typically written in C or C++.
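Whether a given method is native can be checked through reflection. This is a minimal sketch; `NativeMethodCheck` is a made-up name, and the example assumes an OpenJDK-based class library, where `System.currentTimeMillis` is declared `native`.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper class, used only for this illustration.
class NativeMethodCheck {

    /** Returns true if the named method of the given class carries the native modifier. */
    static boolean isNative(Class<?> owner, String name, Class<?>... parameterTypes) {
        try {
            Method method = owner.getDeclaredMethod(name, parameterTypes);
            return Modifier.isNative(method.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
    }

    public static void main(String[] args) {
        // Declared native in OpenJDK (assumption about the class library in use).
        System.out.println("System.currentTimeMillis native: "
                + isNative(System.class, "currentTimeMillis"));
        // An ordinary method implemented in Java.
        System.out.println("String.length native: "
                + isNative(String.class, "length"));
    }
}
```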

How JVM works #

By now, I believe you have a good understanding of the JVM memory model. Next, let’s use a case study to understand how code and objects are allocated and how Java code runs in the JVM.

public class JVMCase {
 
    // Constant
    public final static String MAN_SEX_TYPE = "man";
 
    // Static variable
    public static String WOMAN_SEX_TYPE = "woman";
 
    public static void main(String[] args) {
        
        Student stu = new Student();
        stu.setName("nick");
        stu.setSexType(MAN_SEX_TYPE);
        stu.setAge(20);
        
        JVMCase jvmcase = new JVMCase();
        
        // Call static method
        print(stu);
        // Call non-static method
        jvmcase.sayHello(stu);
    }
 
 
    // Regular static method
    public static void print(Student stu) {
        System.out.println("name: " + stu.getName() + "; sex:" + stu.getSexType() + "; age:" + stu.getAge()); 
    }
 
 
    // Non-static method
    public void sayHello(Student stu) {
        System.out.println(stu.getName() + " say: hello"); 
    }
}
 
class Student{
    String name;
    String sexType;
    int age;
    
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    
    public String getSexType() {
        return sexType;
    }
    public void setSexType(String sexType) {
        this.sexType = sexType;
    }
    public int getAge() {
        return age;
    }
    public void setAge(int age) {
        this.age = age;
    }
}

When we run the above code, the JVM goes through the following process:

1. The JVM requests memory from the operating system. Using the configured parameters, or the defaults if none are given, the JVM first asks the operating system for memory space. Based on the requested size, the operating system consults its memory allocation table and hands the start and end addresses of a memory segment to the JVM, which then divides it up internally.
2. After obtaining the memory space, the JVM sizes the heap, stack, and method area according to the configuration parameters.
3. Class file loading, verification, preparation, and resolution. In the preparation phase, memory is allocated for the class’s static variables and they are set to their default initial values (this part will be discussed in detail in the 21st lecture).
4. Initialization. After the previous step completes, the final initialization phase runs. In this phase, the JVM first executes the class constructor method <clinit>(). When the .java file is compiled into a .class file, the compiler collects all of the class’s initialization code, namely static variable assignment statements and static code blocks, into the <clinit>() method.
5. Execute methods. The main thread starts and executes the main method from its first line. A Student object is created in heap memory, and the reference stu is stored in the stack. Next, a JVMCase object is created. The static method print, which belongs to the JVMCase class, is then called: its frame is pushed onto the stack, and it uses the stu reference in the stack to access the Student object in the heap. Afterwards, the non-static method sayHello, which belongs to the JVMCase object, is called; it is likewise pushed onto the stack and reaches the Student object in the heap through the stu reference.
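The preparation and initialization phases in the process above can be told apart with a small experiment. In the sketch below, `first` is assigned while `second` still holds the default value from the preparation phase, because static assignments run in textual order during initialization. The names `InitOrderDemo` and `Config` are made up for illustration.

```java
// Hypothetical demo class, used only for this illustration.
class InitOrderDemo {

    static class Config {
        // Runs first during initialization: second has not been assigned yet,
        // so readSecond() sees the default value 0 set in the preparation phase.
        static int first = readSecond();

        // Runs second during initialization: only now does second become 5.
        static int second = 5;

        static int readSecond() {
            return second;
        }
    }

    public static void main(String[] args) {
        System.out.println("first = " + Config.first);   // prints "first = 0"
        System.out.println("second = " + Config.second); // prints "second = 5"
    }
}
```

Swapping the two field declarations makes `first` come out as 5, which shows that the order of static assignments in the source matters.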

Having seen how memory is allocated for real code and how that code executes in the JVM, you should now have a clearer picture of the responsibilities of each area in the memory model.

Summary #

In this lecture, we looked closely at the design of the basic memory model and at the role and implementation of each partition.

Today, the JVM relieves Java developers of much of the effort of managing object lifecycles. When an object is created, the JVM automatically allocates memory for it, and when it is no longer in use, the garbage collector automatically reclaims it and frees the memory it occupied.

However, in some situations the normal lifecycle is not the best choice, and some objects are expensive to create in the JVM’s default way. For example, as I mentioned in [Lecture 03], String.intern can save a great deal of memory for String objects in specific scenarios. We can also change the normal lifecycle of an object by using different reference types, thereby improving the efficiency of JVM garbage collection. This is another way to optimize JVM performance.
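As one illustration of how a reference type changes an object’s lifecycle, the sketch below wraps an object in a `WeakReference`. While a strong reference exists the object stays reachable; once the strong reference is dropped, the collector is free to reclaim it. `WeakReferenceDemo` is a made-up name, and `System.gc()` is only a hint, so the second observation is not guaranteed on every run.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class, used only for this illustration.
class WeakReferenceDemo {

    /** While a strong reference is still in use, a weakly referenced object survives GC. */
    static boolean reachableWhileStronglyHeld() {
        Object payload = new Object();
        WeakReference<Object> ref = new WeakReference<>(payload);
        System.gc();
        // payload is used here, so the object is strongly reachable during the GC above.
        return ref.get() == payload;
    }

    public static void main(String[] args) {
        System.out.println("survives while strongly held: " + reachableWhileStronglyHeld());

        Object payload = new Object();
        WeakReference<Object> ref = new WeakReference<>(payload);
        payload = null; // drop the only strong reference
        System.gc();    // a hint only; collection is not guaranteed
        System.out.println("cleared after GC hint: " + (ref.get() == null));
    }
}
```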

Thought Question #

In this lecture, I only covered how memory is allocated for objects in the heap. However, if a class defines String a = "b" and String c = new String("b"), where in the JVM memory model will each of these two objects be created?