23 Please Describe the Class Loading Process. What Is the Parent Delegation Model?

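Class loading is the process of loading a class's bytecode into memory and then verifying, preparing, resolving, and initializing it.

The class loading process in Java consists of the following steps:

  1. Loading: Locate the bytecode and load it into memory. The bytecode can come from the local file system, the network, or other sources.

  2. Verification: Check that the loaded bytecode conforms to the Java Virtual Machine Specification and to the JVM's safety requirements. Verification includes file format checks, bytecode verification, symbolic reference verification, and access permission checks.

  3. Preparation: Allocate memory for the class's static variables and set them to their default initial values. Conceptually, this memory is allocated in the method area.

  4. Resolution: Convert the symbolic references in the class's binary representation into direct references. Symbolic references name classes, methods, fields, and so on; resolution can be understood simply as turning each of those names into the corresponding memory address.

  5. Initialization: Run the class initializer. The compiler generates it automatically by combining the assignments to static variables with the static initialization blocks.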

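The parent delegation model is a mechanism used in Java class loading. Under this model, when a class loader receives a request to load a class, it first delegates the request to its parent class loader. If the parent loads the class successfully, that result is returned directly; otherwise, the loader tries to load the class itself. This layer-by-layer delegation gives class loading a clear priority order and hierarchy.

The benefit of the parent delegation model is that it avoids loading the same class repeatedly, which keeps classes consistent and secure. In this model, the Bootstrap Class Loader sits at the top of the hierarchy and loads the Java core class library, such as java.lang.Object. The Application Class Loader sits at the bottom and loads the application's classes and resources.

Within this model you can also define custom class loaders by extending the ClassLoader base class and overriding its findClass method to load specific classes. This approach is widely used in Java plugin mechanisms, hot deployment, and similar scenarios.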

Typical Answer

Generally speaking, we divide the Java class loading process into three main steps: loading, linking, and initialization. The specific behavior is defined in great detail in the Java Virtual Machine Specification.

First is the loading phase, in which Java reads bytecode data from a data source and maps it into a data structure that the JVM recognizes (a Class object). The data source can take various forms, such as a jar file, a class file, or even a network source. If the input data does not conform to the ClassFile structure, a ClassFormatError is thrown.

The loading phase is the stage where users can participate: we can write custom class loaders to implement our own class loading process.

The second phase is linking. This is the core step, in which the raw class definition is turned into a form the running JVM can use. It can be further divided into three steps:

  • Verification: This is an important safeguard for the security of the virtual machine. The JVM verifies that the bytecode complies with the Java Virtual Machine Specification; otherwise, a VerifyError is thrown. This prevents malicious or non-compliant code from harming the JVM. The verification phase may trigger the loading of more classes.
  • Preparation: This step creates the static variables of a class or interface and sets them to their default initial values. However, this “initialization” is different from the explicit initialization stage described below. The focus is on allocating the required memory, and no further JVM instructions are executed.
  • Resolution: In this step, symbolic references in the constant pool are replaced with direct references. The Java Virtual Machine Specification explains in detail how resolution works for classes, interfaces, methods, and fields.
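The difference between the default value set during preparation and the explicit value assigned later can be observed from ordinary Java code. The following is a minimal sketch, not taken from the original answer; the class and member names are hypothetical. A static method reads a field before that field's own assignment has run and sees the default value.

```java
// Hypothetical demo class; the names are illustrative only.
class PreparationDemo {
    // Runs first during initialization: at this point `a` still holds the
    // default value (0) that it was given in the preparation step.
    static int seenBeforeInit = peek();

    // The explicit assignment runs afterwards, in textual order.
    static int a = 100;

    static int peek() {
        return a;
    }

    public static void main(String[] args) {
        System.out.println("seenBeforeInit = " + seenBeforeInit); // seenBeforeInit = 0
        System.out.println("a = " + a);                            // a = 100
    }
}
```

Reading the field through a method avoids the compiler's "illegal forward reference" check, which is why the sketch uses peek() rather than a direct read.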

Finally, there is the initialization phase, where the class initialization code is actually executed, including the assignments to static fields and the logic inside the static initialization blocks of the class definition. The compiler gathers this logic together at compile time. The initialization logic of the parent type runs before that of the current type.
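The ordering rule for parent and child types is easy to check with a small experiment. The sketch below uses hypothetical classes (InitLog, Base, Derived) that record when their static initialization blocks run.

```java
import java.util.ArrayList;
import java.util.List;

// Shared log kept in a separate class so that reading it does not
// initialize Base or Derived.
class InitLog {
    static final List<String> EVENTS = new ArrayList<>();
}

class Base {
    static {
        InitLog.EVENTS.add("Base static init");
    }
}

class Derived extends Base {
    static int value = 42;

    static {
        InitLog.EVENTS.add("Derived static init");
    }
}

class InitOrderDemo {
    public static void main(String[] args) {
        // Reading a non-constant static field triggers initialization of Derived,
        // which first initializes its superclass Base.
        System.out.println(Derived.value);   // 42
        System.out.println(InitLog.EVENTS);  // [Base static init, Derived static init]
    }
}
```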

Now let’s talk about the delegation model known as “parent delegation”. In short, when a class loader attempts to load a type, it first delegates the task to its parent loader, and it only tries to load the type itself if the parent cannot find it. The purpose of the delegation model is to avoid loading the same Java type more than once.
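The lookup order can be written out as code. The sketch below is a simplified, hypothetical loader (ParentFirstLoader) that mirrors the parent-first order that java.lang.ClassLoader.loadClass follows by default; it has no class source of its own, so its findClass only records the request and fails. The helper tryLoad and the field selfAttempts are illustrative additions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical loader that spells out the parent-first lookup order.
class ParentFirstLoader extends ClassLoader {
    // Names this loader had to look up itself because no parent found them.
    final List<String> selfAttempts = new ArrayList<>();

    ParentFirstLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);                 // 1. already defined by this loader?
            if (c == null) {
                try {
                    ClassLoader parent = getParent();
                    c = (parent != null)
                            ? parent.loadClass(name)            // 2. delegate to the parent first
                            : Class.forName(name, false, null); // a null parent means bootstrap
                } catch (ClassNotFoundException e) {
                    c = findClass(name);                        // 3. fall back to our own lookup
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        selfAttempts.add(name);
        throw new ClassNotFoundException(name); // this sketch has no class source of its own
    }

    // Convenience wrapper: returns null instead of throwing when nothing is found.
    Class<?> tryLoad(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ParentFirstLoader loader = new ParentFirstLoader(ClassLoader.getSystemClassLoader());
        System.out.println(loader.tryLoad("java.lang.String") == String.class); // true
        System.out.println(loader.selfAttempts);                                // []
        System.out.println(loader.tryLoad("no.such.Type"));                     // null
        System.out.println(loader.selfAttempts);                                // [no.such.Type]
    }
}
```

The request for java.lang.String never reaches findClass because a parent already supplies the type, which is exactly how delegation prevents duplicate definitions of core classes.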

Analysis of Key Points

Today’s question is about the basic concept of JVM class loading. The answer I provided earlier referred to the main clauses in the Java Virtual Machine Specification. If you encounter this question in an interview, you can also give examples based on this answer.

Let’s look at a classic follow-up question: since the preparation stage deals with static variables, how does it treat constants differently from ordinary static variables?

It should be noted that no one can understand and remember all information accurately. If you encounter such a question and you have a direct answer, that would be best. If not, you can explain your own thoughts.

We define a type like this, which provides ordinary static variables, static constants, and considers the possibility of differences between primitive types and reference types:

public class CLPreparation {
  public static int a = 100;
  public static final int INT_CONSTANT = 1000;
  public static final Integer INTEGER_CONSTANT = Integer.valueOf(10000);
}

Compile it, then disassemble the class file:

javac CLPreparation.java
javap -v CLPreparation.class

You can see the following additional initialization logic in the bytecode:

       0: bipush      100
       2: putstatic   #2                  // Field a:I
       5: sipush      10000
       8: invokestatic  #3                // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      11: putstatic   #4                  // Field INTEGER_CONSTANT:Ljava/lang/Integer;

This allows us to see more clearly that ordinary primitive type static variables and reference types (even constants) require additional JVM instructions such as putstatic. These instructions are executed during the explicit initialization stage, not during the preparation stage. On the other hand, primitive type constants do not require such steps.
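A related effect can be observed at run time: because a primitive compile-time constant is inlined by javac at the use site, reading it does not trigger initialization of the class that declares it, while reading an ordinary static field does. The following is a minimal sketch with hypothetical class names (InitFlag, ConstantHolder), assuming the classes are compiled together with javac.

```java
// Flag kept in a separate class so that checking it does not touch ConstantHolder.
class InitFlag {
    static boolean holderInitialized = false;
}

class ConstantHolder {
    static int a = 100;                     // ordinary static variable
    static final int INT_CONSTANT = 1000;   // compile-time constant

    static {
        InitFlag.holderInitialized = true;  // runs only when the class is initialized
    }
}

class ConstantDemo {
    public static void main(String[] args) {
        int c = ConstantHolder.INT_CONSTANT;             // inlined as the literal 1000
        System.out.println(InitFlag.holderInitialized);  // false

        int v = ConstantHolder.a;                        // getstatic: triggers initialization
        System.out.println(InitFlag.holderInitialized);  // true
    }
}
```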

There are many excellent materials that introduce more details about the class loading process. You can refer to the famous book “Understanding the Java Virtual Machine”, which is a very good introductory book. My suggestion is not only to read tutorials, but also to come up with code examples to verify your understanding and judgment of a certain aspect. This not only deepens your understanding, but also allows you to use it in future application development.

In fact, the scope of class loading mechanisms is very large. I have selected two typical follow-up questions for your reference, from the different perspectives of development and deployment:

  • To truly understand the delegation model, you need to understand the architecture and responsibilities of class loaders in Java. At a minimum, you should know which built-in class loaders exist (which I did not mention in my answer above) and how to write a custom class loader.
  • From an application perspective, how do you solve practical class loading problems? For example, if my Java program starts slowly, is there a way to minimize the overhead of class loading?

In addition, it should be noted that in Java 9, the Jigsaw project provides native modular support for Java, and the built-in class loader structure and mechanism have undergone significant changes. I will explain this to avoid possible issues that may occur in future upgrades.

Knowledge Expansion

First of all, let’s look at the structure of the class loaders in Java 8 and earlier from an architectural perspective. The following are the three built-in class loaders in Oracle JDK.

  • Bootstrap Class Loader: It loads the jar files under jre/lib, such as rt.jar. It is a super citizen: even when the Security Manager is enabled, the JDK still grants AllPermission to the code it loads.

For engineers who are engaged in low-level development, sometimes it may be necessary to attempt to modify the underlying code of the JDK, which is commonly referred to as the core class library. We can use the following command line parameters.

# Specify a new bootclasspath to replace the internal implementation of java.* packages
java -Xbootclasspath:<your_boot_classpath> your_App

# 'a' means append, adds the specified directory to the bootclasspath at the end
java -Xbootclasspath/a:<your_dir> your_App

# 'p' means prepend, adds the specified directory to the bootclasspath at the beginning
java -Xbootclasspath/p:<your_dir> your_App

The usage is quite straightforward. For example, using the most common /p option, since it is a prefix, there is an opportunity to replace the implementation of individual base classes.

We can usually use the following method to get the parent class loader, but in the typical JDK/JRE implementation, the getParent() method of the Extension Class Loader can only return null, because its parent is the Bootstrap Class Loader, which is not represented by a Java object.

public final ClassLoader getParent()

  • Extension Class Loader: It is responsible for loading the jar packages placed in the jre/lib/ext/ directory, which is the so-called extension mechanism. This directory can also be overridden by setting the system property java.ext.dirs.

java -Djava.ext.dirs=your_ext_dir HelloWorld

  • Application Class Loader: It loads the contents of the classpath that we are most familiar with. There is a confusing concept here, the System Class Loader. Usually, it defaults to the built-in application class loader in the JDK. However, it can also be modified, such as:

java -Djava.system.class.loader=com.yourcorp.YourClassLoader HelloWorld

If we specify this parameter, the built-in application class loader of the JDK will become the parent of the custom loader. This approach is often used in scenarios that require changing the delegation model of class loaders.
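For the java.system.class.loader property to work, the named class has to offer a public constructor that takes a single ClassLoader argument; the JVM passes the built-in application class loader as that argument. The sketch below shows the shape of such a class. TracingSystemLoader, its requested list, and tryLoad are hypothetical names, and for real use with the property the class would need to be declared public in its own file; it is left package-private here so the sketch compiles in any file.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical replacement system class loader that records every request
// and otherwise keeps the default parent-first behavior.
class TracingSystemLoader extends ClassLoader {
    final List<String> requested = new CopyOnWriteArrayList<>();

    // The JVM calls this constructor with the built-in application class loader,
    // which therefore becomes the parent of this loader.
    public TracingSystemLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        requested.add(name);
        return super.loadClass(name, resolve); // default delegation to the parent
    }

    // Convenience wrapper for the demo: returns null when the class is not found.
    Class<?> tryLoad(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Simulates what the JVM does at startup when the property is set.
        TracingSystemLoader loader = new TracingSystemLoader(ClassLoader.getSystemClassLoader());
        System.out.println(loader.tryLoad("java.lang.String") == String.class); // true
        System.out.println(loader.requested);                                   // [java.lang.String]
    }
}
```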

Please refer to the following diagram for more details:

Class Loaders Diagram

As for the previously mentioned delegation model, it is easier to understand with this diagram. Imagine if different class loaders all load a certain type they need on their own, multiple redundant loading would occur, which would be a waste.
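The hierarchy can also be inspected directly by walking getParent() from the system class loader upwards. This is a small sketch with a hypothetical helper class (LoaderChain); the concrete loader class names it prints depend on the JDK version and vendor.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Returns the delegation chain starting at `start`. The bootstrap loader is
    // represented by null in the API, so it is recorded here as "bootstrap".
    static List<String> chainOf(ClassLoader start) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap");
        return chain;
    }

    public static void main(String[] args) {
        // Application loader first, bootstrap last.
        System.out.println(chainOf(ClassLoader.getSystemClassLoader()));
        // Core classes are loaded by the bootstrap loader, reported as null.
        System.out.println(String.class.getClassLoader()); // null
    }
}
```

On Oracle JDK 8 the chain typically reads sun.misc.Launcher$AppClassLoader, sun.misc.Launcher$ExtClassLoader, bootstrap.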

Typically, class loading mechanisms have three basic features:

  • Delegation model: Not all class loaders follow this model. Sometimes, the types loaded by the bootstrap class loader may need to load user code, such as the ServiceProvider/ServiceLoader mechanism inside the JDK. Users can provide their own implementation on top of the standard API framework, and the JDK also needs to provide some default reference implementations. For example, many aspects of JNDI, JDBC, file system, Cipher, etc., in Java utilize this mechanism. In these cases, the delegation model is not used for loading but rather a context loader is used.

  • Visibility: Child class loaders can access types loaded by parent loaders, but not vice versa. Otherwise, due to the lack of necessary isolation, we will not be able to use class loaders to implement container logic.

  • Uniqueness: Since types loaded by parent loaders are visible to child loaders, the same type will not be loaded again by the child loaders. However, it is important to note that sibling class loaders can still load the same type multiple times, because they are not visible to each other.
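The visibility and uniqueness rules can be demonstrated with two sibling loaders that each define the same class from the same bytes. The sketch below is illustrative only: Greeter, SiblingLoader, and SiblingDemo are hypothetical names, and it assumes the compiled Greeter.class is reachable as a resource through the parent loader (for example, a normal classpath directory or jar).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class Greeter { }  // hypothetical class that will be defined more than once

// Defines `target` itself instead of delegating; everything else goes to the parent.
class SiblingLoader extends ClassLoader {
    private final String target;

    SiblingLoader(String target, ClassLoader parent) {
        super(parent);
        this.target = target;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.equals(target)) {
            return super.loadClass(name, resolve);   // normal parent-first delegation
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                byte[] bytes = readClassBytes(name);
                c = defineClass(name, bytes, 0, bytes.length);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    private byte[] readClassBytes(String name) throws ClassNotFoundException {
        String resource = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(resource)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

class SiblingDemo {
    // Loads `name` through a fresh sibling loader; wraps the checked exception.
    static Class<?> loadInNewSibling(String name) {
        try {
            return new SiblingLoader(Greeter.class.getName(), Greeter.class.getClassLoader())
                    .loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = Greeter.class.getName();
        Class<?> a = loadInNewSibling(name);
        Class<?> b = loadInNewSibling(name);
        System.out.println(a == b);                           // false: siblings are isolated
        System.out.println(a.getName().equals(b.getName()));  // true: same name, distinct types
        System.out.println(loadInNewSibling("java.lang.String") == String.class); // true
    }
}
```

The last line shows the other half of the rule: a type supplied by a common parent is shared, so only types defined by the siblings themselves are duplicated.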

In JDK 9, due to the introduction of the Java Platform Module System (JPMS) by the Jigsaw project, the source code of Java SE has been divided into a series of modules.

JPMS Modules Diagram

Class loaders, class file containers, etc. have undergone significant changes. Here is a summary:

  • The previously mentioned -Xbootclasspath and -Xbootclasspath/p options are no longer available (only the append form, -Xbootclasspath/a, is still accepted). The API has been divided into specific modules, so replacing the code of a Java core type, which used to be done with -Xbootclasspath/p, now means patching the corresponding module:

java --patch-module <module-name>=<path-to-patched-classes> your_App

For example, to replace java.lang.String in the java.base module, first confirm that the class files you want to modify have been compiled and stored in the layout of the corresponding module, then patch the module:

java --patch-module java.base=your_patch yourApp

When the JVM then looks up java.lang.String, it finds the patched version ahead of the original one in java.base.

  • The Extension Class Loader has been renamed to the Platform Class Loader, and the extension mechanism has been removed. This means that if we set the java.ext.dirs system property, or if the lib/ext directory exists, the JVM will report an error! The recommended solution is to put those jars on the classpath instead.
  • Some Java base modules that do not require AllPermission have been downgraded to the Platform Class Loader, and their corresponding permissions have been more finely restricted.
  • rt.jar and tools.jar have also been removed! The core Java class library and related resources are stored in jimage files and accessed through the new JRT file system, instead of the original JAR file system. Although this may seem surprising, the compatibility impact on most software is actually limited. The most direct impact is on IDEs and other software, which usually only require upgrading to a new version.
  • The abstraction of Layer has been added. The JVM now starts with the BootLayer by default, and developers can define and instantiate their own layers, making it easier to implement logic abstractions similar to containers.

With the addition of Layer, the current internal structure of the JVM is layered as follows: the built-in class loaders are all in the BootLayer, and other layers have their own custom class loaders. Different versions of modules can work in different layers at the same time.
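On JDK 9 or later, the boot layer and the built-in loaders can be inspected through the standard ModuleLayer, Module, and ClassLoader APIs. The following is a small sketch (the class name LayerDemo is hypothetical); it only reads information and does not define a new layer.

```java
// Requires JDK 9 or later.
class LayerDemo {
    public static void main(String[] args) {
        ModuleLayer boot = ModuleLayer.boot();
        Module base = Object.class.getModule();

        System.out.println(base.getName());           // java.base
        System.out.println(base.getLayer() == boot);  // true: java.base lives in the boot layer

        // The platform loader replaces the old extension loader; its parent is
        // the bootstrap loader, which the API reports as null.
        ClassLoader platform = ClassLoader.getPlatformClassLoader();
        System.out.println(platform.getParent());     // null

        System.out.println(boot.modules().size() + " modules in the boot layer");
    }
}
```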

JVM Layer

When it comes to class loaders, one topic that cannot be avoided is custom class loaders. Common scenarios include:

  • Implementing something like in-process isolation, where class loaders act as separate namespaces to provide a container-like, modular effect. For example, two modules may depend on different versions of the same library; if they are loaded by different containers, they can coexist without interfering with each other. Java EE, OSGi, and JPMS are the leading examples of this approach.
  • Applications need to retrieve class definition information from different data sources, such as network data sources rather than the local file system.
  • Or there is a need to manipulate bytecode and dynamically modify or generate types.

We can generally understand the process of custom class loading as follows:

  • By specifying the name, find its binary implementation. This is often the part that custom class loaders will “customize”, such as getting bytecode based on name from a specific data source, or modifying or generating bytecode.
  • Then create a Class object and complete the class loading process. The conversion from binary information to a Class object usually depends on defineClass, which we don’t need to implement ourselves because it is a final method. With the Class object, the subsequent loading process proceeds smoothly.
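The two steps above can be sketched as a loader that overrides only findClass. This is an illustrative example rather than the implementation the text links to: InMemoryClassLoader, register, and canLoad are hypothetical names, and the "data source" is simply an in-memory map of class bytes that the caller fills in.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical loader whose data source is an in-memory map of class bytes.
class InMemoryClassLoader extends ClassLoader {
    private final Map<String, byte[]> classBytes = new ConcurrentHashMap<>();

    InMemoryClassLoader(ClassLoader parent) {
        super(parent);
    }

    // Makes the bytecode for `binaryName` (for example "com.example.Foo") available.
    void register(String binaryName, byte[] bytes) {
        classBytes.put(binaryName, bytes.clone());
    }

    // Step 1: find the binary form by name. This is the customized part.
    // Step 2: hand the bytes to defineClass, which builds the Class object.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = classBytes.get(name);
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, bytes, 0, bytes.length);
    }

    // Convenience check for the demo: true if `name` can be loaded via `loader`.
    static boolean canLoad(ClassLoader loader, String name) {
        try {
            loader.loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryClassLoader loader = new InMemoryClassLoader(ClassLoader.getSystemClassLoader());
        System.out.println(canLoad(loader, "java.lang.String"));   // true, via the parent
        System.out.println(canLoad(loader, "demo.NotRegistered")); // false, nothing registered
    }
}
```

Because loadClass is inherited unchanged, this loader still follows parent delegation; findClass is consulted only for names that no parent can supply.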

I recommend referring to this example for specific implementations.

In Section 1 of the series, I mentioned that because bytecode is a platform-independent abstraction, not machine code, Java needs class loading, interpretation, and compilation, all of which contribute to slower startup. After talking so much about class loading, is there a general way to reduce the overhead of class loading without writing code or incurring additional workload?

Yes, there is.

  • The Ahead-of-Time Compilation (AOT), mentioned in Section 1, is equivalent to directly compiling to machine code, mainly reducing the overhead of interpretation and compilation. But it is currently an experimental feature with limited platform support. For example, JDK 9 only supports Linux x64. Therefore, its limitations are too great to discuss for now.
  • There is also the lesser-known Application Class-Data Sharing (AppCDS). CDS was introduced in Java 5 but was limited to the Bootstrap Class Loader. In 8u40, AppCDS was implemented to support other class loaders. It has been open-sourced in JDK 10 released in early 2018.

In simple terms, the basic principle and working process of AppCDS are as follows:

First, the JVM loads class information, resolves it into metadata, and categorizes it into Read-Only and Read-Write parts based on whether modification is required. Then, these metadata are directly stored in the file system as a so-called Shared Archive. The command is straightforward:

java -Xshare:dump -XX:+UseAppCDS -XX:SharedArchiveFile=<jsa> \
     -XX:SharedClassListFile=<classlist> -XX:SharedArchiveConfigFile=<config_file>

Second, when the application starts, specify the archive file and enable AppCDS.

java -Xshare:on -XX:+UseAppCDS -XX:SharedArchiveFile=<jsa> yourApp

With these commands, the JVM directly maps the archive file to the corresponding address space using memory-mapping techniques, bypassing various overheads such as class loading and resolution.

AppCDS significantly improves startup speed. Traditional Java EE applications typically start 20% to 30% faster, or more; in an experiment running the Spark KMeans workload with 20 slaves, startup speed improved by 11%.

At the same time, it reduces memory footprint, as Java processes in the same environment can share some data structures. In the two experiments mentioned above, memory consumption fell by more than 10% on average.

Of course, there are limitations. If an application loads a large number of classes dynamically at runtime, AppCDS is of limited use.
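Before tuning, it can help to get a rough sense of how many classes an application actually loads. The standard java.lang.management API exposes class loading counters; the sketch below (ClassLoadStats is a hypothetical name) simply prints them. It is a coarse indicator only and says nothing about how much time loading took.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

class ClassLoadStats {
    // Total number of classes loaded since the JVM started.
    static long totalLoaded() {
        return ManagementFactory.getClassLoadingMXBean().getTotalLoadedClassCount();
    }

    public static void main(String[] args) {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        System.out.println("currently loaded: " + bean.getLoadedClassCount());
        System.out.println("total loaded:     " + bean.getTotalLoadedClassCount());
        System.out.println("unloaded:         " + bean.getUnloadedClassCount());
    }
}
```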

Today, I have summarized the class loading process and the changes in the class loading mechanism in the newer versions of Java. Finally, I introduced a feature that improves class loading speed, hoping it will be helpful to you.

Practice Exercise

Do you have a clear understanding of the topic we discussed today? Today’s reflection question is: What is the Jar Hell problem? Have you encountered a similar situation before, and if so, how did you solve it?

Please share your thoughts on this question in the comments section. I will select the most thoughtful comments and send you a learning reward voucher. I welcome your participation in the discussion.
