07 Java Memory Model: The Sea Doesn't Reject Water, Thus It Can Reach Its Depth #

Students familiar with computer history will know that the earliest computers had no memory in the modern sense, and they were unbearably slow. It wasn't until John von Neumann proposed an ingenious design that this problem was solved. Yes, that design added memory, which is why modern electronic computers are also called "von Neumann machines".

JVM is a complete computer model, so naturally it needs a corresponding memory model, which is called the “Java Memory Model” in English, abbreviated as JMM.

The Java Memory Model specifies how the JVM should use computer memory (RAM). Broadly speaking, the Java Memory Model can be divided into two parts:

  • JVM memory structure
  • JMM and thread specifications

Among them, the JVM memory structure is the underlying implementation and the foundation of our understanding of JMM. The well-known division of runtime data areas such as heap memory and stack memory can be attributed to the JVM memory structure.

Just as many books on the JVM open by explaining how to compile the JVM itself, introductions to the JMM often jump straight into the synchronization mechanics of CPU caches and registers. That looks high-end and mysterious, but it is hard to follow.

So in this lesson we start from the basics, avoiding overly low-level technical jargon, and first learn the basic JVM memory structure. Once these fundamentals are in place, we can move on to the JMM and thread-related topics.

7.1 JVM Memory Structure #

Let’s first take a look at the overall concept of JVM memory:

The Java memory model used internally by the JVM logically divides the memory into two parts: thread stack and heap memory. As shown in the following figure:

[Figure: JVM memory, logically divided into thread stacks and heap memory]

In the JVM, every running thread has its own thread stack. The thread stack holds the state of all methods on the thread's current call chain.

That is why the thread stack is also called the "method stack" or "call stack". The information on the call stack changes continuously as the thread executes code.
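This call chain can be inspected directly from Java code. The following minimal sketch (the class and method names are ours) uses Thread.currentThread().getStackTrace(), which returns one StackTraceElement per frame currently on this thread's stack, innermost frame first:

```java
import java.util.ArrayList;
import java.util.List;

public class CallStackDemo {
    static List<String> a() { return b(); }   // a calls b...
    static List<String> b() { return c(); }   // ...b calls c

    static List<String> c() {
        // One StackTraceElement per stack frame, top of the stack first.
        List<String> names = new ArrayList<>();
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            names.add(frame.getMethodName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Typically prints something like [getStackTrace, c, b, a, main].
        System.out.println(a());
    }
}
```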

The thread stack stores the local variables of all methods currently being executed on the call chain.

  • Each thread can only access its own thread stack.
  • Each thread cannot access (see) the local variables of other threads.

Even if two threads are executing exactly the same code, each thread will create a copy of the local variables declared in the corresponding code on its own thread stack. So each thread has its own copy of the local variables.

  • All local variables of primitive types are stored on the thread stack and are therefore not visible to other threads.
  • A thread can pass a copy of a value of a primitive variable to another thread, but it cannot share the original primitive local variable itself.
  • The heap memory contains all objects created in Java code, regardless of which thread created them. This also includes wrapper types (such as Byte, Integer, Long, etc.).
  • Whether an object is created and assigned to a local variable or assigned to a member variable of another object, the created object will be stored in the heap memory.

The following diagram illustrates the call stack and local variables on the thread stack, as well as the objects stored in the heap memory:

[Figure: call stacks and local variables on the thread stacks, with objects stored in the heap]

  • If it is a local variable of a primitive data type, then its contents are all kept on the thread stack.
  • If it is an object reference, then the local variable slot in the stack contains the reference address of the object, and the actual object content is stored in the heap.
  • Object member variables are stored in the heap along with the object itself, whether the member variable is of primitive numeric type or object reference type.
  • Static variables of a class are stored in the heap along with the class definition.

Summary: primitive local values and object reference addresses are stored on the stack; objects, object members, class definitions, and static variables are stored in the heap.
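A hedged sketch of how this summary maps onto code (the class and variable names are ours; the storage locations in the comments follow the rules above):

```java
public class StackVsHeapDemo {
    static int counter = 42;   // static variable: lives in the heap with the class definition
    int member;                // member variable: lives in the heap, inside the object

    public static void main(String[] args) {
        int primitive = 10;    // primitive local: its value sits in this thread's stack frame
        // 'obj' (the reference address) is on the stack; the object itself is on the heap.
        StackVsHeapDemo obj = new StackVsHeapDemo();
        obj.member = primitive; // copies the value from the stack into the heap object
        System.out.println(obj.member + counter); // 52
    }
}
```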

The heap memory is also known as the "shared heap": any object in it can be accessed by any thread, as long as that thread holds the object's reference address.

  • If a thread can access an object, it can also access the object’s member variables.
  • If two threads simultaneously invoke the same method of an object, they can both access the object’s member variables, but each thread has its own independent local variable copy.
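The two bullets above can be demonstrated with a short sketch (names are ours): two threads invoke the same method on the same object, so the member variable hits is a single shared copy on the heap, while the local variable local exists independently in each thread's stack frame. synchronized is used here so the final count is deterministic:

```java
public class SharedObjectDemo {
    private int hits; // member variable: a single copy on the heap, shared by all threads

    synchronized void visit() {
        int local = hits;  // 'local' is private to the calling thread's stack frame
        hits = local + 1;  // writes back to the shared heap variable
    }

    int hits() { return hits; }

    public static void main(String[] args) throws InterruptedException {
        SharedObjectDemo shared = new SharedObjectDemo();
        Runnable task = () -> { for (int i = 0; i < 1000; i++) shared.visit(); };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(shared.hits()); // 2000: both threads updated the same object
    }
}
```

Without synchronized, the read-then-write inside visit() could interleave between the two threads and the final count could be less than 2000, which is exactly the kind of conflict the JMM sections below describe.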

Here is a diagram to illustrate:

[Figure: two threads sharing an object on the heap while keeping separate local variables]

In summary, although each thread keeps its own local variables on its own stack, threads can share objects on the heap. Note, however, that when different threads access a primitive-type member variable of the same object instance, each thread may temporarily work with its own copy of that variable (for example in a CPU cache or register), and this is precisely the visibility problem the JMM has to address.

7.2 Structure of Stack Memory #

Based on the above content and understanding of JVM memory allocation, here are some logical diagrams for your reference.

Let’s first take a look at the overall structure of the stack memory (Stack):

[Figure: overall structure of the stack memory (Stack)]

When a thread is started, JVM allocates a corresponding thread stack in the stack space, such as 1MB of space (-Xss1m).

The thread stack is also called the Java method stack. If JNI methods are used, a separate native method stack will be allocated.

During the thread execution process, there are usually multiple methods forming a call stack (Stack Trace), such as A calls B, B calls C… Each time a method is executed, a corresponding stack frame (Frame) is created.

[Figure: stack frames created for each method on the call stack]

The stack frame is a logical concept, and its size can basically be determined as soon as the method has been compiled. For example, there must be room for the return value, each local variable needs its own slot, there is an operand stack for the instructions to work with, and a class pointer identifying which class's method this stack frame belongs to (pointing to the class metadata in the non-heap area).
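One way to make stack frames tangible is to exhaust them. In the sketch below (names are ours), every recursive call pushes one more frame onto the thread stack until it overflows; if you run it with a smaller stack, e.g. java -Xss256k StackDepthDemo, it should overflow after noticeably fewer frames:

```java
public class StackDepthDemo {
    static int depth = 0;

    // Every call pushes a new stack frame (locals + operand stack + bookkeeping).
    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The thread stack (e.g. 1 MB with -Xss1m) has filled up with frames.
            System.out.println("Stack overflowed after " + depth + " frames");
        }
    }
}
```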

7.3 Structure of Heap Memory #

Besides the stack memory, the most important memory area in a Java program is the heap memory.

[Figure: overall structure of the heap memory (Heap)]

The heap memory is a shared memory space for all threads, and in theory, everyone can access its contents.

However, JVM implementations usually apply various optimizations. For example, the JVM logically divides its memory into two parts: Heap and Non-Heap. This division reflects the fact that the Java code we write can only allocate in the Heap part, and memory allocation and collection mainly happen there. For this reason the Heap is also called the GC-managed heap (GC Heap).

In GC theory there is an important concept called generational collection. Research has shown that the objects a program allocates are, in most cases, either used briefly and then discarded, or else survive for a long time (this observation is known as the generational hypothesis).

Therefore, JVM divides the Heap memory into two parts: young generation and old generation (also called Tenured).

The young generation is further divided into three memory pools: an Eden space and two survivor spaces. In most GC algorithms the two survivor spaces (S0 and S1) are used alternately, so at any given time one of them is always empty; they are generally small and do not waste much space.

There are also allocation optimizations within the young generation, such as the TLAB (Thread-Local Allocation Buffer), which gives each thread a small private area in which to allocate objects; when it fills up, the thread switches to a new one. This greatly reduces the locking overhead of concurrent allocation.

Non-Heap, strictly speaking, lies outside the GC-managed heap and is generally not collected by the GC. It is divided into three memory pools.

  • Metaspace (the replacement for the former Permanent Generation): starting with Java 8, the contents of the method area were moved into Metaspace.
  • CCS (Compressed Class Space): used to store class information; it overlaps with Metaspace.
  • Code Cache: used to store the native machine code produced by the JIT compiler.
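All of these pools can be listed at runtime through the standard java.lang.management API. The pool names in the comment below are typical examples only; they vary with the JVM version and the collector in use:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MemoryPoolDemo {
    public static void main(String[] args) {
        // Typical names: "G1 Eden Space", "G1 Survivor Space", "G1 Old Gen" (type HEAP)
        // and "Metaspace", "Compressed Class Space", "CodeHeap ..." (type NON_HEAP).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-30s type=%-8s used=%d KB%n",
                    pool.getName(), pool.getType(), pool.getUsage().getUsed() / 1024);
        }
    }
}
```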

This is the rough structure of JVM’s memory. With a grasp of these basic knowledge, let’s take a look at JMM.

7.4 CPU Instructions #

As we know, computers can be roughly divided into two categories based on the supported instructions:

  • Reduced Instruction Set Computer (RISC), represented by the ARM chip that is widely known today, has low power consumption but relatively weak computing power.
  • Complex Instruction Set Computer (CISC), represented by Intel’s X86 chip series, such as Pentium, Core, Xeon, and AMD’s CPU. They are characterized by strong performance but high power consumption. (In fact, starting from the Pentium 4 architecture, it is a complex instruction set externally, but internally implemented as a reduced instruction set, which is why the frequency can be greatly increased.)

People who have written programs know that the same calculation can be implemented in different ways. Hardware instruction design is also the same. For example, if our system needs to implement a certain functionality, a more complex approach is to encapsulate a logical arithmetic unit in the CPU to perform this operation and expose a dedicated instruction externally.

Of course, you can also take shortcuts and not implement this instruction, but let the program compiler figure out how to simulate and assemble this functionality using the existing basic, general instructions. As time goes on, the CPU instruction set that implements dedicated instructions will become more and more complex and is called a complex instruction set. On the other hand, the CPU instruction set that takes shortcuts will be relatively smaller, and even many instructions have been cut, so it is called a reduced instruction set computer.

Regardless of the instruction set, CPUs are implemented with pipelining. If a CPU executed instructions strictly one at a time, much of the pipeline would sit idle; roughly speaking, you can picture a KFC order-pickup window as a pipeline. So hardware designers came up with a clever solution: "instruction reordering". Through internal scheduling, the CPU may rearrange and execute instructions as it sees fit, making full use of the pipeline; as long as the final result is equivalent, the program's correctness is unaffected. In today's era of multi-core CPUs, however, this added complexity means that concurrently executing programs face many problems.

[Figure: multiple CPU cores and multiple JVM threads executing concurrently]

The CPU executes with multiple cores, and there are also multiple threads in the JVM executing concurrently. This many-to-many situation makes the situation extremely complex. If it is not controlled properly, the program’s execution result may be incorrect.

7.5 Background of JMM #

The current JMM specification corresponds to "JSR-133: Java Memory Model and Thread Specification". After some refinement, its content became §17.4, the "Memory Model" chapter, of the Java Language Specification (https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.4). The final version of JSR-133 dates back to 2004: the earlier Java memory model had a number of pitfalls, so it was redesigned for Java 1.5 and has been in use ever since.

The JMM specification clearly defines how different threads can see the values saved in shared variables through certain means and at what time; and how to synchronize access to shared variables when necessary. The advantage of this is that it hides the differences in memory access between different hardware platforms and operating systems, achieving true cross-platform concurrency in Java programs.

With the large-scale application of Java in the web field, multi-threaded programming has become increasingly popular in order to fully utilize the computing power of multi-core processors. At this time, many problems related to thread safety arise. In order to truly master concurrent program design, it is necessary to understand the Java memory model. It can be said that the knowledge we have learned about “heap memory”, “stack memory”, and other concepts in the JVM memory structure, as well as the terms related to synchronization, locks, and threads in Java are closely related to JMM.

7.6 Introduction to JMM #

The JVM supports the execution of multi-threaded programs, and each thread is a Thread. If no explicit synchronization measures are taken, strange problems may occur when multiple threads access the same shared variable. For example, thread A reads the variable a = 10 and intends to subtract 2 as long as the value is greater than 9. Before A performs the subtraction, thread B sets a = 8. At this point the condition for A's operation no longer holds, but A is unaware of that and still executes a - 2, so a ends up as 6, when in fact the correct final value should be 8. The lack of a synchronization mechanism in a multi-threaded environment produces an incorrect result.
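The unlucky interleaving described above can be reproduced deterministically for demonstration. In this sketch (names are ours), two CountDownLatches serve only to force thread B's write to land exactly between thread A's check and A's subtraction; in real programs the same interleaving simply happens by chance:

```java
import java.util.concurrent.CountDownLatch;

public class LostConditionDemo {
    static volatile int a = 10;

    public static void main(String[] args) throws InterruptedException {
        a = 10; // reset so the demo is repeatable
        CountDownLatch checked = new CountDownLatch(1);
        CountDownLatch written = new CountDownLatch(1);

        Thread threadA = new Thread(() -> {
            if (a > 9) {                 // A checks the condition while a == 10
                checked.countDown();
                try { written.await(); } catch (InterruptedException ignored) { }
                a = a - 2;               // by now B has set a = 8, so this writes 6
            }
        });
        Thread threadB = new Thread(() -> {
            try { checked.await(); } catch (InterruptedException ignored) { }
            a = 8;                       // B writes between A's check and A's update
            written.countDown();
        });

        threadA.start(); threadB.start();
        threadA.join(); threadB.join();
        System.out.println(a); // 6, although the intended rule says it should stay 8
    }
}
```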

Therefore, the JMM defines the semantics of multi-threaded execution; that is, it defines which behaviors are allowed.

Next, we will briefly introduce what is in the JMM specification.

Given a program and a sequence of its execution traces, the memory model describes whether that execution trace is a legal execution of the program. For Java, the memory model examines each read in the execution trace and checks, according to specific rules, whether the write observed by that read is legal.

The memory model describes the possible behaviors of a program. A JVM implementation is free to generate whatever code it likes, as long as every result the program can ultimately produce is one that the memory model predicts. This gives implementers ample freedom for a large number of code transformations, including the reordering of actions and the removal of unnecessary synchronization.

A high-level, informal description of the memory model is that it is a set of rules specifying when a write in one thread becomes visible to another thread. Informally, a read r can usually see the value written by any write w such that w does not occur after r and w does not appear to r to have been overwritten by another write w'.

JMM defines some terms and rules that everyone should have a basic understanding of.

  • Memory that can be shared and used by multiple threads is called “shared memory” or “heap memory”.
  • All objects (including internal instance variables), static variables, and arrays must be stored in heap memory.
  • Local variables, method parameters, and exception handling statement parameters are not allowed to be shared between threads, so they are not affected by the memory model.
  • When multiple threads access the same variable at the same time and at least one of them performs a write, this is called a "conflict".
  • Operations that can affect or be perceived by other threads are called inter-thread actions, including reads, writes, synchronization actions, external actions, and so on. Synchronization actions include reads and writes of volatile variables, locking and unlocking of monitors, and thread start and termination operations. External actions are operations outside the thread's execution environment, such as stopping another thread.

JMM specifies inter-thread operations, but does not consider operations on local variables within a thread.

Interested students can refer to the Chinese version of JSR133 translated by ifeve: JSR133 Chinese Version.pdf

Introduction to Memory Barriers #

Earlier, we mentioned that the CPU may reorder operations at the appropriate time according to its needs. However, sometimes this reordering can lead to code that doesn’t behave as expected.

So what can we do? JMM introduces the concept of memory barriers.

Memory barriers can be divided into read barriers and write barriers, which are used to control visibility. Common memory barriers include:

  • #LoadLoad
  • #StoreStore
  • #LoadStore
  • #StoreLoad

The main purpose of these barriers is to temporarily block the CPU’s instruction reordering function. According to the agreement with the CPU, when encountering these instructions, the corresponding operations before and after the barrier must be guaranteed not to be disrupted.

  • For example, when encountering #LoadLoad, the Load instruction before the barrier must be executed first before executing the Load instruction after the barrier.
  • For example, if I want to write the value of a to the A field first, and then write the value of b to the memory address corresponding to the B field, if I want to strictly guarantee this order, I can insert a #StoreStore barrier between these two Store instructions.
  • When encountering a #LoadStore barrier, the CPU guarantees that load instructions before the barrier complete before any store instruction after the barrier is executed.
  • The #StoreLoad barrier ensures that all store operations executed before the barrier are visible to other processors, and the load instructions executed after the barrier can obtain the latest values. In other words, it effectively prevents the store instructions before the barrier from being reordered with the load instructions after the barrier. Even on multi-core processors, the order of these operations is consistent when executed.

The most expensive barrier is the #StoreLoad barrier, which has the effects of the other three memory barriers combined and can be used as a replacement for the other three.

How to understand it?

It means that when a CPU core receives this type of instruction, it performs the required bookkeeping and broadcasts a signal marking the affected memory address. When other CPU cores next touch the corresponding line in their caches, they will know it is stale and must be reloaded from main memory.
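In everyday Java we do not emit these barriers by hand; a volatile field implies them. In the classic publication idiom sketched below (names are ours), the volatile write to ready keeps the earlier plain write to data from being reordered after it (a #StoreStore-style effect), and the volatile read keeps the later read of data from moving before it (#LoadLoad-style), so a reader that sees ready == true is guaranteed to see data == 42:

```java
public class SafePublicationDemo {
    static int data = 0;                   // plain field
    static volatile boolean ready = false; // volatile: implies the needed barriers

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;     // ordinary store
            ready = true;  // volatile store: 'data = 42' cannot be reordered after this
        });
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the volatile read observes true */ }
            System.out.println(data); // guaranteed to print 42, never 0
        });
        reader.start();
        writer.start();
        writer.join();
        reader.join();
    }
}
```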

Summary #

In this section, we have explained a series of knowledge about JMM, so that everyone can understand the Java memory model, including:

  1. The memory regions of the JVM are divided into: Heap and Stack.
  2. The implementation of the heap memory can be divided into two parts: Heap and Non-Heap.
  3. The heap memory is mainly managed by GC and is generally divided into: Old Generation + Young Generation; Young Generation = Eden + Survivor space.
  4. CPU has a performance-enhancing feature: instruction reordering.
  5. JMM corresponds to JSR133, which is now maintained by the Java language specification and JVM specification.
  6. Classification and effects of memory barriers.

References #

  1. JSR-133. Java Memory Model and Thread Specification
  2. The Java Memory Model
  3. memoryModel-CurrentDraftSpec.pdf
  4. The JSR-133 Cookbook for Compiler Writers
  5. Understanding Memory Barriers with Version Control Systems
  6. Java Language Specification, Chapter 17. Threads and Locks
  7. Detailed Explanation of JVM Internal Structure
  8. Decoding Metaspace