16 How Is synchronized Implemented Under the Hood, and What Are Lock Upgrading and Downgrading?

The low-level implementation of synchronized relies mainly on the Java object header and the Monitor. Every Java object has an object header that stores the object's metadata, and within the header there is a field that records lock information. When a thread needs to enter a code block guarded by synchronized, it first tries to acquire the object's lock.

If the object's lock is not held by another thread, the thread acquires it and executes the code. Otherwise, the thread enters the Monitor's blocking queue and waits until it obtains the lock before it can continue.

Lock upgrading and downgrading refer to changes in the lock's state as threads execute. In the HotSpot VM a lock has four states which, in order of upgrading, are: the lock-free state, the biased-lock state, the lightweight-lock state, and the heavyweight-lock state.

  1. Lock-free state: the object's lock is not held by any thread. Any thread can acquire the lock by using a CAS operation to record its own thread ID as the lock holder.

  2. Biased-lock state: after a thread acquires the lock, its thread ID is recorded in the object header. When that thread requests the lock again, it only needs to check whether the thread ID in the object header matches its own; if so, the object is biased toward it and no further lock acquisition is needed, which saves the cost of repeated locking and improves performance. When a different thread requests the lock, the bias held by the original thread is revoked.

  3. Lightweight-lock state: when multiple threads compete for the same lock, the biased lock is upgraded to a lightweight lock. The acquiring thread copies the object's Mark Word into its own stack frame and records a pointer to the lock in the object header. Competition for the lock is resolved with CAS operations, which is cheaper than a heavyweight lock; if the CAS fails, the lock is upgraded to a heavyweight lock.

  4. Heavyweight-lock state: when multiple threads compete for the same lock and the lightweight lock fails, the lock is upgraded to a heavyweight lock. Threads that request the lock are placed in a wait queue until they are woken up. The heavyweight lock is implemented with operating-system mutexes and is therefore the slowest.

The main purpose of lock upgrading and downgrading is to deliver better performance in different scenarios: biased and lightweight locks suit cases with little contention and short lock-holding times, while heavyweight locks suit heavy contention and longer lock-holding times.

Note that the upgrade and downgrade process itself carries some overhead, so when using synchronized you should weigh the concrete usage scenario against the performance requirements.
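To ground the states above, here is a minimal sketch (class and names are my own, not from the lecture) of a synchronized counter. With a single thread the monitor could stay biased or lightweight; once several threads contend, the JVM would typically inflate it to a heavyweight lock:

```java
public class SyncCounter {
    private final Object lock = new Object();
    private long count = 0;

    // Entering this block compiles to monitorenter/monitorexit on `lock`
    public void increment() {
        synchronized (lock) {
            count++;
        }
    }

    public long get() {
        synchronized (lock) {
            return count;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SyncCounter c = new SyncCounter();
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) c.increment();
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(c.get()); // 40000: no increments are lost
    }
}
```

Whatever lock state the JVM chooses internally, the visible semantics are identical: mutual exclusion plus the memory visibility guaranteed by the monitor.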

Typical answer #

Before answering this question, let’s briefly review the knowledge points from the previous lecture. The synchronized block in Java is implemented by a pair of monitorenter/monitorexit instructions. The Monitor object is the basic implementation of synchronization.

Before Java 6, the implementation of Monitor relied entirely on internal mutex locks of the operating system. Since it required a context switch from user mode to kernel mode, the synchronization operation was an indiscriminate heavyweight operation.

In modern (Oracle) JDK, the JVM has made significant improvements and provided three different implementations of Monitor, which are commonly referred to as three different locks: biased locking, lightweight locking, and heavyweight locking. These improvements greatly enhance performance.

Lock upgrading and downgrading refer to the JVM dynamically optimizing how synchronized executes: when the JVM detects different levels of contention, it automatically switches to the appropriate lock implementation. This switching is what we call lock upgrading and downgrading.

When there is no contention, biased locking is used by default. The JVM uses a CAS (compare-and-swap) operation to set the thread ID in the Mark Word portion of the object header, marking the object as biased toward the current thread; no real mutual-exclusion lock is involved. The assumption behind this approach is that in many application scenarios, most objects are only ever locked by a single thread during their lifetime, so biased locking reduces the overhead of uncontended locking.

If another thread attempts to lock an object that has already been biased, the JVM needs to revoke the biased lock and switch to the lightweight locking implementation. Lightweight locking relies on CAS operations on the Mark Word to try to acquire the lock. If the retry is successful, a normal lightweight lock is used; otherwise, it further upgrades to a heavyweight lock.
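The CAS-based acquisition described above can be modeled in plain Java with an AtomicReference. This is only an illustrative model of the idea (the class name TinyCasLock is mine), not the JVM's actual Mark Word manipulation, and it spins on failure where the real JVM would instead inflate the lock:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative model only: a lock acquired by CAS-ing an owner field,
// loosely analogous to lightweight-lock acquisition via CAS on the Mark Word.
public class TinyCasLock {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public void lock() {
        Thread me = Thread.currentThread();
        // Try to CAS ourselves in as the owner; spin on failure
        while (!owner.compareAndSet(null, me)) {
            Thread.onSpinWait(); // the real JVM would inflate to a Monitor instead
        }
    }

    public void unlock() {
        owner.compareAndSet(Thread.currentThread(), null);
    }

    public static void main(String[] args) throws InterruptedException {
        TinyCasLock l = new TinyCasLock();
        long[] count = {0};
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                l.lock();
                try { count[0]++; } finally { l.unlock(); }
            }
        };
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(count[0]); // 20000
    }
}
```

The successful CAS on the atomic owner field also establishes the happens-before edge that makes the protected updates visible to the next lock holder, which is the same role the Mark Word transitions play inside the JVM.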

I noticed that some opinions believe that Java does not perform lock downgrading. In fact, to my knowledge, lock downgrading does occur. When the JVM enters a safe point, it checks whether there are idle Monitors and attempts to downgrade.

Analysis of the Test Points #

Today’s question mainly tests your understanding of the implementation of built-in locks in Java, and it’s a classic question on concurrency. The typical answer I provided earlier covers some basic concepts. If your foundation is not solid, some of these concepts may be difficult to understand. I suggest trying your best to understand and master them. Even if you don’t understand something, there’s no need to worry. Your understanding will gradually deepen in future learning.

In my opinion, a basic understanding of these concepts and mechanisms is already sufficient for most concurrent programming; after all, most engineers will never need to work at this deeper, more fundamental level. Often it is simply a matter of knowing a thing or not knowing it, and real improvement comes from practice and from working through hard problems.

Later on, I will further analyze:

  • At the source code level, I will go a bit deeper into the underlying implementation of synchronized and fill in details missing from the answers above. Some students have mentioned that this area is frequently asked about in interviews. If you are interested in the low-level Java source code but haven't found a starting point yet, this could be a good entry point.
  • I will look at the other lock implementations provided by the java.util.concurrent.locks package. After all, ReentrantLock is not the only explicit lock type Java offers. I will analyze their usage with code examples.

Knowledge Expansion #

In the previous lesson, I mentioned that synchronized is an Intrinsic Lock in the JVM, so the code implementation of biased lock, lightweight lock, and heavyweight lock is not in the core class library but in the JVM code.

Java code can be executed in either interpreted mode or compiled mode (if you don’t remember, please review Lesson 1 of this column), so the corresponding synchronization logic implementation is also scattered in different modules. For example, for the interpreter version, it is src/hotspot/share/interpreter/interpreterRuntime.cpp.

To simplify things and aid understanding, I will focus on the shared implementation under src/hotspot/share/runtime/

Please note that the links point to the latest JDK code repository, so some implementations may differ from historical versions.

First of all, the behavior of synchronized is part of the JVM runtime, so we need to find the implementation of the runtime-related functions. By searching for keywords like “monitor_enter” or “Monitor Enter” in the code, we can easily locate:

  • sharedRuntime.cpp/hpp, which is the base class for the interpreter and compiler runtime.
  • synchronizer.cpp/hpp, which contains various basic logic related to JVM synchronization.

In sharedRuntime.cpp, the following code reflects the main logic of synchronized:

Handle h_obj(THREAD, obj);
if (UseBiasedLocking) {
    // Retry fast entry if bias is revoked to avoid unnecessary inflation
    ObjectSynchronizer::fast_enter(h_obj, lock, true, CHECK);
} else {
    ObjectSynchronizer::slow_enter(h_obj, lock, CHECK);
}

The implementation can be broken down as follows:

  • UseBiasedLocking is a check because during JVM startup, we can specify whether to enable biased locking.

Biased locking is not suitable for all application scenarios, and the revoke operation is relatively heavy. It can only demonstrate obvious improvement when there are many synchronized blocks that do not actually compete. In practice, the use of biased locking has been controversial. Some people even believe that when you need to use concurrent libraries extensively, it often means that you don’t need biased locking. In terms of specific choice, I still recommend testing in practice and deciding whether to use it based on the results.

On the other hand, biased locking can delay the JIT warm-up process, so in many performance tests, biased locking is explicitly disabled with the following command:

-XX:-UseBiasedLocking

  • fast_enter is the familiar complete lock-acquisition path, while slow_enter bypasses biased locking and goes directly to the lightweight-lock acquisition logic.

So, how is fast_enter implemented? By searching the codebase again, we can locate synchronizer.cpp. Implementations like fast_enter are copied by the interpreter or dynamic compiler, so if we modify this part of the logic, we need to ensure consistency. This code is very sensitive, and even tiny issues can cause deadlocks or correctness problems.

Here is an analysis of this logic implementation:

  • biasedLocking defines operations related to biased locking. revoke_and_rebias is the entry method for acquiring biased locks, and revoke_at_safepoint defines the processing logic when a safepoint is detected.
  • If acquiring the biased lock fails, it enters slow_enter.
  • This method also checks whether biased locking is enabled, but from the code path, it will not enter this method if biased locking is disabled, so it can be considered as an additional sanity check.

Furthermore, if you carefully examine synchronizer.cpp, you will find that it not only contains the logic for synchronized but also includes Monitor actions triggered by native code, i.e., JNI (jni_enter/jni_exit).

I won’t go into more details about biasedLocking. Understanding that it is enough to use CAS to set the Mark Word, you can refer to the structure of the Mark Word in the object header in the following figure:

Mark Word Structure

Following the lock escalation process, how is biased locking escalated to lightweight locking?

Let’s take a look at what slow_enter actually does.

void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) {
    markOop mark = obj->mark();
    if (mark->is_neutral()) {
        // Copy the current Mark Word to the Displaced Header
        lock->set_displaced_header(mark);
        // Use CAS to set the Mark Word of the object
        if (mark == obj()->cas_set_mark((markOop) lock, mark)) {
            TEVENT(slow_enter: release stacklock);
            return;
        }
        // Check for competition
    } else if (mark->has_locker() &&
                THREAD->is_lock_owned((address)mark->locker())) {
        // Clear
        lock->set_displaced_header(NULL);
        return;
    }
     
    // Reset the Displaced Header
    lock->set_displaced_header(markOopDesc::unused_mark());
    // Inflate the lock into a heavyweight Monitor and enter it
    ObjectSynchronizer::inflate(THREAD,
                                obj(),
                                inflate_cause_monitor_enter)->enter(THREAD);
}


Let’s analyze this logic implementation:

  • If the object's Mark Word is neutral, no lock is held. The current Mark Word is copied into the Displaced Header, and CAS is used to install a pointer to the lock record into the object's Mark Word. If the CAS succeeds, the lightweight lock has been acquired and the method returns; otherwise, there is competition for the lock.
  • If the Mark Word records a locker and the current thread already owns that lock, this is a reentrant acquisition: the Displaced Header is cleared, and the method returns.
  • Finally, if neither case applies, the Displaced Header is set to unused_mark and the lock is inflated into a heavyweight Monitor through ObjectSynchronizer::inflate, whose enter method is then invoked to acquire it.

Please refer to the comments I added in the code to understand the process of gradually entering the lock expansion when trying to acquire a lightweight lock. You may find that this processing logic is highly consistent with the process I introduced in the previous lecture.

  • Set the Displaced Header, then use cas_set_mark to install it in the object's Mark Word. If that succeeds, the lightweight lock is acquired.
  • Otherwise, reset the Displaced Header and enter the lock inflation phase, implemented in the inflate method.

Today, I won’t explain the details of expansion. I have provided the analysis approach and examples of the source code. Considering practical application, providing further explanation of the source code is not very meaningful. Interested students can refer to the synchronizer.cpp link I provided, for example:

  • deflate_idle_monitors is the entry point for analyzing the lock-downgrade logic. This behavior is still being improved, because it runs inside a safepoint, and handling it poorly can prolong JVM pause times (STW, stop-the-world).
  • fast_exit or slow_exit correspond to the lock release logic.

In the previous analysis, I explained the underlying implementation of synchronized, which may be somewhat difficult to understand. Now let’s take a look at something relatively easier. In the previous lecture, I compared synchronized with ReentrantLock. There are also other special lock types in the Java core class library, please refer to the following diagram.

Lock types diagram

You might have noticed that these locks don’t all implement the Lock interface. ReadWriteLock is a separate interface that often represents a pair of locks for read and write operations. The standard library provides a reentrant version of the read-write lock implementation (ReentrantReadWriteLock), which has similar semantics to ReentrantLock.

StampedLock is also a separate type, and from the class diagram you can see that it does not support reentrancy: its lock state is represented by returned stamps rather than being tied to the thread that holds the lock.

Why do we need other locks like ReadWriteLock?

This is because, although ReentrantLock and synchronized are simple and practical, they have certain limitations in terms of behavior, or in other words, they are “too domineering” - either exclusive or non-exclusive. In real-world application scenarios, sometimes there is not much contention for write operations, but mainly concurrent reads. How can we further optimize the concurrency granularity?

The extended capabilities provided by the Java Concurrency API, such as ReadWriteLock, expand the lock’s capabilities. It is based on the principle that multiple read operations do not need mutual exclusion because they do not modify the data, so there is no interference. Write operations, however, will cause concurrency consistency issues, so mutual exclusion logic needs to be carefully designed between write threads and read-write threads.

Here is an example of a data structure implemented using a read-write lock. When the data size is large and there are many concurrent reads and few concurrent writes, it can demonstrate advantages over the pure synchronized version.

public class RWSample {
  private final Map<String, String> m = new TreeMap<>();
  private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
  private final Lock r = rwl.readLock();
  private final Lock w = rwl.writeLock();

  public String get(String key) {
      r.lock();
      System.out.println("Read lock acquired!");
      try {
          return m.get(key);
      } finally {
          r.unlock();
      }
  }

  public String put(String key, String entry) {
      w.lock();
      System.out.println("Write lock acquired!");
      try {
          return m.put(key, entry);
      } finally {
          w.unlock();
      }
  }
  // ...
}

At runtime, if a thread tries to acquire the read lock while the write lock is held by another thread, the read-lock acquisition blocks until the writer finishes. This guarantees that readers never observe inconsistent data.

The granularity of the read-write lock seems to be finer than synchronized, but in practical applications, its performance might not be ideal, mainly due to the relatively large overhead.

Therefore, in later versions of the JDK, StampedLock was introduced. While providing functionality similar to the read-write lock, it also supports an optimistic read mode. The optimization is based on the assumption that, most of the time, read operations do not conflict with writes. The logic is to try the read first and then use the validate method to confirm that no write has occurred in the meantime; if none has, the locking overhead is avoided entirely, otherwise the code falls back to acquiring the read lock. Please refer to my sample code below.

public class StampedSample {
  private final StampedLock sl = new StampedLock();

  void mutate() {
      long stamp = sl.writeLock();
      try {
          write();
      } finally {
          sl.unlockWrite(stamp);
      }
  }

  Data access() {
      long stamp = sl.tryOptimisticRead();
      Data data = read();
      if (!sl.validate(stamp)) {
          stamp = sl.readLock();
          try {
              data = read();
          } finally {
              sl.unlockRead(stamp);
          }
      }
      return data;
  }
  // ...
}

Note that writeLock and unlockWrite must be called in pairs, with the stamp returned by writeLock passed to unlockWrite.
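StampedLock can also convert one mode to another via tryConvertToWriteLock. Below is a sketch, patterned after the style of the JDK documentation's point example (the StampedPoint class and moveIfAt method are my own illustration, not from this lecture), of conditionally upgrading a read lock to a write lock:

```java
import java.util.concurrent.locks.StampedLock;

public class StampedPoint {
    private final StampedLock sl = new StampedLock();
    private double x, y;

    // Move the point only if it currently sits at (px, py): start under a
    // read lock, then try to convert the stamp to a write lock in place.
    void moveIfAt(double px, double py, double newX, double newY) {
        long stamp = sl.readLock();
        try {
            while (x == px && y == py) {
                long ws = sl.tryConvertToWriteLock(stamp);
                if (ws != 0L) {          // conversion succeeded
                    stamp = ws;
                    x = newX;
                    y = newY;
                    break;
                } else {                 // fall back: release read, take write
                    sl.unlockRead(stamp);
                    stamp = sl.writeLock();
                }
            }
        } finally {
            sl.unlock(stamp);            // unlock() accepts either kind of stamp
        }
    }

    public static void main(String[] args) {
        StampedPoint p = new StampedPoint();
        p.moveIfAt(0.0, 0.0, 3.0, 4.0);  // point is at the origin, so it moves
        System.out.println(p.x + "," + p.y);
    }
}
```

The conversion attempt avoids the release-then-reacquire window in the common case, which is exactly the kind of fine-grained control that stamps make possible and thread-ownership-based locks do not.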

You might be curious about the implementation mechanism of these explicit locks. The various synchronization utilities in the Java Concurrency API, not only various types of locks but also others such as Semaphore, CountDownLatch, and even earlier versions like FutureTask, are all based on a framework called AbstractQueuedSynchronizer (AQS).
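To get a feel for how AQS underpins these utilities, here is a minimal non-reentrant mutex built on AbstractQueuedSynchronizer. It is a sketch in the spirit of the usage example in the class's documentation (the AqsMutex name is mine), not production code:

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Minimal non-reentrant mutex on top of AQS: state 0 = unlocked, 1 = locked.
public class AqsMutex {
    private static final class Sync extends AbstractQueuedSynchronizer {
        @Override protected boolean tryAcquire(int unused) {
            // CAS the state, much like the lock fast paths discussed above
            return compareAndSetState(0, 1);
        }
        @Override protected boolean tryRelease(int unused) {
            setState(0);
            return true;
        }
    }

    private final Sync sync = new Sync();

    public void lock()       { sync.acquire(1); }  // blocks in the AQS wait queue
    public void unlock()     { sync.release(1); }
    public boolean tryLock() { return sync.tryAcquire(1); }

    public static void main(String[] args) {
        AqsMutex m = new AqsMutex();
        m.lock();
        System.out.println(m.tryLock()); // false: already held, non-reentrant
        m.unlock();
        System.out.println(m.tryLock()); // true: free again, so we grab it
    }
}
```

AQS supplies the wait queue, parking, and wake-up machinery; a concrete synchronizer only defines what "acquire" and "release" mean in terms of the shared state, which is why so many java.util.concurrent classes can share one framework.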

Today, I comprehensively analyzed the implementation and internal workings of synchronized. I briefly introduced other explicit locks provided in the concurrency package and explained their usage with sample code. I hope this is helpful to you.

Practice Exercise #

Have you understood what we discussed today? Think about a question: do you know what a “spin lock” is? What are its use cases?

Please write your thoughts on this question in the comments section. I will select a well-thought-out comment and reward you with a learning coupon. Feel free to discuss with me.
