
13 Multithreading Lock Optimization: Dive Deeper into the Optimization Methods of the Lock Synchronization Lock #

Hello, I’m Liu Chao.

In today’s lecture, let’s continue discussing lock optimization. In the previous lecture, I mainly introduced the optimization methods for Synchronized synchronization locks implemented at the JVM level. In addition to that, since JDK 1.5, Java has also provided Lock synchronization locks. So what advantages does it have?

Compared to the Synchronized synchronization lock, which the JVM acquires and releases implicitly, the Lock synchronization lock (hereafter, Lock lock) must be acquired and released explicitly, which gives us more flexibility in when and how locking happens. The basic operations of the Lock lock are implemented with optimistic locking, but because a thread blocked on a Lock lock can still be suspended, it is still classified as a pessimistic lock. We can compare the characteristics of the two synchronization locks with a simple diagram:
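To make the explicit acquire/release concrete, here is a minimal counter guarded by a ReentrantLock (the class and field names are my own, for illustration only). Note that unlock() always belongs in a finally block; otherwise an exception would leave the lock held forever:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class LockCounter {
	private final Lock lock = new ReentrantLock();
	private int count;

	public void increment() {
		lock.lock();           // explicit acquisition
		try {
			count++;
		} finally {
			lock.unlock();     // explicit release, always in finally
		}
	}

	public int get() {
		lock.lock();
		try {
			return count;
		} finally {
			lock.unlock();
		}
	}
}
```

The try/finally shape is exactly the discipline that Synchronized gives you for free; in exchange, Lock lets you choose where and when to release, try with a timeout, or interrupt a waiting thread.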

img

In terms of performance, when concurrency is low and contention is mild, the Synchronized synchronization lock performs about as well as the Lock lock, thanks to its tiered locking (biased, lightweight, heavyweight). Under heavy load and high concurrency, however, intense contention can escalate the Synchronized lock to a heavyweight lock, and its performance then falls below that of the more stable Lock lock.

We can intuitively compare the performance of the two locks through a simple set of performance tests, as shown in the results below, and the code can be downloaded and viewed on Github.

img

Based on the above data, we can see that the performance of the Lock lock is relatively more stable. But how does its implementation principle compare to the Synchronized synchronization lock discussed in the previous lecture?

Implementation Principle of the Lock Lock #

Lock is a lock implemented purely in Java code. Lock is an interface, and its commonly used implementation classes are ReentrantLock and ReentrantReadWriteLock (RRW). Both are built on the AbstractQueuedSynchronizer (AQS) class.

The AQS class structure contains a waiting queue built on a linked list (the CLH queue) that stores all blocked threads, plus an integer state variable; for ReentrantLock, state represents the state of the lock, such as whether it is held and how many times it has been reentered.

The operations on this queue are implemented using CAS operations. We can see the entire process of acquiring a lock through a diagram.

img
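To see how an implementation class plugs into AQS, here is a minimal, non-reentrant exclusive lock built on AQS (a sketch with hypothetical names, not any JDK class). The subclass only defines what "acquired" means by CAS-ing state from 0 to 1; AQS itself handles enqueuing and suspending the threads that fail:

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

class SimpleLock {
	private static class Sync extends AbstractQueuedSynchronizer {
		@Override
		protected boolean tryAcquire(int arg) {
			// CAS the state from 0 (unlocked) to 1 (locked)
			if (compareAndSetState(0, 1)) {
				setExclusiveOwnerThread(Thread.currentThread());
				return true;
			}
			return false;   // AQS will park this thread in the CLH queue
		}

		@Override
		protected boolean tryRelease(int arg) {
			setExclusiveOwnerThread(null);
			setState(0);
			return true;    // AQS will unpark the queue's successor
		}
	}

	private final Sync sync = new Sync();

	public void lock()   { sync.acquire(1); }
	public void unlock() { sync.release(1); }
}
```

ReentrantLock follows the same template, with extra logic in tryAcquire for reentrancy (incrementing state when the owner re-locks) and for fair versus non-fair queue ordering.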

Lock Separation Optimization for the Lock Synchronization Lock #

Although the performance of Lock locks is stable, it is not necessary to use a ReentrantLock exclusive lock to achieve thread synchronization in all scenarios.

We know that if one thread reads a piece of shared data while another thread writes to it, the reader may see data that is inconsistent with the final result. Likewise, if two threads write to the same data concurrently, the data each of them sees can also be inconsistent. In such cases, we can put a mutual exclusion lock around the read and write methods to ensure that only one thread reads or writes the data at any given time.

In most business scenarios, there are far more read operations than write operations. In multithreaded programming, read operations do not modify the shared resource data. Therefore, if multiple threads are only reading the shared resource, it is unnecessary to lock the resource. If a mutual exclusion lock is used, it will actually affect the concurrency performance of the business. In this scenario, is there any way to optimize the implementation of locks?

1. Read-Write Lock - ReentrantReadWriteLock #

For scenarios with many reads and few writes, Java provides another implementation of the Lock interface: the read-write lock, RRW. We know that ReentrantLock is an exclusive lock that allows only one thread to hold it at a time, whereas RRW allows multiple reader threads to proceed concurrently, but never a reader alongside a writer, nor two writers, at the same time. Internally, the read-write lock maintains two locks: a ReadLock for read operations and a WriteLock for write operations.

How does the read-write lock achieve lock separation to ensure the atomicity of shared resources?

RRW is also implemented based on AQS. Its custom synchronizer (which extends AQS) must track the state of multiple reader threads and one writer thread within a single synchronization state, and the design of this state variable is the key to implementing the read-write lock. RRW ingeniously uses the high and low bits of one integer value to control two states: the high 16 bits record read holds, and the low 16 bits record write holds.
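The split can be expressed with the same shift-and-mask helpers that the JDK's Sync class uses internally (the constant names mirror the JDK source; the wrapper class here is hypothetical, for illustration):

```java
class RWState {
	// Mirrors how ReentrantReadWriteLock.Sync splits its 32-bit state
	static final int SHARED_SHIFT   = 16;
	static final int SHARED_UNIT    = 1 << SHARED_SHIFT;        // adding one read hold
	static final int EXCLUSIVE_MASK = (1 << SHARED_SHIFT) - 1;  // the low 16 bits

	// Number of read holds: the high 16 bits
	static int sharedCount(int state)    { return state >>> SHARED_SHIFT; }

	// Number of write holds: the low 16 bits
	static int exclusiveCount(int state) { return state & EXCLUSIVE_MASK; }
}
```

One consequence of this design is that both the reader count and the writer reentrancy count are capped at 65535 (2^16 - 1).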

When a thread attempts to acquire the write lock, it first checks whether the synchronization state is 0. If the state equals 0, no other thread currently holds the lock, and the current thread tries to CAS the state to acquire the write lock. If the state is not 0, some thread already holds the lock.

At this point, it checks whether the low 16 bits (w) of the synchronization state are 0. If w is 0, it means that another thread has acquired the read lock, and the current thread enters the CLH queue to wait for blocking. If w is not 0, it means that another thread has acquired the write lock. At this point, it checks whether the thread that acquired the write lock is the current thread. If it is not, the current thread enters the CLH queue to wait for blocking. If it is, it checks whether the current thread has acquired the write lock more than the maximum number of times. If it has, an exception is thrown; otherwise, it updates the synchronization state.

img
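The decision flow above can be sketched as a simplified model (a hypothetical class; the real Sync.tryAcquire additionally handles fairness policy, the maximum hold count check, and retrying via the CLH queue after a failed CAS):

```java
import java.util.concurrent.atomic.AtomicInteger;

class WriteAcquireModel {
	static final int EXCLUSIVE_MASK = (1 << 16) - 1;  // write holds live in the low 16 bits
	final AtomicInteger state = new AtomicInteger(0);
	Thread owner;                                     // thread holding the write lock, if any

	boolean tryWriteLock() {
		int c = state.get();
		int w = c & EXCLUSIVE_MASK;
		if (c != 0) {
			// c != 0 and w == 0: readers hold the lock -> fail (would enter the CLH queue)
			// w != 0 but held by another thread      -> fail (would enter the CLH queue)
			if (w == 0 || owner != Thread.currentThread())
				return false;
			state.set(c + 1);    // reentrant re-acquire by the current owner
			return true;
		}
		// state == 0: no holders, try to CAS in the first write hold
		if (state.compareAndSet(0, 1)) {
			owner = Thread.currentThread();
			return true;
		}
		return false;
	}
}
```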

When a thread attempts to acquire the read lock, it also checks whether the synchronization state is 0. If the state equals 0, it means that no other threads currently hold the lock. At this point, it checks whether the thread needs to be blocked. If it does, the current thread enters the CLH queue to wait for blocking; otherwise, it uses CAS to update the synchronization state to the read state.

If the state is not equal to 0, it checks the low 16 bits of the synchronization state. If there is a write lock, it fails to acquire the read lock and enters the CLH blocking queue. Otherwise, it checks whether the current thread should be blocked. If it should not be blocked, it attempts to CAS-update the synchronization state to acquire the read state.

img
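The read path can be modeled the same way (again a hypothetical, simplified class; the real Sync.tryAcquireShared also handles the should-block check, per-thread hold counts, and CAS retries):

```java
import java.util.concurrent.atomic.AtomicInteger;

class ReadAcquireModel {
	static final int SHARED_UNIT    = 1 << 16;        // one read hold in the high 16 bits
	static final int EXCLUSIVE_MASK = (1 << 16) - 1;  // write holds in the low 16 bits
	final AtomicInteger state = new AtomicInteger(0);
	Thread writeOwner;                                // holder of the write lock, if any

	boolean tryReadLock() {
		int c = state.get();
		// A write hold by another thread blocks all new readers
		if ((c & EXCLUSIVE_MASK) != 0 && writeOwner != Thread.currentThread())
			return false;                             // would enter the CLH queue
		// Otherwise CAS one more reader into the high 16 bits
		return state.compareAndSet(c, c + SHARED_UNIT);
	}
}
```

The exception in the check above (a thread that already holds the write lock may still take the read lock) is what makes RRW's lock downgrading possible.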

Let’s take a look at an example that computes a distance from the origin to experience the use of RRW. The code is as follows:

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TestRTTLock {

	private double x, y;

	private ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
	// Read lock
	private Lock readLock = lock.readLock();
	// Write lock
	private Lock writeLock = lock.writeLock();

	public double read() {
		// Acquire read lock
		readLock.lock();
		try {
			return Math.sqrt(x * x + y * y);
		} finally {
			// Release read lock
			readLock.unlock();
		}
	}

	public void move(double deltaX, double deltaY) {
		// Acquire write lock
		writeLock.lock();
		try {
			x += deltaX;
			y += deltaY;
		} finally {
			// Release write lock
			writeLock.unlock();
		}
	}
}

2. Optimization of Read-Write Lock: StampedLock #

RRW has been well applied in concurrent scenarios where reads are more frequent than writes. However, RRW still has room for performance improvement. In the case of frequent reads and infrequent writes, RRW can cause starvation for write threads, meaning that write threads may wait for a long time without being able to compete for the lock.

In JDK 1.8, Java provides the StampedLock class to solve this problem. StampedLock is not implemented based on AQS, but its implementation principle is similar to AQS, which is based on queues and lock states. Unlike RRW, StampedLock controls the lock in three modes: write, pessimistic read, and optimistic read. When acquiring the lock, StampedLock returns a ticket called a “stamp”. The acquired stamp needs to be validated when releasing the lock. In optimistic read mode, the stamp also serves as a secondary validation after reading the shared resource. I will explain the working principle of the stamp later.

Let’s first understand how StampedLock is used through an official example, as shown below:

import java.util.concurrent.locks.StampedLock;

public class Point {
    private double x, y;
    private final StampedLock s1 = new StampedLock();

    void move(double deltaX, double deltaY) {
        // Acquire the write lock
        long stamp = s1.writeLock();
        try {
            x += deltaX;
            y += deltaY;
        } finally {
            // Release the write lock
            s1.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        // Optimistic read operation
        long stamp = s1.tryOptimisticRead();
        // Copy variables
        double currentX = x, currentY = y;
        // Check if there were any write operations during the read
        if (!s1.validate(stamp)) {
            // Upgrade to pessimistic read
            stamp = s1.readLock();
            try {
                currentX = x;
                currentY = y;
            } finally {
                s1.unlockRead(stamp);
            }
        }
        return Math.sqrt(currentX * currentX + currentY * currentY);
    }
}

We can observe that a write thread first obtains a “stamp” ticket through writeLock(). The write lock is exclusive: only one thread can hold it at a time, and any other thread requesting it must wait until no thread holds a read or write lock. On success, writeLock() returns a stamp value representing the lock’s version; when releasing the lock, unlockWrite() is called with that stamp as its argument.

Next is the process of a read thread acquiring the lock. First, the thread uses tryOptimisticRead() to obtain an optimistic lock stamp. If no thread currently holds a write lock, a non-zero stamp version information will be returned. After acquiring this stamp, the thread will make a copy of the shared resource to the method stack. Before the validate() method, all the operations are based on the copied data in the method stack.

The method then calls validate() to check whether any other thread has acquired a write lock since tryOptimisticRead() was called. If one has, validate() returns false and the read is upgraded to a pessimistic read lock; otherwise, the data copied under the optimistic stamp can safely be used.

Compared to RRW, StampedLock’s optimistic read checks the lock state with simple bitwise operations on the stamp and involves no CAS at all; and in this example, as soon as an optimistic read fails validation, it upgrades to a pessimistic read lock rather than spinning. This avoids the CPU cost of repeated CAS attempts, so StampedLock is more efficient than RRW in read-heavy scenarios.

Summary #

Whether using the Synchronized synchronization lock or the Lock synchronization lock, thread blocking will occur as long as there is lock contention, leading to frequent thread switching and ultimately increasing performance overhead. Therefore, how to reduce lock contention becomes the key to optimizing locks.

In the Synchronized synchronization lock, we learned that lock contention can be reduced by decreasing lock granularity and reducing lock holding time. In this lecture, we learned that the Lock lock can be used to reduce lock contention by using lock separation.

The Lock lock uses read-write separation to optimize read-heavy scenarios. From RRW’s basic read lock and write lock, to StampedLock’s optimistic read, pessimistic read, and write modes, the designs all aim to reduce lock contention and achieve the best concurrency performance the system can offer.

Thought Question #

Just like RRW, StampedLock suits scenarios where read operations are far more frequent than write operations. StampedLock may have grown out of RRW, but it is hard to say it is strictly better; after all, RRW is still widely used, which suggests it has advantages that StampedLock cannot replace. Do you know why StampedLock is not widely used, or which drawbacks prevent it from being widely adopted?