13 Case Analysis: Optimization of Multi-threaded Locking #

In the previous lesson, we learned that ThreadLocal can avoid the time-mixing problem that SimpleDateFormat causes in a concurrent environment. Another solution is to lock the parse method, which also keeps the date-processing class correct. The code is shown below (available in the repository):

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ParseDate {
    // Shared formatter; SimpleDateFormat itself is not thread-safe
    private static final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    // Locking the method serializes access to the shared formatter
    public static synchronized Date parse(String strDate) throws ParseException {
        return sdf.parse(strDate);
    }
}
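For comparison, here is a minimal sketch of the ThreadLocal approach from the previous lesson (the class name is illustrative). Each thread keeps its own formatter, so no locking is needed:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ThreadLocalParseDate {
    // One formatter per thread: no shared mutable state, hence no lock
    private static final ThreadLocal<SimpleDateFormat> SDF =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));

    public static Date parse(String strDate) throws ParseException {
        return SDF.get().parse(strDate);
    }
}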

Locking a resource can have a significant impact on performance. When a resource is locked, it is exclusively held by the locking thread, and other threads can only wait in line for the lock. This sequential execution in a concurrent program naturally reduces the execution speed.

The following is a comparison of performance using ThreadLocal and synchronized locks with 50 threads:

Benchmark                                 Mode  Cnt     Score      Error   Units
SynchronizedNormalBenchmark.sync         thrpt   10  2554.628 ± 5098.059  ops/ms
SynchronizedNormalBenchmark.threadLocal  thrpt   10  3750.902 ±  103.528  ops/ms
========Removing Business Impact========
Benchmark                                 Mode  Cnt        Score        Error   Units
SynchronizedNormalBenchmark.sync         thrpt   10    26905.514 ±   1688.600  ops/ms
SynchronizedNormalBenchmark.threadLocal  thrpt   10  7041876.244 ± 355598.686  ops/ms

As we can see, using synchronized locks has lower performance. If we remove the impact of the business logic (delete the execution logic), the difference will be even greater. The more times the code is executed, the greater the cumulative impact of the lock, making speed optimization of the lock itself very important.

We all know that there are two ways to lock in Java: one is the common synchronized keyword, and the other is the Lock interface from the java.util.concurrent package. The JDK has optimized both heavily, and their implementations differ. In this lesson, we will start from these two kinds of locks and look at some lock optimization techniques.

synchronized #

When locking code or methods with the synchronized keyword, there is an explicit or implicit locking object. When a thread tries to access synchronized code, it must obtain the lock first, and it must release the lock when it exits or throws an exception.

  • When locking a regular method, the lock is the instance of the class (this).
  • When locking a static method, the lock is the Class object.
  • When locking a code block, a specific object can be specified as the lock.
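A minimal sketch of the three forms (class and method names are illustrative):

public class SyncForms {
    private final Object lock = new Object();

    // Lock object is the current instance (this)
    public synchronized void instanceMethod() {
    }

    // Lock object is the Class object (SyncForms.class)
    public static synchronized void staticMethod() {
    }

    // Lock object is explicitly specified
    public void block() {
        synchronized (lock) {
        }
    }
}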

1. Monitor Mechanism #

In interviews, interviewers are likely to ask you: how does synchronized appear in the bytecode? Take the code below, compile it with javac, and then run javap -v -p on the resulting class file to inspect the bytecode.

As the output shows, for a synchronized method the compiler only adds the ACC_SYNCHRONIZED flag to the method declaration.

synchronized void syncMethod() {
    System.out.println("syncMethod");
}

synchronized void syncMethod();
    descriptor: ()V
    flags: ACC_SYNCHRONIZED
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #4
         3: ldc           #5
         5: invokevirtual #6
         8: return

Now let’s look at the bytecode of a synchronized code block. We can see that it is controlled by the monitorenter and monitorexit instructions.

void syncBlock(){
    synchronized (Test.class){
    }
}

void syncBlock();
    descriptor: ()V
    flags:
    Code:
      stack=2, locals=3, args_size=1
         0: ldc           #2 
         2: dup
         3: astore_1
         4: monitorenter
         5: aload_1
         6: monitorexit
         7: goto          15
        10: astore_2
        11: aload_1
        12: monitorexit
        13: aload_2
        14: athrow
        15: return
      Exception table:
         from    to  target type
             5     7    10   any
            10    13    10   any

Although they have different representations, both of them achieve synchronization through the monitor. The diagram below illustrates the mechanism of the monitor.

Note: This is a commonly asked interview question. For example, can you describe the implementation principle of the monitor lock?

[Figure: the monitor mechanism — EntrySet, the owner (active) thread, and WaitSet]

As shown in the diagram, we can abstract the runtime object lock into three parts: the EntrySet and WaitSet are two queues, and the dashed line represents the thread that currently holds the lock. Let's walk through how threads move through this structure.

When the first thread arrives, it finds that no thread holds the object lock, so it becomes the active thread and enters the RUNNING state.

Then three more threads arrive and try to compete for the object lock. At this time, these three threads find that the lock has already been acquired, so they enter the EntrySet temporarily and enter the BLOCKED state. From the jstack command, we can see that the displayed information for these threads is “waiting for monitor entry”.

"http-nio-8084-exec-120" #143 daemon prio=5 os_prio=31 cpu=122.86ms elapsed=317.88s tid=0x00007fedd8381000 nid=0x1af03 waiting for monitor entry  [0x00007000150e1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:263)
    - waiting to lock <0x0000000782e1b590> (a java.io.BufferedInputStream)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)

The active thread eventually finishes execution and exits. Alternatively, for some reason such as calling the wait method, it releases the object lock and enters the WaitSet queue. This is why a thread must hold the object lock before it can call wait.

For example, in the following snippet the thread must first enter the synchronized block (acquiring the lock) before it can call wait:

synchronized (lock) {
    lock.wait(); // releases the lock and parks this thread in the WaitSet
}

At this point, the thread is in the WAITING state according to the jstack output, with the reason being in Object.wait().

"wait-demo" #12 prio=5 os_prio=31 cpu=0.14ms elapsed=12.58s tid=0x00007fb66609e000 nid=0x6103 in Object.wait()  [0x000070000f2bd000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x0000000787b48300> (a java.lang.Object)
    at java.lang.Object.wait(Object.java:326)
    at WaitDemo.lambda$main$0(WaitDemo.java:7)
    - locked <0x0000000787b48300> (a java.lang.Object)
    at WaitDemo$$Lambda$14/0x0000000800b44840.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:830)

In both situations above, the object lock is released, and the threads waiting in the EntrySet compete for it again. The thread that successfully acquires the lock becomes the active thread, and this process repeats in a loop.

How are the threads in the WaitSet reactivated? At some point, notify or notifyAll is called on the lock object, which moves a waiting thread (or all of them) from the WaitSet back to the EntrySet, where it competes for the lock again.
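A minimal sketch of the notify side, assuming the same lock object as in the snippet above:

synchronized (lock) {
    lock.notify(); // wakes one waiter: it moves from the WaitSet to the EntrySet
}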

In this way, the threads can execute in order.

2. Hierarchical Locks #

In JDK 1.8, the synchronized keyword is significantly faster than it once was. What optimizations were made? The answer is hierarchical locks. The JVM upgrades a synchronized lock according to how it is used, generally along the path: biased lock → lightweight lock → heavyweight lock.

Locks can only be upgraded, not downgraded; once a lock becomes a heavyweight lock, threads can only rely on the operating system for scheduling.

To understand the lock upgrade process, let’s first look at the structure of the object in memory.

[Figure: object memory layout — Mark Word, Class Pointer, Instance Data, Padding]

As shown in the above figure, an object consists of four parts: the Mark Word, the Class Pointer, the Instance Data, and Padding.

The Mark Word is the part most relevant to lock upgrading, so we'll focus on it. On a 32-bit JVM it is 32 bits long, and in the biased state it holds four fields: Thread ID (23 bits), Age (6 bits), Biased (1 bit), and Tag (2 bits). Lock upgrading is driven by the values of Thread ID, Biased, and Tag.

  • Biased Lock

In the case where only one thread uses the lock, biased locking can ensure higher efficiency.

The process works as follows: when the first thread reaches the synchronized block, it checks whether the Tag in the object's Mark Word is 01, which means the lock is either unlocked or anonymously biased.

01 is also the lock's default state. Once a thread acquires the lock, it writes its own thread ID into the Mark Word; until another thread tries to acquire the lock, it remains biased.

When another thread joins the competition for the biased lock, it first checks if the thread ID saved in the MarkWord is equal to its own thread ID. If they are not equal, the biased lock is immediately revoked and upgraded to a lightweight lock.

  • Lightweight Lock

How is a lightweight lock acquired? By spinning (spin locks).

Each thread participating in the competition will generate a Lock Record (LR) on its own thread stack. Then, each thread tries to set the MarkWord of the lock object header to point to its own LR using the CAS (spin) mechanism. The thread that successfully sets the MarkWord is considered to have acquired the lock.

When the lock is in the lightweight state, acquisition can no longer be decided by simply comparing the Tag value; every acquisition attempt has to spin.

Of course, spinning is also for scenarios where there is no lock contention, such as when one thread has finished running and another thread tries to acquire the lock. However, if spinning fails a certain number of times, the lock will escalate to a heavyweight lock.

  • Heavyweight Lock

A heavyweight lock is what we traditionally understand synchronized to be: the contending thread is suspended, drops into the kernel, waits for the operating system to schedule it, and then returns to user mode. System calls are expensive, which is where heavyweight locks get their name.

If a shared variable is heavily contended, the lock quickly escalates to a heavyweight lock, and these optimizations no longer help. Under severe contention you can disable biased locking with the parameter -XX:-UseBiasedLocking, which may improve performance in theory, though in practice the effect is uncertain.

Lock #

In the concurrent package, we can find two classes, ReentrantLock and ReentrantReadWriteLock. The "Reentrant" prefix indicates that, like the synchronized keyword, they are reentrant locks.

It is worth explaining "reentrancy," a frequent interview topic: a running thread can acquire the same object lock multiple times, because lock ownership in Java is per thread, not per invocation.

For example, in the following code, methods a, b, and c all lock the current object, yet when a thread calls a (which calls b, which calls c), it does not deadlock: the thread already owns the object lock and simply re-enters it.

public synchronized void a(){
    b();
}
public synchronized void b(){
    c();
}
public synchronized void c(){
}

1. Main Methods #

Lock is implemented based on AQS (AbstractQueuedSynchronizer), and AQS is implemented based on volatile and CAS (more about CAS will be explained in the next lesson).

Lock is used differently from synchronized. It needs to be locked manually and unlocked in the finally block. The Lock interface is more flexible than synchronized. Let’s take a look at the key methods.

  • lock(): behaves like synchronized; if the lock cannot be acquired, the thread blocks.
  • tryLock(): attempts to acquire the lock and returns immediately without blocking, whether or not it succeeds; it returns true if the lock was acquired.
  • tryLock(long time, TimeUnit unit): like tryLock(), but waits up to the given time for the lock before giving up.
  • lockInterruptibly(): like lock(), but the wait for the lock can be interrupted, in which case it throws InterruptedException.

In general, lock() is sufficient, but if the business requires a prompt response, tryLock with a timeout is a better choice: the request can fail fast instead of blocking and waiting. This tryLock pattern, which sacrifices some request success rate to protect service availability, is commonly used in highly concurrent scenarios.
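A minimal sketch of that fail-fast pattern (the class name and the 10 ms timeout are illustrative):

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class FailFastDemo {
    private final ReentrantLock lock = new ReentrantLock();

    // Returns false promptly instead of blocking when the lock is busy
    public boolean update() throws InterruptedException {
        if (!lock.tryLock(10, TimeUnit.MILLISECONDS)) {
            return false; // report failure; keep request latency bounded
        }
        try {
            // ... do the guarded work ...
            return true;
        } finally {
            lock.unlock();
        }
    }
}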

2. Read-Write Lock #

However, for some workloads even a plain Lock is too coarse-grained. For example, with a HashMap that sees many reads and few writes, using the same exclusive lock for both reads and writes hurts efficiency.

ReentrantReadWriteLock is a lock that separates read and write operations. It allows multiple threads to read at the same time, but read and write, and write and write are mutually exclusive. The usage is as shown below: acquire a read-write lock, acquire a write lock for write operations, acquire a read lock for read operations, and release the lock in the finally block.

public class RwMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Lock readLock = lock.readLock();
    private final Lock writeLock = lock.writeLock();

    public void put(K k, V v) {
        writeLock.lock();   // writes are exclusive with all other access
        try {
            map.put(k, v);
        } finally {
            writeLock.unlock();
        }
    }

    public V get(K k) {
        readLock.lock();    // multiple readers can hold the read lock at once
        try {
            return map.get(k);
        } finally {
            readLock.unlock();
        }
    }
}

Here's some homework for you: is there a faster read-write separation pattern than ReadWriteLock? What new API did JDK 1.8 introduce? (Feel free to answer in the comments; I'll discuss with each of you.)

3. Fair Locks and Nonfair Locks #

  • Nonfair Locks

The locks we commonly use are nonfair locks. Recall the monitor principle: when the thread holding a lock releases it, the threads in the EntrySet contend for it, and this contention is random, meaning you cannot predict which thread will get the object lock. Whoever grabs it, gets it.

There is thus some probability that a given thread keeps failing to acquire the lock. For example, a thread given a low priority via setPriority may lose the contention again and again and stay hungry; this is the concept of thread starvation.

  • Fair Locks

Fair locks solve this problem by making the contention ordered rather than random. synchronized has no such capability, but a Lock can be made fair through a constructor parameter, as in ReentrantReadWriteLock:

public ReentrantReadWriteLock(boolean fair) {
    sync = fair ? new FairSync() : new NonfairSync();
    readerLock = new ReadLock(this);
    writerLock = new WriteLock(this);
}
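A plain ReentrantLock accepts the same flag; a usage sketch:

// Fair lock: waiting threads acquire the lock in FIFO order
Lock fairLock = new ReentrantLock(true);
// Nonfair lock (the default): higher throughput, no ordering guarantee
Lock nonfairLock = new ReentrantLock();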

Since every thread has to queue up, a synchronization queue must be maintained across cores, and when many threads contend for the lock the throughput becomes very low.

The following JMH results compare the two kinds of locks with 20 concurrent threads; the nonfair lock outperforms the fair lock by two orders of magnitude.

Benchmark                      Mode  Cnt      Score      Error   Units
FairVSNoFairBenchmark.fair    thrpt   10    186.144 ±   27.462  ops/ms
FairVSNoFairBenchmark.nofair  thrpt   10  35195.649 ± 6503.375  ops/ms

Lock Optimization Techniques #

1. Deadlocks #

Let's look at the most serious outcome of lock contention: deadlock. In the sample code below, two threads each hold the lock the other thread needs; they wait on each other forever, producing a deadlock.

During interviews, candidates are often asked to write the following code:

public class DeadLockDemo {
    public static void main(String[] args) {
        Object object1 = new Object();
        Object object2 = new Object();
        Thread t1 = new Thread(() -> {
            synchronized (object1) {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                synchronized (object2) {
                }
            }
        }, "deadlock-demo-1");

        t1.start();
        Thread t2 = new Thread(() -> {
            synchronized (object2) {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                synchronized (object1) {
                }
            }
        }, "deadlock-demo-2");
        t2.start();
    }
}

The code creates two object locks. Thread 1 first acquires the lock of object1 and then tries to acquire the lock of object2 after 200ms. However, at this time, Thread 2 has already acquired the lock of object2. These two threads enter a state of mutual waiting, resulting in a deadlock.

Using the tryLock method mentioned earlier, where one side gives up and backs off when it cannot get a lock, avoids deadlock to a certain extent, as sketched below.
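A minimal sketch of that back-off idea (the class and lock names are illustrative). A worker proceeds only when it can take both locks; otherwise it releases whatever it holds so the other side can make progress:

import java.util.concurrent.locks.ReentrantLock;

public class TryLockBackoff {
    private final ReentrantLock lockA = new ReentrantLock();
    private final ReentrantLock lockB = new ReentrantLock();

    // Returns true if the combined work ran; false if the caller should retry later
    public boolean doWork() {
        if (lockA.tryLock()) {
            try {
                if (lockB.tryLock()) {
                    try {
                        // both locks held: do the work that needs A and B
                        return true;
                    } finally {
                        lockB.unlock();
                    }
                }
            } finally {
                lockA.unlock();
            }
        }
        return false; // could not take both locks; back off instead of waiting
    }
}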

2. Optimization Techniques #

The theory behind lock optimization is simple: reduce lock contention. Whether through read-write separation or lock striping, the goal is ultimately to keep multiple threads from competing for the same lock at the same time.

So we can summarize the general directions for optimization: reducing lock granularity, reducing lock holding time, lock upgrading, lock separation, lock elimination, optimistic locking, lock-free techniques, and so on.

[Figure: general directions for lock optimization]

  • Reducing Lock Granularity

By reducing the lock granularity, conflicts can be dispersed, reducing the possibility of conflicts and thereby improving concurrency. In simple terms, it means abstracting resources and using separate locks to protect each class of resources.

For example, in the code below, list1 and list2 are two independent resources, yet both methods synchronize on the same object (this), so operations on one list needlessly block operations on the other.

public class LockLessDemo {
    List<String> list1 = new ArrayList<>();
    List<String> list2 = new ArrayList<>();

    public synchronized void addList1(String v) {
        this.list1.add(v);
    }

    public synchronized void addList2(String v) {
        this.list2.add(v);
    }
}

You can create two different locks to improve the situation as follows:

public class LockLessDemo {
    List<String> list1 = new ArrayList<>();
    List<String> list2 = new ArrayList<>();
    final Object lock1 = new Object();
    final Object lock2 = new Object();
    public void addList1(String v) {
        synchronized (lock1) {
            this.list1.add(v);
        }
    }
    public void addList2(String v) {
        synchronized (lock2) {
            this.list2.add(v);
        }
    }
}

  • Reducing lock holding time

By releasing the lock resource as soon as possible, the lock holding time is reduced, allowing other threads to acquire the lock resource more quickly and perform other business operations.

Consider the following code snippet. slowMethod() does not touch the shared list but takes a long time, and it currently sits inside the lock scope; moving it outside the synchronized block lets the lock be released sooner (see the improved version after the snippet).

public class LockTimeDemo {
    List<String> list = new ArrayList<>();
    final Object lock = new Object();
    public void addList(String v) {
        synchronized (lock) {
            slowMethod();
            this.list.add(v);
        }
    }
    public void slowMethod(){
    }
}
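A sketch of the improved version, with the slow call hoisted out of the critical section (assuming slowMethod does not need the lock):

public class LockTimeDemo {
    List<String> list = new ArrayList<>();
    final Object lock = new Object();

    public void addList(String v) {
        slowMethod();          // slow work now runs outside the lock
        synchronized (lock) {
            this.list.add(v);  // the critical section covers only the shared state
        }
    }

    public void slowMethod() {
    }
}
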
  • Lock upgrading

Lock upgrading refers to the synchronized lock escalation described earlier in this lesson. It is an internal JVM optimization that starts with a biased lock and escalates step by step to a lightweight lock and then a heavyweight lock; the process is irreversible.

  • Lock separation

The read-write lock we mentioned above is a technique of lock separation. This is because read operations generally do not affect resources and can be executed concurrently, while write operations and other operations are mutually exclusive and can only be executed sequentially. Therefore, read-write locks are suitable for scenarios with more reads and fewer writes.

  • Lock elimination

Through the JIT compiler, the JVM can eliminate locking operations for certain objects. For example, we all know that StringBuffer and StringBuilder are used for string concatenation, and the former is thread-safe.

However, if such a concatenation object is used only inside one method, the JVM can determine through escape analysis that the object's scope is confined to that method, and it eliminates the locking overhead.

For example, the StringBuffer in the following snippet never escapes m1, so after lock elimination it performs the same as StringBuilder.

String m1(){
    // sb never escapes this method, so the JIT can elide its internal locks
    StringBuffer sb = new StringBuffer();
    sb.append("");
    return sb.toString();
}

Of course, for internet scenarios with more reads and fewer writes, the most effective approach is to use optimistic locking or even lock-free techniques, which we will introduce in the next lesson, “Lesson 14 | Case Study: Optimistic Locking and Lock-Free”.

Summary #

There are two ways to lock in Java: using the synchronized keyword and the Lock interface in the concurrent package.

In this lesson, we have discussed their characteristics in detail, including implementation principles. The comparison is as follows:

Category                  synchronized         Lock
Implementation mechanism  Monitor              AQS
Underlying details        JVM optimization     Java API
Lock upgrading            Yes                  No
Functional features       Basic                Rich
Lock separation           None                 Read-write lock
Lock timeout              None                 tryLock with timeout
Interruptible             No                   lockInterruptibly

Lock provides more functionalities than synchronized and allows for more fine-grained control of thread behavior.

However, if only the basic functionality of locking is required, it is recommended to use synchronized for two reasons:

  • The programming model of synchronized is simpler and easier to use.
  • synchronized benefits from JVM-level optimizations such as biased locking and lightweight locks, and the JIT compiler also performs lock elimination on it.

We have also learned about fair locks and unfair locks, as well as the concept of reentrant locks and some common optimization techniques. Optimization space exists only when conflicts occur. So what is a lock-free queue and how is it implemented? We will answer these questions in the next lesson, “Lesson 14 | Case Study: Optimistic Locking and Lock-Free”.