
14 Case Analysis: The Optimistic Lock and Lock-Free #

In the previous lesson, we mentioned the Lock under the concurrent package, which allows for finer-grained control of shared resources at the API level. Lock is implemented based on AbstractQueuedSynchronizer (AQS), which is used to build Lock and other synchronization components. It uses an int member variable to represent the state of synchronization and uses a built-in FIFO queue to handle thread queuing for resource acquisition.

Locking using the synchronized keyword causes frequent switching between the BLOCKED and RUNNABLE states of a thread, resulting in frequent transitions between user mode and kernel mode in the operating system, which leads to lower efficiency.

In contrast to the implementation of synchronized, many data structure changes in AQS are performed based on Compare and Swap (CAS), which is one implementation of optimistic locking.

CAS #

CAS stands for Compare And Swap.

In the CAS mechanism, three basic operands are used: the memory address V, the expected value E, and the new value N. When updating a variable, the memory address V is only modified to N if the actual value associated with the memory address V matches the expected value E.


If the modification does not succeed, what should be done? In many cases, the operation simply keeps retrying until the value read matches the expected value and the swap goes through.

Taking the AtomicInteger class as an example, the relevant code is as follows:

public final boolean compareAndSet(int expectedValue, int newValue) {
    return U.compareAndSetInt(this, VALUE, expectedValue, newValue);
}

Comparing and replacing are two separate actions. How does CAS ensure the atomicity of these two operations?

By further tracing, we find that it is implemented in the jdk.internal.misc.Unsafe class, and the loop retry occurs here:

@HotSpotIntrinsicCandidate
public final int getAndAddInt(Object o, long offset, int delta) {
    int v;
    do {
        v = getIntVolatile(o, offset);
    } while (!weakCompareAndSetInt(o, offset, v, v + delta));
    return v;
}

Tracing to the internals of the JVM, referring to os_cpu/linux_x86/atomic_linux_x86.hpp on a Linux machine, we can see that the lowest-level call is made in assembly language, and the most important instruction is cmpxchgl. At this point, we can’t find any more code because the atomicity of CAS is actually guaranteed by the hardware CPU.

template<>
template<typename T>
inline T Atomic::PlatformCmpxchg<4>::operator()(T exchange_value,
                                                T volatile* dest,
                                                T compare_value,
                                                atomic_memory_order /* order */) const {
  STATIC_ASSERT(4 == sizeof(T));
  __asm__ volatile ("lock cmpxchgl %1,(%3)"
                    : "=a" (exchange_value)
                    : "r" (exchange_value), "a" (compare_value), "r" (dest)
                    : "cc", "memory");
  return exchange_value;
}

So, how much can the performance of atomic classes implemented by CAS be improved? We conducted a test with 20 threads performing increment operations on a shared variable.

From the test results, it can be seen that for frequent write operations, the performance of atomic classes is three times that of the synchronized approach.
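As a rough illustration (not the author's original benchmark), such a test can be sketched as below. The class name `IncrementBench`, the thread count, and the operation count are all assumptions, and absolute timings will vary by machine:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class IncrementBench {
    static final int THREADS = 20;
    static final int OPS = 100_000;

    static int syncCounter = 0;
    static final Object LOCK = new Object();
    static final AtomicInteger casCounter = new AtomicInteger();

    // Runs the given increment operation on THREADS threads and returns elapsed nanos.
    static long time(Runnable op) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(THREADS);
        long start = System.nanoTime();
        for (int i = 0; i < THREADS; i++) {
            new Thread(() -> {
                for (int j = 0; j < OPS; j++) op.run();
                done.countDown();
            }).start();
        }
        done.await();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        long syncNanos = time(() -> { synchronized (LOCK) { syncCounter++; } });
        long casNanos = time(casCounter::incrementAndGet);
        System.out.println("synchronized: " + syncNanos / 1_000_000 + " ms");
        System.out.println("CAS (AtomicInteger): " + casNanos / 1_000_000 + " ms");
    }
}
```

Both approaches end with the same correct count; only the time spent differs.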


The principle of CAS has been increasingly emphasized in recent years in interviews, mainly due to the increasing usage of optimistic locking in Internet scenarios where reading is more frequent than writing.

You may have noticed several variations of optimistic locking, but the basic idea is the same: compare first, and only swap in the new value if the comparison succeeds.

Regarding the Atomic classes, there is a small detail: the main variable is declared with the volatile keyword. Can you guess what it is used for?

private volatile int value;

Answer: Declaring a variable volatile ensures that whenever its value changes, the change is immediately written back to main memory, and any thread that wants to use the variable must first reload it from main memory into its working memory. This guarantees visibility, so the value read for each comparison is always the latest one.

Optimistic Lock #

From the description above, optimistic lock is not strictly a lock, but rather a mechanism for detecting conflicts and completing an operation by retrying when conflicts occur. Without retrying, optimistic lock is simply a judgment logic.

From this, we can see some differences between optimistic lock and pessimistic lock. Pessimistic lock assumes that others will modify the data every time they operate on the data, so it locks every time it operates on the data, unless someone releases the lock.

When optimistic lock detects conflicts, it will retry multiple times. Therefore, as we mentioned before, optimistic lock is suitable for scenarios where there are more reads and fewer writes. In scenarios where resource conflicts are more severe, optimistic lock may fail multiple times, causing the CPU to idle. Therefore, pessimistic lock performs better in such scenarios.

Why is optimistic lock suitable for scenarios with more reads and fewer writes? Doesn’t pessimistic lock also have fewer conflicts in this scenario?

In fact, the problem lies in the locking action itself.

  • Pessimistic lock must follow a three-step pattern: first lock, then read, then update. Even when there are no conflicts, these extra steps make execution slow.
  • As mentioned earlier, optimistic lock is not essentially a lock; it is just a judgment logic. When resource conflicts are rare, it incurs almost no overhead.

The CAS operation we mentioned earlier is a typical implementation of optimistic lock. Let’s also take a look at the disadvantages of CAS, which are also some disadvantages of optimistic lock.

  • In high-concurrency scenarios, some threads may keep trying to modify a resource but, due to severe conflicts, never succeed, which puts great pressure on the CPU. LongAdder, added in JDK 1.8, reduces the probability of CAS conflicts by splitting the value across multiple cells and summing the cells when the total is read. Under heavy contention its performance is about 10 times that of AtomicLong.
  • A CAS operation works on a single variable only. If you need atomicity across multiple resources, it is better to use classic locking such as synchronized.
  • The ABA problem refers to the situation where, during a CAS operation, another thread changes a variable's value from A to B and then back to A. When the current thread finally compares, the value is still A, so the swap succeeds. In some scenarios, such as a plain AtomicInteger counter, this has no impact. In others, such as operations on a linked list, it can corrupt the structure and must be avoided. AtomicStampedReference solves this by pairing the reference with an integer version stamp, so an intermediate change remains detectable.
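The ABA scenario and the stamped fix can be demonstrated in a few lines. This is a minimal sketch (the class name `AbaDemo` is made up):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // Plain CAS cannot tell that the value went A -> B -> A in the meantime
        AtomicInteger plain = new AtomicInteger(100);
        plain.compareAndSet(100, 101);              // another thread: A -> B
        plain.compareAndSet(101, 100);              // another thread: B -> A
        boolean fooled = plain.compareAndSet(100, 200);
        System.out.println("plain CAS succeeded despite ABA: " + fooled); // true

        // AtomicStampedReference pairs the value with a version stamp
        AtomicStampedReference<Integer> stamped = new AtomicStampedReference<>(100, 0);
        int seenStamp = stamped.getStamp();         // this thread observed stamp 0
        stamped.compareAndSet(100, 101, 0, 1);      // another thread: A -> B, stamp 0 -> 1
        stamped.compareAndSet(101, 100, 1, 2);      // another thread: B -> A, stamp 1 -> 2
        // The value is 100 again, but the stamp is now 2, so this CAS fails
        boolean caught = !stamped.compareAndSet(100, 200, seenStamp, seenStamp + 1);
        System.out.println("stamped CAS detected ABA: " + caught); // true
    }
}
```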

Optimistic Lock Implementation for Balance Update #

Manipulating the balance is one of the most common operations in a transaction system. The value of the balance is read first, then modified, and finally written back.

Any update to the balance needs to be locked. Because reading and writing operations are not atomic, if there are multiple operations on the balance at the same time, inconsistencies may occur.

Let’s take a concrete example. You initiate a request to spend 80 yuan and, at the same time, another request to spend 5 yuan. Both operations report success, but the balance is only reduced by 5 yuan; in effect, you spent 5 yuan to buy 85 yuan worth of goods. Please take a look at the timeline below:

Request A: Read balance 100
Request B: Read balance 100
Request A: Spend 5 yuan, temporary balance is 95
Request B: Spend 80 yuan, temporary balance is 20
Request B: Write balance 20 successfully
Request A: Write balance 95 successfully
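Replayed step by step, the interleaving above can be reproduced deterministically. This small sketch (the class name `LostUpdateDemo` is made up) mimics each request's local copy of the balance:

```java
public class LostUpdateDemo {
    static int balance = 100; // shared "database" balance

    public static void main(String[] args) {
        int readA = balance;       // Request A reads 100
        int readB = balance;       // Request B reads 100
        int tempA = readA - 5;     // A computes its temporary balance: 95
        int tempB = readB - 80;    // B computes its temporary balance: 20
        balance = tempB;           // B writes back 20
        balance = tempA;           // A writes back 95, silently overwriting B's update
        System.out.println("final balance = " + balance); // 95, although 85 was spent
    }
}
```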

I once encountered a P0-level bug online. Users constructed requests to initiate frequent withdrawals of 100 yuan and 1 cent, resulting in serious consequences. You can analyze this process on your own.

Therefore, locking the balance operation is necessary. This process is similar to multi-threaded operations, but multi-threaded operations are single-machine operations, while the balance scenario is distributed.

For databases, row locking can be used to solve this problem. Taking MySQL as an example, MyISAM does not support row locking, so we can only use InnoDB. A typical SQL statement is as follows:

select * from user where userid={id} for update

With this simple SQL statement, three locks are actually taken at the underlying level, which is quite expensive:

  • By default, a record lock on the primary-key index, which we will not elaborate on here;
  • A next-key lock (record lock + gap lock) on the secondary-index entry userid={id};
  • A gap lock on the next record after the secondary-index entry userid={id}.

Therefore, in real-world scenarios, this pessimistic lock is no longer used because it is not very versatile and very expensive.

A good solution is to use optimistic lock. Based on our definition of optimistic lock above, we can abstract two concepts:

  • Mechanism for detecting conflicts: First, retrieve the balance of the current operation and check if it is the same as the current value in the database during the update. If it is the same, perform the update action.
  • Retry strategy: Fail directly if there is a conflict, or retry 5 times before failing.

The pseudocode is as follows, and you can see that this is essentially CAS.

# Get old_balance
select balance from user where userid={id}
# Update action
update user set balance = balance - 20
    where userid={id} 
    and balance >= 20
    and balance = $old_balance
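The detect-and-retry loop above can be sketched in Java. This is only an in-memory stand-in: the AtomicInteger plays the role of the database row, and compareAndSet plays the role of the conditional UPDATE; the class and method names are made up:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OptimisticBalance {
    // stand-in for the balance column of the user row
    private final AtomicInteger balance;

    public OptimisticBalance(int initial) {
        this.balance = new AtomicInteger(initial);
    }

    /** Tries to deduct amount; retries up to maxRetries times on conflict. */
    public boolean deduct(int amount, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            int old = balance.get();                    // select balance from user ...
            if (old < amount) {
                return false;                           // insufficient funds, no point retrying
            }
            // update ... where balance = $old_balance
            if (balance.compareAndSet(old, old - amount)) {
                return true;                            // no conflict, update applied
            }
            // conflict detected: the balance changed underneath us, so retry
        }
        return false;                                   // gave up after maxRetries conflicts
    }

    public int get() { return balance.get(); }
}
```

Whether to fail immediately (maxRetries = 1) or retry several times is exactly the retry-strategy choice described above.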

There is another variant of CAS that uses a version-number mechanism. An additional field, version, is added to the table and replaces the balance comparison in the judgment. This approach does not depend on any specific business field, can guard updates to multiple columns at once, and is therefore more extensible. The typical pseudocode is as follows:

old_version, balance = dao.getBalance(userid)
dao.exec("
    update user
    set balance = balance - 20,
        version = version + 1
    where userid={id}
    and balance >= 20
    and version = $old_version
")

Redis Distributed Lock #

Redis distributed lock is a commonly used solution in the internet industry. Many people know that it is implemented using the setnx or set method with parameters. However, Redis distributed lock actually has many pitfalls.

In “08 | Case Study: How Redis Helps Seckill Business”, we demonstrated how to use Lua script to implement a seckill scenario. However, in reality, the seckill business is usually not that simple. It needs to perform some other business operations between query and user deduction.

For example, performing some product verification, order generation, etc. At this time, using distributed lock can achieve more flexible control, primarily depending on the SETNX command or the SET command with parameters.

  • Lock creation: SETNX [KEY] [VALUE] is an atomic operation, meaning that it creates a key and returns 1 if the specified key does not exist; otherwise, it returns 0. We usually use the more complete command SET key value [EX seconds] [PX milliseconds] [NX|XX] to set a timeout for the key.
  • Lock query: GET KEY can simply determine if the key exists or not.
  • Unlocking: DEL KEY can delete the corresponding key.

Based on the native semantics, we have the following simple lock and unlock methods. The lock method keeps retrying until it acquires the distributed lock, and then destroys the distributed lock by using the delete command.

public void lock(String key, int timeOutSecond) {
    for (;;) {
        // setIfAbsent maps to SET key value NX EX, an atomic create-with-timeout
        Boolean success = redisTemplate.opsForValue()
                .setIfAbsent(key, "", timeOutSecond, TimeUnit.SECONDS);
        if (Boolean.TRUE.equals(success)) {
            break;
        }
    }
}

public void unlock(String key) {
    redisTemplate.delete(key);
}

There are many problems in this code; we point out only the most severe one. In a multi-threaded environment, only the thread that holds the lock should execute the unlock method. In the implementation above, however, any caller can delete the key, and combined with expiration this can release a lock that another request now holds. Consider the timeline of three requests:

  • Request A: Acquires the lock for resource x, with a timeout of 5 seconds.
  • Request A: Due to the long running time of the business logic, it is blocked and exceeds 5 seconds.
  • Request B: Sends a request in the 6th second and finds that the lock x has expired, so it successfully acquires the lock.
  • Request A: In the 7th second, request A finishes executing and then releases the lock.
  • Request C: Request C sends a request just after the lock is released and successfully obtains the lock resource.

At this point, both request B and request C have successfully acquired lock x, and our distributed lock has failed. This can easily lead to problems when executing business logic.

Therefore, when deleting the lock, it is necessary to determine whether the requestor is correct. First, get the current identifier in the lock, and then, when deleting, check if this identifier is the same as that in the unlock request.

It can be seen that reading and checking are two different operations, and there may be a gap between these two operations, which can cause execution confusion in high concurrency. The safe solution is to use a Lua script to encapsulate them into atomic operations.

The modified code is as follows:

public String lock(String key, int timeOutSecond) {
    for (;;) {
        // a unique stamp identifies the lock holder
        String stamp = String.valueOf(System.nanoTime());
        Boolean success = redisTemplate.opsForValue()
                .setIfAbsent(key, stamp, timeOutSecond, TimeUnit.SECONDS);
        if (Boolean.TRUE.equals(success)) {
            return stamp;
        }
    }
}

public void unlock(String key, String stamp) {
    // the Lua script compares the stamp and deletes the key atomically
    redisTemplate.execute(script, Arrays.asList(key), stamp);
}

The corresponding Lua script is as follows:

local stamp = ARGV[1]
local key = KEYS[1]
local current = redis.call("GET",key)
if stamp == current then
    redis.call("DEL",key)
    return "OK"
end

As you can see, implementing distributed locks in Redis can be challenging. It is recommended to use the Redisson Java client implementation of Redlock, which is based on the distributed lock management method proposed by Redis’s official document.

This lock algorithm handles distributed locking across multiple Redis instances and addresses various failure scenarios, providing higher fault tolerance. For example, for the lock expiration issue mentioned earlier: when no explicit lease time is given, Redisson's watchdog mechanism automatically extends the lock's expiration so the business logic can finish normally.

Let’s take a look at typical code that uses Redisson distributed locks:

String resourceKey = "goodgirl";
RLock lock = redisson.getLock(resourceKey);
try {
    // a 5-second lease; note that the watchdog renewal only applies
    // when lock() is called without an explicit lease time
    lock.lock(5, TimeUnit.SECONDS);
    // actual business logic
    Thread.sleep(100);
} catch (Exception ex) {
    ex.printStackTrace();
} finally {
    // only release the lock if this thread still holds it
    if (lock.isHeldByCurrentThread()) {
        lock.unlock();
    }
}

By using the monitor command in Redis, you can see the specific execution steps, which can be quite complex.


Lock-Free #

Lock-Free refers to the situation where, in a multi-threaded environment, accessing a shared resource does not block the execution of other threads.

In Java, the most typical implementation of a lock-free queue is ConcurrentLinkedQueue. However, it is unbounded and cannot specify its size. ConcurrentLinkedQueue uses Compare-and-Swap (CAS) to handle concurrent access to data, which forms the basis for lock-free algorithms.

CAS instructions cause no context switching or thread rescheduling, so they are very lightweight for multi-threaded synchronization. ConcurrentLinkedQueue further breaks enqueueing and dequeueing down into finer-grained steps on the head and tail nodes, narrowing the scope each CAS must cover.
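A minimal sketch of concurrent, non-blocking use of ConcurrentLinkedQueue; the class name, thread count, and item count are arbitrary:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

public class LockFreeQueueDemo {

    // Starts `producers` threads that each offer `itemsEach` items, then drains the queue.
    static int produceAndDrain(int producers, int itemsEach) throws InterruptedException {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        CountDownLatch done = new CountDownLatch(producers);
        for (int p = 0; p < producers; p++) {
            new Thread(() -> {
                // offer() never blocks; internally it loops on a CAS of the tail node
                for (int i = 0; i < itemsEach; i++) queue.offer(i);
                done.countDown();
            }).start();
        }
        done.await();
        // poll() likewise CASes the head node instead of taking a lock
        int count = 0;
        while (queue.poll() != null) count++;
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("drained " + produceAndDrain(4, 10_000) + " items");
    }
}
```

No element is lost or duplicated even though no thread ever blocks another.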

ConcurrentLinkedQueue is a non-blocking queue with high performance, but it is not commonly used. Be sure not to confuse it with the blocking queue LinkedBlockingQueue, which is based on locks.

The Disruptor framework is a lock-free, bounded queue with exceptionally high performance. It uses techniques such as a RingBuffer, lock-free algorithms, and cache-line padding to achieve its speed. It can replace traditional blocking queues in highly concurrent scenarios such as logging and messaging (Storm uses it for intra-process communication between threads). However, it is rarely used in business systems except in seckill-like scenarios, because its programming model is complex and the main bottleneck in a business system is usually slow I/O rather than the queue.

Conclusion #

In this lesson, we started with CAS and gradually explored some concepts and use cases of optimistic locking.

Optimistic locking, strictly speaking, is not a type of lock. It provides a mechanism for detecting conflicts and uses a retry approach to complete certain operations when conflicts occur. Without the retry operation, optimistic locking is just a logical judgment.

Pessimistic locking assumes that others will modify the data every time it operates on the data, so it locks it each time it operates on the data unless someone releases the lock.

Optimistic locking is faster than pessimistic locking in situations where there are more reads than writes because pessimistic locking requires many additional operations. Additionally, when there is no conflict, optimistic locking does not consume any resources. However, in situations with severe conflicts, optimistic locking’s performance is generally inferior to pessimistic locking due to constant retries.

Due to these characteristics of optimistic locking, it is widely used in read-mostly, write-seldom internet scenarios.

In this lesson, we mainly looked at an optimistic locking implementation at the database level and the implementation of Redis distributed locks. The latter has many details to pay attention to, and it is recommended to use the RLock in the Redisson library.

Of course, optimistic locking has its drawbacks. When conflicts are severe, it can burn a lot of CPU on failed retries. It can only protect a single resource and struggles to cover multiple resources at once. It also suffers from the ABA problem, which can be solved by adding a version number (stamp) to the optimistic lock.

We can learn from these experiences through CAS. There are many similarities between multi-threaded environments and distributed environments. For optimistic locking, once we find a conflict detection mechanism, we have basically implemented it.

Now, let’s leave a question for you to analyze and answer:

A write operation of an interface takes about 5 minutes. During the write process, a field in the database is updated to “start”, and after the write is completed, it is updated to “done”. Another user wants to write some data but needs to wait until the status is “done”.

So, the developer uses polling on the web end, querying the field value every 5 seconds, and once the correct value is obtained, the data can be written.

Does the developer’s method belong to optimistic locking? What are the potential issues, and how can they be avoided? Please leave your answers in the comments section below, and I will address and discuss them one by one.