30 How to Implement Distributed Locks with Redis #

In the previous lesson, I mentioned that besides atomic operations, Redis clients can also control concurrent write operations to shared data by using locks to ensure data correctness.

However, we are in a distributed setting. When multiple clients contend for a lock, the lock must not be local to any single client; otherwise, the other clients cannot see it, let alone acquire it.

Therefore, in a distributed system, when multiple clients need to acquire a lock, we need a distributed lock. In this case, the lock is stored in a shared storage system and can be accessed and acquired by multiple clients.

Redis itself can be accessed and shared by multiple clients, making it an ideal shared storage system for storing distributed locks. Moreover, Redis has high read/write performance, making it suitable for high-concurrency lock operations. So, in this lesson, I will discuss how to implement distributed locks based on Redis.

In our daily programming, we often use locks on a single machine, and you should be familiar with them. While distributed locks and locks on a single machine have similarities, distributed locks have some special requirements because they are used in distributed scenarios.

So let me first compare distributed locks with locks on a single machine and identify their similarities and differences. This will deepen your understanding of distributed locks and of the requirements for implementing them.

Similarities and Differences Between Locks on a Standalone Machine and Distributed Locks #

Let’s start by looking at locks on a standalone machine.

For a multi-threaded program running on a standalone machine, a lock can be represented by a variable.

  • When the variable value is 0, it means that no thread has acquired the lock.
  • When the variable value is 1, it means that a thread has already acquired the lock.

What exactly does it mean for a thread to acquire or release a lock? To acquire a lock, a thread checks whether the lock variable value is 0. If it is 0, the thread sets it to 1, indicating that the lock has been acquired. If it is not 0, an error is returned to indicate that acquiring the lock failed, because another thread already holds it. To release a lock, the thread simply sets the lock variable value back to 0 so that other threads can acquire it.

I’ll use a code snippet to demonstrate the operations of acquiring and releasing a lock, where lock is the lock variable.

acquire_lock(){
  if lock == 0
     lock = 1
     return 1
  else
     return 0
} 

release_lock(){
  lock = 0
  return 1
}

Similar to locks on a standalone machine, a variable can also be used to implement distributed locks. The operations of acquiring and releasing a lock by the client in distributed locks have the same logic as those on a standalone machine: when acquiring a lock, it also needs to check the value of the lock variable to determine whether it can acquire the lock successfully; when releasing a lock, it needs to set the value of the lock variable to 0 to indicate that the client no longer holds the lock.

However, unlike threads operating locks on a standalone machine, in a distributed scenario, the lock variable needs to be maintained by a shared storage system. Only in this way can multiple clients access the lock variable by accessing the shared storage system. Consequently, the operations of acquiring and releasing a lock become reading, checking, and setting the value of the lock variable in the shared storage system.

In this way, we can derive two requirements for implementing distributed locks.

  • Requirement 1: The process of acquiring and releasing a lock in distributed locks involves multiple operations. Therefore, when implementing distributed locks, we need to ensure the atomicity of these lock operations.
  • Requirement 2: The shared storage system stores the lock variable. If the shared storage system fails or crashes, the clients will be unable to perform lock operations. When implementing distributed locks, we need to consider ensuring the reliability of the shared storage system, thereby ensuring the reliability of the lock.
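Before turning to Redis, the acquire/release logic above can be sketched against a shared store. This is a minimal illustration, not real Redis code: a plain Python dict stands in for the shared storage system, and the function names are my own.

```python
# A plain dict stands in for the shared storage system. In a real
# distributed setting, the whole body of acquire_lock must execute as
# one atomic step on the store (requirement 1); otherwise two clients
# can both read 0 and both believe they acquired the lock.

store = {"lock_key": 0}

def acquire_lock(store, key):
    if store[key] == 0:   # read and check the lock variable
        store[key] = 1    # set it to 1: lock acquired
        return 1
    return 0              # another client already holds the lock

def release_lock(store, key):
    store[key] = 0        # no client holds the lock any more
    return 1

print(acquire_lock(store, "lock_key"))  # 1: acquired
print(acquire_lock(store, "lock_key"))  # 0: already held
print(release_lock(store, "lock_key"))  # 1: released
```

The check-then-set in `acquire_lock` is exactly the sequence that Redis must perform atomically, which is what the rest of this lesson addresses.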

Now that we know the specific requirements, let’s move on to learning how Redis implements distributed locks.

In fact, we can implement it based on a single Redis node or using multiple Redis nodes. The reliability of the lock differs in these two cases. Let’s first look at the implementation method based on a single Redis node.

Implementing Distributed Lock Based on a Single Redis Node #

As a shared storage system in the process of implementing distributed locks, Redis can use key-value pairs to store lock variables and receive and process lock and unlock requests from different clients. So, how are the keys and values of the key-value pairs determined?

We need to assign a variable name to the lock variable and use this variable name as the key of the key-value pair. The value of the lock variable is then the value of the key-value pair. This way, Redis can store the lock variable and clients can use Redis commands to perform lock operations.

To help you understand, I have created an image that illustrates how Redis uses key-value pairs to store lock variables and the process of two clients simultaneously requesting a lock operation.

As you can see, Redis can use a key-value pair “lock_key:0” to store the lock variable. The key is “lock_key”, which is also the name of the lock variable, and the initial value of the lock variable is 0.

Now let’s analyze the lock operation.

In the image, clients A and C simultaneously request a lock. Since Redis processes requests with a single thread, even if clients A and C send lock requests to Redis at the same time, Redis will process their requests sequentially.

Let’s assume that Redis processes client A’s request first, reads the value of “lock_key”, and finds that “lock_key” is 0. So Redis sets the value of “lock_key” to 1, indicating that the lock has been acquired. Then Redis processes client C’s request and finds that the value of “lock_key” is already 1, so it returns a failure message for the lock acquisition.

So far, we have discussed the lock acquisition operation. Now, how do we release the lock? Actually, releasing the lock is as simple as setting the value of the lock variable to 0.

Let me explain this with another image. This image shows the process of client A requesting to release the lock. When client A holds the lock, the value of the lock variable “lock_key” is 1. After client A performs the unlock operation, Redis sets the value of “lock_key” to 0, indicating that no client holds the lock anymore.

Because the lock operation involves three steps (reading the lock variable, checking the value of the lock variable, and setting the value of the lock variable to 1), it is important to ensure atomicity when executing these steps. How can we ensure atomicity?

In the previous lesson, we learned that there are two common methods to ensure atomicity: using Redis single command operations and using Lua scripts. So, how can we apply these two methods in the context of distributed lock acquisition?

Let’s first look at which single command operations Redis provides for lock acquisition.

The first one is the SETNX command, which is used to set the value of a key-value pair. Specifically, this command checks if the key-value pair exists when executing. If the pair does not exist, it sets the value of the pair. If the pair already exists, it does nothing.

For example, if you execute the following command and the key does not exist, the key will be created and its value will be set to “value”. If the key already exists, SETNX does not change the value.

SETNX key value

For the unlock operation, we can simply use the DEL command to delete the lock variable after the business logic completes. There is no need to worry that other clients will be unable to acquire the lock once the variable has been deleted: when SETNX executes and the key-value pair (i.e., the lock variable) does not exist, it first creates the pair and then sets its value. So after the lock is released, the next client's SETNX simply recreates the lock variable and sets its value, completing the acquisition.

In summary, we can use the combination of SETNX and DEL commands to implement the lock acquisition and release operations. The following pseudocode example demonstrates the process of lock operations.

// Acquire lock
SETNX lock_key 1
// Perform business logic
DO THINGS
// Release lock
DEL lock_key

However, using the combination of SETNX and DEL commands to implement distributed locking carries two potential risks. The first risk is that if a client executes SETNX to acquire the lock but then hits an exception while operating on the shared data, it may never execute the final DEL to release the lock. As a result, the lock stays held by that client, other clients can no longer acquire it or access the shared data, and the business application is affected.

To address this issue, an effective solution is to set an expiration time for the lock variable. This way, even if the client holding the lock encounters an exception and cannot release the lock actively, Redis will delete it based on the expiration time of the lock variable. After the lock variable expires, other clients can request the lock again, avoiding the problem of being unable to acquire the lock.

Now let’s look at the second risk. Suppose client A acquires the lock with SETNX, and then client B, which does not hold the lock, executes the DEL command and releases client A’s lock by mistake. If client C happens to be requesting the lock at this moment, it will acquire it successfully and start operating on the shared data. As a result, clients A and C operate on the shared data at the same time, corrupting it, which is unacceptable at the business level.

To address this issue, we need to differentiate lock operations from different clients, and one way to do this is to work with the lock variable value.

When acquiring a lock with the SETNX command, we set the lock variable value to 1 or 0 to indicate whether the acquisition succeeded. But 1 and 0 are only two states; they cannot tell us which client performed the lock operation. So, when acquiring the lock, each client can instead set a unique value for the lock variable to identify itself. When releasing the lock, the client checks whether the current lock variable value equals its own unique identifier, and releases the lock only if it does. This avoids the problem of mistaken lock release.

Now that we know the solution, how is it actually implemented in Redis? Let’s take a look.

Before looking at the specific code, let me first introduce the Redis SET command.

As we just saw, SETNX creates and sets the value of a key-value pair only if it does not already exist (i.e., “set if not exists”). To achieve the same effect, the SET command provides the NX option: with NX, SET performs the assignment only when the key-value pair does not exist; otherwise, it does nothing. In addition, the SET command accepts the EX or PX option, which sets the expiration time of the key-value pair in seconds or milliseconds respectively.

For example, when executing the following command, SET will create the key only if it does not exist, and assign a value to it. Furthermore, the key’s lifetime is determined by the seconds or milliseconds option value.

SET key value [EX seconds | PX milliseconds] [NX]

With the NX and EX/PX options of the SET command, we can use the following command to implement the lock acquisition operation.

// Acquire lock, unique_value is the client's unique identifier
SET lock_key unique_value NX PX 10000

Here, unique_value is the client’s unique identifier, which can be represented by a randomly generated string, and PX 10000 indicates that the lock_key will expire after 10 seconds, so that the lock can be automatically released if the client encounters an exception and fails to release the lock during this period.
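To make these semantics concrete, here is a hedged, in-memory sketch of the SET ... NX PX behavior. `MiniStore` and its method names are illustrative stand-ins for a single Redis node, not a real client API.

```python
import time

class MiniStore:
    """In-memory stand-in for one Redis node (illustrative only)."""

    def __init__(self):
        self._data = {}  # key -> (value, expire_at)

    def set_nx_px(self, key, value, px_ms):
        # Mimic SET key value NX PX px_ms: assign only if the key is
        # absent (or already expired), and attach a lifetime.
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False  # key exists and is still live: NX fails
        self._data[key] = (value, now + px_ms / 1000.0)
        return True

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        if entry[1] <= time.monotonic():
            del self._data[key]  # expire lazily, as Redis may
            return None
        return entry[0]

store = MiniStore()
print(store.set_nx_px("lock_key", "client_A:1a2b", px_ms=10_000))  # True
print(store.set_nx_px("lock_key", "client_C:9f8e", px_ms=10_000))  # False
```

Once the 10-second lifetime passes, the key expires and a later `set_nx_px` succeeds again, which is exactly how a crashed client’s lock gets reclaimed.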

Since a unique identifier is used for each client in the lock acquisition operation, when releasing the lock, we need to check whether the value of the lock variable is equal to the unique identifier of the client performing the release operation, as shown below:

// Release lock: Compare whether unique_value is equal to the lock variable, to avoid mistaken release
if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

This is the Lua script (unlock.script) that implements the lock release operation. Here, KEYS[1] represents lock_key, and ARGV[1] is the current client’s unique identifier; both are passed as parameters when the script is executed.

Finally, we can execute the following command to complete the lock release operation:

redis-cli --eval unlock.script lock_key , unique_value

You may have noticed that we used a Lua script to release the lock. This is because the lock release logic includes multiple operations such as reading the lock variable, checking its value, and deleting the lock variable. When Redis executes a Lua script, it can perform these operations atomically, ensuring the atomicity of the lock release operation.
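The compare-and-delete logic that the script performs can also be sketched in Python. Here a dict stands in for Redis, and the function body plays the role of the atomic Lua script; the names are illustrative.

```python
def release_lock(store, lock_key, unique_value):
    # In Redis these two steps run inside one Lua script so that no
    # other client can slip in between the GET and the DEL; here the
    # function body models that atomic unit.
    if store.get(lock_key) == unique_value:  # GET and compare
        del store[lock_key]                  # DEL only our own lock
        return 1
    return 0                                 # someone else's lock: leave it

store = {"lock_key": "client_A:1a2b"}
print(release_lock(store, "lock_key", "client_B:zzzz"))  # 0: mistaken release avoided
print(release_lock(store, "lock_key", "client_A:1a2b")) # 1: released
```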

So far, you have learned how to use the SET command and Lua scripts to implement distributed locks in Redis on a single node. However, if this Redis instance fails and crashes, the lock variable will be lost. At this point, clients will also be unable to perform lock operations, which will affect the normal execution of the business. Therefore, when implementing distributed locks, we also need to ensure the reliability of the locks. How can we achieve this? This leads to the topic of implementing distributed locks based on multiple Redis nodes.

Implementing a highly reliable distributed lock based on multiple Redis nodes #

When we need to implement a highly reliable distributed lock, we cannot solely rely on a single command operation. Instead, we need to follow certain steps and rules for locking and unlocking operations, otherwise, the lock may fail to work. These “certain steps and rules” refer to the algorithm of distributed locks.

In order to avoid the problem of the lock not working due to Redis instance failures, the developer of Redis, Antirez, proposed the Redlock distributed lock algorithm.

The basic idea of the Redlock algorithm is to let the client and multiple independent Redis instances sequentially request locks. If the client can successfully acquire the lock with more than half of the instances, then we consider that the client has successfully obtained the distributed lock. Otherwise, the lock attempt fails. This way, even if a single Redis instance encounters a failure, the lock variable is still saved on other instances, allowing the client to perform lock operations normally without losing the lock variable.

Let’s take a closer look at the steps of executing the Redlock algorithm. The implementation of the Redlock algorithm requires N independent Redis instances. Next, we can divide the process into 3 steps to complete the locking operation.

The first step is for the client to obtain the current time.

The second step is for the client to sequentially execute the lock operation on N Redis instances.

The lock operation here is the same as the lock operation performed on a single instance, using the SET command with the NX, EX/PX options, and the unique identifier of the client. Of course, if a Redis instance encounters a failure, in order to ensure the continued operation of the Redlock algorithm under such circumstances, we need to set a timeout for the lock operation.

If the client has not acquired the lock from a Redis instance when the timeout expires, it moves on and requests the lock from the next instance. This per-instance timeout must be much smaller than the lock’s expiration time, and is typically set to a few tens of milliseconds.

The third step is for the client to calculate the total time it takes for the entire locking process once it has completed the lock operation with all Redis instances.

The client can only consider the lock acquisition successful if it meets the following two conditions:

  • Condition 1: The client successfully acquires the lock from more than half (greater than or equal to N/2+1) of the Redis instances.
  • Condition 2: The total time taken by the client to acquire the lock does not exceed the lock’s expiration time.

Once both conditions are met, we recompute the lock’s remaining validity time: the initial expiration time minus the total time the client spent acquiring the lock. If this remaining time is not enough to complete the operation on the shared data, the client can release the lock, to avoid the lock expiring before the data operation finishes.

Of course, if the client fails to simultaneously satisfy these two conditions after completing the lock operation with all instances, the client initiates a lock release operation to all Redis nodes.

In the Redlock algorithm, the lock release operation is the same as the lock release operation on a single instance, simply by executing the Lua script for releasing the lock. This way, as long as more than half of the instances (N/2+1) in the N Redis instances can function properly, the distributed lock can be guaranteed to work properly.

Therefore, in practical business applications, if you want to improve the reliability of distributed locks, you can implement them using the Redlock algorithm.

Summary #

Distributed locks are variables maintained by a shared storage system, and multiple clients can send commands to the shared storage system to perform lock or unlock operations. Redis, as a shared storage system, can be used to implement distributed locks.

When implementing distributed locks based on a single Redis instance, there are three conditions that need to be met for the lock operation.

  1. The lock operation includes three operations: reading the lock variable, checking the lock variable value, and setting the lock variable value. These operations need to be performed atomically, so we use the SET command with the NX option to implement the lock.
  2. The lock variable needs to have an expiration time set to prevent the lock from being held indefinitely in case of exceptions. Therefore, we add the EX/PX option to the SET command to set its expiration time.
  3. The value of the lock variable needs to be able to distinguish the lock operations from different clients to prevent incorrect release operations when releasing the lock. Therefore, when setting the value of the lock variable using the SET command, each client sets a unique value to identify the client.

Similar to the lock operation, the release operation involves reading the lock variable, checking its value, and deleting it. No single command covers all three steps, so we put them in a Lua script, which Redis executes atomically, to perform the release.

However, when implementing distributed locks based on a single Redis instance, the situation where the instance is abnormal or crashes may occur, which will result in the instance being unable to provide lock operations. Because of this, Redis also provides the Redlock algorithm to implement distributed locks based on multiple instances. In this way, the lock variable is maintained by multiple instances, so even if one instance fails, the lock variable still exists and clients can still perform lock operations. The Redlock algorithm is an effective solution for implementing highly reliable distributed locks, and you can use it in practical applications.

One question per lesson #

As usual, I have a little question for you. In this lesson, I mentioned that we can use the SET command with the NX and EX/PX options to perform a locking operation. So, I’d like you to think about whether we can achieve the same locking operation using the following approach:

// Locking
SETNX lock_key unique_value
EXPIRE lock_key 10
// Business logic
DO THINGS

Feel free to write down your thoughts and answers in the comments section. Let’s discuss and exchange ideas together. If you found today’s content helpful, feel free to share it with your friends or colleagues. See you in the next lesson.