06 Data Synchronization How Master-Slave Repositories Achieve Data Consistency #

In the previous two lessons, we learned about AOF and RDB. If Redis crashes, data can be recovered by replaying the log or reloading the RDB file, minimizing data loss and improving reliability.

However, even with these methods, there is still a problem of service unavailability. For example, if we only run one Redis instance in practice and it crashes, it cannot serve new data access requests during the recovery process.

So what does it mean when we say Redis is highly reliable? There are two layers of meaning: minimizing data loss and minimizing service interruptions. AOF and RDB ensure the former, while for the latter, Redis achieves it by increasing the redundancy of replicas, storing a copy of the data on multiple instances. Even if one instance fails and takes some time to recover, other instances can still provide service without affecting business usage.

Having multiple instances storing the same data sounds great, but we must consider one question: how can we keep the data consistent between these replicas? Can data read/write operations be sent to all instances?

In fact, Redis provides a master-slave replication mode to ensure the consistency of data replicas, using a read-write separation approach between the master and slave databases.

  • Read operations: can be performed on both the master and slave databases.
  • Write operations: will be executed on the master database first, and then the master database synchronizes the write operations to the slave databases.

Redis master-slave replication and read-write separation.

So, why do we adopt the read-write separation approach?

Imagine in the diagram above, if both the master and slave databases can receive write operations from clients, a straightforward problem arises: if a client modifies the same data (e.g., k1) three times, and each modification request is sent to and executed on different instances, then the replicas of this data on these three instances will be inconsistent (v1, v2, and v3 respectively). When reading this data, an outdated value may be read.

If we insist on keeping this data consistent on all three instances, it would involve locking, coordinating whether the modifications are completed between instances, and a series of other operations, which would incur significant overhead and may not be acceptable.

Once the master-slave replication mode with read-write separation is adopted, all data modifications will only be performed on the master database, without the need to coordinate among the three instances. After the master database has the latest data, it will be synchronized to the slave databases, ensuring the consistency of data between the master and slave databases.

So, how is the synchronization between master and slave databases achieved? Is the master database data transmitted to the slave database all at once or in batches? What happens if the network connection between the master and slave databases is disconnected? In this lesson, I will explain the principles of master-slave synchronization and the solutions to deal with the risk of network disconnection.

Alright, let’s first take a look at how the initial synchronization between the master and slave databases is performed; this is what happens once the master-slave replication relationship has been established between Redis instances.

How to perform the initial synchronization between master and slave databases? #

When we start multiple Redis instances, they can form a master-slave relationship with each other through the replicaof (or slaveof before Redis 5.0) command. After that, the data will be synchronized for the first time in three phases.

For example, let’s say we have instance 1 with the IP address 172.16.19.3 and instance 2 with the IP address 172.16.19.5. If we execute the following command on instance 2, it will become a slave of instance 1 and replicate the data from instance 1:

replicaof 172.16.19.3 6379

Next, we need to learn about the three phases of the initial data synchronization between master and slave databases. You can first take a look at the following diagram to get an overall understanding, and then I will explain the process in detail.

Process of the initial synchronization between master and slave databases

The first phase is the connection and sync negotiation between the master and slave databases, mainly to prepare for full replication. In this step, the slave database establishes a connection with the master database and informs the master database about the upcoming synchronization. After confirming the response from the master database, the synchronization between the master and slave databases can begin.

Specifically, the slave database sends the psync command to the master database to request synchronization. This command carries two parameters, runID and offset, based on which the master database starts replication.

  • runID: a random ID that uniquely identifies a Redis instance, generated automatically when the instance starts. During the first replication, the slave does not yet know the master’s runID, so it sets this parameter to “?”.
  • offset: set to -1 at this stage, indicating the first replication.

After receiving the psync command, the master database responds with the FULLRESYNC command, including two parameters: the runID of the master and the current replication offset. The slave database records these two parameters upon receiving the response.

One thing to note here is that the FULLRESYNC response indicates that the initial replication uses full replication, which means that the master database will replicate all the current data to the slave database.
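To make the handshake concrete, here is a minimal Python sketch (an illustration of the logic described above, not Redis source code; the function name and the placeholder runID are my own) of how a master might answer the first psync request:

```python
# Toy model of the first-phase handshake. On the first sync the slave
# sends "psync ? -1" because it does not yet know the master's runID
# or have a replication offset; the master answers FULLRESYNC with its
# own runID and current offset, which the slave then records.

def handle_first_psync(runid: str, offset: int,
                       master_runid: str, master_offset: int) -> str:
    if runid == "?" and offset == -1:
        # First replication: the master falls back to full replication.
        return f"FULLRESYNC {master_runid} {master_offset}"
    # The partial-resync path is covered later in this lesson.
    raise NotImplementedError("partial resync handled separately")

# "abc123" stands in for a real 40-character hex runID.
print(handle_first_psync("?", -1, "abc123", 0))  # FULLRESYNC abc123 0
```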

In the second phase, the master database synchronizes all the data to the slave database. After receiving the data, the slave database locally completes the data loading process. This process relies on the RDB file generated by the memory snapshot.

Specifically, the master database executes the bgsave command to generate an RDB file and then sends the file to the slave database. Upon receiving the RDB file, the slave database first clears the current database and then loads the RDB file. This is because the slave database may have saved other data before starting synchronization with the master database via the replicaof command. To avoid the impact of previous data, the slave database needs to clear the current database first.

During the data synchronization process from the master to the slave, the master database is not blocked and can still handle requests. Otherwise, the Redis service would be interrupted. However, the write operations in these requests are not recorded in the newly generated RDB file. To ensure the consistency of data between the master and slave databases, the master database uses a dedicated replication buffer in memory to record all the write operations received after the RDB file is generated.

Finally, in the third phase, the master database sends the new write commands received during the second phase to the slave database. The specific operation is that after the master database completes sending the RDB file, it sends the modification operations in the replication buffer to the slave database, and the slave database re-executes these operations. This way, the master and slave databases achieve synchronization.
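The three phases can be condensed into a small Python sketch (a simplified model with my own names, not Redis internals): a dictionary snapshot stands in for the RDB file, and writes arriving during the transfer are held in a replication buffer and replayed afterwards.

```python
# Toy model of full replication. Phase 2 ships a snapshot of the
# master's dataset; writes that arrive while the snapshot is in flight
# go into a replication buffer; phase 3 replays that buffer on the
# slave so both ends converge on the same data.

def full_sync(master_data: dict, writes_during_sync: list) -> dict:
    # Phase 2: snapshot the master's current data (stands in for the RDB file).
    snapshot = dict(master_data)
    # The slave clears its own data and loads the snapshot.
    slave_data = snapshot
    # Writes arriving during the transfer are held in the replication buffer.
    replication_buffer = list(writes_during_sync)
    # Phase 3: the slave replays the buffered write commands in order.
    for key, value in replication_buffer:
        slave_data[key] = value
    return slave_data

# The slave ends up with the snapshot plus the writes made during the sync.
print(full_sync({"k1": "v1"}, [("k2", "v2"), ("k1", "v1.1")]))
```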

Reducing the stress on the master database during full replication in master-slave cascading mode #

By analyzing the process of the initial data synchronization between the master and slave databases, you can see that during a full replication, the master database needs to complete two time-consuming operations: generating the RDB file and transmitting it.

If there are many slave databases that need to perform full replication with the master database, it will cause the master database to be occupied with forking subprocesses to generate RDB files and perform full data synchronization. This forking operation will block the main thread from processing normal requests, resulting in slower response speed to application requests from the master database. In addition, transmitting the RDB file will also consume the network bandwidth of the master database, which will further burden its resources. So, is there a good solution to alleviate the stress on the master database?

Actually, there is. It’s called the “master-slave-slave” mode.

In the master-slave mode we just discussed, all the slave databases are connected to the master database, and all the full replication is performed with the master database. Now, we can use the “master-slave-slave” mode to distribute the pressure of generating and transmitting RDB files in a cascading manner to the slave databases.

Simply put, when deploying a master-slave cluster, we can manually select a slave database (e.g., one with higher memory resources) to act as the cascading database for the other slave databases. Then, we can choose some other slave databases (e.g., one-third of the slave databases) and execute the following command on them to establish a master-slave relationship with the previously selected slave database:

replicaof <IP of the selected slave database> 6379

By doing this, these slave databases will know that they don’t need to interact with the master database during synchronization. They only need to synchronize write operations with the cascading slave database. This can reduce the stress on the master database, as shown in the diagram below:

Cascading “master-slave-slave” mode

Alright, up to this point, we have learned about the process of data synchronization between master and slave databases through full replication, as well as the way to reduce the stress on the master database using the “master-slave-slave” mode. Once the master-slave databases have completed full replication, they will maintain a continuous network connection. The master database will use this connection to propagate subsequent command operations to the slave databases, which is known as command propagation based on long connections and can avoid the overhead of frequent connection establishment.

This may sound simple, but it is important to note that there are risks in this process, most commonly network disconnection or blockage. If there is a network disconnection, the master and slave databases will not be able to propagate commands, and the data in the slave databases will not be able to stay consistent with the master database, which may result in clients reading outdated data from the slave databases.

Next, let’s discuss the solution to network disconnection.

What to do if the network between the master and slave databases breaks? #

Before Redis 2.8, if there was a network disruption between the master and slave databases during command propagation, the slave would have to perform a full replication with the master, which incurred a high cost.

Starting from Redis 2.8, if the network breaks, the master and slave databases will continue to sync using incremental replication. From the name, it’s easy to guess the difference between incremental replication and full replication: full replication synchronizes all data, while incremental replication only synchronizes the commands received by the master during the network disconnection period to the slave.

So, how do the master and slave databases stay in sync during incremental replication? The key lies in the repl_backlog_buffer. Let’s see how it is used for incremental command synchronization.

After the master and slave databases disconnect, the master writes the write commands it receives during the disconnection to the replication buffer, and it also appends these commands to the repl_backlog_buffer.

The repl_backlog_buffer is a circular buffer, and the master will record its own write position, while the slave will record its own read position.

Initially, the write position of the master and the read position of the slave are both at the starting point of the buffer. As the master receives new write commands, its write position gradually moves away from the starting point. The master records this offset in master_repl_offset: the more write commands it receives, the larger master_repl_offset becomes.

Similarly, as the slave replicates the write commands it receives, its read position also moves away from the starting point, and the slave records this offset in slave_repl_offset, which likewise keeps increasing. Normally, these two offsets are roughly equal.

Use of Redis repl_backlog_buffer

After the connection between the master and slave databases is restored, the slave will first send the psync command to the master and send its current slave_repl_offset to the master. The master will determine the difference between its master_repl_offset and the slave_repl_offset.

During the network disconnection stage, the master may receive new write commands, so generally, the master_repl_offset will be greater than the slave_repl_offset. At this point, the master only needs to synchronize the commands between the master_repl_offset and slave_repl_offset to the slave.
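This offset bookkeeping can be sketched as a ring buffer in Python (a toy model under hypothetical names, not the actual implementation): the logical offset grows without bound while the physical slot wraps around, and a slave that has fallen too far behind can no longer be served incrementally.

```python
# Simplified sketch of repl_backlog_buffer as a circular buffer.
# master_repl_offset counts every operation ever written; the slot an
# operation lands in is that offset modulo the buffer size, so old
# entries are overwritten once the buffer wraps.

class ReplBacklog:
    def __init__(self, size: int):
        self.buf = [None] * size
        self.master_repl_offset = 0   # total operations ever written

    def append(self, op):
        self.buf[self.master_repl_offset % len(self.buf)] = op
        self.master_repl_offset += 1

    def read_from(self, slave_repl_offset: int):
        # Return the commands the slave still misses, if none were overwritten.
        if self.master_repl_offset - slave_repl_offset > len(self.buf):
            return None  # gap too large: a full replication would be needed
        return [self.buf[i % len(self.buf)]
                for i in range(slave_repl_offset, self.master_repl_offset)]

backlog = ReplBacklog(size=4)
for op in ["set k1 v1", "set k2 v2", "set k3 v3"]:
    backlog.append(op)
# The slave has replicated 1 operation, so it catches up with the other 2.
print(backlog.read_from(1))  # ['set k2 v2', 'set k3 v3']
```

The None branch is exactly the risk discussed below: once the master overwrites unread entries, incremental replication is no longer possible.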

Just like the middle part of the previous diagram, the master and the slave have two different operations, put d e and put d f. During incremental replication, the master only needs to synchronize these operations to the slave.

Speaking of this, let’s take a look at the process of incremental replication with the help of a diagram.

Redis incremental replication process

However, there is one thing that I want to emphasize. Since repl_backlog_buffer is a circular buffer, when it becomes full, the master will continue to write to it, which will overwrite the previously written operations. If the reading speed of the slave is slower, it is likely that the operations that the slave has not yet read will be overwritten by the new operations written by the master, resulting in data inconsistency between the master and slave databases.

Therefore, we need a way to avoid this situation. Generally, we can adjust the repl_backlog_size parameter, which controls the size of this buffer. The required buffer space can be estimated as: buffer space = (master write speed - master-to-slave network transfer speed) * operation size. In practice, to absorb sudden bursts of requests, we usually double this value, that is, repl_backlog_size = required buffer space * 2.

For example, if the master writes 2000 operations per second, each operation size is 2KB, and the network can transmit 1000 operations per second, then there are 1000 operations that need to be buffered, which requires at least 2MB of buffer space. Otherwise, newly written commands will overwrite old operations. To cope with possible burst pressure, we finally set the repl_backlog_size to 4MB.
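The arithmetic above is easy to wrap in a helper (a back-of-the-envelope sketch; the function and parameter names are my own, and real tuning should also account for how long a disconnection may last):

```python
# Sizing the replication backlog: the buffer must absorb the gap
# between how fast the master writes and how fast the network ships
# those writes, with a safety factor for bursts.

def repl_backlog_size_bytes(write_ops_per_sec: int,
                            transfer_ops_per_sec: int,
                            op_size_bytes: int,
                            safety_factor: int = 2) -> int:
    buffered_ops = write_ops_per_sec - transfer_ops_per_sec
    return buffered_ops * op_size_bytes * safety_factor

# 2000 writes/s, 1000 ops/s over the network, 2KB per operation:
# 1000 buffered ops * 2KB = about 2MB needed, doubled for headroom.
size = repl_backlog_size_bytes(2000, 1000, 2 * 1024)
print(size // 1024, "KB")  # 4000 KB, i.e. roughly 4MB
```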

In this way, the risk of data inconsistency between the master and slave during incremental replication is reduced. However, if the concurrency is very high and even twice the buffer space cannot accommodate new operation requests, the data between the master and slave databases may still be inconsistent.

In this case, on the one hand, you can increase the repl_backlog_size value according to the memory resources of the server where Redis is located, for example, set it to four times the buffer space size. On the other hand, you can consider using sharded clusters to distribute the request pressure of a single master. I will give a detailed introduction to sharded clusters in Lesson 9.

Conclusion #

In this lesson, we have learned the basic principles of master-slave synchronization in Redis. In summary, there are three modes: full replication, command propagation based on long connections, and incremental replication.

Although full replication takes time, it is unavoidable for a slave’s initial sync. Therefore, I have a small suggestion for you: keep the size of a Redis instance moderate; a few gigabytes is appropriate. This reduces the overhead of generating, transmitting, and reloading the RDB file. Additionally, to avoid multiple slaves performing full replication with the master at the same time and overloading it, we can use the cascading “master-slave-slave” mode to relieve the pressure on the master.

Long-connection command propagation is the regular synchronization phase after the initial replication completes; in this phase, the master forwards subsequent write commands to the slaves over the persistent connection. If the network breaks during this phase, incremental replication comes into play. I especially recommend paying attention to the repl_backlog_size parameter: if it is configured too small, the slave’s replication progress may fall so far behind the master during incremental replication that it is forced into a full replication. Increasing this parameter reduces the risk of a full replication on the slave after a network disconnection.

However, while using the master-slave mode with read-write separation avoids data inconsistency caused by writing to multiple instances simultaneously, it still faces the potential risk of master failure. What should be done if the master fails? Can the data remain consistent and can Redis continue to provide services? In the next two lessons, I will discuss with you specific solutions to ensure service reliability after master failure.

One Question per Lesson #

As usual, I have a small question for you. In this lesson, I mentioned that RDB files are used for data replication and synchronization between master and slave databases. As we learned before, AOF records more complete operation commands and loses less data compared to RDB. So, why don’t we use AOF for replication between master and slave databases?

Alright, that’s all for this lesson. If you found it helpful, feel free to share today’s content with your friends.