29 How to Properly Implement Circular Buffers

29 How to Properly Implement Circular Buffers #

Starting today, we will enter the final module of this course, which is the “Programming Techniques Module”. Redis, as a widely used system, not only provides us with valuable knowledge in terms of its own functionality and performance optimization, but also offers programming techniques within its source code worth learning and mastering.

In this module, I will guide you through the implementation and design of Redis in areas such as circular buffer, monitoring, and feature extension modules. The development of these functionalities is crucial for backend system software.

Now, in today’s lesson, I will introduce you to the implementation of circular buffer in Redis.

When developing backend data systems, we often face the challenge of data synchronization. In order to address this issue, the design and implementation of a buffer must be considered. Circular buffer is a commonly used technique in buffer development. Therefore, learning the content of this lesson enables us to understand how to implement a circular buffer, especially focusing on the solution to implementation difficulties. This knowledge will provide us with a reference implementation when developing our own data synchronization.

Okay, let’s now take a look at how a circular buffer works. With this knowledge, we will be able to better understand and master the code implementation in Redis.

How does a circular buffer work? #

In the backend data systems, to ensure data reliability, we often use a master-slave cluster to synchronize data between the master and the slave nodes. Generally, the master node first performs a full synchronization with the slave nodes, transferring all data at a certain moment to the slave nodes.

Then, the master node continues to send commands received to the slave nodes. During this sending process, if the network connection between the slave and master nodes is disconnected, the master node needs to cache the commands intended for the slave node. This way, when the network connection between the slave and master nodes is restored, the master node can resend the cached commands to the slave node. In this scenario, we need to use a buffer to temporarily store the commands received by the master node.

Now, when designing a buffer in a backend system, we face a problem: What should we do when the buffer space is not enough?

An easily conceivable solution is to dynamically allocate buffer space when the buffer is not enough. However, dynamic buffer allocation incurs certain overhead and if space is allocated whenever it is not enough, it can result in continuous growth of the buffer space, occupying excessive memory resources.

In reality, the data in the buffer can be deleted after being sent to the slave node, freeing up new space. In other words, the buffer space can be reused, as shown in the figure below:

Therefore, the implementation of a circular buffer uses the design principle of reusing buffer space, allowing the space to be cyclically written. In terms of its working mechanism, it has two characteristics:

A circular buffer has a write pointer that represents the current write position of the master node in the buffer. If the write pointer already points to the end of the buffer, when the master node writes data at this point, the write pointer will be reset to the beginning of the buffer, and data will be written from the beginning, reusing the buffer space.
A circular buffer also has one or more read pointers, representing the current read positions of different slave nodes in the buffer. When the read pointer points to the end of the buffer, the slave node will reset the read pointer to the beginning of the buffer and continue reading data from the beginning.

The following figure illustrates the working mechanism of the write pointer in a circular buffer, and you can have a look, the working mechanism of the read pointer is similar to this.

Now that we understand the working principle of a circular buffer, let’s see how it is implemented in Redis.

How to implement a circular buffer in Redis? #

In the implementation of Redis master-slave replication, a circular buffer is used to temporarily store commands received by the master node.

Before diving into the specific implementation, let’s understand that in Redis replication, the master node accumulates the total length of the commands it receives for replication. This total length is called the global replication offset (referred to as the global offset).

In the source code, the global offset corresponds to the master_repl_offset member variable of the global variable server. And when the slave node reads commands from the master node, it also records the cumulative position of the commands it has read. This position is called the global read position.

To help you understand, let me give you an example. Suppose the master node receives three commands, each with a length of 16 bytes. At this point, the global offset is 48. Now, if a slave node reads one command from the master node, the global read position of the slave node is 16.

Since the global offset and the global read position will be repeatedly used in the implementation of the circular buffer that I will explain next, it is important for you to understand their meanings.

Now, let’s take a look at the data structure of the circular buffer itself.

Data Structure and Initialization of the Circular Buffer #

In the Redis global variable server structure redisServer, the data structure of the circular buffer is included, as shown below:

struct redisServer {
  …
  char *repl_backlog;             // Circular buffer based on character array
  long long repl_backlog_size;    // Total length of the circular buffer
  long long repl_backlog_histlen; // Cumulative length of data in the circular buffer
  long long repl_backlog_idx;     // Write pointer position in the circular buffer
  long long repl_backlog_off;     // Offset of the earliest saved data byte in the circular buffer within the global scope
  …
}

From the code, you can see that the circular buffer itself is designed as a character array repl_backlog. Redis also introduces the following variables to describe the state of the circular buffer, including:

repl_backlog_size: The value of this variable records the total length of the circular buffer. This value also corresponds to the repl-backlog-size configuration option in the redis.conf configuration file.
repl_backlog_histlen: The value of this variable records the current cumulative length of the data in the circular buffer. This value does not exceed the length of the buffer.
repl_backlog_idx: The value of this variable records the position where data should be written to the circular buffer next. It corresponds to the write pointer in the circular buffer mentioned earlier.
repl_backlog_off: The value of this variable records the offset of the earliest saved data byte within the circular buffer within the global scope. It is important to note that since the circular buffer is continuously reused, once the buffer is full and new data is written from the beginning, the old data in the buffer will be overwritten. Therefore, this value records the offset of the earliest written data byte that is still retained in the buffer within the global scope.

These variables are repeatedly used in the reading and writing processes of the circular buffer, so it is important for you to understand their meanings, as it will help you grasp the implementation of the circular buffer.

Next, let’s take a look at the creation and read-write process of the circular buffer. Since the circular buffer is used in the process of master-slave replication, the relevant operations are also implemented in the replication.c file.

First, let’s take a look at the creation function createReplicationBacklog of the circular buffer. The logic of this function is relatively simple. It reads the size of the circular buffer configuration option repl-backlog-size from the configuration file, and then allocates space for the circular buffer based on this value.

Next, this function initializes repl_backlog_histlen to 0, indicating that there is currently no data written. At the same time, it initializes repl_backlog_idx to 0, indicating that data can be written to the beginning of the buffer. In addition, the createReplicationBacklog function sets repl_backlog_off to the value of master_repl_offset + 1.

The initialization code is as follows:

void createReplicationBacklog(void) {
  serverAssert(server.repl_backlog == NULL);
  server.repl_backlog = zmalloc(server.repl_backlog_size);
  server.repl_backlog_histlen = 0;
  server.repl_backlog_idx = 0;
  server.repl_backlog_off = server.master_repl_offset + 1;
}

Here, it is important to note that Redis calls the createReplicationBacklog function to create the circular buffer in the syncCommand function. The relevant code snippet is as follows:

void syncCommand(client *c) {
  …
  if (listLength(server.slaves) == 1 && server.repl_backlog == NULL) {
    …
    createReplicationBacklog();
  }
  …
}

From the code, you can see that the condition for Redis to create the circular buffer is that there is no circular buffer currently available and there is only one slave node at present. This means that when a master node has multiple slave nodes, these slave nodes will actually share a single circular buffer. This design is mainly to avoid allocating a buffer for each slave node, which would waste memory resources. Alright, now that we understand the data structure and initialization operations of circular buffer, let’s take a look at its read and write operations separately.

Write Operation of Circular Buffer #

First, let’s understand the write operation of the circular buffer, which is implemented by the function feedReplicationBacklog. The prototype of this function is as follows:

void feedReplicationBacklog(void *ptr, size_t len)

The feedReplicationBacklog function takes a pointer ptr as its parameter, which points to the data to be written to the buffer, and an integer len, which represents the length of the data to be written.

The write operation of the circular buffer can be divided into three steps.

Step 1: The feedReplicationBacklog function updates the value of the global variable server.master_repl_offset by adding the length of the data to be written(len) to the current value. It looks like this:

server.master_repl_offset += len;

Step 2: The feedReplicationBacklog function goes through a loop based on the parameter len, which continues until all the data to be written is written into the circular buffer. And this loop operation can be further divided into three sub-steps.

First, calculate the length of data that can be written in this round.

The feedReplicationBacklog function calculates the remaining space length thislen of the circular buffer. If the remaining space is greater than the length of the data to be written, it sets thislen to the actual length of the data to be written. The code looks like this:

while (len) {
  size_t thislen = server.repl_backlog_size - server.repl_backlog_idx;
  if (thislen > len) thislen = len;
  ...
}

Next, write the data to the buffer.

Based on the value of the thislen variable calculated in the first step, the function calls the memcpy function to write the data to be written into the circular buffer. The position to write is pointed by repl_backlog_idx, and the length to write is thislen. The code for this step looks like this:

memcpy(server.repl_backlog+server.repl_backlog_idx,p,thislen);

By combining the operations of the first two steps, you can see that the feedReplicationBacklog function, when writing data, writes the data according to the actual length if it is smaller than the remaining length of the buffer. Otherwise, it writes the data according to the remaining space. This means that the feedReplicationBacklog function tries to write as much data as possible in each round, with the maximum amount of data written in each round being the total length of the buffer.

Alright, at this point, the circular buffer has been written with data once, and then comes the final third step of this round.

Step 3: Finally, the feedReplicationBacklog function updates the status information of the circular buffer at the end of each round, including repl_backlog_idx and repl_backlog_histlen.

As for repl_backlog_idx, it increases by the length of the data that was just written (thislen). However, because the total size of the buffer repl_backlog_size is fixed, if the value of repl_backlog_idx equals the value of repl_backlog_size, repl_backlog_idx is set to 0. This indicates that the circular buffer is full at this point. The write pointer will point to the beginning of the circular buffer, and writing starts again from the beginning. The logic of this part of the code looks like this:

while (len) {
  ...
  server.repl_backlog_idx += thislen;
  if (server.repl_backlog_idx == server.repl_backlog_size)
    server.repl_backlog_idx = 0;
  ...
}

As for repl_backlog_histlen, at the end of each round of the loop, it is added with the length of the data that was just written (thislen). In addition, the feedReplicationBacklog function also updates the remaining length of the data to be written and the pointer position of the data to be written. The code for these steps looks like this:

void feedReplicationBacklog() {
    // ...省略部分代码...

    if (len == 0) return;  //待写入数据长度为0，直接返回
    while(len) {
       ...  //循环体代码
    }

    // ...省略部分代码...
}

写操作结束后，我们来看一下循环缓冲区的读取操作。读取操作发生在主节点接收到连接正常的从节点发送的 sync 命令时。

首先，我们来看下 feedReplicationBacklog 函数内部的代码。在代码中，检查了待写入数据长度 len 是否为0，如果为0则直接返回。之后是一个 while 循环，用于处理读取操作。

在读取操作的 while 循环内部，具体执行的操作是：首先，计算出待读取数据长度 thislen，即主节点需要从循环缓冲区中读取的数据长度，计算公式为 min(len, server.repl_backlog_size - (repl_backlog_idx - server.repl_backlog_off))。其中，len 是待读取数据的剩余长度，server.repl_backlog_size 是循环缓冲区的总长度，repl_backlog_idx 是循环缓冲区中下一个要写入位置的索引，server.repl_backlog_off 是全局复制偏移量减去循环缓冲区的历史长度再加1。

接下来，通过 memcpy 函数将数据从循环缓冲区中拷贝到发送数据的缓冲区中，然后更新相关的状态值 len、p 和 server.repl_backlog_histlen。

最后，判断一下读取操作是否已经结束：即检查待读取数据长度 len 是否为0。如果是，则说明所有数据已经读取完成，退出 while 循环。

下面是具体的代码实现：

unsigned char* p = buf;

if (len == 0) return;

while(len) {
    /* If this will overflow the buffer, write till the end, and rotate. */
    size_t thislen = (len > (server.repl_backlog_size - (repl_backlog_idx - server.repl_backlog_off))) ?
                     (server.repl_backlog_size - (repl_backlog_idx - server.repl_backlog_off)) : len;
    
    memcpy(p, server.repl_backlog + repl_backlog_idx, thislen);

    len -= thislen;
    p += thislen;
    server.repl_backlog_histlen += thislen;

    /* Rotate if we reached the end of the fixed buffer. */
    if (repl_backlog_idx + thislen == server.repl_backlog_size)
        repl_backlog_idx = 0;
    else
        repl_backlog_idx += thislen;
}

这样，feedReplicationBacklog 函数的读取操作就完成了。需要注意的是，循环缓冲区的读取操作是从缓冲区的前部开始读取，然后逐渐后移，再次读取，直到所有待读取的数据都读取完毕。

接下来的内容将会进一步介绍循环缓冲区相关的数据结构和主从复制过程中的一些逻辑。请继续阅读。 In Redis, when a slave node reconnects to the master node after a disconnection, the slave node sends a PSYNC command to the master node, which includes the slave node’s global replication offset, as shown below:

PSYNC <runid> <offset>

After receiving the PSYNC command, the master node processes it in the masterTryPartialResynchronization function. This function includes a call to the addReplyReplicationBacklog function, which reads data from the circular buffer. So, by examining the implementation of the addReplyReplicationBacklog function, we can understand how the circular buffer is read.

The execution logic of the addReplyReplicationBacklog function can be divided into three steps.

First, it subtracts the value of repl_backlog_off from the global replication offset sent by the slave node to obtain the length of data to be skipped by the slave node when reading data. It is calculated as follows:

skip = offset - server.repl_backlog_off;

As I mentioned earlier, repl_backlog_off represents the offset of the earliest data still present in the buffer, globally. The global replication offset of the slave node may not be consistent with repl_backlog_off, so subtracting the two gives us the length of data to be skipped by the slave node.

Next, the addReplyReplicationBacklog function calculates the position in the buffer corresponding to the first byte of the earliest saved data in the buffer. This position is important because it allows the slave node to convert the global replication offset to a corresponding position in the buffer. The calculation of this position is as follows:

j = (server.repl_backlog_idx + (server.repl_backlog_size-server.repl_backlog_histlen)) % server.repl_backlog_size;

Actually, we can understand this calculation code in two situations.

First, when the buffer is not full. In this case, repl_backlog_histlen is equal to repl_backlog_idx, so the calculation is equivalent to taking repl_backlog_size modulo itself, which yields 0. This means that when the buffer is not full, the first byte of the earliest saved data in the buffer is at the beginning of the buffer, because the buffer has not been overwritten. You can refer to the diagram below.

Second, when the buffer has been filled and data has been rewritten from the beginning. In this case, repl_backlog_histlen is equal to the total length of the buffer, repl_backlog_size. So, the calculation is equivalent to taking repl_backlog_idx modulo repl_backlog_size, which yields the value of repl_backlog_idx.

This is easy to understand because repl_backlog_idx points to the position for the next write, and when the buffer is filled, this position contains data. Once the buffer is overwritten, this position will be overwritten as well. You can refer to the diagram below.

After the calculation of the position in the buffer corresponding to the first byte of the earliest saved data, the addReplyReplicationBacklog function adds the length of data to be skipped by the slave node to adjust the position, taking into account that this adjusted position may exceed the boundary of the buffer length. Therefore, it takes repl_backlog_size modulo, resulting in the corresponding position of the slave node’s global replication offset in the buffer:

j = (j + skip) % server.repl_backlog_size;

Now, we know where the slave node should start reading data from the buffer.

Finally, the addReplyReplicationBacklog function calculates the actual length of data to be read by subtracting the length of data to be skipped by the slave node from the actual length of data in the buffer:

len = server.repl_backlog_histlen - skip;

Then, the addReplyReplicationBacklog function enters a loop to actually read the data. The reason for using a loop to read data is that in a circular buffer, the slave node may start reading from a position and continue until the end of the buffer, without finishing. In that case, the slave node needs to read from the beginning of the buffer again, which requires two read operations.

The code below shows this loop process:

while(len) {
   long long thislen = ((server.repl_backlog_size - j) < len) ?
            (server.repl_backlog_size - j) : len;
   ...
   addReplySds(c,sdsnewlen(server.repl_backlog + j, thislen)); // reads and returns the data
   len -= thislen;
   j = 0;
}

As you can see, if the length from the position to the end of the buffer (repl_backlog_size - j) is less than the length to be read (len), it means that the slave node needs to continue reading from the beginning of the buffer. In this case, the function reads from the position to the end of the buffer (server.repl_backlog_size - j).

When the length from the starting position to the end of the buffer (repl_backlog_size - j) is greater than the length to be read (len), the function directly reads the required length.

During this process, the addReplyReplicationBacklog function calls the addReplySds function to return the read data.

So, this is the entire process of reading from and writing to the circular buffer. From this process, you can see that the key to understanding the read and write operations on the circular buffer is to understand the buffer length, the next write position, the corresponding positions of the earliest saved data in both the global and buffer contexts, and the calculation of the buffer position corresponding to the slave node’s global replication offset position.

Summary #

That’s all for today’s lesson. Let’s summarize.

In today’s class, I introduced you to the working principle of circular buffers and how Redis implements circular buffers in master-slave replication. From a working principle perspective, circular buffers don’t seem complicated. When the buffer is full, the program starts writing data from the beginning again. And when the program has read to the end of the buffer, it continues to read from the beginning.

However, I want to remind you of the two challenges you may face when implementing circular buffers.

First, the accumulated length of data to be sent may exceed the buffer length, so old data may be overwritten. When writing data, how do we know the position in the buffer that corresponds to the first byte of the earliest saved data and its position in the global scope? This is also the earliest data that can be read by the slave node. The Redis source code uses the repl_backlog_off variable to record this position, and you need to understand how this value is calculated and used.

Second, when reading data, how do we know the corresponding position of the global read position sent by the slave node in the circular buffer? Because the program can only read data from the buffer when it has this position value. In the Redis source code, the position value is calculated in two steps in the addReplyReplicationBacklog function.

First, it calculates the position of the first byte of the earliest saved data in the buffer; then, based on this position, it adds the length of data that the slave node needs to skip, resulting in the global read position of the slave node in the buffer.

In fact, circular buffers are widely used in data synchronization, communication, and other scenarios. I hope you can understand and master the implementation of circular buffers in Redis source code, especially the implementation methods for these two challenges. This way, it will be easier for you to implement circular buffers on your own. 根据代码逻辑，skip 是 offset - server.repl_backlog_off 的计算结果，而 len 是 server.repl_backlog_histlen - skip 的计算结果。

在这里的情况下，如果 offset 大于 server.repl_backlog_off，那么 skip 将为正数，可能会导致 skip 大于 repl_backlog_histlen。但是，如果 offset 小于或等于 server.repl_backlog_off，那么 skip 将为负数或者零，不可能大于 repl_backlog_histlen。

因此，不排除 skip 可能大于 repl_backlog_histlen 的情况，这取决于 offset 和 server.repl_backlog_off 的值。