20 Where Are New Writes Recorded During Aof Rewrite

20 Where Are New Writes Recorded During AOF Rewrite #

In the previous lesson, I introduced you to the process of AOF rewriting and highlighted the triggers and basic execution flow of AOF rewriting. Now you know that AOF rewriting is performed by the rewriting child process.

However, at the end of the previous lesson, I also mentioned that during AOF rewriting, the main process is still receiving write operations from clients. So, are these new write operations recorded in the AOF rewrite log? If so, how does the rewriting child process obtain these write operations from the main process?

In today’s lesson, I will explain the pipeline mechanism used in the AOF rewriting process and the interaction between the main process and the rewriting child process. This will help you understand the extent to which the AOF rewrite log contains write operations. It will also give you an understanding of pipeline technology, as the AOF rewriting child process communicates with the Redis main process using the pipeline mechanism provided by the operating system. After completing this lesson, you will also be able to grasp pipeline technology for inter-process communication.

Alright, let’s now dive into understanding the pipeline mechanism.

How to Use Pipes for Interprocess Communication between Parent and Child Processes? #

First, we need to understand that when process A creates a child process B by calling the fork function, and process A and B need to communicate, we typically rely on the communication mechanism provided by the operating system. The pipe is a commonly used mechanism for interprocess communication between parent and child processes.

Specifically, the pipe mechanism creates a buffer in the operating system kernel. The parent process A can open the pipe and write data into this buffer. Similarly, the child process B can also open the pipe and read data from this buffer. It’s worth noting that when a process writes data into the pipe, it can only append the data to the current tail of the buffer. And when a process reads data from the pipe, it can only read from the head of the buffer.

In fact, the buffer created by the pipe is like a first-in-first-out (FIFO) queue. The process that writes data writes to the tail of the queue, while the process that reads data reads from the head of the queue. The following diagram illustrates the process of using pipes for data communication between two processes:

Now that we have learned about the basic functionality of pipes, let’s look at a key point to note when using pipes. Data in the pipe can only flow in one direction at a time. This means that if the parent process A writes data into the pipe, the child process B can only read data from the pipe. Similarly, if the child process B writes data into the pipe, the parent process A can only read data from the pipe. If simultaneous data transmission and communication are required between the parent and child processes, we need to create two pipes.

Next, let’s see how to implement pipe communication with code. This is related to the system call pipe which creates the pipe. The function prototype of pipe is as follows:

int pipe(int pipefd[2]);

As you can see, the parameter of pipe is an array pipefd, representing the file descriptors of the pipe. This is because when a process writes or reads data from the pipe, it actually uses the write or read functions, which require file descriptors to perform write and read operations. The array pipefd has two elements, pipefd[0] and pipefd[1], corresponding to the read and write descriptors of the pipe, respectively. This means that when a process needs to read data from the pipe, it needs to use pipefd[0], and when it needs to write data into the pipe, it uses pipefd[1].

Here’s an example code that demonstrates how parent and child processes use pipes for communication:

int main() 
{ 
    int fd[2], nr = 0, nw = 0; 
    char buf[128]; 
    pipe(fd); 

    pid_t pid = fork(); 

    if (pid == 0) {
        // Child process calls read to read data from the descriptor fd[0]
        printf("Child process waiting for a message\n"); 
        nr = read(fd[0], buf, sizeof(buf)); 
        printf("Child process received: %s\n", buf);
    }
    else { 
        // Parent process calls write to write data into the descriptor fd[1]
        printf("Parent process sending a message\n"); 
        strcpy(buf, "Hello from parent"); 
        nw = write(fd[1], buf, sizeof(buf)); 
        printf("Parent process sent %d bytes to the child.\n", nw); 
    } 

    return 0; 
}

From the code, you can see that before the parent and child processes communicate with each other using pipes, we need to define an array fd to store the read and write descriptors in the code. Then, we call the pipe function to create a pipe and pass the fd array as a parameter to the pipe function. Next, in the code of the parent process, the parent process calls the write function to write data into the pipe descriptor fd[1], while in the child process, the child process calls the read function to read data from the pipe descriptor fd[0].

To help you better understand, I’ve also created a diagram for you to refer to:

Now that you understand how to use pipes for communication between parent and child processes, let’s take a look at how the child process communicates with the main process (its parent process) using pipes during the AOF rewrite process.

How does AOF rewrite child processes interact with the parent process using pipes? #

Let’s first take a look at how many pipes are created during AOF rewrite process.

In fact, this is done by calling the aofCreatePipes function in the process of executing the AOF rewrite function rewriteAppendOnlyFileBackground, as shown below:

int rewriteAppendOnlyFileBackground(void) {
...
if (aofCreatePipes() != C_OK) return C_ERR;
...
}

The aofCreatePipes function is implemented in the aof.c file. Its logic is relatively simple and can be divided into three steps.

In the first step, the aofCreatePipes function creates an array fds containing six file descriptor elements. As I just mentioned, each pipe corresponds to two file descriptors. Therefore, the fds array actually corresponds to the three pipes used in the AOF rewrite process. Then, the aofCreatePipes function calls the pipe system call function to create three pipes.

The code snippet for this part is shown below for your reference.

int aofCreatePipes(void) {
  int fds[6] = {-1, -1, -1, -1, -1, -1};
  int j;
  if (pipe(fds) == -1) goto error; /* parent -> children data. */
  if (pipe(fds+2) == -1) goto error; /* children -> parent ack. */
  if (pipe(fds+4) == -1) goto error;
  
}

In the second step, the aofCreatePipes function calls the anetNonBlock function (defined in anet.c) to set the first and second descriptors (fds[0] and fds[1]) of the fds array corresponding to the pipe as non-blocking. Then, the aofCreatePipes function calls the aeCreateFileEvent function to register the read event listener on the third descriptor (fds[2]) of the fds array. The corresponding callback function is aofChildPipeReadable. The aofChildPipeReadable function is also implemented in the aof.c file, and I will explain it in detail later.

int aofCreatePipes(void) {
...
if (anetNonBlock(NULL,fds[0]) != ANET_OK) goto error;
if (anetNonBlock(NULL,fds[1]) != ANET_OK) goto error;
if (aeCreateFileEvent(server.el, fds[2], AE_READABLE, aofChildPipeReadable, NULL) == AE_ERR) goto error;
...
}

After completing the pipe creation, setting, and read event registration, in the final step, the aofCreatePipes function copies the six file descriptors in the fds array to the member variables of the server variable as follows:

int aofCreatePipes(void) {
...
server.aof_pipe_write_data_to_child = fds[1];
server.aof_pipe_read_data_from_parent = fds[0];
server.aof_pipe_write_ack_to_parent = fds[3];
server.aof_pipe_read_ack_from_child = fds[2];
server.aof_pipe_write_ack_to_child = fds[5];
server.aof_pipe_read_ack_from_parent = fds[4];
...
}

In this step, we can see the three pipes created by the aofCreatePipes function and their respective uses from the member variable names of the server variable.

  • fds[0] and fds[1]: Correspond to the pipes used for passing operation commands between the main process and the rewrite child process. They correspond to the read and write descriptors, respectively.
  • fds[2] and fds[3]: Correspond to the pipes used for the rewrite child process to send ACK messages to the parent process. They correspond to the read and write descriptors, respectively.
  • fds[4] and fds[5]: Correspond to the pipes used for the parent process to send ACK messages to the rewrite child process. They correspond to the read and write descriptors, respectively.

The following diagram also shows the basic execution flow of the aofCreatePipes function for reference.

Now that we understand the number and purpose of the pipes used in the AOF rewriting process, let’s take a closer look at how these pipes are used.

Usage of the Operation Command Transmission Pipe #

In fact, when the AOF rewrite child process is running, the main process will continue to receive and process write requests from clients. These write operations will be written to the AOF log file by the main process as usual, and this process is completed by the feedAppendOnlyFile function in aof.c.

In the last step of the feedAppendOnlyFile function, it checks whether there is an AOF rewrite child process running. If so, it calls the aofRewriteBufferAppend function (in aof.c) as shown below:

if (server.aof_child_pid != -1)
        aofRewriteBufferAppend((unsigned char*)buf,sdslen(buf));

The aofRewriteBufferAppend function appends the parameter buf to the server global variable’s aof_rewrite_buf_blocks list.

Here, you need to note that the parameter buf is a byte array. The feedAppendOnlyFile function writes the received command operations from the main process into buf. Each element in the aof_rewrite_buf_blocks list is of type aofrwblock, which includes a byte array with a size of AOF_RW_BUF_BLOCK_SIZE, where the default value is 10MB. In addition, the aofrwblock structure records the used and free space of the byte array.

The following code shows the definition of the aofrwblock structure for reference:

typedef struct aofrwblock {
    unsigned long used, free; // Used and free space of the buf array
    char buf[AOF_RW_BUF_BLOCK_SIZE]; // Macro-defined, default is 10MB
} aofrwblock;

In this way, the aofrwblock structure represents a 10MB data block that records the commands received by the main process during AOF rewriting, and the aof_rewrite_buf_blocks list is responsible for concatenating these data blocks. When the aofRewriteBufferAppend function is executed, it takes out an aofrwblock data block from the aof_rewrite_buf_blocks list to record the command operations.

Of course, if the current data block is not large enough to store the command operations recorded in the buf parameter, the aofRewriteBufferAppend function will allocate another aofrwblock data block.

After the aofRewriteBufferAppend function records the command operations in the aof_rewrite_buf_blocks list, it also checks whether the write event is registered on the aof_pipe_write_data_to_child pipe descriptor. This pipe descriptor corresponds to the fds[1] I mentioned earlier.

If there is no registered write event, the aofRewriteBufferAppend function calls the aeCreateFileEvent function to register a write event that listens to the aof_pipe_write_data_to_child pipe descriptor, which is the pipe for operation command transmission between the main process and the rewrite child process.

When this pipe is ready to write data, the callback function aofChildWriteDiffData (in aof.c) corresponding to the write event will be called. You can refer to the following code for this process:

void aofRewriteBufferAppend(unsigned char *s, unsigned long len) {
...
// Check if there is an event on the aof_pipe_write_data_to_child descriptor
if (aeGetFileEvents(server.el,server.aof_pipe_write_data_to_child) == 0) {
     // If no event is registered, register a write event with the callback function aofChildWriteDiffData
     aeCreateFileEvent(server.el, server.aof_pipe_write_data_to_child,
            AE_WRITABLE, aofChildWriteDiffData, NULL);
}
...}

Actually, the main purpose of the aofChildWriteDiffData function that I introduced earlier as the write event callback function is to retrieve data blocks one by one from the aof_rewrite_buf_blocks list. Then, it sends the command operations from the data block to the rewrite child process through the aof_pipe_write_data_to_child pipe descriptor. The process is illustrated below:

void aofChildWriteDiffData(aeEventLoop *el, int fd, void *privdata, int mask) {
    ...
    while(1) {
        // Retrieve data block from the `aof_rewrite_buf_blocks` list
        ln = listFirst(server.aof_rewrite_buf_blocks);
        block = ln ? ln->value : NULL;
        if (block->used > 0) {
            // Call write to write the data block into the pipe between the main process and the rewrite child process
            nwritten = write(server.aof_pipe_write_data_to_child, block->buf, block->used);
            if (nwritten <= 0) return;
            ...
        }
    ...
}

With this, you understand that the main process actually writes the received command operations into data blocks in the aof_rewrite_buf_blocks list when it is normally recording AOF logs. Then, it uses the aofChildWriteDiffData function to send the recorded command operations to the child process through the pipe between the main process and the rewrite child process.

The following diagram also illustrates this process, you can review it again.

Process Diagram

Next, let’s see how the rewrite child process reads the command operations sent by the parent process from the pipe.

This is actually done by the aofReadDiffFromParent function (in the aof.c file). This function uses a buffer of size 64KB and calls the read function to read the data from the pipe between the parent process and the rewrite child process. The following code also shows the basic execution flow of the aofReadDiffFromParent function.

ssize_t aofReadDiffFromParent(void) {
    char buf[65536]; // Default buffer size of the pipe
    ssize_t nread, total = 0;
    // Read data from `aof_pipe_read_data_from_parent`
    while ((nread = read(server.aof_pipe_read_data_from_parent, buf, sizeof(buf))) > 0) {
        server.aof_child_diff = sdscatlen(server.aof_child_diff, buf, nread);
        total += nread;
    }
    return total;
}

So, from the code, you can see that the aofReadDiffFromParent function reads data from the aof_pipe_read_data_from_parent descriptor. Then, it appends the read command operations to the aof_child_diff string in the global variable server. And finally, at the end of the execution process of the AOF rewrite function rewriteAppendOnlyFile, the aof_child_diff string will be written to the AOF rewrite log file so that we can recover the operations received during the rewrite as much as possible when using AOF rewrite logs.

You can refer to the following code for the process of writing the aof_child_diff string to the rewrite log file:

int rewriteAppendOnlyFile(char *filename) {
    ...
    // Write the accumulated command operations in `aof_child_diff` to the AOF rewrite log file
    if (rioWrite(&aof, server.aof_child_diff, sdslen(server.aof_child_diff)) == 0)
        goto werr;
    ...
}

So, in other words, the aofReadDiffFromParent function implements the rewritable subprocess’ reading of operation commands from the main process. Therefore, we also need to clarify the following questions here: where will the aofReadDiffFromParent function be called, that is, when will the rewritable subprocess read the operations received from the main process from the pipe?

Actually, the aofReadDiffFromParent function will be called by the following three functions.

  • rewriteAppendOnlyFileRio function: This function is executed by the rewritable subprocess, and it is responsible for traversing each of the Redis databases, generating AOF rewrite log. During this process, it will occasionally call the aofReadDiffFromParent function.
  • rewriteAppendOnlyFile function: This function is the main function of the log rewriting process, and it is executed by the rewritable subprocess. It itself will call the rewriteAppendOnlyFileRio function. In addition, after calling the rewriteAppendOnlyFileRio function, it will also call the aofReadDiffFromParent function multiple times to read as many operation commands received by the main process during the log rewriting process.
  • rdbSaveRio function: This function is the main function for creating RDB files. When we use the hybrid persistence mechanism of AOF and RDB, this function will also call the aofReadDiffFromParent function.

From here, we can see that in the implementation of the AOF rewriting process in Redis source code, the rewritable subprocess is actually made to read newly received operation commands from the main process multiple times. This is done to record the latest operations as much as possible and provide a more complete record of operations.

Finally, let’s take a look at the use of the two pipes used to transmit the ACK information between the rewritable subprocess and the main process.

Use of the ACK pipe #

Just now, when introducing the creation of the pipes by the main process with the aofCreatePipes function, you learned that the main process registers a read event on the aof_pipe_read_ack_from_child pipe descriptor. This descriptor corresponds to the pipe from which the rewritable subprocess sends ACK information to the main process. At the same time, this descriptor is a read descriptor, which means that the main process reads ACK information from the pipe.

In fact, when the rewritable subprocess executes the rewriteAppendOnlyFile function, this function, after completing the log rewriting and reading operation commands from the parent process multiple times, will call the write function to write a "!" into the pipe corresponding to the aof_pipe_write_ack_to_parent descriptor. This is how the rewritable subprocess sends an ACK signal to the main process, telling the main process to stop sending new written operations that have been received. The process is shown below:

int rewriteAppendOnlyFile(char *filename) {
...
    if (write(server.aof_pipe_write_ack_to_parent,"!",1) != 1) goto werr;
...
}

Once the ACK pipe from the rewritable subprocess to the main process has data in it, the read event registered on the aof_pipe_read_ack_from_child pipe descriptor will be triggered, which means that there is data available to be read from this pipe. Then, the registered callback function aofChildPipeReadable (in the aof.c file) will be executed.

This function will check whether the data read from the aof_pipe_read_ack_from_child pipe descriptor is "!". If it is, it will call the write function to write a "!" into the pipe corresponding to the aof_pipe_write_ack_to_child descriptor, indicating that the main process has received the ACK information sent by the rewritable subprocess. At the same time, it will send an ACK message back to the rewritable subprocess. The process is shown below:

void aofChildPipeReadable(aeEventLoop *el, int fd, void *privdata, int mask) {
...
    if (read(fd,&byte,1) == 1 && byte == '!') {
        ...
        if (write(server.aof_pipe_write_ack_to_child,"!",1) != 1) { ...}
    }
...
}

Okay, at this point, we have learned that after the rewritable subprocess completes the log rewriting, it first sends an ACK message to the main process. Then, the main process listens for a read event on the aof_pipe_read_ack_from_child descriptor and calls the aofChildPipeReadable function to send an ACK message to the rewritable subprocess.

Finally, the rewriteAppendOnlyFile function executed by the rewritable subprocess calls the syncRead function to read the ACK information sent by the main process from the aof_pipe_read_ack_from_parent pipe descriptor, as shown below:

int rewriteAppendOnlyFile(char *filename) {
...
    if (syncRead(server.aof_pipe_read_ack_from_parent,&byte,1,5000) != 1  || byte != '!') goto werr
...
}

The diagram below also demonstrates the use of the ACK pipe, you can review it again.

In this way, the rewritable subprocess and the main process confirm the end of the rewriting process through two ACK pipes.

Summary #

In today’s lesson, I mainly introduced the pipe communication between the main process and the rewrite sub-process in AOF rewriting. Here, you need to focus on the use of the pipe mechanism and the process of communication between the main process and the rewrite sub-process using the pipe.

In this process, the AOF rewrite sub-process and the main process use a command transfer pipe and two ACK information sending pipes. The command transfer pipe is used for the main process to write the received new command and for the rewrite sub-process to read the commands. The ACK information sending pipe is used by the rewrite sub-process and the main process to confirm the end of the rewriting process. Finally, the rewrite sub-process will further record the received commands in the rewrite log file.

In this way, the new write operations received by the main process during the AOF rewriting will not be missed. On the one hand, these new write operations will be recorded in the normal AOF log, and on the other hand, the main process will cache the new write operations in the aof_rewrite_buf_blocks data block list and send them to the rewrite sub-process through the pipe. This ensures that the rewrite log has the latest and most complete write operations possible.

Finally, I would like to remind you again that the pipes we learned today belong to anonymous pipes, which are used for communication between parent and child processes. If you need to communicate between two processes that are not parent-child processes in actual development, you will need to use named pipes. Named pipes are saved as files in the file system, have corresponding paths and file names. In this way, the two processes of non-parent-child processes can open the pipe and communicate through the path and file name of the named pipe.

One question per lesson #

In today’s lesson, I introduced three pipes used for command transmission and ACK message passing between the child processes and the main process when rewriting. Now, can you find other places where pipes are used in the Redis source code?