21 Master-Slave Replication Design and Implementation Based on State Machine

In this lesson, I would like to talk to you about how Redis achieves master-slave replication based on the design philosophy of state machines.

Master-slave replication is probably familiar to you: we often use it in Redis or MySQL to synchronize data from a master node to its slave nodes and thereby improve service availability.

In principle, Redis master-slave replication involves three cases: full replication, incremental replication, and long-connection synchronization. Full replication transfers an RDB file; incremental replication transfers the write commands the master received while the master and slave were disconnected; and long-connection synchronization forwards the write commands the master receives during normal operation to the slave over the long-lived connection between them.

These three cases may seem simple, but implementing them means handling different logic in each state of the process: establishing the master-slave connection, handshaking and authentication, determining which kind of replication is needed, and transferring data.

So, how can we efficiently implement master-slave replication?

In fact, Redis uses a state machine design to define the states of the replication process and the transitions between them clearly. This approach matters whenever we implement network functionality, because it helps prevent logical conflicts or omissions when handling different states. So in today's lesson, I will show you how Redis implements master-slave replication on top of a state machine.

Because master-slave replication involves many states, trying to learn the details of every state at once makes it easy to confuse the states and their transitions. So I will first introduce the four stages of the replication process as a whole, and then we will walk through the states and transitions within each stage.

The Four Stages of Master-Slave Replication #

First, we can divide the entire replication process into four stages based on the key events that occur during master-slave replication. These stages are initialization, connection establishment, master-slave handshake, and replication type determination and execution. Let’s now understand the main tasks performed in each of these stages.

1. Initialization Stage

When we set a Redis instance A as a slave of another instance B, instance A completes the initialization process, which primarily involves obtaining the IP address and port number of the master. There are three ways to perform this initialization process.

  • Method 1: Execute the replicaof masterip masterport command on instance A, specifying the IP address (masterip) and port number (masterport) of instance B.
  • Method 2: Set replicaof masterip masterport in the configuration file of instance A, allowing instance A to obtain the master’s IP address and port number by parsing the file.
  • Method 3: When starting instance A, set the startup parameter --replicaof [masterip] [masterport]. By parsing the startup parameter, instance A can obtain the IP address and port number of the master.
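All three methods boil down to the same two values. As a minimal sketch (not Redis's actual config parser), the hypothetical function parse_replicaof_line below pulls the master's host and port out of a configuration-file line of the form used in Method 2; the master_addr struct and the validation rules are assumptions for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical holder for what the initialization stage must produce. */
typedef struct {
    char host[256];
    int port;
} master_addr;

/* Parse a config line such as "replicaof 192.168.1.10 6379".
 * Returns 0 and fills *out on success, -1 if the line is malformed. */
int parse_replicaof_line(const char *line, master_addr *out) {
    char keyword[16], host[256], port_str[16], extra;

    /* Exactly three tokens: if a fourth one exists, sscanf returns 4. */
    if (sscanf(line, "%15s %255s %15s %c", keyword, host, port_str, &extra) != 3)
        return -1;
    if (strcasecmp(keyword, "replicaof") != 0)      /* keyword match is case-insensitive */
        return -1;

    char *end;
    long port = strtol(port_str, &end, 10);
    if (*end != '\0' || port < 1 || port > 65535)   /* whole number in the TCP port range */
        return -1;

    strcpy(out->host, host);    /* host holds at most 255 chars plus NUL, so it fits */
    out->port = (int)port;
    return 0;
}
```

The sketch accepts only the replicaof keyword and rejects lines with missing or extra tokens or an out-of-range port; a real parser handles many more directives and reports errors to the user.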

2. Connection Establishment Stage

Next, once instance A has obtained the IP address and port number of the master, it will attempt to establish a TCP network connection with the master, and it will listen for commands sent by the master on the established network connection.

3. Master-Slave Handshake Stage

After the connection is established, instance A begins the handshake with the master. In short, the slave sends a PING and the master replies with PONG; the slave then authenticates with the master if its configuration requires it; finally, the slave sends the master its own IP address and port number, along with whether it supports diskless replication and the PSYNC 2 protocol.

This stage involves more operations and states than the previous two, so I will explain it in detail shortly.

4. Replication Type Determination and Execution Stage

After the handshake is complete, the slave sends the PSYNC command to the master. Based on the parameters the slave sent, the master responds with one of three possible replies: perform full replication, perform partial (incremental) replication, or an error. Finally, the slave carries out the replication operation that matches the reply it received.

The following diagram shows the overall process and the four stages of master-slave replication.

Now that we have understood the main stages of master-slave replication, let’s proceed to learn how Redis utilizes different states and transitions to enable data replication between the master and the slave.

Implementation of Master-Slave Replication Based on State Machine #

The benefit of implementing master-slave replication on a state machine is that, when you develop the program, you only need to consider what to do in each state and under what conditions to move from one state to another. That is why the state-machine-based design of master-slave replication in the Redis source code is worth learning.

So, what exactly does the state machine in master-slave replication correspond to? This is related to the data structure of the Redis instance.

Each Redis instance corresponds to a redisServer structure in the code, which holds the instance's configuration for RDB, AOF, master-slave replication, sharded clusters, and so on. The variable that holds the replication state machine is repl_state: during master-slave replication, the slave decides what to do in each stage, and when to move to the next one, based on the value of this variable. The following code shows the replication-related variables in the redisServer structure.

struct redisServer {
    ...
    /* Replication related (slave) */
    char *masterauth;       /* Password for authentication with the master */
    char *masterhost;       /* Master host name */
    int masterport;         /* Master port */
    ...
    client *master;         /* The client used by the slave to connect to the master */
    client *cached_master;  /* Cached information about the master on the slave */
    int repl_state;         /* Replication state machine for the slave */
    ...
};
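To make the state variable concrete, here is a self-contained sketch that redeclares the slave-side states as a C enum. The names follow those used in this lesson, including the SEND/RECEIVE pairs for the port, IP, and capability messages, plus REPL_STATE_CONNECTED for the steady state after synchronization, which this lesson does not cover. The real Redis headers define these as their own constants, and the exact values and full list may differ between versions. The helpers repl_state_name and is_handshake_state are hypothetical, added for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    REPL_STATE_NONE,          /* no replication in progress */
    REPL_STATE_CONNECT,       /* master recorded; must connect */
    REPL_STATE_CONNECTING,    /* TCP connection in progress */
    REPL_STATE_RECEIVE_PONG,  /* handshake: PING sent, waiting for PONG */
    REPL_STATE_SEND_AUTH,
    REPL_STATE_RECEIVE_AUTH,
    REPL_STATE_SEND_PORT,
    REPL_STATE_RECEIVE_PORT,
    REPL_STATE_SEND_IP,
    REPL_STATE_RECEIVE_IP,
    REPL_STATE_SEND_CAPA,
    REPL_STATE_RECEIVE_CAPA,  /* last handshake state in this lesson's grouping */
    REPL_STATE_SEND_PSYNC,
    REPL_STATE_RECEIVE_PSYNC,
    REPL_STATE_TRANSFER,      /* receiving the RDB file from the master */
    REPL_STATE_CONNECTED      /* assumption: steady state after synchronization */
} repl_state_t;

/* Names in the same order as the enum, for logging. */
static const char *const repl_state_names[] = {
    "REPL_STATE_NONE", "REPL_STATE_CONNECT", "REPL_STATE_CONNECTING",
    "REPL_STATE_RECEIVE_PONG", "REPL_STATE_SEND_AUTH", "REPL_STATE_RECEIVE_AUTH",
    "REPL_STATE_SEND_PORT", "REPL_STATE_RECEIVE_PORT", "REPL_STATE_SEND_IP",
    "REPL_STATE_RECEIVE_IP", "REPL_STATE_SEND_CAPA", "REPL_STATE_RECEIVE_CAPA",
    "REPL_STATE_SEND_PSYNC", "REPL_STATE_RECEIVE_PSYNC", "REPL_STATE_TRANSFER",
    "REPL_STATE_CONNECTED"
};

const char *repl_state_name(repl_state_t s) {
    size_t n = sizeof(repl_state_names) / sizeof(repl_state_names[0]);
    return ((size_t)s < n) ? repl_state_names[s] : "UNKNOWN";
}

/* Handshake stage as grouped in this lesson: from waiting for PONG up to
 * reading the CAPA reply. Relies on those states being declared contiguously. */
bool is_handshake_state(repl_state_t s) {
    return s >= REPL_STATE_RECEIVE_PONG && s <= REPL_STATE_RECEIVE_CAPA;
}
```

Keeping the handshake states contiguous lets a single range comparison answer "are we mid-handshake?", which is convenient when a timeout or error needs to cancel an unfinished handshake.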

Okay, next, let’s learn about the state machine transitions and corresponding code implementation in each stage of the master-slave replication.

Initialization Stage #

First, when an instance starts, it will call the initServerConfig function in server.c to initialize the redisServer structure. At this time, the instance sets the initial state of the state machine to REPL_STATE_NONE, as shown below:

void initServerConfig(void) {
   …
   server.repl_state = REPL_STATE_NONE;
   …
}

Then, when the instance executes the replicaof masterip masterport command, the replicaofCommand function in replication.c handles it. The masterip and masterport arguments are the master's IP address and port number. replicaofCommand first checks whether the instance is already replicating from exactly this master (same host and port); if it is not, the instance can go on to connect to the specified master.

Subsequently, the replicaofCommand function will call the replicationSetMaster function to set the master’s information. The code logic for this part is shown below:

/* Check whether this master is already recorded; if so, reply that the connection already exists and do nothing */
if (server.masterhost && !strcasecmp(server.masterhost,c->argv[1]->ptr) && server.masterport == port) {
    serverLog(LL_NOTICE,"REPLICAOF would result into synchronization with the master we are already connected with. No operation performed.");
    addReplySds(c,sdsnew("+OK Already connected to specified master\r\n"));
    return;
}
/* No master recorded yet, or a different master was specified: set the master's information */
replicationSetMaster(c->argv[1]->ptr, port);

Besides recording the IP and port number of the master, the replicationSetMaster function also sets the state machine of the slave instance to REPL_STATE_CONNECT. At this point, the initialization stage of master-slave replication is completed, and the state machine transitions from REPL_STATE_NONE to REPL_STATE_CONNECT. The process is shown below:
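A minimal sketch of this initialization-stage transition, assuming a simplified replica_t struct in place of redisServer: replica_set_master (a hypothetical name) performs the same already-connected check as replicaofCommand and, otherwise, records the master and moves the state from REPL_STATE_NONE to REPL_STATE_CONNECT. The real replicationSetMaster does more work, such as dealing with any previous master connection, which the sketch omits.

```c
#include <string.h>
#include <strings.h>

typedef enum { REPL_STATE_NONE, REPL_STATE_CONNECT } repl_state_t;

/* Simplified stand-in for the replication fields of redisServer. */
typedef struct {
    char masterhost[256];      /* empty string: no master recorded */
    int masterport;
    repl_state_t repl_state;
} replica_t;

typedef enum {
    SET_MASTER_OK,             /* master recorded, state is now REPL_STATE_CONNECT */
    SET_MASTER_ALREADY,        /* same master already recorded: no-op */
    SET_MASTER_BAD_HOST        /* host name too long for this sketch's buffer */
} set_master_result;

set_master_result replica_set_master(replica_t *r, const char *host, int port) {
    /* Same check as replicaofCommand: same host (case-insensitive) and port? */
    if (r->masterhost[0] != '\0' && strcasecmp(r->masterhost, host) == 0 &&
        r->masterport == port)
        return SET_MASTER_ALREADY;
    if (strlen(host) >= sizeof(r->masterhost))
        return SET_MASTER_BAD_HOST;

    /* The state-machine part of replicationSetMaster: record the master and
     * move to REPL_STATE_CONNECT so that the periodic task will connect. */
    strcpy(r->masterhost, host);
    r->masterport = port;
    r->repl_state = REPL_STATE_CONNECT;
    return SET_MASTER_OK;
}
```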

Connection Establishment Stage #

Next, let’s understand the state machine transitions in the connection establishment stage.

When the slave instance enters this stage, the state has already become REPL_STATE_CONNECT. So, when does the slave start establishing a network connection with the master?

This depends on Redis's periodic tasks, which we discussed in [Lesson 11]: tasks that a Redis instance runs repeatedly at fixed intervals while it is running. Redis has many periodic tasks, and one of them is replicationCron(), which serverCron runs once every 1000 ms, as the code below shows:

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
   …
   run_with_period(1000) replicationCron();
   …
}

replicationCron() is implemented in replication.c. One of its key checks is the state of the slave's replication state machine: if the state is REPL_STATE_CONNECT, the slave starts connecting to the master by calling connectWithMaster().

void replicationCron(void) {
    ...
    /* If the slave is in the REPL_STATE_CONNECT state, connect to the master via connectWithMaster */
    if (server.repl_state == REPL_STATE_CONNECT) {
        serverLog(LL_NOTICE,"Connecting to MASTER %s:%d",
            server.masterhost, server.masterport);
        if (connectWithMaster() == C_OK) {
            serverLog(LL_NOTICE,"MASTER <-> REPLICA sync started");
        }
    }
    ...
}
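The cron-driven transition can be sketched as a pure function, which makes it easy to see that only REPL_STATE_CONNECT triggers a connection attempt. In this sketch, replication_cron_tick and the connect_fn hook are hypothetical stand-ins for replicationCron() and connectWithMaster(), and fake_connect is a test double. Retrying on the next tick is simply a consequence of the state staying REPL_STATE_CONNECT when an attempt fails.

```c
typedef enum {
    REPL_STATE_NONE,
    REPL_STATE_CONNECT,
    REPL_STATE_CONNECTING
} repl_state_t;

/* Hypothetical stand-in for connectWithMaster(): returns 0 when the connection
 * attempt was started and its event handlers registered, -1 on failure. */
typedef int (*connect_fn)(void *ctx);

/* One run of a replicationCron()-style check: only REPL_STATE_CONNECT
 * triggers a connection attempt; every other state is left alone. */
repl_state_t replication_cron_tick(repl_state_t state, connect_fn connect_to_master, void *ctx) {
    if (state != REPL_STATE_CONNECT)
        return state;
    if (connect_to_master(ctx) == 0)
        return REPL_STATE_CONNECTING;
    return REPL_STATE_CONNECT;   /* failed: stay put, so the next tick retries */
}

/* Test double: returns the int that ctx points to. */
int fake_connect(void *ctx) {
    return *(const int *)ctx;
}
```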

When the slave calls connectWithMaster, it first starts a TCP connection to the master with the anetTcpNonBlockBestEffortBindConnect function. Because this is a non-blocking connect, the call returns without waiting for the connection to finish. The slave then registers readable and writable events on the socket, with syncWithMaster as the callback function, so syncWithMaster runs once the connection completes.

Finally, connectWithMaster sets the slave's state machine to REPL_STATE_CONNECTING, as the following code shows.

int connectWithMaster(void) {
    int fd;
    // Start connecting the slave to the master
    fd = anetTcpNonBlockBestEffortBindConnect(NULL,server.masterhost,server.masterport,NET_FIRST_BIND_ADDR);
    ...

    // Register read and write events on the connection, with syncWithMaster as the callback
    if (aeCreateFileEvent(server.el,fd,AE_READABLE|AE_WRITABLE,syncWithMaster,NULL) == AE_ERR)
    {
        close(fd);
        serverLog(LL_WARNING,"Can't create readable event for SYNC");
        return C_ERR;
    }

    // Once the connection attempt is under way, set the state machine to REPL_STATE_CONNECTING
    ...
    server.repl_state = REPL_STATE_CONNECTING;
    return C_OK;
}

So, once the slave's state changes to REPL_STATE_CONNECTING, the connection establishment stage is complete. The following diagram shows the state transitions of the initialization and connection establishment stages for your reference.

Master-Slave Handshake Stage #

After the master and slave have established a network connection, the slave does not start synchronizing data right away. Instead, it first initiates a handshake with the master.

The handshake mainly lets the slave authenticate with the master and tell the master its IP address and port number. As I mentioned earlier, this stage involves multiple state transitions, but the logic behind them is quite clear.

First, at the end of the connection establishment stage, the slave's state machine is in the REPL_STATE_CONNECTING state. When the connection to the master completes, the slave's syncWithMaster function is called. If the slave's state is REPL_STATE_CONNECTING, syncWithMaster sends a PING message to the master and sets the state machine to REPL_STATE_RECEIVE_PONG.

After receiving the master's PONG, the slave sends the master, in order, its authentication information, its port number, its IP address, and whether it supports diskless replication and the PSYNC 2 protocol. Each handshake message corresponds to a pair of state transitions on the slave. For example, before sending authentication information, the slave sets its state machine to REPL_STATE_SEND_AUTH and then sends the authentication information to the master. Once it has been sent, the state machine moves to REPL_STATE_RECEIVE_AUTH, and the slave starts reading the authentication result returned by the master.

The port number, IP address, and capability messages follow the same pattern: for each one, the slave moves from a SEND state to the corresponding RECEIVE state. To help you understand these transitions, I have included a diagram here that shows the state changes from the initialization stage through the handshake stage. You can refer to it.
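The SEND/RECEIVE pattern lends itself to a single step function. The sketch below is a simplified model, not Redis's syncWithMaster: the hypothetical handshake_step takes the current state, the replica's configuration, and the master's reply line (for RECEIVE states), writes the next command to send (for SEND states), and returns the next state. Several details are assumptions for illustration: the command texts (AUTH, REPLCONF listening-port, REPLCONF ip-address, REPLCONF capa eof capa psync2) as I understand them, skipping AUTH when no password is configured, skipping the IP step when no address is configured to announce, treating a non-'+' reply to PING or AUTH as fatal (returning to REPL_STATE_CONNECT so a later cron run retries), and tolerating error replies to REPLCONF. Sending the initial PING happens in REPL_STATE_CONNECTING and is not modeled here.

```c
#include <stddef.h>
#include <stdio.h>

typedef enum {
    REPL_STATE_CONNECT,        /* handshake cancelled; reconnect later */
    REPL_STATE_RECEIVE_PONG,
    REPL_STATE_SEND_AUTH,
    REPL_STATE_RECEIVE_AUTH,
    REPL_STATE_SEND_PORT,
    REPL_STATE_RECEIVE_PORT,
    REPL_STATE_SEND_IP,
    REPL_STATE_RECEIVE_IP,
    REPL_STATE_SEND_CAPA,
    REPL_STATE_RECEIVE_CAPA,
    REPL_STATE_SEND_PSYNC      /* handshake done; next stage begins */
} repl_state_t;

typedef struct {
    const char *masterauth;    /* NULL if the master needs no password */
    int listening_port;        /* port this replica listens on */
    const char *announce_ip;   /* NULL: skip the ip-address step (assumption) */
} replica_conf;

/* Advance the handshake by one step. Each RECEIVE state is declared right
 * before the SEND state that follows it, so "s + 1" moves to the next SEND. */
repl_state_t handshake_step(repl_state_t s, const replica_conf *conf,
                            const char *reply, char *out, size_t outlen) {
    out[0] = '\0';             /* no command to send unless a SEND state writes one */
    switch (s) {
    case REPL_STATE_SEND_AUTH:
        if (conf->masterauth == NULL) return REPL_STATE_SEND_PORT;
        snprintf(out, outlen, "AUTH %s", conf->masterauth);
        return REPL_STATE_RECEIVE_AUTH;
    case REPL_STATE_SEND_PORT:
        snprintf(out, outlen, "REPLCONF listening-port %d", conf->listening_port);
        return REPL_STATE_RECEIVE_PORT;
    case REPL_STATE_SEND_IP:
        if (conf->announce_ip == NULL) return REPL_STATE_SEND_CAPA;
        snprintf(out, outlen, "REPLCONF ip-address %s", conf->announce_ip);
        return REPL_STATE_RECEIVE_IP;
    case REPL_STATE_SEND_CAPA:
        snprintf(out, outlen, "REPLCONF capa eof capa psync2");
        return REPL_STATE_RECEIVE_CAPA;
    case REPL_STATE_RECEIVE_PONG:
    case REPL_STATE_RECEIVE_AUTH:
        /* PING and AUTH must succeed, otherwise cancel the handshake */
        if (reply == NULL || reply[0] != '+') return REPL_STATE_CONNECT;
        return (repl_state_t)(s + 1);
    case REPL_STATE_RECEIVE_PORT:
    case REPL_STATE_RECEIVE_IP:
    case REPL_STATE_RECEIVE_CAPA:
        /* REPLCONF errors are tolerated: continue either way */
        return (repl_state_t)(s + 1);
    default:
        return s;              /* not a handshake state: nothing to do */
    }
}
```

Driving this function from an event loop, one call per readable or writable event, gives the same shape as the description above: every message is one SEND state followed by one RECEIVE state.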

Replication Type Determination and Execution Stage #

When the handshake completes, the slave reads the master's reply to the CAPA message while its state machine is in the REPL_STATE_RECEIVE_CAPA state. The state machine then moves to REPL_STATE_SEND_PSYNC, meaning the slave is about to send the PSYNC command to the master and begin the actual data synchronization.

At this point, the slave calls the slaveTryPartialResynchronization function to send the PSYNC command to the master, and the state machine is set to REPL_STATE_RECEIVE_PSYNC. The following code shows these three state transitions:

/* Slave state machine enters REPL_STATE_RECEIVE_CAPA. */
if (server.repl_state == REPL_STATE_RECEIVE_CAPA) {
  ...
  // Read the CAPA message response from the master
  server.repl_state = REPL_STATE_SEND_PSYNC;
}
// After the state machine transitions to REPL_STATE_SEND_PSYNC, start calling the slaveTryPartialResynchronization function to send the PSYNC command to the master for data synchronization
if (server.repl_state == REPL_STATE_SEND_PSYNC) {
  if (slaveTryPartialResynchronization(fd,0) == PSYNC_WRITE_ERROR) {
    ...
  }
  server.repl_state = REPL_STATE_RECEIVE_PSYNC;
  return;
}
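What does the slave put in the PSYNC command it sends from REPL_STATE_SEND_PSYNC? As a hedged sketch (the function name build_psync_cmd and its parameters are hypothetical): when the slave has no cached master, it asks for a full sync with PSYNC ? -1; when it does have one, it sends the cached replication ID and the offset of the next byte it needs, that is, the last processed offset plus one. The sketch renders the command as a single space-separated line for readability.

```c
#include <stdio.h>

/* Build the PSYNC request. cached_replid is the replication ID remembered
 * from an earlier master connection, or NULL if there is none;
 * cached_offset is the last replication offset processed from it.
 * Returns 0 on success, -1 if out is too small. */
int build_psync_cmd(const char *cached_replid, long long cached_offset,
                    char *out, size_t outlen) {
    int n;
    if (cached_replid == NULL)
        n = snprintf(out, outlen, "PSYNC ? -1");   /* nothing to resume: ask for a full sync */
    else
        n = snprintf(out, outlen, "PSYNC %s %lld", cached_replid,
                     cached_offset + 1);           /* resume from the next unseen byte */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The replication ID in the usage below is a made-up value.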

slaveTryPartialResynchronization is responsible for sending the data synchronization command to the master. On receiving the command, the master uses the master replication ID and replication offset sent by the slave to decide whether to perform full replication or incremental replication, or to return an error.
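The master's side of this decision can be sketched under a simplified model: the master keeps its current replication ID and a window of recent replication-stream bytes (Redis calls this buffer the replication backlog). If the slave's replication ID matches and the requested offset still falls inside that window, the master can answer +CONTINUE and resend the missing bytes; otherwise it answers +FULLRESYNC. The names master_view and master_decide are hypothetical, and the real check has extra cases (for example, a second, previous replication ID kept for PSYNC 2) that this sketch omits.

```c
#include <string.h>

typedef enum { MASTER_REPLY_CONTINUE, MASTER_REPLY_FULLRESYNC } master_decision;

/* Simplified master-side view used by this sketch. */
typedef struct {
    const char *replid;        /* master's current replication ID */
    long long backlog_start;   /* offset of the first byte still kept */
    long long backlog_len;     /* number of bytes kept; 0 means no backlog */
} master_view;

master_decision master_decide(const master_view *m, const char *req_replid,
                              long long req_offset) {
    if (strcmp(req_replid, m->replid) != 0)
        return MASTER_REPLY_FULLRESYNC;   /* unknown history, including "?" */
    if (m->backlog_len == 0)
        return MASTER_REPLY_FULLRESYNC;   /* nothing kept to resend */
    if (req_offset < m->backlog_start ||
        req_offset > m->backlog_start + m->backlog_len)
        return MASTER_REPLY_FULLRESYNC;   /* needed bytes are gone or never existed */
    return MASTER_REPLY_CONTINUE;         /* resend from req_offset onward */
}
```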

The following code shows the main branches of slaveTryPartialResynchronization. Depending on the master's reply, the function returns a different value, corresponding to full replication, incremental replication, or PSYNC not being supported:

int slaveTryPartialResynchronization(int fd, int read_reply) {
  ...
  // Send the PSYNC command
  if (!read_reply) {
    // Mark the master's initial offset as unknown (-1); it is set to the real value if a full resync follows
    server.master_initial_offset = -1;
    ...
    // Call sendSynchronousCommand to send the PSYNC command
    reply = sendSynchronousCommand(SYNC_CMD_WRITE,fd,"PSYNC",psync_replid,psync_offset,NULL);
    ...
    // After sending the command, wait for the response from the master
    return PSYNC_WAIT_REPLY;
  }

  // Read the response from the master
  reply = sendSynchronousCommand(SYNC_CMD_READ,fd,NULL);

  // If the master returns +FULLRESYNC, perform full replication
  if (!strncmp(reply,"+FULLRESYNC",11)) {
    ...
    return PSYNC_FULLRESYNC;
  }

  // If the master returns +CONTINUE, perform incremental replication
  if (!strncmp(reply,"+CONTINUE",9)) {
    ...
    return PSYNC_CONTINUE;
  }

  // Any other reply: if it is not an -ERR error, it is logged as unexpected
  if (strncmp(reply,"-ERR",4)) {
    ...
  }
  // In all remaining cases, PSYNC is treated as not supported
  return PSYNC_NOT_SUPPORTED;
}
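Here is a self-contained sketch of the reply classification above, with each prefix length matching its literal (11 characters for +FULLRESYNC, 9 for +CONTINUE). parse_psync_reply is a hypothetical helper; it also extracts the replication ID and offset from a +FULLRESYNC <replid> <offset> reply, assuming a 40-character ID. Treating a malformed +FULLRESYNC line as a full resync with an empty ID and an offset of -1 is a choice of this sketch.

```c
#include <stdio.h>
#include <string.h>

typedef enum { PSYNC_CONTINUE, PSYNC_FULLRESYNC, PSYNC_NOT_SUPPORTED } psync_result;

/* Classify the master's reply to PSYNC (one line, CRLF already stripped).
 * replid must have room for 41 bytes. */
psync_result parse_psync_reply(const char *reply, char *replid, long long *offset) {
    if (strncmp(reply, "+FULLRESYNC", 11) == 0) {
        /* Expected form: "+FULLRESYNC <40-char replid> <offset>" */
        if (sscanf(reply + 11, " %40s %lld", replid, offset) != 2) {
            replid[0] = '\0';          /* sketch's choice: still a full resync, */
            *offset = -1;              /* but with ID and offset unknown */
        }
        return PSYNC_FULLRESYNC;
    }
    if (strncmp(reply, "+CONTINUE", 9) == 0)
        return PSYNC_CONTINUE;
    return PSYNC_NOT_SUPPORTED;        /* "-ERR ..." or anything unexpected */
}
```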

Since slaveTryPartialResynchronization is called in the syncWithMaster function, when the function returns different results for the PSYNC command, the syncWithMaster function performs different processing based on the returned result value.

Of particular interest is full replication. When the master returns FULLRESYNC in response to the slave’s PSYNC command, the slave registers the readSyncBulkPayload callback function on the network connection with the master, and sets the state machine to REPL_STATE_TRANSFER, indicating that the actual data synchronization is starting, such as the master transferring the RDB file to the slave.

// Read the result of the PSYNC command
psync_result = slaveTryPartialResynchronization(fd,1);
// If the PSYNC result is not returned yet, return from the syncWithMaster function to process other operations first
if (psync_result == PSYNC_WAIT_REPLY) return;
// If the PSYNC result is PSYNC_CONTINUE, return from the syncWithMaster function to perform incremental replication later
if (psync_result == PSYNC_CONTINUE) {
  ...
  return;
}

// For full replication, register readSyncBulkPayload as the handler for readable events on the connection
if (aeCreateFileEvent(server.el,fd, AE_READABLE,readSyncBulkPayload,NULL) == AE_ERR)
{
  ...
}
// Set the slave state machine to REPL_STATE_TRANSFER
server.repl_state = REPL_STATE_TRANSFER;
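The branching in syncWithMaster after PSYNC can be summarized as a mapping from result to next state. This is a hedged sketch: state_after_psync is a hypothetical name, and two of the mappings are my assumptions about code not shown in this lesson: that a successful +CONTINUE leaves the slave in a connected steady state (REPL_STATE_CONNECTED), and that PSYNC_NOT_SUPPORTED falls back to the older SYNC command, which also leads to a full transfer.

```c
typedef enum {
    REPL_STATE_RECEIVE_PSYNC,
    REPL_STATE_TRANSFER,
    REPL_STATE_CONNECTED
} repl_state_t;

typedef enum {
    PSYNC_WAIT_REPLY,
    PSYNC_CONTINUE,
    PSYNC_FULLRESYNC,
    PSYNC_NOT_SUPPORTED
} psync_result;

repl_state_t state_after_psync(psync_result r) {
    switch (r) {
    case PSYNC_WAIT_REPLY:
        return REPL_STATE_RECEIVE_PSYNC;  /* reply not complete: wait for the next readable event */
    case PSYNC_CONTINUE:
        return REPL_STATE_CONNECTED;      /* assumption: incremental data flows on the existing link */
    case PSYNC_NOT_SUPPORTED:             /* assumption: falls back to SYNC, also a full transfer */
    case PSYNC_FULLRESYNC:
    default:
        return REPL_STATE_TRANSFER;       /* readSyncBulkPayload will receive the RDB file */
    }
}
```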

At this point, we have covered the slave's state transitions in the replication type determination and execution stage. I have combined the state transitions of every stage of master-slave replication into the following diagram for easier understanding.

Summary #

Master-slave replication is a method used by databases and storage systems such as Redis and MySQL to achieve high availability. Implementing it means handling Redis's logic correctly in every state of the process, so making sure that no possible state is missed is a problem we face in practical development.

In this lesson, we have learned the design ideas and implementation methods of Redis master-slave replication. Redis adopts a state machine-driven approach by setting state variables for slave instances. Throughout the replication process, the code logic will handle different situations based on the state transitions of the slave state machine.

To facilitate your understanding of the implementation of master-slave replication, I have broken down the entire process into four stages: initialization, establishing connections, master-slave handshake, and replication type determination and execution. In each stage, the state of the slave will continuously change, completing tasks such as establishing network connections with the master, exchanging configuration information, sending synchronization commands, and executing full or incremental synchronization based on the response from the master to the synchronization requests.

The state machine-driven design approach is a commonly used method in scenarios involving network communication. Redis’ implementation of master-slave replication provides us with a good reference example. When you need to design and implement network functionality on your own, you can utilize the state machine-driven approach.

One question per lesson #

The state machine introduced in this lesson is used when the instance acts as a slave. Why, then, doesn't the master's side of master-slave replication need a state machine?