23 Learning Raft Protocol Implementation From Sentinel Leader Election Part 1 #

In the previous lesson, we learned about the initialization process of Sentinel instances. Once a Sentinel instance is running, it periodically checks the running state of the master node it monitors. When it detects that the master node is objectively offline, the Sentinel instance begins the process of failover.

However, when deploying Sentinel instances, it is common to deploy multiple Sentinels to make collective decisions, thereby avoiding misjudgment of the master node’s state by a single Sentinel. But this also brings us to a question: When multiple Sentinels judge that the master node has failed, who should perform the failover?

In fact, the answer lies in the election of a Sentinel leader, and that election is in turn based on a classic distributed consensus protocol: the Raft protocol. Learning and mastering the implementation of the Raft protocol is a valuable guide for achieving distributed consensus in our own distributed system development.

Therefore, in the next two lessons, I will guide you through understanding the Raft protocol and the specific design ideas of implementing leader election based on the Raft protocol in the Redis source code. Today, let’s first learn about the basic process of the Raft protocol, its relationship with the election of the Sentinel leader, and the overall execution flow of Sentinel work. This content is also essential knowledge for us to learn about Sentinel leader election.

Sentinel Leader Election and Raft Protocol #

When Sentinel detects a failure in the master node, it elects a Leader, which is responsible for executing the specific failover procedure. However, since multiple Sentinel instances may be running, a protocol is needed during the election so that they reach agreement on which instance should be the Leader. This is the problem of distributed consensus.

The Raft protocol can be used to achieve distributed consensus. It is an algorithm that ensures multiple nodes in a distributed system reach agreement, and it can be used to elect a Leader node from among those nodes. To achieve this, the Raft protocol defines three node roles: Leader, Follower, and Candidate.

The Raft protocol specifies two types of interactions between Leader and Follower nodes:

  • Under normal circumstances in a stable system, there are only Leader and Follower nodes, and the Leader sends heartbeat messages to the Followers.
  • In exceptional cases, if a Follower node does not receive heartbeat messages from the Leader within a certain period (the election timeout), it transitions to a Candidate node and begins the Leader election process, as sketched below.
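To make the roles and this timeout-driven transition concrete, here is a minimal illustrative sketch in C. All names in it are hypothetical; it is not taken from the Redis source:

/* Illustrative Raft roles and the election-timeout transition
 * (hypothetical names, not Redis source code). */
typedef enum { RAFT_FOLLOWER, RAFT_CANDIDATE, RAFT_LEADER } raftRole;

/* Called periodically on a Follower: if no Leader heartbeat has arrived
 * within the election timeout, the node becomes a Candidate. */
raftRole checkElectionTimeout(raftRole role, long long now_ms,
                              long long last_heartbeat_ms,
                              long long election_timeout_ms) {
    if (role == RAFT_FOLLOWER &&
        now_ms - last_heartbeat_ms > election_timeout_ms)
        return RAFT_CANDIDATE;  /* start the Leader election process */
    return role;
}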

When a Candidate node starts the Leader election process, it performs the following operations:

  • Increments the current voting round and votes for itself.
  • Sends vote requests to other nodes and waits for their responses.
  • Starts a timer to determine if the election process times out.

During the time when a Candidate node waits for vote responses from other nodes, if it receives heartbeat messages from a Leader node, it means that a Leader has already been elected. In this case, the Candidate node will transition to a Follower node, and the Leader election process initiated by the Candidate node ends.

If the Candidate node receives vote confirmations from more than half of the other Follower nodes, indicating that more than half of the Follower nodes agree that the Candidate node should be the Leader, the Candidate node will transition to a Leader node. The Leader node can then execute the necessary process logic.

It is important to note that when a Candidate node initiates a vote, it records the current round of voting (in Raft's own terminology, this round is called a term). During the voting process, a Follower node can vote for only one Candidate node in each round; once it has voted in a round, it cannot vote again in that same round. If no Leader is elected in a round, for example because the votes are split among multiple Candidate nodes and none gains a majority, the Raft protocol lets the Candidate nodes move to the next round and start voting again. The sketch below illustrates the one-vote-per-round rule.
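This is a minimal illustrative sketch from a Follower's perspective; all names are hypothetical and it is not taken from the Redis source:

/* Hypothetical sketch of Raft-style vote handling; not Redis source code. */
typedef struct raftNode {
    int current_round;  /* the latest voting round (Raft calls this a term) */
    int voted_for;      /* candidate ID voted for in this round, -1 if none */
} raftNode;

/* A node grants at most one vote per round. Returns 1 if the vote is granted. */
int handleVoteRequest(raftNode *node, int candidate_round, int candidate_id) {
    if (candidate_round < node->current_round) return 0;  /* stale round: reject */
    if (candidate_round > node->current_round) {
        node->current_round = candidate_round;  /* move to the newer round */
        node->voted_for = -1;                   /* the new round's vote is unused */
    }
    if (node->voted_for == -1 || node->voted_for == candidate_id) {
        node->voted_for = candidate_id;         /* grant this round's single vote */
        return 1;
    }
    return 0;  /* already voted for another Candidate in this round */
}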

Now you understand the basic process and principles of Leader election in the Raft protocol. However, there is something else you need to know: Redis Sentinel does not fully implement the Raft protocol. This is mainly because, in Redis Sentinel's implementation, the instances do not maintain a Leader-Follower relationship during normal operation; instead, they are peers. Only when a Sentinel detects that the master node has failed does it execute the Leader election process according to the Raft protocol.

Next, let’s take a look at how Sentinel executes the Raft protocol for Leader election from a code perspective.

The sentinelTimer function for handling time events in Sentinel #

First, let’s take a look at the sentinelTimer function in the sentinel.c file because the leader election in Sentinel is triggered during the execution of this function.

The sentinelTimer function is called within the serverCron function (in the server.c file) as shown below:

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    ...
    // If the current instance is a Sentinel, run the sentinelTimer function
    if (server.sentinel_mode) sentinelTimer();
    ...
}

The serverCron function is executed every 100ms by default, and during its execution it checks the server.sentinel_mode configuration option. If this option is set to 1, the current instance is a Sentinel, and serverCron calls the sentinelTimer function. Therefore, the sentinelTimer function is also executed periodically. I introduced how the server.sentinel_mode option is set in the previous lesson, which you can review.

Next, the sentinelTimer function calls the sentinelHandleDictOfRedisInstances function. The prototype of this function is as follows, with a dictionary as its parameter:

void sentinelHandleDictOfRedisInstances(dict *instances) 

In fact, when the sentinelTimer function calls sentinelHandleDictOfRedisInstances, the dictionary parameter passed in is the masters hash table maintained in the sentinelState structure, which records the master nodes currently monitored by the Sentinel, as shown below:

void sentinelTimer(void) {
    ...
    // Pass the master nodes monitored by the current Sentinel instance to sentinelHandleDictOfRedisInstances
    sentinelHandleDictOfRedisInstances(sentinel.masters);
    ...
}
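
For reference, the global variable sentinel here is an instance of the sentinelState structure, whose masters member is exactly the hash table being passed in (an abridged sketch; most fields are omitted):

struct sentinelState {
    ...
    dict *masters;  // Monitored masters: key is the master name, value is a sentinelRedisInstance pointer
    ...
} sentinel;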

The sentinelHandleDictOfRedisInstances function then loops over the hash table it receives (here, sentinel.masters), retrieving each monitored master node and calling the sentinelHandleRedisInstance function to handle it, as shown below:

void sentinelHandleDictOfRedisInstances(dict *instances) {
    ...
    di = dictGetIterator(instances); // Get an iterator for the hash table
    while ((de = dictNext(di)) != NULL) {
        // Retrieve an instance from the hash table
        sentinelRedisInstance *ri = dictGetVal(de);
        // Call sentinelHandleRedisInstance to handle the instance
        sentinelHandleRedisInstance(ri);
        ...
    }
    ...
}

Please note that the sentinelHandleRedisInstance function is an important function in the Sentinel working mechanism, and it implements the core logic of the Sentinel instance. We will first understand the main execution steps of this function, and then separately study the implementation details of the key steps.

Execution flow of the sentinelHandleRedisInstance function #

First, you should know that the sentinelHandleRedisInstance function is executed periodically to check the status of the nodes monitored by the Sentinel. This function mainly performs the following four steps in order:

Step 1: Reconnecting

sentinelHandleRedisInstance calls the sentinelReconnectInstance function to attempt to re-establish a connection with disconnected instances.

Step 2: Sending commands

sentinelHandleRedisInstance calls the sentinelSendPeriodicCommands function to send commands such as PING and INFO to the instances.

Step 3: Determining subjective down

sentinelHandleRedisInstance calls the sentinelCheckSubjectivelyDown function to check if the monitored instance is subjectively down.

Step 4: Checking objective down and executing failover

In this step, the sentinelHandleRedisInstance function mainly deals with the monitored master node. This step can be further divided into four smaller steps:

  • First, the function calls sentinelCheckObjectivelyDown to check if the master node is objectively down.
  • Then, it calls sentinelStartFailoverIfNeeded to determine if a failover needs to be initiated. If so, it calls sentinelAskMasterStateToOtherSentinels, which sends the SENTINEL is-master-down-by-addr command to the other Sentinels to query their judgment of the master node's state and, at this point, also to initiate Leader election.
  • Next, it calls sentinelFailoverStateMachine to execute the failover process.
  • Finally, it calls sentinelAskMasterStateToOtherSentinels again to obtain the other Sentinel instances' latest judgment of the master node's state.

It is worth noting that the sentinelHandleRedisInstance function operates on an instance of the sentinelRedisInstance structure, which can represent a master node, a slave node, or a Sentinel instance. Of the four main steps above, steps 1, 2, and 3 are executed for all three kinds of instances, while step 4 is executed only when the sentinelRedisInstance represents a master node.

The basic logic flow of the sentinelHandleRedisInstance function can be summarized with the following condensed outline.
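This sketch is based on the four steps just described rather than the verbatim Redis source; TILT-mode handling and error paths are omitted:

void sentinelHandleRedisInstance(sentinelRedisInstance *ri) {
    /* ---- Monitoring: performed for every kind of instance ---- */
    sentinelReconnectInstance(ri);        // Step 1: rebuild dropped connections
    sentinelSendPeriodicCommands(ri);     // Step 2: send INFO, PING, PUBLISH

    /* ---- Acting: failure detection and failover ---- */
    sentinelCheckSubjectivelyDown(ri);    // Step 3: subjective-down check

    if (ri->flags & SRI_MASTER) {         // Step 4: master nodes only
        sentinelCheckObjectivelyDown(ri);
        if (sentinelStartFailoverIfNeeded(ri))
            sentinelAskMasterStateToOtherSentinels(ri, SENTINEL_ASK_FORCED);
        sentinelFailoverStateMachine(ri);
        sentinelAskMasterStateToOtherSentinels(ri, SENTINEL_NO_FLAGS);
    }
}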

Now, we have an understanding of the basic execution process of the sentinelHandleRedisInstance function.

Furthermore, as I mentioned earlier, the sentinelHandleDictOfRedisInstances function accepts as its parameter the hash table of master nodes monitored by the current Sentinel. Each master node in turn keeps track of the other Sentinel instances that monitor it and of its own slave nodes, via the sentinels and slaves member variables of the sentinelRedisInstance structure. These two variables are also hash tables that store information about the other Sentinels and the slave nodes. Here is an example:

typedef struct sentinelRedisInstance {
    ...
    dict *sentinels;    // Other sentinel instances monitoring the same master node
    dict *slaves;       // Slave nodes of the current master node
    ...
} sentinelRedisInstance;

Therefore, within the sentinelHandleDictOfRedisInstances function, after the sentinelHandleRedisInstance function processes each master node, it also calls the sentinelHandleDictOfRedisInstances function for other sentinel instances monitoring the same master node, as well as for the slave nodes of the master node. Here is the code snippet:

// If the current instance is a master node, call sentinelHandleDictOfRedisInstances to handle the slave nodes and the other sentinels monitoring this master node
if (ri->flags & SRI_MASTER) {
    sentinelHandleDictOfRedisInstances(ri->slaves);
    sentinelHandleDictOfRedisInstances(ri->sentinels);
    ...
}

In other words, one important task that the periodically executed sentinelTimer function performs is calling the sentinelHandleDictOfRedisInstances function.

In addition to calling the sentinelHandleDictOfRedisInstances function, the sentinelTimer function first calls the sentinelCheckTiltCondition function to check whether it needs to enter TILT mode. Note that for a Sentinel, TILT mode is a special operating mode: if the interval between two consecutive executions of the time event handler is negative (for example, after the system clock was adjusted backwards) or abnormally long, the Sentinel enters TILT mode. In this mode, the Sentinel only sends commands periodically to collect information and does not execute the failover process.
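A sketch of this check, close to the logic in sentinel.c, is shown below; treat the exact threshold constant SENTINEL_TILT_TRIGGER (2 seconds in the versions I have read) as an assumption:

void sentinelCheckTiltCondition(void) {
    mstime_t now = mstime();
    mstime_t delta = now - sentinel.previous_time;

    // A negative delta (clock moved backwards) or a very large delta
    // (process stalled) makes timing-based checks unreliable: enter TILT mode
    if (delta < 0 || delta > SENTINEL_TILT_TRIGGER) {
        sentinel.tilt = 1;
        sentinel.tilt_start_time = mstime();
        sentinelEvent(LL_WARNING, "+tilt", NULL, "#tilt mode entered");
    }
    sentinel.previous_time = mstime();
}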

Furthermore, after calling and executing the sentinelHandleDictOfRedisInstances function, the sentinelTimer function sequentially calls three other functions: sentinelRunPendingScripts, sentinelCollectTerminatedScripts, and sentinelKillTimedoutScripts, to run pending scripts, collect terminated scripts, and terminate scripts that have exceeded the timeout, respectively.

Finally, the sentinelTimer function adjusts the server.hz configuration parameter by adding a random value to its default. Because server.hz determines the execution frequency of serverCron, and therefore of sentinelTimer, the sentinelTimer function will subsequently run at this modified frequency.

The following code snippet shows the overall flow of the sentinelTimer function, which you can review:

void sentinelTimer(void) {
    sentinelCheckTiltCondition(); 
    sentinelHandleDictOfRedisInstances(sentinel.masters);
    sentinelRunPendingScripts();
    sentinelCollectTerminatedScripts();
    sentinelKillTimedoutScripts();
    server.hz = CONFIG_DEFAULT_HZ + rand() % CONFIG_DEFAULT_HZ;
}

So far, we have learned about the time event handling function of the sentinel instance, sentinelTimer. In the execution flow of this function, you should pay close attention to the sentinelHandleRedisInstance function, which is the main function responsible for periodically checking the offline status of master nodes and executing the failover process. The leader election of the sentinel also occurs here when a failover needs to be initiated. Therefore, next, let’s dive into the implementation of the sentinelHandleRedisInstance function in more detail.

Implementation of the sentinelHandleRedisInstance function #

Based on the earlier introduction to the execution flow of the sentinelHandleRedisInstance function, we know that it first calls three functions in order: sentinelReconnectInstance, sentinelSendPeriodicCommands, and sentinelCheckSubjectivelyDown. So let's first look at the implementation and purpose of these three functions. In the next lesson, I will give you a detailed introduction to the other functions called by sentinelHandleRedisInstance, to help you fully understand the key operations in the Sentinel working process.

Implementation of the sentinelReconnectInstance function #

The main job of the sentinelReconnectInstance function is to check whether the connections between the Sentinel instance and the master node are healthy; if a connection has been dropped, it reconnects the Sentinel to the master node.

In fact, when the Sentinel saves the master node's information in the sentinelRedisInstance structure, that structure contains a member variable link, a pointer to an instanceLink structure. It records the two connections between the Sentinel and the master node: the command connection cc used to send commands, and the Pub/Sub connection pc used to exchange messages, as shown below:

typedef struct instanceLink {
    ...
    redisAsyncContext *cc; // connection used for sending commands
    redisAsyncContext *pc; // connection used for Pub/Sub messages
    ...
} instanceLink;

When the sentinelReconnectInstance function is executed, it checks whether these two connections are NULL. If they are, it will call the redisAsyncConnectBind function (in the async.c file) to re-establish the two connections with the master node.

This is because, during the process of monitoring the master node status, the sentinel needs to send commands to the master node through the command connection cc, and subscribe to the Hello channel of the master node through the Pub/Sub connection pc, so as to discover other sentinel instances that monitor the same master node through this channel.
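
A condensed sketch of this reconnection logic might look as follows (a paraphrase of sentinel.c with callbacks, authentication, and error handling omitted; the exact calls can differ across Redis versions):

void sentinelReconnectInstance(sentinelRedisInstance *ri) {
    if (ri->link->disconnected == 0) return;   // both links healthy, nothing to do
    instanceLink *link = ri->link;

    // Command connection: used to send INFO, PING, PUBLISH, and so on
    if (link->cc == NULL) {
        link->cc = redisAsyncConnectBind(ri->addr->ip, ri->addr->port,
                                         NET_FIRST_BIND_ADDR);
        // ... install connect/disconnect callbacks, send AUTH if configured ...
    }

    // Pub/Sub connection: only for masters and slaves; used to subscribe to
    // the Hello channel and discover other Sentinels watching the same master
    if ((ri->flags & (SRI_MASTER|SRI_SLAVE)) && link->pc == NULL) {
        link->pc = redisAsyncConnectBind(ri->addr->ip, ri->addr->port,
                                         NET_FIRST_BIND_ADDR);
        // ... subscribe to the __sentinel__:hello channel ...
    }
}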

After completing the rebuilding of the connections with the master node, the sentinel will continue to call the sentinelSendPeriodicCommands function.

Implementation of the sentinelSendPeriodicCommands function #

The logic of sentinelSendPeriodicCommands is relatively simple. It first calls the redisAsyncCommand function (in the async.c file) to send the INFO command to the master node over the command connection cc between the Sentinel and the master node. It then sends the PING command via the sentinelSendPing function (in the sentinel.c file), again over the command connection cc.

Finally, the sentinelSendPeriodicCommands function calls the sentinelSendHello function (in the sentinel.c file) to send a PUBLISH command over the command connection cc, announcing the Sentinel's own IP address, port number, and ID to the master node.
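
Putting these three sends together, the function's shape is roughly as follows (a paraphrase of sentinel.c with simplified conditions; info_period and ping_period stand in for values the real code derives from the configuration):

void sentinelSendPeriodicCommands(sentinelRedisInstance *ri) {
    mstime_t now = mstime();
    if (ri->link->disconnected) return;  // no link, nothing to send
    ...
    // INFO: refresh the instance's role and replication state
    if ((ri->flags & SRI_SENTINEL) == 0 &&
        (now - ri->info_refresh) > info_period)
        redisAsyncCommand(ri->link->cc, sentinelInfoReplyCallback, ri, "INFO");

    // PING: probe liveness; a reply updates last_pong_time
    if ((now - ri->link->last_pong_time) > ping_period)
        sentinelSendPing(ri);

    // PUBLISH: announce this Sentinel's IP, port, and ID on the Hello channel
    if ((now - ri->last_pub_time) > SENTINEL_PUBLISH_PERIOD)
        sentinelSendHello(ri);
}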

Next, the sentinel will call the sentinelCheckSubjectivelyDown function to determine whether the monitored master node is subjectively down.

Implementation of the sentinelCheckSubjectivelyDown function #

The sentinelCheckSubjectivelyDown function first calculates the time elapsed since the last PING command was sent by the sentinel, as shown below:

void sentinelCheckSubjectivelyDown(sentinelRedisInstance *ri) {
    ...
    if (ri->link->act_ping_time)
        // calculate the time elapsed since the last PING command was sent
        elapsed = mstime() - ri->link->act_ping_time;
    else if (ri->link->disconnected)
        // if the connection is disconnected, measure from the time it was last available
        elapsed = mstime() - ri->link->last_avail_time;
    ...
}

After computing the elapsed time, the sentinelCheckSubjectivelyDown function checks the activity level of the command connection and the Pub/Sub connection between the Sentinel and the master node. If a connection shows too little activity, the Sentinel calls the instanceLinkCloseConnection function (in the sentinel.c file) to close it, so that it will be re-established on the next pass.
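As an illustration, the command-connection half of that activity check looks roughly like this (a paraphrase of sentinel.c; the exact thresholds are from memory and should be treated as assumptions):

// Command link: connected long enough, but a PING has been pending
// (no reply) for more than half of down_after_period -> drop the link
if (ri->link->cc &&
    (mstime() - ri->link->cc_conn_time) > SENTINEL_MIN_LINK_RECONNECT_PERIOD &&
    ri->link->act_ping_time != 0 &&   // there is a pending, unanswered PING
    (mstime() - ri->link->act_ping_time) > (ri->down_after_period / 2) &&
    (mstime() - ri->link->last_pong_time) > (ri->down_after_period / 2))
{
    instanceLinkCloseConnection(ri->link, ri->link->cc);
}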

Then, the sentinelCheckSubjectivelyDown function determines whether the master node is subjectively down based on the following two conditions:

  • Condition 1: The time elapsed since the last PING command was sent has exceeded the down_after_period threshold, and no replies have been received yet. The value of down_after_period is determined by the down-after-milliseconds configuration item in the sentinel.conf configuration file, and its default value is 30s.
  • Condition 2: The Sentinel considers the instance a master node, but the node reports itself as a slave node in its INFO replies, and it has kept reporting this role for longer than down_after_period plus twice the INFO command interval.

When either of these two conditions is met, the Sentinel judges the master node to be subjectively down, and it calls the sentinelEvent function to emit a "+sdown" event message. The code below shows this judgment logic:

if (elapsed > ri->down_after_period ||
    (ri->flags & SRI_MASTER && ri->role_reported == SRI_SLAVE &&
     mstime() - ri->role_reported_time > (ri->down_after_period + SENTINEL_INFO_PERIOD*2)))
{
    // Judge the master node as subjectively down
    if ((ri->flags & SRI_S_DOWN) == 0) {
        sentinelEvent(LL_WARNING, "+sdown", ri, "%@");
        ri->s_down_since_time = mstime();
        ri->flags |= SRI_S_DOWN;
    }
}

Okay, so far we have covered the first three key operations in the execution flow of the sentinelHandleRedisInstance function. They are used, respectively, to rebuild the Sentinel's connections with the master node, to send probe commands to the master node, and to determine whether the master node is subjectively down. These three steps are performed on every pass of the Sentinel's periodic tasks.

Conclusion #

In this lesson, I mainly introduced you to an important part of the Sentinel’s working process, which is the election of the Sentinel Leader. This election process is implemented with reference to the Raft protocol, a commonly used distributed consensus protocol in distributed systems. Therefore, you need to first understand the basic process of the Raft protocol, including the three types of nodes - Leader, Follower, and Candidate, the conditions and specific operations for a Follower to become a Candidate, and the rules for Leader voting.

For the Sentinel Leader election, it is based on the Raft protocol, but you need to note that Sentinels do not distinguish between the three types of nodes like the Raft protocol does when they are running normally. Instead, all Sentinels are equal. However, when a Sentinel detects a failure of the master node and needs to perform a failover, it will use the rules for Leader election in the Raft protocol to vote and elect a Leader. This is the difference and connection between the Sentinel Leader election and the Raft protocol.

In addition, I also introduced the Sentinel’s timer event handling function, sentinelTimer. This function periodically calls the sentinelHandleRedisInstance function for each master node monitored by the Sentinel to check their online status. When a master node is objectively offline, the Sentinel will start the Leader election and perform a failover. In this lesson, we first understood the overall execution process of the sentinelHandleRedisInstance function, so you can also grasp the overall working process of the Sentinel. At the same time, you should also have some understanding of the three functions - reconnecting sentinels with master nodes, sending commands, and checking subjective offline status, as they are three important steps in the Sentinel’s work.

So, in the next lesson, I will guide you through the specific process of the Sentinel Leader election and the implementation of a failover.

One Question per Lesson #

At the end of each run, the sentinelTimer function modifies server.hz by adding a random value to its default, and server.hz in turn determines how frequently sentinelTimer itself is executed. Why do you think Sentinel deliberately randomizes its execution frequency instead of running at a fixed rate? As a hint, consider what happens to Leader election if several Sentinels, started at the same moment, keep acting at exactly the same instants.

Feel free to leave your answers and thoughts in the comments section, and also share today’s content with more friends if you’d like.