24 Learning Raft Protocol Implementation From Sentinel Leader Election Part 2

In the previous lesson, I introduced the basic process of the Raft protocol and the basic steps of the sentinel instance’s operation. The sentinel instance works by periodically executing the serverCron function, which in turn calls the sentinelTimer function to handle time events related to the sentinel. The time events handled by the sentinelTimer function include monitoring each master node by calling the sentinelHandleRedisInstance function. This function checks the online status of the master node and performs failover when the master node is objectively offline.

In addition, I also took you through the first three steps of the sentinelHandleRedisInstance function, which are reconnecting/disconnecting instances, periodically sending check commands to the instances, and checking if the instances are subjectively down. These steps correspond to the functions sentinelReconnectInstance, sentinelSendPeriodicCommands, and sentinelCheckSubjectivelyDown, respectively. You can review them if necessary.

So, in today’s lesson, I will continue to introduce the remaining steps of the sentinelHandleRedisInstance function execution process, which are checking if the master node is objectively down, determining whether a failover needs to be performed, and the specific process of leader election in the sentinel when failover is required.

After learning the content of this lesson, you will have a comprehensive understanding of the process of the sentinel’s operation. Moreover, you will learn how to implement the Raft protocol at the code level to achieve leader election. This way, when you implement distributed consensus in a distributed system in the future, this part of the content can guide you in designing and implementing your code.

Next, let’s first take a look at the objective down detection of the master node.

Objective Offline Judgment of Master Node

Now we know that in the sentinelHandleRedisInstance function, the sentinel calls the sentinelCheckObjectivelyDown function (in the sentinel.c file) to check whether the master node is objectively offline.

When the sentinelCheckObjectivelyDown function is executed, it not only checks the subjective offline judgment result of the master node by the current sentinel, but also needs to combine the judgment results of other sentinels monitoring the same master node. Only by considering these judgment results can the final judgment of objective offline of the master node be made.

From the perspective of code implementation, in the sentinelRedisInstance structure used by the sentinel to record master node information, a hash table sentinels is already used to store other sentinel instances that monitor the same master node, as shown below:

typedef struct sentinelRedisInstance {
    ...
    dict *sentinels;  // Other sentinels monitoring the same master node
    ...
} sentinelRedisInstance;

In this way, the sentinelCheckObjectivelyDown function can obtain the judgment results of other sentinel instances for the subjective offline of the same master node by traversing the sentinels hash table recorded by the master node. This is because the sentinel instances stored in the sentinels hash table also use the sentinelRedisInstance structure, and the member variable flags of this structure records the judgment results of the sentinel for the subjective offline of the master node.

Specifically, the sentinelCheckObjectivelyDown function uses the quorum variable to record the number of sentinels judging the master node as subjectively offline. If the current sentinel has judged the master node as subjectively offline, it will first set the quorum value to 1. Then, it sequentially checks the flags variable of other sentinels to check whether the SRI_MASTER_DOWN flag is set. If it is set, it increments the quorum value by 1.

After traversing the sentinels hash table, the sentinelCheckObjectivelyDown function checks whether the quorum value is greater than or equal to the predefined quorum threshold. This threshold is stored in the master node's data structure as master->quorum, and it is set in the sentinel.conf configuration file through the quorum argument of the sentinel monitor directive.

If the actual quorum value is greater than or equal to the predefined quorum threshold, the sentinelCheckObjectivelyDown function judges the master node as objectively offline and sets the odown variable to 1, which represents the current sentinel’s judgment result for the objective offline of the master node.

The following code shows this judgment logic:

void sentinelCheckObjectivelyDown(sentinelRedisInstance *master) {
    ...
    // The current sentinel has already judged the master node as subjectively down
    if (master->flags & SRI_S_DOWN) {
        quorum = 1; // The current sentinel counts itself, so quorum starts at 1

        di = dictGetIterator(master->sentinels);
        // Traverse the other sentinels that monitor the same master node
        while ((de = dictNext(di)) != NULL) {
            sentinelRedisInstance *ri = dictGetVal(de);
            if (ri->flags & SRI_MASTER_DOWN) quorum++;
        }
        dictReleaseIterator(di);
        // If the quorum value reaches the predefined quorum threshold, set odown to 1
        if (quorum >= master->quorum) odown = 1;
    }
    ...
}

I have also drawn a diagram of this judgment logic for you to review.

Once the sentinelCheckObjectivelyDown function judges the master node as objectively offline, it calls the sentinelEvent function to send the +odown event message, and then sets the SRI_O_DOWN flag in the flags variable of the master node, as shown below:

// The master node is judged as objectively offline
if (odown) {
    // If the SRI_O_DOWN flag is not set yet
    if ((master->flags & SRI_O_DOWN) == 0) {
        sentinelEvent(LL_WARNING,"+odown",master,"%@ #quorum %d/%d",
            quorum, master->quorum); // Send the +odown event message
        master->flags |= SRI_O_DOWN;  // Record the SRI_O_DOWN flag in the master node's flags
        master->o_down_since_time = mstime(); // Record the time when the objective offline was judged
    }
}

In other words, the sentinelCheckObjectivelyDown function determines whether the master node is objectively offline by traversing the flags variable of other sentinels monitoring the same master node.
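To make the counting rule concrete, here is a minimal, self-contained sketch of the same idea. It is not the Redis source: the `toy_sentinel` and `toy_master` structs, the plain array used in place of the `sentinels` dict, the flag values, and the function name `toy_check_odown` are all hypothetical simplifications. Only the flag names and the comparison against the quorum threshold mirror the excerpt above.

```c
#include <stddef.h>

/* Hypothetical flag values for this sketch; Redis defines its own. */
#define SRI_S_DOWN      (1 << 0)
#define SRI_O_DOWN      (1 << 1)
#define SRI_MASTER_DOWN (1 << 2)

/* Simplified stand-in for the record a sentinel keeps about a peer sentinel. */
typedef struct {
    int flags;
} toy_sentinel;

/* Simplified stand-in for the master's sentinelRedisInstance. */
typedef struct {
    int flags;
    unsigned int quorum;      /* threshold configured in sentinel.conf */
    toy_sentinel *sentinels;  /* array instead of the sentinels dict */
    size_t num_sentinels;
} toy_master;

/* Counts the sentinels that consider the master down and sets SRI_O_DOWN
 * when the count reaches master->quorum. Returns the count. */
unsigned int toy_check_odown(toy_master *master) {
    unsigned int quorum = 0;

    if (master->flags & SRI_S_DOWN) {
        quorum = 1; /* the current sentinel counts itself */
        for (size_t i = 0; i < master->num_sentinels; i++) {
            if (master->sentinels[i].flags & SRI_MASTER_DOWN) quorum++;
        }
        if (quorum >= master->quorum) master->flags |= SRI_O_DOWN;
    }
    return quorum;
}
```

With a quorum of 2 and one of two peers reporting SRI_MASTER_DOWN, the count reaches 2 and SRI_O_DOWN is set. If the local sentinel has not set SRI_S_DOWN, nothing is counted at all, which matches the guard in the Redis code.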

However, you may have a question after reading the code just now. In the sentinelCheckSubjectivelyDown function we learned in the previous lesson, if the sentinel judges the master node as subjectively offline, it will set the SRI_S_DOWN flag in the flags variable of the master node as shown below:

// The sentinel has judged the master node as subjectively offline
...
// The flags in the sentinelRedisInstance structure corresponding to the master node do not record the subjective offline
if ((ri->flags & SRI_S_DOWN) == 0) {
   ...
   ri->flags |= SRI_S_DOWN;  // Record the subjective offline flag in the master node's flags
}

However, sentinelCheckObjectivelyDown does not look at SRI_S_DOWN in the other sentinels' records; it checks whether the SRI_MASTER_DOWN flag is set in their flags. So how does the SRI_MASTER_DOWN flag of the other sentinels get set?

This is related to the sentinelAskMasterStateToOtherSentinels function (in the sentinel.c file). Next, let’s take a detailed look at this function.

sentinelAskMasterStateToOtherSentinels function

The main purpose of the sentinelAskMasterStateToOtherSentinels function is to send the is-master-down-by-addr command to the other sentinels monitoring the same master node, asking how each of them judges the master node's state.

It calls the redisAsyncCommand function (in the async.c file), and sequentially sends the sentinel is-master-down-by-addr command to other sentinels. At the same time, it sets the processing function for the returned result of the command as sentinelReceiveIsMasterDownReply (in the sentinel.c file), as shown below:

void sentinelAskMasterStateToOtherSentinels(sentinelRedisInstance *master, int flags) {
    ...
    di = dictGetIterator(master->sentinels);
    // Traverse the other sentinels monitoring the same master node
    while ((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);
        ...
        // Send the `sentinel is-master-down-by-addr` command and register
        // sentinelReceiveIsMasterDownReply as the reply callback
        retval = redisAsyncCommand(ri->link->cc,
                    sentinelReceiveIsMasterDownReply, ri,
                    "%s is-master-down-by-addr %s %s %llu %s",
                    sentinelInstanceMapCommand(ri,"SENTINEL"),
                    master->addr->ip, port,
                    sentinel.current_epoch,
                    (master->failover_state > SENTINEL_FAILOVER_STATE_NONE) ?
                        sentinel.myid : "*");
        ...
    }
    ...
}

Additionally, from the code, we can see that the sentinel is-master-down-by-addr command also includes the master node’s IP, the master node’s port, the current epoch, and the instance ID. The format of this command is shown below:

sentinel is-master-down-by-addr master_node_IP master_node_port current_epoch instance_ID

In this command, the sentinel sets the instance ID according to the master node's failover state. If failover of the master node has already started, the instance ID is set to the current sentinel's own ID; otherwise, it is set to an asterisk (*).

It’s worth noting here that the master node’s data structure uses the master->failover_state variable to record the failover state, with an initial value of SENTINEL_FAILOVER_STATE_NONE (corresponding to a value of 0). When the master node starts failover, this state value will be greater than SENTINEL_FAILOVER_STATE_NONE.
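The rule for the last argument can be sketched in a few lines. This is a hypothetical helper, `toy_format_ask`, that only builds the command text with `snprintf`; it hardcodes the `SENTINEL` command name, whereas Redis obtains it through `sentinelInstanceMapCommand` (to support renamed commands) and sends the command asynchronously with `redisAsyncCommand`.

```c
#include <stdio.h>

/* Stand-in for SENTINEL_FAILOVER_STATE_NONE (0 in Redis). */
#define FAILOVER_STATE_NONE 0

/* Writes the request into buf and returns the length snprintf reports.
 * The instance ID is our own ID once failover has started, "*" otherwise. */
int toy_format_ask(char *buf, size_t len, const char *ip, int port,
                   unsigned long long current_epoch, int failover_state,
                   const char *myid) {
    const char *id = (failover_state > FAILOVER_STATE_NONE) ? myid : "*";
    return snprintf(buf, len, "SENTINEL is-master-down-by-addr %s %d %llu %s",
                    ip, port, current_epoch, id);
}
```

Before failover the last argument is `*`, so the command is a pure state query; once `failover_state` is above the NONE state, the sentinel's own ID is sent instead, which is what lets the same command carry a vote request.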

Now that we understand the basic execution process of the sentinelAskMasterStateToOtherSentinels function, we also need to know how other sentinels handle the sentinel is-master-down-by-addr command after receiving it.

Handling of the `sentinel is-master-down-by-addr` command

In fact, all commands that start with sentinel are processed in the sentinelCommand function (in the sentinel.c file). sentinelCommand executes a different branch for each subcommand that follows sentinel, and is-master-down-by-addr is one of these subcommands.

In the code branch corresponding to the is-master-down-by-addr subcommand, the sentinelCommand function retrieves the sentinelRedisInstance structure for the master node based on the IP and port specified in the command.

Then, it checks whether the flags variable of the master node contains both the SRI_S_DOWN and SRI_MASTER flags, that is, whether the instance is indeed a master node and whether this sentinel has already marked it as subjectively offline. It also requires that the sentinel is not in TILT mode. If these conditions are met, it sets the isdown variable to 1, which represents this sentinel's subjective offline judgment of the master node.

Finally, sentinelCommand replies to the sentinel that sent the command with a three-part result: its own subjective offline judgment of the master node, the ID of the sentinel leader, and the epoch of the sentinel leader.

The basic process of processing sentinel commands in the sentinelCommand function is as follows:

void sentinelCommand(client *c) {
    ...
    // Code branch for the `is-master-down-by-addr` subcommand
    else if (!strcasecmp(c->argv[1]->ptr,"is-master-down-by-addr")) {
        ...
        // Sentinel determines that the master node is subjectively offline
        if (!sentinel.tilt && ri && (ri->flags & SRI_S_DOWN) && (ri->flags & SRI_MASTER))
            isdown = 1;
        ...
        addReplyMultiBulkLen(c,3); // The reply to the sentinel command consists of three parts
        addReply(c, isdown ? shared.cone : shared.czero); // Part 1: 1 if the master node is judged subjectively offline, otherwise 0
        addReplyBulkCString(c, leader ? leader : "*"); // Part 2: the Leader ID or "*"
        addReplyLongLong(c, (long long)leader_epoch); // Part 3: the epoch of the Leader
        ...
    }
    ...
}

You can also refer to the following diagram:

Alright, now you already know that the sentinel will use the sentinelAskMasterStateToOtherSentinels function to send the sentinel is-master-down-by-addr command to other sentinels that monitor the same node, in order to obtain their subjective offline judgment results for the master node. Other sentinels handle the sentinel is-master-down-by-addr command using the sentinelCommand function, and the command processing result returned by the function includes their own subjective offline judgment result for the master node.
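The responder side of this exchange can be sketched as a pure function. The names `toy_answer` and `toy_reply` and the flag values are hypothetical; the condition for part 1 and the `"*"` default for part 2 follow the sentinelCommand excerpt above, and a NULL `flags` pointer plays the role of `ri == NULL` (no master known at the requested address).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch. */
#define SRI_MASTER (1 << 0)
#define SRI_S_DOWN (1 << 1)

/* The three parts of the reply. */
typedef struct {
    int isdown;                      /* part 1: 1 or 0 */
    const char *leader;              /* part 2: leader ID or "*" */
    unsigned long long leader_epoch; /* part 3: epoch of the vote */
} toy_reply;

/* flags is NULL when no master is known at the requested address.
 * voted_leader is NULL when no vote was requested or recorded. */
toy_reply toy_answer(bool tilt, const int *flags,
                     const char *voted_leader,
                     unsigned long long voted_epoch) {
    toy_reply r;
    /* Part 1: not in TILT, master exists, is a master, and is S_DOWN here */
    r.isdown = (!tilt && flags != NULL &&
                (*flags & SRI_S_DOWN) && (*flags & SRI_MASTER)) ? 1 : 0;
    r.leader = voted_leader ? voted_leader : "*";
    r.leader_epoch = voted_epoch;
    return r;
}
```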

However, from the code above, you can also see that the returned result of the sentinelCommand function contains information about the sentinel leader. This is because the sentinel is-master-down-by-addr command itself, sent by the sentinelAskMasterStateToOtherSentinels function, can also be used to trigger the sentinel leader election. I will explain this to you later.

So, let’s go back to the question raised when we were discussing the objective offline judgment of the master node. The sentinelCheckObjectivelyDown function needs to check the SRI_MASTER_DOWN flag in the flags variable of other sentinels that monitor the same master node. But how are the SRI_MASTER_DOWN flags in other sentinels set?

This is actually related to sentinelReceiveIsMasterDownReply, the reply-handling function that sentinelAskMasterStateToOtherSentinels registers when it sends the sentinel is-master-down-by-addr command to the other sentinels.

sentinelReceiveIsMasterDownReply Function

The sentinelReceiveIsMasterDownReply function checks the reply returned by another sentinel. The reply consists of the three parts mentioned earlier: that sentinel's subjective offline judgment of the master node, the ID of the sentinel leader, and the epoch of the sentinel leader. The function then checks whether the first part, the replying sentinel's subjective offline judgment, is 1.

If it is, the replying sentinel has judged the master node as subjectively offline. In this case, the current sentinel sets the SRI_MASTER_DOWN flag in the flags of the sentinelRedisInstance structure it keeps for that sentinel.

The code below shows the execution logic of the sentinelReceiveIsMasterDownReply function for judging the reply result from other sentinels. You can take a look.

// r is the reply that the current sentinel received from another sentinel
// Check that the reply has three parts whose types are integer, string, and integer
if (r->type == REDIS_REPLY_ARRAY && r->elements == 3 &&
    r->element[0]->type == REDIS_REPLY_INTEGER &&
    r->element[1]->type == REDIS_REPLY_STRING &&
    r->element[2]->type == REDIS_REPLY_INTEGER) {
    ri->last_master_down_reply_time = mstime();
    // If the first part is 1, set the `SRI_MASTER_DOWN` flag in the flags of the replying sentinel
    if (r->element[0]->integer == 1) {
        ri->flags |= SRI_MASTER_DOWN;
    }
}
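A simplified sketch of this reply handling follows. `toy_peer` and `toy_on_reply` are hypothetical, and the boolean `well_formed` stands in for the array and type checks shown above. The `else` branch that clears SRI_MASTER_DOWN when the first part is 0 does not appear in the excerpt; it is an assumption about the full sentinel.c source.

```c
#include <stdbool.h>

/* Hypothetical flag value for this sketch. */
#define SRI_MASTER_DOWN (1 << 0)

/* Simplified record that the asking sentinel keeps about a peer. */
typedef struct {
    int flags;
    long long last_master_down_reply_time; /* ms */
} toy_peer;

/* well_formed stands in for the array / three-element / type checks;
 * down is the integer carried in the first part of the reply. */
void toy_on_reply(toy_peer *peer, bool well_formed, long long down,
                  long long now_ms) {
    if (!well_formed) return;               /* malformed replies are ignored */
    peer->last_master_down_reply_time = now_ms;
    if (down == 1)
        peer->flags |= SRI_MASTER_DOWN;     /* peer says: master is down */
    else
        peer->flags &= ~SRI_MASTER_DOWN;    /* assumption: see lead-in */
}
```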

So now you have the full picture. When a sentinel calls the sentinelCheckObjectivelyDown function, it directly checks whether the SRI_MASTER_DOWN flag is set in the flags of the other sentinels. The sentinel obtains the other sentinels' subjective offline judgments of the master node by sending them the sentinel is-master-down-by-addr command through the sentinelAskMasterStateToOtherSentinels function. Then, based on each reply, the reply-handling function sentinelReceiveIsMasterDownReply sets SRI_MASTER_DOWN in the flags of the corresponding sentinel. The diagram below shows this execution logic for an overall review.

Now, with a good understanding of this execution logic, let’s take a look at when the sentinel election starts.

Sentinel Election

To understand what triggers the sentinel election, let's review the calling relationship of the sentinelHandleRedisInstance function that I mentioned in the previous lesson, as shown in the following diagram:

From the diagram, we can see that, for a master node, sentinelHandleRedisInstance first calls the sentinelCheckObjectivelyDown function and then calls the sentinelStartFailoverIfNeeded function to determine whether to start a failover. If sentinelStartFailoverIfNeeded returns a non-zero value, the sentinelAskMasterStateToOtherSentinels function is called right away. In either case, sentinelHandleRedisInstance then calls the sentinelFailoverStateMachine function and finally calls sentinelAskMasterStateToOtherSentinels once more.
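The call order in the master branch of sentinelHandleRedisInstance can be sketched with stub functions that only record their names. Every function here (`check_odown`, `start_failover_if_needed`, `ask_others`, `failover_state_machine`, `toy_handle_master`) is a hypothetical stand-in, and the comment on each names the Redis function it imitates. In the Redis source the first ask call passes SENTINEL_ASK_FORCED and the second passes SENTINEL_NO_FLAGS, which the `forced` parameter imitates.

```c
#include <string.h>

static char call_log[256];        /* records the order of calls */
static int start_failover_needed; /* stub result for the failover check */

static void log_call(const char *name) {
    strcat(call_log, name);
    strcat(call_log, ";");
}

/* sentinelCheckObjectivelyDown */
static void check_odown(void) { log_call("odown"); }
/* sentinelStartFailoverIfNeeded */
static int start_failover_if_needed(void) {
    log_call("start?");
    return start_failover_needed;
}
/* sentinelAskMasterStateToOtherSentinels */
static void ask_others(int forced) { log_call(forced ? "ask(forced)" : "ask"); }
/* sentinelFailoverStateMachine */
static void failover_state_machine(void) { log_call("fsm"); }

/* Mirrors the master-only branch of sentinelHandleRedisInstance. */
void toy_handle_master(int failover_needed) {
    call_log[0] = '\0';
    start_failover_needed = failover_needed;
    check_odown();
    if (start_failover_if_needed())
        ask_others(1);
    failover_state_machine();
    ask_others(0);
}
```

The state machine and the final ask run on every pass; only the forced ask depends on the failover check.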

In this calling relationship, sentinelStartFailoverIfNeeded decides whether to perform a failover. It checks three conditions:

The flags of the master node have been marked as SRI_O_DOWN;
No failover is currently in progress (the SRI_FAILOVER_IN_PROGRESS flag is not set);
The time elapsed since the last failover attempt started is at least twice the sentinel failover-timeout value configured in the sentinel.conf file.
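The three conditions above can be expressed as a small predicate. This is a sketch under simplifying assumptions: `toy_master`, its fields, the flag values, and `toy_should_start_failover` are hypothetical, and the current time is passed in explicitly instead of calling `mstime()`.

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch. */
#define SRI_O_DOWN               (1 << 0)
#define SRI_FAILOVER_IN_PROGRESS (1 << 1)

typedef struct {
    int flags;
    long long failover_start_time; /* ms, when the last attempt started */
    long long failover_timeout;    /* ms, sentinel failover-timeout */
} toy_master;

bool toy_should_start_failover(const toy_master *m, long long now_ms) {
    if (!(m->flags & SRI_O_DOWN)) return false;             /* condition 1 */
    if (m->flags & SRI_FAILOVER_IN_PROGRESS) return false;  /* condition 2 */
    if (now_ms - m->failover_start_time < m->failover_timeout * 2)
        return false;                                       /* condition 3 */
    return true;
}
```

With `failover_timeout` set to 180000 ms (three minutes, the Redis default for failover-timeout), a new attempt is allowed only once 360000 ms have passed since the previous one started.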

Once all three conditions are met, sentinelStartFailoverIfNeeded calls the sentinelStartFailover function to initiate the failover. sentinelStartFailover increments sentinel.current_epoch and records the new value as the master's failover_epoch, so every failover attempt opens a new voting round, much like a Raft candidate starting a new term. It also sets the failover_state of the master node to SENTINEL_FAILOVER_STATE_WAIT_START and adds the SRI_FAILOVER_IN_PROGRESS flag to the master node's flags to indicate that the failover has started, as shown below:

void sentinelStartFailover(sentinelRedisInstance *master) {
…
master->failover_state = SENTINEL_FAILOVER_STATE_WAIT_START;
master->flags |= SRI_FAILOVER_IN_PROGRESS;
…
}

Once the sentinelStartFailover function sets the failover_state of the master node to SENTINEL_FAILOVER_STATE_WAIT_START, the sentinelFailoverStateMachine function will execute the state machine to perform the actual failover. However, before the actual failover, the sentinelAskMasterStateToOtherSentinels function will be called.
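The state change can be modelled in a few lines. In the Redis source, sentinelStartFailover also increments sentinel.current_epoch and stores the result as the master's failover_epoch; the sketch includes that step because the incremented epoch is what the sentinel later sends in its vote request. `toy_master`, `toy_start_failover`, the file-scope `current_epoch`, and the constant values are hypothetical stand-ins.

```c
/* Stand-ins for the Redis constants (NONE is 0, WAIT_START is 1). */
#define FAILOVER_STATE_NONE       0
#define FAILOVER_STATE_WAIT_START 1
#define SRI_FAILOVER_IN_PROGRESS  (1 << 0)

typedef struct {
    int flags;
    int failover_state;
    unsigned long long failover_epoch;
} toy_master;

static unsigned long long current_epoch; /* stand-in for sentinel.current_epoch */

void toy_start_failover(toy_master *m) {
    m->failover_state = FAILOVER_STATE_WAIT_START; /* state machine can now run */
    m->flags |= SRI_FAILOVER_IN_PROGRESS;          /* blocks a second attempt */
    m->failover_epoch = ++current_epoch;           /* a new voting round begins */
}
```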

You may wonder why sentinelAskMasterStateToOtherSentinels is called once sentinelStartFailoverIfNeeded has decided that a failover should be executed. The reason is that sentinelAskMasterStateToOtherSentinels has a second role: besides asking other sentinels for their subjective judgment of the master node's state, it is also used to ask them to vote in the leader election.

When I introduced this function earlier, I mentioned that it sends the sentinel is-master-down-by-addr command to other sentinels, including the IP and port of the master node, the current epoch (sentinel.current_epoch), and the instance ID. If the failover_state of the master node is no longer SENTINEL_FAILOVER_STATE_NONE, then the instance ID is set to the ID of the current sentinel.

In the sentinelCommand function, if the master node is found and the instance ID in the sentinel command is not “*”, the sentinelVoteLeader function is called to take part in the leader election:

// The current instance is the master node and the instance ID in the sentinel command is not "*"
if (ri && ri->flags & SRI_MASTER && strcasecmp(c->argv[5]->ptr,"*")) {
   // Call sentinelVoteLeader for sentinel leader election
   leader = sentinelVoteLeader(ri,(uint64_t)req_epoch, c->argv[5]->ptr,
                                            &leader_epoch);
}

Now let’s take a closer look at the sentinelVoteLeader function.

`sentinelVoteLeader` Function

The sentinelVoteLeader function actually performs the voting logic. Let me explain it to you through an example.

Assume that sentinel A has determined that the master node is objectively down and now sends a vote request to sentinel B, with req_runid as sentinel A's ID. When sentinel B executes the sentinelVoteLeader function, it compares three epochs: sentinel A's epoch (req_epoch), sentinel B's own epoch (sentinel.current_epoch), and the leader epoch that sentinel B has recorded for the master (master->leader_epoch). In Raft terms, sentinel A is the Candidate and sentinel B is the Follower. As I mentioned in the previous lesson when introducing the Raft protocol, voting is organized into rounds, and each Follower can vote only once per round. The epoch serves as the round number here, and sentinelVoteLeader checks it as the Raft protocol requires, so that each Follower votes only once per round.

Therefore, sentinelVoteLeader lets sentinel B vote only if two conditions hold: the leader epoch recorded for the master is smaller than sentinel A's epoch, and sentinel A's epoch is greater than or equal to sentinel B's epoch. Note that just before this check, sentinel B adopts req_epoch as its own current epoch if req_epoch is larger, so the second condition mainly rejects requests from an epoch older than sentinel B's. Together, these conditions ensure that sentinel B has not yet voted in this round. If it has, sentinelVoteLeader simply returns the leader ID already recorded for the master, that is, the vote sentinel B cast earlier.

The following code demonstrates the logic we discussed just now. Take a look:

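// Adopt the requesting sentinel's epoch if it is newer than ours
if (req_epoch > sentinel.current_epoch) {
    sentinel.current_epoch = req_epoch;
    …
    sentinelEvent(LL_WARNING,"+new-epoch",master,"%llu",
        (unsigned long long) sentinel.current_epoch);
}

// Vote only if no vote has been recorded for this master in req_epoch yet
if (master->leader_epoch < req_epoch && sentinel.current_epoch <= req_epoch)
{
    sdsfree(master->leader);
    master->leader = sdsnew(req_runid);             // Record whom we voted for
    master->leader_epoch = sentinel.current_epoch;  // Record the epoch of the vote
    …
}
// Report the recorded vote (new or earlier) and its epoch
*leader_epoch = master->leader_epoch;
return master->leader ? sdsnew(master->leader) : NULL;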

Now you understand how the sentinelVoteLeader function uses epochs to determine the election of the Sentinel Leader according to the Raft protocol.
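The voting rule can be exercised outside Redis with a simplified model. This sketch keeps the two comparisons from the excerpt above but replaces `sds` strings with a fixed-size buffer and the global `sentinel` struct with a file-scope `current_epoch`; `toy_master` and `toy_vote_leader` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    char leader[64];       /* ID voted for; empty string means no vote yet */
    uint64_t leader_epoch; /* epoch in which that vote was cast */
} toy_master;

static uint64_t current_epoch; /* stand-in for sentinel.current_epoch */

/* Returns the ID this sentinel has voted for (possibly from an earlier
 * request) and stores the epoch of that vote in *out_epoch. */
const char *toy_vote_leader(toy_master *m, uint64_t req_epoch,
                            const char *req_runid, uint64_t *out_epoch) {
    if (req_epoch > current_epoch) current_epoch = req_epoch;
    if (m->leader_epoch < req_epoch && current_epoch <= req_epoch) {
        strncpy(m->leader, req_runid, sizeof(m->leader) - 1);
        m->leader[sizeof(m->leader) - 1] = '\0';
        m->leader_epoch = current_epoch;
    }
    *out_epoch = m->leader_epoch;
    return m->leader[0] ? m->leader : NULL;
}
```

Repeating a request in the same epoch returns the earlier vote, a request in a newer epoch replaces it, and a request from an epoch older than the follower's current epoch cannot change the recorded vote.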

Next, the Sentinel that initiates the vote still handles the return results from other Sentinels voting for the Leader through the sentinelReceiveIsMasterDownReply function. As we mentioned earlier, the second and third parts of this result are the ID of the Sentinel Leader and the epoch the Sentinel Leader belongs to. The Sentinel that initiates the vote can obtain the voting results of other Sentinels for the Leader from this result.
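On the asking side, recording the vote carried in parts 2 and 3 of a reply can be sketched as follows. `toy_peer` and `toy_record_vote` are hypothetical; the rule that a leader of `"*"` carries no vote follows from the reply format described earlier, where `"*"` is returned when no leader has been recorded.

```c
#include <stdint.h>
#include <string.h>

/* Simplified record the asking sentinel keeps about a peer. */
typedef struct {
    char leader[64];       /* whom this peer voted for, "" if unknown */
    uint64_t leader_epoch; /* epoch of that vote */
} toy_peer;

/* Records the vote from parts 2 and 3 of a reply.
 * A leader of "*" means the peer did not report a vote. */
void toy_record_vote(toy_peer *p, const char *leader, uint64_t epoch) {
    if (strcmp(leader, "*") == 0) return;
    strncpy(p->leader, leader, sizeof(p->leader) - 1);
    p->leader[sizeof(p->leader) - 1] = '\0';
    p->leader_epoch = epoch;
}
```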

Finally, after the Sentinel that initiates the vote calls the sentinelAskMasterStateToOtherSentinels function to allow other Sentinels to vote, it will execute the sentinelFailoverStateMachine function.

If the master starts failover, the failover_state of the master will be set to SENTINEL_FAILOVER_STATE_WAIT_START. In this state, the sentinelFailoverStateMachine function will call the sentinelFailoverWaitStart function. The sentinelFailoverWaitStart function will call the sentinelGetLeader function to determine if the Sentinel that initiated the vote is the Sentinel Leader. To become the Leader, the Sentinel that initiates the vote must satisfy two conditions:

First, it must obtain votes from more than half of all the sentinels monitoring the master node (this total includes the sentinel itself).
Second, the number of votes it obtains must be at least the predefined quorum threshold.

You can see these two conditions in the code snippet from the sentinelGetLeader function as follows:

// voters is the total number of sentinels monitoring the master (the other sentinels plus this one),
// and max_votes is the number of votes received by the candidate with the most votes (winner)
voters_quorum = voters/2+1;  // A majority: more than half of all voters
// If the winner has fewer votes than a majority, or fewer than the quorum threshold, there is no winner
if (winner && (max_votes < voters_quorum || max_votes < master->quorum))
    winner = NULL;
// Return a copy of the final Leader ID, or NULL if no Leader was elected
winner = winner ? sdsnew(winner) : NULL;
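The majority-and-quorum check can be tried out with a simplified tally. `toy_get_leader` is a hypothetical function that receives every sentinel's vote for the current epoch in an array (NULL meaning no vote) and applies the two conditions from the excerpt above. The real sentinelGetLeader builds its counters from the sentinels dict and adds the current sentinel's own vote through sentinelVoteLeader.

```c
#include <stddef.h>
#include <string.h>

/* votes[i] is the ID that sentinel i voted for in this epoch, or NULL.
 * voters is the total number of sentinels, including ourselves. */
const char *toy_get_leader(const char *votes[], size_t voters,
                           unsigned int quorum) {
    const char *winner = NULL;
    unsigned int max_votes = 0;

    /* Find the candidate with the most votes. */
    for (size_t i = 0; i < voters; i++) {
        if (votes[i] == NULL) continue;
        unsigned int n = 0;
        for (size_t j = 0; j < voters; j++)
            if (votes[j] && strcmp(votes[i], votes[j]) == 0) n++;
        if (n > max_votes) {
            max_votes = n;
            winner = votes[i];
        }
    }
    /* Require a majority of all voters and at least the quorum. */
    unsigned int voters_quorum = (unsigned int)(voters / 2 + 1);
    if (winner && (max_votes < voters_quorum || max_votes < quorum))
        winner = NULL;
    return winner;
}
```

With five sentinels, a majority is 5/2+1 = 3 votes, so a 2-2 split yields no leader even when the quorum is 2, and 3 votes still fail when the quorum is set to 4.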

The diagram below shows the call relationship when confirming the Sentinel Leader as we discussed just now. Take a look.

Call Relationship

Alright, by now the final Sentinel Leader can be determined.

Summary

Alright, this is the end of today’s lesson. Let’s summarize.

In today’s lesson, I focused on introducing you to the objective offline judgment in the sentinel working process, as well as leader election. Because this process involves interaction and inquiry between sentinels, it is not easy to grasp. You need to pay close attention to the key points I mentioned.

First, the objective offline judgment involves three flags: SRI_S_DOWN and SRI_O_DOWN in the master node's flags, and SRI_MASTER_DOWN in the flags of the other sentinel instances. I drew the table below to show which function sets each of these three flags and under what conditions. You can review it as a whole.

Once the sentinel determines that the master node is objectively offline and starts a failover, it invokes the sentinelAskMasterStateToOtherSentinels function to run the sentinel leader election. Note that asking other sentinels about the master node's subjective offline status and asking them to vote for a leader are both done through the sentinel is-master-down-by-addr command, and the Redis source code uses the same function, sentinelAskMasterStateToOtherSentinels, to send it. Therefore, when reading the source code, be careful to distinguish whether a command sent by sentinelAskMasterStateToOtherSentinels is querying the master node's subjective offline status or requesting a vote; the last argument of the command (“*” or the sentinel's own ID) tells the two apart.

Finally, the voting for sentinel leader election is completed in the sentinelVoteLeader function. To comply with the rules of the Raft protocol, when executing the sentinelVoteLeader function, it mainly compares the epoch of the sentinel and the leader epoch recorded by the master. This is to meet the requirement of the Raft protocol that a follower can only cast one vote in a round of voting.

Alright, in today’s lesson, we have understood the process of sentinel leader election. As you can see, even though the final execution logic of sentinel election is in one function, the triggering logic of sentinel election is included in the entire working process of the sentinel. So, we also need to grasp other operations in this process, such as subjective offline judgment and objective offline judgment.

One question per lesson

Sentinel calls the function sentinelHandleDictOfRedisInstances in the sentinelTimer function. It executes the sentinelHandleRedisInstance function for each master node, and also executes the sentinelHandleRedisInstance function for all the slave nodes of the master node. So, will Sentinel judge subjective and objective offline status for the slave nodes?