30 How to Implement Latency Monitoring in Systems #

We know that a notable feature of Redis is its low-latency data access. If Redis slows down during operation, we need a way to monitor both which commands are running slowly and which runtime events are causing the slowdown. With that information, we can inspect the slow commands, analyze their operation types and the amount of data they access, and also examine the recorded events and their causes, so that we can propose countermeasures for both.

To meet these requirements, we need a monitoring framework that can both record slow commands and monitor the events that cause Redis to slow down. In fact, this is a common requirement when developing and operating any backend system: how do we monitor the system's runtime status?

In today’s lesson, I will introduce you to Redis’s latency monitoring framework and the design and implementation of slow command logging. After mastering the content of today’s lesson, on the one hand, you can apply this monitoring framework to actual business scenarios, and on the other hand, you can refer to Redis’s implementation to add latency monitoring capabilities to your own system.

Now, let’s first take a look at Redis’s implementation of the latency monitoring framework.

Implementation of Latency Monitoring Framework #

In fact, during the operation of Redis, several types of events listed in the table below will cause Redis to slow down. We usually refer to these events as latency events. You can refer to these event types in the table, as well as their corresponding event names in the source code.

To track these events, Redis implements a latency monitoring framework that records their execution status through sampling. When troubleshooting is needed, the framework exposes the recorded samples through the LATENCY HISTORY command, so that operations staff can inspect these events.

Now, let’s take a look at the data structures used to record the execution status of events. Since the latency monitoring framework is implemented in the latency.h and latency.c files in Redis, you can find the corresponding data structures and functions in these two files.

Data Structure for Recording Event Execution Status #

First of all, Redis uses the latencySample structure to record the sampling time of latency events and the actual execution duration of the events. The code for this structure is shown below:

struct latencySample {
    int32_t time;      // Sampling time of the event (Unix timestamp, in seconds)
    uint32_t latency;  // Execution duration of the event (in milliseconds)
};

Based on the latencySample structure, Redis has designed the latencyTimeSeries structure. This structure uses an array of latencySample type to record a series of sampling results for a specific type of event, providing more event information for analyzing Redis slowdown.

struct latencyTimeSeries {
    int idx;  // Writing position of the sampling event array
    uint32_t max;  // Maximum latency of the current event
    struct latencySample samples[LATENCY_TS_LEN];  // Array of sampling events, recording LATENCY_TS_LEN sampling results. LATENCY_TS_LEN defaults to 160.
};
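
To get a feel for the memory cost of this design, here is a small standalone sketch (not Redis source) that prints the size of the two structures as declared above; with the default LATENCY_TS_LEN of 160, one event's time series occupies only a little over a kilobyte.

#include <stdio.h>
#include <stdint.h>

#define LATENCY_TS_LEN 160

struct latencySample {
    int32_t time;      /* sampling time, in seconds */
    uint32_t latency;  /* execution duration, in milliseconds */
};

struct latencyTimeSeries {
    int idx;
    uint32_t max;
    struct latencySample samples[LATENCY_TS_LEN];
};

int main(void) {
    /* Each sample is 8 bytes, so 160 samples plus the header is roughly 1.3 KB per event type. */
    printf("latencySample:     %zu bytes\n", sizeof(struct latencySample));
    printf("latencyTimeSeries: %zu bytes\n", sizeof(struct latencyTimeSeries));
    return 0;
}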

Furthermore, because the latency monitoring framework needs to record various types of latency events, Redis further designs a hash table latency_events as a member variable of the global variable server to record the sampling result arrays of different events, as shown below:

struct redisServer {
    ...
    dict *latency_events;
    ...
};

This hash table is created during the initialization function initServer by calling the latencyMonitorInit function, as shown below:

void initServer(void) {
    ...
    latencyMonitorInit();
}

void latencyMonitorInit(void) {
    server.latency_events = dictCreate(&latencyTimeSeriesDictType, NULL);
}

Alright, now that we understand the data structures used to record latency events and the initialization operations, let’s take a look at how event sampling is implemented.

How is Event Sampling Implemented? #

The sampling function for latency events is latencyAddSample, with the following function prototype. Its parameters include the event name to be recorded, which corresponds to a hash entry in the latency_events hash table. In addition, its parameters also include the execution duration of the event.

void latencyAddSample(char *event, mstime_t latency)
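
For reference, mstime_t is Redis's millisecond time type, defined in server.h as follows:

typedef long long mstime_t;   /* millisecond time type */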

The execution logic of the latencyAddSample function is not complicated and can be divided into three steps.

First, it will search for the hash entry corresponding to the event in the latency_events hash table. If the hash entry for the event does not exist, it will add the event to the hash table, as shown below:

// Search for the hash entry for the event
struct latencyTimeSeries *ts = dictFetchValue(server.latency_events, event);

if (ts == NULL) {  // If the hash entry is empty, create a new one
    ts = zmalloc(sizeof(*ts));
    ts->idx = 0;
    ts->max = 0;
    memset(ts->samples, 0, sizeof(ts->samples));
    dictAdd(server.latency_events, zstrdup(event), ts);  // Insert the hash entry into the hash table
}

Then, the latencyAddSample function will update the maximum execution time of the current record for the class of events based on the passed execution time, as shown below:

if (latency > ts->max) ts->max = latency;

Finally, the latencyAddSample function will actually record the current sampling result.

However, in this step, if it finds that the current sampling result is obtained in the same second as the previous sampling result, and if the execution duration of the current sampling result is greater than the previous one, the latencyAddSample function will directly update the execution duration recorded in the previous sampling result instead of inserting a new sampling result.

Otherwise, the latencyAddSample function will insert a new sampling result. This design is intended to avoid recording too many sampling results in the same second.

The code below shows the logic of actually recording the sampling result in the latencyAddSample function. Please take a look:

// Get the position of the previous sampling result of the same event type
prev = (ts->idx + LATENCY_TS_LEN - 1) % LATENCY_TS_LEN;
if (ts->samples[prev].time == now) {  // The current and previous sampling results fall in the same second
    if (latency > ts->samples[prev].latency)  // The current execution duration is greater than the previous one
        ts->samples[prev].latency = latency;  // Directly update the execution duration of the previous sampling result
    return;
}

// Otherwise, insert a new sampling result
ts->samples[ts->idx].time = time(NULL);
ts->samples[ts->idx].latency = latency;
ts->idx = (ts->idx + 1) % LATENCY_TS_LEN;


Here, you should also note that the latencyAddSample function reuses the samples array in latencyTimeSeries as a circular buffer when recording sampling results. Therefore, once the number of sampling results exceeds the array size (LATENCY_TS_LEN, 160 by default), the oldest results are overwritten. If you want to keep more sampling results, you need to increase LATENCY_TS_LEN.
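
As an illustration of this circular reuse, here is a minimal sketch (not Redis source; printSamplesInOrder is a hypothetical helper) of how the samples array can be walked in chronological order: starting from idx, which points at the oldest slot once the buffer has wrapped around, and skipping slots that have never been written.

void printSamplesInOrder(struct latencyTimeSeries *ts) {
    for (int j = 0; j < LATENCY_TS_LEN; j++) {
        int i = (ts->idx + j) % LATENCY_TS_LEN;  // walk forward from the oldest slot
        if (ts->samples[i].time == 0) continue;  // skip slots that were never filled
        printf("time=%d latency=%u ms\n",
               ts->samples[i].time, ts->samples[i].latency);
    }
}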

So when is the latencyAddSample function called for sampling?

Actually, latencyAddSample is wrapped by the latencyAddSampleIfNeeded macro. This macro only calls latencyAddSample to record a sampling result when latency monitoring is enabled and the event's execution duration reaches the value of the latency-monitor-threshold configuration item. You can see the definition of latencyAddSampleIfNeeded below.


#define latencyAddSampleIfNeeded(event,var) \
    if (server.latency_monitor_threshold && \
        (var) >= server.latency_monitor_threshold) \
          latencyAddSample((event),(var));


latencyAddSampleIfNeeded is called right where each latency event occurs in the source code. Here are two examples.

For example, when a Redis command is executed through the call function (in the server.c file), the call function calls the latencyAddSampleIfNeeded function to sample, as shown below.


if (flags & CMD_CALL_SLOWLOG && c->cmd->proc != execCommand) {
        // Depending on the CMD_FAST flag in the command data structure, determine whether the current event is a fast-command event or a command event
        char *latency_event = (c->cmd->flags & CMD_FAST) ?
                              "fast-command" : "command";
        latencyAddSampleIfNeeded(latency_event,duration/1000);
        
}


Another example is when Redis calls the flushAppendOnlyFile function to write the AOF file: if the AOF synchronization policy is AOF_FSYNC_ALWAYS (appendfsync always), flushAppendOnlyFile calls latencyAddSampleIfNeeded to record a sampling result for the aof-fsync-always latency event, as shown below.


void flushAppendOnlyFile(int force) {
    ...
    if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
        latencyStartMonitor(latency);  // Call latencyStartMonitor to start the timer
        redis_fsync(server.aof_fd);    // Actually flush the data to disk
        latencyEndMonitor(latency);    // Call latencyEndMonitor to end the timer and compute the duration
        latencyAddSampleIfNeeded("aof-fsync-always",latency);
    }
    ...
}


What you need to note here is that when the Redis source code calls latencyAddSampleIfNeeded to record a sampling result, it usually calls the latencyStartMonitor function before the latency-prone operation to start the timer, and calls the latencyEndMonitor function after the operation completes to stop the timer and compute the event's execution duration.
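
These two timers are implemented as macros in latency.h; their definitions are essentially as follows (note that they do nothing when latency monitoring is disabled, that is, when latency-monitor-threshold is 0):

// Start the timer: record the current time in milliseconds
#define latencyStartMonitor(var) if (server.latency_monitor_threshold) { \
    var = mstime(); \
} else { \
    var = 0; \
}

// End the timer: compute the elapsed time in milliseconds
#define latencyEndMonitor(var) if (server.latency_monitor_threshold) { \
    var = mstime() - var; \
}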

In addition, you can also use tools for reading Redis source code, such as Sublime and Source Insight, to find other places where the latencyAddSampleIfNeeded function is called by searching for function relationships.

Okay, so far we have seen how the Redis latency monitoring framework records sampling results of latency events through latencyAddSampleIfNeeded. In fact, the framework also implements latency analysis and provides suggestions for dealing with high latency. Let's take a look.

Latency Analysis and Suggestions #

First, Redis provides the LATENCY DOCTOR command to produce a latency analysis report with suggestions for dealing with latency. When we execute this command, Redis handles it with the latencyCommand function, which in turn calls the createLatencyReport function to generate the report and suggestions.

Specifically, the createLatencyReport function first calls the analyzeLatencyForEvent function to compute statistics such as the mean, maximum, and minimum execution durations of the sampled latency events. You can read the source code of analyzeLatencyForEvent to see the exact calculation.
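
To make this statistics step concrete, here is a simplified sketch (not the actual Redis function: the latencyStats fields here are reduced, and computeLatencyStats is a hypothetical name) of how the mean, maximum, and minimum can be computed over one event's samples array; the real analyzeLatencyForEvent also tracks additional statistics such as the mean absolute deviation.

struct latencyStats {
    uint32_t min, max;   // minimum and maximum sampled latency (ms)
    double avg;          // mean sampled latency (ms)
    int samples;         // number of non-empty samples
};

void computeLatencyStats(struct latencyTimeSeries *ts, struct latencyStats *ls) {
    uint64_t sum = 0;
    ls->min = ls->max = 0;
    ls->avg = 0;
    ls->samples = 0;
    for (int j = 0; j < LATENCY_TS_LEN; j++) {
        if (ts->samples[j].time == 0) continue;  // skip empty slots
        uint32_t lat = ts->samples[j].latency;
        if (ls->samples == 0 || lat < ls->min) ls->min = lat;
        if (lat > ls->max) ls->max = lat;
        sum += lat;
        ls->samples++;
    }
    if (ls->samples) ls->avg = (double)sum / ls->samples;
}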

Then, the createLatencyReport function provides suggestions based on this type of event, combined with Redis configuration and other knowledge.

In fact, the createLatencyReport function defines a number of int flag variables. When one of these flags is set to 1, it means the report will recommend the corresponding measure for dealing with high latency. The code below shows some of these flags and the countermeasures they correspond to; you can read the source of createLatencyReport for the full list.


sds createLatencyReport(void) {
    ...
    int advise_slowlog_enabled = 0;   // Suggest enabling the slowlog
    int advise_slowlog_tuning = 0;    // Suggest adjusting the slowlog threshold
    int advise_slowlog_inspect = 0;   // Suggest inspecting the slowlog entries
    int advise_disk_contention = 0;   // Suggest reducing disk contention
    ...
}


Let's take a simple example. For the command event, createLatencyReport will, depending on the slowlog settings, suggest enabling the slowlog, adjusting the slowlog threshold, inspecting the slowlog entries, and avoiding the use of bigkeys. The code for this part is as follows:


if (!strcasecmp(event,"command")) {
    // Suggest enabling the slowlog if it is not enabled
    if (server.slowlog_log_slower_than < 0) {
        advise_slowlog_enabled = 1;
        advices++;
    }
    // Suggest adjusting the slowlog threshold if it is larger than the latency monitor threshold
    else if (server.slowlog_log_slower_than/1000 >
             server.latency_monitor_threshold)
    {
        advise_slowlog_tuning = 1;
        advices++;
    }
    advise_slowlog_inspect = 1;  // Suggest inspecting the slowlog entries
    advise_large_objects = 1;    // Suggest avoiding the use of bigkeys
    advices += 2;
}


As you can see, the createLatencyReport function not only calculates latency statistics but also provides targeted suggestions. Because these suggestions come from the perspective of the Redis developers themselves, they are quite practical for troubleshooting.

Okay, so far we have learned about the implementation of the latency monitoring framework. Next, let's learn about the implementation of slow command logging in Redis.

Implementation of Slow Command Log #

Redis records the slow command log in a simple way: it uses a list to store the commands whose execution time exceeds the slow log threshold.

In the data structure `redisServer` of the global variable `server` in Redis, there is a member variable of list type called `slowlog`, which is used to store the slow command log list, as shown below:

struct redisServer {
    ...
    list *slowlog;
    ...
};
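
The slowlog list itself is created at startup by the slowlogInit function (called from initServer); its implementation is short and looks roughly like this:

void slowlogInit(void) {
    server.slowlog = listCreate();                       // Create the slow command log list
    server.slowlog_entry_id = 0;                         // IDs assigned to log entries start at 0
    listSetFreeMethod(server.slowlog,slowlogFreeEntry);  // Free entries with slowlogFreeEntry when removed
}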

The code for implementing the slow command log recording functionality can be found in the slowlog.c file. The main function here is slowlogPushEntryIfNeeded, which has the following prototype:

void slowlogPushEntryIfNeeded(client *c, robj **argv, int argc, long long duration)

From the prototype, you can see that the function's parameters include the client c that issued the command, the executed command and its arguments (argv and argc), and the command's execution duration.

The logic of this function is not complex. It checks whether the execution duration of the current command exceeds the slow log threshold slowlog-log-slower-than configured in redis.conf. If it does, it calls the slowlogCreateEntry function to create a slow command log entry for this command and uses the listAddNodeHead function to add the entry to the head of the log list, as shown below:

// Check if the duration of the current command execution exceeds the configured threshold
if (duration >= server.slowlog_log_slower_than)
   listAddNodeHead(server.slowlog, slowlogCreateEntry(c,argv,argc,duration));

Of course, if there are too many entries in the log list, they will consume more memory. Therefore, when adding a log entry, the slowlogPushEntryIfNeeded function also checks whether the length of the log list exceeds the configured slowlog-max-len. If it does, it removes log entries from the tail of the list, as shown below:

// If the log list exceeds the threshold length, delete the log entry at the end of the list
while (listLength(server.slowlog) > server.slowlog_max_len)
    listDelNode(server.slowlog,listLast(server.slowlog));

Now, we understand the basic logic of the main function slowlogPushEntryIfNeeded for recording slow command log entries. Next, let’s take a look at the slowlogCreateEntry function that is called when recording log entries.

This function is used to create a slow command log entry. The data structure of a slow command log entry is slowlogEntry, as shown below:

typedef struct slowlogEntry {
    robj **argv;     // Command and arguments of the log entry
    int argc;        // Number of command and arguments for the log entry
    long long id;       // Unique ID of the log entry
    long long duration;  // Execution duration of the command for this log entry (in microseconds)
    time_t time;        // Execution timestamp of the command for this log entry
    sds cname;          // Name of the client that sent the command for this log entry
    sds peerid;         // Network address of the client that sent the command for this log entry
} slowlogEntry;

From the definition of slowlogEntry, you can see that it records the slow command and its arguments, as well as the client network address that sent the command. The advantage of this design is that when analyzing the slow command log, we can directly see the slow command itself and its arguments, as well as the information about the client that sent the command. This information is helpful for troubleshooting the cause and origin of slow commands.

For example, if we find that the recorded command has a large number of parameters, it may be an operation on a big key.

Of course, considering limited memory resources, when the slowlogCreateEntry function creates a slow command log entry, it also checks the number of command parameters. If the number of command parameters exceeds the value defined by the macro SLOWLOG_ENTRY_MAX_ARGC (default is 32), it will not record the parameters beyond this threshold, but instead record the number of remaining parameters. This way, the slow command log entry will record some command parameters, which is helpful for troubleshooting, while avoiding recording too many parameters and occupying too much memory.

The code below shows the basic execution logic of slowlogCreateEntry:

slowlogEntry *slowlogCreateEntry(client *c, robj **argv, int argc, long long duration) {
    slowlogEntry *se = zmalloc(sizeof(*se)); // Allocate space for the log entry
    int j, slargc = argc;  // Number of parameters to be recorded, default is the number of parameters for the current command
     
    // If the number of parameters for the current command exceeds the threshold, only record the threshold number of parameters
    if (slargc > SLOWLOG_ENTRY_MAX_ARGC) slargc = SLOWLOG_ENTRY_MAX_ARGC;
    se->argc = slargc;
    ...
    for (j = 0; j < slargc; j++) {  // Record the command and parameters one by one
       if (slargc != argc && j == slargc-1) {  // If the number of command parameters exceeds the threshold, use the last parameter to record the actual number of remaining parameters
          se->argv[j] = createObject(OBJ_STRING,
                    sdscatprintf(sdsempty(),"... (%d more arguments)",
                    argc-slargc+1));
        } else {
            ...  // Fill the command parameters in the log entry
        }
    }
    ... // Fill in information such as command execution duration and client address in the log entry
}
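
For completeness, the elided tail of slowlogCreateEntry fills in the remaining slowlogEntry fields. The sketch below shows what it does; the field names follow the struct shown earlier, but treat the exact helper calls, such as getClientPeerId, as illustrative since they may vary between Redis versions.

    se->time = time(NULL);                // Timestamp when the slow command executed
    se->duration = duration;              // Execution duration, in microseconds
    se->id = server.slowlog_entry_id++;   // Monotonically increasing log entry ID
    se->cname = c->name ? sdsnew(c->name->ptr) : sdsempty();  // Client name, if one was set
    se->peerid = sdsnew(getClientPeerId(c));                   // Client network address
    return se;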

Alright, now you understand the implementation of the slow command log. Lastly, it's important to note that the slow command log only records commands that exceed the execution time threshold; unlike the latency monitoring framework, it does not cover multiple kinds of events. Therefore, slowlogPushEntryIfNeeded is only called from the command execution function call (in the server.c file), as shown below:

void call(client *c, int flags) {
    ...
    start = server.ustime;      // Start timing before command execution
    c->cmd->proc(c);            // Actually execute the command
    duration = ustime()-start;  // Compute the execution duration of the command (in microseconds)
    ...
    if (flags & CMD_CALL_SLOWLOG && c->cmd->proc != execCommand) {
        ...
        // Call slowlogPushEntryIfNeeded to record the slow command
        slowlogPushEntryIfNeeded(c,c->argv,c->argc,duration);
    }
    ...
}

Summary #

In today's lesson, I introduced you to Redis's implementation of the latency monitoring framework and slow command logging. You should know that the Redis source code samples several types of events that may slow Redis down during their execution. Once an event's execution time reaches the latency-monitor-threshold, the monitoring framework records a sampling result for subsequent analysis. This approach of sampling and recording latency events is worth learning from.

The implementation of the slow command log is relatively simple: it records commands whose execution time exceeds the threshold in a list. Each log entry includes the command and its arguments, as well as information about the client that sent it, which makes it convenient for operations staff to inspect and analyze.

Of course, the latency monitoring framework implemented in the Redis source code mainly focuses on events that increase latency, and the recorded latency events are tightly coupled with the operations that may slow Redis down at runtime. In addition, the INFO command in Redis provides real-time monitoring information about the Redis instance. However, you should know that the INFO command is mainly implemented by reading the member variables of the global variable server, which record the instance's real-time operational status and resource usage.

One question per lesson #

In the Redis call function that executes commands, why doesn’t it call the slowlogPushEntryIfNeeded function to log slow commands specifically for the EXEC command?