16 Does the Lfu Algorithm Have Advantages Over Other Algorithms

16 Does the LFU Algorithm Have Advantages Over Other Algorithms #

In the last lesson, I introduced to you Redis’ approximate implementation of the cache eviction strategy LRU algorithm. In fact, Redis introduced the LFU algorithm, which stands for Least Frequently Used, in version 4.0. When evicting data, the LFU algorithm eliminates the least frequently accessed data. On the other hand, the LRU algorithm eliminates the least recently used data, which also appears to be infrequently accessed. So, what are the differences between the LFU and LRU algorithms? Do we need to use the LFU algorithm in practical scenarios?

In fact, it is not easy to differentiate between these two algorithms just by their basic definitions. Therefore, in today’s lesson, I will guide you to learn about the design and implementation of the LFU algorithm from the source code level. This way, you will have a better understanding of the advantages and applicable scenarios of the LFU algorithm, enabling you to make appropriate choices when setting cache eviction strategies for Redis.

Alright, before diving into the implementation code of the LFU algorithm, let’s first take a look at the basic principles of the LFU algorithm. This will better support us in understanding the execution logic of the code.

Basic Principles of the LFU Algorithm #

Because the LFU algorithm selects data to be evicted based on the frequency of data access, LFU algorithm records the access count of each data. When data is accessed again, the access count of that data is increased.

However, access count and access frequency are not entirely the same. Access frequency refers to the number of accesses within a certain period of time, which means that when calculating the access frequency, we not only need to record the access count but also the length of time in which these accesses were performed. Otherwise, if we only record the access count, we would lack the time dimension information and thus would not be able to evict data based on frequency.

Let me give you an example. Let’s assume that data A was accessed 15 times within a 15-minute period, and data B was accessed 10 times within a 5-minute period. If we only count the access count, data A would have a higher count than data B, so data B would be evicted first. However, if we calculate based on access frequency, data A would have a frequency of 1 access per minute and data B would have a frequency of 2 accesses per minute. Therefore, based on access frequency, data A should be evicted.

So, when implementing the LFU algorithm, we need to be able to track the access frequency of data, rather than simply recording the access count.

Next, let’s learn how Redis implements the LFU algorithm.

Implementation of LFU Algorithm #

Firstly, similar to the LRU algorithm we introduced last class, the LFU algorithm is enabled by setting the maxmemory and maxmemory-policy options in the Redis configuration file redis.conf. The maxmemory option sets the maximum memory capacity that Redis can use, while the maxmemory-policy option can be set to allkeys-lfu or volatile-lfu to indicate whether the eviction should apply to all keys or only to keys with an expiration time.

The implementation of the LFU algorithm can be divided into three parts: recording the access frequency of key-value pairs, initializing and updating the access frequency of key-value pairs, and evicting data using the LFU algorithm. Now let’s take a look at how the access frequency of key-value pairs is recorded.

Recording the Access Frequency of Key-Value Pairs #

From our previous study of the LRU algorithm, we now know that each value of a key-value pair corresponds to a redisObject structure, which includes a 24-bit lru variable. In the implementation of the LRU algorithm, the lru variable is used to record the access timestamp of the data. Since Redis server can only set the maxmemory-policy option to use one eviction strategy at a time, the LRU algorithm and the LFU algorithm are not used simultaneously. To save memory overhead, the Redis source code reuses the lru variable to record the access frequency information required by the LFU algorithm.

Specifically, when the lru variable is used to record the required information for the LFU algorithm, the lower 8 bits of the 24-bit lru variable are used as a counter to record the number of accesses to the key-value pair, and the higher 16 bits of the 24-bit lru variable are used to record the access timestamp. The following figure shows the contents of the lru variable when recording the access frequency. Please take a look.

Now that we understand how the access frequency required by the LFU algorithm is recorded, let’s take a look at how the access frequency of key-value pairs is initialized and updated.

Initialization and Update of Access Frequency of Key-Value Pairs #

First of all, we need to know that the LFU algorithm and LRU algorithm are actually executed in the same entry function. In the previous class, we learned about the basic steps of the LRU algorithm, which include the initialization of data access information, updating of access information, and actual eviction. These steps correspond to the entry functions shown in the following table. You can also review the content of the previous class.

After understanding these entry functions, we can easily find the corresponding functions for the implementation of the LFU algorithm.

For the initialization of the access frequency of key-value pairs, when a key-value pair is created, the createObject function is called to allocate space for the redisObject structure and set the initial values. If Redis sets the maxmemory-policy option to the LFU algorithm, then the initialization value of the lru variable in the redisObject structure of the key-value pair will consist of two parts:

  • The first part is the higher 16 bits of the lru variable, which is a UNIX timestamp with a precision of 1 minute. This is calculated by calling the LFUGetTimeInMinutes function (in the evict.c file).
  • The second part is the lower 8 bits of the lru variable, which is set to the macro definition LFU_INIT_VAL (in the server.h file) with a default value of 5.

You will find that this is consistent with what I just introduced about recording the access frequency of key-value pairs. This means that when using the LFU algorithm, the lru variable includes both the access timestamp and the access count of the key-value pair. The following code also demonstrates the execution logic for this part, please take a look.

robj *createObject(int type, void *ptr) {
    robj *o = zmalloc(sizeof(*o));
    ...
    // When using the LFU algorithm, the `lru` variable includes a UNIX timestamp with a precision of one minute and an access count of 5
    if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
        o->lru = (LFUGetTimeInMinutes()<<8) | LFU_INIT_VAL;
    } else {
        o->lru = LRU_CLOCK();  // Setting for the LRU algorithm
    }
    return o;
}

Now, let’s take a look at the update of the access frequency of key-value pairs.

When a key-value pair is accessed, Redis calls the lookupKey function to perform the lookup. When the maxmemory-policy option is set to use the LFU algorithm, the lookupKey function will call the updateLFU function to update the access frequency (i.e., the value of the lru variable) of the key-value pair, as shown below:

robj *lookupKey(redisDb *db, robj *key, int flags) {
...
if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
                updateLFU(val); // Call the updateLFU function to update the access frequency when using the LFU algorithm
} else {
                val->lru = LRU_CLOCK(); // Call LRU_CLOCK when using the LRU algorithm
}
...

The updateLFU function is implemented in the db.c file and consists of three steps.

The first step is to decay the access count based on the elapsed time.

The updateLFU function first calls the LFUDecrAndReturn function (in the evict.c file) to decay the access count of the key-value pair, as shown below:

void updateLFU(robj *val) {
    unsigned long counter = LFUDecrAndReturn(val);
    ...
}

At this point, you may have a question: isn’t the access count of the key-value pair supposed to increase when accessed? Why is it being decayed first?

In fact, as I mentioned earlier, the LFU algorithm eliminates data based on the access frequency, not just the access count. The access frequency takes into consideration the duration of the key-value pair’s access. The longer the time since the previous access to the key-value pair, the lower the access frequency.

Let me give you an example. Suppose data A is accessed 30 times within the time period T to T+10 minutes. Then, the access frequency of data A during this time period can be calculated as 3 times per minute (30 times / 10 minutes = 3 times per minute).

Next, during the time period from T+10 minutes to T+20 minutes, data A is not accessed. At this point, if we calculate the access frequency of data A during the time period from T to T+20 minutes, its access frequency will decrease to 1.5 times per minute (30 times / 20 minutes = 1.5 times per minute). And so on, as time passes, if data A is not accessed again after T+10 minutes, its access frequency will gradually decrease. This is called access frequency decay.

Because Redis uses the access count in the lru variable to represent the access frequency, every time the access frequency of a key-value pair is updated, the access count is decayed by the LFUDecrAndReturn function.

Specifically, the LFUDecrAndReturn function first retrieves the previous access time of the key-value pair, which is stored in the high 16 bits of the lru variable. Then, based on the value of the global variable server.lru_decay_time, the LFUDecrAndReturn function calculates the decay size, num_periods.

This calculation checks if the value of lfu_decay_time is 0. If lfu_decay_time is 0, the decay size is also 0, and the access count is not decayed.

Otherwise, the LFUDecrAndReturn function calls the LFUTimeElapsed function (in the evict.c file) to calculate the elapsed time since the previous access to the key-value pair. This time is also calculated with a precision of 1 minute. With the elapsed time since the previous access, the LFUDecrAndReturn function divides it by the value of lfu_decay_time and uses the result as the decay size of the access count.

Here, you should note that the value of the lfu_decay_time variable is determined by the lfu-decay-time configuration option in the redis.conf file. When Redis is initialized, the initServerConfig function sets the value of the lfu_decay_time variable, with a default value of 1. So, by default, the decay size of the access count is equal to the number of minutes that have elapsed since the previous access. For example, if the previous access was 10 minutes ago, then by default, the decay size of the access count is 10.

Of course, if the number of minutes since the previous access has exceeded the value of the access count, the access count will be set to 0, indicating that the key-value pair has not been accessed for a long time.

The code below shows the execution logic of the LFUDecrAndReturn function:

unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8; // Get the previous access time of the key-value pair
    unsigned long counter = o->lru & 255; // Get the current access count
    unsigned long num_periods = server.lfu_decay_time ? LFUTimeElapsed(ldt) / server.lfu_decay_time : 0; // Calculate the decay size

Please note that this translation assumes the reader is familiar with the key concepts of Redis and the LFU (Least Frequently Used) algorithm.

if (num_periods)   // If the decay size is not 0
    // If the decay size is less than the current access count, then the access count after decay is the current access count minus the decay size; otherwise, the access count after decay is 0
    counter = (num_periods > counter) ? 0 : counter - num_periods;
return counter;   // If the decay size is 0, return the original access count
}

 

In this way, after the execution of the LFULogIncr function, the access count of the key-value pair is updated.

Step 3: Update the value of the variable lru.

Finally, at this step, the updateLFU function has completed the update of the access count of the key-value pair. Then, it will call the LFUGetTimeInMinutes function to obtain the current timestamp, combine it with the updated access count array, and assign the result to the lru variable of the key-value pair, as shown below:

void updateLFU(robj *val) {
    ...
    val->lru = (LFUGetTimeInMinutes()<<8) | counter;
}

So far, you have learned that when Redis updates the access frequency of key-value pairs, it first attenuates the access count based on the duration between the last access and the current time. Then, it increases the access count with a certain probability. This design method takes into account the influence of the access time on the access frequency and avoids interference from the 8-bit counter on the access count. As for the access time, Redis also gets the latest access timestamp and updates it to the lru variable.

Now let’s take a look at how Redis eliminates data based on the LFU algorithm.

Data Elimination with LFU Algorithm #

When implementing data elimination using the LFU algorithm, Redis uses the same method as approximate LRU algorithm. That is, Redis uses a global array called EvictionPoolLRU to store the set of key-value pairs to be eliminated. Then, in the processCommand function that handles each command, it calls the freeMemoryIfNeededAndSafe function and the freeMemoryIfNeeded function to execute the specific data elimination process.

As I mentioned in the previous lesson, you can review the basic process. Here, I will briefly summarize it again, which can be divided into three steps:

  • Step 1: Call the getMaxmemoryState function to calculate the amount of memory to be released;
  • Step 2: Call the evictionPoolPopulate function to randomly sample key-value pairs and insert them into the EvictionPoolLRU set for elimination;
  • Step 3: Traverse the EvictionPoolLRU set of key-value pairs to select the actually eliminated data and delete them.

Although this basic process is the same as the LRU algorithm, you should note that when eliminating data with the LFU algorithm, the evictionPoolPopulate function in Step 2 uses a different method to calculate the idle time for each candidate key-value pair to be eliminated.

Specifically, when implementing the LRU algorithm, each element in the EvictionPoolLRU set of candidate key-value pairs uses the member variable “idle” to record the idle time since the last access.

However, when implementing the LFU algorithm, because the LFU algorithm attenuates and increases the access count, it approximates the access frequency with the access count. Consequently, the LFU algorithm actually calculates the value of the “idle” variable for each element in the EvictionPoolLRU array by subtracting the access count of the key-value pair from 255. Moreover, before calculating the value of the “idle” variable, the LFU algorithm calls the LFUDecrAndReturn function to attenuate the access count of the key-value pair once, so as to more accurately reflect the access frequency of the data when selecting the eliminated data.

The following code illustrates the process of calculating the “idle” variable value with the LFU algorithm:

if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
    idle = estimateObjectIdleTime(o);
} else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    idle = 255-LFUDecrAndReturn(o);
}

Therefore, when the LFU algorithm calculates the value of the “idle” variable for each element in the EvictionPoolLRU array based on the access frequency, the larger the access count of the key-value pair, the smaller the value of “idle”, and vice versa. The elements in the EvictionPoolLRU array are sorted in ascending order based on the “idle” value. Finally, when the freeMemoryIfNeeded function traverses the EvictionPoolLRU array in descending order of the “idle” value and selects the actually eliminated key-value pairs, it will select key-value pairs with smaller access counts, i.e., eliminate the key-value pairs with lower access frequency.

As a result, Redis completes the operation of eliminating data based on access frequency.

Summary #

In this lesson, I mainly introduced you to the LFU (Least Frequently Used) cache eviction strategy used in Redis. The LFU algorithm eliminates data based on the access frequency of key-value pairs. Unlike using the access count to eliminate data, using access frequency not only requires counting the number of accesses, but also considering the duration since the last recorded access.

Therefore, based on this design consideration, when implementing the LFU algorithm in the Redis source code, the lru variable in the redisObject structure of the key-value pair will record both the access count and the access timestamp. When the key-value pair is accessed again, the access count in the lru variable will first decay based on the duration since the last access to the current time, and then increment.

However, the access count of the key-value pair can only be recorded in the limited 8 bits of the lru variable, with a maximum value of 255. In this case, if the access count is incremented by 1 every time a key-value pair is accessed, the access count will easily reach its maximum value, making it impossible to distinguish different access frequencies.

In order to distinguish different access frequencies, the LFU algorithm uses the method of increasing the access count probabilistically. This means that the larger the existing access count of a key-value pair, the more difficult it is to further increase the access count.

You should also be aware that in terms of the execution process of the LFU algorithm, it has the same basic execution process as the LRU (Least Recently Used) algorithm. This includes the entry function, calculation of the memory space to be freed, updating the set of candidate key-value pairs for eviction, and selecting the actual data to be evicted. The difference is that in the set of candidate key-value pairs for eviction, the LFU algorithm sorts and selects the data to be evicted based on the access frequency of the key-value pairs, which also conforms to the requirements of the LFU algorithm itself.

Moreover, precisely because the LFU algorithm eliminates data based on the access frequency and the access frequency decays over time, the LFU algorithm is more likely to eliminate cold data with low access frequencies earlier than other algorithms. This is also its suitable scenario.

Finally, looking at the implementation code of the LFU algorithm, I think the two design methods adopted by Redis: time-based decay of access count and probabilistic increase of access count, can serve as good reference examples when implementing software modules that operate based on access frequency. You can learn from and use them in your own implementation scenarios.

One Question per Lesson #

When the LFU algorithm initializes the access count of key-value pairs, it sets the access count to LFU_INIT_VAL, which has a default value of 5. So, based on the code introduced in this lesson, what would happen if LFU_INIT_VAL were set to 1?