
03 How to Implement a High-Performance Hash Table #

Today, let’s talk about the Hash table in Redis.

We know that a Hash table is a critical data structure that plays an important role in computer systems. For example, Memcached uses a Hash table to index data, and database systems use Hash tables to assist SQL queries. In the case of the Redis key-value database, the Hash table is not only a value type for key-value pairs; Redis also uses a global Hash table to store all key-value pairs. This both satisfies the application’s need to store and retrieve Hash-structured data and provides fast lookups over all keys.

One of the main reasons why Hash tables are widely used is that theoretically, they allow for fast data retrieval with a complexity of O(1). By calculating the Hash function, the Hash table can quickly locate the data and perform operations on it, which makes data operations very fast.

The structure of a Hash table is not difficult to understand. However, when applying a Hash table in practice, its performance is often affected by two issues: Hash collisions and rehash overhead. Both of these issues arise when the amount of data to be stored in the Hash table exceeds its capacity.

So how do we address these two issues? In fact, these are questions interviewers at big companies often ask, so take a moment now to think about how you would answer them in an interview.

Okay, let’s stop here for now. Now I will tell you how Redis effectively solves these two problems.

Redis provides us with a classic implementation solution for Hash tables. In terms of Hash collisions, Redis adopts chained Hashing to link data with the same hash value together, allowing them to still be queried in the table without resizing it. Regarding rehash overhead, Redis implements progressive rehashing to gradually reduce the additional overhead caused by rehashing and minimize its impact on system performance.

In this lesson, I will take you through the design principles and implementation methods for Hash tables in Redis. This will help you master the ability to handle Hash collisions and optimize rehash operations, allowing you to create high-performance Hash tables in scenarios where large amounts of data need to be stored.

Alright, next, let’s talk about the design and implementation of chained Hashing.

How does Redis implement chained hash? #

Before we start learning about the design and implementation of chained hash, we need to understand the structure of the hash table in Redis, as well as why hash collisions occur when the data size increases. This will help us better understand the solution of chained hash to hash collisions.

What is hash collision? #

In fact, the simplest form of a hash table is an array, where each element in the array is a hash bucket. The first element in the array is designated as bucket 0, and so on. When a key-value pair’s key is calculated by the hash function and then modulo the number of array elements, the position of the key-value pair in the array is determined, which is the corresponding hash bucket.
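If that description feels abstract, here is a minimal, self-contained sketch of the mapping (this is not Redis code; the byte-mixing hash_function below is a deliberately simple stand-in for a real hash function):

#include <stdio.h>

/* Toy hash function, for illustration only: mixes the bytes of the key. */
static unsigned long hash_function(const char *key) {
    unsigned long h = 0;
    while (*key) h = h * 31 + (unsigned char)*key++;
    return h;
}

int main(void) {
    unsigned long num_buckets = 8;   /* the hash table is an array of 8 buckets */
    const char *key = "key1";
    /* Hash the key, then take it modulo the number of buckets to pick a bucket. */
    unsigned long bucket = hash_function(key) % num_buckets;
    printf("%s -> bucket %lu\n", key, bucket);
    return 0;
}

Redis keeps its table sizes at powers of two, so instead of the % operator it can use a bitwise AND with a mask equal to size - 1, as we will see in the dictht definition later.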

As shown in the figure below, after key1’s hash value is computed and taken modulo the number of buckets, it corresponds to bucket 1. Similarly, key3 and key16 correspond to bucket 7 and bucket 4 respectively.

From the figure, we can also see that there are a total of 16 keys to be written into the hash table, while the size of the hash table is only 8 elements. This will cause some keys to be mapped to the same hash bucket.

In practical applications of hash tables, it is generally difficult to estimate the amount of data to be stored. If we create a very large hash table initially, it will result in wasteful space when the data volume is small. Therefore, we usually set an initial size for the hash table, and when the data volume increases, the size of the key space will exceed the size of the hash table.

It is precisely because the key space is larger than the hash table that when using the hash function to map the keys to the hash table, different keys will inevitably be mapped to the same position in the array. And if only one key-value pair can be stored at the same position, the data stored in the hash table will be very limited. This is what we commonly call hash collision.

For example, in the figure below, both key3 and key100 are mapped to bucket 5 of the hash table. In this case, when bucket 5 can only store one key, either key3 or key100 would not be able to be stored in the hash table.

So how do we solve hash collisions? There are two possible solutions:

  • The first solution is the chained hash that I’m going to introduce next. However, it should be noted that the length of the chain in chained hash should not be too long, otherwise it will degrade the performance of the hash table.
  • The second solution is to use rehash when the length of the chain in the chained hash reaches a certain length. However, rehash itself is expensive, so we need to use the progressive rehash design that I will introduce later.

Now let’s get to know the design and implementation of the chained hash.

How is chained hash designed and implemented? #

The so-called chained hash is to connect the keys mapped to the same bucket in the hash table with a linked list. Now let’s take a look at how Redis implements chained hash and why chained hash can help solve hash collisions.

First, we need to understand the implementation of the hash table in the Redis source code. The main files related to the hash table implementation are dict.h and dict.c. The dict.h file defines the structures of the hash table and its hash entries, along with the hash table’s operation functions, while dict.c contains the concrete implementations of those operations.

In the dict.h file, the hash table is defined as a two-dimensional array (dictEntry **table), where each element of the array is a pointer to a hash entry (dictEntry). The following code shows the definition of the hash table in the dict.h file:

typedef struct dictht {
    dictEntry **table; // Two-dimensional array of hash entries
    unsigned long size; // Size of the hash table
    unsigned long sizemask; // Mask used to compute a bucket index, equal to size - 1
    unsigned long used; // Number of hash entries currently stored in the table
} dictht;

So, to implement chained hashing, Redis designs each dictEntry structure to contain, besides pointers to the key and value, a pointer to the next hash entry. As the code below shows, the dictEntry structure contains a next pointer that points to another dictEntry structure, and this pointer is what chained hashing is built on:

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

In addition to the pointer used to implement chained hashing, there is another detail worth noting here: in the dictEntry structure, the value of the key-value pair is defined as a union v. This union contains a pointer val to the actual value, as well as an unsigned 64-bit integer, a signed 64-bit integer, and a double.

The reason I want to remind you of this is to illustrate that this implementation method is a memory-saving development technique that is very worth learning. When the value is an integer or a double-precision floating-point number, since they are already 64 bits in size, they can be stored directly in the key-value pair structure instead of using a pointer, thereby saving memory space.
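As a rough illustration of this trick, here is a hypothetical sketch (not Redis code; the entry type and the two setter helpers below are made up for the example). The union lets an integer or double value live directly inside the entry, while other value types are still reached through a pointer:

#include <stdint.h>

typedef struct entry {
    void *key;
    union {
        void *val;     /* pointer to a separately allocated value object */
        uint64_t u64;  /* unsigned 64-bit integer stored inline */
        int64_t s64;   /* signed 64-bit integer stored inline */
        double d;      /* double stored inline */
    } v;
    struct entry *next;
} entry;

/* Store a signed 64-bit integer directly in the entry: no extra allocation,
 * and no pointer to chase when reading it back. */
static void entrySetSignedInteger(entry *e, int64_t value) {
    e->v.s64 = value;
}

/* Store any other value by pointer: the value object lives elsewhere. */
static void entrySetVal(entry *e, void *obj) {
    e->v.val = obj;
}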

Well, now that you understand the implementation of chained hashing in Redis, you may still not quite understand why this chained hashing can help resolve hash collisions.

Don’t worry, let me explain it with the previous example. Both key3 and key100 are mapped to bucket 5 in the hash table. When using chained hashing, bucket 5 will not only store key3 or key100, but instead connect key3 and key100 with a linked list, as shown in the diagram below. When more keys are mapped to bucket 5, these keys can be linked together with a linked list to handle hash collisions.

In this way, when we want to search for key100, we can first calculate its hash value using the hash function and see that it is mapped to bucket 5. Then, we can compare the keys in bucket 5 one by one until we find key100. This way, we can find the desired hash entry in the chained hash.
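The lookup just described can be written in a few lines. The following is a simplified illustration of the idea (not the actual dictFind code; it uses a stripped-down dictEntry without the value union, and hashKey and keyCompare are assumed to be caller-supplied helpers):

typedef struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;   /* chain of entries that share the same bucket */
} dictEntry;

/* Look up `key` in a chained hash table with `size` buckets. */
dictEntry *chainedFind(dictEntry **table, unsigned long size, const void *key,
                       unsigned long (*hashKey)(const void *),
                       int (*keyCompare)(const void *, const void *))
{
    unsigned long idx = hashKey(key) % size;      /* locate the bucket */
    for (dictEntry *he = table[idx]; he != NULL; he = he->next) {
        if (keyCompare(key, he->key)) return he;  /* found the matching key */
    }
    return NULL;  /* the key is not in this bucket, so it is not in the table */
}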

However, chained hashing also has limitations. As the length of the linked list grows, the time it takes to query a hash entry at a given position of the hash table increases, which increases the overall query time of the hash table and degrades its performance.

So, is there any other way to reduce the impact on the performance of the hash table? Of course, there is. This is what I’m going to introduce to you next, the design and implementation of rehash.

How does Redis implement rehash? #

The rehash operation refers to expanding the space of the hash table. The basic idea of Redis’s implementation of rehash is as follows:

  • First, Redis prepares two hash tables to alternately store data during rehashing.

As I mentioned before, Redis defines each hash table with the dictht structure in the dict.h file. However, the structure Redis actually operates on is the dict structure, also defined in dict.h. This structure contains an array (ht[2]) holding two hash tables, ht[0] and ht[1]. The code definition of the dict structure is as follows:

typedef struct dict {
    ...
    dictht ht[2]; // Two hash tables used for rehashing
    long rehashidx; // Indication of whether the hash table is undergoing rehashing, -1 indicates no rehashing
    ...
} dict;
  • Next, during the normal service request phase, all key-value pairs are written to hash table ht[0].
  • Then, during rehashing, the key-value pairs are migrated to hash table ht[1].
  • Finally, after the migration is complete, the space of ht[0] is released, ht[1] is assigned to ht[0], and ht[1] is reset to an empty table. This brings us back to the normal service request phase, with ht[0] receiving and serving requests, while ht[1] stands by as the migration target for the next rehash.

Here is a diagram to help you understand the process of using ht[0] and ht[1] alternately:

Now that you have understood the basic idea of Redis using two hash tables to implement rehash, let’s clarify the problems that need to be addressed when implementing rehash. I believe there are three main points:

  • When to trigger rehash?
  • How much to expand during rehash?
  • How to execute rehash?

So, next, I will guide you to learn Redis’s code implementation for these three issues one by one. Through the code implementation, you will understand the design ideas of Redis for these three issues.

When to trigger rehash? #

First of all, it is important to know that the function Redis uses to determine whether to trigger rehash is _dictExpandIfNeeded. So, let’s first take a look at the expansion conditions in the _dictExpandIfNeeded function, and then understand where the _dictExpandIfNeeded function is called.

In fact, the _dictExpandIfNeeded function defines three expansion conditions.

  • Condition 1: The size of ht[0] is 0.
  • Condition 2: The number of elements in ht[0] has reached or exceeded the size of ht[0], and the hash table is allowed to be resized.
  • Condition 3: The number of elements in ht[0] is more than dict_force_resize_ratio times the size of ht[0], where dict_force_resize_ratio defaults to 5.

The following code shows the definitions of these three conditions in the _dictExpandIfNeeded function:

// If the hash table is empty, expand it to the initial size
if (d->ht[0].size == 0)
   return dictExpand(d, DICT_HT_INITIAL_SIZE);

// If the number of elements has reached the current size of the hash table, and either
// resizing is allowed or the number of elements is more than five times the current size
if (d->ht[0].used >= d->ht[0].size && (dict_can_resize ||
              d->ht[0].used / d->ht[0].size > dict_force_resize_ratio))
{
    return dictExpand(d, d->ht[0].used * 2);
}

For condition 1, the hash table is empty at this point, so Redis simply sets the table to its initial size; this is an initialization task rather than a rehash operation.

Conditions 2 and 3 correspond to the scenario of rehash. Because in these two conditions, the number of elements carried by the hash table (d->ht[0].used) and the current set size of the hash table (d->ht[0].size) are compared, and this ratio is generally called the load factor. In other words, the conditions for Redis to perform rehash are whether the load factor is greater than or equal to 1 and whether it is greater than 5.

In fact, when the load factor is greater than 5, it indicates that the hash table is already overloaded and needs to be expanded immediately. When the load factor is greater than or equal to 1, Redis will also check the value of the dict_can_resize variable to see if resizing is currently allowed. You might be wondering what the value of the variable dict_can_resize is. In fact, this variable is set in two functions, dictEnableResize and dictDisableResize, and its purpose is to enable or disable the hash table rehashing. Here is the code:

void dictEnableResize(void) {
    dict_can_resize = 1;
}

void dictDisableResize(void) {
    dict_can_resize = 0;
}

These two functions are then encapsulated in the updateDictResizePolicy function.

The updateDictResizePolicy function is used to enable or disable the rehashing and resizing feature. This function calls the dictEnableResize function to enable resizing if there is no RDB child process and no AOF child process. This corresponds to the scenario where Redis is not performing an RDB snapshot or AOF rewriting. Here is the code:

void updateDictResizePolicy(void) {
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1)
        dictEnableResize();
    else
        dictDisableResize();
}

So now we understand the trigger conditions for _dictExpandIfNeeded for rehashing. Now let’s see which functions in Redis call _dictExpandIfNeeded for this check.

First, by looking at the dict.c file, we can see that _dictExpandIfNeeded is called by _dictKeyIndex, and _dictKeyIndex is called by dictAddRaw, which in turn is called by the following three functions:

  • dictAdd: Adds a key-value pair to the hash table.
  • dictReplace: Adds a key-value pair to the hash table or modifies it if it already exists.
  • dictAddOrFind: Calls dictAddRaw directly.

Therefore, whenever we write a new key-value pair or modify an existing one in Redis, it checks whether rehashing is needed. You can refer to the diagram below which shows the relationships of function calls involving _dictExpandIfNeeded.

In summary, the key triggers for rehashing in Redis are the functions _dictExpandIfNeeded and updateDictResizePolicy. _dictExpandIfNeeded determines whether rehashing is needed based on the load factor of the hash table and the resize flag, while updateDictResizePolicy enables or disables rehashing based on the execution status of RDB and AOF.

Now let’s move on to the second problem Redis needs to solve when implementing rehashing: how much should the hash table be expanded during rehash?

How much does the hash table expand during rehashing? #

In Redis, the expansion of the hash table during rehashing is done through the dictExpand function. This function takes two parameters: the global hash table (dict) whose table is to be expanded, and the target size to expand to. Here is the prototype of the dictExpand function:

int dictExpand(dict *d, unsigned long size);

For a hash table, we can determine whether it needs to be expanded using the previously mentioned _dictExpandIfNeeded function. Once expansion is needed, Redis follows a simple rule during rehashing: if the table currently holds used entries, the target size requested for the expansion is used * 2.

For example, when _dictExpandIfNeeded determines that rehashing is needed, it calls dictExpand as shown below. As you can see, the requested expansion size is twice the number of entries currently stored in ht[0].

dictExpand(d, d->ht[0].used*2);

In the dictExpand function, the actual size to allocate is computed by the _dictNextPower function. As the code below shows, it starts from the hash table’s initial size (DICT_HT_INITIAL_SIZE) and keeps doubling it until it reaches a value no smaller than the target size.

static unsigned long _dictNextPower(unsigned long size)
{
    // Initial size of the hash table
    unsigned long i = DICT_HT_INITIAL_SIZE;
    // If the size to expand exceeds the maximum value, return the maximum value plus 1
    if (size >= LONG_MAX) return LONG_MAX + 1LU;
    // If the size to expand is within the maximum value
    while(1) {  // Repeatedly expanding the hash table size
        // Once the current size reaches or exceeds the target size, return it
        if (i >= size)
            return i;
        // Each expansion step multiplies the current size by 2
        i *= 2;
    }
}
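As a quick worked example (assuming the default DICT_HT_INITIAL_SIZE of 4): if ht[0] currently holds 15 entries, _dictExpandIfNeeded calls dictExpand(d, 30), and _dictNextPower doubles 4 → 8 → 16 → 32 and returns 32, so the new table gets 32 buckets, the smallest power of two that is at least twice the number of entries.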

Now, let’s take a look at the third problem that Redis needs to solve, which is how rehash is implemented. In essence, how does Redis achieve progressive rehash design?

How is progressive rehash implemented? #

Before we dive into this, let’s first understand why progressive rehash is needed.

The reason is that when a hash table is rehashed, due to the expansion of the hash table space, keys that were originally mapped to a certain position may be mapped to a new position. Therefore, many keys need to be copied from the original position to the new position. During the key copying process, the main thread of Redis is blocked from executing other requests, resulting in rehash overhead.

To reduce the rehash overhead, Redis proposes the method of progressive rehash.

Simply put, progressive rehash means that Redis does not copy all the keys in the current hash table to the new position at once, but rather copies them in batches. Each key copying only copies the hash entries in one bucket of the hash table. This way, the duration of each key copying is limited, and the impact on the main thread is also limited.

So, how is progressive rehash implemented in code? There are two key functions involved: dictRehash and _dictRehashStep.

Let’s start with the dictRehash function, which is responsible for the actual key copying. It takes two input parameters: the global hash table (i.e., the dict structure mentioned earlier, which includes ht[0] and ht[1]) and the number of buckets to be copied.

The logic of the dictRehash function consists of two parts:

  1. First, the function performs a loop to carry out the key migration for the specified number of buckets n. If all the data in the ht[0] hash table has already been migrated, the key copying loop will stop executing.
  2. Second, after n buckets have been copied, the dictRehash function checks whether all the data in ht[0] has been migrated. If it has, the space occupied by ht[0] is released. Because the main thread of Redis always accesses the table through ht[0] in its code logic, even though the data now lives in ht[1], Redis assigns ht[1] to ht[0] so that the rest of the code keeps working normally. After that assignment, ht[1] is reset to an empty table, waiting for the next rehash. At the same time, the rehashidx variable in the global hash table is set to -1, indicating that the rehash process has ended (rehashidx tracks the progress of the rehash, which I will explain in more detail shortly).

I have created the following diagram to illustrate the main execution flow of the dictRehash function. You can take a look at it.

You can also refer to the following code to understand the main execution logic of the dictRehash function.

int dictRehash(dict *d, int n) {
    int empty_visits = n*10;
    ...
    // Main loop, executes n key migrations or stops when all data in ht[0] is migrated
    while(n-- && d->ht[0].used != 0) {
        ...
    }
    // Check if all data in ht[0] has been migrated
    if (d->ht[0].used == 0) {
        // After ht[0] is migrated, free the memory space of ht[0]
        zfree(d->ht[0].table);
        // Assign ht[1] to ht[0] to accept normal requests
        d->ht[0] = d->ht[1];
        // Reset the size of ht[1] to 0
        _dictReset(&d->ht[1]);
        // Set rehashidx in the global hash table to -1, indicating the end of rehash
        d->rehashidx = -1;
        // Return 0, indicating that all elements in ht[0] have been migrated
        return 0;
    }
    // Return 1, indicating that there are still elements in ht[0] that have not been migrated
    return 1;
}

Now that we understand the main logic of the dictRehash function, let’s take a look at how progressive rehash copies data at the bucket granularity level. This is closely related to the rehashidx variable in the global hash table dictionary structure.

The rehashidx variable indicates which bucket is being migrated during rehashing. For example, when rehashidx is equal to 0, it means that the first bucket in ht[0] is being migrated. When rehashidx is equal to 1, it means that the second bucket in ht[0] is being migrated, and so on.

The main loop of the dictRehash function first checks whether the bucket pointed to by rehashidx is empty. If it is, rehashidx is incremented and the next bucket is checked. So, is it possible for many consecutive buckets to be empty? It certainly is. In that case, progressive rehash does not keep incrementing rehashidx and checking bucket after bucket indefinitely, because while a rehash step is running, the Redis main thread cannot handle other requests.

Therefore, during the execution of progressive rehashing, a variable called empty_visits is set to keep track of the empty buckets that have been checked. Once a certain number of empty buckets have been checked, the current rehashing round stops and the Redis server continues to process incoming requests, avoiding any impact on Redis performance. The following code demonstrates this logic:

while (n-- && d->ht[0].used != 0) {
    // If the current bucket to be migrated is empty
    while (d->ht[0].table[d->rehashidx] == NULL) {
        // move to the next bucket
        d->rehashidx++;
        // decrement the allowance of empty buckets to visit; stop this round when it runs out
        if (--empty_visits == 0) return 1;
    }
    ...
}

If the rehashidx bucket contains data to be migrated, Redis will retrieve the hash entries from that bucket one by one. It will then recalculate the hash entry’s bucket position in ht[1] based on the size of ht[1], and assign the hash entry to the corresponding bucket in ht[1].

After migrating each hash entry, the used variables in ht[0] and ht[1], which indicate the number of hash entries in each table, are respectively decreased and increased. If the data in the current rehashidx bucket has been fully migrated, rehashidx is incremented to point to the next bucket. The following code demonstrates this migration process:

while (n-- && d->ht[0].used != 0) {
    ...
    // Get the hash entry from the hash table
    de = d->ht[0].table[d->rehashidx];
    // Traverse all the hash entries in the `rehashidx` bucket
    while (de) {
        uint64_t h;
        // Get the next hash entry in the same bucket
        nextde = de->next;
        // Calculate the bucket position for the current hash entry in the expanded `ht[1]`
        h = dictHashKey(d, de->key) & d->ht[1].sizemask;
        // Add the current hash entry to the expanded `ht[1]`
        de->next = d->ht[1].table[h];
        d->ht[1].table[h] = de;
        // Decrease the number of hash entries in the current hash table
        d->ht[0].used--;
        // Increase the number of hash entries in the expanded hash table
        d->ht[1].used++;
        // Move to the next hash entry
        de = nextde;
    }
    // If there are no more hash entries in the current bucket, set it to NULL
    d->ht[0].table[d->rehashidx] = NULL;
    // Increment `rehashidx` to migrate elements from the next bucket next time
    d->rehashidx++;
}

So far, we have a basic understanding of the dictRehash function.

Now we know that the dictRehash function performs hash entry migration on a per-bucket basis. The number of bucket migrations performed internally is mainly determined by the loop counter n passed in. Whenever Redis needs to perform a rehash operation, it ultimately calls the dictRehash function.

Next, let’s learn about the second key function related to progressive rehashing: _dictRehashStep. This function is responsible for performing rehashing on only one bucket at a time.

From the Redis source code, we can see that a total of 5 functions call the _dictRehashStep function, which in turn calls the dictRehash function to perform rehashing. These functions are: dictAddRaw, dictGenericDelete, dictFind, dictGetRandomKey, and dictGetSomeKeys.

Among them, the dictAddRaw and dictGenericDelete functions correspond to adding and deleting key-value pairs in Redis, respectively. The latter three functions correspond to query operations in Redis. The following diagram illustrates the relationships between these functions:

However, note that regardless of the operation (add, delete, or query), when these 5 functions trigger rehashing through the _dictRehashStep function, the loop counter n that ends up being passed to dictRehash is always 1. The following code shows this parameter passing:

static void _dictRehashStep(dict *d) {
    // Rehash only when no iterators are currently running on the dict. dictRehash is
    // called with a loop counter `n` of 1, so a single bucket is migrated and normal
    // operations can then proceed.
    if (d->iterators == 0) dictRehash(d,1);
}

In this way, after migrating each bucket, the hash table can perform normal add, delete, and query operations, which is how progressive rehashing is achieved at the code level.
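To see how this looks from a caller’s perspective, here is a simplified sketch modeled on the structure of dictFind in dict.c (not a verbatim copy; it assumes the dict types and the dictIsRehashing, dictHashKey, and dictCompareKeys helpers defined in dict.h/dict.c). Each lookup first migrates one bucket if a rehash is in progress, then searches ht[0] and, only while rehashing, ht[1] as well:

dictEntry *dictFindSketch(dict *d, const void *key) {
    if (dictIsRehashing(d)) _dictRehashStep(d);   /* migrate one bucket first */
    uint64_t h = dictHashKey(d, key);
    for (int table = 0; table <= 1; table++) {
        unsigned long idx = h & d->ht[table].sizemask;  /* bucket in this table */
        dictEntry *he = d->ht[table].table[idx];
        while (he) {                                    /* walk the chain */
            if (dictCompareKeys(d, key, he->key)) return he;
            he = he->next;
        }
        /* If no rehash is in progress, all keys live in ht[0], so stop here. */
        if (!dictIsRehashing(d)) break;
    }
    return NULL;
}

The same one-step-then-serve pattern appears in the add, delete, and random-key paths, which is why ordinary traffic gradually completes the migration without any single request paying the full rehash cost.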

Summary #

Implementing a high-performance Hash table is not only the requirement of Redis, but also an important goal in the development of many computer systems. To achieve a highly efficient Hash table, it is necessary to address the two main problems: hash collisions and rehash overhead.

In today’s lesson, I have guided you through the structural design of the Hash table in Redis, the implementation of chained hashing, and the design and implementation of progressive rehash. The structure of the Hash table in Redis is distinctive in that each hash entry contains a pointer used to implement chained hashing. In addition, the global Hash table in Redis contains two Hash tables, which makes it possible to migrate data from one table to the other during rehash.

Moreover, Redis’ progressive rehash implementation is a generic method for Hash table expansion, which is worth learning. The key to this design method is to migrate a limited number of buckets each time, to avoid the performance impact of migrating all buckets at once. Once you understand the design concept and implementation method of progressive rehash, you can apply it to your own Hash table implementation scenarios.

One question per lesson #

The choice of hash function can affect the query efficiency and hash collision situation of a hash table. So, can you find out which hash function Redis uses for its hash table from the source code?

Feel free to share your answer in the comments section. If you find this helpful, you’re also welcome to share today’s content with more friends.