05 Why Ordered Sets Support Point Query and Range Query at the Same Time

05 Why Ordered Sets Support Point Query and Range Query at the Same Time #

A sorted set is an important data type in Redis. It is a set type that allows elements to have weights and be sorted by those weights.

A fellow developer who worked on Redis development once asked me: why can a sorted set provide both the following operations with complexities of O(logN)+M and O(1) respectively?

  • ZRANGEBYSCORE: Returns elements within a specified range based on their weights.
  • ZSCORE: Returns the weight of a specific element.

The essence of this question is: why can a sorted set support efficient range queries while also retrieving the weight of an element with a complexity of O(1)?

The answer lies in the underlying design and implementation of the sorted set. The sorted set supports range queries because its core data structure is based on skip lists, and it can retrieve the weight of an element in constant time because it also uses a hash table for indexing.

Now, you may be curious about how the sorted set combines these two data structures and how they work together. In today’s lesson, I will introduce the design and implementation of the sorted set’s dual indexing. Understanding and mastering this design concept is crucial for implementing database systems.

Alright, let’s take a look at the basic structure of a sorted set.

Basic Structure of Sorted Set #

To understand the structure of a Sorted Set, you need to read its code file. It is important to note that in the Redis source code, the code file for Sorted Set is not like other data types such as the dict.c/dict.h for hash tables or the ziplist.c/ziplist.h for compressed lists, which have dedicated data structure implementations and definition files.

The implementation code for Sorted Set is in the t_zset.c file, which includes various operations implementation for Sorted Set. The structure definition relevant to Sorted Set is in the server.h file. If you want to understand and learn about the modules and operations of Sorted Set, make sure to look for them in the t_zset.c and server.h files.

Now that you know where the code files for Sorted Set are located, let’s take a look at its structure definition. The structure of Sorted Set is named zset and it contains two members, namely a dictionary dict and a skip list zsl, as shown below.

typedef struct zset {
    dict *dict;
    zskiplist *zsl;
} zset;

As I mentioned at the beginning of this lesson, the design concept of Sorted Set, which utilizes both skip list and hash table indexing structures, is worth learning. This design concept fully leverages the efficient support for range queries (such as the ZRANGEBYSCORE operation) provided by skip lists and the efficient support for point queries (such as the ZSCORE operation) provided by hash tables. This way, we can efficiently support both range queries and point queries in a single data structure, which is difficult to achieve with a single indexing structure.

However, since Sorted Set uses both skip list and hash table indexing structures to organize data, when implementing Sorted Set, we encounter the following two problems:

  • What kind of data is stored in the skip list or hash table?
  • How is the data in the skip list and hash table kept consistent?

Since I have already introduced to you the implementation approach for hash tables in Lesson 3, I will mainly introduce to you the design and implementation of skip lists next. By learning about skip lists, you can understand the data stored in skip lists, as well as the common operations performed on skip lists. Then, I will show you how Sorted Set combines hash tables and skip lists, and how the data in these two indexing structures remains consistent.

Design and Implementation of Skip List #

First, let’s understand what is a skip list.

A skip list is actually a multi-level ordered linked list. In order to illustrate it easily in the course, I will sort the levels in the skip list from low to high. The bottom level is called level 0, followed by level 1, level 2, and so on.

The following image shows a skip list with 3 levels. The head node contains three pointers, which serve as the head pointers for level 0 to level 2.

As you can see, there are a total of 7 nodes on level 0, namely 3, 11, 23, 33, 42, 51, 62. These nodes are connected by pointers, and the level0 pointer of the head node points to node 3. Then, among these 7 nodes, node 11, 33, and 51 each have another pointer, which is also connected in order. The level 1 pointer of the head node points to node 11. In this way, these 3 nodes form all the nodes on level 1.

Finally, node 33 has another pointer, which points to the tail node. At the same time, the level 2 pointer of the head node points to node 33, forming level 2, but there is only 1 node 33 on level 2.

Now that we have an intuitive impression of skip list, let’s take a look at the specific data structure for implementing skip list.

Skip List Data Structure #

Let’s first look at the structure definition of a skip list node as shown below.

typedef struct zskiplistNode {
    // The element in the Sorted Set
    sds ele;
    // The score value of the element
    double score;
    // Backward pointer
    struct zskiplistNode *backward;
    // The level array of the node, which stores the forward pointers and spans on each level
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[];
} zskiplistNode;

First, because in the Sorted Set, we need to store both the element and its score, the corresponding variables in the skip list node structure are ele of type sds and score of type double. In addition, in order to facilitate reverse lookup from the tail node of the skip list, each skip list node also stores a backward pointer (*backward) pointing to the previous node.

Then, since a skip list is a multi-level ordered linked list, each level is also composed of multiple nodes connected by pointers. Therefore, the structure definition of a skip list node also includes an array called level array of type zskiplistLevel.

Each element in the level array corresponds to a zskiplistLevel structure, which represents a level in the skip list. The zskiplistLevel structure defines a forward pointer (*forward) pointing to the next node on the current level, which allows nodes to be connected on a certain level with subsequent nodes. At the same time, the zskiplistLevel structure also defines a span, which is used to record the number of nodes spanned by the forward pointer on level 0 between this pointer and the node it points to.

Let’s take a look at the following diagram, which shows the level array and span information for node 33. As you can see, the level array of node 33 has three elements, corresponding to the pointers on three levels. In addition, the spans of level 2, level 1, and level 0 are 3, 2, and 1 respectively in the level array.

Skip List Level Array

Finally, because the nodes in the skip list are sorted, for a given node in the skip list, we can accumulate the spans of the forward pointers on the query path from the head node to this node. This accumulated value can be used to calculate the order of this node in the entire skip list. This feature can also be used to implement operations such as rank in Sorted Set, such as ZRANK and ZREVRANK.

Now that we understand the definition of a skip list node, we can take a look at the definition of a skip list. In the skip list structure, the head node and tail node of the skip list, the length of the skip list, and the maximum level of the skip list are defined as follows:

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;

Since the nodes in the skip list are connected by pointers, when we use the skip list, we only need to obtain the head node or tail node from the skip list structure, and then we can access the nodes in the skip list through the node pointers.

So when we search for an element in the Sorted Set, does the search code need to sequentially compare each node in the list, just like searching a regular linked list?

Actually, no. Because when searching for a node, the skip list will start from the highest level of the head node and look for the next node. Since the skip list nodes store both the element and its score, the skip list has two corresponding conditions for comparison:

  1. If the score stored in the found node is less than the score being searched for, the skip list will continue to access the next node on that level.
  2. If the score stored in the found node is equal to the score being searched for, the skip list will also check whether the SDS data stored in that node is smaller than the SDS data being searched for. If the node data is smaller than the data being searched for, the skip list will still continue to access the next node on that level.

However, when neither of the above two conditions is met, the skip list will use the level array of the current found node. The skip list will use the next level pointer in the level array of the current node and continue the search along that level, which is equivalent to jumping to the next level for further searching.

The code logic for this part is shown below. Since the search function in the skip list is used for search, insert, update, or delete operations in Redis, you can focus on understanding it.

// Skip list node search
Node *node = skip_list->header;
for (int i = skip_list->level - 1; i >= 0; i--) {
    while (node->level[i].forward && 
          (node->level[i].forward->score < target_score ||
          (node->level[i].forward->score == target_score &&
           compare_sds(node->level[i].forward->ele, target_ele) < 0))) {
        node = node->level[i].forward;
    }
}

In the code above, target_score and target_ele represent the score and element being searched for, respectively. The variable compare_sds is a function that compares two SDS data. The skip list is searched from the highest level to the lowest level. In each level, the forward pointer is compared to determine whether to continue to the next node. If neither condition is met, the search jumps to the next lower level through the level array of the current node. // Get the header of the skip list x = zsl->header; // Iterate from the highest level for (i = zsl->level-1; i >= 0; i–) { … while (x->level[i].forward && (x->level[i].forward->score < score || (x->level[i].forward->score == score && sdscmp(x->level[i].forward->ele,ele) < 0))) { … x = x->level[i].forward; } … }

Setting the Level of Skip List Nodes #

With the level array, a skip list node can be accessed on multiple levels. The number of layers in the level array of a node determines the number of levels on which the node can be accessed.

Therefore, when deciding the level of a node, we actually determine the specific number of layers in the level array.

One design method is to make the number of nodes on each layer approximately half of the number on the next layer, as shown in the following illustration. The number of nodes on level 0 is 7, and the number of nodes on level 1 is 3, which is approximately half of the number on level 0. The number of nodes on level 2 is only 1, which is approximately half of the number on level 1.

The benefit of this design method is that when searching from the highest level of the skip list, since the number of nodes on each level is approximately half of the number on the next level, the search process is similar to binary search, and the search complexity can be reduced to O(logN).

However, this design method also has negative impacts, namely, when a new node is inserted or a node is deleted to maintain the ratio of adjacent layer node numbers as 2:1, the nodes at the insertion or deletion position and their subsequent nodes need to adjust their levels. This brings additional overhead.

Let me give you an example to show the impact of not maintaining the ratio of node numbers. Although this avoids adjusting the levels, it increases the query complexity.

First, assume that the skip list currently has 3 nodes with values 3, 11, and 23, as shown in the following illustration.

Then, suppose we want to insert a node with the value 15. If we do not adjust the levels of other nodes and simply insert the node with the value 15, the ratio of the number of nodes on levels 0 and 1 in the skip list becomes 4:1, as shown in the following illustration.

If we continue to insert multiple nodes without adjusting the levels of other nodes, the number of nodes on level 0 will increase, as shown in the following illustration.

Correspondingly, if we want to find a node with a score greater than 11, we need to sequentially search in the nodes on level 0, and the complexity is O(N). Therefore, to reduce the query complexity, we need to maintain the relationship between the numbers of adjacent layer nodes.

Now, let’s see the impact of maintaining the ratio as 2:1.

For example, we can add a layer of pointers to the level array of node 23, as shown in the following illustration. In this way, the ratio of the number of nodes on levels 0 and 1 is maintained as 2:1. However, the cost is that we also need to reallocate space for the level array to add a layer of pointers.

Similarly, if we want to delete node 33 in a skip list with 7 nodes, all the nodes after node 33 need to be adjusted:

The adjusted skip list is shown in the following illustration. You can see that nodes 42 and 62 need to add space for the level array to store pointers on 3 and 2 levels, respectively, while node 51 needs to reduce the number of layers in the level array. That is, such adjustments bring additional operation overhead.

Therefore, to avoid the above problems, a skip list adopts another design method when creating nodes, which is to randomly generate the level for each node. In this case, it is not necessary to maintain a strict 2:1 relationship between the number of nodes on adjacent layer chains. In this way, when a new node is inserted, only the pointers of the previous and subsequent nodes need to be modified, and the levels of other nodes do not need to be changed, thereby reducing the complexity of insertion operations.

In the Redis source code, the level of a skip list node is determined by the zslRandomLevel function. The zslRandomLevel function initializes the level to 1, which is the minimum level of a node. Then, the function generates a random number. If the value of the random number is less than ZSKIPLIST_P (the probability of increasing the level of a skip list node, which is 0.25), the level is increased by 1. Since the probability of the random number falling in the range [0,0.25) is less than 25%, this indicates that the probability of increasing the level by one is not higher than 25%. The following code shows the execution logic of the zslRandomLevel function.

#define ZSKIPLIST_MAXLEVEL 64  // Maximum level is 64
#define ZSKIPLIST_P 0.25       // Probability of the random number is 0.25
int zslRandomLevel(void) {
    // Initialize the level to 1
    int level = 1;
    while ((random()&0xFFFF) < (ZSKIPLIST_P * 0xFFFF))
        level += 1;
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}

Now that we have understood the basic structure, search method, and method to set the level of nodes in a skip list, let’s continue to learn how a Sorted Set combines a skip list and a hash table and how it keeps the data consistent in these two index structures.

Combination of Hash Table and Skip List #

In fact, the combination of hash table and skip list is not complicated.

First, we can see from the introduction of the Sorted Set structure that it already contains both of these indexing structures. This is the first step in combining the two. Furthermore, we can also see from the creation code of Sorted Set (in the t_zset.c file) that the skip list and hash table are created in sequence.

When creating a zset, the code calls the dictCreate function to create the hash table in the zset, and calls the zslCreate function to create the skip list, as shown below:

zs = zmalloc(sizeof(*zs));
zs->dict = dictCreate(&zsetDictType,NULL);
zs->zsl = zslCreate();

With these two indexing structures in the Sorted Set, the next step in combining them is to ensure that the data in these two structures remains consistent. In other words, we need to insert data into both the skip list and the hash table when inserting data into the skip list.

Maintaining consistency between the two indexing structures is not difficult. When inserting data into the Sorted Set, the zsetAdd function is called. Therefore, we can understand how to combine the two by reading the element insertion function zsetAdd of the Sorted Set. Let’s analyze the process of the zsetAdd function.

First, the zsetAdd function determines whether the Sorted Set uses ziplist or skiplist as the encoding method. The logic of the zsetAdd function is different depending on the encoding method used by the Sorted Set.

Note that the logic of the zsetAdd function is different depending on whether the Sorted Set uses ziplist or skiplist as the encoding method. Since we are focusing on the skiplist encoding method in this lesson, let’s mainly look at the logic of the zsetAdd function when using the skiplist encoding method.

The zsetAdd function first uses the dictFind function of the hash table to check if the element to be inserted already exists. If it does not exist, the zsetAdd function directly calls the skip list element insertion function zslInsert and the hash table element insertion function dictAdd to insert the new element into the skip list and the hash table, respectively.

It is worth noting that Redis does not embed the hash table operations into the skip list operations themselves. Instead, it executes the two functions in sequence in the zsetAdd function. The benefit of this design is that it maintains the independence of the operations of the skip list and the hash table.

Then, if the zsetAdd function finds that the element to be inserted already exists using the dictFind function, it checks whether the weight of the element needs to be increased.

If the weight value changes, thezsetAdd function calls the zslUpdateScore function to update the weight value of the skip list. Then, the zsetAdd function directs the value of the element (the key in the hash table) in the hash table to the weight value in the skip list node. This way, the weight value of the element in the hash table can be kept up to date.

The following code shows the execution process of the zsetAdd function:

// If the ziplist encoding method is used, the processing logic of the zsetAdd function
if (zobj->encoding == OBJ_ENCODING_ZIPLIST) {
  ...
}
// If the skiplist encoding method is used, the processing logic of the zsetAdd function
else if (zobj->encoding == OBJ_ENCODING_SKIPLIST) {
    zset *zs = zobj->ptr;
    zskiplistNode *znode;
    dictEntry *de;
    // Query the newly added element from the hash table
    de = dictFind(zs->dict,ele);
    // If the element can be found in the hash table
    if (de != NULL) {
        /* NX? Return, same element already exists. */
        if (nx) {
            *flags |= ZADD_NOP;
            return 1;
        }
        // Query the weight of the element from the hash table
        curscore = *(double*)dictGetVal(de);


        // If the weight value needs to be updated
        if (incr) {
            // Update the weight value
            ...
        }


        // If the weight has changed
        if (score != curscore) {
            // Update the skip list node
            znode = zslUpdateScore(zs->zsl,curscore,ele,score);
            // Direct the value of the hash table element to the weight of the skip list node
            dictGetVal(de) = &znode->score; 
            ...
        }
        return 1;
    }
   // If the new element does not exist
    else if (!xx) {
        ele = sdsdup(ele);
        // Insert a new skip list node
        znode = zslInsert(zs->zsl,score,ele);
        // Insert a new hash table element
        serverAssert(dictAdd(zs->dict,ele,&znode->score) == DICT_OK);
        ...
        return 1;
    } 
    ..

In summary, we can remember that the Sorted Set achieves the combination of a skip list and a hash table by simultaneously defining both indexing structures in its data structure. Then, when the Sorted Set performs data insertion or update, it sequentially inserts or updates the corresponding data in the skip list and the hash table, thereby ensuring consistency between the information recorded in the skip list and the hash table.

This way, the Sorted Set can not only support range queries based on the skip list but also directly query the weight of an element based on the hash table.

Summary #

In this lesson, I introduced you to the underlying implementation of the Sorted Set data type. To support range queries based on weights and single point queries on element weights, Sorted Set uses a combination of skip list and hash table in its data structure design.

A skip list is a multi-level sorted linked list, and when performing a query operation in a skip list, the query code can start from the highest level. The higher the level, the fewer the number of nodes, and the greater the span of high-level nodes. Therefore, when querying a node in the higher level, it may already be in the middle of the list.

In this way, the skip list will first check the higher levels, and if it directly finds a node that is equal to the target element, it can return directly. If it encounters the first node that is greater than the target element, it switches to the next level for further querying. Since there are more nodes in the lower levels than in the higher levels, this allows for further searching for the target element among more nodes.

This design of the skip list can save query costs. Additionally, the skip list design uses a random method to determine the level of each node, which avoids the issue of node cascading updates when adding new nodes.

In addition, Sorted Set also stores the elements in a hash table, with the elements serving as the keys of the hash table and the values pointing to the weights of the elements in the skip list. With the use of the hash table, Sorted Set can directly find a specific element and its weight value through hash computation. Compared to searching for a single element through the skip list, the use of the hash table effectively improves the query efficiency.

In conclusion, combining two types of index structures to manage data, such as using skip lists and hash tables in Sorted Sets, is a great design approach. I hope you can apply this in your daily system development as well.

One Question Per Lesson #

When using a double indexing mechanism combining skip list and hash table, while achieving efficient range queries and single point queries, can you think of any drawbacks of this double indexing mechanism?