07 Why Stream Employed Radix Tree

07 Why Stream Employed Radix Tree #

In this lesson, let’s continue discussing how Redis’s Stream data type saves messages from the perspective of low-level data structures.

Redis has supported the Stream data type since version 5.0, which can be used to save message data and help us implement a basic message queue with read and write functionality. When I talked about “How to Implement a Message Queue with Redis,” many students expressed their interest in learning about the implementation of Stream in order to understand the operational characteristics of the internal structure. However, they were not familiar with the Stream type and didn’t know how messages are saved using underlying data structures.

In order to save memory, Stream’s underlying data structure uses both Radix Tree and listpack to store messages. I introduced listpack in Lesson 6, which is a compact list that saves memory while storing data.

Therefore, in today’s lesson, I will introduce another data structure used by Stream: Radix Tree. The main feature of this data structure is that it is suitable for saving data with common prefixes, thus achieving the goal of saving memory and supporting range queries.

Similar to common B-trees or B+ trees, Radix Trees are also important tree structures used in operating systems and databases. Understanding the design and implementation of Radix Trees can help us master the implementation strategy of Stream and apply Radix Trees to scenarios where memory needs to be saved in ordered tree-like indexes, further resolving the problem of memory overhead when storing a large amount of data with common prefixes.

Alright, next, let’s first understand the characteristics of the message data saved by Stream. This is also an important consideration for Redis to use Radix Tree and listpack as the underlying structures for saving messages.

Characteristics of Stream Message Data #

First, as a message queue, Stream typically has the following two characteristics for the messages it stores:

  • A message consists of one or more key-value pairs.
  • Each inserted message corresponds to a message ID.

Generally, we let Redis server automatically generate incrementing message IDs. At this time, the message ID is composed of a timestamp and a sequence number. The timestamp represents the server’s current time in milliseconds when the message is inserted, and the sequence number represents the order of message insertion within the current millisecond.

For example, if I execute the following operations in a Redis instance, I can continuously insert 5 messages into a message stream named “devmsg”. Each message records the temperature information of a device ID.

127.0.0.1:6379> XADD devmsg * dev 3 temp 26
"1628172536845-0"
127.0.0.1:6379> XADD devmsg * dev 5 temp 28
"1628172545411-0"
127.0.0.1:6379> XADD devmsg * dev 8 temp 24
"1628172553528-0"
127.0.0.1:6379> XADD devmsg * dev 1 temp 25
"1628172560442-0"
127.0.0.1:6379> XADD devmsg * dev 5 temp 26
"1628172565683-0"

From the inserted data and the returned results above, we can see that for the Stream type, the data it needs to store also has two characteristics:

  • The message IDs of continuously inserted messages have a large number of common prefixes. For example, the first 8 digits of the message IDs of the 5 messages just inserted are all “16281725”.
  • For continuously inserted messages, the keys in their corresponding key-value pairs are usually the same. For example, for the 5 messages just inserted, the keys in their messages are “dev” and “temp”.

So, given these two data characteristics of Stream, what kind of data structure should we design to store these message data?

You might think of using a hash table, where each message ID corresponds to a key in the hash table, and the message content corresponds to the value of this key. However, as mentioned earlier, message IDs and keys in messages often have duplicate parts. If a hash table is used, it will result in a lot of redundant data, wasting Redis’ valuable memory space.

Therefore, in order to save memory space to the maximum extent, Stream uses two memory-friendly data structures: listpack and Radix Tree. Among them, the message ID is used as the key in the Radix Tree, and the specific message data is stored using listpack. The message data is saved together with the message ID as the value in the Radix Tree.

You can take a look at the following definition of the Stream structure, where messages are saved using the Radix Tree type structure rax.

typedef struct stream {
    rax *rax;               // Radix Tree to store messages
    uint64_t length;        // Number of messages in the message stream
    streamID last_id;       // ID of the last inserted message in the current message stream
    rax *cgroups;           // Consumer group information for the current message stream, also saved using Radix Tree
} stream;

Alright, so what exactly is the structure of a Radix Tree? Let’s learn about the basic structure of a Radix Tree next.

The Basic Structure of Radix Tree #

Radix Tree is a type of prefix tree. The prefix tree, also known as Trie Tree, has the characteristic that each key saved in the tree is split into individual characters and saved in the nodes of the tree one by one. The root node of the prefix tree does not save any characters, and each node, except for the root node, only saves one character. When we concatenate the characters on the path from the root node to the current node, we can obtain the value of the corresponding key.

The following diagram shows a simple prefix tree. You can take a look. The prefix tree in the diagram has two leaf nodes. By concatenating the characters on the path from the root node to these two leaf nodes, we get two keys: “read” and “real”.

Furthermore, from the diagram, we can see that the prefix tree separates and shares the common prefixes of the saved keys (i.e., “r,” “e,” “a”). This avoids storing the same characters repeatedly in the tree.

If we don’t use this method and simply save these two keys in a hash table, the common prefixes of the keys will be stored separately, resulting in wasted memory space. Therefore, compared to the storage method of hash tables, prefix trees can save memory space effectively, which is very important for Redis.

Shortcomings of Prefix Trees and Improvements of Radix Tree #

Of course, in a prefix tree, each node only saves one character, which allows the sharing of common prefixes among different keys as much as possible. However, this can also lead to the wastage of space and a decrease in query performance because some strings in the keys are no longer shared but are still stored in the form of individual characters in each node.

Let me give you an example. Let’s say we have five keys: “radix,” “race,” “read,” “real,” and “redis.” Their layout in the prefix tree is shown in the following diagram.

For the key “redis,” it shares the characters “r” and “e” with “read” and “real,” and “r” with “radix” and “race,” which means that the “r” and “e” nodes point to multiple child nodes. Similarly, “real” and “read” share the “r,” “e,” and “a” prefixes, and the “a” node also points to multiple child nodes. Therefore, it is necessary to save “r,” “e,” and “a” separately in the prefix tree nodes.

However, if we look at the key “redis,” apart from sharing the “r” and “e” characters with other keys, the “dis” after “re” is no longer shared with other keys. So there is no need to split “dis” into individual characters “d,” “i,” and “s” and save them separately; they can be combined and saved together.

So far, you can see that in the prefix tree, indeed, some characters need to be saved separately to be shared as common prefixes for different keys, but in fact, some single-character nodes can be merged with other single-character nodes to further save space.

From a more general perspective, starting from a node in a prefix tree, if each node between that node and another node has only one child node, it indicates that the characters corresponding to these nodes are no longer shared with other nodes. If we still create a node for each character and save them according to the prefix tree method, it will not only waste memory space but also, during querying, require matching each node’s represented character one by one, which will affect query performance. Therefore, in a radix tree, if the connections between a series of single-character nodes are unique, then these single-character nodes can be merged into one node, and this structure of the tree is called a Radix Tree, also known as a radix tree. Compared to a prefix tree, Radix Tree can save memory usage and improve query access efficiency.

I have drawn the following diagram, which shows the layout of the 5 keys (radix, race, read, real, and redis) on the Radix Tree. You can compare it with the layout on the prefix tree.

Radix Tree Layout

Radix Tree Data Structure #

From the structure of the Radix Tree introduced earlier, we can find that there are two types of nodes in the Radix Tree.

The first type of node is non-compressed node, which contains multiple pointers to different child nodes and the characters corresponding to those child nodes. For example, the node “r” in the Radix Tree example above contains pointers to the child nodes “a” and “e”. Also, if the string from the root node to a non-compressed node on the path already corresponds to a key stored in the Radix Tree, then this non-compressed node also contains a pointer to the value corresponding to that key.

For example, the following diagram shows the node “r” from the example above, which is a non-compressed node that points to two child nodes, “a” and “e”. This non-compressed node contains pointers to the child nodes “a” and “e”. Additionally, the header HDR stored at the beginning of the non-compressed node is the metadata of the Radix Tree node data structure, which I will explain in detail later.

Non-compressed Node

The second type of node is compressed node, which contains a pointer to a child node and the merged string represented by that child node. For example, the node “e” from the Radix Tree example points to a child node that represents the merged string “dis”. Similar to non-compressed nodes, if the string from the root node to a compressed node on the path already corresponds to a key stored in the Radix Tree, then this compressed node also contains a pointer to the value corresponding to that key.

The following diagram shows a compressed node that contains a pointer to a child node representing the merged string “is”. So in this compressed node, the merged string “is” is stored. Similar to non-compressed nodes, the header HDR of compressed nodes also stores the metadata of the Radix Tree node structure.

Compressed Node

Since the header HDR of both types of nodes contains metadata, let’s take a look at what information is included in this metadata.

First, we need to understand the node data structure of the Radix Tree. The node structure of the Radix Tree is defined in the raxNode in the rax.h file, as shown below:

typedef struct raxNode {
typedef struct raxNode {
    uint32_t iskey:1;     // whether the node contains a key
    uint32_t isnull:1;    // whether the value of the node is NULL
    uint32_t iscompr:1;   // whether the node is compressed
    uint32_t size:29;     // the size of the node
    unsigned char data[]; // the actual stored data of the node
} raxNode;

The member variables in this structure include 4 metadata, and the meanings of these four metadata are as follows:

  • iskey: Indicates whether the string composed of characters from the root node to the current node represents a complete key. If it does, then the value of iskey is 1. Otherwise, the value of iskey is 0. However, it is worth noting that the key represented by the current node does not include the content of the node itself.
  • isnull: Indicates whether the current node is a null node. If the current node is a null node, then there is no need to allocate memory space for a pointer to the value.
  • iscompr: Indicates whether the current node is a compressed node or a non-compressed node.
  • size: Indicates the size of the current node, and the specific value will vary depending on whether the node is a compressed node or a non-compressed node. If the current node is a compressed node, this value represents the length of the compressed data; if it is a non-compressed node, this value represents the number of child nodes that the node points to.

These 4 metadata correspond to the HDR of the compressed node and the non-compressed node introduced earlier, where iskey, isnull, and iscompr are each represented by 1 bit, and size takes up 29 bits.

In addition, from the raxNode structure, we can also see that, in addition to the metadata, the structure also has a char type array called data. We know that data is used to store actual data. However, the data stored here will vary depending on the type of the current node:

  • For a non-compressed node, the data array includes the characters represented by the child nodes, pointers to the child nodes, and value pointers corresponding to the node when it represents a key.
  • For a compressed node, the data array includes the merged string represented by the child nodes, pointers to the child nodes, and value pointers when the node represents a key.

Now, you may have noticed that in the implementation of the raxNode, both the compressed node and the non-compressed node actually have two characteristics:

  • The key they represent is the string formed by the characters from the root node to the current node on the path, but it does not include the current node itself.
  • They already include the characters or merged strings represented by the child nodes. And for their child nodes, they are also non-compressed or compressed nodes, so the child nodes themselves will also save the characters or merged strings represented by their child nodes.

These two characteristics bring two changes to the structure of the Radix Tree when actually storing data.

On one hand, the non-leaf nodes of the Radix Tree are either compressed nodes, which point to a single child node, or non-compressed nodes that point to multiple child nodes, but each child node represents only one character. Therefore, non-leaf nodes cannot simultaneously point to child nodes representing single characters and child nodes representing merged strings.

Let me give you an example. In the left half of the following figure, the child node a of node r, both of its child nodes represent merged strings “dix” and “ce”. Therefore, the raxNode structure of node a cannot simultaneously point to the dix child node and the ce child node. Similarly, the child node e of node r, its two child nodes, one represents a single character “a”, and the other represents the merged string “dis”. The raxNode structure of node e cannot simultaneously point to these two child nodes.

Therefore, when storing data using the raxNode structure, the node dix will be split into nodes d and ix, the node ce will be split into nodes c and e, and the node dis will be split into nodes d and is, as shown in the right half of the figure. In this way, the child nodes a and e of node r can be represented by the structure of non-compressed nodes.

 ┌─"r"─┬─"e"─┬─"d"─┬─"is"─┐
 │      │     │     ├─"ix"─┤
 │      │     │     ├─"c"──┤
 │      │     ├─"a"──────┤
 │      ├─────┤
 ├──────┤

On the other hand, for the leaf nodes of the Radix Tree, because they have no child nodes, Redis uses an raxNode node that does not include a child node pointer to represent leaf nodes, which means that the raxNode of a leaf node has size 0 and does not have a child node pointer. If the leaf node represents a key, then its raxNode will also save a pointer to the value data of this key.

It is worth noting that, to meet the needs of memory alignment, raxNode will fill in some bytes after the saved string based on the string length, which is the padding part shown in the image.

Now that you understand the content of raxNode structure for different types of nodes in the Radix Tree, let’s learn about the basic operation functions of the Radix Tree next.

Operations of Radix Tree #

The basic operations of Radix Tree are implemented in the rax.c file, including the following:

  • raxNew function

The prototype of this function is as follows. It calls the rax_malloc function to allocate a new rax structure.

rax *raxNew(void)

The definition of the rax structure is as follows. It contains the number of keys and nodes in the Radix Tree, as well as a pointer to the head node. The raxNew function calls the raxNewNode function to create the head node.

typedef struct rax {
    raxNode *head;  // Pointer to the head of Radix Tree
    uint64_t numele;  // Number of keys in Radix Tree
    uint64_t numnodes;  // Number of raxNodes in Radix Tree
} rax;
  • raxNewNode function

The prototype of this function is as follows. It is used to create a new uncompressed node. The parameter “children” represents the number of children of the uncompressed node, and the parameter “datafield” indicates whether a space should be allocated for the value pointer.

raxNode *raxNewNode(size_t children, int datafield)

It is worth noting that compressed nodes are not created by the raxNewNode function, but rather by the raxCompressNode function.

  • raxGenericInsert function

The prototype of this function is as follows. It is used to insert a string “s” of length “len” into the Radix Tree.

int raxGenericInsert(rax *rax, unsigned char *s, size_t len, void *data, void **old, int overwrite)
  • raxLowWalk function

The prototype of this function is as follows. It is called when searching, inserting, or deleting nodes in the Radix Tree.

static inline size_t raxLowWalk(rax *rax, unsigned char *s, size_t len, raxNode **stopnode, raxNode ***plink, int *splitpos, raxStack *ts)
  • raxGetData/raxSetData functions

The prototypes of these two functions are as follows. They are used to get the value pointer stored in raxNode and set the value pointer stored in raxNode, respectively.

void *raxGetData(raxNode *n)
void raxSetData(raxNode *n, void *data)

Now that we have understood the basic operations of Radix Tree, let’s take a look at how Stream combines Radix Tree and listpack for its usage.

How to Combine Radix Tree and listpack in Stream? #

As we know, when viewing the data saved in Stream as key-value pairs, the message ID is equivalent to the key, and the message content is equivalent to the value. In other words, Stream uses Radix Tree to store message IDs and saves the message content in a listpack, serving as the value of the message ID and pointing to the corresponding listpack with the value pointer of raxNode.

Here, I’ve included a diagram showing the relationships between the Stream structure, rax, raxNode, and listpack. Please note that in this diagram, we assume that there is only one streamID serving as the key.

We can see that the pointer rax in the stream structure points to the head node of the Radix Tree, which is the rax structure. The head pointer in the rax structure further points to the first raxNode. Since we assume that there is only one streamID and no other streamID shares the prefix with it, the current streamID can be saved using a compressed node.

Then, the first raxNode points to the next raxNode, which is the leaf node of the Radix Tree. The size of this node is 0, and its value pointer points to the actual message content.

In terms of message content, it is saved using listpack. As you can see, the listpack uses a master entry to store the key in key-value type messages, and the value is saved after the master entry. This storage method is actually used to save memory space because many message keys are the same, so only one copy needs to be saved. I will continue to provide detailed explanations about this design method of separating message keys and values and saving them in listpack in the upcoming courses about Stream.

Summary #

In today’s lesson, I taught you about the underlying implementation structure of the Redis Stream data type. Now you know that the main purpose of Stream is to store message data.

Each message has a message ID composed of a timestamp and sequence number, as well as a key-value pair for the message content. Because different message IDs usually share a common prefix in the timestamp, it would be inefficient to save each message ID separately using a data structure like a hash table, resulting in wasted space. Therefore, in order to save memory space, Stream uses a Radix Tree to store message IDs and uses listpack to store the message content.

In the design and implementation of the Radix Tree, understanding the overall structure and node data structure is important. So, you should pay attention to the non-compressed and compressed node types of the Radix Tree, as well as the actual data structure raxNode in the source code.

In addition, to better understand non-compressed and compressed nodes, let me summarize their similarities and differences. You can also review them as a whole.

Their similarities are:

  • Both have a node header HDR that stores metadata.
  • Both contain pointers to child nodes and the strings represented by the child nodes.
  • If the string on the path from the root node to the current node is a key in the Radix Tree, both will include a pointer to the corresponding value for the key.

Their differences are:

  • Non-compressed nodes point to child nodes, where each child node represents one character. Non-compressed nodes can point to multiple child nodes.
  • Compressed nodes point to a child node that represents a merged string. Compressed nodes can only point to one child node.

In addition to learning about raxNode, I also introduced you to several basic operations in the Radix Tree and demonstrated how the Stream data type stores message IDs and message content in the Radix Tree and listpack, respectively.

Here, it is important to note that Radix Tree effectively saves memory when storing data with common prefixes. Additionally, Radix Tree itself is an ordered tree-based index that supports single-point and range queries. So, by storing message IDs in the Radix Tree, Redis can save memory space and efficiently support querying of message IDs. Listpack, on the other hand, is a compact list that not only saves a large amount of message content but also effectively saves memory.

Therefore, I hope that through the use of Radix Tree and listpack in Stream, you can apply them to appropriate scenarios involving message storage or storage of large amounts of strings.

One question per course #

As an ordered index, Radix Tree can also provide range queries. Compared with the B+ tree we use in our daily life and the skiplist introduced in [Lesson 5], what do you think are the advantages and disadvantages of Radix Tree?

Feel free to share your answer and thought process in the comment section. If you find it helpful, you are also welcome to share today’s content with more friends.