22 How to Recognize and Apply Redis Internal Data Structures

22 How to Recognize and Apply Redis Internal Data Structures #

In the previous lesson, we learned about Redis protocol parsing and processing. Now, let’s take a look at what Redis’s internal data structures look like.

Redis Internal Data Structures #

RedisDb #

img

All data in Redis is stored in a DB. By default, Redis supports up to 16 DBs. Each DB in Redis corresponds to a redisDb structure. That means each Redis instance has 16 redisDbs by default. The default DB used when accessing Redis is DB 0, but you can switch between different DBs using the SELECT $dbID command.

img

redisDb mainly consists of two core dictionaries, three non-core dictionaries, dbID, and other auxiliary attributes. The two core dictionaries include a main dictionary called dict and an expires dictionary called expires. The main dictionary is used to store all the data in the current DB. It associates keys with values of various data types, and this dictionary is also called the key space. The expires dictionary is used to store the expiration time for each key, mapping keys to their expiration time. In daily data storage and access operations, we will mostly access these two dictionaries in redisDb.

The three non-core dictionaries include a blocking dictionary called blocking_keys, a unblock dictionary called ready_keys, and a watch dictionary called watched_keys.

When executing blocking commands like BLPOP, BRPOP, or BRPOPLPUSH on a list in Redis, if the corresponding list is empty, Redis will set the corresponding client to a blocking state and add the client to the blocking_keys dictionary in the DB. So, this dictionary stores the keys and client lists that are in a blocking state.

When another client adds elements to a list that is being monitored, Redis checks whether there are any clients blocking on this key. It does this by checking if the key is present in the blocking_keys dictionary. If it is, Redis will add this key to the ready_keys dictionary. It also saves this key in a list called read_keys in the server. This allows for efficient and non-redundant insertion and polling.

When a client uses the WATCH command to monitor a key, both the key and the client will be stored in the watched_keys dictionary. The redisDb can store all data types, and all data types in Redis are stored in a structure called redisObject.

redisObject #

img

redisObject consists of five fields.

  • type: the data type of the Redis object. Currently, there are 7 supported types:

  • * OBJ_STRING
    
    • OBJ_LIST
    • OBJ_SET
    • OBJ_ZSET
    • OBJ_HASH
    • OBJ_MODULE
    • OBJ_STREAM
  • encoding: the internal encoding of the Redis object, which represents the internal data structure type. Currently, there are 10 different encoding methods available, including:

  • * OBJ_ENCODING_RAW
    
  • OBJ_ENCODING_INT

  • OBJ_ENCODING_HT

  • OBJ_ENCODING_ZIPLIST, etc.

  • LRU: Stores data for LRU time or LFU frequency and time for eviction.

  • refcount: Records the reference count of the Redis object, indicating the number of times the object is shared. When the count reaches 0, it means the object is no longer in use and will be freed to reclaim memory.

  • ptr: Points to the internal data structure of an object. For example, for an object representing a string, ptr may point to an sds or a long integer.

dict #

As mentioned earlier, Redis data is actually stored in two core dict dictionaries in the DB. Dict is also a widely used internal data structure in Redis.

img

The dict in Redis is similar to the hashtable in Memcached. It can be used for fast insertion, updating, and locating of keys or elements. The dict dictionary has an array of two hash tables, with the 0th hash table being used for everyday access. If the elements in the 0th hash table become too many, a space of double the size of the 0th hash table is allocated to the 1st hash table, and then a gradual migration occurs. The rehashidx field is specifically used to mark the migration position. When performing hash table operations, a singly-linked list is used to resolve hash conflicts. The dict also has an important field called type, which is used to store the hash function and key/value assignment and comparison functions.

The table in dictht is an array of hash tables, and each bucket points to a dictEntry structure. dictht uses a singly-linked list of dictEntry to resolve hash conflicts.

img

dictht stores key-value mappings using dictEntry. The key is an sds string, and the value is a redisObject structure that stores various data types.

dict can be used by redisDb to store key-value data and auxiliary information for command operations. It can also be used as an internal data structure for some Redis data types. dict can be used as the internal data structure for the set collection. If the number of elements in the hash exceeds 512 or the value in the hash is larger than 64 bytes, dict is also used as the internal data structure for the hash type.

sds #

Strings are the most common data type in Redis, and their underlying implementation is a simple dynamic string called sds. A simple dynamic string is essentially a char *, managed internally by sdshdr. sdshdr has 4 fields: len, which represents the actual length of the string; alloc, which represents the total memory size currently allocated for the byte array; flags, which records the attributes of the byte array; and buf, which stores the actual value of the string and ends with a \0.

img

The buf storage of sds can dynamically expand or shrink, and the length of the string can be obtained directly without iterating. It is convenient to modify and access the string. Since the string is stored in the buf array in sds and the length is defined by len, instead of stopping at 0 like a traditional string, sds is binary safe and can store any binary data.

img

Getting the length of a simple dynamic string sds is very convenient. It can be obtained directly through len, while for traditional strings, it is necessary to iterate through the string, resulting in a time complexity of O(n).

Compared to traditional strings, sds introduces an sdshdr, which is a non-negligible overhead for a large number of very short strings. Starting from the 3.2 version, sds will select different data structures based on the actual length of the string to improve memory efficiency. Currently, the sdshdr structure is divided into 5 subtypes: sdshdr5, sdshdr8, sdshdr16, sdshdr32, sdshdr64. Among them, sdshdr5 only has the flags and buf fields, while the other subtypes have len and alloc of different types from uint8_t to uint64_t to save memory space.

sds can be used as an internal data structure for strings, and it is also the internal data structure for hyperloglog and bitmap types.

ziplist #

In order to save memory and reduce memory fragmentation, Redis designed the ziplist compressed list internal data structure. The ziplist is a continuous block of memory that can store multiple elements without redundancy. It is a sequential memory structure composed of continuous blocks of memory.

img

The structure of the ziplist is shown in the diagram, which mainly includes five parts:

  1. zlbytes is the total number of memory bytes occupied by the ziplist.
  2. zltail is the number of bytes from the tail node to the start position.
  3. zllen is the total number of nodes/memory blocks it contains.
  4. Entry is the various data nodes stored in the ziplist, and the lengths of these data points are arbitrary.
  5. Zlend is a magic number 255 used to mark the end of the ziplist.

As shown in the diagram, a ziplist containing 4 elements occupies a total of 100 bytes of memory. The pointer to the starting element of this ziplist is p, and zltail is 80. Therefore, the pointer to the 4th element is p+80.

img

The structure of the storage node of the ziplist ziplist shown in the diagram has 6 fields:

  • prevRawLen is the length of the previous node.
  • preRawLenSize is the number of bytes required to encode prevRawLen.
  • len is the length of the current node.
  • lensize is the number of bytes required to encode len.
  • encoding is the encoding type used by the current node.
  • entryData is the data of the current node.

Since the ziplist is stored in a continuous and compact manner without redundancy, inserting new elements requires reallocating and extending memory. Therefore, if the space occupied by the ziplist is too large, the cost of reallocating and copying will be high. Therefore, ziplist is not suitable for storing too many elements or too large strings.

Therefore, ziplist is only used as the internal data structure for hash and zset when the number of elements and values is not large. The limit for using ziplist as the internal data structure for hash is that the number of elements does not exceed 512 by default, and the value does not exceed 64 bytes by default. These two thresholds, hash_max_ziplist_entries and hash_max_ziplist_value, can be adjusted by modifying the configuration.

For sorted sets (zset), the limitation for using ziplist as the internal data structure is that the number of elements does not exceed 128 by default, and the value does not exceed 64 bytes by default. These two thresholds, zset_max_ziplist_entries and zset_max_ziplist_value, can be adjusted by modifying the configuration.

quicklist #

Redis introduced quicklist after version 3.2 to replace linkedlist. In linkedlist, each node occupies 16 bytes due to the presence of previous and next pointers, and each node is allocated memory separately, which easily exacerbates memory fragmentation. Ziplist, on the other hand, is not suitable for storing too many elements or values that are too large because adding elements requires reallocating memory and deleting elements requires memory copying.

Quicklist, a rapid list, was created to address these issues. It is a doubly-linked list based on ziplist. Data is stored in ziplists that are then connected using double pointers. The structure of quicklist is shown in the diagram.

  • head and tail are two pointers that point to the first and last ziplist nodes, respectively.
  • count represents the total number of elements in the quicklist.
  • len represents the number of ziplist nodes.
  • compress represents the LZF compression depth.

In the quicklist, the management of ziplists is done by the quicklistNode structure. quicklistNode mainly contains a prev/next double pointer and a ziplist node. A single ziplist node can hold multiple elements.

Reading and writing data from the head and tail of the quicklist is fast, with a time complexity of O(1). It also supports inserting or reading/writing elements from any position within the quicklist, but the speed is slower, with a time complexity of O(n). Currently, the quicklist is mainly used as an internal data structure for lists.

zskiplist #

A skip list, zskiplist, is an ordered data structure that speeds up access by maintaining multiple pointers to other nodes at each node. The skip list supports average O(logN) and worst-case O(n) complexity for node lookup. In most scenarios, the efficiency of a skip list is similar to that of a balanced tree, but the implementation of a skip list is simpler, so many programs use skip lists instead of balanced trees.

A skip list consists of a zskipList and zskiplistNode nodes. The structure of zskipList is shown in the diagram. header points to the head node of the skip list, and tail points to the tail node. length represents the length of the skip list, which is the number of nodes in the skip list excluding the head node. level represents the maximum level among all nodes in the skip list, excluding the head node.

The structure of a zskiplistNode in the skip list is shown in the diagram. ele represents the value of the node and, in the case of a sorted set, it represents the field element in the set. score represents the score of the node, which determines the order of nodes in the skip list from smallest to largest. backward is a pointer that points to the previous node of the current node. level represents the level of the node, and each node generally has multiple levels. Each level has two properties: forward and span. forward is a forward pointer used to point to the node in the direction of the tail, and span is the distance from the forward pointer to the current node.

The diagram shows an example of a skip list with three nodes. The corresponding element values are S1, S2, and S3, and the corresponding score values are 1.0, 3.0, and 5.0, respectively. The maximum level of the S3 node is 5, and the level of the skip list is also 5. header points to the head node, and tail points to the tail node. While searching for an element, the sum of spans on the path gives the position of the element. In a skip list, elements must be unique, but scores can be the same. Different elements with the same score are sorted in lexicographic order.

In the sorted set data type, if the number of elements is large or the elements are large in size, Redis chooses to use a skip list as the internal data structure of the sorted set.

Redis mainly uses the following internal data structures for the previously mentioned eight data types:

  • For the string type, Redis mainly uses sds for storage.
  • For the list type, Redis uses quicklist for storage.
  • For the set type, Redis uses dict for storage.
  • For the sorted set type, if the elements are fewer than 128 and the length of the elements is smaller than 64, Redis uses ziplist for storage; otherwise, it uses zskiplist for storage.
  • For the hash type, if the number of elements is fewer than 512 and the length of the elements is smaller than 64, Redis uses ziplist for storage; otherwise, it uses dict for storage.
  • For hyperloglog, Redis uses sds (simple dynamic string) for storage.
  • For geo, if the number of locations is fewer than 128, Redis uses ziplist for storage; otherwise, it uses zskiplist for storage.
  • For bitmap, Redis uses sds (simple dynamic string) for storage.

Apart from these main internal data structures, Redis also uses other internal structures for specific scenarios. For example, if the strings being operated on are integers and the commands are incr, decr, etc., Redis stores the strings as long integers. However, due to time constraints, these scenarios will not be further elaborated on.