22 How to Recognize and Apply Redis Internal Data Structures #
In the previous lesson, we learned about Redis protocol parsing and processing. Now, let’s take a look at what Redis’s internal data structures look like.
Redis Internal Data Structures #
RedisDb #
All data in Redis is stored in a DB. By default, Redis supports up to 16 DBs. Each DB in Redis corresponds to a redisDb
structure. That means each Redis instance has 16 redisDb
s by default. The default DB used when accessing Redis is DB 0, but you can switch between different DBs using the SELECT $dbID
command.
redisDb
mainly consists of two core dictionaries, three non-core dictionaries, dbID
, and other auxiliary attributes. The two core dictionaries include a main dictionary called dict
and an expires dictionary called expires
. The main dictionary is used to store all the data in the current DB. It associates keys with values of various data types, and this dictionary is also called the key space. The expires dictionary is used to store the expiration time for each key, mapping keys to their expiration time. In daily data storage and access operations, we will mostly access these two dictionaries in redisDb
.
The three non-core dictionaries include a blocking dictionary called blocking_keys
, a unblock dictionary called ready_keys
, and a watch dictionary called watched_keys
.
When executing blocking commands like BLPOP
, BRPOP
, or BRPOPLPUSH
on a list in Redis, if the corresponding list is empty, Redis will set the corresponding client to a blocking state and add the client to the blocking_keys
dictionary in the DB. So, this dictionary stores the keys and client lists that are in a blocking state.
When another client adds elements to a list that is being monitored, Redis checks whether there are any clients blocking on this key. It does this by checking if the key is present in the blocking_keys
dictionary. If it is, Redis will add this key to the ready_keys
dictionary. It also saves this key in a list called read_keys
in the server. This allows for efficient and non-redundant insertion and polling.
When a client uses the WATCH
command to monitor a key, both the key and the client will be stored in the watched_keys
dictionary. The redisDb
can store all data types, and all data types in Redis are stored in a structure called redisObject
.
redisObject #
redisObject
consists of five fields.
-
type
: the data type of the Redis object. Currently, there are 7 supported types: -
* OBJ_STRING
- OBJ_LIST
- OBJ_SET
- OBJ_ZSET
- OBJ_HASH
- OBJ_MODULE
- OBJ_STREAM
-
encoding
: the internal encoding of the Redis object, which represents the internal data structure type. Currently, there are 10 different encoding methods available, including: -
* OBJ_ENCODING_RAW
-
OBJ_ENCODING_INT
-
OBJ_ENCODING_HT
-
OBJ_ENCODING_ZIPLIST, etc.
-
LRU: Stores data for LRU time or LFU frequency and time for eviction.
-
refcount: Records the reference count of the Redis object, indicating the number of times the object is shared. When the count reaches 0, it means the object is no longer in use and will be freed to reclaim memory.
-
ptr: Points to the internal data structure of an object. For example, for an object representing a string, ptr may point to an sds or a long integer.
dict #
As mentioned earlier, Redis data is actually stored in two core dict dictionaries in the DB. Dict is also a widely used internal data structure in Redis.
The dict in Redis is similar to the hashtable in Memcached. It can be used for fast insertion, updating, and locating of keys or elements. The dict dictionary has an array of two hash tables, with the 0th hash table being used for everyday access. If the elements in the 0th hash table become too many, a space of double the size of the 0th hash table is allocated to the 1st hash table, and then a gradual migration occurs. The rehashidx field is specifically used to mark the migration position. When performing hash table operations, a singly-linked list is used to resolve hash conflicts. The dict also has an important field called type, which is used to store the hash function and key/value assignment and comparison functions.
The table in dictht is an array of hash tables, and each bucket points to a dictEntry structure. dictht uses a singly-linked list of dictEntry to resolve hash conflicts.
dictht stores key-value mappings using dictEntry. The key is an sds string, and the value is a redisObject structure that stores various data types.
dict can be used by redisDb to store key-value data and auxiliary information for command operations. It can also be used as an internal data structure for some Redis data types. dict can be used as the internal data structure for the set collection. If the number of elements in the hash exceeds 512 or the value in the hash is larger than 64 bytes, dict is also used as the internal data structure for the hash type.
sds #
Strings are the most common data type in Redis, and their underlying implementation is a simple dynamic string called sds. A simple dynamic string is essentially a char *, managed internally by sdshdr. sdshdr has 4 fields: len, which represents the actual length of the string; alloc, which represents the total memory size currently allocated for the byte array; flags, which records the attributes of the byte array; and buf, which stores the actual value of the string and ends with a \0.
The buf storage of sds can dynamically expand or shrink, and the length of the string can be obtained directly without iterating. It is convenient to modify and access the string. Since the string is stored in the buf array in sds and the length is defined by len, instead of stopping at 0 like a traditional string, sds is binary safe and can store any binary data.
Getting the length of a simple dynamic string sds is very convenient. It can be obtained directly through len, while for traditional strings, it is necessary to iterate through the string, resulting in a time complexity of O(n).
Compared to traditional strings, sds introduces an sdshdr, which is a non-negligible overhead for a large number of very short strings. Starting from the 3.2 version, sds will select different data structures based on the actual length of the string to improve memory efficiency. Currently, the sdshdr structure is divided into 5 subtypes: sdshdr5, sdshdr8, sdshdr16, sdshdr32, sdshdr64. Among them, sdshdr5 only has the flags and buf fields, while the other subtypes have len and alloc of different types from uint8_t to uint64_t to save memory space.
sds can be used as an internal data structure for strings, and it is also the internal data structure for hyperloglog and bitmap types.
ziplist #
In order to save memory and reduce memory fragmentation, Redis designed the ziplist compressed list internal data structure. The ziplist is a continuous block of memory that can store multiple elements without redundancy. It is a sequential memory structure composed of continuous blocks of memory.
The structure of the ziplist is shown in the diagram, which mainly includes five parts:
zlbytes
is the total number of memory bytes occupied by the ziplist.zltail
is the number of bytes from the tail node to the start position.zllen
is the total number of nodes/memory blocks it contains.Entry
is the various data nodes stored in the ziplist, and the lengths of these data points are arbitrary.Zlend
is a magic number 255 used to mark the end of the ziplist.
As shown in the diagram, a ziplist containing 4 elements occupies a total of 100 bytes of memory. The pointer to the starting element of this ziplist is p
, and zltail
is 80. Therefore, the pointer to the 4th element is p+80
.
The structure of the storage node of the ziplist ziplist shown in the diagram has 6 fields:
prevRawLen
is the length of the previous node.preRawLenSize
is the number of bytes required to encodeprevRawLen
.len
is the length of the current node.lensize
is the number of bytes required to encodelen
.encoding
is the encoding type used by the current node.entryData
is the data of the current node.
Since the ziplist is stored in a continuous and compact manner without redundancy, inserting new elements requires reallocating and extending memory. Therefore, if the space occupied by the ziplist is too large, the cost of reallocating and copying will be high. Therefore, ziplist is not suitable for storing too many elements or too large strings.
Therefore, ziplist is only used as the internal data structure for hash and zset when the number of elements and values is not large. The limit for using ziplist as the internal data structure for hash is that the number of elements does not exceed 512 by default, and the value does not exceed 64 bytes by default. These two thresholds, hash_max_ziplist_entries
and hash_max_ziplist_value
, can be adjusted by modifying the configuration.
For sorted sets (zset), the limitation for using ziplist as the internal data structure is that the number of elements does not exceed 128 by default, and the value does not exceed 64 bytes by default. These two thresholds, zset_max_ziplist_entries
and zset_max_ziplist_value
, can be adjusted by modifying the configuration.
quicklist #
Redis introduced quicklist after version 3.2 to replace linkedlist. In linkedlist, each node occupies 16 bytes due to the presence of previous and next pointers, and each node is allocated memory separately, which easily exacerbates memory fragmentation. Ziplist, on the other hand, is not suitable for storing too many elements or values that are too large because adding elements requires reallocating memory and deleting elements requires memory copying.
Quicklist, a rapid list, was created to address these issues. It is a doubly-linked list based on ziplist. Data is stored in ziplists that are then connected using double pointers. The structure of quicklist is shown in the diagram.
head
andtail
are two pointers that point to the first and last ziplist nodes, respectively.count
represents the total number of elements in the quicklist.len
represents the number of ziplist nodes.compress
represents the LZF compression depth.
In the quicklist, the management of ziplists is done by the quicklistNode
structure. quicklistNode
mainly contains a prev/next
double pointer and a ziplist node. A single ziplist node can hold multiple elements.
Reading and writing data from the head and tail of the quicklist is fast, with a time complexity of O(1). It also supports inserting or reading/writing elements from any position within the quicklist, but the speed is slower, with a time complexity of O(n). Currently, the quicklist is mainly used as an internal data structure for lists.
zskiplist #
A skip list, zskiplist
, is an ordered data structure that speeds up access by maintaining multiple pointers to other nodes at each node. The skip list supports average O(logN) and worst-case O(n) complexity for node lookup. In most scenarios, the efficiency of a skip list is similar to that of a balanced tree, but the implementation of a skip list is simpler, so many programs use skip lists instead of balanced trees.
A skip list consists of a zskipList
and zskiplistNode
nodes. The structure of zskipList
is shown in the diagram. header
points to the head node of the skip list, and tail
points to the tail node. length
represents the length of the skip list, which is the number of nodes in the skip list excluding the head node. level
represents the maximum level among all nodes in the skip list, excluding the head node.
The structure of a zskiplistNode
in the skip list is shown in the diagram. ele
represents the value of the node and, in the case of a sorted set, it represents the field element in the set. score
represents the score of the node, which determines the order of nodes in the skip list from smallest to largest. backward
is a pointer that points to the previous node of the current node. level
represents the level of the node, and each node generally has multiple levels. Each level has two properties: forward
and span
. forward
is a forward pointer used to point to the node in the direction of the tail, and span
is the distance from the forward
pointer to the current node.
The diagram shows an example of a skip list with three nodes. The corresponding element values are S1, S2, and S3, and the corresponding score values are 1.0, 3.0, and 5.0, respectively. The maximum level of the S3 node is 5, and the level of the skip list is also 5. header
points to the head node, and tail
points to the tail node. While searching for an element, the sum of spans on the path gives the position of the element. In a skip list, elements must be unique, but scores can be the same. Different elements with the same score are sorted in lexicographic order.
In the sorted set data type, if the number of elements is large or the elements are large in size, Redis chooses to use a skip list as the internal data structure of the sorted set.
Redis mainly uses the following internal data structures for the previously mentioned eight data types:
- For the
string
type, Redis mainly usessds
for storage. - For the
list
type, Redis usesquicklist
for storage. - For the
set
type, Redis usesdict
for storage. - For the
sorted set
type, if the elements are fewer than 128 and the length of the elements is smaller than 64, Redis usesziplist
for storage; otherwise, it useszskiplist
for storage. - For the
hash
type, if the number of elements is fewer than 512 and the length of the elements is smaller than 64, Redis usesziplist
for storage; otherwise, it usesdict
for storage. - For
hyperloglog
, Redis usessds
(simple dynamic string) for storage. - For
geo
, if the number of locations is fewer than 128, Redis usesziplist
for storage; otherwise, it useszskiplist
for storage. - For
bitmap
, Redis usessds
(simple dynamic string) for storage.
Apart from these main internal data structures, Redis also uses other internal structures for specific scenarios. For example, if the strings being operated on are integers and the commands are incr
, decr
, etc., Redis stores the strings as long integers. However, due to time constraints, these scenarios will not be further elaborated on.