08 Dictionary Usage and Internal Implementation Principles

08 Dictionary Usage and Internal Implementation Principles #

The dictionary type (Hash) is also known as the scatter type or hash table type. It associates a key with a special “hash table” that contains two columns of data: fields and values. For example, we can use the dictionary type to store detailed information about an article, and the storage structure is shown in the diagram below: Hash table storage structure.png Similarly, we can also use the dictionary type to store user information. When using the dictionary type to store such information, there is no need to manually serialize and deserialize data, making it more convenient and efficient to use.

1. Basic Usage #

First, we use the command line tool redis-cli to perform related operations on dictionary types.

1) Inserting a Single Element #

Syntax: hset key field value Example:

127.0.0.1:6379> hset myhash key1 value1
(integer) 1
127.0.0.1:6379> hset myhash key2 value2
(integer) 1

2) Inserting Data When a Key Does Not Exist #

Syntax: hsetnx key field value Example:

127.0.0.1:6379> hsetnx myhash k4 v4
(integer) 1
127.0.0.1:6379> hget myhash k4
"v4"

If you attempt to insert a key that already exists, the original value will not be changed, as shown in the following example:

127.0.0.1:6379> hsetnx myhash k4 val4
(integer) 0
127.0.0.1:6379> hget myhash k4
"v4"

Attempting to modify the existing key “k4” with the value “val4” does not take effect, and the result of querying “k4” is still the original value “v4”.

3) Querying a Single Element #

Syntax: hget key field Example:

127.0.0.1:6379> hget myhash key1
"value1"

4) Deleting one or more elements from a key #

Syntax: hdel myhash field [field …] Example:

127.0.0.1:6379> hdel myhash key1 key2
(integer) 1

Note: You cannot use a command like hdel myhash to delete the entire Hash value.

5) Accumulating a certain integer value #

Syntax: hincrby key field increment Example:

127.0.0.1:6379> hset myhash k3 3
(integer) 1
127.0.0.1:6379> hincrby myhash k3 2
(integer) 5
127.0.0.1:6379> hget myhash k3
"5"

For more commands, please refer to the appendix.

2. Code Implementation #

Next, we will use Java code to implement operations on Redis. Similarly, we first introduce the Jedis framework, and then use code to operate on dictionary types. The sample code is as follows:

import redis.clients.jedis.Jedis;
import java.util.Map;

public class HashExample {
    public static void main(String[] args) throws InterruptedException {
        Jedis jedis = new Jedis("127.0.0.1", 6379);
        // Define the Key value as a variable
        final String REDISKEY = "myhash";
        // Insert a single element
        jedis.hset(REDISKEY, "key1", "value1");
        // Query a single element
        Map<String, String> singleMap = jedis.hgetAll(REDISKEY);
        System.out.println(singleMap.get("key1"));  // Output: value1
        // Query all elements
        Map<String, String> allMap = jedis.hgetAll(REDISKEY);
        System.out.println(allMap.get("k2")); // Output: val2
        System.out.println(allMap); // Output: {key1=value1, k1=val1, k2=val2, k3=9.2, k4=v4...}
        // Delete a single element
        Long delResult = jedis.hdel(REDISKEY, "key1");
        System.out.println("Deletion result: " + delResult);    // Output: Deletion result: 1
        // Query a single element
        System.out.println(jedis.hget(REDISKEY, "key1")); // Output: Returns null
    }
}


From the code, we can see that in Jedis, we can directly use a Map to receive dictionary data read from Redis, which eliminates the hassle of manual conversion and is quite convenient.

### 3. Data Structure

The dictionary type is essentially composed of an array and a linked list structure. Let's take a look at the source code implementation of the dictionary type:

```c
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; // next entry
} dictEntry;

The data structure of the dictionary type is shown in the figure below:

In most cases, the dictionary type uses an array to store related data, but a linked list structure is used to store data in the event of a hash collision.

4. Hash Collision #

The storage process of the dictionary type first calculates the hash value of the key, obtains the array index corresponding to the key-value pair being stored, and then stores the data based on the array index. However, in rare events, different key-value pairs may result in the same hash value after hash calculation. This situation is called hash collision.

Hash collisions are generally resolved in the form of linked lists. The same hash value corresponds to a linked list structure. When a hash collision occurs, the new element is inserted at the end of the linked list. Please refer to the above data structure diagram.

The process of key-value pair lookup is as follows:

Obtain the array index through algorithmic operations (e.g., hash calculation and modulo operation), and find the corresponding element based on the index.
Check if the element is equal to the searched key-value. If it is equal, successfully return the data; otherwise, check if the next pointer points to another corresponding element. If it doesn’t, return null. If it does, repeat this step.

The process of key-value pair lookup is shown in the figure below:

5. Progressive Rehashing #

To ensure high-performance operation of applications, Redis provides an important mechanism called progressive rehashing. Progressive rehashing is used to ensure the efficiency of dictionary resizing, meaning that progressive rehashing is performed during dictionary expansion or contraction.

1) Expansion #

When the number of elements is equal to the length of the array, an expansion operation is performed. The core code is as follows:

int dictExpand(dict *d, unsigned long size) {
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;
    dictht n;
    unsigned long realsize = _dictNextPower(size);
    if (realsize == d->ht[0].size) return DICT_ERR;
    n.size = realsize;
    n.sizemask = realsize-1;
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    n.used = 0;
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}

From the above source code, we can see that if an expansion is needed, a new memory address is allocated and assigned to ht[1], and the rehashidx of the dictionary is set to 0, indicating that rehashing will be performed later.

2) Contraction #

When the usage capacity of the dictionary is less than 10% of the total space, contraction is triggered. Redis also sets rehashidx to 0 when performing contraction, indicating that rehashing will be performed later.

3) Progressive Rehashing Process #

During progressive rehashing, two hash structures are kept at the same time. When a new key-value pair is added, it is directly inserted into the new hash structure, and the elements in the old hash structure are gradually moved to the new hash structure. When the last element is removed, the old hash structure is cleared. The main execution process is as follows:

When expanding or contracting, the rehashidx field in the dictionary is set to 0.
When executing timed tasks or client instructions such as hset and hdel, determine if rehashing needs to be triggered (by checking the rehashidx field). If rehashing needs to be triggered, call the dictRehash function, which adds elements from ht[0] to the new hash table ht[1].
After rehashing is complete, clear the hash table ht[0], swap the values of ht[1] and ht[0], and set the rehashidx field of the dictionary to -1, indicating that rehashing does not need to be performed.

6. Use Cases #

The typical use cases of Hash dictionaries are as follows:

Shopping carts: Hash dictionaries are suitable for representing shopping carts. Use unique user IDs as keys in the dictionary, and the value can store information such as the ID and quantity of the products.
Storing user attribute information: Use unique user IDs as keys in the dictionary, and the value is the attribute field and its corresponding value.
Storing article details, etc.

7. Summary #

In this article, we learned about the operations of dictionary types and their use in code. We also understand that dictionary types are actually composed of an array and a linked list. When resizing a dictionary, Redis performs progressive rehashing, which is used to ensure the efficiency of Redis. The process of progressive rehashing is as follows: two hash tables are kept simultaneously, and the elements in the old table are gradually moved to the new table. When querying, both hash tables are searched. After all elements have been moved to the new hash table, the old hash table is deleted.