12 Set Usage and Internal Implementation Principles

12 Set Usage and Internal Implementation Principles #

The Set data type is an unordered collection of unique key-value pairs.

The reason why sets are considered unordered collections is that their storage order does not follow the order of insertion, as shown in the following code:

127.0.0.1:6379> sadd myset v2 v1 v3 # Insert values v2, v1, and v3
(integer) 3
127.0.0.1:6379> smembers myset # Query values
1) "v1"
2) "v3"
3) "v2"

From the above code execution results, it can be seen that the storage order of myset is not based on the order of insertion.

The differences between set and list types are as follows:

  • Lists can store duplicate elements, while sets can only store unique elements.
  • Lists store elements in the order of insertion, while sets store elements in an unordered manner.

1 Basic Usage #

The Set data type has more functionalities than the List data type. Sets can be used to calculate the intersection, difference, and union of multiple sets, as shown in the following code.

1) Add one or more elements #

Syntax: sadd key member [member …] Example:

127.0.0.1:6379> sadd myset v1 v2 v3
(integer) 3

2) Query all elements in the set #

Syntax: smembers key Example:

127.0.0.1:6379> smembers myset
1) "v1"
2) "v3"
3) "v2"

3) Query the number of members in the set #

Syntax: scard key Example:

127.0.0.1:6379> scard myset
(integer) 3

4) Check if a set contains a specific element #

Syntax: sismember key member Example:

127.0.0.1:6379> sismember myset v1
(integer) 1
127.0.0.1:6379> sismember myset v4
(integer) 0

5) Move an element from one set to another #

Syntax: smove source destination member Example:

127.0.0.1:6379> smembers myset
1) "v1"
2) "v3"
3) "v2"
127.0.0.1:6379> smembers myset2
1) "v1"
2) "v8"
127.0.0.1:6379> smove myset myset2 v3
(integer) 1
127.0.0.1:6379> smembers myset2
1) "v1"
2) "v8"
3) "v3"
127.0.0.1:6379> smembers myset
1) "v1"
2) "v2"

6) Remove one or more elements from a set #

Syntax: srem key member [member …] Example:

127.0.0.1:6379> smembers myset
1) "v4"
2) "v1"
3) "v3"
4) "v2"
5) "v5"
127.0.0.1:6379> srem myset v5
(integer) 1
127.0.0.1:6379> smembers myset
1) "v3"
2) "v2"
3) "v1"
4) "v4"

Note: When using the srem command, non-existent elements will be ignored. For more commands, please refer to the appendix.

2 Code Practice #

Let’s take a look at the usage of Set in Java. First, add the Jedis framework using the following code:

import redis.clients.jedis.Jedis;
import java.util.Set;

public class SetExample {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("xxx.xxx.xxx.xxx", 6379);
        jedis.auth("xxx");
        // Create a set and add elements
        jedis.sadd("set1", "java", "golang");
        // Query all elements in the set
        Set<String> members = jedis.smembers("set1");
        System.out.println(members); // Output: [java, golang]
        // Query the number of elements in the set
        System.out.println(jedis.scard("set1"));
        // Remove an element from the set
        jedis.srem("set1", "golang");
        System.out.println(jedis.smembers("set1")); // Output: [java]
        // Create set2 and add elements
        jedis.sadd("set2", "java", "golang");
        // Query the intersection of two sets
        Set<String> inters = jedis.sinter("set1", "set2");
        System.out.println(inters); // Output: [java]
        // Query the union of two sets
        Set<String> unions = jedis.sunion("set1", "set2");
        System.out.println(unions); // Output: [java,golang]
        // Query the difference of two sets
        Set<String> diffs = jedis.sdiff("set2", "set1");
        System.out.println(diffs); // Output: [golang]
    }
}

3 Implementation #

A set in Redis is composed of an intset (integer set) or hashtable. When a set is stored using a hashtable, the key of the hashtable is the element value to be inserted, and the value of the hashtable is NULL. The following image illustrates this: set-set-hashtable.png

When all the values in a set are integers, Redis uses the intset structure to store the set, as shown in the following code:

127.0.0.1:6379> sadd myset 1 9 3 -2
(integer) 4
127.0.0.1:6379> object encoding myset
"intset"

From the above code, we can see that Redis uses the intset structure to store the set when all the elements are integers. However, there are two situations in which Redis will use a hashtable instead of an intset to store a set: 1) when the number of elements exceeds a certain threshold (default is 512, but can be configured using the command set-max-intset-entries xxx); 2) when the elements are non-integer values. The following code illustrates the use of a hashtable:

127.0.0.1:6379> sadd myht "redis" "db"
(integer) 2
127.0.0.1:6379> object encoding myht
"hashtable"

From the above code, we can see that Redis uses a hashtable to store the set when the elements are non-integer values.

4 Source Code Analysis #

The source code for sets can be found in the t_set.c file. The core code is as follows:

/* 
 * Add an element to the set
 * If the value already exists, return 0 without making any changes. Otherwise, add the element and return 1.
 */
int setTypeAdd(robj *subject, sds value) {
    long long llval;
    if (subject->encoding == OBJ_ENCODING_HT) { // Hashtable encoding
        dict *ht = subject->ptr;
        dictEntry *de = dictAddRaw(ht,value,NULL);
        if (de) {
            // Use the value as the key and NULL as the value to store the element in the dictionary
            dictSetKey(ht,de,sdsdup(value));
            dictSetVal(ht,de,NULL);
            return 1;
        }
    } else if (subject->encoding == OBJ_ENCODING_INTSET) { // Intset encoding
        if (isSdsRepresentableAsLongLong(value,&llval) == C_OK) {
            uint8_t success = 0;
            subject->ptr = intsetAdd(subject->ptr,llval,&success);
            if (success) {
                // If the number of elements exceeds the maximum allowed by intset, convert to hashtable encoding
                if (intsetLen(subject->ptr) > server.set_max_intset_entries)
                    setTypeConvert(subject,OBJ_ENCODING_HT);
                return 1;
            }
        } else {
            // Convert to hashtable encoding if conversion to integer fails
            setTypeConvert(subject,OBJ_ENCODING_HT);

            serverAssert(dictAdd(subject->ptr,sdsdup(value),NULL) == DICT_OK);
            return 1;
        }
    } else {
        // Unknown encoding
        serverPanic("Unknown set encoding");
    }
    return 0;
}

The above code confirms what we mentioned earlier: when all the elements are integers and the number of elements does not exceed the maximum value set, the set will use the intset data structure for storage. When the number of elements exceeds the maximum or the elements are non-integer values, the set will use the hashtable data structure for storage.

5 Use Cases #

Some classic use cases for set types include:

  • Storing people who follow me on Weibo and people I follow, to ensure that individuals are not repeated.
  • Storing information about winners to ensure that a person does not win multiple times.

6 Conclusion #

Through this article, we have learned that set types in Redis are composed of either integer sets (intset) or hashtables. Set types are suitable for data deduplication and ensuring data uniqueness. In addition, set types can also be used to calculate the intersection, difference, and union of multiple sets (see the appendix). When storing unordered and deduplicated data, set types are a good choice for storage.