06 String Usage and Internal Implementation Principles

Redis provides 9 data types, of which 5 are the most basic and commonly used: string, list, hash table, set, and sorted set. Of these 5, the string type is used most often, so this article starts with the usage of strings.

Internally, the string type is implemented with Simple Dynamic Strings, abbreviated as SDS. A string is stored as a key-value pair, and the value is stored and retrieved by its key. Its usage is relatively simple, but it is widely used in practical projects.

1 What Can You Do with String Type? #

There are many use cases for the string type, but from a functional perspective, they can be roughly divided into the following two categories:

  • String storage and manipulation;
  • Integer and floating-point storage and calculation.

The most common business scenarios for strings are as follows.

1) Page Data Caching #

As we know, the most valuable resource of a system is the database resource. With the development and expansion of a company’s business, the storage capacity of the database will also increase, and the number of requests to handle will also increase. When the data volume and concurrency reach a certain level, the database becomes the “culprit” that slows down the system. To avoid this situation, we can store the query results in a cache system (Redis), so that the next time the same query is made, the result is retrieved directly from the cache system instead of querying the database. This reduces the pressure on the database and improves the speed of program execution.

Based on this idea, we can put the data of an article detail page into the cache system. The specific approach is to serialize the article detail page into a string and store it in the cache, then retrieve the string from the cache, deserialize it into an object, and assign it to the page for display (of course, hash types can also be used for storage, which will be discussed in the next article). This way, we achieve the caching function for the article detail page. The comparison between the architecture processes is shown in the following figure.

Original system workflow: String Usage-1.png

Workflow after introducing the cache system: String Usage-2.png
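
The read path described above (check the cache first, fall back to the database on a miss, then populate the cache) can be sketched as follows. This is a minimal illustration, not the article's implementation: an in-memory map stands in for Redis, and `ArticleCache`, the `article:<id>` key format, and the database lookup function are all hypothetical names introduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Cache-aside sketch: an in-memory map stands in for Redis (assumption). */
final class ArticleCache {
    private final Map<String, String> cache = new HashMap<>(); // stand-in for the Redis string store
    private final Function<Long, String> database;             // hypothetical slow database lookup
    private int databaseHits = 0;

    ArticleCache(Function<Long, String> database) {
        this.database = database;
    }

    /** Returns the serialized article, querying the database only on a cache miss. */
    String getArticle(long id) {
        String key = "article:" + id;
        String cached = cache.get(key);     // plays the role of GET article:<id>
        if (cached != null) {
            return cached;                  // cache hit: the database is not touched
        }
        String fresh = database.apply(id);  // cache miss: query the database
        databaseHits++;
        cache.put(key, fresh);              // plays the role of SET article:<id> <serialized page>
        return fresh;
    }

    int databaseHits() {
        return databaseHits;
    }

    public static void main(String[] args) {
        ArticleCache articles = new ArticleCache(id -> "{\"id\":" + id + ",\"title\":\"Hello\"}");
        articles.getArticle(1); // first read goes to the database
        articles.getArticle(1); // second read is served from the cache
        System.out.println(articles.databaseHits()); // prints 1
    }
}
```

In a real deployment the cached entry would normally also get an expiration time, so that stale pages are eventually refreshed.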

2) Number Calculation and Statistics #

Redis can store integers and floating-point numbers and can increment a stored integer in place with a single command, so the client does not have to fetch the value, convert it, add to it, and write it back every time. The specific commands are covered in the second half of this article. With this feature, we can implement traffic statistics by simply incrementing a counter each time a page is accessed.
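
To illustrate why a single atomic increment is convenient for traffic statistics, here is a small sketch in which several threads count page views concurrently. It does not talk to Redis: a `ConcurrentHashMap` stands in for the server, and `PageViewCounter` and the key name `page:home:views` are assumptions made for this example.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Models an INCR-style counter with an atomic in-memory map (stand-in for Redis). */
final class PageViewCounter {
    private final ConcurrentHashMap<String, Long> counters = new ConcurrentHashMap<>();

    /** Atomically adds 1 and returns the new value; a missing key starts from 0. */
    long incr(String key) {
        return counters.merge(key, 1L, Long::sum);
    }

    long get(String key) {
        return counters.getOrDefault(key, 0L);
    }

    public static void main(String[] args) throws InterruptedException {
        PageViewCounter counter = new PageViewCounter();
        Thread[] visitors = new Thread[4];
        for (int i = 0; i < visitors.length; i++) {
            visitors[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    counter.incr("page:home:views"); // one call per visit, no read-modify-write in the caller
                }
            });
            visitors[i].start();
        }
        for (Thread visitor : visitors) {
            visitor.join();
        }
        System.out.println(counter.get("page:home:views")); // prints 4000
    }
}
```

Because each increment is a single atomic step, no visits are lost even when callers run concurrently; a separate get-then-set in the caller would not give that guarantee.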

3) Sharing Session Information #

Usually, when developing a backend management system, we use sessions to save the user's session (login) status, and this session information is stored on the server side. However, this only works for single-server applications. In a distributed system, this pattern no longer applies.

For example, the session information of user A is stored on Server 1, but when user A's second request is routed to Server 2, Server 2 does not have user A's session information, so the user has to log in again. In a distributed system, a request may be routed to any of the servers, so we need a cache system to store and manage this session information in one place. This way, no matter which server receives the request, it reads the session information from the same cache system, which solves the problem of session storage in a distributed system.

Workflow of storing sessions in a distributed system separately: String Usage-3.png

Workflow of storing sessions in a distributed system using the same cache system: String Usage-4.png
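
The shared-session idea can be sketched as two application servers that keep no session state of their own and both read from one store. This is only an illustration: `SessionStore` is an in-memory stand-in for the shared cache (Redis in the article), and `AppServer`, the `session:` key prefix, and the messages are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class SharedSessionDemo {

    /** Stand-in for the shared cache system; every server talks to the same instance. */
    static final class SessionStore {
        private final Map<String, String> sessions = new ConcurrentHashMap<>();

        void save(String sessionId, String user) {
            sessions.put("session:" + sessionId, user);
        }

        Optional<String> find(String sessionId) {
            return Optional.ofNullable(sessions.get("session:" + sessionId));
        }
    }

    /** A hypothetical application server that delegates all session state to the store. */
    static final class AppServer {
        private final String name;
        private final SessionStore store;

        AppServer(String name, SessionStore store) {
            this.name = name;
            this.store = store;
        }

        void login(String sessionId, String user) {
            store.save(sessionId, user);
        }

        String handle(String sessionId) {
            return store.find(sessionId)
                    .map(user -> name + ": welcome back, " + user)
                    .orElse(name + ": please log in");
        }
    }

    public static void main(String[] args) {
        SessionStore store = new SessionStore();
        AppServer server1 = new AppServer("server-1", store);
        AppServer server2 = new AppServer("server-2", store);

        server1.login("abc123", "userA");               // user A logs in on Server 1
        System.out.println(server2.handle("abc123"));   // prints "server-2: welcome back, userA"
        System.out.println(server2.handle("unknown"));  // prints "server-2: please log in"
    }
}
```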

2 How to Use Strings? #

Usually, we use two ways to operate Redis: one is to use the command-line interface, such as redis-cli; the other is to use code. Let’s take a look at each of them.

1) Command-Line Operation #

There are many commands for string operations, but they can be roughly divided into the following categories:

  • Single key-value pair operations
  • Multiple key-value pair operations
  • Number statistics

In this article, we use redis-cli to operate Redis. Before using a command, type redis-cli to connect to the Redis server.

① Single key-value pair operations #
a. Adding a key-value pair #

Syntax: set key value [expiration EX seconds|PX milliseconds] [NX|XX] Example:

127.0.0.1:6379> set k1 val1
OK
b. Retrieving a key-value pair #

Syntax: get key Example:

127.0.0.1:6379> get k1
"val1"
c. Appending a value to an element #

Syntax: append key value Example:

127.0.0.1:6379> get k1
"v1"
127.0.0.1:6379> append k1 append
(integer) 8
127.0.0.1:6379> get k1
"v1append"
d. Retrieving the length of a string #

Syntax: strlen key Example:

127.0.0.1:6379> strlen k1
(integer) 8
② Multiple key-value pair operations #
a. Creating one or more key-value pairs #

Syntax: mset key value [key value …] Example:

127.0.0.1:6379> mset k2 v2 k3 v3
OK

Tip: mset is an atomic operation, and all the given keys will be set at the same time. There will be no cases where some keys are updated while others are not.

b. Retrieving one or more elements #

Syntax: mget key [key …] Example:

127.0.0.1:6379> mget k2 k3
    1) "v2"
    2) "v3"

##### ③ Number Operations

In Redis, you can directly operate on integers and floating-point numbers, such as using commands to add or subtract values.

###### a. Increment an integer value by 1

Syntax: incr key Example:
    
    127.0.0.1:6379> get k1
    "3"
    127.0.0.1:6379> incr k1
    (integer) 4
    127.0.0.1:6379> get k1
    "4"

###### b. Decrement an integer value by 1

Syntax: decr key Example:
    
    127.0.0.1:6379> get k1
    "4"
    127.0.0.1:6379> decr k1
    (integer) 3
    127.0.0.1:6379> get k1
    "3"

###### c. Decrement a key by a specified value

Syntax: decrby key decrement Example:
    
    127.0.0.1:6379> get k1
    "3"
    127.0.0.1:6379> decrby k1 2
    (integer) 1
    127.0.0.1:6379> get k1
    "1"

If the key does not exist, it will be initialized to 0 and then the subtraction operation will be performed:
    
    127.0.0.1:6379> get k2
    (nil)
    127.0.0.1:6379> decrby k2 3
    (integer) -3
    127.0.0.1:6379> get k2
    "-3"

###### d. Increment a key by a specified integer value

Syntax: incrby key increment Example:
    
    127.0.0.1:6379> get k1
    "1"
    127.0.0.1:6379> incrby k1 2
    (integer) 3
    127.0.0.1:6379> get k1
    "3"

If the key does not exist, it will be initialized to 0 and then the addition of the integer value will be performed:
    
    127.0.0.1:6379> get k3
    (nil)
    127.0.0.1:6379> incrby k3 5
    (integer) 5
    127.0.0.1:6379> get k3
    "5"

###### e. Increment a key by a specified float value

Syntax: incrbyfloat key increment Example:
    
    127.0.0.1:6379> get k3
    "5"
    127.0.0.1:6379> incrbyfloat k3 4.9
    "9.9"
    127.0.0.1:6379> get k3
    "9.9"

If the key does not exist, it will be initialized to 0 and then the addition of the float value will be performed:
    
    127.0.0.1:6379> get k4
    (nil)
    127.0.0.1:6379> incrbyfloat k4 4.4
    "4.4"
    127.0.0.1:6379> get k4
    "4.4"

For more commands, please refer to the appendix.
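
The rule used by the integer commands above (a missing key is treated as 0 before the operation is applied, and the result is stored back as a string) can be modeled in a few lines. This is a simplified in-memory sketch for illustration, not how Redis is implemented; `IntegerCommandModel` and its method names are assumptions introduced here.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of the incrby/decrby behaviour shown in the transcripts above. */
final class IntegerCommandModel {
    private final Map<String, String> store = new HashMap<>(); // values are kept as strings

    void set(String key, String value) {
        store.put(key, value);
    }

    /** Returns null when the key is missing, which plays the role of (nil). */
    String get(String key) {
        return store.get(key);
    }

    long incrBy(String key, long increment) {
        String current = store.getOrDefault(key, "0"); // a missing key counts as 0
        long value;
        try {
            value = Long.parseLong(current);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("value is not an integer or out of range");
        }
        long updated = Math.addExact(value, increment); // throws ArithmeticException on overflow
        store.put(key, Long.toString(updated));
        return updated;
    }

    long decrBy(String key, long decrement) {
        return incrBy(key, Math.negateExact(decrement));
    }

    public static void main(String[] args) {
        IntegerCommandModel redis = new IntegerCommandModel();
        System.out.println(redis.get("k2"));       // prints null (the key does not exist yet)
        System.out.println(redis.decrBy("k2", 3)); // prints -3
        System.out.println(redis.incrBy("k3", 5)); // prints 5
        System.out.println(redis.get("k3"));       // prints 5
    }
}
```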

#### 2) Code Operations

In this article, we will use the Java language to implement operations on Redis. First, we need to add a reference to the Jedis framework in our project. If it is a Maven project, we will add the following information to the pom.xml file:
    
    <dependency>
      <groupId>redis.clients</groupId>
      <artifactId>jedis</artifactId>
      <version>${version}</version>
    </dependency>
    
    

Jedis is the recommended Java client development package for Redis. It is used to implement fast and simple operations with Redis. After adding Jedis, let's write the specific operation code. The operation functions are similar to command calls, as shown in the following code:

    import redis.clients.jedis.Jedis;
    import java.util.List;
    
    public class StringExample {
        public static void main(String[] args) {
            Jedis jedis = new Jedis("127.0.0.1", 6379);
            // jedis.auth("xxx"); // Enter password, if there is no password, it can be left unset
            // Add an element
            jedis.set("mystr", "redis");

            // Get the element
            String myStr = jedis.get("mystr");
            System.out.println(myStr); // Output: redis

            // Add multiple elements (key, value, key2, value2)
            jedis.mset("db", "redis", "lang", "java");

            // Get multiple elements
            List<String> mlist = jedis.mget("db", "lang");
            System.out.println(mlist);  // Output: [redis, java]

            // Append a string to the element
            jedis.append("db", ",mysql");

            // Print the appended string
            System.out.println(jedis.get("db")); // Output: redis,mysql

            // Assign a value to the key when the key does not exist
            Long setnx = jedis.setnx("db", "db2");

            // Since the db element already exists, 0 modifications will be returned
            System.out.println(setnx); // Output: 0

            // String slicing
            String range = jedis.getrange("db", 0, 2);
            System.out.println(range); // Output: red

            // Add a key and set an expiration time (in seconds)
            String setex = jedis.setex("db", 1000, "redis");
            System.out.println(setex); // Output: OK

            // Query the expiration time of the key
            Long ttl = jedis.ttl("db");
            System.out.println(ttl); // Output: 1000
        }
    }
    
    

### 3 Code Implementation

In the first part of this article, we talked about many use cases for strings. In this section, we will use an example of storing user object information in a string in Redis, and then retrieve and deserialize the string into object information using Java.

First, add the JSON conversion class to handle serialization and deserialization between objects and strings. Here, we will use Google's Gson library to implement this. To begin, add the following dependency in the pom.xml file:

    <!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
    <dependency>
      <groupId>com.google.code.gson</groupId>
      <artifactId>gson</artifactId>
      <version>2.8.6</version>
    </dependency>
    
After adding the Gson dependency, we can write the specific business logic. First, we serialize the user object information into a string and store it in Redis:

    Jedis jedis = new Jedis("xxx.xxx.xxx.xxx", 6379);
    jedis.auth("xxx");
    Gson gson = new Gson();

    // Build user data
    User user = new User();
    user.setId(1);
    user.setName("Redis");
    user.setAge(10);

    String jsonUser = gson.toJson(user);

    // Print user information (json)
    System.out.println(jsonUser); // Output: {"id":1,"name":"Redis","age":10}

    // Store the string in Redis
    jedis.set("user", jsonUser);
    
When we need to use the user information, we can deserialize it from Redis. The code is as follows:

    String getUserData = jedis.get("user");
    User userData = gson.fromJson(getUserData, User.class);

    // Print object property information
    System.out.println(userData.getId() + ":" + userData.getName()); // Output: 1:Redis
    
The above two steps complete the process of storing user information in Redis, which is also one of the commonly used classic use cases.
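
The two snippets above rely on a `User` class that the article does not show. A minimal sketch follows, assuming plain `id`, `name`, and `age` fields that match the setters and getters used above; the field types are an assumption.

```java
/** Hypothetical User class assumed by the serialization example above. */
class User {
    private int id;
    private String name;
    private int age;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public static void main(String[] args) {
        User user = new User();
        user.setId(1);
        user.setName("Redis");
        user.setAge(10);
        System.out.println(user.getId() + ":" + user.getName()); // prints 1:Redis
    }
}
```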

### 4 Internal Implementation of Strings

#### 1) Source Code Analysis

Prior to Redis 3.2, the source code of SDS was as follows:

    struct sdshdr {
        int len; // Number of bytes in use
        int free; // Number of unused bytes
        char buf[]; // Storage space for the string data
    };
    
It can be seen that prior to Redis 3.2, SDS was an array of bytes with length information, and the storage structure is as shown in the following figure:

![String Storage Structure.png](../images/2020-02-28-031222.png)

To more effectively utilize memory, Redis 3.2 optimized the storage structure of SDS. The source code is as follows:

    typedef char *sds;
    
    struct __attribute__ ((__packed__)) sdshdr5 { // Corresponding to a string length less than 1<<5
        unsigned char flags;
        char buf[];
    };
    struct __attribute__ ((__packed__)) sdshdr8 { // for strings with length less than 1<<8
        uint8_t len; /* length used, stored in 1 byte */
        uint8_t alloc; /* total length */
        unsigned char flags; 
        char buf[]; // data space to store the string
    };
    struct __attribute__ ((__packed__)) sdshdr16 { // for strings with length less than 1<<16
        uint16_t len; /* length used, stored in 2 bytes */
        uint16_t alloc; 
        unsigned char flags; 
        char buf[];
    };
    struct __attribute__ ((__packed__)) sdshdr32 { // for strings with length less than 1<<32
        uint32_t len; /* length used, stored in 4 bytes */
        uint32_t alloc; 
        unsigned char flags; 
        char buf[];
    };
    struct __attribute__ ((__packed__)) sdshdr64 { // for strings with length less than 1<<64
        uint64_t len; /* length used, stored in 8 bytes */
        uint64_t alloc; 
        unsigned char flags; 
        char buf[];
    };
    
    

This allows different storage types to be allocated for strings of different lengths, effectively saving memory usage.
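
As a rough illustration of how a header is matched to a string length, the sketch below picks the smallest header according to the thresholds in the struct comments above and reports the header size implied by the packed fields. It is a simplified model written for this article, not the Redis source, and it ignores any special handling of `sdshdr5`; `SdsHeaderType` and its methods are hypothetical names.

```java
/** Simplified model of choosing an SDS header by string length. */
final class SdsHeaderType {

    /** Picks the smallest header whose length field can represent the given length. */
    static String headerFor(long length) {
        if (length < (1L << 5)) {
            return "sdshdr5";
        }
        if (length < (1L << 8)) {
            return "sdshdr8";
        }
        if (length < (1L << 16)) {
            return "sdshdr16";
        }
        if (length < (1L << 32)) {
            return "sdshdr32";
        }
        return "sdshdr64";
    }

    /** Header size in bytes: len + alloc + flags (sdshdr5 has only the flags byte). */
    static int headerBytes(String header) {
        switch (header) {
            case "sdshdr5":
                return 1;
            case "sdshdr8":
                return 1 + 1 + 1;
            case "sdshdr16":
                return 2 + 2 + 1;
            case "sdshdr32":
                return 4 + 4 + 1;
            case "sdshdr64":
                return 8 + 8 + 1;
            default:
                throw new IllegalArgumentException("unknown header: " + header);
        }
    }

    public static void main(String[] args) {
        long[] lengths = {10, 200, 70000};
        for (long length : lengths) {
            String header = headerFor(length);
            System.out.println(length + " -> " + header + " (" + headerBytes(header) + " header bytes)");
        }
        // prints:
        // 10 -> sdshdr5 (1 header bytes)
        // 200 -> sdshdr8 (3 header bytes)
        // 70000 -> sdshdr32 (9 header bytes)
    }
}
```

A 200-byte string therefore pays 3 bytes of header instead of the 8 bytes (two 4-byte ints) required by the older structure.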

#### 2) Data Types

We can use the `object encoding key` command to check the internal encoding of an object (key-value pair). When we run this command against string objects, we find that a string value can be stored with three different encodings, which this article calls data types: int, embstr, and raw.

##### ① int type
    
    
    127.0.0.1:6379> set key 666
    OK
    127.0.0.1:6379> object encoding key
    "int"
    
    

##### ② embstr type
    
    
    127.0.0.1:6379> set key abc
    OK
    127.0.0.1:6379> object encoding key
    "embstr"
    
    

##### ③ raw type
    
    
    127.0.0.1:6379> set key abcdefghigklmnopqrstyvwxyzabcdefghigklmnopqrs
    OK
    127.0.0.1:6379> object encoding key
    "raw"
    
    

The int type is easy to understand: it is used when the value is an integer. Ordinary short strings are stored as the embstr type, and when the length of the string is greater than 44 bytes, it is stored as the raw type.
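
The decision described above can be written down as a small model. This is a sketch of the rule as stated in this article, not the Redis source: `StringEncodingModel` is a hypothetical name, and treating only canonical 64-bit integers (no leading `+`, no leading zeros) as the int type is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;

/** Models the int / embstr / raw decision for a string value. */
final class StringEncodingModel {
    static final int EMBSTR_LIMIT_BYTES = 44;

    static String encodingOf(String value) {
        if (isCanonicalLong(value)) {
            return "int";
        }
        int bytes = value.getBytes(StandardCharsets.UTF_8).length;
        return bytes <= EMBSTR_LIMIT_BYTES ? "embstr" : "raw";
    }

    /** True only when the text is exactly the decimal form of a 64-bit signed integer. */
    private static boolean isCanonicalLong(String value) {
        try {
            return Long.toString(Long.parseLong(value)).equals(value);
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(encodingOf("666")); // prints int
        System.out.println(encodingOf("abc")); // prints embstr
        // The 45-byte value used in the raw example above
        System.out.println(encodingOf("abcdefghigklmnopqrstyvwxyzabcdefghigklmnopqrs")); // prints raw
    }
}
```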

#### 3) Why 44 bytes?

In Redis, if storing a string object would take more than 64 bytes of memory, Redis treats it as a large string and stores it as the raw type. When the whole object fits within 64 bytes, it is stored as the embstr type. If the criterion is 64 bytes, why is the boundary between the embstr type and the raw type 44 bytes?

This is because when Redis stores an object, it also stores associated metadata for it, such as the redisObject object header and the attributes of the SDS itself. This metadata occupies part of the 64 bytes, so the limit on the string content becomes 44 bytes instead of 64 bytes.

In Redis, all objects will include the redisObject object header. Let's take a look at the source code of redisObject:
    
    
    typedef struct redisObject {
        unsigned type:4; // 4 bits
        unsigned encoding:4; // 4 bits
        unsigned lru:LRU_BITS; // 3 bytes
        int refcount; // 4 bytes
        void *ptr; // 8 bytes
    } robj;
    
    

The parameter description of redisObject is as follows:

  * type: the data type of the object, such as string, list, hash, etc., occupies 4 bits, which is half a byte;
  * encoding: the encoding of the object data, occupies 4 bits;
  * lru: records the LRU (Least Recently Used) information of the object, which is used for memory reclamation. It occupies 24 bits (3 bytes);
  * refcount: the reference count, occupies 32 bits (4 bytes);
  * *ptr: the object pointer used to point to the specific content, occupies 64 bits (8 bytes).



The redisObject occupies a total of 0.5 bytes + 0.5 bytes + 3 bytes + 4 bytes + 8 bytes = 16 bytes.

After understanding redisObject, let's take a look at the data structure of the SDS itself. From the source code of SDS, we can see that there are a total of 5 storage types for SDS: SDS_TYPE_5, SDS_TYPE_8, SDS_TYPE_16, SDS_TYPE_32, and SDS_TYPE_64. Among these types, the smallest storage type is SDS_TYPE_5, but SDS_TYPE_5 will be automatically converted to SDS_TYPE_8, as the source code in the following figure shows:

![SDS-0116-1.png](../images/2020-02-28-031223.png)

Now let's look at the source code of SDS_TYPE_8:
    
    
    struct __attribute__ ((__packed__)) sdshdr8 {
        uint8_t len; // 1 byte
        uint8_t alloc; // 1 byte
        unsigned char flags; // 1 byte
        char buf[];
    };
    
    

It can be seen that, apart from the content array (buf), the other three attributes occupy 1 byte each. Starting from the 64-byte limit, subtract the 16 bytes of redisObject, the 3 bytes of the SDS header, and the 1 byte of the null terminator `\0`, and the result is 44 bytes (64 - 16 - 3 - 1 = 44). The memory usage is shown in the following diagram:

![44-byte explanation diagram.png](../images/2020-02-28-031224.png)
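
The arithmetic behind the 44-byte limit can be checked with a few constants. The sketch below simply restates the sizes given in this section (64-bit pointers and the `sdshdr8` header are the assumptions already made above); `EmbstrLimit` is a hypothetical name.

```java
/** Recomputes the sizes discussed in this section. */
final class EmbstrLimit {
    static final int TOTAL_BYTES = 64;          // the 64-byte limit cited above
    static final int TYPE_BITS = 4;
    static final int ENCODING_BITS = 4;
    static final int LRU_BITS = 24;
    static final int REFCOUNT_BYTES = 4;        // int refcount
    static final int POINTER_BYTES = 8;         // void *ptr on a 64-bit system
    static final int SDSHDR8_BYTES = 1 + 1 + 1; // len + alloc + flags
    static final int NULL_TERMINATOR_BYTES = 1; // trailing '\0'

    /** Size of the redisObject header in bytes. */
    static int redisObjectBytes() {
        return (TYPE_BITS + ENCODING_BITS + LRU_BITS) / 8 + REFCOUNT_BYTES + POINTER_BYTES;
    }

    /** Bytes left for the string content once all overhead is subtracted. */
    static int maxEmbstrLength() {
        return TOTAL_BYTES - redisObjectBytes() - SDSHDR8_BYTES - NULL_TERMINATOR_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(redisObjectBytes()); // prints 16
        System.out.println(maxEmbstrLength());  // prints 44
    }
}
```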

### Summary

This article introduces the definition and usage of strings, which can be used for single key-value operations, multi-key-value operations, number counting, key-value expiration operations, and advanced string operations. It also introduces the three scenarios where strings can be used: page data caching, for caching article details, etc.; number calculation and statistics, such as calculating the number of page visits; and session sharing, for recording administrator login information, etc. We also delve into the five data storage structures of strings, as well as the three internal data types of strings, as shown in the following diagram:

![String summary diagram.png](../images/2020-02-28-031225.png)

In addition, we also learned why the boundary between the embstr type and the raw type is 44 bytes: each Redis object includes a redisObject object header, and the SDS header and terminator also take up some space, which leaves 44 bytes for the string content.