16 How Does the Commonly Used Caching Component Redis Operate

Hello, I am your caching course teacher, Chen Bo. Welcome to the 16th lesson on “Redis Basic Principles”.

Redis Basic Principles #
Introduction to Redis #

Redis is a key-value storage component with log-based persistence, written in ANSI C and released under the BSD license. All of its data is held in memory, and it can be used as a cache, a database, or a message broker.

Redis stands for Remote Dictionary Server. A Redis instance can hold multiple dictionaries (DBs) for storing data, and a client uses the “select” command to choose which DB to work with.

Redis Features #

Both Memcached and Redis are key-value storage components, but Memcached supports only opaque binary byte blocks as values, whereas Redis offers a much richer set of data types: 8 core types, each with its own family of commands. Redis is also fast: a single thread can reach a QPS (queries per second) of 100,000 to 110,000 in stress tests.

Although all read and write operations in Redis are performed in memory, data can also be persisted to disk. Redis provides two types of persistence:

  • Snapshot: Writes all data at a certain point in time to the RDB (Redis Database) file on disk.
  • Append-only file (AOF): Appends all write commands to the AOF file on disk.

For online Redis instances, it is common to use both methods. Enabling the “appendonly” configuration option causes write commands to be appended to the AOF file as they arrive. In addition, during low-traffic periods (for example, at night), the “bgsave” command can be issued to save a snapshot of all in-memory data.

In Internet systems, reads usually far outnumber writes. In a microblogging system, for example, read requests account for about 90% of total traffic, and that volume can easily exceed what a single Redis instance can serve. In such cases, Redis's replication feature can be used: one Redis instance acts as the master, and multiple synchronized replicas, called slaves, are attached to it. With reads and writes separated, all write operations go to the master while read operations are spread randomly across the slaves, which greatly increases the read capacity of the deployment.
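The read/write separation described above can be sketched on the client side. This is only a minimal illustration: `Node` is a hypothetical stand-in for a real Redis connection, and `ReadWriteRouter` is a name invented for this example, not part of any Redis client library.

```python
import random


class Node:
    """Hypothetical stand-in for a connection to one Redis instance."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reads = 0
        self.writes = 0

    def set(self, key, value):
        self.writes += 1
        self.data[key] = value

    def get(self, key):
        self.reads += 1
        return self.data.get(key)


class ReadWriteRouter:
    """Send every write to the master and every read to a random slave."""

    def __init__(self, master, slaves, rng=None):
        self.master = master
        self.slaves = list(slaves)
        self.rng = rng or random.Random()

    def set(self, key, value):
        self.master.set(key, value)

    def get(self, key):
        return self.rng.choice(self.slaves).get(key)
```

Because the slaves are synchronized from the master asynchronously, a read routed this way may briefly return an older value than the one just written; callers that cannot tolerate that should read from the master.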

Lua is an efficient, concise, and easily extensible scripting language that can be embedded in other languages. Redis has supported Lua since version 2.6. By letting clients submit custom Lua scripts that run on the server, Redis can reduce network overhead, improve processing performance, and execute the multiple operations inside a script as one atomic operation.

Redis also supports transactions. After issuing the “multi” command, a client queues multiple operations and then runs them all at once with the “exec” command. If a command is rejected while it is being queued (for example, because of a syntax error), “exec” aborts and none of the commands run. Otherwise, all queued operations are executed in order, and no commands from other clients are interleaved with them. Note that Redis does not roll back: if one command fails during execution, the remaining commands still run.

Redis also has the Cluster feature, which distributes keys across nodes by hashing; the distribution can be set up automatically or manually. When capacity is insufficient, the Redis migration commands can be used to move some of the keys to other nodes.

[Figure: mind map of Redis features]

This mind map gives a preliminary overview of Redis's features. In the following lessons, I will explain each feature in detail.

As a caching component, Redis's greatest advantage is its support for a variety of data types. Redis currently supports 8 core data types: string, list, set, sorted set, hash, bitmap, geo, and hyperloglog.

All of Redis's in-memory data is stored in a global dictionary (dict), which is similar to Memcached's hashtable. A Redis dict contains two hash tables. New keys are normally inserted into table 0. As keys are inserted and deleted, the table is resized when the number of keys in table 0 exceeds the number of buckets, or falls below 1/10 of the number of buckets; the second table is what makes this resize possible. Hash collisions are resolved the same way as in Memcached: each bucket holds a singly linked list of the key/value entries whose keys hash to that bucket.
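To make the resize rule concrete, here is a toy model of such a dict. It is a sketch under simplifying assumptions, not Redis's actual code: `ToyDict` and its methods are names invented for this example, each bucket is a Python list standing in for the singly linked list, and the resize moves every entry in one step, whereas Redis migrates entries from table 0 to table 1 incrementally.

```python
class ToyDict:
    """Chained hash table with two tables: ht[0] in use, ht[1] for resizing."""

    MIN_SIZE = 4

    def __init__(self):
        self.ht = [self._new_table(self.MIN_SIZE), None]
        self.used = 0  # number of keys stored

    @staticmethod
    def _new_table(size):
        return [[] for _ in range(size)]  # one chain per bucket

    def _resize(self, new_size):
        # Build table 1 at the new size, move all entries, then promote it.
        self.ht[1] = self._new_table(new_size)
        for chain in self.ht[0]:
            for key, value in chain:
                self.ht[1][hash(key) % new_size].append((key, value))
        self.ht = [self.ht[1], None]

    def set(self, key, value):
        table = self.ht[0]
        chain = table[hash(key) % len(table)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))
        self.used += 1
        if self.used > len(table):  # more keys than buckets: grow
            self._resize(len(table) * 2)

    def get(self, key, default=None):
        table = self.ht[0]
        for k, v in table[hash(key) % len(table)]:
            if k == key:
                return v
        return default

    def delete(self, key):
        table = self.ht[0]
        chain = table[hash(key) % len(table)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.used -= 1
                # Fewer keys than 1/10 of the buckets: shrink.
                if len(table) > self.MIN_SIZE and self.used * 10 < len(table):
                    self._resize(max(self.MIN_SIZE, len(table) // 2))
                return True
        return False
```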

High Performance of Redis #

Redis is generally regarded as a single-process, single-threaded component, because network IO and command processing are both handled by one thread in the core process. It is built on an event-driven model (epoll on Linux), which gives it non-blocking network IO. Because commands are processed by a single thread, there is no contention during processing, so no locking is needed and no context-switch overhead is incurred. Combined with the fact that all data operations happen in memory, this gives Redis high performance: a single instance can exceed 100,000 QPS. Besides network IO and command processing, the core thread also writes each write operation into buffers so that the latest writes can be propagated to the AOF and to the slaves.
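The single-threaded event model can be illustrated with a few lines of Python. This is a rough sketch, not how Redis itself is written: `handle_command` and `run_once` are hypothetical names, the text protocol is invented for the example, and each `recv` is assumed to deliver complete lines. Python's `selectors.DefaultSelector` picks epoll on Linux.

```python
import selectors
import socket


def handle_command(store, line):
    """Execute one text command against the in-memory store."""
    parts = line.split()
    if len(parts) == 3 and parts[0].upper() == "SET":
        store[parts[1]] = parts[2]
        return "OK"
    if len(parts) == 2 and parts[0].upper() == "GET":
        return store.get(parts[1], "(nil)")
    return "ERR unknown command"


def run_once(sel, store, timeout=0.1):
    """One loop iteration: serve every connection that is ready to read.

    All commands run on the calling thread, one after another, so the
    store needs no lock.
    """
    served = 0
    for ready, _ in sel.select(timeout):
        conn = ready.fileobj
        data = conn.recv(4096)
        if not data:  # peer closed the connection
            sel.unregister(conn)
            conn.close()
            continue
        for line in data.decode().splitlines():
            conn.sendall((handle_command(store, line) + "\n").encode())
            served += 1
    return served
```

A server would register its client sockets with `sel.register(conn, selectors.EVENT_READ)` and call `run_once` in a loop; many clients are served, but only one command is ever executing at a time.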

In addition to the main process, Redis forks child processes to handle heavy tasks. This happens in three main scenarios.

  • When receiving the bgrewriteaof command, Redis calls fork to create a child process that writes all the commands needed to rebuild the current database state to a temporary AOF file. When the writing is completed, the child process notifies the parent process, which appends the write operations that arrived in the meantime to the temporary AOF file. The parent process then renames the temporary file so that it replaces the old AOF file.
  • When receiving the bgsave command, Redis creates a child process that persists all the data in memory by taking a snapshot and writes it to an RDB file.
  • When performing full replication, the master also starts a child process that saves the database snapshot to an RDB file. After writing the RDB snapshot file, the master sends it to the slave and synchronizes the subsequent new write commands to the slave.
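Why a forked child can write a consistent snapshot while the parent keeps serving writes can be shown with a small experiment. This is a sketch for POSIX systems only (it relies on `os.fork`); the `bgsave` function here is a hypothetical stand-in that dumps JSON, not the RDB format.

```python
import json
import os


def bgsave(data, path):
    """Fork a child that writes `data` as it looked at fork time.

    Returns the child's pid. The child sees a private copy of the
    parent's memory, so later changes in the parent do not reach it.
    """
    pid = os.fork()
    if pid == 0:  # child process
        try:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(data, f)
            os.replace(tmp, path)  # publish the finished file atomically
            os._exit(0)
        except BaseException:
            os._exit(1)
    return pid  # parent process continues immediately
```

The parent returns at once and can go on modifying `data`; after `os.waitpid(pid, 0)` the file holds the state from the moment of the fork, not the later state.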

In the main process, besides the main thread that handles network IO and commands, there are three auxiliary BIO (background IO) threads. Each has its own task queue, and the three threads are responsible for closing files, flushing AOF buffer data to disk, and freeing objects, respectively.

These three BIO threads are started when Redis starts, and then sleep while waiting for tasks. When a background task of one of these types needs to run, a bio_job structure is built to record the task parameters and is appended to the tail of the corresponding task queue. The BIO thread for that queue is then woken up to perform the task.
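The queue-and-wake-up pattern just described looks roughly like this in Python. It is a sketch only: `BackgroundIO`, `submit`, and `wait_done` are names invented for the example, and a tuple of arguments plays the role of the bio_job structure.

```python
import threading
from collections import deque


class BackgroundIO:
    """One worker thread and one task queue per job type."""

    def __init__(self, handlers):
        # handlers maps a job type name to the function that performs it.
        self.queues = {t: deque() for t in handlers}
        self.conds = {t: threading.Condition() for t in handlers}
        self.done = {t: 0 for t in handlers}
        for job_type, handler in handlers.items():
            worker = threading.Thread(
                target=self._worker, args=(job_type, handler), daemon=True
            )
            worker.start()

    def submit(self, job_type, *args):
        cond = self.conds[job_type]
        with cond:
            self.queues[job_type].append(args)  # append to the tail
            cond.notify_all()  # wake the sleeping worker

    def _worker(self, job_type, handler):
        cond = self.conds[job_type]
        while True:
            with cond:
                while not self.queues[job_type]:
                    cond.wait()  # sleep until a job arrives
                args = self.queues[job_type].popleft()
            handler(*args)  # run the job outside the lock
            with cond:
                self.done[job_type] += 1
                cond.notify_all()

    def wait_done(self, job_type, count, timeout=2.0):
        """Block until `count` jobs of this type have finished."""
        cond = self.conds[job_type]
        with cond:
            return cond.wait_for(lambda: self.done[job_type] >= count, timeout)
```

The point of the design is that slow operations run off the main thread, so the thread that serves commands only pays for an append and a wake-up.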

Redis Persistence #

Redis persistence is achieved through RDB and AOF files. An RDB file records a snapshot of the data at a specific moment: once the in-memory data has been written to disk, later changes are not reflected in that file. Redis can build an RDB snapshot automatically when the number of modified keys within a specified time exceeds a threshold, but in production it is generally recommended to take regular RDB snapshots during periods of low business activity instead.

AOF, on the other hand, records the commands that build the entire database content, appending each new write operation as it happens. Because of this continuous appending, the AOF file records a large number of intermediate states and can become very large. When that happens, the bgrewriteaof command can be used to rewrite the AOF file so that it keeps only the latest content, which greatly reduces its size.
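The difference between a growing command log and its rewritten form can be shown with a toy model. This is a sketch under strong simplifications: commands are Python tuples rather than the real AOF encoding, only SET and DEL are modeled, and `replay` and `rewrite` are hypothetical helper names.

```python
def replay(log):
    """Rebuild the database by executing every logged command in order."""
    db = {}
    for cmd, key, *rest in log:
        if cmd == "SET":
            db[key] = rest[0]
        elif cmd == "DEL":
            db.pop(key, None)
    return db


def rewrite(log):
    """Return a minimal log that rebuilds the same final state."""
    return [("SET", key, value) for key, value in replay(log).items()]


aof = [
    ("SET", "a", "1"),
    ("SET", "a", "2"),  # overwrites the previous value of a
    ("SET", "b", "1"),
    ("DEL", "b"),       # b no longer exists at the end
]
compact = rewrite(aof)
```

Here `compact` holds a single command, `("SET", "a", "2")`, yet replaying it produces the same database as replaying all four original commands.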

To improve scalability and increase read capacity, Redis supports master-slave replication. After a Redis slave is deployed and configured, it connects to the master and performs a full synchronization.

A full synchronization is needed when a slave connects for the first time, or after a long disconnection during which the commands it missed exceed the size of the master's replication buffer. During full synchronization, the master starts a child process to save a database snapshot to a file, sends this snapshot file to the slave, and then sends the slave the write commands that arrived after the snapshot.

After full synchronization is completed, a slave that is briefly disconnected and then reconnects can resume with incremental replication: as long as the write commands it missed still fit within the master's replication buffer, the master sends only the missing content.
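The choice between full and incremental synchronization comes down to an offset check against a bounded buffer. The following is a simplified model, not the real replication protocol: `ReplicationBuffer` and its `sync` method are hypothetical names, and offsets simply count bytes of the write stream.

```python
class ReplicationBuffer:
    """Keeps only the most recent `capacity` bytes of the write stream."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = bytearray()
        self.end_offset = 0  # total bytes ever written by the master

    def feed(self, data):
        """Record newly executed write commands."""
        self.buf += data
        self.end_offset += len(data)
        overflow = len(self.buf) - self.capacity
        if overflow > 0:
            del self.buf[:overflow]  # the oldest bytes fall out

    def sync(self, slave_offset):
        """Decide how to bring a slave that has data up to slave_offset up to date."""
        start = self.end_offset - len(self.buf)
        if start <= slave_offset <= self.end_offset:
            # Everything the slave missed is still buffered.
            return "incremental", bytes(self.buf[slave_offset - start:])
        # First connection (offset -1 here) or too far behind.
        return "full", None
```

A larger buffer lets slaves survive longer disconnections without falling back to a full synchronization, at the cost of more memory on the master.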

A Redis master can have multiple slaves attached, and a slave can in turn have slaves of its own, which effectively reduces the replication workload on the master. Moreover, if the master fails, a slave can stop synchronizing with it by running the slaveof no one command and become the new master.

Redis Cluster Management #

Redis cluster management can be done in three ways.

  • Client sharding: the client hashes the key and distributes reads and writes across different Redis instances using modulo or consistent hashing.
  • A proxy in front of Redis: the proxy handles the routing policy and tracks the state of the backend Redis instances. The client talks only to the proxy, so when the backend Redis instances change, only the proxy configuration needs to be modified.
  • Redis Cluster used directly: when the cluster is first created, slots are assigned directly to the Redis nodes. To access data later, the client hashes the key to find its slot and then accesses the Redis instance that holds that slot. When scaling up or down, the cluster setslot and migrate commands are used to transfer all the keys in a slot to a target node.
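The three routing calculations mentioned in the list can be sketched as follows. The helper names (`shard_by_modulo`, `HashRing`, `key_slot`) and the choice of MD5 for client sharding are assumptions made for this example. For Redis Cluster, the slot is the CRC16 of the key modulo 16384; `binascii.crc_hqx` with an initial value of 0 computes that CRC16 variant. Hash tags are left out of this sketch.

```python
import binascii
import bisect
import hashlib


def _hash64(text):
    """Stable 64-bit hash of a string (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")


def shard_by_modulo(key, nodes):
    """Client sharding with modulo: simple, but resizing remaps most keys."""
    return nodes[_hash64(key) % len(nodes)]


class HashRing:
    """Client sharding with consistent hashing and virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (_hash64(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def node_for(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        i = bisect.bisect(self.points, _hash64(key)) % len(self.ring)
        return self.ring[i][1]


def key_slot(key):
    """Redis Cluster slot for a key given as bytes."""
    return binascii.crc_hqx(key, 0) % 16384
```

With consistent hashing, removing a node only moves the keys that lived on that node, which is why it is usually preferred over modulo when the set of instances changes.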

With this, the basic principles of Redis have been explained. I believe you now have a general understanding of Redis. Next, I will delve into the various technical details of Redis one by one.