29 What Is Still Missing When Redis Faces Billion-level QPS Access #

When Redis runs in a production environment, it must handle massive data volumes and highly concurrent access, which calls for targeted expansion and optimization. In this lesson, I will draw on the problems Weibo encountered with Redis to analyze how to expand and transform Redis in production to cope with billion-level QPS access.

Function Expansion #

For online businesses with high traffic, the memory usage of a single Redis instance can easily reach several gigabytes, and the corresponding AOF can occupy tens of gigabytes of disk space. Even if we trigger an AOF rewrite with bgrewriteaof during off-peak hours to reduce data redundancy, the AOF file will still exceed 10GB given the volume of business data and write operations.

In such cases, when Redis needs to be upgraded or bugs need to be fixed, restarting the service directly will take nearly 10 minutes due to data recovery, which greatly affects the system’s availability. To address this issue, Redis can be extended with a hot upgrade feature, allowing upgrades to be completed in milliseconds without affecting business access at all.


The hot upgrade solution works as follows. First, build a Redis shell program that holds all the properties of redisServer (including redisDb, clients, and so on) as global variables. Then, encapsulate all of Redis's processing logic in a dynamic-link library (a .so file). On its first start, Redis loads and restores data from disk as usual; for subsequent upgrades, the shell program reloads the new Redis .so file via a command, completing the version upgrade in milliseconds. Throughout the process all client connections are retained, and once the upgrade succeeds the original clients can continue reading and writing. The whole procedure is completely transparent to the business.
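
To make the mechanism concrete, here is a minimal sketch of such a shell program. The symbol and type names (engine_handle_command, redisServerState) are hypothetical, since the actual Weibo implementation is not public:

```c
/* Hypothetical sketch of the hot-upgrade shell: global server state lives in
 * the shell, while all request-handling logic is loaded from a .so that can
 * be swapped at runtime. Symbol names here are illustrative. */
#include <dlfcn.h>
#include <stdio.h>

typedef struct redisServerState redisServerState; /* holds redisDb, clients, ... */

static redisServerState *g_server;   /* survives upgrades: owned by the shell */
static void *g_handle;               /* handle of the currently loaded .so */
static int (*g_handle_command)(redisServerState *, void *client);

/* Load (or reload) the engine .so; client connections in g_server are untouched. */
int load_engine(const char *so_path) {
    void *h = dlopen(so_path, RTLD_NOW);
    if (!h) { fprintf(stderr, "dlopen: %s\n", dlerror()); return -1; }
    int (*fn)(redisServerState *, void *) =
        (int (*)(redisServerState *, void *))dlsym(h, "engine_handle_command");
    if (!fn) { dlclose(h); return -1; }
    if (g_handle) dlclose(g_handle);  /* drop the old version */
    g_handle = h;
    g_handle_command = fn;            /* new code, same connections and data */
    return 0;
}
```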

In addition, when using Redis we often encounter special business scenarios that the built-in Redis data structures cannot satisfy well. In such cases, Redis can be custom-extended: based on the characteristics of the business data, one can add new data structures or even new Redis storage models to improve memory efficiency and processing performance.

In Weibo, there is a business type called the "follow list". A follow list stores the uids of all the users that a given user follows. It is used to verify follow relationships, and also to fetch the feeds of the users being followed. Because the user base is huge, the Redis that stores follow lists serves as a cache, so inactive follow lists are quickly evicted from Redis; when a user's follow list is needed again, it is reloaded from the database and written back to Redis.

The elements of a follow list are all long integers, and the list was originally stored as a set, written back with batched sadd commands. For follow lists with thousands or tens of thousands of entries, however, even batched sadd calls take considerable time, making write-back inefficient and stalling Redis. Storing a follow list as a set is also memory-inefficient.

As a result, we extended Redis with a "longset" data structure. A longset is essentially a one-dimensional open-addressing array of long integers, addressed with double hashing.

img

Before loading the user’s follow list from the database and preparing to write it back to Redis, the client first converts the list of followed uids into the binary format of a long array, and then writes it to Redis using the extended “lsset” command. When Redis receives the command, it directly stores the binary format long array sent by the client as the value.
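
As an illustration, the client-side write-back might look like the following hiredis-based sketch; lsset is the extended (non-standard) command named above, and the error handling shown is an assumption:

```c
#include <stdint.h>
#include <stddef.h>
#include <hiredis/hiredis.h>

/* Write a follow list back as one binary long-array value via the extended
 * lsset command, replacing thousands of individual sadd round trips. */
int write_back_follow_list(redisContext *c, const char *key,
                           const int64_t *uids, size_t n) {
    /* %b sends a binary-safe argument as a pointer + length pair. */
    redisReply *r = redisCommand(c, "LSSET %s %b",
                                 key, (const char *)uids, n * sizeof(int64_t));
    if (!r) return -1;                        /* connection error */
    int ok = (r->type != REDIS_REPLY_ERROR);  /* e.g. a "longset full" error */
    freeReplyObject(r);
    return ok ? 0 : -1;
}
```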

The long array in the longset is addressed with double hashing. Two hash values, h1 and h2, are computed for each long value, and its position is probed as (h1 + n*h2) % len, where len is the array length and n starts at 0. On a hash collision, that is, when the computed slot already holds another element, n is incremented by 1 and the probe repeats, for at most len attempts.

During the process of continuously adding long value elements to the longset data structure, when the fill ratio of the array exceeds a certain threshold, Redis will return an exception indicating that the longset is full. In this case, the client will construct a one-dimensional long array with double the capacity based on the latest full data, and write it back to Redis using lsset again.
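
Put together, the insert path might look like the sketch below. The hash functions, the empty-slot sentinel, and the 75% fill threshold are all illustrative assumptions, not the actual Weibo values:

```c
#include <stdint.h>

#define LS_EMPTY 0  /* assumes 0 is never a valid uid; illustrative only */

typedef struct {
    int64_t *slots;   /* one-dimensional open-addressing array */
    uint32_t len;     /* capacity */
    uint32_t used;    /* filled slots */
} longset;

/* Two illustrative hash functions; h2 is forced odd so that, with a
 * power-of-two len, the probe sequence visits every slot. */
static uint32_t h1(int64_t v) { return (uint32_t)((uint64_t)v * 2654435761u); }
static uint32_t h2(int64_t v) { return ((uint32_t)((uint64_t)v >> 32) * 40503u) | 1u; }

/* Insert with double hashing: probe (h1 + n*h2) % len for n = 0,1,2,...
 * Returns -1 when the fill ratio exceeds the threshold, signaling the client
 * to rebuild a double-capacity array and lsset it back. */
int longset_add(longset *ls, int64_t v) {
    if (ls->used * 4 >= ls->len * 3)           /* e.g. 75% fill threshold */
        return -1;                             /* "longset is full" */
    uint32_t a = h1(v), b = h2(v);
    for (uint32_t n = 0; n < ls->len; n++) {
        uint32_t pos = (a + n * b) % ls->len;
        if (ls->slots[pos] == v) return 0;     /* already present: set semantics */
        if (ls->slots[pos] == LS_EMPTY) {
            ls->slots[pos] = v;
            ls->used++;
            return 0;
        }
    }
    return -1;
}
```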


Extension of Counter Service #

Mobile social platforms have a huge user base, with users following and subscribing to one another. Users continuously share status updates, which other users then read, comment on, share, and like. At the user level there are follower counts, fan counts, and counts of various status behaviors; for each feed or status a user posts, there are counts of views, comments, reposts, and reactions. So on one hand there is a huge number of keys to count, and on the other hand a single key may carry multiple counts. In daily operation, a single query not only touches a large number of keys but also needs to retrieve several counts for each key.

Taking Weibo as an example, the historical counts number in the billions, and hundreds of millions of new feed records arrive daily, each generating 4 to 8 counts. If Redis were used for counting with only a single replica, the historical data alone would occupy 5 to 6 terabytes of memory, growing by more than 50 gigabytes per day. With multiple IDCs, each deploying one master and several slaves, memory usage would increase by an order of magnitude. Since all Weibo count keys are long integer values that increase over time, we transformed Redis's storage structure.

First, we built a cdb segmented-storage counter, which stores counts in a pre-allocated in-memory array called the Table and resolves collisions with double hashing, avoiding the heavy pointer overhead of the stock Redis implementation. Second, we support multiple columns through a Schema strategy, so the several counts belonging to one key ID are treated as a single count record; count columns can be added or removed dynamically, and the memory footprint of each column can be trimmed down to the bit level. Finally, because feed counts show a clear hot/cold split, we implemented a storage scheme that keeps recent hot data in memory and moves older cold data to disk, reducing machine costs.
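
The record layout might resemble the following sketch; the column names and bit widths are invented for illustration, since real schemas are tuned per business:

```c
#include <stdint.h>

/* Illustrative schema: one record per key ID packs several counts into
 * fixed-width columns, e.g. 32-bit views, 16-bit comments, 16-bit reposts. */
typedef struct {
    int64_t  key;        /* feed/user ID; all count keys are long integers */
    uint32_t views;
    uint16_t comments;
    uint16_t reposts;
} counter_record;        /* 16 bytes per key, vs. hundreds of bytes of
                            pointer overhead for several separate Redis keys */

typedef struct {
    counter_record *records;  /* pre-allocated Table segment */
    uint32_t len;
} counter_table;

/* Locate a record by double hashing, mirroring the longset addressing.
 * Assumes key 0 marks an empty slot. */
counter_record *counter_find(counter_table *t, int64_t key) {
    uint32_t a = (uint32_t)((uint64_t)key * 2654435761u);
    uint32_t b = ((uint32_t)((uint64_t)key >> 32) * 40503u) | 1u;
    for (uint32_t n = 0; n < t->len; n++) {
        counter_record *r = &t->records[(a + n * b) % t->len];
        if (r->key == key) return r;        /* hit: all columns in one record */
        if (r->key == 0) return NULL;       /* empty slot: key absent */
    }
    return NULL;
}
```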

In the upcoming case-analysis lesson, I will discuss this transformation plan further.


Fully Incremental Replication #

When running Redis in production, whether under the original sync mechanism or the later psync and psync2 mechanisms, master-slave replication is constrained by the replication backlog buffer. If a slave's replication connection stays broken for a long time, or if the master writes a large volume of data while the slave's replication lag is high, the slave's replication offset falls outside the master's replication backlog buffer, forcing a full resynchronization.

To avoid this, Weibo combined Redis's RDB and AOF strategies to build a fully incremental replication solution.


In the fully incremental solution there is no longer a single AOF file but multiple AOF files with incrementing suffix IDs, such as aof.00001, aof.00002, and so on. When an AOF file exceeds a size threshold, the next file, with its ID incremented by 1, is created to store the latest write commands in a rolling manner. When an RDB is built via bgsave, the RDB file records not only the current in-memory data snapshot but also the RDB build time and the ID and position of the corresponding AOF file. Thus the RDB file, plus the write commands after the AOF position it records, together form a complete record of the latest data.
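
The metadata the RDB carries might be modeled like the following; the field names are illustrative, as the source only states that the RDB records its build time plus the ID and offset of the AOF file it is consistent with:

```c
#include <stdint.h>

/* Hypothetical anchor written into the RDB under the rolling-AOF scheme. */
typedef struct {
    int64_t  rdb_build_time;  /* when bgsave produced this snapshot */
    uint32_t aof_id;          /* e.g. 2 -> aof.00002 */
    uint64_t aof_offset;      /* byte position within that AOF file */
} rdb_aof_anchor;

/* The snapshot plus everything recorded in aof.<id> after <offset> (and in
 * all later AOF files) reconstructs the latest data, so replication never
 * needs to rebuild an RDB. */
```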

During master-slave replication, the master synchronizes data to slaves through independent replication threads, one per slave. The first replication is a full replication; after that, as long as the relevant AOF files have not been deleted, every replication is incremental, no matter how long a slave's replication connection has been broken.

During the first full replication, the replication thread first sends the RDB file to the slave, and then sends all the data after the AOF file position recorded in the RDB to the slave, thus completing the replication. The entire process does not require re-construction of the RDB.


During subsequent synchronization, the slave first sends the ID and position of the AOF file it last replicated. Based on this, the master's replication thread reads all the content after that position in the corresponding AOF file and sends it to the slave, completing the data synchronization.
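
A simplified sketch of that master-side path follows; the file naming, buffer size, and catch-up condition are assumptions layered on the scheme described above:

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Does aof.<id> exist on disk? */
static int aof_exists(uint32_t aof_id) {
    char path[32];
    snprintf(path, sizeof(path), "aof.%05u", aof_id);
    return access(path, R_OK) == 0;
}

/* Serve one incremental sync: stream every write command recorded after
 * (aof_id, offset) to the slave's socket, rolling across AOF files. */
int serve_incremental_sync(uint32_t aof_id, uint64_t offset, int slave_fd) {
    char path[32], buf[65536];
    for (;;) {
        snprintf(path, sizeof(path), "aof.%05u", aof_id);
        FILE *f = fopen(path, "rb");
        if (!f) return -1;              /* AOF already deleted: full sync needed */
        fseek(f, (long)offset, SEEK_SET);
        size_t n;
        while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
            write(slave_fd, buf, n);    /* real code would handle short writes */
        fclose(f);
        if (!aof_exists(aof_id + 1)) break;  /* caught up to the live file */
        aof_id++;
        offset = 0;                     /* next file is read from the start */
    }
    return 0;
}
```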

Since the whole replication process runs in independent replication threads on the master, it does not interfere with normal user requests. To further reduce replication pressure on the master, the fully incremental solution also supports slave chaining: additional slaves can be mounted under an existing slave, spreading the replication load across different Redis instances.

Cluster Management #


As mentioned earlier, Redis Cluster tightly couples data storage with cluster logic, which makes the code complex and error-prone. The slot-to-key mapping consumes extra memory, which especially hurts businesses with small values. Migration efficiency is low, and migrating large values can easily cause blocking. Cluster replication only supports slaves hanging directly under masters, so it cannot serve scenarios that need many slaves to sustain a high read TPS. Finally, Redis by itself is only a storage component: cluster management, routine maintenance, status monitoring, and alerting during online operation are either unsupported or inconvenient.

Therefore, we also built a cluster storage system around Redis. First, we split the cluster functionality out of Redis into an independent system, so that Redis focuses solely on storage and no longer maintains slot-related information. A newly built clusterManager component takes over slot maintenance, data migration, and service-state management.

The Redis cluster can be accessed through a proxy or a smart client. Businesses that are particularly performance-sensitive can use a smart client to avoid the extra hop, while ordinary businesses access Redis through the proxy.
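
For example, smart-client routing might look like the following sketch, where the slot count, hash function, and route-table layout are all assumptions rather than the actual Weibo design:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative smart-client routing: the slot map lives outside Redis, in a
 * table fetched from the configuration center, so Redis itself stores no
 * slot metadata. */
#define SLOT_COUNT 16384

typedef struct { char host[64]; int port; } node_addr;

typedef struct {
    node_addr slot_owner[SLOT_COUNT];  /* refreshed on config-center updates */
} cluster_route_table;

static uint32_t key_hash(const char *key, size_t len) {
    uint32_t h = 2166136261u;          /* FNV-1a */
    for (size_t i = 0; i < len; i++) { h ^= (uint8_t)key[i]; h *= 16777619u; }
    return h;
}

/* Pick the node that owns this key; routing in the client avoids the proxy hop. */
const node_addr *route(const cluster_route_table *t, const char *key) {
    return &t->slot_owner[key_hash(key, strlen(key)) % SLOT_COUNT];
}
```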

Both the deployment of business resources and proxy access are obtained and coordinated through the configuration center. The clusterManager registers business resource deployments with the configuration center and continuously probes service status, handling failover, master-slave switching, and slave promotion or demotion based on what it observes. The proxy and smart clients fetch configuration from the configuration center and subscribe to ongoing service-status changes.