
34 How to Design a Caching System for High-Volume Counting Scenarios #

In the previous lesson, we discussed how to design a caching system for a spike system. In this lesson, we will specifically discuss how to design a caching service for massive counting scenarios.

Conventional Counting Scheme #


Counting services are very common in internet systems: counts of a user's followers, posts, comments, and so on all need to be stored. The storage format for counts is also very simple: the key is generally the user uid or post id plus a suffix identifying the count type, and the value is an 8-byte long integer.

The most common counting scheme is cache + DB storage. When a count changes, the count in the DB is updated first, incrementing it by 1, and then the copy in the cache, such as Memcached or Redis, is modified. This scheme is general and mature, but it scales poorly under high-concurrency access. In internet social systems, some counts change particularly frequently, such as the read counts of microblog feeds: every access is a count change, so update rates range from tens of thousands to millions per second. Stored in a DB, this would impose enormous pressure and make the DB the bottleneck of the entire counting service. Even an aggregate-and-delayed-update scheme does not help much, because the total number of counters is huge and requests from the many different business ends are spread evenly across them, so there is little per-key batching to exploit and the tremendous write pressure on the DB remains unbearable. This scheme is therefore only suitable for small to medium-scale counting services.
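As a concrete illustration, here is a minimal Go sketch of the cache + DB scheme above, assuming go-redis and MySQL via database/sql as the clients; the table, column, and key names are hypothetical:

```go
package main

import (
	"context"
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql" // assumed DB driver
	"github.com/redis/go-redis/v9"
)

// incrRepostCount sketches one count update in the cache + DB scheme:
// the DB copy is updated first, then the cached copy is adjusted.
func incrRepostCount(ctx context.Context, db *sql.DB, rdb *redis.Client, feedID int64) error {
	// Every increment is a DB write, which is why the DB becomes the
	// bottleneck once updates reach tens of thousands per second.
	if _, err := db.ExecContext(ctx,
		"UPDATE feed_counts SET reposts = reposts + 1 WHERE feed_id = ?", feedID); err != nil {
		return err
	}
	// Keep the cached count consistent with the DB.
	return rdb.IncrBy(ctx, fmt.Sprintf("rt_%d", feedID), 1).Err()
}
```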


With the emergence and maturation of Redis, many internet systems store all counts directly in Redis. Sharding keys by hash across a Redis cluster greatly improves the counting service's write performance, and attaching multiple slave nodes to each master through master-slave replication, with read-write separation, significantly improves its read performance. Furthermore, Redis has persistence mechanisms, so the data is not lost. For many large and medium-sized internet scenarios, this is a well-suited counting scheme.
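A minimal Go sketch of this hash-split, read-write-separated layout, again assuming go-redis as the client; the shard structure and the CRC32 hash choice are illustrative:

```go
package main

import (
	"context"
	"hash/crc32"
	"math/rand"

	"github.com/redis/go-redis/v9"
)

// shard pairs one master (writes) with its read replicas.
type shard struct {
	master *redis.Client
	slaves []*redis.Client
}

type countCluster struct {
	shards []shard
}

// pick routes a key to a shard -- the "hash split" in the text.
func (c *countCluster) pick(key string) *shard {
	return &c.shards[crc32.ChecksumIEEE([]byte(key))%uint32(len(c.shards))]
}

// Incr writes through the shard's master.
func (c *countCluster) Incr(ctx context.Context, key string) error {
	return c.pick(key).master.IncrBy(ctx, key, 1).Err()
}

// Get reads from a random replica (read-write separation).
func (c *countCluster) Get(ctx context.Context, key string) (int64, error) {
	s := c.pick(key)
	return s.slaves[rand.Intn(len(s.slaves))].Get(ctx, key).Int64()
}
```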

In mobile internet social networking, the enormous user base publishes large amounts of status data daily and generates numerous interactions among users, resulting in massive counts and extremely high-concurrency access. Using Redis directly for storage would bring huge cost and performance problems.

Massive Counting Scenarios #

Taking Weibo as an example, the system has a huge number of objects to count. From the user perspective, there are over 200 million daily active users and nearly 500 million monthly active users; from the feed perspective, there are trillions of historical Weibo feeds, with hundreds of millions of new feeds added daily. Each of these objects needs not one count but several: each user needs the number of accounts they follow, the number of fans who follow them, and the number of feeds they have posted; each feed needs the number of reposts, comments, likes, and views.

Moreover, in Weibo's business scenarios, each request typically asks for the counts of multiple objects at once. When viewing a user's profile, besides the user's basic information, the user's follow, fan, and posted-feed counts must also be fetched; when fetching the Weibo feed list, besides the feed contents, the repost, comment, like, and view counts of every feed must be fetched as well. The total traffic of the Weibo counting service is therefore particularly large, easily reaching millions of queries per second.

Consequently, in high-concurrency scenarios with massive counts, a cache + DB architecture faces several challenges. First, updating counts bottlenecks on the database. Second, a single request may query dozens of counts, and any cache miss falls through to the database, so database reads become a bottleneck as well. A database can typically sustain only around 3,000 to 6,000 TPS (transactions per second), far from enough for such high-concurrency counting access.

With a Redis full-storage solution, read and write performance is not the main problem thanks to sharding and master-slave replication. The memory capacity required, however, becomes a huge expense.

Firstly, Redis, as a general-purpose store, uses memory inefficiently for counts. Take a key that is a long id (8 bytes) with a 4-byte count as its value: Redis needs at least around 65 bytes to store it, varying slightly across versions, while the count itself theoretically needs only 12 bytes. The effective memory load is just 12/65 ≈ 18.5%. If the same long id needs 4 different 4-byte counts, each stored as a separate Redis key, the effective load drops to (8+16)/(65×4) ≈ 9.2%.

On the other hand, all Redis data lives in memory. Storing the counts for trillions of historical feeds takes more than 10 terabytes; with a 1-master, 3-slave configuration for core business, that grows to over 40 terabytes of memory, and multi-IDC deployment easily pushes it to hundreds of terabytes. At 100 gigabytes of memory per server, the counting service alone would need over a thousand large-memory servers. The storage cost is simply too high.

Architecture of Massive Counting Service #


To solve the storage and access problems of massive counting, Weibo developed a counting service system based on Redis. The service is compatible with the Redis protocol and stores data in two tiers: memory and disk. In memory, a number of Table spaces of equal size are pre-allocated; typically each Table occupies 1 GB, with a maximum of about 10 Tables allocated up front. Table 0 is used first; when its fill rate exceeds the threshold, Table 1 is used, and so on. Within each Table, the key is the Weibo ID and the value is a user-defined set of counts.

Since Weibo IDs increase over time, each memory Table only needs to store a certain range of IDs. A memory Table is pre-allocated as key-value slots of identical size, and each newly inserted key occupies one slot. When a Table's slot fill rate exceeds the threshold, the next Table is used, and so on. When all pre-allocated Tables are used up, new Table spaces can be allocated from memory according to the configuration. When memory usage reaches its threshold, the in-memory Table covering the smallest (i.e., oldest) ID range is written out to SSD. The written file is called a DDB file; each memory Table corresponds to one DDB file on disk.
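A minimal Go sketch of this fill-rate-driven Table rolling and eviction, under assumed names; the real slot layout, thresholds, and DDB serialization are omitted (the map stands in for the open-addressing slots sketched later):

```go
package main

// table models one pre-allocated memory Table covering a contiguous ID range.
type table struct {
	minID    int64            // smallest ID this Table covers
	maxSeen  int64            // largest ID inserted while it was active
	slots    map[int64]uint64 // key -> schema-packed counts
	capacity int              // number of pre-allocated slots
}

func (t *table) fillRate() float64 { return float64(len(t.slots)) / float64(t.capacity) }

type counterStore struct {
	tables        []*table // ordered by ID range, oldest first
	fillThreshold float64  // e.g. 0.85: roll to a new Table beyond this
}

// insert places an ID's counts into the active Table, rolling to a fresh
// Table once the fill rate crosses the threshold.
func (s *counterStore) insert(id int64, packed uint64) {
	t := s.tables[len(s.tables)-1]
	if t.fillRate() > s.fillThreshold {
		t = &table{minID: t.maxSeen + 1,
			slots: make(map[int64]uint64), capacity: t.capacity}
		s.tables = append(s.tables, t)
	}
	t.slots[id] = packed
	if id > t.maxSeen {
		t.maxSeen = id
	}
}

// evictOldest flushes the Table with the smallest ID range to an SSD DDB
// file when memory is tight; writeDDB stands in for real serialization.
func (s *counterStore) evictOldest(writeDDB func(*table) error) error {
	oldest := s.tables[0]
	if err := writeDDB(oldest); err != nil {
		return err
	}
	s.tables = s.tables[1:]
	return nil
}
```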

The counting service maintains an in-memory index of the written DDB files, so that a query that misses memory and must go to disk can locate the right disk file directly, speeding up the lookup.
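One plausible shape for such an index (an assumption, not Weibo's documented structure) is a range list sorted by each file's starting ID, searched with binary search:

```go
package main

import "sort"

// ddbEntry records the ID range held by one flushed DDB file.
type ddbEntry struct {
	minID, maxID int64
	path         string // location of the DDB file on SSD
}

type ddbIndex struct {
	entries []ddbEntry // kept sorted ascending by minID
}

// locate returns the DDB file whose range contains id, or "" if none does.
func (ix *ddbIndex) locate(id int64) string {
	// first entry whose minID exceeds id; the candidate is the one before it
	i := sort.Search(len(ix.entries), func(j int) bool { return ix.entries[j].minID > id })
	if i == 0 {
		return ""
	}
	if e := ix.entries[i-1]; id <= e.maxID {
		return e.path
	}
	return ""
}
```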

The counting service uses schema policies to store multiple counts in a single key's value. The space each count occupies is set by the schema and can be specified down to the bit. Because each count has a fixed maximum width, it can only represent values within a limited range; if a count exceeds the configured threshold, the key is removed from the Table and moved to the aux dict (auxiliary dictionary).
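A minimal Go sketch of this bit-level packing; the field names and widths below are illustrative, not Weibo's actual schema:

```go
package main

import "errors"

// field describes where one count lives inside a packed 64-bit value.
type field struct {
	shift, bits uint
}

// feedSchema packs four counts into one uint64; the widths sum to 64,
// so each field here holds up to 65535 before overflowing.
var feedSchema = map[string]field{
	"repost":  {0, 16},
	"comment": {16, 16},
	"like":    {32, 16},
	"view":    {48, 16},
}

var errOverflow = errors.New("count exceeds schema width: move key to aux dict")

// get extracts one count from the packed value.
func get(packed uint64, f field) uint64 {
	return (packed >> f.shift) & ((uint64(1) << f.bits) - 1)
}

// incr adds delta to one count, reporting overflow so the caller can
// relocate the key to the auxiliary dictionary.
func incr(packed uint64, f field, delta uint64) (uint64, error) {
	v := get(packed, f) + delta
	if v >= uint64(1)<<f.bits {
		return packed, errOverflow
	}
	mask := ((uint64(1) << f.bits) - 1) << f.shift
	return (packed &^ mask) | (v << f.shift), nil
}
```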

Each Table is responsible for a certain range of IDs. Since Weibo IDs grow over time but are not strictly sequential, Table rolling is driven by the fill rate reaching its threshold. When something exceptional happens in the system, or when the network between regions is disconnected for a long time and then reconnected, an unusual number of count keys may be inserted into older Tables. If such an old Table would exceed its capacity limit, or if probing for a free slot keeps failing past the query threshold, the new key is inserted into the extend dict (extension dictionary) instead.

Weibo feeds have distinct hot and cold regions: newer feeds are hotter with higher access volume, while older feeds are cold. Hot keys are stored in memory Tables, and old cold keys are swapped to DDB files along with their Tables. Queries against cold keys in DDB files use multi-threaded asynchronous parallel lookups, which do not significantly affect normal business operations. Once read back from a DDB file, cold keys are placed in an LRU cache so that subsequent queries can hit them cheaply.
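A minimal LRU sketch in Go for those read-back cold keys; the capacity handling and entry layout are assumptions:

```go
package main

import "container/list"

// coldCache is a minimal LRU for counts read back from DDB files.
type coldCache struct {
	maxEntries int
	order      *list.List              // front = most recently used
	items      map[int64]*list.Element // key -> element holding an *entry
}

type entry struct {
	key    int64
	packed uint64 // the schema-packed counts for this key
}

func newColdCache(maxEntries int) *coldCache {
	return &coldCache{maxEntries: maxEntries,
		order: list.New(), items: make(map[int64]*list.Element)}
}

func (c *coldCache) Get(key int64) (uint64, bool) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el) // refresh recency on hit
		return el.Value.(*entry).packed, true
	}
	return 0, false
}

// Put stores a count fetched from a DDB file, evicting the least
// recently used entry when the cache is full.
func (c *coldCache) Put(key int64, packed uint64) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).packed = packed
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.maxEntries {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key, packed})
}
```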

The memory data snapshot of the counting service still adopts the RDB + rolling AOF strategy mentioned earlier. The RDB records the corresponding AOF file ID and position at the time of snapshot construction. During a full replication, the master sends all DDB files on disk, as well as the RDB and AOF corresponding to the memory data snapshot, to the slave.

All subsequent replication is incremental. When a slave disconnects and reconnects to the master, it reports the AOF file ID and position it has synchronized to, and the master sends everything after that position to complete the synchronization.
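A sketch of the reconnect decision implied above, with hypothetical names; the fallback to a full resync when the reported AOF file has been rotated away is an assumption, not stated in the text:

```go
package main

// syncPoint marks how far a slave has consumed the rolling AOF stream.
type syncPoint struct {
	aofFileID int64 // which file in the rolling AOF sequence
	offset    int64 // byte offset within that file
}

// onSlaveReconnect decides between incremental and full resynchronization.
// hasAOF reports whether the master still retains the given AOF file;
// the stream/send functions are placeholders for the real transport.
func onSlaveReconnect(p syncPoint, hasAOF func(int64) bool,
	streamAOFFrom func(syncPoint), sendFullSnapshot func()) {
	if hasAOF(p.aofFileID) {
		// Incremental: replay only AOF content after the slave's position.
		streamAOFFrom(p)
		return
	}
	// Full: ship all DDB files plus the RDB and AOF of the memory snapshot.
	sendFullSnapshot()
}
```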


The memory Table in the counting service is a one-dimensional open-addressing structure in which, per the schema policy, every key-value pair occupies the same amount of memory, with the key and its multiple counts laid out compactly inside each slot. The process for inserting and querying a key is as follows (a code sketch follows the steps):

First, determine the memory Table where the key is located based on the ID ranges of all Tables.

Then, probe using a double-hash algorithm: two hash functions produce two hash values h1 and h2, and the probe positions are computed as h1 + N * h2 for N = 0, 1, 2, and so on.

When inserting or modifying a count, if the probed slot is empty, the key-value pair is inserted there immediately. Otherwise, the keys are compared: if they match, the count is incremented or decremented in place; if they differ, N is incremented by 1 and the next position is probed with the same logic. If the probed slots remain occupied by other keys all the way to the configured threshold, the key is recorded in the extend dict extension dictionary instead.

When searching for a count key, if the probed slot is empty, the key does not exist and the search stops immediately. If the keys match, the count is returned. Otherwise, N is incremented by 1 and probing continues. If the probe reaches the configured threshold without finding an empty slot or a matching key, the aux dict auxiliary dictionary and the extend dict extension dictionary are queried; if the key is not found there either, the count is 0.
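Here is a minimal Go sketch of this probe sequence; the hash constants, the zero-key sentinel, and the probe threshold are illustrative, and counts are kept as a single packed word as in the schema sketch above:

```go
package main

const (
	probeThreshold = 32 // give up and use the extend dict beyond this
	emptyKey       = 0  // sentinel: slot unused (assumes real IDs are never 0)
)

type slot struct {
	key    int64
	packed uint64
}

func h1(key int64) uint64 { return uint64(key) * 0x9e3779b97f4a7c15 }
func h2(key int64) uint64 { return uint64(key)*0xc2b2ae3d27d4eb4f | 1 } // odd step

// probeIncr adds delta to key's packed counts inside one Table. It returns
// false when probing exceeded the threshold, meaning the caller should
// record the key in the extend dict.
func probeIncr(slots []slot, key int64, delta uint64) bool {
	a, b := h1(key), h2(key)
	for n := uint64(0); n < probeThreshold; n++ {
		i := (a + n*b) % uint64(len(slots)) // probe position h1 + N*h2
		switch slots[i].key {
		case emptyKey: // empty slot: insert the new key here
			slots[i] = slot{key: key, packed: delta}
			return true
		case key: // existing key: adjust the count in place
			slots[i].packed += delta
			return true
		}
		// different key: keep probing
	}
	return false // threshold exceeded: caller moves key to the extend dict
}

// probeLookup returns the packed counts and whether the key was found in
// the Table; on a miss the caller consults the aux/extend dicts, and the
// count is 0 if those miss too.
func probeLookup(slots []slot, key int64) (uint64, bool) {
	a, b := h1(key), h2(key)
	for n := uint64(0); n < probeThreshold; n++ {
		i := (a + n*b) % uint64(len(slots))
		switch slots[i].key {
		case emptyKey:
			return 0, false // empty slot: key definitely absent
		case key:
			return slots[i].packed, true
		}
	}
	return 0, false // threshold reached: check the aux/extend dicts
}
```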

Benefits of Massive Counting Service #

Weibo's counting service stores multiple counts compactly under a shared key according to the schema, sizing each count in bits with no extra pointer overhead, so its memory usage is less than 10% of Redis's. Counts that exceed their schema threshold are stored independently in an auxiliary dictionary.

Since multiple counts are stored under one key and are typically needed together, a single query retrieves them all, improving query performance by a factor of 3 to 5 compared with storing each count independently.