35 How to Design a Caching System for Social Feed Scenarios #

In the previous lesson, we discussed how to design a caching system for massive counting scenarios. In this lesson, I will explain how to design a caching system for social feed scenarios.

Feed Stream Analysis #

The Feed stream is a core part of many mobile internet products, such as Weibo, WeChat Moments, QQ Friends’ Dynamics, and the information streams of Toutiao and Douyin. Although these products take different forms, their business processing logic is largely the same. Users pull to refresh the Feed stream every day, and this is one of its most important use cases. For the service backend, handling a refresh request means fetching the Feeds the user is interested in, then filtering and dynamically assembling them.

Next, I will take Weibo as an example to introduce how the service backend processes the user’s request to refresh the Feed stream.

Fetching the Feed stream is a heavy operation, with a read amplification of 100 to 1,000 times on the backend: for a single front-end interface request, the service backend has to request hundreds or even thousands of pieces of data, then assemble and process them before returning the response. To keep processing fast and respond to users quickly, the Weibo Feed platform therefore relies heavily on caching, and almost all data is read from caches. For example, a user’s follow relationships are read from the Redis cache, the Feeds published by the user as well as the special Feeds pushed to the user are read from Memcached, and the various counts of users and Feeds are read from the counter service.

Feed Retrieval Flow #

As the core business of the Weibo system, the Feed stream requires a high SLA to guarantee the user experience: core interfaces need four-nines (99.99%) availability, interface response times must stay within 50-100 ms, and the average backend data request must complete within 3-5 ms. To serve the massive concurrent access of tens of millions of users, a well-architected and continuously improved caching system is therefore essential.

In the Feed stream business, the cache hit rate for core business data is generally above 99%. The Feed system retrieves and assembles this cached data through multithreaded concurrent requests, then returns it to users promptly.

The processing flow of Feed retrieval is as follows.

First, based on the user's information, obtain the user's follow relationships, which typically yields 300 to 2,000 UIDs of followed users.

Next, obtain the user's own Feed inbox. The inbox mainly stores the IDs of microblogs that other users have published but made visible only to specific users.

Then, obtain the microblog ID lists of all followed users, i.e., the Feed IDs that are visible to most or all users. These Feed ID lists are stored in the cache as vector arrays. Since a user may follow hundreds or even thousands of accounts, this step requires retrieving hundreds or thousands of Feed vectors.

Then, the Feed system merges the inbox and all the Feed vectors of the followed users, and sorts and pages them to obtain the list of target Feed IDs.

Next, retrieve the corresponding content of the Feeds based on the Feed ID list, such as the text, videos, publication time, source microblog ID, etc.

Then, further retrieve the detailed information of the users who posted these microblogs, as well as the content of the source microblogs, and assemble the content.

Afterwards, if the user has set up filtering words, these Feeds need to be filtered to remove the Feeds that the user is not interested in.

Next, retrieve the user's states for these Feeds, such as favorites and likes, and attach them to the corresponding microblogs.

Finally, retrieve the repost, comment, and like counts for these Feeds, and perform the calculations and assembly. At this point, Feed retrieval is complete, and the list of Feeds is returned to the front end in JSON format, refreshing the user's homepage.
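
To make the pipeline concrete, the following is a condensed Python sketch of the steps above, with plain in-process dicts standing in for the follow-relationship cache, the inbox/outbox vectors, the content cache, and the counter service. All names and data are illustrative stand-ins, not Weibo's actual interfaces.

```python
from itertools import chain

# In-process stand-ins for the caches described above (illustrative data).
FOLLOWING = {"u1": ["u2", "u3"]}                    # uid -> followed uids
INBOX = {"u1": [105]}                               # uid -> targeted feed ids
OUTBOX = {"u2": [104, 101], "u3": [103, 102]}       # uid -> published feed ids
CONTENT = {i: {"id": i, "text": f"feed {i}"} for i in range(101, 106)}
COUNTS = {i: {"reposts": i % 3, "likes": i % 5} for i in range(101, 106)}


def fetch_feed_page(uid, page=1, page_size=3, filter_words=()):
    # Steps 1-3: follow relationships, the inbox, and the followed users' vectors.
    followed = FOLLOWING.get(uid, [])
    vectors = [INBOX.get(uid, [])] + [OUTBOX.get(f, []) for f in followed]

    # Step 4: merge, sort (feed IDs are roughly time-ordered), and page.
    merged = sorted(set(chain.from_iterable(vectors)), reverse=True)
    start = (page - 1) * page_size
    page_ids = merged[start:start + page_size]

    # Steps 5-7: hydrate content and drop feeds that hit the user's filter words.
    feeds = [dict(CONTENT[i]) for i in page_ids if i in CONTENT]
    feeds = [f for f in feeds if not any(w in f["text"] for w in filter_words)]

    # Steps 8-9: attach per-user state and counters before returning.
    for f in feeds:
        f["counts"] = COUNTS.get(f["id"], {})
    return feeds


print(fetch_feed_page("u1"))  # returns the first page, newest feeds first
```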

Feed Stream Cache Architecture #

In feed stream processing, the core business data for caching can be divided into 6 major categories.

The first category is the user's inbox. Feeds that are visible to only a small number of users are pushed directly into the target users' inboxes rather than entering the public outbox, which improves access efficiency.

The second category is the user’s outbox. Ordinary feeds posted by users enter the outbox and are visible to almost everyone. The system directly pulls and assembles these feeds when followers refresh their feed list homepage.

The third category is the Social Graph, which refers to the user’s follow relationship, such as various following lists and follower lists.

The fourth category is Feed Content, which includes the text, video, posting time, and source microblog ID of the feed.

The fifth category is the Existence cache, which is used to determine whether a user has read or liked a particular feed. For existence determination, Weibo uses its self-developed phantom system, which stores the data using a bloom filter algorithm.

The sixth category is the Counter service, which stores various counts such as a user's following and follower counts, as well as the repost, comment, like, and read counts of feeds.

For the feed inbox and outbox, the Feed system uses Memcached for caching, storing the feed IDs in a one-dimensional array format.
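
As a rough illustration of this one-dimensional array format, the sketch below packs a feed ID vector into a compact binary value that could be written to Memcached with a single set; the 8-byte ID width and the key scheme in the comment are assumptions for the example, not Weibo's actual layout.

```python
import struct


def pack_feed_vector(feed_ids):
    # One big-endian uint64 per feed ID, newest first so paged reads can stop early.
    return struct.pack(f">{len(feed_ids)}Q", *feed_ids)


def unpack_feed_vector(blob):
    return list(struct.unpack(f">{len(blob) // 8}Q", blob))


value = pack_feed_vector([105, 104, 101])
# e.g. mc.set("outbox:u2", value)  # hypothetical Memcached client call
assert unpack_feed_vector(value) == [105, 104, 101]
```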

For the following lists, the Feed system uses Redis for caching, storing the data in the longset format. Longset, introduced in a previous lesson, is a data structure extended by Weibo: a one-dimensional array addressed with double hashing. On a cache miss, the business client can load the data from the database, construct the longset's binary format directly as the value, and write it to Redis; Redis then restores it into memory in one shot instead of adding the elements one by one. This way, even a user who follows thousands or tens of thousands of accounts will not cause any blocking.
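
Longset's actual wire format is internal to Weibo's Redis extension, so the sketch below only illustrates the idea under stated assumptions: a fixed, power-of-two capacity array of 64-bit IDs addressed by double hashing, which the client can build in one pass and ship as a single binary value.

```python
import struct

EMPTY = 0  # slot value meaning "unused"; real IDs are assumed to be non-zero


class LongSetSketch:
    def __init__(self, capacity):
        assert capacity & (capacity - 1) == 0, "power-of-two capacity assumed"
        self.capacity = capacity
        self.slots = [EMPTY] * capacity

    def _probe(self, uid):
        # Double hashing: an odd step size guarantees every slot is visited
        # when the capacity is a power of two.
        h1 = uid % self.capacity
        h2 = ((uid // self.capacity) % self.capacity) | 1
        for i in range(self.capacity):
            yield (h1 + i * h2) % self.capacity

    def add(self, uid):
        for idx in self._probe(uid):
            if self.slots[idx] in (EMPTY, uid):
                self.slots[idx] = uid
                return True
        return False  # array is full

    def contains(self, uid):
        for idx in self._probe(uid):
            if self.slots[idx] == uid:
                return True
            if self.slots[idx] == EMPTY:
                return False
        return False

    def to_bytes(self):
        # Pack the whole array so a client can write it to Redis in one command.
        return struct.pack(f">I{self.capacity}Q", self.capacity, *self.slots)


s = LongSetSketch(capacity=8)
for uid in (1001, 1002, 1003):
    s.add(uid)
assert s.contains(1002) and not s.contains(9999)
```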

Feed content is stored in Memcached. Because feed content has numerous attributes and needs to be extended as the business evolves, the Feed system stores it in Google's protocol buffers format. The binary messages produced by protocol buffers serialization are very compact, taking up only about 1/3 to 1/10 of the space of XML, while serialization and deserialization are more than 10 times faster; fields are also easy to extend and modify. Weibo's feed content was initially stored as XML and JSON, but after 2011 it gradually switched to protocol buffers.
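
The official protocol buffers library generates code from .proto definitions; the hand-rolled sketch below is not that library, it only illustrates why such an encoding is compact: each field becomes a small tag byte plus a varint or length-prefixed payload, with no field names in the output. The field numbers and the sample feed are invented for the example.

```python
import json


def varint(n):
    # Base-128 varint: least-significant 7 bits first, MSB means "more bytes follow".
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)


def encode_feed(feed_id, uid, text):
    buf = bytearray()
    buf += varint(1 << 3 | 0) + varint(feed_id)           # field 1, varint
    buf += varint(2 << 3 | 0) + varint(uid)               # field 2, varint
    data = text.encode("utf-8")
    buf += varint(3 << 3 | 2) + varint(len(data)) + data  # field 3, length-delimited
    return bytes(buf)


feed = {"feed_id": 4567890123, "uid": 12345, "text": "hello feed"}
binary = encode_feed(**feed)
print(len(binary), "bytes binary vs", len(json.dumps(feed)), "bytes as JSON")
```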

For existence determination, Weibo's Feed system uses its self-developed phantom system, which stores data in a bloom filter structure; phantom itself is essentially a segmented bloom filter store. A bloom filter represents a set with a bit array whose bits are all initially 0. When a key is inserted, k independent hash functions compute k positions, and the bits at those positions are set to 1. When a key is looked up, the same hash functions are applied and the corresponding bits are checked: if any bit is 0, the key definitely does not exist; if all bits are 1, the key most likely exists, although there is still a chance it does not. In other words, there is a false positive rate, but it is very low: with about 1.2 bytes per record, the false positive rate can be kept around 1%, and with about 1.8 bytes per record it drops to roughly one in a thousand, which meets the needs of most business scenarios.
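
This description maps directly onto a few lines of code. The sketch below is a minimal bloom filter, not Weibo's phantom system; the hash derivation and sizing are illustrative choices.

```python
import hashlib


class BloomFilterSketch:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # bit array, all zeros

    def _positions(self, key):
        # Derive k positions from two 64-bit halves of an MD5 digest
        # (double-hashing style), instead of k independent hash functions.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little")
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means definitely absent; True means present with a small
        # false-positive probability, as described above.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bf = BloomFilterSketch(num_bits=10_000, num_hashes=7)
bf.add("uid:123:read:feed:456")       # hypothetical existence key
assert bf.might_contain("uid:123:read:feed:456")
```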

For the counter service, Weibo uses the CounterService mentioned earlier. CounterService adopts a schema-based strategy that supports multiple counters under a single key, using only 5% to 10% of the original space while improving read performance by 3 to 5 times.
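
The actual CounterService schema is Weibo-internal; the sketch below only illustrates the "several counters under one key" idea with an assumed bit layout, so that one lookup returns all the counts for a feed.

```python
# Assumed schema: field name and bit width; 24 + 24 + 16 bits fit one 64-bit slot.
SCHEMA = [("reposts", 24), ("comments", 24), ("likes", 16)]


def pack_counts(counts):
    value, shift = 0, 0
    for name, width in SCHEMA:
        count = counts.get(name, 0)
        assert count < (1 << width), f"{name} overflows its {width}-bit field"
        value |= count << shift
        shift += width
    return value


def unpack_counts(value):
    counts, shift = {}, 0
    for name, width in SCHEMA:
        counts[name] = (value >> shift) & ((1 << width) - 1)
        shift += width
    return counts


packed = pack_counts({"reposts": 1200, "comments": 85, "likes": 30000})
assert unpack_counts(packed) == {"reposts": 1200, "comments": 85, "likes": 30000}
```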

Feed Stream Mc Architecture #

In the cache system of the feed stream, the L1-Main-Backup architecture is adopted for storing data in Memcached. This architecture was introduced in the previous discussion on distributed Memcached practice. In the Memcached storage system of Weibo’s feed stream, the capacity of the L1 single pool is generally 1/10 of the Main pool. There are usually 4 to 6 groups of L1, which are used to store the hottest data and can effectively solve the problem of high traffic during peak events or holidays. The Main pool has the largest capacity and stores almost all the relatively hot data from recent times. The capacity of the Backup pool is generally less than half of the Main pool, and it is mainly used to handle key access in cases of Main pool abnormalities or misses.

The three-layer L1-Main-Backup Memcached architecture can effectively resist sudden spikes of traffic, local failures, and more. In practice, if the business traffic is not high, it can also be configured as a two-layer Main-Backup system. For 2 or 3 layer Mc architectures, various strategies such as penetration and seeding are required to maintain data consistency, which makes them relatively complex. Therefore, Weibo has built a proxy to encapsulate the multi-layer read-write logic of Mc and simplify business access. Some businesses are very sensitive to response time and do not want to increase time overhead due to the addition of a proxy, so Weibo also provides corresponding clients to directly access the three-layer Mc architecture.

When a hot event breaks out, large numbers of users come online and access the feed stream at the same time, and some feeds receive ultra-high-concurrency access: overall traffic more than doubles, and traffic to the cache nodes holding the hot data grows several times over. At that point, multiple extra groups of L1 need to be added quickly to spread out the access to those nodes' data. In addition, if a node fails at any layer, it must be replaced by another machine, so the three-layer Mc architecture frequently needs to change. The configuration of Weibo's Mc architecture is stored in the config-server of the configuration center and managed by captain; the proxy and the client read and subscribe to these configurations at startup, so they can switch connections promptly when the Mc deployment changes.

When the feed stream processing program reads from the Mc architecture, it first randomly selects one group of L1. If L1 hits, the value is returned directly; otherwise it reads the Main layer. If Main hits, the value is first seeded into the selected L1 group and then returned. If Main also misses, it reads the Backup layer. If Backup hits, the value is seeded into Main and into the initially selected L1 group, and then returned. If Backup also misses, the data is loaded from the DB and seeded into every layer. There is one exception: gets requests. Because a gets request serves a subsequent cas update, and the three-layer Mc cache treats Main and Backup as the authority, gets requests go directly to the Main layer and fall back to the Backup layer if Main fails; as long as one layer returns the data, the request is considered successful. When the later cas update is performed, it is applied to the corresponding Main or Backup pool, and if the cas succeeds, the key/value is also set into the other layers.
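
A simplified sketch of this read path follows, with plain dicts standing in for the L1 groups, the Main and Backup pools, and the DB. It only illustrates the hit/miss fall-through and the seeding on the way back; it is not Weibo's actual client code.

```python
import random

L1_GROUPS = [{}, {}, {}, {}]            # several small hot-data pools
MAIN, BACKUP = {}, {}
DB = {"feed:101": "feed 101 content"}   # stand-in for the database


def mc_read(key):
    l1 = random.choice(L1_GROUPS)       # randomly pick one L1 group
    if key in l1:                       # L1 hit
        return l1[key]
    if key in MAIN:                     # Main hit: seed the chosen L1 group
        l1[key] = MAIN[key]
        return MAIN[key]
    if key in BACKUP:                   # Backup hit: seed Main and the L1 group
        MAIN[key] = l1[key] = BACKUP[key]
        return BACKUP[key]
    value = DB.get(key)                 # miss everywhere: load from the DB
    if value is not None:               # and seed every layer on the way back
        BACKUP[key] = MAIN[key] = l1[key] = value
    return value


print(mc_read("feed:101"))  # the first call seeds all layers from the DB
```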

For data updates, the three-layer Mc cache architecture treats Main-Backup as the authority. It first updates the Main layer; if that succeeds, it then writes the value to all the Mc pools in the other layers. If updating Main fails, it tries the Backup pool; if updating Backup succeeds, it propagates the data to the other layers. If both Main and Backup fail to update, it returns a failure directly and does not touch the L1 layer. When seeding data, or when fanning an update out to the other layers after Main succeeds, the Mc commands are generally issued in noreply mode, which completes the multi-pool writes efficiently.
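
A sketch of this update path, again with dicts standing in for the Memcached pools; the main_ok/backup_ok flags simulate whether each pool accepts the write, and the noreply detail is only noted in a comment.

```python
L1_GROUPS = [{}, {}, {}, {}]
MAIN, BACKUP = {}, {}


def mc_update(key, value, main_ok=True, backup_ok=True):
    if main_ok:
        MAIN[key] = value
        other_pools = [BACKUP] + L1_GROUPS   # fan out after Main succeeds
    elif backup_ok:
        BACKUP[key] = value
        other_pools = L1_GROUPS              # Main failed; still refresh L1
    else:
        return False                         # both failed: do not touch L1
    for pool in other_pools:                 # noreply writes in practice
        pool[key] = value
    return True


assert mc_update("feed:101", "v2")
assert MAIN["feed:101"] == "v2" and all(l1["feed:101"] == "v2" for l1 in L1_GROUPS)
```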

The three-layer Mc architecture can support millions of QPS (Queries Per Second) access, and the hit rate is more than 99% in various scenarios, making it an important support for the stable operation of the feed stream processing program.

For Redis access in the feed stream, the business generally deploys Redis in a master-slave configuration. Sub-businesses are grouped into clusters by type and accessed through a multi-tenant proxy, while businesses with small data volumes can share a Redis store for mixed reads and writes. Businesses that are sensitive to response time can also, for performance reasons, use smart clients that access the Redis cluster directly. Slot maintenance and migration for the entire Redis cluster are handled by clusterManager, and the configuration center records the proxy deployment as well as the cluster's Redis configuration and deployment. This architecture was covered in detail in the earlier lessons on classic distributed cache systems and will not be repeated here.

With this, all the content of this column has been covered. I hope you can combine what you have learned and apply it in your own projects. Thank you for your support of this column.