Extra Meal 07 Experiences We Can Learn from Weibo’s Redis Practices #

We know that Weibo uses Redis extensively across its internal business scenarios and has accumulated a wealth of application and optimization experience. A Weibo expert once shared the team’s journey of optimizing Redis in an article that contains many valuable insights.

As the saying goes, “stones from other hills may serve to polish the jade of this one.” By learning from and mastering these experiences, we can apply Redis better in our own business scenarios. In today’s lesson, I want to discuss Weibo’s Redis optimizations, drawing on that expert’s sharing and on my conversations with Weibo’s internal specialists.

First, let’s take a look at Weibo’s Redis requirements in their business scenarios. These business needs serve as the starting point for Weibo’s optimization and improvement of Redis.

Weibo has a wide range of business scenarios, such as the “Red Envelope Rain” event, counters for followers, users, and reads, feed aggregation, music charts, and so on. These businesses also serve a huge user base, and the data accessed and stored in Redis often reaches the terabyte level.

As an application directly facing end users, the user experience of Weibo is crucial, and all of this requires technical support. Let’s summarize Weibo’s technical requirements for Redis:

  • Ability to provide high-performance and high-concurrency read and write access, ensuring low latency.
  • Ability to support large-capacity storage.
  • Flexible scalability for rapid expansion in different business scenarios.

To meet these requirements, Weibo has made numerous improvements and optimizations to Redis. In summary, they have improved Redis’s data structures and working mechanisms and developed new functional components on top of Redis, including RedRock for large-capacity storage and RedisService for a service-oriented architecture.

Next, let’s delve into the specific improvements that Weibo has made to Redis itself.

Basic Improvements to Redis by Weibo #

According to a technical expert from Weibo, the basic improvements to Redis by Weibo can be divided into two categories: avoiding blocking and saving memory.

Firstly, to meet persistence requirements, they implemented a mechanism that combines full RDB (Redis Database) snapshots with incremental AOF (Append-Only File) logs, which avoids the trade-off between data reliability and performance that comes with using either mechanism alone. Of course, open-source Redis itself added a hybrid RDB-AOF persistence mechanism in version 4.0.
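
For comparison, the built-in hybrid mechanism in open-source Redis 4.0 and later is enabled through two redis.conf directives. The snippet below is a minimal illustration of those directives, not Weibo’s internal configuration:

```
# Enable AOF, and let AOF rewrites store an RDB preamble followed by
# incremental AOF entries (available since Redis 4.0):
appendonly yes
aof-use-rdb-preamble yes
```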

Secondly, an additional BIO (Background I/O) thread performs the actual disk flush when AOF logs are written to disk. This avoids slow AOF flushes blocking the main thread.
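
To make this pattern concrete, here is a minimal sketch in POSIX C of a background-flush thread: the main thread hands the AOF file descriptor to a worker, which runs the slow fsync(2) off the critical path. It illustrates the pattern only; it is not Redis’s or Weibo’s actual BIO code.

```c
/* Minimal sketch of a background-flush (BIO-style) thread. */
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t bio_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  bio_cond = PTHREAD_COND_INITIALIZER;
static int pending_fd = -1;   /* one-slot "job queue"; Redis keeps a real job list */

/* Main thread: submit a flush job and return immediately. */
void bio_submit_fsync(int fd) {
    pthread_mutex_lock(&bio_lock);
    pending_fd = fd;
    pthread_cond_signal(&bio_cond);
    pthread_mutex_unlock(&bio_lock);
}

/* Background thread: wait for jobs and do the slow disk flush. */
void *bio_worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&bio_lock);
        while (pending_fd == -1)
            pthread_cond_wait(&bio_cond, &bio_lock);
        int fd = pending_fd;
        pending_fd = -1;
        pthread_mutex_unlock(&bio_lock);
        fsync(fd);            /* potentially slow; no longer blocks the main thread */
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, bio_worker, NULL);
    int fd = open("appendonly.aof", O_CREAT | O_WRONLY | O_APPEND, 0644);
    /* ... the main thread appends AOF entries to fd, then: */
    bio_submit_fsync(fd);     /* returns at once instead of waiting for fsync */
    pause();                  /* keep the demo alive; a real server keeps serving */
}
```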

Thirdly, they added an aofnumber configuration option. It sets the maximum number of AOF files, which bounds the total size of the AOF logs on disk and prevents disk space from being exhausted by too many log files.
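
The effect of such an option can be sketched as follows: rotate to a new AOF segment once the current one passes a size threshold, and delete the oldest segment once the configured file count is exceeded. The file names and limits below are illustrative assumptions, not Weibo’s actual implementation; in practice an old segment can only be dropped once a full RDB snapshot already covers its data.

```c
/* Hypothetical sketch of an aofnumber-style cap on AOF files. */
#include <stdio.h>
#include <unistd.h>

#define AOF_NUMBER      4                   /* keep at most 4 AOF segments */
#define AOF_SEGMENT_MAX (64L * 1024 * 1024) /* rotate after 64 MB */

static int first_seg = 0, last_seg = 0;     /* ids of oldest and newest segments */
static long cur_size = 0;                   /* bytes written to the newest segment */

/* Called after appending `n` bytes of AOF data. */
void maybe_rotate(long n) {
    cur_size += n;
    if (cur_size < AOF_SEGMENT_MAX) return;
    last_seg++;                             /* switch writes to a new segment file */
    cur_size = 0;
    if (last_seg - first_seg + 1 > AOF_NUMBER) {
        char path[64];
        snprintf(path, sizeof(path), "appendonly-%d.aof", first_seg);
        unlink(path);                       /* drop the oldest segment to cap disk use */
        first_seg++;
    }
}

int main(void) {
    for (int i = 0; i < 400; i++)           /* simulate 400 appends of 1 MB each */
        maybe_rotate(1024 * 1024);
    printf("segments kept: %d through %d\n", first_seg, last_seg);
    return 0;
}
```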

Lastly, in the master-slave replication mechanism, an independent replication thread is used for master-slave synchronization, avoiding blocking the main thread.

In terms of saving memory, Weibo made one typical optimization: customizing data structures.

When caching a user’s follow list with Redis, they designed a custom LongSet data type to store it. This type is a collection of Long elements whose underlying data structure is a hash array. Before designing the LongSet type, Weibo stored user follow lists in Redis Hashes, but this consumed a large amount of memory once the data volume grew.

Moreover, when a cached follow list is evicted from Redis, the cache instance has to fetch the list from the backend database and write it back with HMSET. Under high concurrent request pressure, this process degrades cache performance. Compared with the Hash type, the LongSet type stores data in a flat hash array, which avoids the per-entry pointer and metadata overhead of a hash table, saves memory, and still allows fast insertion and lookup.
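
Below is a minimal sketch of a LongSet-style structure: an open-addressing hash array that stores 64-bit integers directly, with no per-entry pointers. It illustrates the idea only (it reserves 0 as the empty-slot marker and omits rehashing) and is not Weibo’s actual implementation.

```c
/* Sketch of an open-addressing hash array of 64-bit integers. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int64_t *slots;   /* flat array of values; no pointers or entry metadata */
    size_t size;      /* number of slots, kept as a power of two */
    size_t used;
} longset;

static size_t ls_slot(int64_t v, size_t size) {
    uint64_t h = (uint64_t)v * 0x9E3779B97F4A7C15ULL;  /* simple multiplicative hash */
    return (size_t)(h & (size - 1));
}

longset *ls_new(size_t size) {
    longset *s = malloc(sizeof(*s));
    s->slots = calloc(size, sizeof(int64_t));           /* zero means "empty slot" */
    s->size = size;
    s->used = 0;
    return s;
}

/* Linear probing; returns 1 if newly added, 0 if already present or full. */
int ls_add(longset *s, int64_t v) {
    size_t i = ls_slot(v, s->size);
    for (size_t probes = 0; probes < s->size; probes++) {
        if (s->slots[i] == v) return 0;
        if (s->slots[i] == 0) { s->slots[i] = v; s->used++; return 1; }
        i = (i + 1) & (s->size - 1);
    }
    return 0;  /* full; a real implementation would grow and rehash */
}

int ls_contains(const longset *s, int64_t v) {
    size_t i = ls_slot(v, s->size);
    for (size_t probes = 0; probes < s->size; probes++) {
        if (s->slots[i] == v) return 1;
        if (s->slots[i] == 0) return 0;
        i = (i + 1) & (s->size - 1);
    }
    return 0;
}

int main(void) {
    longset *follows = ls_new(1024);        /* e.g., one user's follow list */
    ls_add(follows, 10086);
    ls_add(follows, 12345);
    printf("contains 10086: %d\n", ls_contains(follows, 10086));
    printf("contains 999:   %d\n", ls_contains(follows, 999));
    free(follows->slots);
    free(follows);
    return 0;
}
```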

From the aforementioned improvements, we can see that Weibo’s optimization approach for Redis aligns with the optimization goals we have repeatedly emphasized in previous courses. I have summarized two lessons from this.

The first lesson is that high performance and memory efficiency should always be the focus when using Redis. This is closely related to Redis’ position within the entire business system.

Redis is typically deployed in front of the database layer as a cache, so it needs to be able to return results quickly. Additionally, Redis uses memory to store data, which brings advantages in terms of access speed but also requires special attention to memory optimization in operations and maintenance. I have covered many topics related to performance optimization and memory saving in earlier courses (such as Lectures 18-20), and I encourage you to review them and apply them in practice.

The second lesson is that in practical applications, it may be necessary to customize or extend Redis to meet the requirements of specific scenarios, just like Weibo customizing the data structure. However, if you want to do customization or extension work, you need to understand and master the Redis source code. Therefore, after grasping the basic principles and key technologies of Redis, I suggest making reading the Redis source code the next goal. This way, you can not only strengthen your understanding of the source code based on principles, but also engage in the development of new features or data types after mastering the source code. I have introduced how to add new data types to Redis in Lecture 13, so you can review that as well.

In addition to these improvements, to meet the demand for large capacity storage, Weibo experts mentioned in the technical sharing that they combine RocksDB with hard disks to expand the capacity of a single instance. Let’s learn more about it.

How Does Weibo Handle Large-Capacity Data Storage Needs? #

Weibo often needs to store data at the level of terabytes at the business layer, which requires expanding the storage capacity of Redis instances.

To address this need, Weibo differentiates hot and cold data and keeps the hot data in Redis while writing the cold data to the underlying hard disk using RocksDB.

In Weibo’s business scenarios, the distinction between hot and cold data is common. For example, when certain Weibo topics start trending, they generate high traffic, with a massive number of users accessing them. In such cases, Redis needs to serve those user requests.

However, as the popularity of these topics wanes, the number of visitors drastically decreases, turning the data into cold data. At this point, the cold data can be migrated from Redis to RocksDB and stored on the hard disk. This approach allows for saving memory in the Redis instance by storing only the hot data, and the amount of data that a single instance can handle is determined by the size of the entire hard disk.

Based on Weibo’s technical sharing, I have created an architecture diagram illustrating their use of RocksDB to assist Redis in scaling:

From the diagram, we can see that Redis uses asynchronous threads to read and write data in RocksDB.

This is because the latency of reading and writing RocksDB cannot match that of accessing Redis’s memory, so asynchronous threads are used to keep cold-data reads and writes from blocking the Redis main thread. As for how cold data is laid out and managed on the SSD, that is left to RocksDB, which is already mature and stable and therefore well suited to managing Redis’s cold data.
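
To give a feel for the cold-data path, here is a minimal sketch using RocksDB’s C API to write a cold value to disk and read it back on a later request. In an architecture like the one above, these calls would be issued from background threads rather than the Redis main thread; the key format and database path are illustrative assumptions, not Weibo’s on-disk layout.

```c
/* Sketch of the cold-data path via RocksDB's C API. */
#include <rocksdb/c.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *err = NULL;
    rocksdb_options_t *opts = rocksdb_options_create();
    rocksdb_options_set_create_if_missing(opts, 1);
    rocksdb_t *db = rocksdb_open(opts, "/tmp/redis-cold-data", &err);
    if (err) { fprintf(stderr, "open failed: %s\n", err); return 1; }

    /* A key that has gone cold is evicted from Redis memory and written here. */
    const char *key = "topic:old_trending:follower_list";
    const char *val = "1001,1002,1003";
    rocksdb_writeoptions_t *wopts = rocksdb_writeoptions_create();
    rocksdb_put(db, wopts, key, strlen(key), val, strlen(val), &err);
    if (err) { fprintf(stderr, "put failed: %s\n", err); return 1; }

    /* A later request for the cold key reads it back from disk. */
    size_t len = 0;
    rocksdb_readoptions_t *ropts = rocksdb_readoptions_create();
    char *stored = rocksdb_get(db, ropts, key, strlen(key), &len, &err);
    printf("cold value: %.*s\n", (int)len, stored);

    free(stored);
    rocksdb_writeoptions_destroy(wopts);
    rocksdb_readoptions_destroy(ropts);
    rocksdb_close(db);
    rocksdb_options_destroy(opts);
    return 0;
}
```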

Regarding Weibo’s optimization work using RocksDB and SSD for scaling, I have summarized two key points that I would like to share with you.

Firstly, there is still a demand for implementing high-capacity single instances in certain business scenarios. While we can use sharded clusters with multiple instances to distribute data, this approach also brings the overhead of managing and maintaining a distributed system. Additionally, the scale of sharded clusters is limited. If we can increase the storage capacity of individual instances, even with a smaller cluster, it can still accommodate more data.

The second insight is that if you want to build high-capacity Redis instances, combining SSDs with RocksDB is a good solution. The open-source Pika from 360, which we learned about in Lecture 28, and Weibo’s approach are both excellent references.

RocksDB writes data quickly and also caches part of the data in memory, providing read throughput in the tens of thousands of requests per second. Moreover, SSD performance keeps improving rapidly, with a single drive now delivering hundreds of thousands of IOPS. Combining these technologies lets Redis offer large-capacity storage while maintaining a reasonable level of read and write performance. When you encounter similar needs, you can also consider running RocksDB on SSDs to store large volumes of data.

How Did Weibo Transform Redis into a Service for Multiple Business Lines? #

Different businesses on Weibo have different requirements for Redis capacity, and these requirements may change with the evolution of the business.

To flexibly meet these business demands, Weibo transformed Redis into a service (RedisService). Service transformation means using a Redis cluster to serve different business scenarios, with each business having its own independent resources that do not interfere with each other.

At the same time, all Redis instances form a resource pool that can be easily scaled up. If a new business line is launched or an old one is discontinued, resources can be requested from or returned to the resource pool.

With the Redis service in place, different business lines can conveniently use Redis. Business departments no longer need to independently deploy and maintain it. They just need to let their application clients access the Redis service cluster. Even if the data volume of the business application increases, there is no need to worry about instance capacity because the service cluster can automatically scale up to support business growth.

During the Redis service transformation, Weibo adopted a solution similar to Codis, using a cluster proxy layer to connect clients and servers. From Weibo’s publicly available technical materials, it is evident that they have implemented rich service-oriented functionality support in the proxy layer.

  • Client connection listening and dynamic port management.
  • Redis protocol parsing: Identify requests that need to be routed and directly return errors for illegal or unsupported requests.
  • Request routing: Based on mapping rules between data and backend instances, route each request to the corresponding backend instance and return the result to the client (see the sketch after this list).
  • Metrics collection and monitoring: Collect the running status of the cluster and send it to dedicated visualization components for monitoring and processing.
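
As a concrete illustration of the routing step, the sketch below hashes a key to a slot and looks the slot up in a slot-to-backend table. The hash function, slot count, and table contents are assumptions for the example, not Weibo’s actual mapping rules (which would come from the configuration center).

```c
/* Illustrative sketch of proxy-side request routing. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SLOTS 1024

typedef struct { const char *host; int port; } backend;

static backend slot_table[SLOTS];   /* slot -> backend instance mapping */

/* FNV-1a hash over the key bytes (an illustrative choice of hash). */
static uint32_t key_hash(const char *key, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) { h ^= (unsigned char)key[i]; h *= 16777619u; }
    return h;
}

/* Route a parsed request key to the backend instance that owns its slot. */
backend route(const char *key) {
    uint32_t slot = key_hash(key, strlen(key)) % SLOTS;
    return slot_table[slot];
}

int main(void) {
    /* Pretend the config center assigned half the slots to each of two instances. */
    for (int i = 0; i < SLOTS; i++)
        slot_table[i] = (backend){ i < SLOTS / 2 ? "10.0.0.1" : "10.0.0.2", 6379 };

    backend b = route("user:1001:following");
    printf("route to %s:%d\n", b.host, b.port);
    return 0;
}
```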

In addition, in the service cluster, there is a configuration center that manages the metadata of the entire cluster. Meanwhile, instances run in a master-slave mode to ensure data reliability. Data from different businesses is deployed on different instances to maintain isolation.

Based on my understanding, I have created a diagram that illustrates the architecture of Weibo’s Redis service cluster. You can take a look at it.

From the practice of Redis service transformation, we can conclude that providing platform-level services is a common approach when multiple business lines have the same Redis usage requirements, which is known as service-oriented architecture.

When turning a common function into a platform service, there are several key considerations, including smooth platform scaling, support for multi-tenancy and business data isolation, flexible routing rules, and rich monitoring capabilities.

If the platform needs to scale, we can borrow approaches such as Codis or Redis Cluster. Multi-tenancy support and business data isolation are essentially the same requirement, and we meet it through resource isolation, that is, deploying different tenants’ or businesses’ data on separate instances so that they do not share resources. As for routing rules and monitoring, Weibo’s current solution of implementing both in the proxy layer is a good reference.

Only by effectively implementing these functionalities can a platform service efficiently support the needs of different business lines.

Summary #

In today’s class, we learned about Weibo’s Redis practices and drew many lessons from them. In short, Weibo’s technical requirements for Redis boil down to three points: high performance, large capacity, and easy scalability.

In order to meet these requirements, in addition to optimizing Redis, Weibo has also developed its own extension systems, including capacity expansion mechanisms based on RocksDB and a service-oriented RedisService cluster.

Finally, I would like to share two of my own experiences with you.

The first is about the RedisService cluster developed by Weibo. This optimization direction is the main focus of the platform department in major companies.

Vertical slicing of businesses and horizontal slicing of platforms are the basic approaches to building large-scale systems. The so-called vertical slicing means deploying different business data separately to avoid interference between them. Horizontal slicing refers to the unification of different business lines with the same requirements for the operating platform, by building a platform-level cluster service to support them. Redis is a typical foundational service required by multiple business lines, so serving it in a clustered manner helps improve the overall efficiency of the business.

The second is the important role of code practice in our journey to become experts in Redis.

I have found that the secondary transformation or development of Redis is an inevitable path for major companies, which is closely related to their diverse business and wide-ranging requirements.

The customized data structures, RedRock, and RedisService developed by Weibo are very typical examples. Therefore, if you want to become a Redis expert and join a major company, focusing on principles before code and practicing while learning is a good approach. Principles guide what to look for when reading the code, but hands-on practice is just as crucial: do deployment and operations practice alongside code reading. Only then can you truly master this knowledge. I hope you not only value learning the principles, but also use them to guide your practice and improve your practical skills.

One Question Per Lesson #

As usual, I have a small question for you. When you are using Redis in practice, do you have any classic optimization improvements or experiences with secondary development?

Feel free to share your experiences in the comments section. Let’s discuss and exchange ideas together. If you find today’s content helpful, feel free to share it with your friends or colleagues.