
37 Counting System Design I - How to Handle Counting of Massive Data #

Hello, I’m Tang Yang.

Starting today, we officially enter the final practical section. In the previous lessons, I introduced you to how to ensure high performance, high availability, and high scalability of systems in the face of high concurrency from the perspectives of databases, caches, message queues, and distributed services. Although there were many examples to help you understand the theoretical knowledge, there was no complete example to tie the knowledge together.

Therefore, to put that knowledge into practice, in this practical part I will use Weibo as the backdrop and walk through two complete cases, showing you how to deal with high concurrency and high traffic from a hands-on perspective. I hope this gives you a more concrete understanding and some ideas for implementing similar systems. Today, the first case I want to talk about is how to design a counting system that supports high concurrency and large-scale storage.

Let’s take a look at a scenario: on the subway, you may often browse Weibo, liking and sharing hot topics, and if there is a lottery event, you may repost it as well. All of these actions generate counting data in the Weibo scenario. Specifically, there are several types of counting data:

Number of comments, number of likes, number of reposts, number of views, number of responses, and so on;

Number of followers, number of followings, number of posted Weibos, number of direct messages, and so on.

The counts at the Weibo level represent the popularity of a Weibo, and the user-level data, especially the number of followers, represents the influence of the user, so users generally care a lot about these numbers. Moreover, counting data is queried in many scenarios (such as the homepage feed and the personal profile page) and its access volume is huge, so it is necessary to design a dedicated counting system to maintain it.

However, when designing a counting system, many people encounter problems such as low performance and high storage costs. For example, storing counts together with Weibo data means locking this record every time the count is updated, which reduces concurrency for writing. In my opinion, the reason for these issues is that you do not have a sufficient understanding of the design and optimization of counting systems. Therefore, in order to solve these pain points, it is necessary for you to form a complete design plan.

Characteristics of Counting in Business #

Firstly, you need to understand the characteristics of counting in business in order to design a reasonable solution. In my opinion, there are several main characteristics.

Huge amount of data. As far as I know, the number of Weibo posts in the system has exceeded billions. Counting just the core metrics of a Weibo, such as reposts, comments, likes, and views, the data volume is already in the trillions. Moreover, the number of Weibo posts continues to grow rapidly, and as the Weibo business becomes more complex, the types of counts may keep expanding (such as adding expression counts), so the count magnitude at the Weibo level has already surpassed the trillions. In addition, the number of Weibo users has exceeded 1 billion, and although the magnitude of counts at the user level is much smaller than at the Weibo level, it has also reached the billions. So, how to store these trillions-level numbers is a big challenge for us.

High traffic volume, requiring high performance. Weibo has over 200 million daily active users, and nearly 500 million monthly active users. The access volume of core services (such as the homepage feed) reaches hundreds of thousands of requests per second, and the count system access volume exceeds millions of requests per second. Moreover, in terms of performance, it requires returning results in milliseconds.

Lastly, there is a high requirement for availability and accuracy of the numbers. In general, users are very sensitive to count numbers. For example, if you have been live streaming for several months and only gained 1000 followers, but suddenly lost hundreds of them one day, you would start wondering where the problem lies or even complain to the live streaming platform.

So, facing the challenges of high concurrency, large data volume, and strong data consistency requirements, how did Weibo design and evolve its count system? And what experiences can we learn from it?

How to Design a High-concurrency Counting System #

When we initially designed the counting system, the traffic on Weibo was not as intense as it is now. Following the KISS principle (Keep It Simple, Stupid), we aimed for a simple, easy-to-maintain design, so we used MySQL to store the counting data: it was the database we were most familiar with, and our team had abundant experience operating and maintaining it. Let’s take a specific example.

Suppose we want to store data related to Weibo metrics (such as repost count, comment count, praise count, etc.). You can design the table structure as follows: use the Weibo ID as the primary key, and have separate columns for repost count, comment count, praise count, and view count. This way, you can retrieve the counts with a single SQL statement:

SELECT repost_count, comment_count, praise_count, view_count FROM t_weibo_count WHERE weibo_id = ?

This approach is the simplest if the data and access volume are both small. So if your system is not large in scale, you can directly implement this method.
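To make this concrete, here is a minimal sketch of the single-table approach using SQLite from Python. The schema simply mirrors the query above; the real system used MySQL, and the column defaults are my own assumption.

```python
import sqlite3

# Illustrative schema mirroring the SELECT above; the production system used MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t_weibo_count (
        weibo_id      INTEGER PRIMARY KEY,            -- Weibo ID as the primary key
        repost_count  INTEGER NOT NULL DEFAULT 0,
        comment_count INTEGER NOT NULL DEFAULT 0,
        praise_count  INTEGER NOT NULL DEFAULT 0,
        view_count    INTEGER NOT NULL DEFAULT 0
    )
""")

conn.execute("INSERT INTO t_weibo_count (weibo_id) VALUES (?)", (1,))
# Each update locks this row, which is the write-concurrency problem mentioned earlier.
conn.execute("UPDATE t_weibo_count SET repost_count = repost_count + 1 WHERE weibo_id = ?", (1,))

# All counts of one Weibo come back with a single query.
row = conn.execute(
    "SELECT repost_count, comment_count, praise_count, view_count "
    "FROM t_weibo_count WHERE weibo_id = ?", (1,)
).fetchone()
print(row)  # (1, 0, 0, 0)
```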

However, as Weibo grew, the previous counting system encountered many problems and challenges.

For example, the number of Weibo users and the volume of Weibo posts increased rapidly, leading to a significant increase in the amount of counting data to be stored. When a single MySQL table reached tens of millions of rows, its performance started to degrade. To address this, we considered using sharding (splitting the data across multiple databases and tables) to distribute the data and improve the performance of counting reads.

We used the “weibo_id” as the partition key. When choosing the sharding method, we considered two options:

One approach is to use a hashing algorithm to calculate the hash value of the weibo_id and determine which database and table to store the data in based on this hash value. You can review the content of Lesson 9 on database sharding for specific implementation details.

Another approach is to shard the data based on the time the Weibo ID was generated. As we mentioned in Lesson 10 when discussing the ticket generator, it is best to have a business-meaningful field included in the generated ID, such as a timestamp. Therefore, during sharding, we can first reverse-engineer the timestamp using the algorithm of the ticket generator, and then shard the data based on this timestamp. For example, use one table per day or one table per month, and so on.

We considered both approaches. However, because the more recently a Weibo is posted, the higher the access volume for its counting data, sharding by time would concentrate traffic on the newest shards and result in very uneven data access. Therefore, in the end we chose the hashing approach for sharding, as sketched below.
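The routing logic for the hashing approach looks roughly like this; the database and table counts and the naming convention are illustrative assumptions, not Weibo's actual topology.

```python
# Hash-based routing of a weibo_id to a (database, table) pair.
# DB_COUNT, TABLES_PER_DB, and the naming scheme are assumptions for illustration.
DB_COUNT = 8
TABLES_PER_DB = 16

def route(weibo_id: int) -> tuple[str, str]:
    slot = weibo_id % (DB_COUNT * TABLES_PER_DB)   # simple modulo hash of the ID
    db_index = slot // TABLES_PER_DB
    table_index = slot % TABLES_PER_DB
    return f"weibo_count_db_{db_index}", f"t_weibo_count_{table_index}"

print(route(3976542134))   # ('weibo_count_db_3', 't_weibo_count_6')
```

Because the hash spreads IDs uniformly, hot and cold Weibos end up mixed across all shards, which is exactly why this scheme avoids the uneven access of time-based sharding.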


Meanwhile, the access volume for counting also took a significant leap. In the initial version of Weibo, counting data was not displayed in the homepage feed, and MySQL alone could handle the read volume. Later on, however, we started displaying repost, comment, and praise counts in the homepage feed as well. The access volume of the homepage feed is enormous, and the database alone could no longer handle such high concurrency, so we considered using Redis to accelerate read requests: deploying multiple slave nodes to improve availability and performance, and sharding the data by hash to sustain read performance. However, this database + cache approach had a downside: it could not guarantee data consistency. For example, if the database write succeeded but the cache update failed, the two would diverge and the accuracy of the counts would suffer. So we completely abandoned MySQL and used Redis as the sole storage component for counting.
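As a rough sketch of what reading and writing counts against vanilla Redis looks like (using the redis-py client, with a key scheme I have made up for illustration; keep in mind that Weibo later replaced this with a customized Redis, as described below):

```python
import redis

# Vanilla Redis via redis-py; the "wb_count:{id}" key scheme is an assumption.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def incr_repost(weibo_id: int, delta: int = 1) -> None:
    # Keep all counts of one Weibo in a single hash.
    r.hincrby(f"wb_count:{weibo_id}", "repost_count", delta)

def get_counts(weibo_id: int) -> dict:
    # Returns e.g. {"repost_count": "3", "comment_count": "12", ...}
    return r.hgetall(f"wb_count:{weibo_id}")

incr_repost(1)
print(get_counts(1))
```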


In addition to considering the performance of counting reads, we also needed to address the issue of improving the performance of counting writes since the counts for popular Weibo posts change frequently. For example, every time a Weibo post is reposted, the repost count of that Weibo needs to be incremented. Given that a celebrity’s marriage or divorce announcement could instantly generate tens or even hundreds of thousands of reposts, how would you reduce the write pressure in such a scenario?

You may have already thought of using a message queue to smooth out the write peak: when a Weibo is reposted, we write a message to the message queue, and the message processing program then increments that Weibo's repost count by 1. One point to note here: you can further reduce the write pressure on Redis by batching messages. For example, you can merge three consecutive repost-count updates into a single update (I'm using SQL syntax to make it easier for you to understand):

UPDATE t_weibo_count SET repost_count = repost_count + 1 WHERE weibo_id = 1; 
UPDATE t_weibo_count SET repost_count = repost_count + 1 WHERE weibo_id = 1;  
UPDATE t_weibo_count SET repost_count = repost_count + 1 WHERE weibo_id = 1; 

At this point, you can combine them into a single update:

UPDATE t_weibo_count SET repost_count = repost_count + 3 WHERE weibo_id = 1; 
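A minimal sketch of such a batching consumer, under the assumption of a generic message source; fetch_messages and apply_increment are placeholders for your actual queue client and storage client, not Weibo's middleware.

```python
from collections import defaultdict

BATCH_SIZE = 100  # assumed batch size

def fetch_messages(n):
    """Placeholder: pull up to n repost events from the message queue."""
    raise NotImplementedError

def apply_increment(weibo_id, delta):
    """Placeholder: e.g. one HINCRBY in Redis or one UPDATE ... + delta in SQL."""
    raise NotImplementedError

def consume_once():
    pending = defaultdict(int)
    for msg in fetch_messages(BATCH_SIZE):   # e.g. {"weibo_id": 1, "delta": 1}
        pending[msg["weibo_id"]] += msg["delta"]
    # Three "+1" events for the same Weibo collapse into a single "+3" write.
    for weibo_id, delta in pending.items():
        apply_increment(weibo_id, delta)
```

The trade-off is a short window during which increments sit only in the consumer's memory, so the batch size (or flush interval) should be kept small enough that a crash loses very little.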

How to Reduce the Storage Cost of a Counting System #

Up to this point, I have already explained how a counting system that supports high-concurrency query requests is implemented. However, in the context of Weibo, where the magnitude of the counts is in the trillions, this poses a higher requirement for us: how to store and access the complete count data within limited storage cost.

As you know, Redis stores data in memory, which is far more expensive than the disk storage MySQL uses. For example, a server might mount 2TB of disk but have only 128GB of memory, meaning the disk capacity is 16 times that of memory. Moreover, for the sake of generality, Redis's data structures carry a lot of pointers and extra metadata, so if we want to store a KV-type count, with an 8-byte long weibo_id as the key and a 4-byte int repost count as the value, it occupies more than 70 bytes in Redis. The wasted space is huge. If you faced this problem, how would you optimize it?

I suggest that you first make some modifications to the original Redis, using new data structures and data types to store count data. In the modifications I made, there are mainly two points:

First, the original Redis stores keys as strings, so an 8-byte long requires 8 (the length of the sdshdr header) + 19 (the maximum number of decimal digits of an 8-byte integer) + 1 ('\0') = 28 bytes. If we store it directly as a long type instead, it only requires 8 bytes, saving 20 bytes of space;

Second, the unnecessary pointers in the original Redis are removed. To store one KV pair, we then only need 8 (weibo_id) + 4 (repost count) = 12 bytes, a big improvement over before, as the small packing sketch below illustrates.
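To make the arithmetic concrete, here is a small sketch of packing one entry into a fixed 12-byte record. It only illustrates the layout; it is not the actual modified-Redis code.

```python
import struct

# An 8-byte weibo_id plus a 4-byte repost count packed into exactly 12 bytes.
ENTRY = struct.Struct("<qi")   # little-endian: int64 + int32, no padding

record = ENTRY.pack(3976542134, 12)
print(len(record))             # 12
print(ENTRY.unpack(record))    # (3976542134, 12)
```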

At the same time, we also use a large array to store count information, and the position in the array is calculated based on the hash value of the weibo_id. The algorithm is shown below:

Insertion:

h1 = hash1(weibo_id)          // first hash computed from the weibo ID
h2 = hash2(weibo_id)          // second hash, used as the probe step to resolve conflicts from the first hash
for s in 0..1000
    pos = (h1 + h2*s) % tsize // on a conflict, step to the next candidate slot
    if isempty(pos) || isdelete(pos)
        t[pos] = item         // write the entry (weibo_id plus its counts) into the array
        break

Query:

h1 = hash1(weibo_id)
h2 = hash2(weibo_id)
for s in 0..1000
    pos = (h1 + h2*s) % tsize // probe slots in the same order as insertion
    if !isempty(pos) && t[pos].weibo_id == weibo_id
        return t[pos]         // entry found, return it
return 0                      // not found

Deletion:

insert(FFFF)                  // do not empty the slot; write a special tombstone flag so later probes keep going

After modifying the original Redis, you also need to consider how to further reduce memory usage. For example, Weibo has repost, comment, view, and like counts. If each count stored its own weibo_id, the four counts of one Weibo would require 8 (weibo_id) * 4 (four copies of the ID) + 4 (repost count) + 4 (comment count) + 4 (like count) + 4 (view count) = 48 bytes. But we can store all the counts of the same weibo_id together, recording the weibo_id only once and saving the storage overhead of the other three copies, which further reduces the storage space.
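Continuing the same illustration, merging the four counts of one Weibo into a single record halves the space compared with keeping four separate id + count entries:

```python
import struct

# Four separate (weibo_id, count) entries: 4 * (8 + 4) = 48 bytes.
separate = 4 * struct.calcsize("<qi")
# One merged entry: weibo_id + repost + comment + like + view = 8 + 4 * 4 = 24 bytes.
merged = struct.calcsize("<qiiii")
print(separate, merged)   # 48 24
```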

However, even with the above optimizations, due to the huge magnitude of the counts and their rapid growth rate, if we were to store the count information in full memory, we would need a large number of machines to support it.

However, Weibo's count data has an obvious hot-spot property: the more recent a Weibo, the more likely it is to be accessed, while the probability of accessing older Weibos is very low. So, to minimize the number of servers, we considered adding SSD disks to the counting service, dumping relatively old data to disk and keeping only recent data in memory. When cold data needs to be read, a separate I/O thread asynchronously loads it from the SSD into a separate Cold Cache.
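A highly simplified sketch of this memory + SSD split, using a background thread and Python's dbm module to stand in for the SSD-resident store; the real counting service implements this inside its customized Redis with dedicated async I/O threads.

```python
import dbm
import threading

hot = {}          # counts of recent Weibos, kept fully in memory
cold_cache = {}   # cold entries that have been loaded back from "SSD"
lock = threading.Lock()

def load_from_ssd(weibo_id: int) -> None:
    """Runs on a separate I/O thread: pull a cold entry into the Cold Cache."""
    with dbm.open("cold_counts.db", "c") as db:   # dbm file stands in for the SSD store
        value = db.get(str(weibo_id).encode())
    if value is not None:
        with lock:
            cold_cache[weibo_id] = int(value)

def get_count(weibo_id: int):
    if weibo_id in hot:                  # recent Weibo: served straight from memory
        return hot[weibo_id]
    with lock:
        if weibo_id in cold_cache:       # cold but already loaded
            return cold_cache[weibo_id]
    # Cold miss: schedule an asynchronous load instead of blocking the request path.
    threading.Thread(target=load_from_ssd, args=(weibo_id,), daemon=True).start()
    return None                          # caller retries or falls back while loading
```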


After these optimizations, our count service can withstand the test of high-concurrency large-scale data, meeting the requirements of performance, cost, and availability.

In conclusion, I used the example of designing Weibo's counting system not only to explain how a counting system is implemented, but also to convey this message: when designing a system, you need to understand what its pain points are and optimize around them. For example, the pain point of the Weibo counting system is storage cost, so much of what we did later revolved around storing the complete count data with limited server resources. Even though deeply customizing an open-source component (Redis) brings significant operations and maintenance costs, it is a necessary trade-off for implementing the counting system.

Course Summary #

That’s all for this class. In this class, I used Weibo as an example to help you understand how to implement a high-concurrency counting system capable of storing billions or even trillions of count records. The key points you need to know are as follows:

The solution of combining a database and a cache is suitable for the early stage of a counting system and can fully support storage services with moderate access and storage volumes. If your project is still in its early stage and the scale is not large, you can start with this solution.

By modifying the original Redis component, we can greatly reduce the memory overhead of stored data.

The SSD + memory solution ultimately solves the cost issue of storing counting data. This approach is suitable for scenarios with clearly separated hot and cold data, and when using it you need to consider how to swap data in and out of memory.

In fact, as internet technology develops, more and more business scenarios need hundreds of gigabytes or even hundreds of terabytes of memory to store business data, yet do not have equally demanding performance or latency requirements. Storing all of this data in memory would undoubtedly waste a lot of cost. Therefore, there are open-source components in the industry that also support using SSDs instead of memory to store cold data, such as Pika and SSDB. I recommend that you take a look at their implementation principles so that you can use them when your project needs them. Weibo's counting service also adopts a similar approach, so if your business needs to store a large amount of in-memory data with obvious hot spots, you may consider doing the same.