17 How to Handle Thousands of Order Requests Per Second in a Message Queue #

Hello, I’m Tang Yang.

At the beginning of the course, I introduced the three goals of high-concurrency system design: performance, availability, and scalability. When it comes to improving performance, we have so far focused on query performance, spending a great deal of time on the distributed transformation of databases and on the principles and usage of various caches. The reason is that in most of the scenarios we encounter, reads far outnumber writes, especially in a system's early stages.

For example, in the initial stage of a social media system, only a few seed users generate content while most users simply "observe" what others are saying. The overall traffic is relatively small, and writes may account for only one percent of it. So even if the overall QPS reaches 10,000 requests per second, writes amount to only about 100 requests per second, and the return on optimizing write performance is rather low.

However, as the business grows, you may run into scenarios with high-concurrency write requests, and a flash sale is the most typical one. Say your e-commerce platform plans a flash sale that starts at 00:00 on the 5th and is limited to the first 200 buyers. As the sale approaches, users will frantically refresh the app or browser to make sure they see the products as early as possible.

At this point, what you face is still a heavy load of read requests. What measures can you take to deal with them?

Since users are querying a small set of products, this is hot data as far as reads are concerned, so you can use caching to intercept as many requests as possible in the upper layers. Assets that can be served statically, such as the store's product images and videos, can be pushed out to CDN nodes, cutting the query load and bandwidth burden on the web servers. The web server, such as Nginx, can read the distributed cache nodes directly, avoiding forwarding requests to application servers such as Tomcat.

Of course, you can also apply rate-limiting strategies, for example discarding duplicate requests from the same user, the same IP, or the same device within a short time window.
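As an illustration, here is a minimal in-memory sketch of that deduplication idea. The class name and window size are made up for this example, and a production setup would usually keep the timestamps in a shared store such as Redis rather than in a local map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Drops duplicate requests arriving from the same key (user ID, IP, or
 * device ID) within a short window. Entries are never evicted here; a real
 * implementation would add expiry or use a shared store such as Redis.
 */
public class DuplicateRequestFilter {
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long windowMillis;

    public DuplicateRequestFilter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request may proceed, false if it is a duplicate. */
    public boolean allow(String key) {
        long now = System.currentTimeMillis();
        // put() returns the previous timestamp; only the first request in
        // the window (or one arriving after it has passed) gets through.
        Long previous = lastSeen.put(key, now);
        return previous == null || now - previous > windowMillis;
    }
}
```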

With these measures in place, you can keep most read requests from ever reaching the database.

After the read pressure has been eased to some extent, at exactly 00:00 the flash sale starts and users instantly send requests to the e-commerce system to generate orders and deduct inventory. These write operations bypass the cache and hit the database directly. Within one second, 10,000 database connections arrive simultaneously and the database is on the verge of collapsing. Finding a way to handle such high-concurrency write requests becomes urgent. This is when you think of message queues.

My Understanding of Message Queues #

You may already have some idea of what a message queue is, so I won't dwell on the concept. Instead, I'd like to share my own perspective. In my years of work, I have always regarded a message queue as a container that temporarily holds data, a tool for bridging the speed gap between a fast system and a slow one. Let me give you an illustrative example.

For example, in ancient times, officials often went to meet the emperor to discuss national affairs, waiting for the emperor to make decisions. But there were many officials, and if they all went to the emperor at the same time and talked one after another, the emperor would surely be overwhelmed. Later, it became a practice for officials to wait at the Meridian Gate after arriving, and the emperor would summon them one by one into the court to discuss state affairs. This way, it relieved the pressure on the emperor in handling matters. You can think of the Meridian Gate as a container that temporarily accommodates officials, which is what we call a message queue.

In fact, you can see the presence of message queues in some components:

In Java thread pools, we use a queue to temporarily store submitted tasks, waiting for available threads to process these tasks.

In operating systems, the bottom halves of interrupt handlers also use work queues to implement deferred execution.

When implementing an RPC framework, we also write requests received from the network into a queue and start multiple worker threads to process them.

In summary, queues are a commonly used component in system design.
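The Java thread pool is the easiest of these to see in code. In the sketch below, a bounded queue plays exactly the "container" role described above, holding submitted tasks until a worker thread is free:

```java
import java.util.concurrent.*;

public class QueueDemo {
    public static void main(String[] args) {
        // The queue temporarily stores submitted tasks until a worker is free.
        BlockingQueue<Runnable> taskQueue = new ArrayBlockingQueue<>(100);
        ExecutorService pool = new ThreadPoolExecutor(
                4, 4,                       // 4 worker threads
                0L, TimeUnit.MILLISECONDS,  // no idle-thread timeout
                taskQueue);

        for (int i = 0; i < 10; i++) {
            final int taskId = i;
            pool.submit(() -> System.out.println("processing task " + taskId));
        }
        pool.shutdown();
    }
}
```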

So how can we use message queues to solve problems in a flash sale scenario? Next, let’s look at the role that message queues play in a flash sale scenario with a specific example.

Shaving the Peak Write Traffic in Flash Sale Scenarios #

As mentioned earlier, in a flash sale scenario the database experiences a huge volume of write traffic within a short period. Following our earlier approach, we would shard the data across multiple databases, and if sharding is already in place, we would add more database instances to absorb the extra writes. However, both sharding and adding databases are complex undertakings, because the data has to be migrated, which can take days or even weeks.

In a flash sale, though, the high-concurrency writes are neither continuous nor frequent; they last only for a few seconds or tens of seconds after the sale starts. Spending days or weeks scaling the database up to cope with this momentary peak, and then days more scaling it back down once the sale is over, is clearly not cost-effective.

Therefore, our approach is to temporarily store the flash sale requests in a message queue, while the business server immediately responds to the user with a message such as "the sale results are being calculated", freeing up system resources to handle other requests.
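On the request-handling side, this amounts to little more than produce-and-return. Here is a minimal sketch assuming Kafka as the queue; the topic name "seckill-orders" and the payload format are made up for illustration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SeckillController {
    private final KafkaProducer<String, String> producer;

    public SeckillController() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    /** Handles a purchase request: enqueue it and return immediately. */
    public String placeOrder(String userId, String skuId) {
        // Write the request to the queue instead of hitting the database.
        producer.send(new ProducerRecord<>("seckill-orders", userId, skuId));
        // Respond right away; the real result is computed asynchronously.
        return "the sale results are being calculated";
    }
}
```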

We then start several queue processors in the background to consume messages from the queue and run the logic of verifying inventory and placing orders. Because only a limited number of queue processing threads are running, the number of concurrent requests hitting the backend database is also bounded. In the meantime, requests can safely pile up in the message queue, and once inventory is sold out, the accumulated requests can simply be discarded.
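The consumer side could look like the sketch below, again assuming Kafka. The methods inventoryRemaining and deductInventoryAndCreateOrder are hypothetical placeholders for the actual stock check and database writes:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OrderWorker implements Runnable {
    @Override
    public void run() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "seckill-workers"); // workers share one consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("seckill-orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Only as many workers as we start ever hit the database.
                    if (!inventoryRemaining()) {
                        continue; // sold out: discard the backlog
                    }
                    deductInventoryAndCreateOrder(record.key(), record.value());
                }
            }
        }
    }

    private boolean inventoryRemaining() { return true; }              // stock check stub
    private void deductInventoryAndCreateOrder(String u, String s) {}  // DB write stub
}
```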

[Figure: flash sale requests buffered in a message queue, consumed by a limited number of queue processors before reaching the database]

This is the main role a message queue plays in a flash sale system: peak shaving, that is, smoothing out short-lived spikes in traffic. The backlog does add some processing delay, but as long as we keep monitoring the queue length, we can add queue processing machines once the backlog crosses a threshold and raise the consumption capacity. Besides, flash sale users have a certain tolerance for a brief wait before learning the result.

Note, however, that I said a "brief" delay. If users are not shown the flash sale results for a long time, they may suspect that something shady is going on with your event. So when using a message queue to absorb a traffic peak, you need to carefully estimate the queue processing time, the volume of incoming write traffic, and the database's processing capacity, and then decide how many queue processors to deploy accordingly.

For example, if there are 1,000 flash sale items and processing one purchase request takes 500ms, handling them all serially would take 500s. If you deploy 10 queue processors instead, the total processing time drops to 50s, so users wait at most 50s to see the result, which is acceptable. At that point only 10 requests reach the database concurrently, which is no significant burden.

Simplifying the Business Process of Flash Sale Requests Through Asynchronous Processing #

In fact, when your e-commerce system is "attacked" by a flood of write requests, beyond the message queue's main role of peak shaving, asynchronous processing can also simplify the business process of flash sale requests and improve system performance.

Think about it: in the scenario above, processing one purchase request takes 500ms. Analyzing the whole purchase flow, you find it contains both primary and secondary business logic: the primary flow generates the order and deducts inventory, while the secondary flow issues coupons and adds user points after the order succeeds.

If issuing a coupon takes 50ms and adding points takes another 50ms, then moving those two operations into a separate queue processor shortens the synchronous flow to 400ms, a 20% performance improvement. Processing the 1,000 items now takes 400s, and if we still want results within 50s, we only need to deploy 8 queue processors.
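In code, the split might look like the hedged sketch below: only the critical path stays synchronous, while the secondary flows are published as an event (here a hypothetical "order-succeeded" topic) for a separate consumer to handle:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PurchaseService {
    private final KafkaProducer<String, String> producer;

    public PurchaseService(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    /** Main flow: only order creation and inventory deduction block the response. */
    public void handlePurchase(String userId, String skuId) {
        createOrder(userId, skuId);
        deductInventory(skuId);
        // The secondary flows become an event, taking their ~100ms off
        // the critical path.
        producer.send(new ProducerRecord<>("order-succeeded", userId, skuId));
    }

    /** Runs in a separate queue processor, at its own pace. */
    public void onOrderSucceeded(String userId, String skuId) {
        issueCoupon(userId); // ~50ms, no longer blocks the purchase response
        addPoints(userId);   // ~50ms
    }

    private void createOrder(String userId, String skuId) { /* DB insert */ }
    private void deductInventory(String skuId)            { /* DB update */ }
    private void issueCoupon(String userId)               { /* coupon service call */ }
    private void addPoints(String userId)                 { /* points service call */ }
}
```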

After part of the business flow is handled asynchronously, the deployment structure of our flash sale system changes as well:

[Figure: deployment structure of the flash sale system after introducing asynchronous processing]

Implementing Loose Coupling in the Flash Sale System #

Besides asynchronous processing and peak shaving, another role message queues play in the flash sale system is decoupling.

For example, suppose the data team tells you they want to analyze the flash sale data after the event ends, to evaluate product popularity, buyer characteristics, user satisfaction with the interaction, and other metrics. We need to send them a large amount of data. How do we do it?

One idea is to make synchronous calls over HTTP or RPC: the data team exposes an interface, and we push the flash sale data to it in real time. However, this approach has two problems:

The systems become tightly coupled: when the data team's interface fails, the availability of the flash sale system suffers.

Whenever the data system needs a new field, the interface parameters have to change, which means the flash sale system has to be modified as well.

In this case, we can consider using message queues to reduce the direct coupling between the business system and the data system.

After the flash sale system generates a record of purchase data, it first sends the complete data to the message queue; the data team then subscribes to that topic, receives the data, and filters and processes it as they see fit.

With the flash sale system decoupled this way, a failure in the data system no longer affects it, and when the data system needs a new field, it only has to parse that field out of the messages in the queue.
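On the data team's side, that parsing step might look like the sketch below. The JSON layout and the field names skuId and timestamp are assumptions for illustration; Jackson is used for parsing:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class PurchaseDataConsumer {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Called for each message consumed from the purchase-data topic. */
    public void onMessage(String json) throws Exception {
        JsonNode event = MAPPER.readTree(json);
        // The data system picks out the fields it needs; when it wants a
        // new field, it parses it here and the flash sale system is untouched.
        String skuId = event.get("skuId").asText();
        long boughtAt = event.get("timestamp").asLong();
        analyze(skuId, boughtAt);
    }

    private void analyze(String skuId, long boughtAt) { /* offline analysis */ }
}
```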

[Figure: the flash sale system publishing purchase data to a message queue topic that the data system subscribes to]

Asynchronous processing, decoupling, and peak shaving are the main roles message queues play in the design of a flash sale system. Asynchronous processing trims steps out of the business flow and improves performance; peak shaving smooths out the burst of traffic reaching the flash sale system so the business logic runs steadily; and decoupling lets the flash sale system and the data system operate independently, so a change in one does not affect the other.

If your system needs better write performance, looser coupling, and the ability to withstand high-concurrency write traffic, a message queue is worth considering.

Lesson Summary #

In this lesson, drawing on my own hands-on experience, I introduced the roles that message queues play in the design of high-concurrency systems, along with some points to watch out for. The key takeaways are:

Peak shaving and valley filling is the main function of message queues, but it can cause delays in request processing.

Asynchronous processing is a magic tool for improving system performance, but you need to distinguish between synchronous and asynchronous processes. At the same time, there is a risk of message loss, so we need to consider how to ensure that messages arrive.

Decoupling can improve the robustness of your overall system.

Of course, you should also be aware that while message queues solve these problems, they add complexity to the system. For instance, in the business flow above, where exactly is the boundary between the synchronous and asynchronous parts? Can messages be lost or duplicated? How do we reduce the delay in request processing? Does the order in which messages arrive affect the correctness of the business flow? Should a message be re-sent if processing it fails? These are all questions we must consider. In the next two lessons, I will cover solutions to the two most important ones: how to handle message loss and duplication, and how to reduce message delay.

Introducing message queues brings new problems that call for new solutions. This is the challenge of system design, and also its unique charm, and it is by meeting these challenges that we keep improving our technical and system design capabilities.