42 How to Deal with Sudden Traffic Surges, Overflow, and Throttling #

Hello, I’m Wen Ming.

In the previous lessons, we have learned about code optimization and cache design, both of which are closely related to the overall performance of an application and deserve our attention. However, in real-world business scenarios, we also need to consider the impact of surging traffic on performance. Surging traffic can be normal, such as traffic generated by breaking news or promotional activities, or it can be abnormal, such as DDoS attacks.

OpenResty is mainly used at the access layer, in front-end web applications such as WAFs and API gateways, all of which need to deal with both the normal and abnormal traffic surges just mentioned. After all, if surging traffic is not handled properly, backend services are prone to collapse and the business can no longer respond normally. So today we will focus on methods for dealing with traffic surges.

Traffic Control #

Traffic control is one of the essential functions of both WAF and API gateways. It directs and controls incoming traffic through various algorithms to ensure the normal operation of upstream services, thus maintaining the overall health of the system.

Because the processing capacity of the backend is limited, we have to weigh factors such as cost, user experience, and system stability. Whatever algorithm is used, it will inevitably slow down or even reject some legitimate requests, sacrificing part of the user experience. Traffic control is therefore always a balance between business stability and user experience.

In fact, traffic control is also common in real life. For example, during peak periods such as the Spring Festival travel rush, subway stations, train stations, airports, and other transportation hubs have a limited processing capacity. Therefore, in order to ensure the safe operation of transportation, passengers need to queue and enter in batches.

This naturally affects the passenger experience, but overall, it ensures the efficient and safe operation of the system. If there is no queuing and batching, and everyone rushes into the station, the final result is likely to be a complete system breakdown.

Returning to the technical side, let's take an example. Assume that an upstream service is designed to handle at most 10,000 requests per minute. During peak periods, if there is no traffic control at the entrance and 20,000 requests per minute pile up, the service's performance will degrade: it may only manage 5,000 requests per minute and keep deteriorating, possibly ending in complete unavailability. Obviously, this is not the result we want.

To cope with such sudden traffic, the commonly used traffic control algorithms are leaky bucket and token bucket.

Leaky Bucket Algorithm #

Let’s start by taking a look at the Leaky Bucket Algorithm. Its purpose is to maintain a constant rate of requests and smooth out bursts of traffic. But how does it achieve this?

Let’s examine the following conceptual diagram from the Wikipedia page on the Leaky Bucket Algorithm:

We can imagine the client’s traffic as water flowing out from a pipe, with an uncertain flow rate. The outer traffic processing module is like a bucket that collects the water, and there is a hole at the bottom of this bucket for the water to leak through. This is actually how the Leaky Bucket Algorithm got its name. Clearly, this algorithm has several advantages.

Firstly, it ensures that the rate at which water flows out of the bucket remains constant, regardless of whether the incoming water is a trickle or a deluge. This stable flow rate is friendly to upstream services and is the essence of traffic shaping.

Secondly, the bucket itself has a certain capacity and can accumulate some water to wait for it to flow out. This is equivalent to queuing for requests from the end user if they cannot be processed immediately.

Thirdly, water that exceeds the capacity of the bucket is not collected by the bucket but allowed to flow away directly. This corresponds to returning a failure message to the client if the number of requests exceeds the length of the queue. At this point, the server is overwhelmed, and naturally, there is no need to even put the request in the queue.

Having covered these advantages, how do we implement the algorithm? Let's take resty.limit.req, the rate-limiting library bundled with OpenResty that implements the Leaky Bucket Algorithm, as an example. I will go into more detail in the next lesson; for today, a brief look is enough. Here are some key lines of code from the library:

local elapsed = now - tonumber(rec.last)
excess = max(tonumber(rec.excess) - rate * abs(elapsed) / 1000 + 1000, 0)
if excess > self.burst then
    return nil, "rejected"
end
-- return the delay in seconds, as well as excess
return excess / rate, excess / 1000

Let me briefly explain these lines. elapsed is the number of milliseconds between the current request and the previous one, and rate is the rate we set, in requests per second. Because the smallest unit of rate is 0.001 r/s, the implementation multiplies it by 1,000 before doing the calculation.

excess represents the number of requests still waiting in the queue (scaled by 1,000, like rate). If it is 0, the bucket is empty and nothing is queued. burst refers to the capacity of the entire bucket. If excess is already greater than burst, the bucket is full, and any additional incoming traffic is discarded directly. If excess is greater than 0 and less than burst, the request joins the queue and waits to be processed. The excess / rate returned at the end is the waiting time in seconds.
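To make the arithmetic concrete, here is a small worked example with made-up numbers: suppose rate is set to 2 r/s, 100 milliseconds have passed since the previous request, and the queue was empty before.

-- rate = 2 r/s is stored internally as 2000; elapsed = 100 ms; no backlog yet
local rate, elapsed, prev_excess = 2000, 100, 0
local excess = math.max(prev_excess - rate * elapsed / 1000 + 1000, 0)
-- excess = 800, i.e. 0.8 of a request is now queued
local delay = excess / rate
-- delay = 0.4: this request is allowed through after waiting 0.4 seconds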

In this way, with the same processing capacity of the backend service, we can control the queuing time for bursts of traffic by adjusting the size of burst. Whether to directly inform the user that the current request volume is too large and to retry later, or to make the user wait for a longer period of time, depends on your business scenario.
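Before moving on, here is a minimal sketch of how resty.limit.req is typically wired into the access phase. The shared dict name and the rate/burst numbers below are just placeholders; the details are covered in the next lesson.

local limit_req = require "resty.limit.req"

-- 200 requests per second with a burst (queue) of 100 extra requests;
-- "my_limit_req_store" must be declared with lua_shared_dict in nginx.conf
local lim, err = limit_req.new("my_limit_req_store", 200, 100)
if not lim then
    ngx.log(ngx.ERR, "failed to instantiate resty.limit.req: ", err)
    return ngx.exit(500)
end

-- rate-limit by client IP, as in the leaky bucket description above
local delay, err = lim:incoming(ngx.var.binary_remote_addr, true)
if not delay then
    if err == "rejected" then
        return ngx.exit(503)  -- the bucket is full: reject outright
    end
    ngx.log(ngx.ERR, "failed to limit request: ", err)
    return ngx.exit(500)
end

if delay >= 0.001 then
    ngx.sleep(delay)  -- the request is queued: wait its turn before proceeding
end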

Token Bucket Algorithm #

The purpose of the Token Bucket Algorithm is the same as the Leaky Bucket Algorithm - to ensure that backend services are not overwhelmed by sudden bursts of traffic. However, the implementation of these two algorithms is different.

With the Leaky Bucket Algorithm, we usually use the client IP as the key for rate limiting and throttling. This means that for each end user, the outflow rate of the leaky bucket is fixed. However, this creates a problem:

If the request frequency of user A is high while the request frequency of other users is low, even if the overall service load is not high at this time, the Leaky Bucket Algorithm may slow down or reject some of user A’s requests, even though the service can actually handle them.

This is where the Token Bucket Algorithm comes in.

While the Leaky Bucket Algorithm focuses on smoothing traffic, the Token Bucket Algorithm allows bursts of traffic through to the backend service. Its principle is to add tokens to a bucket at a fixed rate; as long as the bucket is not full, tokens keep accumulating. An incoming request must fetch a token from the bucket before it can be handed to the backend, and if no tokens are left, the request is rejected.
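To illustrate the idea, here is a toy, per-worker sketch of a token bucket. It is not shared across Nginx worker processes and is not production code; all names and numbers are made up.

local TokenBucket = {}
TokenBucket.__index = TokenBucket

-- rate: tokens added per second; capacity: maximum number of tokens the bucket holds
function TokenBucket.new(rate, capacity)
    return setmetatable({
        rate = rate,
        capacity = capacity,
        tokens = capacity,
        last = ngx.now(),
    }, TokenBucket)
end

function TokenBucket:take(n)
    local now = ngx.now()
    -- refill according to the elapsed time, capped at the bucket capacity
    self.tokens = math.min(self.capacity, self.tokens + (now - self.last) * self.rate)
    self.last = now
    if self.tokens >= n then
        self.tokens = self.tokens - n
        return true   -- enough tokens: let the request through
    end
    return false      -- bucket is empty: reject the request
end

-- usage: refill at 5 tokens per second, allow bursts of up to 20 requests;
-- in practice the bucket must be created once (e.g. at module load time) and reused
local bucket = TokenBucket.new(5, 20)
if not bucket:take(1) then
    return ngx.exit(503)
end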

However, the built-in rate limiting and throttling libraries provided by OpenResty do not implement the Token Bucket Algorithm. So here, I will give you a simple introduction using the code of the open-source rate limiting module lua-resty-limit-rate by Upyun:

local limit_rate = require "resty.limit.rate"

-- global 20r/s 6000r/5m
local lim_global = limit_rate.new("my_limit_rate_store", 100, 6000, 2)

-- single 2r/s 600r/5m
local lim_single = limit_rate.new("my_limit_rate_store", 500, 600, 1)

local t0, err = lim_global:take_available("__global__", 1)
local t1, err = lim_single:take_available(ngx.var.arg_userid, 1)

if t0 == 1 then
    return -- global bucket is not hungry
else
    if t1 == 1 then
        return -- single bucket is not hungry
    else
        return ngx.exit(503)
    end
end

In this code snippet, we set up two token buckets: a global one, and a per-user one keyed on ngx.var.arg_userid. Combining the two buckets in this way gives us the following benefits:

  • If the global token bucket still has tokens available, we don’t need to check the user-specific token bucket. If the backend service is functioning properly, we can serve as many burst requests from the users as possible.
  • If the global token bucket runs out of tokens, we can’t reject requests indiscriminately. In this case, we need to check the token bucket for each individual user and reject requests from users with a high frequency of burst requests. This ensures that requests from other users are not affected.
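Note that, like resty.limit.req, this module keeps its state in a shared memory zone, so the snippet above assumes a declaration along these lines in nginx.conf (the 100m size is just a placeholder), with the Lua code itself running in the access phase:

lua_shared_dict my_limit_rate_store 100m;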

Clearly, compared to the Leaky Bucket Algorithm, the Token Bucket Algorithm is more flexible and allows for bursts of traffic to pass through to the backend service. Of course, each algorithm has its advantages and disadvantages, so you can choose which one to use based on your own situation.

Nginx’s Rate Limiting Module #

After talking about these two algorithms, let’s take a look at how rate limiting is implemented in the familiar Nginx. In Nginx, the most commonly used rate limiting module is the limit_req module. Here is a simple configuration:

limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

server {
    location /search/ {
        limit_req zone=one burst=5;
    }
}

In this configuration, the client's IP address is used as the key, a 10 MB shared memory zone named one is allocated, and the rate is limited to 1 request per second.

In the location block of the server configuration, the one zone is referenced with burst set to 5. This means that, beyond the rate of 1 request per second, up to 5 extra requests are allowed to queue and wait for processing, which provides some buffering. Note that if burst is not set, requests exceeding the rate limit are rejected directly.

Nginx's limit_req module is implemented with the leaky bucket algorithm, so it is essentially equivalent to OpenResty's resty.limit.req library introduced earlier.

Closing Remarks #

In fact, the biggest problem with configuring rate limiting and throttling in Nginx is that it cannot be modified dynamically: changes to the configuration file only take effect after a reload, which is obviously unacceptable in a rapidly changing environment. In the next lesson, we will explore how to implement dynamic rate limiting and throttling in OpenResty.

Lastly, I’ll leave you with a thought-provoking question. Looking at it from the perspective of a WAF (Web Application Firewall) and API gateway, are there better methods for distinguishing between legitimate and malicious requests? This is because for sudden bursts of traffic from legitimate users, we can quickly scale up the backend services to enhance their capacity; whereas for malicious requests, it would be best to deny them outright at the entry layer.

I hope you can carefully ponder over this question and leave a comment to discuss it with me. Feel free to share this article with your colleagues and friends, so that we can learn and progress together.