
35 Traffic Control - How Do We Manipulate Traffic in a High-Concurrency System #

Hello, I’m Tang Yang.

In the previous lesson, I introduced you to two common lossy service protection strategies in microservices architecture: circuit breaking and degradation. Both protect the availability of the core system by temporarily shutting down certain non-core services or components. However, not every scenario can rely on circuit breaking and degradation, for example the peak sales periods in e-commerce systems such as Double Eleven (November 11) and June 18.

In these scenarios, peak traffic will exceed the estimated peak and put heavy pressure on the core services as well, and you cannot simply degrade the core services wholesale. So how do you keep the services stable in such cases? You can consider traffic throttling. And when it comes to throttling, you may have stumbled in the following ways:

Choosing an inappropriate throttling algorithm, resulting in ineffective throttling;

Enabling throttling but finding that overall performance is compromised;

Implementing throttling only on a single machine without applying it to the entire system.

To put it simply, the reason why you encounter these problems is that you are not proficient in throttling algorithms and their practical applications. In this lesson, I will guide you through these topics, hoping that you can apply this knowledge to practical projects and improve the robustness of the entire system.

What is Flow Control #

Flow control means limiting the number of concurrent requests that reach a system, so that the system can respond to a portion of user requests while rejecting the traffic that exceeds the limit, keeping the system as a whole available. Flow control strategies are generally deployed at the entry layer of a service, such as an API gateway, to shape the overall system traffic. In a microservices architecture, you can also introduce flow control in RPC clients to ensure that individual services are not overwhelmed by excessive traffic.

In fact, you may have applied flow control strategies in both work and study. Let me give you a few examples.

For example, during the National Day holiday you want to visit Jiuzhaigou, but when you arrive you find a temporary notice that only 100,000 tickets are sold per day, and tourists who fail to get a ticket can only try again early the next morning. This is a common flow control strategy: it controls the flow within a certain period of time (here, one day) to avoid overcrowding in the scenic area and keep tourists safe. If you have ever squeezed onto a subway during rush hour, you can relate even more: the Beijing subway applies flow control during the morning rush, directly limiting the number of people entering the stations to prevent overcrowding and keep passengers safe.

Another example is the sliding window in the TCP protocol, which controls the flow of network transmission. Imagine there were no flow control: if the receiving end processes data more slowly while the sending end keeps sending at its previous rate, congestion is inevitable. The TCP sliding window can be understood as the size of the buffer that the receiving end can currently provide.

The receiving end includes the size of this window in the ACK messages it sends back to the sending end, and the sending end adjusts its sending rate based on the size of the sliding window. If the receiving end processes some of the data in the buffer, the sliding window grows and the sending end speeds up; conversely, if the receiving end has received data but not yet processed it, the sliding window shrinks and the sending end slows down.


Whether in a monolithic architecture or a microservices architecture, there are multiple dimensions along which we can control the flow of incoming traffic, including:

You can limit the number of requests processed by the system per minute.

You can set a limit on the request flow per minute for individual interfaces.

You can restrict the number of requests sent by a single IP, user ID, or device ID within a certain period of time.

For an open platform serving multiple third-party applications, each third-party application is uniquely identified by an app key, so you can also limit the access rate of individual app keys to the interface.

All of the limits above are implemented on top of flow control algorithms. So what are the common flow control algorithms, and how do you implement flow control?

Rate Limiting Algorithms You Should Know #

Fixed Window and Sliding Window Algorithms #

We know that the purpose of rate limiting is to restrict the total number of requests reaching a system within a certain time period. Say we want the system to handle only 10,000 requests per minute. The most straightforward approach is to count the requests made to the system within the current minute: once the count exceeds the limit of 10,000, the rate limiting strategy triggers and subsequent requests fail. When the next minute arrives, the count is reset and requests are counted again against the limit.

This algorithm is called the fixed window algorithm. When implementing it, the first step is to start a timer that periodically resets the request count. For example, if you want to limit the number of accesses per second, the implementation code would be as follows:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Request counter for the current window
private AtomicInteger counter = new AtomicInteger(0);

ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

// Reset the counter once per second, i.e., at the start of each fixed window
timer.scheduleAtFixedRate(new Runnable() {
    @Override
    public void run() {
        counter.set(0);
    }
}, 0, 1, TimeUnit.SECONDS);

The rate limiting logic is very simple. You only need to compare the count value to the threshold:

// allowedLimit is the maximum number of requests allowed in one window
public boolean isRateLimit() {
    return counter.incrementAndGet() >= allowedLimit;
}

Although this algorithm is very simple to implement, it has a major flaw: it cannot effectively limit bursts of traffic concentrated in a short period of time. Suppose we limit the system to 10 requests per second. If all 10 requests of one second arrive in its last 10 milliseconds, and all 10 requests of the next second arrive in its first 10 milliseconds, then 20 requests hit the system within a span of 20 milliseconds. This exceeds the intended rate, but because the 20 requests fall into two different windows, the limit is never triggered and the rate limiting strategy fails.

To fix this defect, we can use the sliding window algorithm. The idea is to divide the time window into multiple small sub-windows, each with its own request count. For example, we can divide a 1-second window into five 200 ms sub-windows. When a new request arrives between 1 s and 1.2 s, we count the requests of the previous full second, that is, the total for the range 0.2 s to 1.2 s. If that total exceeds the rate limit threshold, the flow control strategy is executed.


The sliding window algorithm solves the problem of uncontrollable burst traffic at critical time points, but it increases the space complexity because it needs to store the count in each small time window.
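
To make this concrete, here is a minimal single-machine sketch of a sliding window counter, assuming a 1-second window split into five 200 ms sub-windows; the class and field names are illustrative and not taken from any particular library.

// A sliding window limiter sketch: one counter per 200 ms sub-window, and a
// request is allowed only if the sum over the last second stays below the threshold.
public class SlidingWindowLimiter {
    private static final int SUB_WINDOWS = 5;       // number of sub-windows per second
    private static final long SUB_WINDOW_MS = 200;  // width of each sub-window
    private static final long WINDOW_MS = SUB_WINDOWS * SUB_WINDOW_MS;

    private final int allowedLimit;                 // max requests per full window
    private final int[] counts = new int[SUB_WINDOWS];
    private final long[] subWindowStart = new long[SUB_WINDOWS];

    public SlidingWindowLimiter(int allowedLimit) {
        this.allowedLimit = allowedLimit;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        int idx = (int) ((now / SUB_WINDOW_MS) % SUB_WINDOWS);
        long currentStart = now - now % SUB_WINDOW_MS;

        // Reuse the slot: if it still holds an older sub-window, reset it first
        if (subWindowStart[idx] != currentStart) {
            counts[idx] = 0;
            subWindowStart[idx] = currentStart;
        }

        // Sum the requests of all sub-windows that fall within the last second
        int total = 0;
        for (int i = 0; i < SUB_WINDOWS; i++) {
            if (now - subWindowStart[i] < WINDOW_MS) {
                total += counts[i];
            }
        }

        if (total >= allowedLimit) {
            return false;   // over the threshold: trigger flow control
        }
        counts[idx]++;
        return true;
    }
}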

Although the sliding window algorithm solves the burst problem at window boundaries, like the fixed window algorithm it still cannot limit traffic that is concentrated within a short period inside the window; in other words, it cannot smooth the traffic. Therefore, in actual projects I rarely use time-window-based rate limiting algorithms and instead use two other rate limiting algorithms: the leaky bucket algorithm and the token bucket algorithm.

Leaky Bucket Algorithm and Token Bucket Algorithm #

The principle of the leaky bucket algorithm is simple. It is like adding a leaky bucket between the traffic generator and the receiver. The traffic enters and is stored in the bucket, and the outlet of the leaky bucket will leak the traffic to the receiver (i.e., the service interface) at a fixed rate.

If the incoming traffic increases rapidly within a certain period of time and exceeds the capacity of the leaky bucket, the excess traffic will trigger the flow control strategy and be rejected.

After applying the leaky bucket algorithm, the randomly generated traffic will be shaped into smoother traffic before reaching the server, thus avoiding the impact of sudden high traffic on the service interface. This is similar to the mantra of the Nine Yang Manual in the novel “The Heaven Sword and Dragon Saber”: “He is strong because he is strong, the clear wind brushes the mountain ridge; he is horizontal because he is horizontal, the bright moon shines on the great river.” In other words, regardless of how strong or irregular the incoming traffic is, after being processed by the leaky bucket, the outgoing traffic will become relatively smooth.

In practice, we generally implement the leaky bucket with a message queue. Incoming traffic is first queued in the message queue and then consumed at a fixed rate by a fixed number of queue handlers; if the message queue overflows, the subsequent traffic is rejected. Isn't the thinking behind this similar to using message queues to shave peaks and fill valleys?
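
Here is a minimal in-memory sketch of the leaky bucket idea, using a bounded queue in place of a real message queue; the capacity and leak rate below are illustrative.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A leaky bucket sketch: requests pile up in a bounded queue (the bucket)
// and leak out to the handler at a fixed rate.
public class LeakyBucket {
    private final BlockingQueue<Runnable> bucket = new ArrayBlockingQueue<>(100);

    public LeakyBucket(int requestsPerSecond) {
        ScheduledExecutorService leaker = Executors.newSingleThreadScheduledExecutor();
        // Drain the bucket at a fixed interval (at least 1 ms), regardless of
        // how bursty the incoming traffic is
        leaker.scheduleAtFixedRate(() -> {
            Runnable request = bucket.poll();
            if (request != null) {
                request.run();
            }
        }, 0, Math.max(1, 1000 / requestsPerSecond), TimeUnit.MILLISECONDS);
    }

    // Returns false (flow control triggered) when the bucket is already full
    public boolean submit(Runnable request) {
        return bucket.offer(request);
    }
}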

The token bucket algorithm works as follows (a minimal sketch follows the list):

If we need to limit the number of accesses to N per second, then every 1/N of a second we put a token into the bucket.

Before processing a request, we need to obtain a token from the bucket. If there are no more tokens in the bucket, we need to wait for new tokens or directly reject the service.

The total number of tokens in the bucket also needs to have a limit. If it exceeds the limit, no more tokens can be added to the bucket. This way, the total number of tokens can be limited, which can to a certain extent avoid the problem of instantaneous peak traffic.
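
Below is a minimal single-machine token bucket sketch, assuming lazy refill (tokens are added when a request arrives rather than by a background thread); the names and numbers are illustrative.

// A token bucket sketch: tokens accumulate at a fixed rate up to the bucket's
// capacity, and each request must take one token or be rejected.
public class TokenBucket {
    private final long capacity;        // maximum number of tokens the bucket holds
    private final long refillPerSecond; // N: tokens added per second
    private double tokens;              // current number of tokens
    private long lastRefillTime = System.nanoTime();

    public TokenBucket(long capacity, long refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;
    }

    public synchronized boolean tryAcquire() {
        refill();
        if (tokens < 1) {
            return false;   // no token available: reject or make the caller wait
        }
        tokens -= 1;
        return true;
    }

    // Lazily add the tokens accumulated since the last call, capped at capacity
    private void refill() {
        long now = System.nanoTime();
        double newTokens = (now - lastRefillTime) / 1_000_000_000.0 * refillPerSecond;
        tokens = Math.min(capacity, tokens + newTokens);
        lastRefillTime = now;
    }
}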

If I had to choose between these two algorithms, I prefer the token bucket algorithm. The reason is that when facing burst traffic, the leaky bucket algorithm handles it by buffering requests in the bucket, which increases response time and does not fit the low-latency requirements of Internet businesses. The token bucket algorithm, on the other hand, can accumulate a certain number of tokens in the bucket and therefore absorb some burst traffic. So I generally use the token bucket algorithm to implement rate limiting solutions. The rate limiter in Guava is also implemented with the token bucket algorithm.
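
For reference, this is roughly how Guava's RateLimiter is used; the rate of 10 permits per second and the handleRequest/rejectRequest methods are placeholders for your own business logic.

import com.google.common.util.concurrent.RateLimiter;

// RateLimiter.create(10.0) builds a token-bucket-based limiter that issues
// 10 permits per second; the rate here is just an example value.
RateLimiter limiter = RateLimiter.create(10.0);

if (limiter.tryAcquire()) {
    handleRequest();   // got a token: process the request (hypothetical handler)
} else {
    rejectRequest();   // no token: trigger the flow control strategy (hypothetical)
}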

As you can see, the token bucket algorithm needs to store the number of tokens. If rate limiting runs on a single machine, we can keep the count in an in-process variable. In a distributed environment, however, different machines cannot share in-process variables, so we generally store the token count in Redis. Every request then has to call Redis to get a token, which adds a few milliseconds of latency and some performance overhead. Therefore, a common compromise is to fetch a batch of tokens at a time instead of a single token, which minimizes the number of requests to Redis.
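
Below is a minimal sketch of this batch-fetching compromise using the Jedis client, assuming a Redis key (here called rate:tokens) that a separate refill job periodically resets to the per-interval quota; the key name, batch size, and connection details are all illustrative.

import redis.clients.jedis.Jedis;

// Fetch tokens from Redis in batches and hand them out locally, so that not
// every request has to pay a Redis round trip.
public class RedisBatchTokenFetcher {
    private static final String TOKEN_KEY = "rate:tokens"; // hypothetical key reset by a refill job
    private static final int BATCH_SIZE = 10;              // tokens fetched per round trip

    private final Jedis jedis = new Jedis("localhost", 6379);
    private long localTokens = 0;

    public synchronized boolean tryAcquire() {
        if (localTokens <= 0) {
            localTokens = fetchBatchFromRedis();
        }
        if (localTokens <= 0) {
            return false;   // quota for this interval exhausted: reject the request
        }
        localTokens--;
        return true;
    }

    // Take up to BATCH_SIZE tokens in one round trip instead of one per request
    private long fetchBatchFromRedis() {
        long remaining = jedis.decrBy(TOKEN_KEY, BATCH_SIZE);
        if (remaining >= 0) {
            return BATCH_SIZE;                 // got a full batch
        }
        // remaining is negative here: the refill job will reset the key later
        return Math.max(BATCH_SIZE + remaining, 0);   // partial batch or nothing
    }
}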

Summary of the Course #

The above is the complete content of this lesson. In this lesson, I introduced you to the definition and function of rate limiting, as well as several common rate limiting algorithms. The key points you need to know are:

Rate limiting is a common service protection strategy, which allows you to control traffic on multiple dimensions, such as overall service, individual service, individual interface, individual IP, or individual user.

Algorithms based on the time window dimension include the fixed window algorithm and the sliding window algorithm. Although both provide a certain degree of rate limiting, they cannot smooth the traffic.

The token bucket and leaky bucket algorithms can shape traffic and make it smoother. Because the token bucket algorithm can also absorb a certain amount of burst traffic, it is the one more commonly used in actual projects.

A rate limiting strategy is a standard part of microservice governance, but choosing the threshold is difficult in practice: set it too low and normal requests are wrongly rejected; set it too high and the limit provides little protection. Therefore, in actual projects we usually keep the threshold in a configuration center so it can be adjusted dynamically. At the same time, we can measure the actual capacity of the overall system and of each microservice through regular stress testing, and then set appropriate thresholds based on those results.