11 Load Balancing - Why Is There Such a Big Discrepancy in the Load Received by Nodes? #

Hello, I’m He Xiaofeng. In the previous lecture, I explained “routing selection for multiple scenarios,” the core of which is how to choose the appropriate target machine for different scenarios. Today, let’s move on to a new topic and see how load balancing is implemented in RPC.

A Requirement #

Before we dive into the topic, I’d like to share a requirement with you that was raised by our company’s business department.

The problem they encountered was this: during a traffic peak, they suddenly noticed a drop in the availability of their online service. Investigation revealed that a few of the machines were quite old: they belonged to an early batch of containers with low resource configurations that had been left behind during a scale-down. When traffic peaked, these containers couldn’t handle the load, and the business department asked whether we had any good service governance strategy.

This problem can actually be easily solved. The solution we initially proposed was to lower the weight of these machines on the governance platform. This would naturally reduce the amount of traffic routed to them.

However, the business department gave further feedback: by the time they discovered the decrease in service availability, business requests had already been affected, and applying our solution would take additional time, during which the business might suffer further losses. So they raised a requirement: does the RPC framework have an intelligent load balancing mechanism that can automatically and promptly control the amount of requests each service node receives?

This requirement is very reasonable, and the problem behind it is common. Although our service governance platform can dynamically control the amount of traffic received by online service nodes, by the time the business side notices high load or slow responses and adjusts the node weights, the availability of the online service has likely already been affected.

Having read this, have you come up with any good solutions? Next, let’s explore the topic of load balancing in RPC frameworks based on this problem.

What is Load Balancing? #

Let me briefly introduce load balancing. When a single service node cannot handle the current volume of traffic, we deploy multiple nodes to form a cluster and use load balancing to distribute requests across the nodes, so that the load is shared among them.

Load balancing can be divided into software load balancing and hardware load balancing. Software load balancing runs load balancing software, such as LVS or Nginx, on one or more servers; hardware load balancing is achieved through dedicated hardware devices, such as F5 appliances. Common load balancing algorithms include random selection, round-robin, and least connections (picking the node with the fewest active connections).
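To make one of these concrete, here is a minimal round-robin sketch in Java. It is illustrative only, assuming each node is represented by a plain address string; `RoundRobinBalancer` is a hypothetical class, not taken from any particular load balancer.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin selector: requests cycle through the node list in order.
public class RoundRobinBalancer {
    private final AtomicInteger index = new AtomicInteger(0);

    public String select(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalStateException("no available nodes");
        }
        // The counter may overflow and go negative; taking abs of the
        // remainder (not the counter) keeps the index in range.
        int i = Math.abs(index.getAndIncrement() % nodes.size());
        return nodes.get(i);
    }
}
```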

The load balancing I just introduced is mainly applied to web services. The domain name of the web service is bound to the load balancing address, and the load balancer distributes user requests to the backend services.

Load Balancing in RPC Frameworks #

Is load balancing in RPC frameworks the same as what we discussed earlier? Do you think there are any differences?

Let’s recall the beginning of Lesson 08, where I explained why we don’t use DNS for “service discovery,” and why we don’t achieve load balancing by binding the domain to a load balancing device or TCP/IP layer-4 proxy.

My answer was that this approach faces several issues:

  1. Deploying load balancing devices or TCP/IP layer-4 proxies incurs additional cost.
  2. All request traffic passes through the load balancing device, adding an extra network hop and its performance overhead.
  3. Adding or removing nodes on the load balancer usually requires manual operation, which makes large-scale scaling up or down difficult; in other words, “service discovery” becomes an operational burden.
  4. For service governance, we may need different load balancing strategies for different interface services or service groups. If all traffic funnels through a single load balancer, it is hard to configure different strategies for different scenarios.

By now, I believe you should already understand the differences between load balancing in RPC implementations and load balancing in traditional web services.

Load balancing in RPC is implemented entirely by the RPC framework itself. The RPC service caller establishes long connections with all the service nodes obtained from the registry, and on each RPC call it autonomously selects a service node through the configured load balancing plugin and sends the request to that node.

RPC load balancing strategies generally include random weight, hash, and round-robin; the exact set depends on the RPC framework’s implementation. Among these, random weight is probably the most commonly used. A random algorithm keeps the request traffic received by each node roughly even, and we can steer traffic by adjusting node weights. For example, if we set one node’s weight to 50 while the other nodes keep the default weight of 100, that node receives half as much traffic as the others.
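As an illustration of the random weight idea, here is a hedged Java sketch. `RandomWeightBalancer` and `WeightedNode` are hypothetical names for this example; real frameworks expose their own types.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Random-weight selector: a node's chance of being picked is proportional to
// its weight, so a node weighted 50 gets half the traffic of one weighted 100.
// Assumes a non-empty node list with positive weights.
public class RandomWeightBalancer {

    // Hypothetical node holder (requires Java 16+ for records).
    public record WeightedNode(String address, int weight) {}

    public WeightedNode select(List<WeightedNode> nodes) {
        int totalWeight = nodes.stream().mapToInt(WeightedNode::weight).sum();
        // Pick a point in [0, totalWeight) and walk the list until we pass it.
        int point = ThreadLocalRandom.current().nextInt(totalWeight);
        for (WeightedNode node : nodes) {
            point -= node.weight();
            if (point < 0) {
                return node;
            }
        }
        // Unreachable when all weights are positive; fallback for safety.
        return nodes.get(nodes.size() - 1);
    }
}
```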

The implementations of these load balancing algorithms are all relatively simple, and there is plenty of material about them online, so I won’t go into more detail here. If you have any questions, we can discuss them in the comments section.

Since the load balancing mechanism is implemented entirely by the RPC framework itself, it no longer depends on any load balancing device, so there is no single point of failure at a load balancer. The load balancing strategy is fully configurable per service caller, and we can govern traffic by controlling node weights.

Now that we understand load balancing in RPC frameworks, we can return to the initial requirement mentioned at the beginning of this lesson: Is there any way to dynamically and intelligently control the request traffic received by online service nodes?

Now the answer seems obvious. The key to solving this problem lies in the load balancing of the RPC framework. Our solution to this problem at the time was to design an adaptive load balancing strategy.

How to Design an Adaptive Load Balancing System? #

As I mentioned earlier, load balancing in RPC is implemented entirely by the RPC framework itself: when the service consumer initiates a request, it autonomously selects a service node through the configured load balancing plugin. So if the consumer knows the processing capacity of each service node, can’t it decide how much traffic to send to each node based on that capacity? When a node is overloaded or responding slowly, we send it fewer requests; when it is healthy and responsive, we send it more.

This is a bit like assigning tasks at work, where we consider each person’s actual situation: if one team member is in poor health, we assign them less work; if another happens to be in good shape and not overloaded, we give them a bit more.

So how can the service consumer node determine the processing capacity of a service node?

Here, we can use a scoring strategy. The service consumer collects metric data from each service node it holds a long connection with: capacity and load metrics such as CPU cores, CPU load, and memory size; request processing time metrics such as average request time, TP99, and TP999; and node status metrics such as normal or sub-health. From these metrics it calculates a score for the node. For example, if a node’s CPU load reaches 70%, we might deduct 3 points from its score; the 3 points are just an illustration, and the actual deduction depends on the concrete calculation strategy.

So how should we score based on these metrics?

This is somewhat similar to a company’s year-end performance review. Suppose I’m the boss and I evaluate professional ability, communication skills, and work attitude with weights of 30%, 30%, and 40%, respectively. If I give an employee scores of 10, 8, and 8, the overall score is 10 * 30% + 8 * 30% + 8 * 40% = 8.6 points.

Scoring service nodes works the same way: we set a weight for each metric and then calculate the node’s score from the metric data and those weights.
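To make the scoring concrete, here is a small sketch that mirrors the performance-review arithmetic above. The metric names and weights are assumptions for illustration; a real implementation would take them from configuration.

```java
import java.util.Map;

// Illustrative scorer: each metric arrives already normalized to a 0-10
// score, and the node's overall score is the weighted sum of those scores.
public class NodeScorer {

    // Assumed metric weights for illustration; they should sum to 1.0.
    private final Map<String, Double> metricWeights = Map.of(
            "cpuLoad", 0.3,
            "memoryUsage", 0.3,
            "requestLatency", 0.4
    );

    /** metricScores maps each metric name to its 0-10 score. */
    public double score(Map<String, Double> metricScores) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : metricWeights.entrySet()) {
            // A missing metric defaults to a perfect 10 so it doesn't penalize.
            total += metricScores.getOrDefault(e.getKey(), 10.0) * e.getValue();
        }
        return total; // e.g. 10 * 0.3 + 8 * 0.3 + 8 * 0.4 = 8.6
    }
}
```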

After the service consumer scores each service node, it will send requests. How should we control the amount of traffic sent to each service node based on the scores?

We can use the random weight load balancing strategy and adjust each service node’s weight according to its final score. For example, if a service node scores 8 out of 10 and its configured weight is 100, its final weight becomes 80 (100 * 80%). When the service consumer sends requests, it chooses nodes using the random weight strategy, so this node receives 80% as much traffic as a healthy node (assuming the other nodes keep the default weight of 100 and score a full 10).

With this, we have completed the design of an adaptive load balancing system. The overall design is shown in the following diagram:

Let me explain the key steps:

  1. Add service metric collectors as plugins. By default, there are runtime status metric collectors and request duration metric collectors.
  2. The runtime status metric collector collects metrics such as CPU cores, CPU load, and memory usage from the service nodes, obtained from heartbeat data between the service consumer and provider.
  3. The request duration metric collector collects request duration data, such as average request time, TP99, TP999, etc.
  4. You can configure which metric collectors to enable and assign a weight to each reference metric; the overall score is then calculated from the collected metric data and those weights.
  5. Based on the service node’s overall score and its configured weight, calculate the node’s final weight. The service consumer then selects a service node using the random weight strategy, as sketched in the code below.
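Putting these steps together, here is a hedged end-to-end sketch of the adaptive strategy. Every type here (`MetricCollector`, `Node`, `AdaptiveLoadBalancer`) is a hypothetical illustration of the design, not an actual framework API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Adaptive strategy sketch: collectors score each node, the score scales the
// node's configured weight, and a random-weight pick chooses the target.
public class AdaptiveLoadBalancer {

    public interface MetricCollector {
        /** Returns a 0-10 score for the node based on this collector's metrics. */
        double score(String nodeAddress);
    }

    // Hypothetical node holder (requires Java 16+ for records).
    public record Node(String address, int baseWeight) {}

    // Each enabled collector and the weight of its score in the overall score.
    private final Map<MetricCollector, Double> collectorWeights;

    public AdaptiveLoadBalancer(Map<MetricCollector, Double> collectorWeights) {
        this.collectorWeights = collectorWeights;
    }

    public Node select(List<Node> nodes) {
        // Steps 4-5: the overall score scales the configured weight
        // (a score of 8 out of 10 yields 80% of the base weight).
        int[] finalWeights = new int[nodes.size()];
        int total = 0;
        for (int i = 0; i < nodes.size(); i++) {
            Node n = nodes.get(i);
            finalWeights[i] = (int) Math.max(1, n.baseWeight() * overallScore(n) / 10.0);
            total += finalWeights[i];
        }
        // Random-weight selection over the adjusted weights.
        int point = ThreadLocalRandom.current().nextInt(total);
        for (int i = 0; i < nodes.size(); i++) {
            point -= finalWeights[i];
            if (point < 0) {
                return nodes.get(i);
            }
        }
        return nodes.get(nodes.size() - 1); // unreachable with positive weights
    }

    private double overallScore(Node node) {
        double score = 0.0;
        for (Map.Entry<MetricCollector, Double> e : collectorWeights.entrySet()) {
            score += e.getKey().score(node.address()) * e.getValue();
        }
        return score;
    }
}
```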

Summary #

Today we have discussed load balancing in RPC frameworks in detail. The difference between RPC load balancing and web service load balancing is that an RPC framework does not rely on a load balancing device or server; it implements load balancing itself, and the service caller autonomously chooses which service node to call.

The advantage of this approach is that the RPC framework no longer depends on dedicated load balancing devices, which reduces cost. It also removes the extra network hop between the caller and a load balancing device, improving transmission efficiency. In addition, the load balancing strategy is configurable, which makes service governance easier.

Beyond that, today’s focus was how to design an adaptive load balancing system. With it, we can intelligently control the request traffic sent to each service node based on the current state of every node in the cluster the service caller depends on, preventing a highly loaded or slow node from dragging down the availability of the whole service cluster.

This adaptive load balancing implementation is not only applicable to RPC load balancing, but can also serve as a solution for intelligent load balancing. If you need to design an intelligent load balancing service in your work, you can refer to this solution.

Post-class Reflection #

Do you know what other load balancing strategies RPC frameworks offer, and what their advantages and disadvantages are? I look forward to seeing your implementation ideas in the comments section so we can discuss them together.

Feel free to share this article with your friends and invite them to join the learning. See you next class!