
26 Load Balancing - How to Increase the System’s Horizontal Scaling Capability #

Hello, I’m Tang Yang.

In the previous section, I mentioned three common approaches to designing high concurrency systems: caching, asynchronous processing, and horizontal scalability. So far, you’ve learned about caching techniques and how to use message queues for asynchronous processing. In this lesson, I will introduce how to improve the horizontal scalability of a system.

In earlier lessons, I also mentioned some cases of horizontal scaling. For example, in Lesson 08, I talked about using multiple slave databases to scale out the database and improve query performance. That requires a component that distributes database query requests across the slave databases according to some strategy, which is exactly what a load balancer does, and in that scenario we usually use a DNS server to play the role.

However, in practical work, the load balancing component you are most likely to use is Nginx. Its role is to receive HTTP requests from the frontend and distribute them to multiple backend servers according to various strategies. This way, we can handle sudden spikes in traffic by adding backend servers as needed. Unlike DNS, Nginx can distribute traffic at both the domain and request-URL level and provides more sophisticated load balancing strategies.

You may also think of a microservices architecture, where a service runs on multiple nodes; requests traveling from clients to the application servers naturally need a load balancer to distribute the traffic among those nodes as well. So how do we use load balancing servers in a microservices architecture?

Before answering these questions, let me first introduce the different types of commonly used load balancer servers, as this will help you make choices based on their characteristics.

Types of Load Balancing Servers #

The concept of load balancing is to distribute the load (requests) evenly across multiple processing nodes. This can reduce the load on individual processing nodes and improve the overall performance of the system.

At the same time, a load balancing server sits at the traffic entry point and shields the requester from the deployment details of the service nodes, so capacity can be scaled without the calling side noticing any change. It is like a traffic police officer who keeps directing traffic and guiding cars onto the appropriate roads.

In my opinion, load balancing services can generally be divided into two categories: proxy-based load balancing services and client-based load balancing services.

A proxy-based load balancing service is deployed as a separate service, and every request must pass through it: the load balancer selects a suitable service node and then forwards the request to that node, thereby spreading out the traffic.

[Figure: a proxy-based load balancing service sits between the requesters and the service nodes]

Since all traffic flows through this type of service, its performance requirements are extremely high. There are many open-source implementations of proxy-based load balancing, the best known being LVS and Nginx. LVS operates at layer 4 of the OSI model, the transport layer, so it is also called a layer-4 load balancer; Nginx runs at layer 7, the application layer, so it is called a layer-7 load balancer (you can review Lesson 02 for the details).

In the architecture of a project, we generally deploy LVS and Nginx together for load balancing HTTP application services. In other words, we deploy LVS at the entrance to distribute the traffic to multiple Nginx servers, which will then distribute the traffic to application servers. Why do we do this?

This is mainly due to the characteristics of LVS and Nginx. LVS forwards request packets at layer 4 of the network stack; after a packet is forwarded, the client and the backend service establish a connection directly, and subsequent response packets do not pass back through the LVS server. Its performance is therefore higher than Nginx's, and it can handle higher concurrency.

However, LVS works at layer 4, while the request URL is a layer-7 concept, so LVS cannot distribute requests with URL-level precision, nor does it provide a mechanism for detecting whether a backend service is available. Nginx, although its performance is well below LVS's, can still handle tens of thousands of requests per second, is more flexible to configure, and can detect whether a backend service is having problems.

Therefore, LVS is suitable for distributing high-traffic requests at the entry point, while Nginx should be deployed in front of the business servers to perform more fine-grained request distribution. My suggestion: if your QPS (queries per second) is below 100,000, consider using Nginx as the sole load balancing server; that is one less component to maintain and lowers the overall maintenance cost of the system.
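For reference, a minimal Nginx-only setup could look like the sketch below. The upstream name and addresses are placeholders, and the blocks are assumed to sit inside the http context of nginx.conf:

upstream app_servers {
        server 192.168.1.1:8080;
        server 192.168.1.2:8080;
}

server {
        listen 80;
        location / {
                proxy_pass http://app_servers;   # requests are spread across the upstream servers, Round Robin by default
        }
}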

However, neither of these load balancing services is well suited to a microservices architecture. In microservices, the service nodes are kept in a service registry, and it is difficult to make LVS interact with the registry. In addition, microservices usually communicate over RPC protocols rather than HTTP, so Nginx cannot meet the requirement either.

Therefore, we will use another type of load balancing service called client-based load balancing service, which embeds the load balancing service in the RPC client.

This type of service is generally deployed in the same process as the client application and provides multiple options for node selection strategies, ultimately providing the client application with the best available server node. This type of service is usually used in conjunction with a service registry, which provides a complete list of service nodes. After the client obtains the list, it uses the load balancing service’s strategy to select a suitable node and sends the request to that node.

[Figure: a client-based load balancing service embedded in the RPC client, working together with the service registry]
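To make this flow concrete, here is a minimal Java sketch of client-side node selection. The ServiceRegistry and LoadBalanceStrategy interfaces are hypothetical stand-ins for whatever your RPC framework actually provides; this illustrates the idea rather than any framework's real API.

import java.util.List;

// Hypothetical interfaces: the registry supplies the full node list,
// and a pluggable strategy picks one node inside the client process.
interface ServiceRegistry {
    List<String> getNodes(String serviceName);   // e.g. "10.0.0.1:20880"
}

interface LoadBalanceStrategy {
    String select(List<String> nodes);
}

class RpcClient {
    private final ServiceRegistry registry;
    private final LoadBalanceStrategy strategy;

    RpcClient(ServiceRegistry registry, LoadBalanceStrategy strategy) {
        this.registry = registry;
        this.strategy = strategy;
    }

    String chooseNode(String serviceName) {
        List<String> nodes = registry.getNodes(serviceName); // 1. fetch the node list from the registry
        return strategy.select(nodes);                       // 2. apply the load balancing strategy in-process
    }
}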

Understanding the classification of load balancing services is the first step in learning about load balancing services. Next, you need to master load balancing strategies so that when you configure load balancing services in your actual work, you can have a deeper understanding of the principles.

Common Load Balancing Strategies #

Load balancing strategies can be broadly divided into two categories:

  1. Static strategies: Load balancing servers do not consider the actual running status of the backend services when selecting service nodes.

  2. Dynamic strategies: Load balancing servers decide which service node to select based on certain load characteristics of the backend services.

There are several common static strategies. Among them, Round Robin (RR) is the most widely used. This strategy keeps track of the last selected backend service address or index and requests the next backend service node according to the order of the service list. The pseudocode is as follows:

AtomicInteger lastCounter = getLastCounter();     // counter holding the index of the last selected service node
List<String> serverList = getServerList();        // get the current list of service nodes
int currentIndex = lastCounter.incrementAndGet(); // advance to the next index
if (currentIndex >= serverList.size()) {          // wrap around once the end of the list is reached
  currentIndex = 0;
  lastCounter.set(0);
}
return serverList.get(currentIndex);

Round Robin is actually a generic strategy that is supported by most load balancing servers. It can evenly distribute requests to all service nodes. However, it does not take into account the specific configurations of the service nodes. For example, if you have three service nodes, one with a configuration of 8 cores and 8GB memory, and the other two with configurations of 4 cores and 4GB memory each, using Round Robin to distribute requests equally will result in the 8-core 8GB node receiving the same number of requests as the 4-core 4GB nodes, thus not utilizing its performance advantage.

To address this issue, we can assign weights to the nodes. For example, we can assign a weight of 2 to the 8-core 8GB machine, which will result in it receiving double the traffic. This strategy is called Weighted Round Robin.
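In Nginx, for example, this boils down to a weight parameter on each upstream server; a minimal configuration sketch (addresses are placeholders) would look like this:

upstream backend {
        server 192.168.1.1:8080 weight=2;   # 8-core 8GB node, receives roughly twice as many requests
        server 192.168.1.2:8080 weight=1;   # 4-core 4GB node
        server 192.168.1.3:8080 weight=1;   # 4-core 4GB node
}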

In addition to these two strategies, there are many other static strategies provided by open-source load balancing services:

  • Nginx provides the ip_hash and url_hash algorithms.
  • Linux Virtual Server (LVS) provides hash-based strategies based on the source and destination addresses of requests.
  • Dubbo provides random selection and consistent hashing strategies (consistent hashing is sketched right after this list).
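Of these, consistent hashing is the least obvious, so here is a minimal Java sketch of the idea: nodes are mapped onto a hash ring via virtual nodes, and each request is routed to the first node clockwise from its own hash. This is a simplified illustration, not Dubbo's actual implementation.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// A minimal consistent-hashing ring for illustration only.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(List<String> nodes, int virtualNodes) {
        for (String node : nodes) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(node + "#" + i), node);   // place virtual nodes on the ring
            }
        }
    }

    String select(String requestKey) {
        // find the first virtual node clockwise from the request's hash
        SortedMap<Long, String> tail = ring.tailMap(hash(requestKey));
        Long key = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(key);
    }

    private long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // take the first 4 bytes of the MD5 digest as an unsigned 32-bit hash
            return ((long) (digest[3] & 0xFF) << 24)
                 | ((long) (digest[2] & 0xFF) << 16)
                 | ((long) (digest[1] & 0xFF) << 8)
                 | (digest[0] & 0xFF);
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}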

However, in my opinion, Round Robin and Weighted Round Robin distribute requests evenly enough across the backend nodes to achieve balanced load, and in the absence of a better dynamic strategy they should be your first choice; Nginx, for example, uses Round Robin by default.

Currently available open-source load balancing services also provide some dynamic strategies. I will highlight their principles.

Load balancing servers collect information about the calls to backend services, such as the number of active connections from the load balancer to the backend services or the response time of the calls. Based on this information, they select the service with the fewest connections or the shortest response time. Here are a few examples:

  • Dubbo provides the LeastActive strategy, which prioritizes selecting the service with the fewest active connections.
  • Ribbon, part of the Spring Cloud ecosystem, provides the WeightedResponseTimeRule which uses response time to calculate a weight for each service node, and then allocates the service node to the caller based on this weight.

The thinking behind these strategies is to choose the service with the lowest load and the most idle resources from the caller’s perspective, in order to achieve better service calling performance, maximize the use of idle server resources, and achieve faster response times. Therefore, I recommend prioritizing the use of dynamic strategies in practical development.
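To make the idea concrete, here is an illustrative least-active selector in Java. It is a sketch of the general approach, not Dubbo's LeastActive or Ribbon's actual code: the caller counts in-flight calls per node and picks the least busy one.

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: tracks in-flight calls per node and picks the least busy one.
class LeastActiveSelector {
    private final ConcurrentHashMap<String, AtomicInteger> activeCalls = new ConcurrentHashMap<>();

    String select(List<String> nodes) {
        String best = nodes.get(0);
        int bestActive = Integer.MAX_VALUE;
        for (String node : nodes) {
            int active = activeCalls.computeIfAbsent(node, n -> new AtomicInteger()).get();
            if (active < bestActive) {        // fewer in-flight calls means a less loaded node
                bestActive = active;
                best = node;
            }
        }
        activeCalls.get(best).incrementAndGet();  // one more in-flight call on the chosen node
        return best;
    }

    void onComplete(String node) {
        activeCalls.get(node).decrementAndGet();  // call finished, release the slot
    }
}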

So far, based on the analysis above, you can choose a suitable load balancing strategy and select the optimal service node. However, a question arises: How can you ensure that the selected node is a properly functioning one? What if a faulty node is selected when using the Round Robin strategy? To reduce the likelihood of requests being allocated to a faulty node, some load balancing servers provide fault detection mechanisms for service nodes.

How to Detect Node Failures #

In Lesson 24, I explained that in a microservices architecture, service nodes periodically send heartbeats to the registry center, so the registry can determine whether a node has failed and ensure that only available nodes are passed to the load balancing service.
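As a quick reminder of how that works, the sketch below shows the heartbeat idea in Java; RegistryClient is a hypothetical interface and the 5-second interval is just an example, not the protocol of any particular registry.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// The node reports itself to the registry on a fixed schedule; if the
// heartbeats stop, the registry marks the node as failed.
class HeartbeatReporter {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    void start(RegistryClient registry, String nodeAddress) {
        scheduler.scheduleAtFixedRate(
                () -> registry.sendHeartbeat(nodeAddress),
                0, 5, TimeUnit.SECONDS);
    }
}

interface RegistryClient {
    void sendHeartbeat(String nodeAddress);
}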

But for Nginx, how can we ensure that the configured service nodes are available?

Fortunately, Taobao has open-sourced an Nginx module called nginx_upstream_check_module that solves this. It lets Nginx periodically probe a specified interface on each backend service and decide from the returned status code whether the service is still alive; when the number of consecutive probe failures reaches a threshold, the node is automatically removed from the upstream. Here is an example configuration:

upstream server {

        server 192.168.1.1:8080;
        server 192.168.1.2:8080;

        # Probe every 3 seconds with a 1-second timeout over HTTP; mark the node down
        # after 5 consecutive failures and up again after 2 consecutive successes.
        # default_down=true means a node starts out marked as down until it passes checks.
        check interval=3000 rise=2 fall=5 timeout=1000 type=http default_down=true;

        # The request sent by each probe
        check_http_send "GET /health_check HTTP/1.0\r\n\r\n";

        # A probe succeeds if the response status code is 2xx
        check_http_expect_alive http_2xx;

}

With Nginx configured this way, your business server also needs to implement a "/health_check" endpoint that returns an HTTP status code. It is best to keep that status code in a configuration center, so it can be changed without restarting the service (configuration centers are covered in Lesson 33).

The node detection feature can also help us achieve the graceful shutdown of web services. In Lesson 24, when introducing the registry center, I mentioned that a graceful shutdown of a service requires cutting off traffic before shutting down the service. With the use of the registry center, you can first remove the node from the center, and then restart the service, in order to achieve graceful shutdown. So, how can a web service achieve graceful shutdown? In the following, I will explain how services start and shut down when the node detection feature is available.

When the service has just started, you can initialize the returned HTTP status code to 500. That way Nginx will not immediately mark the node as available, giving the service time to finish initializing the resources it depends on and avoiding jitter right after startup.

Once initialization is complete, change the status code to 200; after two successful probes, Nginx will mark the service as available. When shutting the service down, change the status code back to 500 and wait for Nginx to mark the node as unavailable, so that no new traffic is routed to it; then, after the requests already being processed have finished, the service can be restarted, avoiding the failed requests that an abrupt restart would cause. This is the standard procedure for starting and stopping web services in production, and one you can follow in your own projects.
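Putting the pieces together, below is a minimal, framework-free sketch of such a /health_check endpoint using the JDK's built-in HttpServer. In a real project the status code would come from your configuration center and your own web framework, so treat the class, port, and AtomicInteger here as placeholders.

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

public class HealthCheckServer {
    // start as 500 so Nginx does not mark the node available before initialization finishes
    static final AtomicInteger healthStatus = new AtomicInteger(500);

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/health_check", exchange -> {
            byte[] body = "OK".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(healthStatus.get(), body.length);   // return the current status code
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();

        // ... initialize caches, connection pools, and other dependent resources here ...

        healthStatus.set(200);   // flip to 200 once initialization is complete
        // on shutdown, set it back to 500 first and wait for Nginx to take the node out of rotation
    }
}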

Course Summary #

In this lesson, I introduced you to some knowledge and practical skills related to load balancing services, as well as their application in actual work. I would like to emphasize a few key points:

A typical website load balancing deployment uses LVS at the entry point to take the incoming traffic and Nginx in front of the application servers for fine-grained traffic distribution and faulty-node detection. Of course, if your website's concurrency is low, you can skip LVS and use Nginx alone.

For load balancing strategies, prefer dynamic strategies so that requests are sent to the best-performing nodes; if no suitable dynamic strategy is available, Round Robin can be used to distribute requests evenly across all service nodes.

Nginx can introduce the nginx_upstream_check_module to perform regular health checks on backend services. When a backend service node is restarted, it should follow the principle of “switch traffic before restart” to minimize the impact of node restarts on the overall system.

You may think that components like Nginx and LVS are the concern of operations teams and that as a developer, you don’t need to worry about maintenance. However, through today’s lesson, you should be able to see that load balancing services are important components for improving system scalability and performance. In the design of high-concurrency systems, their role is irreplaceable. Understanding their principles and mastering the correct way to use them should be a required course for every backend developer.