
14 Graceful Boot - How to prevent traffic from hitting nodes that have not fully started #

Hello, I am He Xiaofeng. In the previous lecture, we introduced graceful shutdown, which aims to ensure that when a service provider shuts down an application, all callers can safely divert their traffic and stop calling it, so the business suffers no loss. The key to implementing this is to put the shutting-down provider into an explicit "closing" state, so that callers are aware it is going offline.

Continuing from the previous lecture, today let’s talk about graceful startup.

You may be surprised: does application startup really deserve special attention? It does. Think of warming up a car before driving: you let the engine run for a while before getting on the road, so that all the parts of the car can "heat up" and wear is reduced.

The same principle applies to applications. An application that has been running for a while executes faster than one that has just started. This is because, at runtime, the JVM's JIT compiler compiles frequently executed "hotspot" code into machine code so it no longer needs to be interpreted, and classes that have already been loaded are cached so they do not have to be loaded again the next time they are used. Both effects improve execution speed.

However, all of this accumulated state disappears once the application restarts. Having lost these "benefits", if we let the newly started application take on the same amount of traffic as before the shutdown, it will be under high load from the start, and incoming requests from callers may time out on a large scale, which in turn harms the online business.

As we mentioned in the previous lecture, deployments happen frequently in a microservice architecture, and we certainly cannot allow requests to time out on a large scale just because of a deployment. So we need a solution. Since the key problem is that "the newly restarted service provider takes on a large amount of traffic without any pre-warming", can we find a way to let the application start with only a small amount of traffic? That way it can run at low load for a period of time and then gradually ramp up to its optimal state.

This is exactly the key point I want to share with you today: a very practical RPC feature called startup preheating.

Preheating #

So what is preheating?

In simple terms, it means that a newly started service provider does not take on the full traffic load immediately. Instead, the number of calls it receives grows slowly over a period of time, until the traffic gradually reaches the same level as an instance that has been running for a while.

So, how can we achieve this in RPC?

What we want to control is the traffic the caller sends to the service provider, so let's briefly review how the caller initiates an RPC invocation. The caller application obtains the service provider's IP addresses through service discovery, and before sending each request it uses the load-balancing algorithm to pick a provider and take an available connection from the connection pool. So can't we make the load balancer treat newly started providers differently? For a newly started provider, we lower the probability of it being selected, and gradually raise that probability over time, which gives us a process of dynamically increasing traffic.
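To make this concrete, here is a minimal sketch in Java of the kind of weighted random selection a load balancer can use. The `ProviderNode` type and its fields are illustrative assumptions rather than any particular framework's API; the point is simply that a node with a lower effective weight, for example one that has just started, gets picked less often.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative provider descriptor; field names are assumptions, not a real framework API.
class ProviderNode {
    final String address;
    final int weight;   // effective weight, already adjusted for warm-up

    ProviderNode(String address, int weight) {
        this.address = address;
        this.weight = weight;
    }
}

class RandomWeightedLoadBalancer {
    // Pick one provider at random, proportionally to its weight:
    // a freshly started node with a small weight is selected less often.
    // Assumes a non-empty node list with positive weights.
    ProviderNode select(List<ProviderNode> nodes) {
        int total = nodes.stream().mapToInt(n -> n.weight).sum();
        int offset = ThreadLocalRandom.current().nextInt(total);
        for (ProviderNode node : nodes) {
            offset -= node.weight;
            if (offset < 0) {
                return node;
            }
        }
        return nodes.get(nodes.size() - 1); // defensive fallback; not normally reached
    }
}
```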

Now that we have a plan, we can consider how to implement it concretely.

First of all, the caller needs to know when the service provider started. How can it obtain this time? I can suggest two methods: one is for the service provider to report its start time to the registry when it registers; the other is for the registry to record the registration time when it receives the provider's registration request. Either of these two times works. You might hesitate here, wondering how to ensure the clocks of all machines are the same. This is actually not a big concern, because the warm-up duration is only a rough estimate: even a one-minute clock difference between machines does not matter. Moreover, in real environments machines usually have NTP time synchronization enabled by default, which keeps their clocks consistent.
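For the first approach, the provider can attach its start time as metadata when it registers. The sketch below is only a rough illustration; the `RegistryClient` interface, the metadata keys, and the example service name and address are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry client interface, used only to illustrate what the provider reports.
interface RegistryClient {
    void register(String serviceInterface, String address, Map<String, String> metadata);
}

class ProviderBootstrap {
    void registerService(RegistryClient registry) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("weight", "100");                                         // configured target weight
        metadata.put("startTime", String.valueOf(System.currentTimeMillis())); // provider start time
        registry.register("com.demo.UserService", "192.168.1.10:20880", metadata);
    }
}
```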

Whichever time you choose, the result is that through service discovery the caller obtains not only the IP list but also each provider's start time. We then need to apply this time in load balancing. In [Lesson 11] we introduced a weight-based load balancing method, but those weights were fixed values configured by the service provider. Now we need to make the weight dynamic, letting it gradually grow over time until it reaches the fixed value configured by the provider. The entire process is shown in the following figure:

With this small adjustment to the logic, a provider that has been running for less than the warm-up period carries a reduced weight, which lowers its probability of being selected by the load balancer. This prevents the application from being pushed to high load right after it starts and gives the provider a warm-up period.
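As a concrete sketch of the dynamic weight just described, the caller can derive an effective weight from the provider's start time with a simple linear ramp. The warm-up duration and the minimum weight of 1 are assumptions that a real framework would make configurable.

```java
class WarmupWeight {
    // Linearly ramp a provider's weight from 1 up to its configured weight
    // over the warm-up window, based on how long the provider has been up.
    static int currentWeight(long providerStartMillis, int configuredWeight, long warmupMillis) {
        long uptime = System.currentTimeMillis() - providerStartMillis;
        if (uptime <= 0) {
            return 1;                    // clock skew or just started: use the minimum weight
        }
        if (uptime >= warmupMillis) {
            return configuredWeight;     // warm-up finished: use the full configured weight
        }
        int weight = (int) (configuredWeight * (uptime / (double) warmupMillis));
        return Math.max(1, Math.min(weight, configuredWeight));
    }
}
```

With this ramp, a provider that registered two minutes ago under a ten-minute warm-up window is given roughly one fifth of its configured weight, so the load balancer selects it far less often than a fully warmed-up instance.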

At this point you may have another question: when I restart a large batch of service providers, will the instances that have not been restarted run into trouble because of the heavy traffic they now have to bear?

Here is how I think about this issue. When you restart a large batch of service providers, from the caller's point of view the weights of the newly restarted machines are basically the same; in other words, they all have the same low weight and the same probability of being selected, so there is no problem of weight skew among them. The applications that have not been restarted do have a relatively higher probability of being selected by the load balancer, but we can smooth this transition with the adaptive load balancing we learned in [Lesson 11], so this is not a problem either.

Preheating solves the cold-start problem mainly from the caller's perspective: it lets the request volume from callers ramp up over a time window until it reaches the normal level, achieving a smooth rollout. But is there anything the service provider itself can do to achieve a similar effect?

Of course there is. This is the other key point I want to share today, and it is closely related to startup warm-up: delayed exposure.

Delayed Exposure #

When our application starts, it is loaded from the main entry point, and the related dependencies are then loaded in order. Take a Spring application as an example: during startup, the Spring container loads Spring Beans one by one. If a Bean is an RPC service, we not only register it in the Spring BeanFactory but also register the Bean's corresponding interface with the registry. When the registry receives the address of a newly online provider, it pushes that address to the caller applications, and once a caller receives the provider's address it establishes a connection and starts sending requests.

But is it possible that the service provider has not finished starting up at this point, because it is still loading other Beans? From the caller's perspective, as long as it has obtained the provider's IP it may issue an RPC call. If the provider has not finished starting up yet, that call will fail and the business will be affected.

Is there a way to avoid this situation?

Before solving the problem, let's look at its root cause: the provider registers each parsed RPC service with the registry during the startup process, so callers learn the provider's address and start sending requests even though the rest of the provider's startup has not finished.

In that case, we can simply move the registration of the interface with the registry to after the application has finished starting. Concretely, when the application loads and parses Beans during startup and encounters an RPC service Bean, we only register that Bean in the Spring BeanFactory and do not yet register its interface with the registry. Only after the application has completed its startup do we register the interface for service discovery, thereby delaying the moment at which callers obtain the provider's address.
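In a Spring-based provider, one way to sketch this is to collect the RPC service Beans while the container loads them and publish them to the registry only after the context has fully refreshed. The listener below illustrates the idea; `RpcServiceHolder` and `exposeAll()` are hypothetical names, not a real framework API.

```java
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextRefreshedEvent;

// Hypothetical holder that collected RPC service beans while the container was loading them.
interface RpcServiceHolder {
    void exposeAll();   // register every pending RPC interface with the registry
}

// Delayed exposure: registration happens only after the Spring context is fully started.
class DelayedExposureListener implements ApplicationListener<ContextRefreshedEvent> {

    private final RpcServiceHolder holder;

    DelayedExposureListener(RpcServiceHolder holder) {
        this.holder = holder;
    }

    @Override
    public void onApplicationEvent(ContextRefreshedEvent event) {
        // Ignore child context refreshes; expose only when the root context is ready.
        if (event.getApplicationContext().getParent() == null) {
            holder.exposeAll();
        }
    }
}
```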

This ensures that the application only starts handling traffic after it has fully started, but it still does not achieve our original goal. Although the application has finished starting, it has not executed any business code yet, so the JVM is still cold; if a large amount of traffic arrives at this point, the application will again run under high load and may fail to return results in time. Moreover, in real business scenarios a service's internal logic usually depends on other resources, such as cached data. If we can finish initializing the cache before the service starts serving, instead of loading it when the first requests arrive, we can reduce the probability that the first requests after a restart return errors.

So how can we specifically implement this?

We can once again take advantage of the moment when the provider registers its interfaces with the registry. During provider startup we reserve a hook that runs before the interface is registered, and let users plug in their own customizable logic: in the hook they can simulate real calls to warm up the hot code paths in the JVM, and they can also preload resources such as caches in advance. Only when everything is loaded is the interface finally registered with the registry. The entire application startup process is shown in the following figure:
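Beyond the figure, here is a rough sketch of what such a hook might look like; the interface names, the loop count, and the cache dependency are illustrative assumptions. The hook preloads hot cache entries and issues a batch of simulated calls, and only after it returns would the framework register the interface with the registry.

```java
// Illustrative hook contract: runs after the application has started,
// but before the RPC interface is registered with the registry.
interface StartupHook {
    void beforeExpose() throws Exception;
}

// Hypothetical business dependencies used only for illustration.
interface UserCache { void preloadHotKeys(); }
interface UserService { Object getUser(long id); }

class WarmupHook implements StartupHook {
    private final UserCache userCache;
    private final UserService userService;

    WarmupHook(UserCache userCache, UserService userService) {
        this.userCache = userCache;
        this.userService = userService;
    }

    @Override
    public void beforeExpose() {
        userCache.preloadHotKeys();          // load hot data before real traffic arrives
        for (int i = 0; i < 1_000; i++) {
            userService.getUser(0L);         // simulated calls to warm up the hot code paths
        }
    }
}
```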

Summary #

Together with the previous lecture, we have now covered the entire startup and shutdown process in RPC. As mentioned earlier, startup and shutdown may not be part of the core RPC invocation flow, but if you can do these "small" things well in RPC, you can bring your technical team more of the benefits of microservices.

In addition, the two key points we discussed today, startup warm-up and delayed exposure, are not features exclusive to RPC. You can apply both of them when building other systems as well, to minimize the impact of cold starts on the business.

Post-class Reflection #

In the warm-up section we raised a question: when a large batch of service providers is restarted, requests are very likely to be routed to the machines that have not been restarted, and those machines may not be able to handle the extra load. I'd like to hear your thoughts on this issue and whether you have any good solutions.

Please feel free to leave a comment and share your thoughts with me. You are also welcome to share this article with your friends and invite them to join the learning. See you in the next class!