
17 Asynchronous RPC: Squeezing the Most Throughput out of a Single Machine #

Hello, I’m He Xiaofeng. Starting from today, we will officially enter the advanced section.

In the previous chapters, we covered the basic architecture of an RPC framework, a series of governance features, and advanced features related to cluster management, such as service discovery, health checks, routing strategies, load balancing, and graceful startup and shutdown.

With this knowledge in place, you already have a fairly comprehensive understanding of RPC frameworks. But to understand RPC more deeply and use it more effectively, you must also consider the framework's overall performance: how to improve its performance, stability, security, and throughput, and how to quickly locate problems in distributed scenarios. These are the key points of the advanced section. The difficulty will increase, so I hope you persist in learning!

So today, let's first talk about how an RPC framework squeezes the most throughput out of a single machine.

How to improve single-machine throughput? #

While operating an RPC framework, "how to improve throughput" is a question I often discuss with business teams.

I remember that the business team once raised a problem: our TPS has never been able to go up. During stress testing, the CPU was pushed to 40%~50%, but it could not be pushed any further, and TPS did not increase either. They asked if we had any solutions to improve the throughput of the business.

Afterwards, I took a look at the business logic of their services and found that on top of executing time-consuming business logic, they also synchronously called several other services. Due to the long duration of these services, the business logic of this service also became time-consuming. The CPU was mostly idle, waiting, and not fully utilized. Therefore, the CPU utilization rate and the throughput of the service naturally cannot increase.

What affects the throughput of RPC calls? #

When using RPC, when it comes to performance and throughput, our first reaction is to choose a high-performance, high-throughput RPC framework. So, what is the root cause that affects the throughput of RPC calls?

The root cause is actually that the processing of RPC requests is time-consuming, and the CPU is mostly idle, waiting, and not utilized for computation. This is similar to a person working, but not planning the time properly, and having a long period of idle time. Naturally, they cannot complete much work.

So, is the time spent on an RPC request mainly due to the RPC framework itself? In fact, unless the network is slow or the framework is misused, the framework's own request processing rarely takes more than a few milliseconds even in the worst case. Most of the time consumed by an RPC request goes to business logic, such as slow SQL queries against the database. Therefore, in most cases, the reason RPC throughput is limited is that the business logic is slow and the CPU spends most of its time waiting for resources.

Once we understand the reason, we can solve the problem. So how to improve single-machine throughput?

This is not a new topic. For example, the reactive programming we hear so much about these days is precisely about improving the throughput of business processing. The key to improving throughput comes down to one word: asynchrony. Our RPC framework should be fully asynchronous, achieving fully asynchronous RPC. Just imagine: if every request returns immediately after being sent, all business logic executes asynchronously, and results are delivered through asynchronous notifications, how much could the observable throughput increase?

I believe you understand the effect without me saying it. So, what are the asynchronous strategies in RPC frameworks?

How to Use Asynchronous Calls in the Client? #

When it comes to asynchronous calls, the most commonly used approach is to return a Future object or use a callback object as an input parameter. The Future approach can be considered the simplest form of asynchronous call. We initiate an asynchronous request and obtain a Future from the request context. Afterwards, we can call the get method of the Future to retrieve the result.

For example, let’s go back to the problem mentioned earlier by the business team. In their business logic, they made several calls to other services. If these calls were synchronous, assuming each call takes 10 milliseconds, the business logic would take at least 40 milliseconds to complete.

But what if we use the Future approach?

We send 4 asynchronous requests sequentially and obtain 4 Future objects. Since the calls are asynchronous, the time spent in this process is almost negligible. Later, we can call the get method on these Future objects together. In this case, what is the ideal time for the completion of the business logic? That’s right, 10 milliseconds. The execution time is reduced to one-fourth, which means that our throughput can potentially increase fourfold!
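The fan-out described above can be sketched with Java's built-in `CompletableFuture`. This is a minimal illustration, not a real RPC client: `callService` is a hypothetical stub that simulates a 10 ms remote call, and the four futures are fired before any result is collected, so ideally the four waits overlap.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class ParallelCalls {

    // Hypothetical stub standing in for a remote service call that takes ~10 ms.
    static String callService(String name) {
        try {
            TimeUnit.MILLISECONDS.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name + "-result";
    }

    // Fire all four requests without waiting on any of them, then collect the
    // results; the four 10 ms waits can overlap, so this ideally takes ~10 ms
    // instead of ~40 ms.
    static String callAllInParallel() {
        CompletableFuture<String> f1 = CompletableFuture.supplyAsync(() -> callService("A"));
        CompletableFuture<String> f2 = CompletableFuture.supplyAsync(() -> callService("B"));
        CompletableFuture<String> f3 = CompletableFuture.supplyAsync(() -> callService("C"));
        CompletableFuture<String> f4 = CompletableFuture.supplyAsync(() -> callService("D"));
        // Only now do we block for the results.
        return f1.join() + "," + f2.join() + "," + f3.join() + "," + f4.join();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        String combined = callAllInParallel();
        System.out.println(combined + " in ~" + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

In a real framework, the futures would come from the request context rather than `supplyAsync`, but the shape of the code — send everything first, `get`/`join` afterwards — is the same.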


Now, how does an RPC framework implement asynchronous calls with Future objects?

From our study of the basics, we know that an RPC call essentially involves the client sending a request message to the server, the server processing the message and replying with a response message, and the client receiving and processing that response, finally returning the result to the dynamic proxy.

Here, we can see that for the client, sending the request message to the server and receiving the response message from the server are two completely independent processes, and in most cases, they are not even executed in the same thread. Does this mean that the internal implementation of the RPC framework on the client side for handling RPC calls is asynchronous?

Indeed, for RPC frameworks, whether it is synchronous or asynchronous calls, the internal implementation on the client side is always asynchronous.

In the previous lesson, we learned that each message sent by the client has a unique message ID. In fact, before sending the request message to the server, the client creates a Future and stores the mapping between the message ID and the Future. The ultimate return value obtained by the dynamic proxy is retrieved from this Future. When receiving the response message from the server, the client uses the unique message ID of the response message to find the corresponding Future from the previously stored mapping and injects the result into that Future. Then it goes through a series of processing logic, and finally, the dynamic proxy retrieves the correct return value from the Future.
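The bookkeeping just described — mapping each outgoing message ID to a Future, and completing that Future when the matching response arrives — can be sketched as follows. All names here are illustrative, not taken from any real framework.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the client's in-flight request table.
public class PendingRequests {
    private final AtomicLong idGenerator = new AtomicLong();
    private final Map<Long, CompletableFuture<Object>> pending = new ConcurrentHashMap<>();

    // Called just before the request message is written to the network:
    // assigns a unique message ID and remembers the Future for it.
    public long register(CompletableFuture<Object> future) {
        long messageId = idGenerator.incrementAndGet();
        pending.put(messageId, future);
        return messageId;
    }

    // Called by the IO thread when a response with this message ID arrives:
    // looks up the Future and injects the result, waking up any waiter.
    public void complete(long messageId, Object result) {
        CompletableFuture<Object> future = pending.remove(messageId);
        if (future != null) {
            future.complete(result);
        }
    }
}
```

A synchronous call is then just the framework calling `get()` on that Future on the caller's behalf; an asynchronous call hands the Future to the user instead.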

The so-called synchronous call is simply the RPC framework actively executing the get method of the Future in the client’s processing logic, making the dynamic proxy wait for the return value. In contrast, asynchronous call means that the RPC framework does not actively execute the get method of the Future, and the user can obtain the Future from the request context and decide when to execute the get method.

Now you should have a clear understanding of how RPC frameworks implement asynchronous calls using the Future approach.

[Figure: Future-based asynchronous call flow in an RPC framework]

How to achieve fully asynchronous RPC invocation? #

Just now I explained the asynchronous method using Future. The Future method can be considered as one way of asynchronous invocation on the client side. What about the server side? Does the server need to be asynchronous, and how can it be implemented?

From the basics we learned, when the RPC server receives a request, it splits and decodes the binary message according to the protocol, deserializes the complete message to obtain the input parameters, and then invokes the business logic through reflection. Have you ever wondered which thread executes these operations in a production environment? Are they all executed in the same thread?

Of course not. The splitting and decoding of binary messages must be done in the thread that handles network IO. If the communication layer is built on Netty, this happens in the IO thread, and decoding and deserialization often take place in the IO thread as well. What about the server's business logic? Should it also run in the IO thread? In principle, no. The business logic should be handed off to a dedicated business thread pool to prevent slow business processing from stalling network IO.
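The handoff from IO thread to business thread pool can be sketched like this. The names are hypothetical; in a real Netty server the entry point would be a channel handler's `channelRead`, which is elided here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: after decoding, the IO thread submits the business
// logic to a dedicated pool so slow handlers never block IO processing.
public class Dispatcher {
    // 200 is the typical upper bound for business threads mentioned in the text.
    private final ExecutorService businessPool = Executors.newFixedThreadPool(200);

    // Runs on the IO thread; returns immediately after queuing the work.
    public Future<String> onRequestDecoded(String request) {
        return businessPool.submit(() -> handle(request));
    }

    // Runs on a business thread; this is where slow logic is allowed to live.
    private String handle(String request) {
        return "handled:" + request;
    }

    public void shutdown() {
        businessPool.shutdown();
    }
}
```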

Now here’s the problem. The number of threads in the business thread pool we configured is limited. Based on my experience of operating RPC, the number of threads in the business thread pool is generally only configured up to 200. If the thread count needs to be increased beyond that, it indicates that the business logic needs to be optimized. So what if we encounter a special business scenario? Let’s say we encounter a situation where the configured business thread pool is completely filled, such as the following scenario.

Here I start a service, and the business logic processing is relatively slow. As the traffic volume gradually increases, the business thread pool is easily filled, the throughput is not ideal, and the CPU utilization is also low.

Have you thought of any solutions to this problem? Would you immediately think of increasing the number of threads in the business thread pool? Is that a viable solution? Is there a better solution?

I think making the server’s business processing logic asynchronous is a good method.

Increasing the number of threads in the business thread pool can indeed solve this problem, but for RPC frameworks, there are often multiple services that share the same thread pool. Even if the business thread pool is increased, time-consuming services may still affect other services. So the best solution is to make the business thread pool release as quickly as possible. For this reason, the RPC framework needs to support asynchronous processing of server-side business logic, which is very important for improving service throughput.

So how can the server support asynchronous processing of business logic?

This is a relatively difficult problem, because after the server completes the business logic, it still needs to serialize and encode the return value and send the response back to the caller. With asynchronous processing, however, the method returns as soon as the business logic is triggered, before the actual result exists to be serialized, encoded, and sent back.

In this case, the RPC framework needs to provide a callback mechanism that allows the business logic to be processed asynchronously, and after processing, call the callback interface of the RPC framework to respond to the calling party with the final result through callbacks.

Speaking of server-side support for asynchronous processing, combined with the Future approach I just explained, can you think of an even better way to handle it? In fact, we can make the RPC framework support CompletableFuture and achieve full asynchrony between the caller and the server.

CompletableFuture is natively supported in Java 8. Just think: if the RPC framework supports CompletableFuture, and I publish an RPC service whose interface declares CompletableFuture as its return type, the entire call flow splits into these steps:

  • The service caller initiates an RPC call and directly obtains the CompletableFuture object as the return value. After that, no additional operations related to the RPC framework are needed (such as the operation of obtaining the Future through the request context that I just explained when discussing the Future method), and it can be processed asynchronously directly.
  • In the server-side business logic, create a CompletableFuture object as the return value. Then the real business logic on the server side can be processed asynchronously in a thread pool. After the business logic is completed, call the complete method of this CompletableFuture object to complete the asynchronous notification.
  • After receiving the response sent by the server, the RPC framework automatically calls the complete method of the CompletableFuture object obtained by the caller. This way, the asynchronous call is completed.
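The steps above can be sketched end to end. This is a hypothetical service, not a real framework's API: the transport between caller and server is elided, and `hello` stands in for both the published interface and its server-side implementation.

```java
import java.util.concurrent.CompletableFuture;

public class AsyncService {

    // Step 2 (server side): the implementation returns a CompletableFuture
    // immediately, freeing the business thread; a worker thread later calls
    // complete() to deliver the result asynchronously.
    public static CompletableFuture<String> hello(String name) {
        CompletableFuture<String> result = new CompletableFuture<>();
        CompletableFuture.runAsync(() -> {
            // real (possibly slow) business logic would run here
            result.complete("hello " + name);
        });
        return result;
    }

    public static void main(String[] args) {
        // Step 1 (caller side): the caller gets the future directly from the
        // call and chains on it; no request-context plumbing is needed.
        hello("rpc").thenAccept(System.out::println).join();
        // Step 3 would be the framework completing the caller's future when
        // the response message arrives over the network.
    }
}
```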

Through support for CompletableFuture, the RPC framework can truly achieve full asynchrony between the caller and the server, improving the single-machine throughput of both ends. And since CompletableFuture is natively supported in Java 8, no intrusive code is required in the business logic. Isn't that cool?

Summary #

Today, we mainly discussed how to maximize the throughput of a single machine through asynchronous RPC.

The main reason that affects the throughput of RPC calls is that the server’s business logic is time-consuming, and the CPU spends most of its time waiting instead of computing, resulting in insufficient CPU utilization. The best way to increase the single machine throughput is to use asynchronous RPC.

The asynchronous strategy of the RPC framework mainly includes asynchronous calls on the client side and asynchronous processing on the server side. Asynchronous calls on the client side are implemented through the Future pattern. The client initiates an asynchronous request and obtains a Future from the request context. The result is then obtained through the get method of the Future. If multiple other services are called simultaneously in the business logic, the Future can reduce the time consumption and increase the throughput. Asynchronous processing on the server side requires a callback mechanism to allow the business logic to be processed asynchronously. Then, the RPC framework’s callback interface is called to asynchronously notify the final result to the client.

In addition, we can achieve complete asynchronous RPC calls between the client and server, and increase the single machine throughput of both sides by leveraging the support for CompletableFuture.

In fact, RPC frameworks can also have other asynchronous strategies, such as integrating with RxJava, or using the StreamObserver input object of gRPC. However, CompletableFuture is a native feature provided by Java 8, has no code intrusion, and is more convenient to use. If you are developing in Java, making the RPC framework support CompletableFuture can be considered the best asynchronous solution.

After-class Reflection #

Do you have any other solutions to improve throughput for RPC calls? What other asynchronous strategies can you think of for RPC frameworks?

Feel free to leave a comment and share your answers. You can also share this article with your friends and invite them to join the learning. See you in the next class!