
23 Architectural design, how to implement a high-performance distributed RPC framework #

In the previous courses, we have gradually explained the basics and implementation principles of Netty, and dissected the core source code of Netty. I believe you have already experienced the power of Netty. Learning a technology itself is a relatively long process, and congratulations on sticking with it. What is written on paper may seem shallow, but true understanding comes from hands-on practice. Are you eager to use Netty in your project now? I will guide you through the process of creating a relatively complete prototype of an RPC framework, helping you deepen your understanding of Netty. I hope you can personally follow along and complete it.

First, let me explain why we choose an RPC framework as a practical project. The RPC framework is a middleware framework that large enterprises frequently use to solve the problem of communication between services in distributed systems. The design of an RPC framework involves many important knowledge points such as thread models, communication protocol design, synchronous/asynchronous calls, load balancing, etc., which greatly helps improve our technical capabilities.

What are the goals of this practical course? There are many well-known RPC frameworks on the market, such as Dubbo, Thrift, and gRPC, and we cannot hope to reproduce every feature they offer. Instead, we will focus on the core processes and essential components of an RPC framework and develop a small but well-rounded one: small in scope, yet complete in its essentials.

Before we start the RPC practical project, we need to learn about the architecture design of RPC, which is a very important step in the early stage of project planning.

Architecture Design of RPC Framework #

RPC, short for Remote Procedure Call, solves the problem of communication between services in distributed systems. In simple terms, it lets developers call remote services as if they were calling local methods. Now let’s use a diagram to explain the basic architecture of an RPC framework.

[Figure: Basic architecture of an RPC framework — client, server, and registry center]

The RPC framework consists of three most important components: the client, the server, and the registry center. In a typical RPC call process, these three components interact as follows:

  • After the server starts, it will publish the list of services it provides to the registry center, and the client will subscribe to the service addresses from the registry center.
  • The client will call the server through the local proxy module Proxy. The Proxy module is responsible for converting the method, parameters, and other data into network byte streams.
  • The client selects one of the service addresses from the service list and sends the data to the server over the network.
  • The server receives the data, decodes it, and obtains the request information.
  • The server calls the corresponding service based on the decoded request information, and then returns the invocation result to the client.

Although the RPC call process is easy to understand, implementing a complete RPC framework involves many aspects, such as service registration and discovery, communication protocol and serialization, load balancing, dynamic proxy, etc. We will explain each of them in the subsequent sections.

Service Registration and Discovery #

In a distributed system, how should different services communicate with each other? The traditional approach is to make direct HTTP calls against a statically maintained service address list, which requires developers to actively track the endpoints each server exposes and results in serious coupling between systems. To decouple the client and server, and to allow services to come online and go offline gracefully, the registry center was introduced.

In the RPC framework, the registry center is mainly used to implement service registration and discovery. After the server node comes online, it registers the service list with the registry center. When the node goes offline, it needs to remove the node’s metadata information from the registry center. When the client initiates a call to the server, it is responsible for obtaining the service list from the registry center and then selecting one of the service nodes for invocation using a load balancing algorithm. This is the simplest and most direct publishing and subscribing mode between the server and the client, without the need for any intermediate servers and with minimal performance overhead.

Now let’s think about a problem. When a service goes offline, how can the registry center be aware of it? The first method that comes to mind is the implementation of active notification by the node. When a node needs to go offline, it sends a request to the registry center to remove its metadata information. However, if the node exits abnormally, such as a network disconnection or process crash, the registry center will continue to retain the metadata of the abnormal node, which may cause problems with service invocation.

To avoid the above problem, a good way to achieve graceful service shutdown is to use active notification + heartbeat detection. In addition to actively notifying the registry center to go offline, heartbeat detection between the node and the registry center needs to be added. This process is also called probing. Heartbeat detection can be done by either the node or the registry center. For example, the registry center can send a heartbeat packet to the service node every 60 seconds. If it does not receive a response after 3 consecutive heartbeat packets, it can consider that the service node has gone offline.
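As a rough illustration of the heartbeat bookkeeping described above, the sketch below keeps a missed-beat counter per node on the registry side and evicts a node after three consecutive misses. The class and method names are hypothetical, and the actual network probe is left as a stub.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical registry-side prober: ping every node periodically and
// treat a node as offline after three consecutive missed heartbeats.
class HeartbeatMonitor {
    private static final int MAX_MISSED = 3;
    private final Map<String, Integer> missedBeats = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void register(String nodeAddress) {
        missedBeats.put(nodeAddress, 0);
    }

    // Called when a heartbeat response arrives: reset the node's counter.
    void onHeartbeatResponse(String nodeAddress) {
        missedBeats.computeIfPresent(nodeAddress, (k, v) -> 0);
    }

    // Called when a probe round gets no response; returns true if the node
    // has now been evicted from the registry's metadata.
    boolean onMissedHeartbeat(String nodeAddress) {
        int missed = missedBeats.merge(nodeAddress, 1, Integer::sum);
        if (missed >= MAX_MISSED) {
            missedBeats.remove(nodeAddress);   // node considered offline
            return true;
        }
        return false;
    }

    boolean isOnline(String nodeAddress) {
        return missedBeats.containsKey(nodeAddress);
    }

    // Probe every registered node at a fixed interval (e.g. 60 seconds).
    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(
                () -> missedBeats.keySet().forEach(this::probe),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    private void probe(String nodeAddress) {
        // Send a heartbeat packet here; on timeout, call onMissedHeartbeat().
    }
}
```

Note that a response at any point resets the counter, so only three misses in a row trigger eviction.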

Therefore, the advantage of using a registry center is that it can decouple the complex relationship between the client and the server, and can achieve dynamic management of services. Service configuration can be dynamically modified and then pushed to the client and server without the need to restart any services.

Communication Protocols and Serialization #

Since RPC is remote procedure call, it inevitably involves a network communication protocol. Before the client initiates a call, it must decide how to encode the call information and transmit it to the server. Because RPC frameworks have very high performance requirements, the communication protocol should be as simple as possible to reduce encoding and decoding overhead. RPC frameworks can be built on different protocols: most mainstream frameworks choose TCP or HTTP, and the well-known gRPC framework uses HTTP/2. TCP, HTTP, and HTTP/2 are all stable and reliable, but UDP can also be a fit for specific business scenarios. Mature RPC frameworks often support multiple protocols. For example, Dubbo, an open-source framework developed by Alibaba and widely used by many Internet companies, treats pluggable protocol support as a major feature, which not only gives developers choices but also eases integration with heterogeneous systems.

What data needs to be transmitted between the client and the server, and how should it be encoded and decoded? If TCP is used, the interface, method, request parameters, and invocation attributes of the call must be serialized into a binary byte stream and transmitted to the service provider. After receiving the data, the service provider deserializes the byte stream to recover the call information, invokes the corresponding method via reflection, and finally returns the result, return code, exception information, and so on to the client. Serialization and deserialization are the processes of converting objects into binary streams and converting binary streams back into objects. Because network communication relies on byte streams and the request payloads vary, a general-purpose, efficient serialization algorithm is usually chosen. Commonly used options include FastJson, Kryo, Hessian, and Protobuf, all of which outperform native Java serialization. Dubbo supports multiple serialization algorithms and defines a Serialization interface specification that every serialization extension must implement; its default algorithm is Hessian.
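To make the interface-based extension style concrete, here is a minimal serialization SPI sketch. The `Serialization` interface and `JdkSerialization` class are illustrative, not Dubbo's actual API, and built-in Java serialization is used purely for simplicity — real frameworks favor faster codecs such as Hessian, Kryo, or Protobuf.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Illustrative serialization SPI: every codec extension implements this.
interface Serialization {
    byte[] serialize(Object obj);
    <T> T deserialize(byte[] bytes, Class<T> clazz);
}

// Baseline implementation on top of built-in Java serialization.
class JdkSerialization implements Serialization {
    @Override
    public byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);                 // object -> binary byte stream
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bos.toByteArray();
    }

    @Override
    public <T> T deserialize(byte[] bytes, Class<T> clazz) {
        try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return clazz.cast(ois.readObject()); // binary byte stream -> object
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Swapping in a different codec then means writing another implementation of the same interface, which is exactly what makes the algorithm pluggable.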

RPC Invocation Modes #

Mature RPC frameworks generally provide four invocation modes: synchronous (sync), asynchronous (future), callback, and oneway. The performance and throughput of RPC frameworks are closely related to the appropriate use of invocation modes. Let’s introduce the implementation principles of these four invocation modes one by one.

  • Sync synchronous invocation. When the client thread initiates an RPC call, the current thread blocks until the server returns the result or a timeout exception occurs. Sync is generally the default calling mode of RPC frameworks. To ensure system availability, it is important to set a reasonable timeout on the client. Although Sync is a synchronous call, the client thread and the server thread are not the same thread; internally, the RPC framework still processes the call asynchronously. The process of Sync synchronous invocation is shown in the following figure.

[Figure: Sync synchronous invocation flow]

  • Future asynchronous invocation. After the client initiates the call, it no longer blocks and waits, but instead obtains a Future object from the RPC framework. The call result is cached by the framework, and the client decides when to fetch it; fetching the result is a blocking wait. The process of Future asynchronous invocation is shown in the following figure.

[Figure: Future asynchronous invocation flow]

  • Callback Invocation. As shown in the figure below, when the client initiates an invocation, it passes a Callback object to the RPC framework, which returns directly without synchronously waiting for the result. It then executes the user’s registered Callback when it receives the server’s response or encounters a timeout exception. Therefore, the Callback interface generally includes two methods, onResponse and onException, corresponding to successful response and exceptional response, respectively.

[Figure: Callback invocation flow]

  • Oneway Invocation. After the client initiates a request, it returns directly, ignoring the result. The Oneway invocation is the simplest, as shown in the figure below.

[Figure: Oneway invocation flow]

Each of the four invocation modes has its own advantages and disadvantages; it is hard to say that an asynchronous mode will always perform better than a synchronous one. The appropriate mode should be selected according to the business scenario.
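All four modes above can be layered on top of a single asynchronous transport call. The sketch below is illustrative only: `invokeRemote` stands in for the real network round trip, and the method names are assumptions, not any framework's API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustrative sketch of the four RPC invocation modes over one async transport.
class InvocationModes {

    // Stand-in for the real transport: completes asynchronously with a response.
    static CompletableFuture<String> invokeRemote(String request) {
        return CompletableFuture.supplyAsync(() -> "echo:" + request);
    }

    // Sync: block the calling thread until the result arrives or the timeout fires.
    static String sync(String request) {
        return invokeRemote(request).orTimeout(3, TimeUnit.SECONDS).join();
    }

    // Future: return immediately; the caller decides when (and whether) to block.
    static CompletableFuture<String> future(String request) {
        return invokeRemote(request);
    }

    // Callback: return immediately; run the registered callback on response or
    // error, mirroring the onResponse/onException pair described above.
    static void callback(String request,
                         Consumer<String> onResponse,
                         Consumer<Throwable> onException) {
        invokeRemote(request).whenComplete((resp, err) -> {
            if (err != null) onException.accept(err);
            else onResponse.accept(resp);
        });
    }

    // Oneway: fire and forget; the result is ignored entirely.
    static void oneway(String request) {
        invokeRemote(request);
    }
}
```

Note how sync is just the future mode plus an immediate blocking wait, which matches the observation that the framework processes even synchronous calls asynchronously inside.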

Thread Model #

The thread model is a key part of the RPC framework, but how does it differ from and relate to the Netty Reactor thread model we introduced earlier?

First, we need to understand the difference between I/O threads and business threads. Taking the Dubbo framework as an example, Dubbo uses Netty as the underlying network communication framework and adopts the well-known Reactor thread model with a main and sub reactor. The Boss and Worker thread pools can be regarded as I/O threads. I/O threads primarily handle network data, such as event polling, encoding/decoding, and data transmission. If the business logic can be completed immediately, it can also be handled by the I/O thread to avoid the overhead of thread context switching. However, if the business logic takes a long time, such as querying a database or performing complex rule calculations, the I/O thread must dispatch these requests to the business thread pool for processing to prevent blocking the I/O thread.
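The fast-path/slow-path split described above can be sketched with plain JDK concurrency primitives. This is not Netty or Dubbo code; the `Dispatcher` class and its names are hypothetical, and the calling thread here plays the role of the I/O thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the I/O-thread vs. business-thread split: cheap requests complete
// on the calling (I/O) thread; slow ones are handed to a business pool so the
// I/O thread can keep polling events.
class Dispatcher {
    // Business thread pool, kept separate from the event-loop (I/O) threads.
    private final ExecutorService businessPool = Executors.newFixedThreadPool(8);

    CompletableFuture<String> dispatch(String request, boolean longRunning) {
        if (!longRunning) {
            // Fast path: handle directly on the I/O thread, no context switch.
            return CompletableFuture.completedFuture(handle(request));
        }
        // Slow path: offload so a database query or heavy computation
        // cannot block event polling.
        return CompletableFuture.supplyAsync(() -> handle(request), businessPool);
    }

    private String handle(String request) {
        return "result:" + request;   // stands in for real business logic
    }
}
```

The design choice is the classic latency/throughput trade-off: dispatching costs a context switch, but not dispatching risks stalling every connection served by that I/O thread.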

So, which requests need to be executed in the I/O thread, and which ones need to be executed in the business thread pool? The approach used by the Dubbo framework is worth learning from. It provides users with multiple choices, offering a total of five dispatch strategies, as shown in the table below.

[Table: Dubbo's five Dispatcher strategies. For reference, Dubbo's documented options are: all (dispatch all messages, including requests, responses, and connection events, to the business thread pool), direct (handle all messages on the I/O thread without dispatching), message (dispatch only request and response messages to the business pool), execution (dispatch only request messages), and connection (queue connection/disconnection events in order on the I/O thread and dispatch other messages to the business pool).]

Load Balancing #

In a distributed system, both the service providers and service consumers can have multiple nodes. How can we ensure load balancing across all nodes of the service providers? Before making a call, the client needs to be aware of the available server nodes and then select one for the call. The client needs to obtain the status information of the server nodes and implement load balancing algorithms based on different strategies. Load balancing strategies are an important factor affecting the throughput of RPC frameworks. Below, we introduce several commonly used load balancing strategies:

  • Round-Robin. Round-Robin is the simplest and most straightforward load balancing strategy. It does not consider the actual load on each server node, but simply selects server nodes in turn.
  • Weighted Round-Robin. Different server nodes with different load levels are assigned weight coefficients. This allows lower-performing or lower-configured nodes to handle less traffic. The weight coefficients can be adjusted in real-time based on the server load level, achieving a relatively balanced state for the cluster.
  • Least Connections. Load balancing is based on the current number of connections of the server nodes. The client selects the server with the fewest connections for the call. The Least Connections strategy is just one dimension of load balancing on the server. Other load balancing schemes can be developed based on dimensions such as the fewest number of requests or lowest CPU utilization.
  • Consistent Hash. Consistent Hash is a widely recommended load balancing strategy. It is a special hash algorithm that tries to keep a given client request mapped to the same server node even as nodes are added or removed. The algorithm is implemented with a hash ring: both request objects and server nodes are placed on the ring by a hash function, with servers usually hashed by IP + port. For each request, the node chosen is the first server on the ring at or after the request’s hash value.

In addition, load balancing algorithms can take on various forms. Clients can record richer information such as health status, connection count, memory, CPU, and load, and make better decisions based on comprehensive factors.
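A minimal consistent-hash ring following the description above can be built on a sorted map. The virtual-node count and the FNV-1a hash below are illustrative assumptions; production frameworks often use MD5 or Murmur hashing.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring over "ip:port" node keys. Each physical node
// is placed on the ring many times (virtual nodes) to smooth the distribution.
class ConsistentHashRing {
    private static final int VIRTUAL_NODES = 100;
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    ConsistentHashRing(List<String> nodes) {
        for (String node : nodes) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    // Walk clockwise from the request's hash to the first server node.
    String select(String requestKey) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(requestKey));
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
    }

    private int hash(String key) {
        int h = 0x811c9dc5;            // FNV-1a, chosen here for simplicity
        for (char c : key.toCharArray()) {
            h ^= c;
            h *= 0x01000193;
        }
        return h & 0x7fffffff;          // non-negative for ring ordering
    }
}
```

Because only the keys between a removed node and its predecessor remap, adding or removing a node disturbs far fewer assignments than a plain `hash % nodeCount` scheme would.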

Dynamic Proxy #

How can an RPC framework make remote service calls as if they were local interface calls? This requires dynamic proxies. A proxy object needs to be created, which handles data packet encoding and sends the call to the service provider, thus shielding the RPC framework’s call details. Since the proxy class is generated at runtime, the speed of generating the proxy class and the size of the generated bytecode will affect the overall performance and resource consumption of the RPC framework. Therefore, the implementation of dynamic proxies needs to be carefully chosen. The following are some popular implementation options for dynamic proxies: JDK dynamic proxy, Cglib, Javassist, ASM, and Byte Buddy. Let’s do a brief comparison and introduction.

  • JDK Dynamic Proxy. Proxy classes can be created at runtime, but JDK dynamic proxy is limited in functionality: the proxied object must implement an interface, otherwise an exception is thrown. Since the generated proxy class inherits from Proxy, and Java does not support multiple inheritance, polymorphism is only possible through interfaces. The generated proxy class is an implementation of the interface and cannot proxy methods that do not appear in the interface. JDK dynamic proxy invokes methods via reflection, which is necessarily slower than a direct call.
  • Cglib Dynamic Proxy. Cglib is implemented on top of the ASM bytecode generation framework and places no interface restriction on the classes it can proxy. Moreover, a Cglib-generated proxy class inherits from the proxied class, which makes it more flexible. Cglib also has an advantage in method invocation: it adopts the FastClass mechanism, generating an index for the methods of both the proxy class and the proxied class so that a method can be located and invoked directly by index instead of through reflection. This optimization trades space for time.
  • Javassist and ASM. Both are Java bytecode manipulation frameworks. They are harder to use, since they require an understanding of the Class file structure and the JVM, but they perform better than reflection. Byte Buddy is also a library for generating and manipulating bytecode; it offers powerful features through a more convenient API for creating and modifying Java classes, without requiring knowledge of the bytecode format. Byte Buddy is lightweight and performs better than Javassist and ASM.
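The JDK dynamic proxy option is simple enough to sketch in a few lines. Here the proxy turns a local interface call into a pretend remote invocation; the `HelloService` interface is a made-up example, and the in-place string result stands in for the real encode/send/decode round trip.

```java
import java.lang.reflect.Proxy;

// Hypothetical service interface: the proxy must implement an interface,
// which is exactly the JDK dynamic proxy restriction described above.
interface HelloService {
    String hello(String name);
}

class RpcProxyFactory {
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> serviceInterface) {
        return (T) Proxy.newProxyInstance(
                serviceInterface.getClassLoader(),
                new Class<?>[]{serviceInterface},
                (proxy, method, args) -> {
                    // In a real framework: serialize interface/method/args,
                    // send them over the network, then decode the response.
                    return "remote:" + method.getName() + ":" + args[0];
                });
    }
}
```

From the caller's point of view, `RpcProxyFactory.create(HelloService.class).hello("netty")` looks exactly like a local method call, which is the transparency the section opened with.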

So far, we have provided a rough introduction to several key points in implementing an RPC framework. We will have dedicated practical sessions to provide detailed explanations on how communication protocols, load balancing, and dynamic proxies are implemented in an RPC framework. For this session, a general understanding is sufficient.

Conclusion #

If you can complete the core functionalities of the mentioned RPC framework, then you have accomplished an MVP prototype of a simple RPC framework. This is the goal of our practical sessions. However, it is not easy to implement a high-performance and highly reliable RPC framework. There are many other considerations, such as exception retries, service-level thread pool isolation, circuit breaking and rate limiting, cluster fault tolerance, and graceful shutdown, among others. In the final sessions of our practical course, we will introduce advanced content to expand your RPC framework.