
10 Network Communication Optimization: How to Optimize the RPC Network Communication Protocol #

Hello, I am Liu Chao. Today I will take you through the optimization of network communication between services.

In the previous lesson, I mentioned microservice frameworks, among which Spring Cloud and Dubbo are the most widely used. There has always been a comparison between the two in the industry, and many technicians would argue about which framework is better.

I remember that our department had a long discussion when building the microservice framework, and there was intense debate over the technical choices. Currently, Spring Cloud is very popular and has a complete microservice ecosystem, gaining votes from many colleagues. However, our final choice was Dubbo. Why is that?

RPC Communication is the Core of Large-Scale Service Architecture #

We often discuss microservices, and it is essential to understand what the core of microservices really is in order to make accurate technology choices.

In my personal understanding, I believe that the core of microservices is remote communication and service governance. Remote communication provides the bridge for communication between services, while service governance provides the logistical support for services. Therefore, when making technology choices, we need to consider more about the requirements of these two cores.

We know that service decomposition increases the cost of communication, especially in business scenarios such as flash sales or promotions, where method calls between services are required. For example, after a successful flash sale, we may need to call the order system, payment system, coupon system, etc. In this case, remote communication can easily become a bottleneck for the system. Therefore, under the premise of meeting certain service governance requirements, the performance requirements of remote communication are the main influencing factors for technology choices.

Currently, many microservice frameworks implement service communication based on RPC communication. Without component extension, Spring Cloud implements RPC communication based on the Feign component (implemented based on Http+Json serialization), while Dubbo extends many RPC communication frameworks based on SPI, including RMI, Dubbo and Hessian RPC communication frameworks (default is Dubbo+Hessian serialization). The choice and optimization criteria for RPC communication also vary in different business scenarios.

For example, as I mentioned earlier, when our department was selecting a microservice framework, we chose Dubbo. The selection criteria at the time were that RPC communication could support high concurrency in flash sale scenarios, where the characteristics of requests are instant peaks, high request volume, and small incoming and outgoing parameter data. The Dubbo protocol in Dubbo provided excellent support for these requests.

The following are simple performance tests based on Dubbo 2.6.4. I tested the communication performance of Dubbo+Protobuf serialization and Http+Json serialization (mainly simulating the performance comparison between single TCP long connection + Protobuf serialization and short connection Http+Json serialization). In order to verify the performance of the two in different data volume scenarios, I prepared performance tests for small objects and large objects. Through this approach, we can indirectly understand the level of the two in terms of RPC communication.

[Figure: performance test results for small objects — Dubbo+Protobuf vs Http+Json]

[Figure: performance test results for large objects — Dubbo+Protobuf vs Http+Json]

This test is based on my previous accumulation, and due to the complexity of the test environment, I’ll directly provide the results here. If you are interested, you can leave a comment to discuss with me.

Based on the above test results, it can be observed that both in terms of response time and throughput, RPC communication frameworks implemented using a single TCP long connection + Protobuf serialization have significant advantages.

In high-concurrency scenarios, when we choose a backend service framework or design a service framework internally within the middleware department, RPC communication is the key area for optimization.

In fact, there are many mature RPC communication frameworks available. If your company does not have its own middleware team, you can also extend an open-source RPC communication framework. Before formal optimization, let’s briefly review RPC.

What is RPC Communication #

When we mention RPC, do you also think of concepts like MVC and SOA? If you haven’t experienced the evolution of these architectures, these concepts can be easily confused. You can understand the evolution of these architectures through the following diagram.

[Figure: the evolution from MVC to SOA, RPC, and microservice architectures]

Whether it’s microservices, SOA, or RPC architecture, they are all distributed service architectures that require communication between services. We generally refer to this type of communication as RPC communication.

RPC (Remote Procedure Call) is a communication technique for invoking a program's services on a remote machine via network requests. An RPC framework encapsulates the underlying network communication, serialization, and related machinery; we only need to import a service's interface package into the project to call the remote service in our code as if it were a local method. Because remote invocation is this convenient and transparent, RPC is widely used in enterprise-level and internet projects today and is a core building block of distributed systems.
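To make "calling a remote service like a local method" concrete, here is a minimal sketch of how an RPC stub can be built with JDK dynamic proxies. The `OrderService` interface and the in-process "remote" implementation are hypothetical; in a real framework, the `InvocationHandler` would serialize the method name and arguments and send them over the network instead of invoking locally.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class RpcProxySketch {
    // Service interface shared between consumer and provider (hypothetical).
    interface OrderService {
        String createOrder(String productId);
    }

    // Provider-side implementation; in a real framework this lives in another process.
    static class OrderServiceImpl implements OrderService {
        public String createOrder(String productId) {
            return "order-for-" + productId;
        }
    }

    // Consumer-side stub factory: the proxy intercepts the call where a real
    // framework would serialize method + args and send them over the wire.
    @SuppressWarnings("unchecked")
    static <T> T refer(Class<T> iface, T remoteImpl) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Real flow: serialize(method, args) -> network -> deserialize -> invoke.
            // We invoke the "remote" implementation directly to keep the sketch local.
            return method.invoke(remoteImpl, args);
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[]{iface}, handler);
    }

    public static void main(String[] args) {
        OrderService stub = refer(OrderService.class, new OrderServiceImpl());
        // The call site looks exactly like a local method call.
        String result = stub.createOrder("p100");
        if (!"order-for-p100".equals(result)) throw new AssertionError(result);
        System.out.println(result);
    }
}
```

This is precisely the transparency the paragraph above describes: the consumer only compiles against the interface, never the implementation.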

RMI (Remote Method Invocation) is the JDK's built-in RPC framework and one of the earliest RPC implementations in Java. RMI is crucial for building distributed Java applications and is a very important underlying technology in the Java ecosystem. Many open-source RPC frameworks draw on RMI's implementation principles, and the Dubbo framework also supports RMI as one of its communication protocols. Next, let's look at how RMI works and see which performance bottlenecks need to be optimized.

RMI: JDK’s built-in RPC communication framework #

Currently, RMI is widely used in EJB and Spring frameworks and is considered the core solution for pure Java distributed application systems. RMI allows remote method invocations to be made as easily as local method invocations, as it handles the remote communication details for us.

Implementation Principle of RMI #

The remote proxy object is the most critical component of RMI. Through it, an object's methods can be invoked not only from code in the same JVM as the object itself, but also from other JVMs, which may run on different hosts and communicate with the server over the network via the remote proxy object.

We can understand the entire RMI communication process in detail through the following diagram:

[Figure: the RMI communication process]
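As a concrete illustration of that process, the following minimal sketch uses the JDK's built-in RMI: it exports a remote object, registers its stub in an in-process registry, and invokes it through the stub just like a local method. The `Hello` interface, the binding name, and port 2099 are arbitrary choices for the example.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiSketch {
    // A remote interface must extend Remote, and each method declares RemoteException.
    public interface Hello extends Remote {
        String greet(String name) throws RemoteException;
    }

    public static class HelloImpl implements Hello {
        public String greet(String name) { return "hello, " + name; }
    }

    public static void main(String[] args) throws Exception {
        // Export the object: RMI generates a stub that marshals calls over TCP.
        HelloImpl impl = new HelloImpl();
        Hello stub = (Hello) UnicastRemoteObject.exportObject(impl, 0);

        // In-process registry on an arbitrary port (2099 is an assumption).
        Registry registry = LocateRegistry.createRegistry(2099);
        registry.rebind("hello", stub);

        // "Client" side: look up the stub and invoke it like a local method.
        Hello client = (Hello) LocateRegistry.getRegistry("localhost", 2099).lookup("hello");
        String result = client.greet("rmi");
        if (!"hello, rmi".equals(result)) throw new AssertionError(result);
        System.out.println(result);

        // Clean up so the JVM can exit and the port is released.
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }
}
```

Note that under the hood this call is serialized with Java's default serialization and sent over a TCP connection, which is exactly where the bottlenecks discussed next come from.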

Performance Bottlenecks of RMI in High-Concurrency Scenarios #

  • Default Java serialization

RMI uses Java's default serialization, which I discussed in detail in Lecture 09. As we know, its performance is poor, and frameworks in other languages do not support Java's native serialization, which also rules out cross-language interoperability.

  • TCP short connections

Since RMI is based on TCP short connections, in high-concurrency scenarios, a large number of requests will result in a significant overhead due to the creation and destruction of connections.

  • Blocking network I/O

In Lecture 08, I mentioned that network communication faces I/O bottlenecks. If the traditional I/O model is used for socket programming, short connection-based network communication in high-concurrency scenarios is prone to I/O blocking, significantly reducing performance.

Optimizing RPC Communication in High-Concurrency Scenarios #

The performance bottlenecks of RPC communication in Spring Cloud are very similar to those of RMI. Spring Cloud communicates over the HTTP protocol (short connections) with JSON serialization, which offers no advantage in high-concurrency scenarios. So how can we optimize RPC communication for scenarios with sudden bursts of high concurrency?

RPC communication involves establishing the connection, designing the message format, choosing the transport protocol, and encoding/decoding the data. We will optimize each of these layers in turn to achieve an overall performance improvement.

1. Choosing an appropriate communication protocol #

To achieve network communication between different machines, we first need to understand the basic principles of computer network communication. Network communication is the process of data exchange between two devices, which is implemented based on network transmission protocols and data encoding/decoding. There are TCP and UDP protocols for network transmission, both of which are extended transmission protocols based on the Socket programming interface. The following two diagrams show the general flow of Socket network communication implemented using TCP and UDP protocols.

[Figure: Socket communication flow over the TCP and UDP protocols]

Socket communication over the TCP protocol is connection-oriented: a connection is established through a three-way handshake, and reliable delivery is guaranteed by acknowledgments and retransmission. TCP transmits data as a byte stream with no message boundaries.

Socket communication implemented using UDP protocol does not require establishing a connection from the client side. Instead, the client creates a socket and sends a data packet to the server. This means that the data packet may not always reach the server, making the transmission using UDP protocol unreliable. UDP sends data using a datagram mode where each UDP datagram has a length, and this length is sent to the server along with the data.

By comparing the two, we can come up with an optimization method: to ensure data transmission reliability, we usually use the TCP protocol. However, if there is no requirement for data transmission reliability and the communication is within a local area network (LAN), we can consider using the UDP protocol as it has higher efficiency compared to TCP.
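The UDP flow described above can be sketched in a few lines of Java: the client performs no connection setup and simply sends a self-contained datagram whose length travels with it. This is an illustrative loopback example; the message content and buffer size are arbitrary.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpSketch {
    public static void main(String[] args) throws Exception {
        // "Server": bind a socket on an ephemeral port and wait for one datagram.
        DatagramSocket server = new DatagramSocket(0);
        int port = server.getLocalPort();

        Thread serverThread = new Thread(() -> {
            try {
                byte[] buf = new byte[512];
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                server.receive(packet); // blocks until a datagram arrives
                String msg = new String(packet.getData(), 0, packet.getLength(),
                        StandardCharsets.UTF_8);
                System.out.println("server got: " + msg);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        serverThread.start();

        // "Client": no handshake -- just send a datagram (datagram mode, in
        // contrast to TCP's boundary-less byte stream).
        byte[] data = "ping".getBytes(StandardCharsets.UTF_8);
        DatagramSocket client = new DatagramSocket();
        client.send(new DatagramPacket(data, data.length,
                InetAddress.getLoopbackAddress(), port));

        serverThread.join(5000);
        client.close();
        server.close();
    }
}
```

On the loopback interface this almost always arrives, but over a real network nothing guarantees delivery, which is exactly the trade-off described above.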

2. Using a single long-lived connection #

If Socket communication is implemented based on the TCP protocol, what other optimizations can we make?

Communication between services is different from communication between clients and servers. Due to the large number of clients, it is beneficial to use short-lived connections for client-server communication to avoid occupying connections for a long time and wasting system resources.

However, in service-to-service communication, the number of consumers is far smaller than the number of end-user clients, while each consumer sends a large volume of requests to the provider. By using long-lived connections, we avoid the overhead of repeatedly establishing and closing TCP connections, reducing system performance consumption and saving time.
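A minimal sketch of this idea: one TCP connection carrying several request/response round trips, so no handshake or teardown cost is paid per call. The line-based echo protocol here is invented purely for illustration.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class LongConnectionSketch {
    public static void main(String[] args) throws Exception {
        ServerSocket serverSocket = new ServerSocket(0); // ephemeral port
        Thread server = new Thread(() -> {
            try (Socket s = serverSocket.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                String line;
                while ((line = in.readLine()) != null) {
                    out.println("echo:" + line); // one reply per request, same connection
                }
            } catch (Exception ignored) { }
        });
        server.start();

        // One TCP connection, many request/response round trips: no repeated
        // three-way handshakes or four-way teardowns between calls.
        try (Socket s = new Socket("localhost", serverSocket.getLocalPort());
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            for (int i = 1; i <= 3; i++) {
                out.println("req-" + i);
                System.out.println(in.readLine());
            }
        }
        serverSocket.close();
        server.join(5000);
    }
}
```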

3. Optimizing Socket communication #

When establishing network communication between two machines, we generally use Java’s Socket programming to implement a TCP connection. Traditional Socket communication suffers from I/O blocking, deficiencies in thread models, and memory copying. To overcome these issues, we can use mature communication frameworks such as Netty. Netty 4 has made various optimizations in Socket communication programming. The specifics are as follows.

Implementing non-blocking I/O: In Lesson 08, we mentioned that multiplexers (Selectors) enable non-blocking I/O communication.
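For reference, here is a bare-bones sketch of non-blocking I/O with a JDK NIO Selector, the mechanism Netty builds on: one thread multiplexes accept and read events for many channels. It is deliberately simplified (one client, one message) and is not how production code should be structured.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorSketch {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0));
        server.configureBlocking(false);                  // non-blocking mode
        server.register(selector, SelectionKey.OP_ACCEPT);

        // Client writes a few bytes over a plain blocking channel.
        SocketChannel client = SocketChannel.open(
                new InetSocketAddress("localhost", server.socket().getLocalPort()));
        client.write(ByteBuffer.wrap("hi".getBytes()));
        client.shutdownOutput();

        // One selector thread handles both the accept and the read event.
        boolean done = false;
        while (!done) {
            selector.select(); // blocks until at least one channel is ready
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel ch = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    if (ch.read(buf) > 0) {
                        buf.flip();
                        byte[] b = new byte[buf.remaining()];
                        buf.get(b);
                        System.out.println("read: " + new String(b));
                        done = true;
                    }
                }
            }
            selector.selectedKeys().clear();
        }
        client.close(); server.close(); selector.close();
    }
}
```

Netty's boss/worker threads are, in essence, event loops like the one above, each with its own selector.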

Efficient Reactor thread model: Netty uses a master-slave Reactor multi-threading model. On the server side, a main (boss) thread accepts client connection requests; once a connection is established and the event is detected, the resulting channel is registered with an I/O worker thread.

That worker thread is then responsible for all subsequent I/O operations on the channel. This threading model addresses the problem of a single NIO thread being unable to serve a large number of clients and handle numerous I/O operations in high-load, high-concurrency scenarios.

Serial design: After receiving a message, the server performs encoding, decoding, reading, and sending operations. If these operations are implemented in parallel, it will lead to severe lock contention and a decrease in system performance. To improve performance, Netty uses a serial non-locking mechanism to perform chain operations. Netty provides a Pipeline structure that allows all the operations within a chain to be executed without thread switching during runtime.

Zero copy: In Lesson 08, we mentioned that when sending data from memory to the network, there are two copying actions: copying from user space to kernel space, and then from the kernel space to network I/O. NIO provides ByteBuffer, which can be used with Direct Buffers mode to directly allocate non-heap physical memory. This avoids the need for a second copy of the byte buffer and enables direct writing of data into the kernel space.
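The difference between heap and direct buffers can be seen directly through the NIO API. This small sketch only demonstrates the two allocation modes, not an actual socket write:

```java
import java.nio.ByteBuffer;

public class DirectBufferSketch {
    public static void main(String[] args) {
        // Heap buffer: backed by a Java byte[]; the JVM must copy its contents
        // to native memory before the kernel can use them for a socket write.
        ByteBuffer heap = ByteBuffer.allocate(1024);

        // Direct buffer: allocated outside the JVM heap, so channel writes can
        // hand its address to the kernel without that intermediate copy.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);

        System.out.println(heap.isDirect());   // false
        System.out.println(direct.isDirect()); // true
        System.out.println(heap.hasArray());   // true: backed by a byte[]
        System.out.println(direct.hasArray()); // false: no accessible backing array
    }
}
```

Direct buffers are more expensive to allocate and free, which is why frameworks like Netty pool them rather than allocating per message.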

In addition to the above optimizations, we can also improve network throughput by configuring the TCP parameters that socket programming exposes; Netty sets them via ChannelOption.

TCP_NODELAY: This option controls whether the Nagle algorithm is enabled. The Nagle algorithm buffers small packets and combines them into larger ones, which avoids the network congestion caused by a flood of tiny packets and improves transmission efficiency. For latency-sensitive applications, we can disable the algorithm so that small writes are sent immediately.

SO_RCVBUF and SO_SNDBUF: These options can be used to adjust the size of the socket’s send and receive buffers according to the scenario.

SO_BACKLOG: The backlog parameter specifies the size of the client connection request queue. The server handles client connection requests sequentially, so only one client connection can be processed at a time. When multiple clients come in, the server puts the unprocessable client connection requests in the queue waiting for processing.

SO_KEEPALIVE: When this option is set, the connection will check the connection status of clients that have not sent data for a long time. When the client’s disconnection is detected, the server will reclaim the connection. We can set this time shorter to improve the efficiency of reclaiming connections.
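For comparison with Netty's ChannelOption, the same parameters can be set on plain java.net sockets. The sketch below shows the setters; the buffer sizes are illustrative values, and the operating system may adjust them:

```java
import java.net.ServerSocket;
import java.net.Socket;

public class TcpOptionsSketch {
    public static void main(String[] args) throws Exception {
        // SO_BACKLOG: the second constructor argument caps the pending-connection queue.
        ServerSocket serverSocket = new ServerSocket(0, 128);

        try (Socket socket = new Socket("localhost", serverSocket.getLocalPort())) {
            socket.setTcpNoDelay(true);             // TCP_NODELAY: disable Nagle
            socket.setKeepAlive(true);              // SO_KEEPALIVE: probe idle peers
            socket.setSendBufferSize(64 * 1024);    // SO_SNDBUF (a hint to the OS)
            socket.setReceiveBufferSize(64 * 1024); // SO_RCVBUF (a hint to the OS)

            System.out.println(socket.getTcpNoDelay()); // true
            System.out.println(socket.getKeepAlive());  // true
        }
        serverSocket.close();
    }
}
```

In Netty the equivalents are ChannelOption.TCP_NODELAY, SO_SNDBUF, SO_RCVBUF, SO_BACKLOG, and SO_KEEPALIVE, passed to the bootstrap's option/childOption methods.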

4. Customizing the Message Format #

Next is the implementation of the message. We need to design a set of messages to describe specific validations, operations, data transmissions, and other contents. In order to improve transmission efficiency, we can consider designing based on our own business and architecture to achieve characteristics such as small message body, functional fulfillment, and easy parsing. We can refer to the following data format:

[Figures: an example custom message format — header fields and body layout]
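As an illustration of such a custom format, the sketch below frames each message as a fixed header (a made-up magic number, a message type, and a body length) followed by the body. The length field is what lets the receiver split TCP's boundary-less byte stream back into discrete messages. All field choices here are hypothetical, not the format from the figure.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class MessageCodecSketch {
    static final short MAGIC = (short) 0xCAFE; // hypothetical protocol magic number

    // Header layout: magic (2 bytes) + message type (1 byte) + body length (4 bytes).
    static ByteBuffer encode(byte type, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(2 + 1 + 4 + body.length);
        buf.putShort(MAGIC);
        buf.put(type);
        buf.putInt(body.length); // length field lets the reader frame the byte stream
        buf.put(body);
        buf.flip();
        return buf;
    }

    static byte[] decode(ByteBuffer buf) {
        if (buf.getShort() != MAGIC) throw new IllegalStateException("bad magic");
        byte type = buf.get();    // would dispatch on this in a real protocol
        int len = buf.getInt();
        byte[] body = new byte[len];
        buf.get(body);
        return body;
    }

    public static void main(String[] args) {
        byte[] body = "seckill-order".getBytes(StandardCharsets.UTF_8);
        ByteBuffer wire = encode((byte) 1, body);
        String decoded = new String(decode(wire), StandardCharsets.UTF_8);
        if (!"seckill-order".equals(decoded)) throw new AssertionError(decoded);
        System.out.println(decoded);
    }
}
```

A compact binary header like this is one reason custom RPC protocols beat HTTP/1.1's text headers on wire size for small payloads.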

5. Encoding and Decoding #

In lesson 09, we analyzed the process of serialization encoding and decoding. For implementing a good network communication protocol, it is important to have compatibility with excellent serialization frameworks. If we are only transferring data objects, we can choose the relatively efficient Protobuf serialization, which helps improve network communication performance.

6. Adjusting Linux TCP Parameter Options #

If RPC is implemented based on TCP short connections, we can optimize network communication by modifying Linux TCP configurations. Before starting the optimization of TCP configurations, let’s understand the three-way handshake for establishing TCP connections and the four-way handshake for closing TCP connections, which will help with understanding the following content.

  • Three-way handshake

[Figure: the TCP three-way handshake]

  • Four-way handshake

[Figure: the TCP four-way handshake for closing a connection]

We can use the command sysctl -a | grep net.xxx to check the Linux system's current TCP parameter settings. To change a setting, edit /etc/sysctl.conf (for example with vim), add the configuration item to be modified, and then run sysctl -p to apply it. We usually modify the following configurations to improve network throughput and reduce latency.
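As an illustration, a handful of commonly tuned entries in /etc/sysctl.conf might look like the following. The keys are real Linux sysctl parameters, but the values are placeholders to benchmark against your own workload, not recommendations:

```shell
# Illustrative /etc/sysctl.conf entries for a busy RPC server.
net.ipv4.tcp_fin_timeout = 30       # reclaim FIN-WAIT-2 sockets sooner
net.ipv4.tcp_tw_reuse = 1           # reuse TIME_WAIT sockets for new outbound connections
net.ipv4.tcp_max_tw_buckets = 5000  # cap the number of TIME_WAIT sockets
net.core.somaxconn = 1024           # larger accept-backlog limit
net.ipv4.tcp_keepalive_time = 600   # start keepalive probes after 10 minutes idle
# Apply the changes with: sysctl -p
```

These matter most for short-connection workloads, where connections are torn down constantly and TIME_WAIT sockets accumulate.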

[Figure: commonly tuned Linux TCP configuration parameters]

The above explains RPC optimization in detail from several perspectives. Except for the final Linux TCP configuration tuning, which happens at the system level, the optimizations are made at the code level, and together they form a practical path for optimizing an RPC communication framework.

After understanding these, you can make a technical selection based on your business scenario and effectively solve performance issues that may arise in the process.

Summary #

In today’s distributed systems, especially as systems move towards microservices, communication between services becomes more frequent. Mastering the principles and optimization of communication protocols between services is a necessary skill.

In systems with a high level of concurrency, I prefer to use the RPC communication protocol implemented by Dubbo. The Dubbo protocol establishes a single long connection for communication, with non-blocking read and write operations using NIO for network I/O. It is also compatible with high-performance serialization frameworks such as Kryo, FST, and Protobuf, making it very practical for high-concurrency, small object transfer in business scenarios.

In enterprise-level systems, the business complexity is often higher than that of ordinary internet products. Communication between services may involve not only data transfer but also the transfer of images and files. Therefore, the design of RPC communication protocols focuses more on functional requirements and does not pursue extreme performance. Other communication frameworks have advantages in terms of functionality, ecosystem, ease of use, and ease of entry.

Thought Question #

Currently, there are many frameworks for implementing Java RPC communication, and there are also many protocols for implementing RPC communication. Besides the Dubbo protocol, have you used any other RPC communication protocols? Based on what we’ve learned, can you compare and discuss the advantages and disadvantages of each?