
09 Netty Primer: Using It for Network Programming Is Good (Part 1) #

Those who are familiar with Java should know that the JDK itself provides a set of NIO APIs, but these native APIs come with a series of problems.

  • The Java NIO API is extremely complex. To write mature, usable Java NIO code, you need to be proficient with JDK components such as Selector, ServerSocketChannel, SocketChannel, and ByteBuffer, and you must also understand a number of counterintuitive designs and underlying principles, which makes it very unfriendly to beginners.
  • Developing directly against Java NIO is difficult and labor-intensive. Many reliability concerns have to be implemented by hand, such as reconnecting after network fluctuations and handling half-packet reads and writes. The result is that the core business logic may be simple, yet a large amount of code goes into these common capabilities, dragging out development. A unified NIO framework is needed to encapsulate them.
  • Bugs in the JDK itself. The most famous is the epoll bug, which causes the Selector to spin on an empty event set and drives CPU usage to 100%, starving the business logic and degrading service performance.
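
Frameworks generally work around this bug rather than fix it. Netty, for example, counts select() calls that wake up early with nothing ready and, once the count passes a threshold, rebuilds the Selector and re-registers all channels on the new one. Below is a minimal sketch of that detection idea, with an assumed threshold of 512 and illustrative names; it is not Netty's actual code:

```java
import java.io.IOException;
import java.nio.channels.Selector;

public class EpollBugWorkaroundSketch {
    private static final int SPIN_THRESHOLD = 512;   // assumed value for illustration
    private static final long TIMEOUT_MILLIS = 1000;

    private Selector selector;
    private int emptySelectCount;

    public EpollBugWorkaroundSketch() throws IOException {
        this.selector = Selector.open();
    }

    void eventLoop() throws IOException {
        while (true) {
            long start = System.nanoTime();
            int ready = selector.select(TIMEOUT_MILLIS);
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

            if (ready == 0 && elapsedMillis < TIMEOUT_MILLIS) {
                // select() woke up early with nothing ready: possibly the epoll bug,
                // so count consecutive empty wakeups.
                if (++emptySelectCount >= SPIN_THRESHOLD) {
                    selector = rebuildSelector();
                    emptySelectCount = 0;
                }
                continue;
            }
            emptySelectCount = 0;
            // ... process selector.selectedKeys() normally ...
        }
    }

    private Selector rebuildSelector() throws IOException {
        // Open a fresh Selector; a full implementation would re-register every
        // channel from the old Selector onto the new one before closing it.
        selector.close();
        return Selector.open();
    }
}
```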

Netty encapsulates the JDK’s built-in NIO API and solves some of the JDK’s own problems. It has the following advantages:

  • Easy to get started and easy to use, with complete documentation and no third-party dependencies; the JDK alone is enough.
  • High performance, high throughput, low latency, and low resource consumption.
  • Flexible thread model, supporting both blocking and non-blocking I/O models.
  • High code quality; the current mainstream versions are essentially free of known bugs.

Because of these advantages, many Internet companies and open-source projects use Netty as their underlying library for network communication, including Apache Spark, Apache Flink, Elasticsearch, and Dubbo, which we will analyze in this course.

Next, we will introduce the core design of Netty from the perspective of I/O and thread models, helping you to fully understand the principles of Netty.

Netty I/O Model Design #

When performing network I/O, the way data is read and written largely determines I/O performance. As an excellent network foundation library, Netty adopts the NIO model, which is one of the important reasons for its high performance.

1. Traditional Blocking I/O Model #

In the traditional blocking I/O model (also known as BIO), as shown in the diagram below, each request requires a separate thread to complete the entire operation of reading data, processing business logic, and writing back data.

2.png

In this model, each thread is bound to a single connection at a time, as shown in the diagram below. When the volume of concurrent requests is high, a large number of threads must be created to handle the connections, and the system wastes substantial resources on thread switching, reducing program performance. Moreover, network transmission is far slower than CPU processing, so after a connection is established there may be no data to read and nothing ready to write; the thread can only block and wait. This leaves CPU computing power underutilized and compounds the waste caused by thread switching.

3.png
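
In plain JDK code this model looks roughly like the sketch below, where every accepted connection occupies a thread that spends most of its time blocked in read(). The port and echo behavior are chosen purely for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingEchoServer {
    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(8080);
        while (true) {
            Socket socket = server.accept();           // blocks until a client connects
            new Thread(() -> {                         // one thread per connection
                try (InputStream in = socket.getInputStream();
                     OutputStream out = socket.getOutputStream()) {
                    byte[] buf = new byte[1024];
                    int n;
                    while ((n = in.read(buf)) != -1) { // blocks while the connection is idle
                        out.write(buf, 0, n);          // echo the data back
                    }
                } catch (IOException ignored) {
                }
            }).start();
        }
    }
}
```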

2. I/O Multiplexing Model #

To address the shortcomings of the traditional blocking I/O model, the I/O multiplexing model brings significant performance improvements. In it, multiple connections share a single Selector object, which detects the read and write events of all of them. The number of threads no longer needs to match the number of connections: a few threads polling the Selector for ready connections are enough, and no thread has to block waiting on any particular connection. When a connection has new data to process, the operating system notifies a thread, which returns from its blocked state and performs the read or write as well as the subsequent business logic. The I/O multiplexing model is shown in the diagram below:

4.png

Netty adopts this I/O multiplexing model. With the multiplexer Selector in place, a single service can concurrently handle hundreds or thousands of network connections, greatly increasing the server's capacity. Moreover, a thread is never blocked on any single connection: when one connection has nothing to read or write, the thread moves on to other connections that are ready, which maximizes the efficiency of I/O threads and avoids the thread switching caused by frequent I/O blocking. As shown in the diagram below:

6.png
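
To make the contrast with the blocking model concrete, here is a minimal plain-JDK sketch of one thread serving many connections through a single Selector; again, the port and echo behavior are only for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class MultiplexingEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                          // block until something is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {               // new connection: register for reads
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {          // data arrived on some connection
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    int n = client.read(buf);
                    if (n < 0) {                        // peer closed the connection
                        key.cancel();
                        client.close();
                        continue;
                    }
                    buf.flip();
                    client.write(buf);                  // echo the data back
                }
            }
        }
    }
}
```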

From the data-handling perspective, the traditional blocking I/O model works with byte or character streams: bytes are read sequentially from the stream, one or more at a time, with no way to move the read position arbitrarily. NIO discards this traditional stream concept and introduces Channel and Buffer instead. Data can be read from a Channel into a Buffer or written from a Buffer into a Channel. Unlike a stream, a Buffer does not require sequential access: data can be read or written at any position.
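
The following small example shows this difference in practice: a ByteBuffer exposes a read/write position that can be moved freely or bypassed entirely, which a sequential stream cannot do.

```java
import java.nio.ByteBuffer;

public class BufferPositionDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put((byte) 'a').put((byte) 'b').put((byte) 'c');

        buf.flip();                              // switch from writing to reading
        System.out.println((char) buf.get());    // 'a', position advances to 1

        buf.position(2);                         // jump the read position directly to index 2
        System.out.println((char) buf.get());    // 'c'

        System.out.println((char) buf.get(0));   // absolute read: 'a', position untouched
    }
}
```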

Netty Thread Model Design #

After a server program reads binary data, it must first decode it into messages the program logic can understand, then hand those messages to the business logic for processing and return the generated results to the client. Whether the encoding/decoding logic, message dispatching, business processing, and response writing run on a single thread or are distributed across different threads greatly affects program performance. An excellent thread model is therefore crucial for a high-performance network library.

Netty adopts the design of the Reactor thread model. The core principle of the Reactor pattern, also known as the Dispatcher pattern, is that the Selector is responsible for monitoring I/O events. After an I/O event is detected, it is dispatched to the relevant thread for processing.

To help you better understand the design concept of the Netty thread model, we will start with the basic single reactor single-thread model and gradually increase the complexity of the model, until we reach the mature thread model design currently used by Netty.

1. Single Reactor Single Thread #

The Reactor object listens for client events and dispatches them as they arrive. For a connection establishment event, the Acceptor accepts the connection and creates a Handler object to process subsequent requests on that connection. For a data read/write event, the Reactor dispatches it to the connection's Handler. The single thread then drives the Handler through the entire flow of reading data, processing the business logic, and sending the response. Of course, a connection may be temporarily unreadable or unwritable during this process, in which case the thread executes the logic of other Handlers rather than blocking and waiting. The specifics are shown in the following figure:

7.png

The advantage of the single Reactor single thread model is its simplicity: no additional threads are introduced, so there are no multi-threading concurrency or contention problems.

Its drawbacks, however, are equally obvious: performance becomes the bottleneck. One thread can run on only one CPU core and can handle only a limited number of connections, so the advantages of multi-core CPUs go unused. Worse, once a piece of business logic takes a long time, the single thread gets stuck on it and cannot serve other connections, leaving the program effectively unresponsive and reducing availability. Because of this limitation, the model is generally used only on the client side.

2. Single Reactor Multi-thread #

In the single Reactor multi-thread architecture, when the Reactor detects a client event, a connection establishment request is accepted by the Acceptor, which then creates a Handler object to process requests on the new connection; any other event is dispatched by the Reactor to the corresponding Handler. Up to this point the flow is the same as in the single Reactor single thread model; the only difference is that the Handler logic now runs on threads from a thread pool.

8.png

Single Reactor multi-thread model

Clearly, the single Reactor multi-thread model can fully utilize the processing power of multi-core CPUs and improve overall throughput. However, introducing multiple threads raises issues such as concurrency control, data sharing, and thread scheduling. And in this model, one Reactor thread still processes all I/O events, both connection establishment and read/write; as the number of connections keeps growing, this lone Reactor thread itself becomes the bottleneck.
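
As a rough sketch of this division of labor, the Reactor thread keeps doing the I/O while the business logic is submitted to a pool. The handleBusinessLogic method is a placeholder, and writing the response directly from the worker thread is a simplification; a real implementation would hand the write back to the Reactor.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReactorWithWorkerPool {
    private final ExecutorService workers = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());

    void onReadable(SelectionKey key) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        ByteBuffer request = ByteBuffer.allocate(1024);
        channel.read(request);                   // I/O stays on the Reactor thread
        request.flip();

        workers.submit(() -> {                   // business logic moves to the pool
            ByteBuffer response = handleBusinessLogic(request);
            try {
                channel.write(response);         // simplified: see the note above
            } catch (IOException ignored) {
            }
        });
    }

    private ByteBuffer handleBusinessLogic(ByteBuffer request) {
        return request;                          // placeholder: decode, compute, encode
    }
}
```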

3. Master-Slave Reactor Multi-thread #

To solve the problems of the single Reactor multi-thread model, we can introduce multiple Reactors. The MainReactor is responsible only for connection establishment events, which it hands to the Acceptor object; once the Acceptor has established the network connection, the MainReactor assigns the new connection to a SubReactor for subsequent monitoring.

Once a connection is assigned to a SubReactor, that SubReactor monitors its read/write events. When a read event (OP_READ) occurs, the SubReactor thread calls the corresponding Handler to read the data, then distributes it to a thread in the Worker pool for processing. When processing completes, the Handler calls send to return the response to the client, as soon as the connection is writable (OP_WRITE).

9.png

Master-Slave Reactor multi-thread model

The Master-Slave Reactor multi-thread model removes the single-Reactor bottleneck, and the division of responsibilities is clear: the MainReactor handles only connection establishment events, while the SubReactors handle only read/write events. The overall architecture fully exploits multi-core CPUs, scales well, and is highly decoupled from the specific business logic, giving it high reusability. Its downside is that the interactions are slightly more complex and the programming bar is higher.

4. Netty Thread Model #

Netty supports all of the thread models described above. For its server-side design, Netty builds on and adapts the Master-Slave Reactor multi-thread model, as shown in the following figure:

1.png

Netty abstracts two sets of thread pools: BossGroup is specifically used to accept client connections, and WorkerGroup is specifically used for network reading and writing. Both BossGroup and WorkerGroup are of type NioEventLoopGroup, which is an event loop group that includes multiple event loops. Each event loop is a NioEventLoop.

NioEventLoop represents a continuously looping thread that performs processing tasks. Each NioEventLoop corresponds to a Selector object used to listen to the connections bound to it. The events on these connections are processed by the thread corresponding to the Selector. Each NioEventLoopGroup can have multiple NioEventLoops, which means multiple threads.

The Selector of each Boss NioEventLoop listens for accept events. When a connection is accepted, the network connection with the client is established and a corresponding NioSocketChannel object is created (each NioSocketChannel represents one network connection), which is then registered with the Selector of one of the Worker NioEventLoops.

Each Worker NioEventLoop listens for read/write events on its Selector and handles them through the Pipeline. A Pipeline is associated with a Channel, and multiple ChannelHandlers can be added to it, each encapsulating a piece of logic such as encoding or decoding. When processing a request, the Pipeline invokes its ChannelHandlers in the specified order.
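
Putting the pieces together, here is what this model looks like with Netty's actual API: the bossGroup plays the MainReactor role, the workerGroup plays the SubReactor role, and the ChannelHandlers are assembled in the Pipeline. The port and the echo handler are chosen only for illustration; a single boss thread is enough here, since it only accepts connections on one port.

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.string.StringDecoder;
import io.netty.handler.codec.string.StringEncoder;

public class NettyEchoServer {
    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup bossGroup = new NioEventLoopGroup(1);  // accepts connections
        EventLoopGroup workerGroup = new NioEventLoopGroup(); // handles read/write
        try {
            ServerBootstrap bootstrap = new ServerBootstrap();
            bootstrap.group(bossGroup, workerGroup)
                    .channel(NioServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) {
                            ch.pipeline()
                              .addLast(new StringDecoder())  // inbound: bytes -> String
                              .addLast(new StringEncoder())  // outbound: String -> bytes
                              .addLast(new SimpleChannelInboundHandler<String>() {
                                  @Override
                                  protected void channelRead0(ChannelHandlerContext ctx,
                                                              String msg) {
                                      ctx.writeAndFlush("echo: " + msg); // business logic
                                  }
                              });
                        }
                    });
            ChannelFuture future = bootstrap.bind(8080).sync();
            future.channel().closeFuture().sync();
        } finally {
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
        }
    }
}
```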

Summary #

In this lesson, we focused on introducing some background knowledge about network I/O and the macroscopic design models of Netty.

  • First, we introduced some shortcomings and deficiencies of Java NIO, which is also a significant reason for the emergence of Netty and other network libraries.
  • Next, we introduced Netty’s design on the I/O model and explained the advantages of I/O multiplexing.
  • Finally, starting from the basic single Reactor single thread model, we gradually delved into and introduced common network I/O thread models, and also introduced the thread model currently used by Netty.

Of course, if you have thoughts or questions about Netty, you are welcome to share and discuss them with me in the comments section.