04 Event dispatch layer, why is EventLoop the essence of Netty #

Hello, I’m Ruodi. Through the previous courses, we have learned that the secret to Netty’s high performance lies in its Reactor thread model. EventLoop is the core processing engine of Netty’s Reactor thread model. So how does it efficiently implement event loop and task processing mechanisms? In this lesson, we will learn about the implementation principles and best practices of EventLoop.

Revisiting the Reactor Thread Model #

The design of a network framework cannot be separated from its I/O thread model, and the quality of that model directly determines the system's throughput, scalability, security, and more. Mainstream network frameworks today almost all adopt I/O multiplexing, with the Reactor pattern acting as the event dispatcher that routes read and write events to their respective handlers. Doug Lea, the well-known author of the Java concurrency package, described the evolution of I/O models in server-side development in his article "Scalable IO in Java", and Netty's three Reactor thread models derive from that classic paper. Now let's analyze these three Reactor thread models in detail.

Single-threaded model #

1.png (Taken from Lea D. Scalable IO in Java)

The above diagram describes the structure of the single-threaded model in Reactor. In the Reactor single-threaded model, all I/O operations (including connection establishment, data read and write, event dispatch, etc.) are handled by a single thread. The single-threaded model is simple in logic, but it also has obvious limitations:

  • A single thread can only handle a limited number of connections, and the CPU is prone to being fully utilized, resulting in a performance bottleneck.
  • When multiple events are triggered at the same time, if one event is not processed, the subsequent events cannot be executed, resulting in message backlog and request timeouts.
  • While the thread is processing I/O events, the Select operation cannot simultaneously handle connection establishment, event dispatch, and other operations.
  • If the I/O thread is constantly under heavy load, it is likely to cause the server node to become unavailable.

Multi-threaded model #

2.png (Taken from Lea D. Scalable IO in Java)

Due to the performance bottleneck of the single-threaded model, the multi-threaded model emerged as a solution. In the Reactor multi-threaded model, business logic is handed over to a pool of threads. The other operations are similar to those of the single-threaded model; for example, data reading still follows a serialized design. When a client sends data to the server, the Select operation listens for the readable event, and after the data has been read it is submitted to the business thread pool for concurrent processing.

Master-slave multi-threaded model #

3.png (Taken from Lea D. Scalable IO in Java)

The master-slave multi-threaded model consists of multiple Reactor threads, each with its own Selector object. The MainReactor is only responsible for handling the accept events of client connections. After the connection is successfully established, the newly created connection object is registered with SubReactor. Then, SubReactor assigns an I/O thread from the thread pool and binds it to the connection. It is responsible for all I/O events during the connection lifespan.

Netty recommends using the master-slave multi-threaded model, which easily achieves thousands or even tens of thousands of client connections. In scenarios with massive concurrent client requests, the master-slave multi-threaded model can even increase the number of SubReactor threads appropriately, thereby leveraging the multi-core capability to improve system throughput.
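
In Netty, the master-slave model maps directly to configuring two EventLoopGroup instances on a ServerBootstrap: the boss group plays the MainReactor role and the worker group plays the SubReactor role. The following is a minimal sketch, where the port number and the empty ChannelInitializer are placeholders chosen purely for illustration:

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class MasterSlaveServer {
    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup bossGroup = new NioEventLoopGroup(1);   // MainReactor: accepts new connections
        EventLoopGroup workerGroup = new NioEventLoopGroup();  // SubReactors: handle I/O of established connections
        try {
            ServerBootstrap bootstrap = new ServerBootstrap();
            bootstrap.group(bossGroup, workerGroup)
                     .channel(NioServerSocketChannel.class)
                     .childHandler(new ChannelInitializer<SocketChannel>() {
                         @Override
                         protected void initChannel(SocketChannel ch) {
                             // business ChannelHandlers would be added to ch.pipeline() here
                         }
                     });
            bootstrap.bind(8080).sync().channel().closeFuture().sync();
        } finally {
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
        }
    }
}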

After introducing the three Reactor thread models mentioned above and combining them with their respective architectural diagrams, we can roughly summarize the four steps of the Reactor thread model’s operating mechanism: connection registration, event loop, event dispatch, and task processing, as shown in the following diagram.

4.png

  • Connection Registration: After the Channel is established, it is registered with the Selector selector in the Reactor thread.
  • Event Loop: Poll all registered Channels’ I/O events in the Selector selector.
  • Event Dispatch: Allocate a corresponding processing thread for the ready I/O events.
  • Task Processing: The Reactor thread is also responsible for non-I/O tasks in the task queue; each worker thread takes tasks from its own task queue and executes them asynchronously.
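
To make the four steps concrete, here is a minimal single-threaded reactor sketch written directly against the JDK NIO Selector. It is not Netty code, only an illustration of connection registration, event loop, event dispatch, and task processing in their simplest form; the port number and the in-memory task queue are assumptions made for the example.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class MiniReactor {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.bind(new InetSocketAddress(9000));
        // 1. Connection registration: the server channel is registered with the Selector
        server.register(selector, SelectionKey.OP_ACCEPT);

        Queue<Runnable> taskQueue = new ConcurrentLinkedQueue<>();
        for (;;) {
            // 2. Event loop: poll the I/O events of all registered channels
            selector.select(1000);
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                // 3. Event dispatch: route each ready event to its handler
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    if (client != null) {
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    }
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    int n = client.read(buf);
                    if (n < 0) {
                        key.cancel();
                        client.close();
                        continue;
                    }
                    buf.flip();
                    // wrap the decoded data as a non-I/O task
                    taskQueue.offer(() -> { /* business logic on buf */ });
                }
            }
            // 4. Task processing: drain the non-I/O task queue
            Runnable task;
            while ((task = taskQueue.poll()) != null) {
                task.run();
            }
        }
    }
}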

The above introduces the evolution process and basic principles of the Reactor thread model. Netty also follows the running mechanism of the Reactor thread model. Now let’s take a look at how Netty implements the Reactor thread model.

Implementation of Netty EventLoop #

What is EventLoop #

The concept of EventLoop is not unique to Netty. It is a program model for event waiting and processing that can solve the problem of high resource consumption in multi-threading. For example, Node.js adopts the EventLoop running mechanism, which not only consumes fewer resources but also supports large-scale traffic access.

The diagram below shows the common running mode of EventLoop. Whenever an event occurs, the application will put the generated event into the event queue. Then the EventLoop will poll the events from the queue and execute them or distribute them to the corresponding event listeners. The ways of event execution usually include immediate execution, deferred execution, and periodic execution.

5.png
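
A minimal sketch of this queue-plus-loop pattern in Java is shown below. It only covers immediate execution; deferred and periodic execution are typically layered on top via a delay or priority queue, which is what Netty does with its scheduledTaskQueue. The class name and its shape are illustrative, not taken from any framework.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SimpleEventLoop implements Runnable {
    private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    // Producers publish events (wrapped as callbacks) into the event queue
    public void publish(Runnable event) {
        eventQueue.offer(event);
    }

    // A single loop thread polls the queue and executes events one by one
    @Override
    public void run() {
        while (running) {
            try {
                Runnable event = eventQueue.take();
                event.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                running = false;
            }
        }
    }

    public void stop() {
        running = false;
    }
}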

How does Netty implement EventLoop #

In Netty, EventLoop can be understood as the event handling engine of the Reactor thread model. Each EventLoop thread maintains a Selector and a task queue called the taskQueue. It is mainly responsible for handling I/O events, normal tasks, and scheduled tasks.

In Netty, NioEventLoop is recommended as the implementation class. So how does Netty implement NioEventLoop? First, let’s take a look at the core run() method source code of NioEventLoop. In this course, we will not analyze the source code in depth, but just understand the implementation structure of NioEventLoop.

protected void run() {
    for (;;) {
        try {
            try {
                switch (selectStrategy.calculateStrategy(selectNowSupplier, hasTasks())) {
                    case SelectStrategy.CONTINUE:
                        continue;
                    case SelectStrategy.BUSY_WAIT:
                        // fall through to SELECT since busy-wait is not supported with NIO
                    case SelectStrategy.SELECT:
                        select(wakenUp.getAndSet(false)); // Polling I/O events
                        if (wakenUp.get()) {
                            selector.wakeup();
                        }
                        // fall through
                    default:
                }
            } catch (IOException e) {
                rebuildSelector0();
                handleLoopException(e);
                continue;
            }

            cancelledKeys = 0;
            needsToSelectAgain = false;
            final int ioRatio = this.ioRatio;
            if (ioRatio == 100) {
                try {
                    processSelectedKeys(); // Handle I/O events
                } finally {
                    runAllTasks(); // Handle all tasks
                }
            } else {
                final long ioStartTime = System.nanoTime();
                try {
                    processSelectedKeys(); // Handle I/O events
                } finally {
                    final long ioTime = System.nanoTime() - ioStartTime;
                    runAllTasks(ioTime * (100 - ioRatio) / ioRatio); // Handle async task queue after handling I/O events
                }
            }
        } catch (Throwable t) {
            handleLoopException(t);
        }

        try {
            if (isShuttingDown()) {
                closeAll();
                if (confirmShutdown()) {
                    return;
                }
            }
        } catch (Throwable t) {
            handleLoopException(t);
        }
    }
}

The structure of the above source code is relatively clear. The processing flow of each loop of NioEventLoop includes several steps: event polling, event processing, and task processing. It is a typical implementation of the Reactor thread model. In addition, Netty provides a parameter called ioRatio that can adjust the time ratio between I/O event processing and task processing. Below, we will focus on the core parts of event processing and task processing, and explain in detail the implementation principles of Netty EventLoop.
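
For reference, ioRatio can be tuned on the event loop group; to my knowledge this is exposed through NioEventLoopGroup.setIoRatio(), and the default ratio is 50. A small sketch, where the value 70 is just an example:

import io.netty.channel.nio.NioEventLoopGroup;

public class IoRatioExample {
    public static void main(String[] args) {
        NioEventLoopGroup workerGroup = new NioEventLoopGroup();
        // Give I/O processing roughly 70% of each loop iteration,
        // leaving about 30% of that time budget for the task queue.
        workerGroup.setIoRatio(70);
        workerGroup.shutdownGracefully();
    }
}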

Event Processing Mechanism #

6.png

Combined with Netty's overall architecture, let's look at the event flow of EventLoop to better understand its design. The event processing mechanism of NioEventLoop adopts a lock-free, serialized design.

  • BossEventLoopGroup and WorkerEventLoopGroup each contain one or more NioEventLoops. BossEventLoopGroup is responsible for listening for the client's Accept events; when such an event is triggered, the newly created Channel is registered with exactly one NioEventLoop in WorkerEventLoopGroup and stays bound to it. As a result, all event handling during a Channel's lifecycle happens on that single thread, and different NioEventLoop threads never interfere with each other.
  • After NioEventLoop completes data reading, it will call the bound ChannelPipeline to propagate the event. ChannelPipeline is thread-safe, and the data will be passed to the first ChannelHandler in the ChannelPipeline. After the data processing is completed, the processed data is passed to the next ChannelHandler. The entire process is executed in a serialized manner, without thread context switching issues.

The lock-free serialization design of NioEventLoop not only maximizes system throughput but also lowers the difficulty of writing business logic: developers do not need to spend much effort on thread-safety issues. Although single-threaded execution avoids context switching, its drawback is that it cannot tolerate long-running operations. Once one I/O event handler blocks, all subsequent events on that thread are held up and can pile up into a backlog. When developing with Netty, we must therefore treat the implementation logic of each ChannelHandler with sufficient risk awareness.
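
A common mitigation is to bind a slow ChannelHandler to a separate EventExecutorGroup when adding it to the pipeline, so that its callbacks run on business threads instead of the NioEventLoop. The sketch below uses an anonymous handler and a thread count of 16 chosen purely for illustration:

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.util.concurrent.DefaultEventExecutorGroup;
import io.netty.util.concurrent.EventExecutorGroup;

public class OffloadInitializer extends ChannelInitializer<SocketChannel> {
    // Dedicated business executors so slow handlers do not block the I/O thread
    private static final EventExecutorGroup BUSINESS_GROUP = new DefaultEventExecutorGroup(16);

    @Override
    protected void initChannel(SocketChannel ch) {
        // Handlers added with an EventExecutorGroup have their callbacks invoked
        // on that group's threads rather than on the channel's NioEventLoop.
        ch.pipeline().addLast(BUSINESS_GROUP, "slowHandler", new ChannelInboundHandlerAdapter() {
            @Override
            public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
                // potentially slow business logic runs here, off the I/O thread
                ctx.fireChannelRead(msg);
            }
        });
    }
}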

The reliability of the NioEventLoop thread is crucial: once it blocks or falls into empty polling, the entire system becomes unavailable. The JDK's epoll-based Selector implementation has a bug: even when the polled event list is empty, the NIO thread can still be woken up, driving CPU usage to 100%. This is the notorious JDK epoll empty-polling bug. As a high-performance, reliable network framework, Netty needs to keep its I/O threads safe. So how does Netty deal with this bug? In fact, Netty does not fix the root cause; it cleverly works around the problem.

Leaving aside other details, let’s directly locate to the last part of the code in the event polling select() method and see how Netty avoids the JDK epoll empty polling bug.

long time = System.nanoTime();
if (time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos) {
    selectCnt = 1;
} else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 &&
        selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
    selector = selectRebuildSelector(selectCnt);
    selectCnt = 1;
    break;
}

Netty provides a detection mechanism to determine whether a thread may enter an empty polling state. The specific implementation is as follows:

  1. Record the current time currentTimeNanos before each Select operation.
  2. time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos: if the select operation actually blocked for at least timeoutMillis, the poll is considered normal; otherwise the blocking time was shorter than expected, which may indicate the JDK empty-polling bug was triggered.
  3. Netty keeps a counter variable selectCnt. In the normal case selectCnt is reset to 1; otherwise it is incremented. When selectCnt reaches the threshold SELECTOR_AUTO_REBUILD_THRESHOLD (512 by default), a Selector rebuild is triggered.

With this approach, Netty cleverly sidesteps the JDK bug: all SelectionKeys registered with the faulty Selector are re-registered with a newly created Selector, and the faulty Selector is discarded once the rebuild completes.

Task Processing Mechanism #

NioEventLoop is responsible for processing not only I/O events but also tasks in the task queue. The task queue follows the FIFO rule to ensure the fairness of task execution. The types of tasks handled by NioEventLoop can be divided into three categories.

  1. Normal tasks: added to the task queue taskQueue by calling NioEventLoop's execute() method. For example, when writing data Netty encapsulates a WriteAndFlushTask and puts it into the taskQueue. The taskQueue is implemented by MpscChunkedArrayQueue, a multi-producer, single-consumer queue that stays thread-safe when multiple threads add tasks concurrently.
  2. Scheduled tasks: added to the scheduled task queue scheduledTaskQueue by calling NioEventLoop's schedule() method, so that they run after a delay or periodically, for example sending heartbeat messages. The scheduled task queue scheduledTaskQueue is implemented with a priority queue (PriorityQueue).
  3. Tail tasks: tailTasks has a lower priority compared to the normal task queue. After executing tasks in the taskQueue, tasks in the tail task queue will be executed. Tail tasks are not commonly used and are mainly used for some finalization work, such as calculating the execution time of the event loop, reporting monitoring information, etc.
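
As a quick usage sketch, here is how the first two task types typically enter an EventLoop from application code; channel is assumed to be an already-registered Channel, and the "PING" heartbeat message assumes a suitable encoder in the pipeline:

import io.netty.channel.Channel;
import io.netty.channel.EventLoop;
import java.util.concurrent.TimeUnit;

public class TaskSubmissionExample {
    public static void submitTasks(Channel channel) {
        EventLoop eventLoop = channel.eventLoop();

        // Normal task: enqueued into taskQueue and run on the channel's I/O thread
        eventLoop.execute(() -> System.out.println("run inside the EventLoop thread"));

        // Scheduled task: enqueued into scheduledTaskQueue, e.g. a periodic heartbeat
        eventLoop.scheduleAtFixedRate(
                () -> channel.writeAndFlush("PING"),
                0, 30, TimeUnit.SECONDS);
    }
}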

Next, let’s analyze the logic of task processing in NioEventLoop based on the source code structure of runAllTasks. The source code implementation is as follows:

protected boolean runAllTasks(long timeoutNanos) {
    // 1. Merge scheduled tasks into the normal task queue
    fetchFromScheduledTaskQueue();

    // 2. Get a task from the normal task queue
    Runnable task = pollTask();
    if (task == null) {
        afterRunningAllTasks();
        return false;
    }

    // 3. Calculate the deadline for task processing
    final long deadline = ScheduledFutureTask.nanoTime() + timeoutNanos;
    long runTasks = 0;
    long lastExecutionTime;
    for (;;) {
        // 4. Execute the task safely
        safeExecute(task);

        runTasks ++;

        // 5. Check for timeout every 64 tasks
        if ((runTasks & 0x3F) == 0) {
            lastExecutionTime = ScheduledFutureTask.nanoTime();
            if (lastExecutionTime >= deadline) {
                break;
            }
        }

        task = pollTask();
        if (task == null) {
            lastExecutionTime = ScheduledFutureTask.nanoTime();
            break;
        }
    }

    // 6. Run tail tasks and record the last execution time
    afterRunningAllTasks();
    this.lastExecutionTime = lastExecutionTime;
    return true;
}

I have annotated the specific implementation steps in the code as comments, which can be divided into 6 steps.

  1. fetchFromScheduledTaskQueue function: Fetch the scheduled tasks from the scheduledTaskQueue and aggregate them into the normal task queue taskQueue. Only tasks with deadlines smaller than the current time can be merged.
  2. Get a task from the normal task queue taskQueue.
  3. Calculate the maximum timeout for task execution.
  4. safeExecute function: Execute the task safely, which simply calls the Runnable's run() method and catches any exception so that a failing task does not break the loop.
  5. Check for timeout every 64 tasks. If the execution time is greater than the maximum timeout, stop executing the tasks immediately to avoid affecting the processing of the next round of I/O events.
  6. Finally, get and execute the tasks from the tail queue.

Best Practices for EventLoop #

Using EventLoop effectively is crucial in daily development. Here are some best practices for EventLoop based on practical work experience.

  1. The process of establishing a network connection, including the three-way handshake and security authentication, can consume a significant amount of time. It is recommended to use two EventLoopGroup instances, one for the boss and one for the worker, which helps to alleviate the pressure on the Reactor threads.
  2. Since the Reactor thread pattern is suitable for processing short-duration tasks, for ChannelHandlers that take a long time to process, consider maintaining a separate business thread pool. Encapsulate the decoded data into tasks for asynchronous processing to avoid blocking the EventLoop due to ChannelHandler blocking.
  3. If the business logic execution time is short, it is recommended to execute it directly in the ChannelHandler. For example, for encoding and decoding operations, this can help avoid excessive design complexity.
  4. Avoid designing too many ChannelHandlers. Having too many ChannelHandlers can cause problems in terms of system performance and maintainability. When designing the business architecture, it is necessary to clearly define the boundary between the business layers and the Netty layers. Avoid adding all business logic to ChannelHandlers.

Summary #

In this lesson, we have learned about the core processing engine of the Netty Reactor thread model, EventLoop, and understood the background of EventLoop. Combining the Reactor master-slave multi-threaded model, we summarized the functions and uses of Netty EventLoop.

  • MainReactor thread: Handles client request access.
  • SubReactor thread: Reads data, dispatches and executes I/O events.
  • Task execution thread: Used to execute normal tasks or scheduled tasks such as idle connection detection and heartbeat reporting.

The design principles of EventLoop are applied in many high-performance frameworks such as Redis, Nginx, and Node.js. Does the design principle of EventLoop inspire you? In the subsequent chapters of the Source Code section, we will further introduce the source code implementation of EventLoop. Once you understand this infinite loop, you can consider yourself a Netty expert.