
08 Network Communication Optimization: How the I/O Model Resolves Bottlenecks Under High Concurrency #

Hello, I’m Liu Chao.

When it comes to Java I/O, I’m sure you are familiar with it. You might use I/O operations to read and write files, or to transmit data through sockets. These are the most common I/O-related operations we encounter in systems.

We all know that I/O is much slower than memory access, and in the current era of big data, I/O performance issues are even more prominent. I/O reads and writes have become a system performance bottleneck in many application scenarios, one that cannot be ignored.

Today, we will delve into the performance issues exposed by Java I/O in high-concurrency and big data business scenarios, starting from the source and learning about optimization methods.

What is I/O #

I/O is the main channel for machines to acquire and exchange information, and streams are the primary way to perform I/O operations.

In computers, a stream is a transformation of information. Streams are ordered, so relative to a certain machine or application, we usually refer to the information it receives from the outside as the input stream (InputStream), and the information it outputs from the machine or application as the output stream (OutputStream). Together, they are called input/output streams (I/O Streams).

When machines or programs exchange information or data, they always convert objects or data into some form of stream first, and then transmit them through the stream. After reaching the designated machine or program, the stream is converted back into object data. Therefore, a stream can be viewed as a carrier of data, enabling data exchange and transmission.

The I/O operation classes in Java are located in the java.io package. InputStream, OutputStream, Reader, and Writer are the four basic classes in the I/O package: the first two handle byte streams, and the latter two handle character streams. The following figure shows them:

[Figure: the four basic I/O classes — InputStream, OutputStream, Reader, and Writer]

Looking back on my experience, I remember having a question when I first read the Java I/O stream documentation, which I would like to share with you now: “Whether it’s file I/O or network transmission, the minimum storage unit for information is always bytes, so why are I/O streams divided into byte stream operations and character stream operations?”

We know that converting characters to bytes involves time-consuming transcoding, and if we don’t know the encoding type, it is easy to end up with garbled characters. Therefore, I/O streams provide interfaces that operate on characters directly, making character operations more convenient for us. Now let’s look at byte streams and character streams separately.

1. Byte Streams #

InputStream/OutputStream are the abstract classes for byte streams. Several subclasses are derived from these two abstract classes, each handling a different kind of operation. For file read/write operations, we use FileInputStream/FileOutputStream; for in-memory array read/write operations, we use ByteArrayInputStream/ByteArrayOutputStream; and to buffer reads and writes so that fewer calls reach the underlying stream, we wrap a stream in BufferedInputStream/BufferedOutputStream. The specific content is shown in the following figure:

[Figure: subclasses of InputStream/OutputStream]
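As a minimal sketch of these byte-stream classes, the example below copies a file through buffered byte streams; the class name and the `copy` helper are illustrative, not from the original article:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByteStreamDemo {
    // Copy a file byte by byte. Wrapping the raw file streams in
    // BufferedInputStream/BufferedOutputStream means most read()/write()
    // calls hit an in-memory buffer instead of the underlying file.
    static void copy(Path src, Path dst) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(src.toFile()));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(dst.toFile()))) {
            int b;
            while ((b = in.read()) != -1) { // read() returns -1 at end of stream
                out.write(b);
            }
        } // try-with-resources flushes and closes both streams
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("demo", ".bin");
        Path dst = Files.createTempFile("demo-copy", ".bin");
        Files.write(src, new byte[]{1, 2, 3});
        copy(src, dst);
        System.out.println(Files.readAllBytes(dst).length); // prints 3
    }
}
```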

2. Character Streams #

Reader/Writer are abstract classes for character streams. These two abstract classes also have several subclasses derived from them, each subclass handling different types of operations. The specific content is shown in the following figure:

[Figure: subclasses of Reader/Writer]
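A small illustrative example of the character-stream classes; the `readAll` helper and the choice of UTF-8 are assumptions for the sketch. It shows the transcoding step mentioned above being made explicit through InputStreamReader:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CharStreamDemo {
    // Decode bytes into characters with an explicit charset, so the
    // byte-to-character transcoding is under our control and cannot
    // silently fall back to the platform default (a common cause of
    // garbled characters).
    static String readAll(InputStream in) throws IOException {
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) { // read() returns one decoded char
                sb.append((char) c);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "你好, I/O".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(utf8)));
    }
}
```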

Performance Issues with Traditional I/O #

We know that I/O operations can be divided into disk I/O and network I/O. The former reads data from a disk into memory and persists data from memory back onto the physical disk; the latter reads data from the network into memory and writes data from memory back out to the network. Both disk I/O and network I/O suffer from serious performance issues in traditional I/O.

1. Multiple Memory Copies #

In traditional I/O, we can use an InputStream to read a data stream from the data source into a buffer, and an OutputStream to output the data to an external device (including disk or network). Let’s take a look at the specific process of the input operation in the operating system as shown in the following diagram:

[Figure: the input operation flow in the operating system]

  • The JVM issues a read() system call and initiates a read request to the kernel through the read system call.
  • The kernel sends a read command to the hardware and waits for the read to be ready.
  • The kernel copies the data to be read into the designated kernel buffer.
  • The operating system kernel copies the data to the user space buffer, and then the read system call returns.

During this process, the data is first copied from the external device to the kernel space, and then copied from the kernel space to the user space, resulting in two memory copy operations. This operation leads to unnecessary data copying and context switching, thereby reducing the performance of I/O.

2. Blocking #

In traditional I/O, the read() method of InputStream is blocking: it waits until data is ready and only returns once the data can be read. This means that if no data is ready, the read operation blocks and the user thread stays in a blocked state.

With a small number of connection requests, this approach is not a problem, and response times are good. However, with a large number of connection requests, many listening threads must be created, and whenever no data is ready, those threads block. The resulting frequent blocking and waking of threads causes a large number of CPU context switches and increases system performance overhead.
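The blocking behavior described above can be sketched with a minimal single-connection server. The `serveOnce` helper, the use of an ephemeral port, and the in-process client thread are all illustrative assumptions, not the article's code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingServerSketch {
    // Accept a single connection and drain its input stream.
    // Both accept() and read() block the calling thread until a
    // connection arrives / data is ready -- the behavior described above.
    static int serveOnce(ServerSocket server) throws IOException {
        Socket client = server.accept();            // blocks here
        int total = 0;
        try (InputStream in = client.getInputStream()) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {      // blocks until data or EOF
                total += n;
            }
        } // closing the stream also closes the socket
        return total;
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0: any free port
            int port = server.getLocalPort();
            Thread clientThread = new Thread(() -> {
                try (Socket s = new Socket("localhost", port)) {
                    s.getOutputStream().write(new byte[]{1, 2, 3, 4});
                } catch (IOException ignored) {
                }
            });
            clientThread.start();
            System.out.println(serveOnce(server)); // prints 4
            clientThread.join();
        }
    }
}
```

Scaling this pattern means one blocked thread per connection, which is exactly what causes the context-switch overhead under high concurrency.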

How to Optimize I/O Operations #

To address the performance issues mentioned above, both programming languages and operating systems have optimized I/O. JDK 1.4 introduced the java.nio package (short for new I/O), which improved memory copying and performance problems caused by blocking. JDK 1.7 further released NIO2, which introduced asynchronous I/O implemented at the operating system level. Let’s delve into the specific optimization implementations.

1. Optimize Read and Write Operations with Buffers #

In traditional I/O, stream-based I/O implementations such as InputStream and OutputStream process data byte by byte.

NIO, on the other hand, is block-based, processing data in blocks. The two most important components in NIO are the buffer and the channel. A buffer is a contiguous block of memory that serves as an intermediate for reading and writing data in NIO. A channel represents the source or destination of the buffered data and is used for reading from or writing to the buffer.

The key difference between traditional I/O and NIO is that traditional I/O is stream-oriented, while NIO is buffer-oriented. With NIO, we can read an entire file into memory and then process it, whereas traditional I/O reads the file and processes the data simultaneously. Although traditional I/O also uses buffered blocks, such as BufferedInputStream, it still cannot match the performance of NIO. By replacing traditional I/O operations with NIO, system performance can be significantly improved.
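A minimal sketch of block-based reading through the two NIO components described above; the buffer size and the `readAll` helper are illustrative choices:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioReadDemo {
    // Read a file in blocks: the channel is the data source, the buffer
    // is the contiguous memory block the data is read into.
    static int readAll(Path path) throws IOException {
        int total = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(8192); // one block at a time
            while (channel.read(buffer) != -1) {           // fill the buffer from the channel
                buffer.flip();                             // switch the buffer to read mode
                total += buffer.remaining();               // process the block here
                buffer.clear();                            // reuse the buffer for the next block
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("nio", ".bin");
        Files.write(p, new byte[100]);
        System.out.println(readAll(p)); // prints 100
    }
}
```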

2. Use Direct Buffers to Reduce Memory Copying #

In addition to optimizing with buffer blocks, NIO’s Buffer also provides DirectBuffer. Regular buffers allocate memory in the JVM heap, while direct buffers allocate memory outside the heap, in native memory that the kernel can access directly.

When data in a heap buffer needs to be output to an external device, it must first be copied into native memory before being copied from user space to kernel space and on to the device. With DirectBuffer, the data already resides in native memory, so the extra heap-to-native copy is eliminated, reducing the number of data copies.

It’s worth mentioning that since DirectBuffer allocates memory outside the JVM heap, creating and disposing of direct buffers is relatively costly. The memory a DirectBuffer holds is not managed directly by the JVM’s garbage collector; instead, when the DirectBuffer wrapper object is garbage collected, the underlying memory block is released through Java’s reference (Cleaner) mechanism.
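A small illustrative sketch of allocating and using a direct buffer; the file name and buffer size are arbitrary:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectBufferDemo {
    public static void main(String[] args) throws IOException {
        // allocateDirect places the buffer outside the JVM heap, so a
        // channel can transfer data into it without an extra heap copy.
        // Because allocation and release are expensive, direct buffers
        // are normally created once and reused for many operations.
        ByteBuffer direct = ByteBuffer.allocateDirect(8192);
        System.out.println(direct.isDirect()); // prints true

        Path p = Files.createTempFile("direct", ".bin");
        Files.write(p, new byte[]{42});
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            ch.read(direct);                   // kernel fills the native buffer
            direct.flip();
            System.out.println(direct.get()); // prints 42
        }
    }
}
```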

3. Avoid Blocking and Optimize I/O Operations #

NIO is also commonly referred to as Non-blocking I/O, as this term better reflects its characteristics. Why is this the case?

Even with buffer blocks, traditional I/O still suffers from blocking. Since the number of threads in a thread pool is limited, when the number of concurrent requests exceeds the maximum number of threads, the excess requests can only wait for an idle thread to become available for reuse. Moreover, when reading from a socket’s input stream, the read will block until one of the following three conditions releases it:

  • Data becomes available for reading.
  • The connection is released.
  • A null pointer or I/O exception occurs.

Blocking is the major drawback of traditional I/O. NIO introduced non-blocking through channels and multiplexers, which optimize NIO. Let’s dive into the principles behind these two components.

Channels

As discussed earlier, traditional I/O involves copying data back and forth between user space and kernel space, and the data in kernel space is read from or written to the disk using the operating system’s I/O interfaces.

Initially, when an application called the operating system I/O interface, the CPU handled the allocation. The main problem with this approach was that “it consumed a lot of CPU power when a large number of I/O requests occurred.” Later, the operating system introduced Direct Memory Access (DMA), where the storage access between kernel space and the disk is handled entirely by DMA. However, this method still required CPU permission and relied on the DMA bus to perform data copying operations. If there were too many DMA buses, bus conflicts would occur.

The introduction of channels solved these problems. Channels have their own processors and can perform I/O operations between kernel space and the disk. In NIO, both reading and writing data involve channels. Since a channel is bidirectional, reading and writing can occur simultaneously.

Multiplexers (Selectors)

The selector is the foundation of Java NIO programming. It is used to check whether one or more NIO channels are ready for reading or writing.

Selectors are event-driven, and we can register accept and read event listeners in the selector. The selector continuously polls the registered channels. If a channel has a triggered event, it will become ready, and then an I/O operation can be performed.

One thread uses one selector and continuously polls multiple channels. We can set the channel to non-blocking when registering it with the selector. This way, if there is no I/O operation on the channel, the thread will not wait indefinitely but will continuously poll all channels, thereby avoiding blocking.

On Linux, the I/O multiplexing mechanism currently used is epoll, which, unlike the traditional select mechanism, has no limit of 1024 connection handles. Therefore, in theory, a single selector can poll thousands or even tens of thousands of clients.
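The registration and polling loop described above can be sketched as follows. The 100 ms timeout and the ephemeral port are illustrative choices, and no client actually connects in this sketch, so the loop body is not reached:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectorSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0));  // port 0: any free port
        server.configureBlocking(false);        // must be non-blocking to register
        server.register(selector, SelectionKey.OP_ACCEPT);

        // One thread polls all registered channels; select(timeout) returns
        // the number of channels with triggered events instead of blocking
        // a thread per connection.
        int ready = selector.select(100);       // wait at most 100 ms
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();                        // selected keys must be removed by hand
            if (key.isAcceptable()) {
                SocketChannel client = server.accept();
                client.configureBlocking(false);
                client.register(selector, SelectionKey.OP_READ);
            } else if (key.isReadable()) {
                // read from (SocketChannel) key.channel() into a ByteBuffer ...
            }
        }
        System.out.println(ready);              // prints 0: no client connected here
        selector.close();
        server.close();
    }
}
```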

Let me provide an analogy with a real-life scenario that will help you better understand the roles and functions of channels and selectors in non-blocking I/O.

Imagine the scenario of monitoring multiple I/O connection requests as the entrance to a train station. In the past, there was only one ticket checker, and only passengers boarding the nearest departing train could enter; passengers for other trains had to queue at the entrance. This is similar to blocking I/O without a thread pool, where requests are handled one at a time while the rest wait.

Later, the train station was upgraded with additional entrances, allowing passengers of different trains to enter through their respective designated entrances. This is similar to creating multiple listening threads to listen to I/O requests from different clients concurrently.

Finally, the train station underwent remodeling and could accommodate more passengers. Each train could carry more passengers, and the trains were scheduled appropriately, so passengers no longer had to queue in large crowds but could enter the station through a single, large, unified entrance. This large entrance is similar to the selector, and the trains are similar to channels, while the passengers are similar to the I/O streams.

Summary #

Java’s traditional I/O was initially implemented based on the InputStream and OutputStream, which operated on a byte-level. In high-concurrency and large data scenarios, this type of operation easily leads to blocking, resulting in poor performance. Additionally, the process of copying output data from user space to kernel space and then to the output device adds to the system’s performance overhead.

To address the performance issues caused by blocking, traditional I/O later introduced buffering, where the buffer block became the minimum unit of operation. However, the overall performance still fell short of expectations.

Then came NIO, which operates on the buffer block level. In addition to buffering, NIO introduced two components - “channels” and “selectors” - to achieve non-blocking I/O. NIO is suitable for scenarios with a large number of I/O connection requests. These three components together improve the overall I/O performance.

You can practice traditional I/O and NIO with a few simple examples on GitHub.

Reflection Questions #

In JDK version 1.7, Java released an upgrade package for NIO called NIO2, also known as AIO. AIO implements true asynchronous I/O, which means it directly hands over I/O operations to the operating system for asynchronous processing. This is also an optimization for I/O operations, so why do many container communication frameworks still use NIO?