06 Case Analysis: How Buffering Can Speed Up Code #

This lesson will provide a detailed introduction to the optimization technique of “buffering”. Buffering was mentioned in the previous lesson on reuse optimization, so you can review it if necessary.

Understanding the Essence of Buffering #

Buffering means temporarily storing data and then transferring or operating on it in batches. It is typically used to turn frequent, slow interactions between devices of different speeds into fewer, larger sequential operations.

You can think of a buffer as a reservoir. As long as there is water in the reservoir, it flows out at a constant rate without interruption, regardless of how unevenly the supply comes in. And by monitoring the water level in the reservoir, the supply rate can be adjusted freely.

Another way to picture buffering is making dumplings: the wrapping step has to wait for the dough wrappers to be rolled out. If the person rolling the dough hands each wrapper directly to the person wrapping the dumplings, the process is slow. But if a bowl is placed between them, the roller can simply toss wrappers into the bowl and the other person can take them from it, which is much faster. The same buffering idea is common on factory assembly lines, which shows how universal and practical buffering is.

From a macro perspective, the JVM heap is a large buffer. Code continuously produces objects in the heap space, while the garbage collector silently performs garbage collection in the background.

From these analogies, you can see the benefits of buffering:

  • Both sides of the buffer can keep their own pace without blocking each other, and the order of operations is preserved: items can still be processed one by one, in sequence.
  • Processing in batches reduces network interactions and heavy I/O operations, thereby lowering performance overhead.
  • It improves user experience: audio/video players, for example, achieve smooth playback by buffering data in advance.

Buffering is widely used in the Java language. Searching for “Buffer” in IntelliJ IDEA turns up a long list of classes, the most typical of which are the buffered file I/O stream classes.

[Figure: search results for “Buffer” in IntelliJ IDEA]

File Input and Output Streams #

Next, I will use the file reading and writing streams as an example.

Java’s I/O stream design uses the decorator pattern. To add functionality to a class, the decorated object is passed as a constructor parameter to the decorator, which wraps it in a new object carrying the extra behavior. The following diagram illustrates the typical structure of the decorator pattern. For adding functionality, the decorator pattern is more flexible than creating subclasses.

[Figure: typical structure of the decorator pattern]

Within the stream reading and writing APIs, BufferedInputStream and BufferedReader can accelerate reading, while BufferedOutputStream and BufferedWriter can accelerate writing.

The following is the code implementation for directly reading a file:

int result = 0; 
try (Reader reader = new FileReader(FILE_PATH)) { 
    int value; 
    while ((value = reader.read()) != -1) { 
        result += value; 
    } 
} 
return result;

To read using the buffered method, simply decorate the FileReader:

int result = 0; 
try (Reader reader = new BufferedReader(new FileReader(FILE_PATH))) { 
    int value; 
    while ((value = reader.read()) != -1) { 
        result += value; 
    } 
} 
return result;

Now let’s take a look at the concrete implementation in BufferedInputStream (BufferedReader works similarly):

// Code from JDK
public synchronized int read() throws IOException { 
    if (pos >= count) { 
        fill(); 
        if (pos >= count) 
            return -1; 
    } 
    return getBufIfOpen()[pos++] & 0xff; 
}

When the buffer has been fully consumed, the fill method reads from the underlying input stream to refill the buffer:

// Code from JDK
private void fill() throws IOException { 
    byte[] buffer = getBufIfOpen(); 
    if (markpos < 0) 
        pos = 0;            /* no mark: throw away the buffer */ 
    else if (pos >= buffer.length)  /* no room left in buffer */ 
        if (markpos > 0) {  /* can throw away early part of the buffer */ 
            int sz = pos - markpos; 
            System.arraycopy(buffer, markpos, buffer, 0, sz); 
            pos = sz; 
            markpos = 0; 
        } else if (buffer.length >= marklimit) { 
            markpos = -1;   /* buffer got too big, invalidate mark */ 
            pos = 0;        /* drop buffer contents */ 
        } else if (buffer.length >= MAX_BUFFER_SIZE) {
            throw new OutOfMemoryError("Required array size too large");
        } else {            /* grow buffer */
            int nsz = (pos <= MAX_BUFFER_SIZE - pos) ?
                    pos * 2 : MAX_BUFFER_SIZE;
            if (nsz > marklimit)
                nsz = marklimit;
            byte nbuf[] = new byte[nsz];
            System.arraycopy(buffer, 0, nbuf, 0, pos);
            if (!bufUpdater.compareAndSet(this, buffer, nbuf)) {
                // Can't replace buf if there was an async close.
                // Note: This would need to be changed if fill()
                // is ever made accessible to multiple threads.
                // But for now, the only way CAS can fail is via close.
                // assert buf == null;
                throw new IOException("Stream closed");
            }
            buffer = nbuf;
        }
    count = pos;
    int n = getInIfOpen().read(buffer, pos, buffer.length - pos);
    if (n > 0)
        count = n + pos;
}

Walking through fill(): if there is no mark, or the marked region can be discarded, the read position is simply reset. If the buffer has already reached MAX_BUFFER_SIZE, an OutOfMemoryError is thrown. Otherwise the buffer grows, doubling in size but capped at MAX_BUFFER_SIZE, and further capped at marklimit; a new array of that size is allocated and the old contents are copied over. If the compare-and-set of the buffer reference fails, the stream was closed asynchronously, so an IOException is thrown. Finally, count is set to the current position, data is read from the wrapped InputStream into the free portion of the buffer, and if any bytes are read, count is advanced accordingly.

All this bookkeeping of read positions and buffer boundaries exists because frequent, small interactions with slow devices such as files or sockets are expensive. Keeping a chunk of data in memory greatly improves the read and write speed.

Why not just read all the data into the buffer at once? This is a trade-off: an overly large buffer increases the latency of each fill operation, and memory is expensive and finite. The default buffer size for the buffered streams is 8192 bytes, which is a reasonable compromise.
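If the default does not suit your workload, the buffer size can be set explicitly through a constructor overload. A minimal sketch, reusing the FILE_PATH constant from the earlier snippets:

int result = 0; 
// Request a 64 KB buffer instead of the 8 KB default; the best size
// depends on the access pattern and should be confirmed by benchmarks.
try (Reader reader = new BufferedReader(new FileReader(FILE_PATH), 64 * 1024)) { 
    int value; 
    while ((value = reader.read()) != -1) { 
        result += value; 
    } 
} 
return result;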

It’s like moving bricks. If you move them one by one, a lot of time will be wasted on back and forth trips. But if you have a small cart, the number of round trips will be greatly reduced, resulting in improved efficiency.

The JMH benchmark shown below, comparing file reads via FileReader and BufferedReader (the related code can be found in the repository), demonstrates the significant efficiency gain from buffering (the system file cache is not considered).

[Figure: JMH benchmark results for FileReader vs. BufferedReader]

Log Buffering #

Logs are something that programmers deal with most often. In high-concurrency applications, even with log sampling, the number of logs can still be astonishing, so choosing a high-speed log component is crucial.

SLF4J is the de facto standard logging facade in Java: an abstraction layer that lets you plug in any Java logging implementation. Logback is the most popular implementation; it supports automatic reloading of its configuration after modification and is more widely used than Java’s built-in JUL (java.util.logging).

Logback is also fast, and one reason is asynchronous logging: log events are temporarily stored in a buffering queue, and a background thread drains the queue and writes the events to file. There are two motivations for logging asynchronously:

  • Writing logs synchronously blocks business operations, resulting in increased service interface latency.
  • Writing logs to disk is costly. If every log generated is written immediately, the CPU will spend a lot of time on disk I/O.

Configuring Logback for asynchronous logging is also relatively simple. You need to add a layer of logic for asynchronous output based on the normal configuration (see repository for details).

<appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <discardingThreshold>0</discardingThreshold>
    <queueSize>512</queueSize>
    <!-- Specify an existing appender here -->
    <appender-ref ref="FILE"/>
</appender>

With the configuration above, log events are temporarily stored in an ArrayBlockingQueue, and a background worker thread continuously drains the queue and writes the events to disk.

The AsyncAppender has three key parameters (a sketch of the underlying mechanism follows the list):

  • queueSize is the size of the queue, 256 by default. If it is set too high and the power suddenly fails while a large number of logs are queued, the contents of the buffer will be lost.
  • maxFlushTime is how long to keep writing queued events after the logging context is closed. It is implemented by calling the join method of the worker thread (worker.join(maxFlushTime)).
  • discardingThreshold allows lower-level events to be discarded when the queue is close to full. By default it kicks in once the remaining capacity drops below one fifth of queueSize, i.e., when the queue is about 80% full. If you are worried about losing business logs, set this value to 0 so that all events are kept.
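To make the mechanism concrete, here is a minimal sketch of the async-appender idea: a bounded queue in front of slow file I/O, a background worker draining it, and a discard policy for low-priority events when the queue is nearly full. This is only an illustration of the pattern with made-up names, not Logback’s actual code:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the async-appender pattern; names and thresholds are illustrative.
class AsyncLogBuffer {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(512);
    private final int discardingThreshold = 512 / 5; // mimic Logback's default: queueSize / 5

    AsyncLogBuffer() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    writeToDisk(queue.take()); // slow I/O happens off the business threads
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    void log(String event, boolean discardable) {
        // Drop low-level events when the remaining capacity falls below the threshold.
        if (discardable && queue.remainingCapacity() < discardingThreshold) {
            return;
        }
        queue.offer(event); // non-blocking; silently fails if the queue is completely full
    }

    private void writeToDisk(String event) {
        System.out.println(event); // stand-in for real file I/O
    }
}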

Buffer Optimization Ideas #

There is no doubt that buffering improves performance, but it usually introduces asynchrony, which makes the programming model more complex.

Based on the examples of file I/O streams and Logback, let’s take a look at some general operations for buffer design.

As shown in the diagram below, resource A performs read or write operations on resource B. That is the normal workflow. Once a buffering layer is inserted, the direct path is cut, and you must coordinate the two sides around the buffer yourself.

[Figure: resource A reading/writing resource B through an inserted buffer]

Depending on the resources involved, the coordination around the buffer can be done synchronously or asynchronously.

1. Synchronous Operations #

The programming model for synchronous operations is relatively simple and can be completed within a single thread. You only need to control the buffer size and decide when to process: for example, a batch operation is triggered when the buffer reaches a threshold size, or when its oldest element has waited longer than a timeout.

Since all operations happen in a single thread, or inside synchronized blocks, and the processing capacity of resource B is limited, many operations will block and wait in the calling thread. For example, when writing a file, the next request cannot be handled until the previous data has been written.

[Figure: synchronous buffering handled within a single calling thread]
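A minimal single-threaded sketch of this pattern, with illustrative names and thresholds:

import java.util.ArrayList;
import java.util.List;

// Flush when the batch is full or when the oldest element has waited too long.
class BatchBuffer {
    private static final int MAX_BATCH = 100;     // illustrative size threshold
    private static final long MAX_WAIT_MS = 1000; // illustrative timeout

    private final List<String> batch = new ArrayList<>();
    private long firstItemTime;

    void add(String item) {
        if (batch.isEmpty()) {
            firstItemTime = System.currentTimeMillis();
        }
        batch.add(item);
        if (batch.size() >= MAX_BATCH
                || System.currentTimeMillis() - firstItemTime >= MAX_WAIT_MS) {
            flush();
        }
    }

    private void flush() {
        writeBatch(batch); // the caller blocks here until resource B finishes the batch
        batch.clear();
    }

    private void writeBatch(List<String> items) {
        // Stand-in for a slow batch write to resource B.
    }
}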

2. Asynchronous Operations #

Asynchronous operations are much more complex.

The producer side of the buffer is usually invoked synchronously, though the buffer can also be filled asynchronously. Once the flow becomes asynchronous, you need response strategies for the moment the buffer fills up.

These strategies should be abstracted and chosen according to the attributes of the business: discard the data outright, throw an exception, or make the caller wait in its own thread. You will notice this resembles the saturation (rejection) policies of a thread pool; Lesson 12 explains those concepts in detail.

Many application systems also have more complex strategies, such as waiting in the user’s thread, setting a timeout period, and callback functions after successfully entering the buffer.
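These strategies map naturally onto the methods of Java’s BlockingQueue; a minimal sketch:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class OverflowStrategies {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(1024);

        // Discard directly: offer() returns false when the buffer is full.
        boolean accepted = buffer.offer("event");

        // Throw an exception: add() throws IllegalStateException when full.
        buffer.add("event");

        // Wait in the caller's thread: put() blocks until space frees up.
        buffer.put("event");

        // A common variant: wait with a timeout, then give up.
        boolean ok = buffer.offer("event", 100, TimeUnit.MILLISECONDS);
    }
}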

The buffer is generally consumed by a dedicated thread. If multiple threads consume it, information synchronization and ordering issues can arise.

[Figure: asynchronous buffering with producer-side strategies and a consumer thread]

3. Kafka Buffer Example #

Here is an example that illustrates these points, drawn from a common interview question: can a Kafka producer lose data?

[Figure: Kafka producer batching messages into a buffer before sending them to the broker]

To answer this, we first need to understand a bit of how the Kafka producer is built, and one design decision with a significant performance impact is buffering.

The producer packs messages destined for the same partition into a batch (a buffer). When the batch is full (parameter batch.size) or the linger time expires (parameter linger.ms), the buffered messages are sent to the broker.

The default batch size is 16 KB. If the producer machine suddenly loses power, that 16 KB of data never gets the chance to be sent, and the messages are lost.

There are two ways to solve this:

  • Set the buffer size very small, so that messages are effectively sent one by one; this seriously hurts performance.
  • Log a message before sending, and log again in the send callback once the message is confirmed. By scanning these logs, you can determine which messages were lost (see the sketch below).
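As a rough sketch of the second approach, the batching parameters and the reconciliation callback might look like this (the topic name, the log instance, and the messageId/payload variables are illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("batch.size", 16384); // send the batch when it reaches 16 KB ...
props.put("linger.ms", 5);      // ... or when 5 ms have elapsed, whichever comes first

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
log.info("sending message {}", messageId); // log before sending
producer.send(new ProducerRecord<>("orders", messageId, payload), (metadata, exception) -> {
    if (exception == null) {
        log.info("message {} acked at offset {}", messageId, metadata.offset()); // log after the ack
    } else {
        log.error("message {} failed", messageId, exception);
    }
});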

Another interview question: can the Kafka producer affect the availability of the business?

This is also related to the producer’s buffer. The buffer size is limited after all. If messages are produced too quickly, or there are network problems between the producer and broker nodes, the buffer will always be in a full state. In this case, how will new messages be handled?

Depending on the producer’s timeout parameter and retry count, new messages will block in the business threads. Generally the timeout is set to around 1 second, which is already generous. Some applications set it to a very large value, such as 1 minute; once the buffer stays full, the user threads quickly fill up and the whole service can no longer accept new requests.
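Continuing the sketch above, these knobs correspond to producer parameters such as max.block.ms (how long send() may block once the buffer is full), buffer.memory, and retries; for example:

props.put("buffer.memory", 33554432L); // total memory the producer may buffer (32 MB is the default)
props.put("max.block.ms", 1000L);      // block send() for at most 1 s when the buffer is full
props.put("retries", 3);               // retry transient send failures a few times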

4. Other Practices #

There are many ways to improve performance using buffers. Here are a few more examples:

  • StringBuilder and StringBuffer can improve string concatenation performance by buffering the strings to be processed and then completing the concatenation.
  • When writing to disk or performing network I/O, the operating system uses its own buffers to improve the efficiency of information flow, and you can call flush to force data out. For example, you can improve network transmission performance by adjusting the socket parameters SO_SNDBUF and SO_RCVBUF (see the sketch after this list).
  • InnoDB, MySQL’s default engine, performs better when innodb_buffer_pool_size is increased, because a larger buffer pool reduces page swapping.
  • Buffers are sometimes used in lower-level tools as well. For example, a common ID generator can buffer a portion of ID ranges to avoid frequent and time-consuming interactions.
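For instance, the socket buffer sizes mentioned above can be adjusted through the standard java.net.Socket API (inside code that handles SocketException); the 64 KB value is only illustrative:

import java.net.Socket;

Socket socket = new Socket();
socket.setSendBufferSize(64 * 1024);    // SO_SNDBUF hint; the kernel may adjust it
socket.setReceiveBufferSize(64 * 1024); // SO_RCVBUF hint; the kernel may adjust it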

5. Precautions #

Although buffering can greatly improve the performance of our applications, it also comes with some issues. We need to be aware of these exceptional situations when designing.

The most serious issue is loss of buffered content. Even if you use addShutdownHook to shut down gracefully, some situations are hard to guard against, such as a sudden power failure or the abrupt death of the application process. In those cases, whatever is still unprocessed in the buffer is lost, which can be quite serious for financial or e-commerce order data.

Therefore, before writing content into the buffer, write it to a log first. When a failure occurs and the system restarts, the data can be recovered from these logs. File buffering is very common in the database field, where write-ahead logging (WAL) is the standard answer. Systems with strict data-integrity requirements may even rely on batteries or a UPS to ensure the buffer can be persisted. This is a new class of problem that performance optimization introduces and that must be solved.
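A minimal sketch of the write-ahead idea, with illustrative names: force the record into the log on disk before placing it in the in-memory buffer.

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Queue;

// Persist the record to the write-ahead log before buffering it in memory.
void writeAheadThenBuffer(FileOutputStream wal, byte[] record, Queue<byte[]> buffer)
        throws IOException {
    wal.write(record);
    wal.flush();
    wal.getFD().sync();   // force the log entry to stable storage
    buffer.offer(record); // only now is it safe to hold the record in memory
}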

Summary #

As we can see, buffer optimization is an operation that intercepts the normal business process and adds a buffering component. It can be implemented synchronously or asynchronously, with the latter being more difficult.

Most components, from operating systems to databases, from Java APIs to some middleware, can achieve significant performance improvements by setting parameters to control buffer size. However, it is important to note that certain extreme scenarios (power failure, abnormal termination, kill -9, etc.) may result in data loss. If your business has a low tolerance for this, then you will need to invest more effort in handling these exceptions.

In interviews, besides probing your grasp of the details, interviewers also evaluate your ability to summarize and analyze a whole class of similar problems. In daily work, make a habit of summarizing and reflecting, so that you can infer the general principle from a small example. Answering in that way will certainly impress interviewers.