43 Data Types in the bufio Package - Part 2 #

Hello, I’m Haolin. Today I will continue to talk about the data types in the bufio package.

In the previous article, I mentioned that the main data types in the bufio package are Reader, Scanner, Writer, and ReadWriter, and I focused on the bufio.Reader and bufio.Writer types in particular. Today, we will stay with the bufio.Reader type and look at it more closely.

Knowledge Expansion #

Question: What are the different methods for reading using the `bufio.Reader` type? #

The bufio.Reader type has many pointer methods for reading data. Four of them are representative of distinct reading procedures: Peek, Read, ReadSlice, and ReadBytes.

The purpose of the Peek method of a Reader value is to read and return n unread bytes from its buffer, starting from the index represented by the read count.

If the buffer is not full and the number of unread bytes is less than n, this method calls the fill method to start filling the buffer. However, if an earlier fill has already recorded an error, it will not fill the buffer again.

If the given n is larger than the length of the buffer, or the number of unread bytes in the buffer is still less than n after filling, the Peek method returns all of the unread bytes as its first result value.

In that case, it usually also returns the value of the bufio.ErrBufferFull variable (referred to below as the buffer-full error) as its second result value, indicating that the buffer still cannot satisfy the request even after being compressed and filled. The exception is when the shortfall was caused by an error from the underlying reader, such as io.EOF; Peek then returns that error instead.
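The three outcomes of Peek can be observed with a small program. This is only a sketch: the helper name peekN and the sample strings are hypothetical, and it relies on bufio.NewReaderSize using a minimum buffer length of 16 bytes. When the source simply runs out before n bytes are available, the second result is the error recorded from the underlying reader (io.EOF here) rather than the buffer-full error.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// peekN (a hypothetical helper) wraps src in a bufio.Reader with the given
// buffer size and peeks n bytes from it.
func peekN(src string, bufSize, n int) (string, error) {
	r := bufio.NewReaderSize(strings.NewReader(src), bufSize)
	p, err := r.Peek(n)
	return string(p), err
}

func main() {
	// n is larger than the buffer: all unread bytes plus ErrBufferFull.
	got, err := peekN(strings.Repeat("a", 40), 16, 17)
	fmt.Printf("%q %v\n", got, err) // "aaaaaaaaaaaaaaaa" bufio: buffer full

	// The source is shorter than n: all unread bytes plus io.EOF.
	got, err = peekN("abc", 16, 5)
	fmt.Printf("%q %v\n", got, err) // "abc" EOF

	// Enough data is available: exactly n bytes and a nil error.
	got, err = peekN("abcdef", 16, 3)
	fmt.Printf("%q %v\n", got, err) // "abc" <nil>
}
```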

Only when none of the above situations occurs can the Peek method return “the n bytes starting from the read count” together with nil, indicating that no error occurred.

The Peek method of the bufio.Reader type has a distinct feature: even if it reads data from the buffer, it does not change the value of the read count.
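A quick way to see this is to peek some bytes and then read the same number of bytes: both calls yield the same data, because Peek left the read count untouched. The function name peekThenRead below is a hypothetical helper used only for this sketch.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// peekThenRead (a hypothetical helper) peeks n bytes and then reads n bytes
// from the same bufio.Reader.
func peekThenRead(src string, n int) (peeked, read string, err error) {
	r := bufio.NewReader(strings.NewReader(src))

	p, err := r.Peek(n)
	if err != nil {
		return "", "", err
	}
	peeked = string(p) // copy right away; p is a view into the reader's buffer

	buf := make([]byte, n)
	if _, err = io.ReadFull(r, buf); err != nil {
		return "", "", err
	}
	return peeked, string(buf), nil
}

func main() {
	peeked, read, err := peekThenRead("hello world", 5)
	fmt.Println(peeked, read, err) // hello hello <nil>
}
```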

The other reading methods of this type behave differently. Take the Read method: when there are unread bytes in the buffer, it copies them into the byte slice represented by its parameter p and immediately increases the read count by the number of bytes actually copied.

At other times, however, the read count of the Reader value may equal its write count, which means there are no unread bytes in the buffer.

When there are no unread bytes in the buffer, the Read method first checks whether the length of the parameter p is greater than or equal to the length of the buffer. If it is, the Read method gives up on filling the buffer and instead reads data from the underlying reader straight into p. In other words, it bypasses the buffer completely and connects the data source directly to the consumer.

It is worth noting that the Peek method behaves differently in a similar situation (which behavior is better depends on the specific use case).

Even when the value of the parameter n is larger than the length of the buffer, the Peek method still fills the buffer, provided there is room and no error has been recorded, and then returns all the unread bytes in it.

If the length of the buffer that we initially set is very large, the execution time of this method in this situation may be long. The main reason is that filling the buffer takes a long time.

The fill method, by design, tries to fill as much of the writable space in the buffer as it can. The Read method, by contrast, does not write data to the buffer in most cases, and certainly not in the case described earlier, where there are no unread bytes in the buffer and the length of the parameter p is greater than or equal to the length of the buffer.

In this case, the method directly reads data from the underlying reader, so the reading speed of the data becomes the decisive factor for the execution time in this situation.

Of course, I have only pointed out where time-consuming operations are more likely to occur in certain situations. Any final conclusion should rest on the objective results of performance testing.

Let’s go back to the internal process of the Read method. If there are no unread bytes in the buffer but the buffer is longer than the parameter p, the method first resets both the read count and the write count to 0, and then attempts to fill the buffer from the beginning with data obtained from the underlying reader. Note that Read makes this attempt only once, whereas the fill method retries a number of times when a read returns no data. The Read method and the fill method do share one trait, though: whenever they write newly obtained data into the buffer, they promptly update the write count.
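The two paths that Read can take on an empty buffer can be made visible by wrapping the data source in a reader that records the length of every slice it is asked to fill. This is a sketch under stated assumptions: recordingReader and readOnce are hypothetical helpers, and the 16-byte buffer and 64-byte source are arbitrary choices.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// recordingReader is a hypothetical helper that notes the length of every
// slice it is asked to fill.
type recordingReader struct {
	src   io.Reader
	sizes []int
}

func (r *recordingReader) Read(p []byte) (int, error) {
	r.sizes = append(r.sizes, len(p))
	return r.src.Read(p)
}

// readOnce does a single Read of pLen bytes through a bufio.Reader with a
// 16-byte buffer. It returns the slice length seen by the underlying reader
// and the number of bytes left in the buffer afterwards.
func readOnce(pLen int) (seen, buffered int) {
	rec := &recordingReader{src: strings.NewReader(strings.Repeat("x", 64))}
	r := bufio.NewReaderSize(rec, 16)
	if _, err := r.Read(make([]byte, pLen)); err != nil {
		panic(err)
	}
	return rec.sizes[0], r.Buffered()
}

func main() {
	// len(p) >= buffer length: p goes straight to the underlying reader.
	fmt.Println(readOnce(32)) // 32 0

	// len(p) < buffer length: the buffer is filled, then 4 bytes are copied out.
	fmt.Println(readOnce(4)) // 16 12
}
```

In the first call nothing is left in the buffer because the buffer was never used; in the second, 12 bytes remain buffered for later reads.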

Now let’s talk about the ReadSlice method and the ReadBytes method. Overall, the purpose of these two methods is to continuously read data until the specified delimiter provided by the caller is encountered.

The ReadSlice method first searches for the delimiter in the unread portion of its buffer. If it cannot find the delimiter and the buffer is not full, the method fills the buffer by calling the fill method and then searches again. This repeats until the delimiter is found, an error occurs, or the buffer becomes full.

If an error occurs during the filling process, the ReadSlice method will return the unread portion of the buffer as the first result value, along with the corresponding error value.

Note that the buffer may become full without the delimiter ever being found.

In this case, the ReadSlice method will return the entire buffer (represented by the byte slice of the buf field) as the first result value, and the error indicating that the buffer is full (the value of the bufio.ErrBufferFull variable) as the second result value.

It is reasonable to do so because the buffer filled by the fill method only contains unread bytes from start to finish.

Of course, once the ReadSlice method finds the delimiter, it slices the buffer to obtain the unread bytes up to and including the delimiter, and returns that byte slice as the result value. The method updates the read count correctly whether or not the delimiter is found.

For example, before returning all the unread bytes in the buffer or the byte slice representing the entire buffer, it assigns the value of the write count to the read count to indicate that there are no unread bytes left in the buffer.
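The following sketch shows ReadSlice giving up when a line does not fit into the buffer, and the next call picking up where the previous one stopped. The helper readSliceChunks is hypothetical, and the 16-byte buffer is chosen only to make the line too long.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// readSliceChunks (a hypothetical helper) calls ReadSlice('\n') until it
// returns anything other than the buffer-full error, and records each chunk
// together with the error that came with it.
func readSliceChunks(src string, bufSize int) []string {
	r := bufio.NewReaderSize(strings.NewReader(src), bufSize)
	var out []string
	for {
		chunk, err := r.ReadSlice('\n')
		// Format immediately: chunk is only valid until the next read.
		out = append(out, fmt.Sprintf("%q %v", chunk, err))
		if !errors.Is(err, bufio.ErrBufferFull) {
			return out
		}
	}
}

func main() {
	// A 21-byte line read through a 16-byte buffer.
	for _, s := range readSliceChunks("0123456789abcdefghij\n", 16) {
		fmt.Println(s)
	}
	// "0123456789abcdef" bufio: buffer full
	// "ghij\n" <nil>
}
```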

If ReadSlice is a method that may give up halfway, then ReadBytes is quite persistent.

The ReadBytes method repeatedly calls the ReadSlice method to read data from the buffer until the delimiter is found.

During this process, the ReadSlice method may return all the bytes read and the corresponding error value due to a full buffer, but the ReadBytes method always ignores such errors and calls the ReadSlice method again. This allows the latter to continue filling the buffer and searching for the delimiter.

This process ends only when the ReadSlice method finds the delimiter or returns an error value other than the buffer-full error.

If the search process ends, regardless of whether it is due to finding the delimiter or encountering an error, the ReadBytes method will assemble all the bytes read in the process into a byte slice in the order they were read, and return it as the first result value. If the process ends due to an error, it will also return the obtained error value as the second result value.
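For comparison, here is the same over-long line read with ReadBytes, again through a 16-byte buffer. The helper name readBytesOnce and the sample inputs are hypothetical; the point of the sketch is only that the caller never sees the buffer-full error.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readBytesOnce (a hypothetical helper) makes a single ReadBytes('\n') call
// on a bufio.Reader with the given buffer size.
func readBytesOnce(src string, bufSize int) (string, error) {
	r := bufio.NewReaderSize(strings.NewReader(src), bufSize)
	b, err := r.ReadBytes('\n')
	return string(b), err
}

func main() {
	// The line is longer than the buffer, yet it comes back in one piece.
	line, err := readBytesOnce("0123456789abcdefghij\n", 16)
	fmt.Printf("%q %v\n", line, err) // "0123456789abcdefghij\n" <nil>

	// No delimiter at all: everything read so far plus the error (io.EOF).
	line, err = readBytesOnce("no delimiter in this input", 16)
	fmt.Printf("%q %v\n", line, err) // "no delimiter in this input" EOF
}
```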

Among the various read methods of the bufio.Reader type, the ReadLine method also relies on the ReadSlice method, just as ReadBytes does. However, there is nothing special about ReadLine’s reading process, so I won’t elaborate on it here.

In addition, the ReadString method of this type relies completely on the ReadBytes method. It merely converts the byte slice returned by ReadBytes into a string.
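Because ReadString behaves like ReadBytes apart from the result type, a typical line-reading loop looks the same with either. The sketch below uses a hypothetical helper, splitLines, and keeps the final piece even when it ends with io.EOF instead of the delimiter.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitLines (a hypothetical helper) reads src with ReadString('\n') and
// returns every piece, delimiter included, until an error ends the loop.
func splitLines(src string) []string {
	r := bufio.NewReader(strings.NewReader(src))
	var out []string
	for {
		s, err := r.ReadString('\n')
		if s != "" {
			out = append(out, s) // data may arrive together with an error
		}
		if err != nil {
			return out // io.EOF here; other errors would also stop the loop
		}
	}
}

func main() {
	fmt.Printf("%q\n", splitLines("a\nbb\nccc")) // ["a\n" "bb\n" "ccc"]
}
```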

Finally, I would like to remind you of a security issue. The Peek method, ReadSlice method, and ReadLine method of the bufio.Reader type may cause content leakage.

This is mainly because under normal circumstances, they all return byte slices directly based on the buffer. I have explained what content leakage means when talking about the bytes.Buffer type. You can refer to that for more information.

The caller can access other parts of the buffer and even modify its content through the result value returned by these methods. This is usually very dangerous.
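A minimal sketch of the problem, using Peek: the returned slice shares its underlying array with the reader's buffer, so the caller can both overwrite buffered data and reslice past the bytes it asked for. The helper name tamper is hypothetical, and the sketch assumes the whole input fits into the default buffer.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tamper (a hypothetical helper) peeks 5 bytes, modifies the first one, and
// reslices the result to look at bytes it never asked for. It then reads the
// rest of the input through the normal API.
func tamper(src string) (exposed, next string) {
	r := bufio.NewReader(strings.NewReader(src))
	p, err := r.Peek(5)
	if err != nil {
		panic(err)
	}
	p[0] = 'J'                     // writes into the reader's own buffer
	exposed = string(p[:len(src)]) // reslicing reaches beyond the 5 peeked bytes

	// The input has no '\n', so ReadString ends with io.EOF; ignored here.
	next, _ = r.ReadString('\n')
	return exposed, next
}

func main() {
	exposed, next := tamper("hello world")
	fmt.Println(exposed) // Jello world
	fmt.Println(next)    // Jello world
}
```

If the bytes need to outlive the next read or must not be shared, copying them first (for example with append([]byte(nil), p...)) avoids both problems.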

Summary #

We have presented the data types in the bufio package in some detail, with the focus on the bufio.Reader type.

The bufio.Reader type represents a reader with a buffer. When initialized, it requires an underlying reader, which must be an implementation of the io.Reader interface.

The buffer in a Reader value serves as an intermediate storage for data, sitting between the underlying reader and the read methods and their callers. The read methods of this type typically first read data from the buffer, and when necessary, read a portion of data from the underlying reader in advance and fill it into the buffer for future use. The operation of filling the buffer is usually performed by the fill method of this value. During the filling process, the fill method may compress the buffer.

Among the many read methods of the Reader value, there are four methods that represent different read processes: Peek, Read, ReadSlice, and ReadBytes.

The Peek method does not change the value of the read count even though it reads data from the buffer. The Read method bypasses the buffer and requests data directly from the underlying reader when there are no unread bytes in the buffer and the length of the parameter value is greater than or equal to the length of the buffer.

The ReadSlice method searches for the given delimiter in the remaining unread portion of the buffer and fills the buffer if necessary.

If the delimiter is still not found after the buffer has been filled, this method returns the entire buffer as the first result value and an error indicating that the buffer is full as the second.

The ReadBytes method repeatedly fills the buffer by calling the ReadSlice method and searches for the delimiter in it. This process continues until an unexpected error occurs or the delimiter is found.

The ReadLine method of the Reader value relies on its ReadSlice method, while its ReadString method relies entirely on the ReadBytes method.

In addition, it is worth noting that the Peek method, ReadSlice method, and ReadLine method of the Reader value may result in a leak of the content in its buffer.

Finally, let’s talk about the bufio.Writer type. The functionality of writing the temporarily stored data in the buffer of a value of this type to its underlying writer is mainly implemented by its Flush method.

All the data writing methods of this type will call its Flush method when necessary. In general, these writing methods first write the data to the buffer of the respective value and then increment the value’s written counter. However, in some cases, the Write method and the ReadFrom method may also bypass the buffer and directly write the data to the underlying writer.

Please remember that although these writing methods occasionally call the Flush method, it is always safest to explicitly invoke this method after writing all the data.
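The effect of Flush is easy to observe with a bytes.Buffer as the underlying writer. In this sketch, the helper name writeAndFlush is hypothetical, and the data is assumed to be much smaller than the default buffer, so nothing reaches the underlying writer before Flush is called.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// writeAndFlush (a hypothetical helper) writes data through a bufio.Writer
// and reports what the underlying writer holds before and after Flush.
func writeAndFlush(data string) (before, after string) {
	var sink bytes.Buffer
	w := bufio.NewWriter(&sink)

	if _, err := w.WriteString(data); err != nil {
		panic(err)
	}
	before = sink.String() // still empty: the data sits in w's buffer

	if err := w.Flush(); err != nil {
		panic(err)
	}
	after = sink.String()
	return before, after
}

func main() {
	before, after := writeAndFlush("hello")
	fmt.Printf("%q %q\n", before, after) // "" "hello"
}
```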

Thought Question #

Today’s thought question is: What is the main function of the bufio.Scanner type? What are its characteristics?

Thank you for listening, see you next time.
