43 Data Types in the bufio Package - Part 2 #
Hello, I’m Haolin, and today I will continue sharing about the data types in the bufio package.
In the previous article, I mentioned that the main data types in the bufio package are Reader, Scanner, Writer, and ReadWriter. I focused on the bufio.Reader and bufio.Writer types in particular. Today, we will continue to concentrate on the bufio.Reader type.
Knowledge Expansion #
Question: What are the different reading methods of the bufio.Reader type? #
The bufio.Reader type has many pointer methods for reading data. Among these, four methods represent distinct reading processes: Peek, Read, ReadSlice, and ReadBytes.
The purpose of the Peek method of a Reader value is to read and return the first n unread bytes in its buffer, starting from the index represented by the read count.
If the buffer is not full and the number of unread bytes is less than n, this method calls the fill method to start the buffer-filling process. However, if an error occurred during the previous fill, it will not fill the buffer again.
If the given n is larger than the length of the buffer, or the number of unread bytes in the buffer is still less than n, the Peek method returns the sequence of all unread bytes as the first result value.
At the same time, it usually returns the value of the bufio.ErrBufferFull variable (referred to below as the buffer-full error) as the second result value, indicating that even though the buffer has been compressed and filled, it still cannot satisfy the request. The exception is when the shortfall was caused by an error from the underlying reader, such as io.EOF; Peek then returns that error instead.
Only when neither of the above situations occurs does the Peek method return the n bytes starting from the read count, together with nil to indicate that no error occurred.
The Peek method of the bufio.Reader type has a distinctive feature: even though it reads data from the buffer, it does not change the value of the read count.
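The behavior of Peek described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names peekThenRead and peekBeyondBuffer are made up for this example, and the 16-byte buffer is an arbitrary small size (it is also the smallest size bufio will use).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// peekThenRead (hypothetical helper) peeks at the first n bytes twice and
// then reads one byte, returning what each call observed.
func peekThenRead(input string, n int) (first, second string, next byte, err error) {
	r := bufio.NewReaderSize(strings.NewReader(input), 16)
	p, err := r.Peek(n)
	if err != nil {
		return "", "", 0, err
	}
	first = string(p)
	p, err = r.Peek(n) // the read count did not move, so the same bytes come back
	if err != nil {
		return first, "", 0, err
	}
	second = string(p)
	next, err = r.ReadByte() // ReadByte, unlike Peek, advances the read count
	return first, second, next, err
}

// peekBeyondBuffer (hypothetical helper) asks for more bytes than the
// 16-byte buffer can hold. It reports how many bytes came back and whether
// the error was bufio.ErrBufferFull.
func peekBeyondBuffer(input string) (int, bool) {
	r := bufio.NewReaderSize(strings.NewReader(input), 16)
	p, err := r.Peek(32)
	return len(p), err == bufio.ErrBufferFull
}

func main() {
	first, second, next, err := peekThenRead("hello, world", 5)
	fmt.Println(first, second, string(next), err)

	n, full := peekBeyondBuffer(strings.Repeat("x", 40))
	fmt.Println(n, full)
}
```

Both Peek calls return the same five bytes, and the byte read afterwards is still the first byte of the input. In the second helper, Peek hands back everything the buffer holds together with the buffer-full error.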
Other reading methods of this type are not like this. For example, the Read method sometimes copies the unread bytes in the buffer, in order, into the byte slice represented by its parameter p, and immediately increases the read count by the number of bytes actually copied.
- If there are still unread bytes in the buffer, this is how the method behaves. At other times, however, the read count of the Reader may equal its write count, which means there are no unread bytes in the buffer.
- When there are no unread bytes in the buffer, the Read method first checks whether the length of the parameter p is greater than or equal to the length of the buffer. If it is, the Read method simply gives up on filling the buffer and instead reads data directly from its underlying reader into p.
In other words, it bypasses the buffer completely and connects the data source directly to the consumer.
It is worth noting that the Peek method behaves differently in a similar situation (which behavior is better depends on the specific use case).
When the value of the parameter n is larger than the length of the buffer, the Peek method still fills the buffer (provided the conditions for filling are met) and then returns all unread bytes in it.
If the length of the buffer that we initially set is very large, the execution time of this method in this situation may be long. The main reason is that filling the buffer takes a long time.
The fill method, by design, tries to fill as much of the buffer's writable space as possible. The Read method, however, does not write data into the buffer in most cases, especially in the case described earlier, where there are no unread bytes in the buffer and the length of the parameter p is greater than or equal to the length of the buffer.
In this case, the method directly reads data from the underlying reader, so the reading speed of the data becomes the decisive factor for the execution time in this situation.
Of course, what I mentioned here is only where time-consuming operations are more likely to occur in certain situations. The final conclusion should be based on the objective results of performance testing.
Let us return to the internal process of the Read method. If there are no unread bytes in the buffer but the buffer is longer than the parameter p, the method first resets both the read count and the write count to 0, and then attempts, with a single read from the underlying reader, to fill the buffer starting from its beginning.
Note that although the Read method does not call the fill method here, the two are alike in one respect: as soon as they write the obtained data into the buffer, they update the value of the write count.
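The two paths of the Read method described above can be observed from the outside with a small sketch. It assumes a made-up wrapper type, sizeRecorder, that records the length of every slice the bufio.Reader hands to its underlying reader; the 16-byte buffer size and the helper name underlyingReadSizes are likewise choices made only for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// sizeRecorder is a hypothetical wrapper that remembers the length of each
// slice passed to the wrapped reader's Read method.
type sizeRecorder struct {
	src   io.Reader
	sizes []int
}

func (s *sizeRecorder) Read(p []byte) (int, error) {
	s.sizes = append(s.sizes, len(p))
	return s.src.Read(p)
}

// underlyingReadSizes performs one Read with a destination of length pLen
// on a fresh Reader that has a 16-byte buffer, and returns the slice
// lengths that the underlying reader saw.
func underlyingReadSizes(pLen int) []int {
	rec := &sizeRecorder{src: strings.NewReader(strings.Repeat("x", 64))}
	r := bufio.NewReaderSize(rec, 16)
	p := make([]byte, pLen)
	if _, err := r.Read(p); err != nil {
		return nil
	}
	return rec.sizes
}

func main() {
	// p is shorter than the buffer: the underlying reader is asked to fill
	// the 16-byte buffer.
	fmt.Println(underlyingReadSizes(4))
	// p is at least as long as the buffer: the underlying reader receives p
	// itself, so the buffer is bypassed.
	fmt.Println(underlyingReadSizes(32))
}
```

In the first call the underlying reader sees a 16-byte slice (the buffer), and in the second it sees the caller's 32-byte slice.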
Now let’s talk about the ReadSlice method and the ReadBytes method. Broadly speaking, both methods keep reading data until they encounter the delimiter specified by the caller.
The ReadSlice method first searches for the delimiter in the unread portion of its buffer. If it cannot find the delimiter and the buffer is not full, the method fills the buffer by calling the fill method and then searches again. This repeats until the delimiter is found, an error occurs, or the buffer becomes full.
If an error occurs during the filling process, the ReadSlice method returns the unread portion of the buffer as the first result value, along with the corresponding error value.
Note that it is possible for the buffer to be filled but still not find the delimiter.
In this case, the ReadSlice method returns the entire buffer (the byte slice held in the buf field) as the first result value, and the buffer-full error (the value of the bufio.ErrBufferFull variable) as the second result value.
This is reasonable because a buffer that the fill method has filled completely contains only unread bytes from start to finish.
Of course, once the ReadSlice method finds the delimiter, it slices the buffer to obtain the corresponding byte slice, which includes the delimiter, and returns it as the result value. The method updates the read count correctly whether or not the delimiter is found.
For example, before returning all the unread bytes in the buffer or the byte slice representing the entire buffer, it assigns the value of the write count to the read count to indicate that there are no unread bytes left in the buffer.
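The two outcomes of ReadSlice described above (delimiter found, and buffer full without a delimiter) can be sketched as follows. The helper name readSliceOnce and the 16-byte buffer size are assumptions made for this illustration only.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readSliceOnce (hypothetical helper) calls ReadSlice once on a Reader with
// a 16-byte buffer. It returns a copy of the bytes read and reports whether
// the error was bufio.ErrBufferFull.
func readSliceOnce(input string, delim byte) (string, bool) {
	r := bufio.NewReaderSize(strings.NewReader(input), 16)
	line, err := r.ReadSlice(delim)
	// string(line) copies the bytes, so the result does not alias the buffer.
	return string(line), err == bufio.ErrBufferFull
}

func main() {
	// The delimiter fits in the buffer: the slice ends with the delimiter.
	line, full := readSliceOnce("ab\ncd", '\n')
	fmt.Printf("%q %v\n", line, full)

	// The first 16 bytes contain no delimiter: ReadSlice gives up and
	// returns the whole buffer together with the buffer-full error.
	line, full = readSliceOnce("0123456789abcdefghij\n", '\n')
	fmt.Printf("%q %v\n", line, full)
}
```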
If ReadSlice is a method that may give up halfway, then ReadBytes is quite persistent.
The ReadBytes method repeatedly calls the ReadSlice method to read data from the buffer until the delimiter is found.
During this process, the ReadSlice method may return all the bytes read so far along with an error value because the buffer is full, but the ReadBytes method always ignores this particular error and calls the ReadSlice method again. This lets ReadSlice continue filling the buffer and searching for the delimiter.
The process ends only when the ReadSlice method finds the delimiter or returns an error other than the buffer-full error.
When the search process ends, whether because the delimiter was found or because an error occurred, the ReadBytes method assembles all the bytes read along the way into a single byte slice, in the order they were read, and returns it as the first result value. If the process ended because of an error, it also returns that error value as the second result value.
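The persistence of ReadBytes can be illustrated with a minimal sketch. The helper name readBytesWithSmallBuffer is hypothetical, and the 16-byte buffer is deliberately smaller than the line being read so that the buffer fills up at least once before the delimiter appears.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readBytesWithSmallBuffer (hypothetical helper) calls ReadBytes once on a
// Reader whose buffer holds only 16 bytes.
func readBytesWithSmallBuffer(input string, delim byte) (string, error) {
	r := bufio.NewReaderSize(strings.NewReader(input), 16)
	line, err := r.ReadBytes(delim)
	return string(line), err
}

func main() {
	// The line is longer than the buffer, yet ReadBytes returns it whole.
	line, err := readBytesWithSmallBuffer("0123456789abcdefghij\nrest", '\n')
	fmt.Printf("%q %v\n", line, err)

	// Without a delimiter, ReadBytes returns what it has read so far and
	// the error that stopped it (io.EOF for this input).
	line, err = readBytesWithSmallBuffer("abc", '\n')
	fmt.Printf("%q %v\n", line, err)
}
```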
Among the read methods of the bufio.Reader type, the ReadLine method also relies on the ReadSlice method, just as the ReadBytes method does. However, there is nothing particularly special about the reading process of ReadLine, so I won’t elaborate on it here.
In addition, the ReadString method of this type relies entirely on the ReadBytes method. It merely performs a simple type conversion on the result value returned by ReadBytes.
Finally, I would like to remind you of a security issue. The Peek method, ReadSlice method, and ReadLine method of the bufio.Reader type may cause content leakage.
This is mainly because, under normal circumstances, they all return byte slices that are sliced directly from the buffer. I explained what content leakage means when talking about the bytes.Buffer type; you can refer to that article for more information.
The caller can access other parts of the buffer and even modify its content through the result value returned by these methods. This is usually very dangerous.
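A minimal sketch of this risk, using Peek as the example: the returned slice shares its underlying array with the buffer, so the caller can re-slice it to see more bytes than it asked for, and can change what later reads return. The helper name tamperViaPeek is made up for this illustration, and the input is assumed to be non-empty and no longer than the 16-byte buffer.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// tamperViaPeek (hypothetical helper) peeks at a single byte, re-slices the
// result to expose the rest of the buffered data, overwrites the first byte,
// and then reads the data through the normal Read path.
func tamperViaPeek(input string) (exposed, afterTamper string) {
	r := bufio.NewReaderSize(strings.NewReader(input), 16)
	p, err := r.Peek(1)
	if err != nil {
		return "", ""
	}
	// Only one byte was requested, but the slice's capacity reaches to the
	// end of the buffer, so the other buffered bytes are visible too.
	exposed = string(p[:cap(p)][:r.Buffered()])

	// Writing through the slice modifies the Reader's buffer itself.
	p[0] = 'X'

	rest := make([]byte, len(input))
	if _, err := io.ReadFull(r, rest); err != nil {
		return exposed, ""
	}
	return exposed, string(rest)
}

func main() {
	exposed, after := tamperViaPeek("0123456789")
	fmt.Println(exposed)
	fmt.Println(after)
}
```

If a returned slice needs to outlive the next read or be handed to untrusted code, copying it first (for example with append([]byte(nil), p...)) avoids sharing the buffer.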
Summary #
We have presented the data types in the bufio package in some detail, with the focus on the bufio.Reader type.
The bufio.Reader type represents a reader with a buffer. When initialized, it requires an underlying reader, which must be an implementation of the io.Reader interface.
The buffer in a Reader value serves as intermediate storage for data, sitting between the underlying reader and the read methods and their callers. The read methods of this type typically read data from the buffer first and, when necessary, read a portion of data from the underlying reader in advance and place it in the buffer for future use. Filling the buffer is usually done by the fill method of this value. During the filling process, the fill method may compress the buffer.
Among the many read methods of the Reader value, four represent distinct read processes: Peek, Read, ReadSlice, and ReadBytes.
The Peek method does not change the value of the read count even though it reads data from the buffer. The Read method bypasses the buffer and requests data directly from the underlying reader if there are no unread bytes in the buffer and the length of its parameter value is greater than or equal to the length of the buffer.
The ReadSlice method searches for the given delimiter in the unread portion of the buffer and fills the buffer if necessary.
If the delimiter is still not found once the buffer is full, this method returns the entire buffer as the first result value, together with an error indicating that the buffer is full.
The ReadBytes method repeatedly calls the ReadSlice method to fill the buffer and search it for the delimiter. This process continues until the delimiter is found or an error other than the buffer-full error occurs.
The ReadLine method of the Reader value relies on its ReadSlice method, while its ReadString method relies entirely on the ReadBytes method.
In addition, it is worth noting that the Peek method, ReadSlice method, and ReadLine method of the Reader value may leak the content of its buffer.
Finally, let’s talk about the bufio.Writer type. Writing the data temporarily held in the buffer of a Writer value to its underlying writer is mainly done by its Flush method.
All of this type’s data-writing methods call its Flush method when necessary. In general, these methods first write the data into the buffer of the value they belong to and then increase that value’s write count. In some cases, however, the Write method and the ReadFrom method bypass the buffer and write the data directly to the underlying writer.
Please remember that although these writing methods call the Flush method from time to time, it is always safest to invoke this method explicitly after writing all the data.
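Why the explicit Flush matters can be seen in a minimal sketch. It assumes a bytes.Buffer as the underlying writer and a made-up helper name, writeAndFlush; the Writer uses the default buffer size of 4096 bytes, so a short write stays in the buffer until Flush is called.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// writeAndFlush (hypothetical helper) writes data through a bufio.Writer
// and reports how many bytes had reached the underlying writer before and
// after the explicit Flush call.
func writeAndFlush(data string) (beforeFlush, afterFlush int, err error) {
	var sink bytes.Buffer
	w := bufio.NewWriter(&sink) // default buffer size
	if _, err = w.WriteString(data); err != nil {
		return 0, 0, err
	}
	beforeFlush = sink.Len() // the data is still held in the Writer's buffer

	if err = w.Flush(); err != nil {
		return beforeFlush, 0, err
	}
	afterFlush = sink.Len() // now the data has reached the underlying writer
	return beforeFlush, afterFlush, nil
}

func main() {
	before, after, err := writeAndFlush("hello")
	fmt.Println(before, after, err)
}
```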
Thought Question #
Today’s thought question is: What is the main function of the bufio.Scanner type? What are its characteristics?
Thank you for listening, see you next time.