38 Bytes Package and Byte Slice Operations Part 1

I believe that after the last lesson, you are already familiar enough with the strings.Builder and strings.Reader types.

Last time, I also suggested that you take a look at the other entities in the strings package. If you did so carefully, then you will definitely have a sense of familiarity with the bytes package that we are going to discuss today.

Introduction: Basics of `bytes.Buffer` #

The strings and bytes packages can be considered twin siblings, as their APIs are very similar. The functions they provide differ only minimally in number and in what they do.

The main difference is that the strings package mainly deals with Unicode characters and strings encoded in UTF-8, while the bytes package mainly deals with bytes and byte slices.

Today, I will mainly talk about the most distinctive type in the bytes package: Buffer. As the name suggests, the bytes.Buffer type is mainly used as a buffer for byte sequences.

Like the strings.Builder type, bytes.Buffer is also ready to use out of the box.

However, the difference is that strings.Builder can only concatenate and export strings, while bytes.Buffer can concatenate and truncate byte sequences, export its content in various forms, and also sequentially read sub-sequences from it.

It can be said that bytes.Buffer is a data type that combines reading and writing capabilities. Of course, these are basically the functions that a buffer should have.

Internally, the bytes.Buffer type also uses a byte slice as the content container. And, similar to the strings.Reader type, bytes.Buffer has an int type field to represent the count of bytes read, which can be referred to as the read count.

However, bytes.Buffer provides no method that reports the read count directly.

First, let’s take a look at the following code:

var buffer1 bytes.Buffer
contents := "Simple byte buffer for marshaling data."
fmt.Printf("Writing contents %q ...\n", contents)
buffer1.WriteString(contents)
fmt.Printf("The length of buffer: %d\n", buffer1.Len())
fmt.Printf("The capacity of buffer: %d\n", buffer1.Cap())

I first declare a variable buffer1 of type bytes.Buffer and write a string into it. Then I print the length and capacity of this bytes.Buffer value (hereinafter referred to as the Buffer value). After running this code, we will see the following output:

Writing contents "Simple byte buffer for marshaling data." ...
The length of buffer: 39
The capacity of buffer: 64

At first glance, there seems to be no problem. The meanings of the length 39 and the capacity 64 seem to be consistent with our known concepts. I wrote a string with a length of 39 into the buffer, so the length of buffer1 is 39.

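The number 64 is also reasonable: when a small amount of data is written to an empty buffer, bytes.Buffer allocates its content container with a small default capacity (64 bytes in current standard library implementations, though this is an implementation detail) rather than sizing it exactly. In addition, you can imagine that the value of the read count at this time should be 0, because I have not called any methods to read its contents.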

In fact, however, like the Len method of the strings.Reader type, the Len method of buffer1 returns the length of the unread portion in the content container, not the total length of the stored contents. An example is as follows:

p1 := make([]byte, 7)
n, _ := buffer1.Read(p1)
fmt.Printf("%d bytes were read. (call Read)\n", n)
fmt.Printf("The length of buffer: %d\n", buffer1.Len())
fmt.Printf("The capacity of buffer: %d\n", buffer1.Cap())

When I read part of the content from buffer1 and fill the byte slice p1 of length 7 with it, the result value returned by the Len method of buffer1 changes accordingly. If you run this code, you will find that the length of this buffer has become 32.

In addition, since we did not write any more content to this buffer, its capacity remains unchanged at 64.

In summary, what you need to remember here is that the length of a Buffer value represents the length of the unread content, not the total length of the stored content. It is related to both read and write operations on the current value, and will change as these two types of operations are performed. It may become smaller or larger.

The capacity of a Buffer value refers to the capacity of its content container (the byte slice). It is only related to write operations on the current value, and it grows whenever a write needs more room than the container can provide.
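The following is a minimal sketch of how the length moves in both directions. The helper name lengths is only for illustration. It deliberately says nothing about the capacity, whose exact values depend on the implementation.

```go
package main

import (
	"bytes"
	"fmt"
)

// lengths writes a string, reads 7 bytes, then writes 5 more bytes,
// and returns the value of Len() after each step.
// The function name is illustrative only.
func lengths() (afterWrite, afterRead, afterSecondWrite int) {
	var buf bytes.Buffer
	buf.WriteString("Simple byte buffer for marshaling data.")
	afterWrite = buf.Len()

	p := make([]byte, 7)
	buf.Read(p) // consumes "Simple "
	afterRead = buf.Len()

	buf.WriteString(" More") // adds 5 unread bytes
	afterSecondWrite = buf.Len()
	return
}

func main() {
	a, b, c := lengths()
	fmt.Println(a, b, c) // 39 32 37
}
```

The read makes the length smaller, and the second write makes it larger again.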

Now let’s talk about the read count. Since the strings.Reader type also has a Size method that returns the content length, we can easily obtain its read count by subtracting the length of the unread portion from the content length.
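For strings.Reader, that subtraction can be sketched as follows. The helper name readCount is hypothetical and exists only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// readCount reads up to n bytes from a strings.Reader over s and
// returns Size() - Len(), i.e. the number of bytes consumed so far.
// The function name is illustrative only.
func readCount(s string, n int) int64 {
	r := strings.NewReader(s)
	p := make([]byte, n)
	r.Read(p)
	// Size returns the original length, Len the unread length.
	return r.Size() - int64(r.Len())
}

func main() {
	fmt.Println(readCount("Simple byte buffer", 7)) // 7
}
```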

However, the bytes.Buffer type does not have such a method, only the Cap method. But the Cap method provides the capacity of the content container, not the content length.

Moreover, in many cases, the capacity of the content container is not the same as the content length. Therefore, without a ready-made calculation formula, it is difficult for us to estimate the read count of a Buffer value in slightly more complex situations.

If we understood the concept of the read count and could obtain both the read count and the content length at any moment during reading and writing, the behavior of the methods of a Buffer value would be easy to follow. Unfortunately, neither number can be obtained directly.

Although we cannot directly obtain the read count of a Buffer value, and sometimes it is difficult to estimate it, we should not give up. Instead, we should explore the key role of the read count in bytes.Buffer by studying its documentation and source code.

Otherwise, our desire to use bytes.Buffer well will not be easy to achieve.

The following question can be answered well if you carefully read the source code of bytes.Buffer.

Our question today is: What is the role of the read count recorded by a bytes.Buffer value?

The typical answer to this question is as follows.

The approximate functions of the read count in bytes.Buffer are as follows:

When reading the content, the corresponding method will find the unread portion based on the read count, and update the count after reading.
When writing content and requiring expansion, the corresponding method will implement the expansion strategy based on the read count.
When truncating content, the corresponding method will truncate the unread portion after the index represented by the read count.
When reading back, the corresponding method needs to use the read count to record the rollback point.
When resetting the content, the corresponding method will set the read count to 0.
When exporting the content, the corresponding method will only export the unread portion after the index represented by the read count.
When obtaining the length, the corresponding method will calculate the length of the unread portion based on the read count and the length of the content container, and return it.
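To make the points above concrete, here is a deliberately simplified teaching model. It is not the real bytes.Buffer source. The type name miniBuffer and its field names are assumptions made for this sketch, and growth is simply delegated to append instead of following the real expansion strategy.

```go
package main

import (
	"fmt"
	"io"
)

// miniBuffer is a simplified model for illustration only.
type miniBuffer struct {
	buf []byte // content container
	off int    // read count: index of the first unread byte
}

// Len returns the length of the unread portion.
func (b *miniBuffer) Len() int { return len(b.buf) - b.off }

// Bytes exports only the unread portion.
func (b *miniBuffer) Bytes() []byte { return b.buf[b.off:] }

// Write appends p to the container (growth is left to append here).
func (b *miniBuffer) Write(p []byte) (int, error) {
	b.buf = append(b.buf, p...)
	return len(p), nil
}

// Read copies unread bytes into p and advances the read count.
func (b *miniBuffer) Read(p []byte) (int, error) {
	if b.Len() == 0 {
		b.Reset() // nothing unread: reuse the space
		if len(p) == 0 {
			return 0, nil
		}
		return 0, io.EOF
	}
	n := copy(p, b.buf[b.off:])
	b.off += n
	return n, nil
}

// Truncate keeps only the first n unread bytes.
func (b *miniBuffer) Truncate(n int) {
	if n < 0 || n > b.Len() {
		panic("miniBuffer: truncation out of range")
	}
	b.buf = b.buf[:b.off+n]
}

// Reset empties the buffer and sets the read count back to 0.
func (b *miniBuffer) Reset() {
	b.buf = b.buf[:0]
	b.off = 0
}

func main() {
	var b miniBuffer
	b.Write([]byte("abcdefgh"))
	p := make([]byte, 3)
	b.Read(p)
	b.Truncate(2)
	fmt.Println(string(p), string(b.Bytes()), b.Len(), b.off) // abc de 2 3
}
```

Every method except Write consults or updates off, which is the role the list above describes.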

Problem Analysis #

From the above typical answer, we can already see how important the read counter is to the bytes.Buffer type and its methods. Indeed, most of the methods of bytes.Buffer use the read counter and cannot work without it.

When reading content, the corresponding method will first check whether there is unread content in the content container based on the read counter. If so, it will start reading from the index represented by the read counter.

After reading is completed, it will promptly update the read counter. In other words, it will record how many bytes have been read. The corresponding methods mentioned here include the methods with names starting with Read (except ReadFrom, which writes into the buffer), as well as the Next method and the WriteTo method.
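As a small sketch of this, the program below calls one method of each kind and records the unread length after every step. The helper name remaining is only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// remaining performs one read with each kind of reading method and
// records Len() after every step. The name is illustrative only.
func remaining() []int {
	buf := bytes.NewBufferString("0123456789")
	lens := []int{buf.Len()}

	buf.Read(make([]byte, 3)) // consumes "012"
	lens = append(lens, buf.Len())

	buf.ReadByte() // consumes "3"
	lens = append(lens, buf.Len())

	buf.Next(2) // consumes "45"
	lens = append(lens, buf.Len())

	buf.WriteTo(io.Discard) // drains the rest, "6789"
	lens = append(lens, buf.Len())
	return lens
}

func main() {
	fmt.Println(remaining()) // [10 7 6 4 0]
}
```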

When writing content, most of the corresponding methods will first check whether the current content container has enough capacity to accommodate new content. If not, they will resize the content container.

During resizing, when necessary, the method will find the unread part based on the read counter and copy the content to the head of the expanded content container.

Then, the method will set the value of the read counter to 0, indicating that the next read should start from the first byte of the content container. The corresponding methods used to write content include the methods with names starting with Write (except WriteTo, which reads from the buffer), as well as the ReadFrom method.
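The internal copying cannot be observed directly, but its effect can: no matter how the container is rearranged or regrown, the unread part survives and the already-read part does not come back. The sketch below checks exactly that. The helper name writeAfterRead and the 200-byte write size are arbitrary choices for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// writeAfterRead consumes part of a buffer, then writes enough data
// that the container very likely has to be rearranged or regrown,
// and returns what is left to read. The name is illustrative only.
func writeAfterRead() string {
	var buf bytes.Buffer
	buf.WriteString("head-tail")
	buf.Next(5) // consume "head-"; the unread part is "tail"

	// Write far more than the small initial container can hold.
	buf.Write(bytes.Repeat([]byte("x"), 200))

	return buf.String()
}

func main() {
	s := writeAfterRead()
	fmt.Println(len(s), s[:6]) // 204 tailxx
}
```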

The Truncate method, which is used to truncate content, can confuse many developers who are not familiar with bytes.Buffer. It takes an int parameter whose value represents how many bytes need to be retained from the head during truncation.

However, it is important to note that the head mentioned here does not refer to the head of the content container, but the head of the unread part. The starting index of the head is represented by the value of the read counter. Therefore, in this case, the sum of the value of the read counter and the parameter value represents the new total length of the content container.
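A short sketch of this behavior follows. The helper name truncateUnread is only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// truncateUnread reads 3 bytes and then truncates the buffer to the
// first 2 unread bytes. The name is illustrative only.
func truncateUnread() (string, int) {
	buf := bytes.NewBufferString("abcdefgh")
	buf.Next(3)     // consume "abc"; the unread part is "defgh"
	buf.Truncate(2) // keep 2 bytes counted from the unread head
	return buf.String(), buf.Len()
}

func main() {
	s, n := truncateUnread()
	fmt.Println(s, n) // de 2
}
```

The result is "de", not "ab": the two retained bytes are counted from the read counter, not from the start of the container.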

In bytes.Buffer, there are two methods used to roll back a read: UnreadByte and UnreadRune. These two methods roll back one byte and one Unicode character, respectively. They are generally called to roll back the separator at the end of the content that was read last time, or to prepare for re-reading the previous byte or character.

However, a rollback is only possible if the last operation before the call was a successful read. Otherwise, these methods do nothing and return a non-nil error value.

The UnreadByte method is relatively simple: it subtracts 1 from the value of the read counter. The UnreadRune method needs to subtract the number of bytes occupied by the last read Unicode character from the read counter.

This number of bytes is stored in another field of bytes.Buffer, and its valid range here is [1, 4]. Only the ReadRune method will set the value of this field within this range.

Therefore, the UnreadRune method can complete the rollback successfully only when it is called immediately after the ReadRune method. Its scope of application is obviously narrower than that of the UnreadByte method.

As I mentioned before, the Len method of bytes.Buffer returns the length of the unread part in the content container, not the total length of the stored content (i.e., the content length).

The behavior of the Bytes method and the String method of this type is consistent with that of the Len method. The first two methods only access the content in the unread part and return the corresponding result value.
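A brief sketch of this consistency follows. The helper name exported is only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// exported reads part of a buffer and returns what String, Bytes and
// Len report afterwards. The name is illustrative only.
func exported() (string, string, int) {
	buf := bytes.NewBufferString("hello world")
	buf.Next(6) // consume "hello "
	return buf.String(), string(buf.Bytes()), buf.Len()
}

func main() {
	s, b, n := exported()
	fmt.Println(s, b, n) // world world 5
}
```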

After analyzing all the relevant methods, we can summarize it like this: the content before the index represented by the read counter has been read and is almost never read again.

However, the memory space where these read contents are located may be used to store new content. This is generally due to resetting or extending the content container. In this case, the value of the read counter will always be set to 0, pointing to the first byte of the content container again. This is sometimes done to avoid memory allocation and reuse memory space.
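The reset case can be sketched as follows. The helper name resetDemo is only for illustration. The check that the capacity is retained reflects my understanding that Reset keeps the underlying storage for future writes.

```go
package main

import (
	"bytes"
	"fmt"
)

// resetDemo reads part of a buffer, resets it, and writes new content.
// It returns the length right after Reset, whether the capacity was
// kept, and the content after the new write. The name is illustrative.
func resetDemo() (lenAfterReset int, capKept bool, content string) {
	var buf bytes.Buffer
	buf.WriteString("some content")
	buf.Next(4) // consume "some"

	capBefore := buf.Cap()
	buf.Reset() // empties the buffer; reading starts from index 0 again
	lenAfterReset = buf.Len()
	capKept = buf.Cap() == capBefore

	buf.WriteString("new") // reuses the existing memory space
	content = buf.String()
	return
}

func main() {
	fmt.Println(resetDemo()) // 0 true new
}
```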

Summary #

In summary, bytes.Buffer is a data type that combines reading and writing functions. It is very suitable as a buffer for byte sequences. We will continue to expand our knowledge of bytes.Buffer in the next article. If you have any questions about this content, please leave a message and we can discuss it together.

Thank you for listening and see you next time.
