33 Temporary Object Pool: sync.Pool #

So far, we have covered the most important synchronization tools in the Go standard library: the classic mutex, read-write lock, condition variable, and atomic operations, as well as several tools that are more specific to Go:

sync/atomic.Value;
sync.Once;
sync.WaitGroup;
context.Context.

Today, we will talk about another synchronization tool in the Go standard library: sync.Pool.

The sync.Pool type can be called a temporary object pool, and its values are used to store temporary objects. Like many other synchronization tools in Go, sync.Pool is a struct type, and a value of this type must not be copied after it has first been used.

Here, “temporary objects” mean a class of values that are not needed for persistent use. These values are optional for the program, but having them would clearly be better. Their creation and destruction can occur at any time and will not affect the functionality of the program at all.

At the same time, they should also be indistinguishable, and any one of them can replace another. If a class of values completely satisfies the above conditions, then you can store them in a temporary object pool.

You may have already realized that we can use a temporary object pool as a cache for a certain type of data. In fact, in my opinion, the main purpose of a temporary object pool is for this use case.

The sync.Pool type has only two methods: Put and Get. Put stores a temporary object in the current pool and accepts a parameter of type interface{}. Get retrieves a temporary object from the current pool and returns a value of type interface{}.

Specifically, the Get method of this type may remove any value from the current pool and return it as the result. If at this point the current pool has no values, then this method will create a new value using the New field of the current pool and return it directly.

The New field of the sync.Pool type holds the function used to create temporary objects. Its type is a function type with no parameters and a single result, namely: func() interface{}.

This function is the last resort for the Get method to obtain a temporary object. If the Get method still cannot obtain a value at the end, then it will call this function. The result value of this function will not be stored in the current temporary object pool, but will be returned directly to the caller of the Get method.

The actual value of the New field needs to be given when initializing the temporary object pool. Otherwise, when we call its Get method, we may get nil. So, the sync.Pool type is not ready to use out of the box. However, this type only has this one public field, so initialization is not difficult.

As an example, the standard library package fmt uses the sync.Pool type. This package creates a sync.Pool type value to cache a certain class of temporary objects and assigns this value to a variable named ppFree. These temporary objects can identify, format, and store content for printing.

var ppFree = sync.Pool{
 New: func() interface{} { return new(pp) },
}

When the New field of the temporary object pool ppFree is called, it always returns a pointer to a new value of type pp (i.e., a temporary object). This ensures that the Get method of ppFree can always return a value that can contain the content to be printed.

The pp type is a private type in the fmt package and has many methods that implement different functionalities. However, the key point here is that each of its values is independent, equal, and reusable.

Specifically, these objects are independent of each other and are not affected by external state. Their main job is to buffer the content to be printed. Since the code in the fmt package always resets a temporary object before actually using it, that code does not care which temporary object it gets. This is what the equality of temporary objects means in practice.

In addition, after using the temporary objects, the code will first erase the buffered content and then put them back into ppFree. This prepares them for reuse.
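The reset-before-use and erase-before-return pattern described above can be sketched as follows. This is not the fmt package's code: the printer type, printerPool, newPrinter, free, and sprintPair are hypothetical stand-ins that only imitate the shape of that pattern.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// printer is a hypothetical stand-in for a private type like fmt's pp.
// Here it only buffers the text being built.
type printer struct {
	buf []byte
}

var printerPool = sync.Pool{
	New: func() interface{} { return new(printer) },
}

// newPrinter fetches a printer and resets it before use, so it does
// not matter which pooled value Get happens to return.
func newPrinter() *printer {
	p := printerPool.Get().(*printer)
	p.buf = p.buf[:0]
	return p
}

// free erases the buffered content (keeping the allocated capacity)
// and puts the printer back into the pool for reuse.
func (p *printer) free() {
	p.buf = p.buf[:0]
	printerPool.Put(p)
}

// sprintPair formats "key=value" using a pooled printer.
func sprintPair(key string, value int) string {
	p := newPrinter()
	p.buf = append(p.buf, key...)
	p.buf = append(p.buf, '=')
	p.buf = strconv.AppendInt(p.buf, int64(value), 10)
	// Copy the result out before the buffer is recycled.
	s := string(p.buf)
	p.free()
	return s
}

func main() {
	fmt.Println(sprintPair("answer", 42))
	fmt.Println(sprintPair("n", 7))
}
```

Resetting on both sides is slightly redundant, but it keeps each caller correct even if some other code path returns an object to the pool without cleaning it.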

Well-known printing functions like fmt.Println and fmt.Printf use ppFree and its temporary objects in this way. Therefore, when the program executes many printing function calls at the same time, ppFree can promptly provide its cached temporary objects to speed up execution.

And when the program no longer calls printing functions for a period of time, the temporary objects in ppFree can be cleaned up in a timely manner to save memory space.

In this respect, the temporary object pool helps the program scale: it absorbs bursts of demand and gives memory back when demand drops. This is its greatest value.

I think by now you have a clear understanding of the basic functionality, usage, applicable scenarios, and significance of the temporary object pool. Let’s discuss some of its internal mechanisms next so that we can make better use of it for more tasks.

First, let me ask you a question, one that you probably want to ask too. The question for today is: why are the values in the temporary object pool cleaned up promptly?

The typical answer here is: because the garbage collector in the Go runtime system clears all values in the temporary object pools each time it is about to run.

Problem Analysis #

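I have already explained when temporary objects are created. Now let me explain in detail when they are destroyed. Note that the details below match the sync.Pool implementation up to Go 1.12. Since Go 1.13, a pool retains its objects across one garbage collection before dropping them, and the shared lists no longer rely on a mutex, but the overall idea is the same.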

When the sync package is initialized, it registers a function with the Go runtime system. The purpose of this function is to clear all values in the temporary object pools that have been created. We can call it the pool cleaning function.

Once the pool cleaning function is registered with the Go runtime system, it is executed every time garbage collection is about to be performed.

In addition, there is a package-level private global variable in the sync package. This variable represents a summary of all temporary object pools used in the current program. It is a slice with elements of type *sync.Pool. We can call it the pool summary list.

Typically, when the Put method or Get method of a temporary object pool is called for the first time, the pool is added to the pool summary list. Because of this, the pool cleaning function always has access to all temporary object pools that are actually being used.

Specifically, the pool cleaning function will iterate over the pool summary list. For each temporary object pool in the list, it first sets all private temporary objects and shared temporary object lists in the pool to nil, and then destroys all local pool lists in the pool.

Finally, the pool cleaning function resets the pool summary list to an empty slice. As a result, all temporary objects stored in these pools are completely cleared.

If there is no reference to these temporary objects outside the temporary object pool, they will be treated as garbage and destroyed during the subsequent garbage collection process, and the memory space they occupy will be reclaimed for other purposes.
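The effect of this cleanup can be observed from ordinary code. The sketch below is an assumption-laden experiment, not a guarantee made by the sync.Pool API: the names scratch, scratchPool, created, and newCallsAfterGC are hypothetical, and the program forces two collections with runtime.GC so that the result does not depend on whether the Go version in use keeps pooled objects alive for one extra cycle.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// created counts how many times the pool's New function has run.
var created int32

// scratch is a hypothetical temporary object.
type scratch struct {
	data [1024]byte
}

var scratchPool = sync.Pool{
	New: func() interface{} {
		atomic.AddInt32(&created, 1)
		return new(scratch)
	},
}

// newCallsAfterGC puts one object into the pool, forces two garbage
// collections, and reports how many times New had to run to satisfy
// the next Get.
func newCallsAfterGC() int32 {
	scratchPool.Put(new(scratch))

	// Each collection triggers the pool cleanup before it starts.
	runtime.GC()
	runtime.GC()

	before := atomic.LoadInt32(&created)
	_ = scratchPool.Get()
	return atomic.LoadInt32(&created) - before
}

func main() {
	// If the pooled object was cleared, Get had to fall back to New.
	fmt.Println("New calls after GC:", newCallsAfterGC())
}
```

On the implementations I am aware of, the reported count is 1, which shows that the object put into the pool did not survive the collections.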

Above is my further explanation of temporary object cleaning. The key points to remember are the meaning of the pool cleaning function and the pool summary list, as well as the key role they play. Once you understand these, you should be able to confidently respond to this question when asked.

However, we have also encountered several new terms here, such as private temporary objects, shared temporary object lists, and local pools. What do they mean? This leads us to the following question.

Knowledge Expansion #

Question 1: What is the data structure used for storing values in the temporary object pool? #

In the temporary object pool, there is a multi-level data structure. It is because of the existence of this data structure that the temporary object pool can efficiently store a large number of values.

The top level of this data structure can be called the local pool list, but to be more precise, it is an array. The length of this list is always the same as the number of P (processors) in the Go language scheduler.

Do you remember? P in the Go language scheduler stands for processor, which refers to a kind of intermediary that can carry several Gs (goroutines) and coordinate them with Ms (system-level threads) for actual execution.

Here, G is the abbreviation for goroutine, and M is the abbreviation for machine, which refers to the system-level thread. It is precisely because of the existence of P that G and M can be flexibly and efficiently paired, achieving a powerful concurrent programming model.

One important reason for the existence of P is to spread out the execution pressure of concurrent programs. The main reason why the length of the local pool list equals the number of Ps is the same: to spread out pressure, both in terms of storage and in terms of performance. Before explaining them, let’s explore the data structure in the temporary object pool.

Each local pool in the local pool list contains three fields (or components): private, which stores private temporary objects, shared, which represents the list of shared temporary objects, and an embedded field of type sync.Mutex.

(Figure: the correspondence between the local pools in sync.Pool and goroutines)

In fact, each local pool corresponds to a P. As we all know, a goroutine must be associated with a P in order to run. In other words, a running goroutine is always associated with a specific P.

When the Put method or Get method of the temporary object pool is called, it always tries to obtain the corresponding local pool from the local pool list of the temporary object pool based on the ID of the P associated with the current goroutine.

In other words, which local pool the Put method or Get method of a temporary object pool obtains depends entirely on the P associated with the goroutine in which the code calling it is running.

Since we have mentioned this, the following question will immediately follow.

Question 2: How does the temporary object pool use internal data structures to store and access values? #

The Put method of the temporary object pool always attempts to store a new temporary object in the private field of the corresponding local pool first, so that a usable value can be quickly obtained when retrieving a temporary object later.

The method only accesses the shared field of the local pool when the private field already holds a value.

Similarly, the Get method of the temporary object pool always tries to retrieve a temporary object from the private field of the corresponding local pool. It only accesses the shared field of the local pool when the value of the private field is nil.

In principle, the shared field of a local pool can be accessed by code in any goroutine, regardless of which P the goroutine is associated with. This is why I call it the list of shared temporary objects.

In contrast, the private field of a local pool can only be accessed by code in the goroutine associated with the corresponding P, so it can be considered P-level private.

Taking the Put method of the temporary object pool as an example, when it finds that the private field of the corresponding local pool already holds a value, it then accesses the shared field of the local pool. Since the shared field is shared, it must be protected by a mutex.

Do you remember the sync.Mutex field embedded in the local pool? It is the mutex used here, which means that the local pool itself has the functionality of a mutex. The Put method appends the new temporary object to the end of the list of shared temporary objects under the protection of the mutex.

Similarly, when the Get method of the temporary object pool finds that the private field of the corresponding local pool does not hold a value, it also accesses the shared field of the local pool. It attempts to retrieve the value of the last element in the list of shared temporary objects under the protection of the mutex.

However, the list of shared temporary objects may also be empty, which may be because all the temporary objects in this local pool have been taken, or the temporary object pool has just been cleaned.

Regardless of the reason, the Get method will search through all the local pools in the current temporary object pool one by one and check their list of shared temporary objects.

As long as it finds an element in the list of shared temporary objects, it will retrieve the last element of that list as the result.

(Figure: steps to retrieve a temporary object from sync.Pool)

Of course, it may still fail to obtain a usable temporary object in this way, such as when all the temporary object pools have just been cleaned.

In this case, the Get method falls back on its last resort: calling the function that creates temporary objects. Do you remember? This function is held in the New field of the temporary object pool and needs to be provided when initializing the pool. If the value of this field is nil, the Get method can only return nil at this point.
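The storage and retrieval steps above can be put together in a toy model. It follows the description in this article rather than the runtime's actual code: toyPool, local, newToyPool, and popShared are hypothetical names, the P ID is an explicit argument, and the private slot is left unprotected because the model is meant to be driven from a single goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// local models one per-P local pool.
type local struct {
	private interface{}
	shared  []interface{}
	sync.Mutex
}

// toyPool models the pool: one local per P plus the New fallback.
type toyPool struct {
	locals []local
	New    func() interface{}
}

func newToyPool(numP int, newFn func() interface{}) *toyPool {
	return &toyPool{locals: make([]local, numP), New: newFn}
}

// Put tries the private slot first and only then the shared list.
func (p *toyPool) Put(pid int, x interface{}) {
	if x == nil {
		return
	}
	l := &p.locals[pid]
	if l.private == nil {
		l.private = x
		return
	}
	l.Lock()
	l.shared = append(l.shared, x)
	l.Unlock()
}

// popShared removes and returns the last shared element, or nil.
func popShared(l *local) interface{} {
	l.Lock()
	defer l.Unlock()
	n := len(l.shared)
	if n == 0 {
		return nil
	}
	x := l.shared[n-1]
	l.shared[n-1] = nil // drop the reference held by the backing array
	l.shared = l.shared[:n-1]
	return x
}

// Get checks private, then its own shared list, then the shared lists
// of the other local pools, and finally falls back to New.
func (p *toyPool) Get(pid int) interface{} {
	l := &p.locals[pid]
	if x := l.private; x != nil {
		l.private = nil
		return x
	}
	if x := popShared(l); x != nil {
		return x
	}
	for i := 1; i < len(p.locals); i++ {
		other := &p.locals[(pid+i)%len(p.locals)]
		if x := popShared(other); x != nil {
			return x
		}
	}
	if p.New != nil {
		return p.New()
	}
	return nil
}

func main() {
	p := newToyPool(2, func() interface{} { return "new" })
	p.Put(0, "a")         // lands in the private slot of P0
	p.Put(0, "b")         // private is taken, so it goes to P0's shared list
	fmt.Println(p.Get(1)) // P1 has nothing and takes "b" from P0's shared list
	fmt.Println(p.Get(0)) // "a" from P0's private slot
	fmt.Println(p.Get(0)) // nothing is left anywhere, so New is called
}
```

The first Get in main shows the "steal" path: a caller on one P never sees another P's private object, but it can take from that P's shared list.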

Above is my more complete answer to this question.

Summary #

Today, we discussed another useful synchronization tool - the sync.Pool type, which I referred to as a temporary object pool.

The temporary object pool has a New field, which is best given a value when initializing the pool. The temporary object pool also has two methods, Put and Get, which are used to store temporary objects in the pool and retrieve them from the pool, respectively.

Each value stored in the temporary object pool should be independent, equal, and reusable. We should not care which value we get from the pool, nor should we care if the value has been used before.

To fully achieve these two points, we may need to write some additional code. However, the amount of code should be small, as the fmt package's use of its temporary object pool shows. Therefore, when choosing a temporary object pool, we must consider the characteristics of the values it will store.

Inside the temporary object pool, there is a multi-layer data structure supporting the storage of temporary objects. Its top layer is a list of local pools, which contains the local pools corresponding to each P, and its length is always equal to the number of P’s.

In each local pool, there is a private temporary object and a shared temporary object list. The former can only be accessed by the code in the goroutine associated with its corresponding P, while the latter does not have this constraint. From another perspective, the former is used for quick access to temporary objects, while the latter is used for sharing within the pool.

It is because of this data structure that the temporary object pool can effectively distribute storage pressure and performance pressure. At the same time, it is because of the clever use of this data structure by the Get method of the temporary object pool that the temporary objects in it can be efficiently utilized. For example, this method sometimes “steals” a temporary object from the shared temporary object list of another local pool.

This internal structure and access method make the temporary object pool a distinctive synchronization tool. The temporary objects it stores should be values with relatively long lifetimes, and these values should not be held and used by the code in any goroutine for a long time.

Therefore, the temporary object pool is very suitable for use as a cache for certain data. In a sense, the temporary object pool can help programs achieve scalability, which is its greatest value.

Thought Question #

Today’s thought question is: how can we ensure that a temporary object pool always has sufficient temporary objects?

Please answer from two aspects: the initialization of the temporary object pool and the method invocation. You can refer to the fmt package and the way the temporary object pool is used in the demo70.go file if necessary.

Thank you for listening, see you next time.
