31 sync.WaitGroup and sync.Once #

The mutex, condition variable, and atomic operation we discussed in the past few lessons are all basic and important synchronization tools. In Go, they are also among the most commonly used concurrency-safe tools, aside from channels.

Speaking of channels, I don’t know if you’ve ever thought about it, but in some cases, the way we use channels seems a bit clumsy.

For example: declare a channel whose capacity equals the number of goroutines we start manually, then use this channel to make the main goroutine wait for the other goroutines to finish.

To be more specific, each of the other goroutines sends an element value to this channel just before it finishes running, and the main goroutine receives element values from this channel at the end. The number of receives must equal the number of other goroutines.

This is the multi-goroutine coordination process demonstrated in the coordinateWithChan function below.

func coordinateWithChan() {
    sign := make(chan struct{}, 2)
    num := int32(0)
    fmt.Printf("The number: %d [with chan struct{}]\n", num)
    max := int32(10)
    go addNum(&num, 1, max, func() {
        sign <- struct{}{}
    })
    go addNum(&num, 2, max, func() {
        sign <- struct{}{}
    })
    <-sign
    <-sign
}

The addNum function is declared in the demo65.go file. It calls its last parameter value through a defer statement.

Both of the goroutines I start manually call the addNum function, and the last argument they pass to it (a function with no parameter or result declarations) does only one thing: it sends an element value to the sign channel.
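Since demo65.go is not reproduced here, the following is only a sketch of what addNum might look like, so that the example can be run on its own. The body of addNum (a compare-and-swap loop that adds 2 until max is reached) is an assumption for illustration; only its signature and the deferred call to the last parameter come from the text above. The coordinating function is also changed to return the final number so that the result can be inspected.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// addNum is a hypothetical stand-in for the function in demo65.go.
// It atomically raises *numP in steps of 2 until it reaches max,
// and calls deferFunc (via defer) when it returns.
func addNum(numP *int32, id, max int32, deferFunc func()) {
	defer deferFunc()
	for i := 0; ; i++ {
		currNum := atomic.LoadInt32(numP)
		if currNum >= max {
			return
		}
		newNum := currNum + 2
		// The CAS only succeeds if no other goroutine changed the value meanwhile.
		if atomic.CompareAndSwapInt32(numP, currNum, newNum) {
			fmt.Printf("The number: %d [%d-%d]\n", newNum, id, i)
		}
	}
}

// coordinateWithChan waits for both goroutines through the sign channel
// and returns the final number.
func coordinateWithChan() int32 {
	sign := make(chan struct{}, 2)
	num := int32(0)
	max := int32(10)
	go addNum(&num, 1, max, func() { sign <- struct{}{} })
	go addNum(&num, 2, max, func() { sign <- struct{}{} })
	<-sign
	<-sign
	return atomic.LoadInt32(&num)
}

func main() {
	fmt.Println("final:", coordinateWithChan())
}
```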

Have you seen the last two lines of code in the coordinateWithChan function? The repeated two receive expressions, <-sign, don’t they look ugly?

Introduction: WaitGroup type in the sync package #

In fact, in this scenario, we can choose another synchronization tool, which is the WaitGroup type in the sync package. It is more suitable for implementing this kind of one-to-many goroutine collaboration process than channels.

The sync.WaitGroup type (WaitGroup for short) is ready to use out of the box and is concurrency-safe. Like the synchronization tools we discussed earlier, a value of this type must not be copied once it has been used.

The WaitGroup type has three pointer methods: Add, Done, and Wait. You can imagine that this type has a counter with a default value of 0. We can increase or decrease the value of this counter by calling the Add method of a value of this type.

In general, I use this method to record the number of goroutines that need to be waited for. Correspondingly, the Done method of this type decrements the counter of its owning value by one. We can call it with a defer statement in each goroutine that is being waited for.

The purpose of the Wait method of this type is to block the current goroutine until the counter in its owning value becomes zero. If the value of that counter is already 0 when the method is called, it will do nothing.
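A minimal sketch of the three methods working together may help here. The function name runWorkers and the finished counter are made up for this illustration; the sketch only assumes the behavior described above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers starts n goroutines, waits for all of them,
// and returns how many of them have run.
func runWorkers(n int) int32 {
	var wg sync.WaitGroup
	var finished int32

	wg.Wait() // the counter is 0, so this call returns immediately

	wg.Add(n) // record the number of goroutines to wait for
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done() // decrement the counter when this goroutine ends
			atomic.AddInt32(&finished, 1)
		}()
	}
	wg.Wait() // blocks until the counter is back to 0
	return atomic.LoadInt32(&finished)
}

func main() {
	fmt.Println(runWorkers(5)) // prints 5
}
```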

You may have noticed that a value of the WaitGroup type, referred to as a WaitGroup value, can completely replace the channel sign in the coordinateWithChan function. The modified version of it is shown below.

func coordinateWithWaitGroup() {
    var wg sync.WaitGroup
    wg.Add(2)
    num := int32(0)
    fmt.Printf("The number: %d [with sync.WaitGroup]\n", num)
    max := int32(10)
    go addNum(&num, 3, max, wg.Done)
    go addNum(&num, 4, max, wg.Done)
    wg.Wait()
}

Clearly, the code is a few lines shorter and looks more concise. Here, I first declare a variable wg of the WaitGroup type. Then I call its Add method with 2 as the argument, because I am about to start two goroutines that need to be waited for.

Since the Done method of the wg variable is a function with no parameter and no result declaration itself, I can directly pass this method as the last argument to the addNum function in the go statement.

Finally, in the coordinateWithWaitGroup function, I call the Wait method of wg. Therefore, this function will wait until both of those two goroutines are finished running before terminating.

That concludes the most typical use case of the WaitGroup type. However, we should not stop here. It is necessary to further understand this type. Let’s explore the following question together.

Question: Can the counter value in a sync.WaitGroup type go below 0?

The typical answer here is: No, it cannot.

Problem Analysis #

Let’s analyze why this is not allowed. The counter of a WaitGroup value cannot be less than 0 because letting it drop below 0 causes a panic. Calling either the Done method or the Add method inappropriately on a value of this type can trigger that panic. Don’t forget that we can pass a negative number to the Add method.
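To see this for yourself, the two misuses can be triggered deliberately and the panic recovered. This is only a sketch; the helper names panicMessage, doneOnZero, and addTooNegative are invented for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// panicMessage runs f and returns the recovered panic value as text,
// or an empty string if f did not panic.
func panicMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

// doneOnZero calls Done on a fresh WaitGroup, pushing the counter below 0.
func doneOnZero() string {
	var wg sync.WaitGroup
	return panicMessage(wg.Done)
}

// addTooNegative adds 1 and then -2, which also pushes the counter below 0.
func addTooNegative() string {
	var wg sync.WaitGroup
	wg.Add(1)
	return panicMessage(func() { wg.Add(-2) })
}

func main() {
	fmt.Println("Done on zero counter:", doneOnZero())
	fmt.Println("Add(-2) on counter 1:", addTooNegative())
}
```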

In fact, there is more than one way to cause a panic when using a WaitGroup value.

You need to know that, after declaring a variable of this type, we should first call its Add method with the number of goroutines or other events we want to wait for, so that the counter is greater than 0. This is a precondition for using the value properly later on.

If the first call to the Add method happens at the same time as a call to the Wait method, for example in two goroutines started concurrently, then the Add method may panic.

Although this situation is not easy to reproduce, that’s exactly why we need to pay attention to it. Therefore, although the WaitGroup value itself does not need to be initialized, it is still necessary to increase the value of its counter as early as possible.

In addition, you may already know that the WaitGroup value can be reused, but the integrity of its count cycle must be maintained. Here, the count cycle refers to a process in which the counter value of the value changes from 0 to a positive integer, and then goes through a series of changes before eventually returning to 0.

In other words, as long as the counter starts at 0 and returns to 0, it can be considered a count cycle. In the lifecycle of a value like this, it can go through any number of count cycles. However, only after it completes the current count cycle can it start the next count cycle.

Figure: the count cycle of a sync.WaitGroup value

Therefore, if the Wait method of this type of value is called during one of its count cycles, it will immediately block the current goroutine until this count cycle is completed. In this case, the next count cycle of the value must wait until this Wait method is finished before it can start.
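Reusing one WaitGroup value across several complete count cycles could look like the sketch below. The function name runCycles and its parameters are hypothetical; the point is that each cycle begins only after the previous Wait call has returned.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runCycles reuses a single WaitGroup for several count cycles in a row
// and returns the total number of goroutines that ran.
func runCycles(cycles, workers int) int32 {
	var wg sync.WaitGroup
	var total int32
	for c := 0; c < cycles; c++ {
		wg.Add(workers) // counter goes from 0 to workers: a new cycle starts
		for w := 0; w < workers; w++ {
			go func() {
				defer wg.Done()
				atomic.AddInt32(&total, 1)
			}()
		}
		wg.Wait() // counter is back to 0: this cycle is complete
	}
	return atomic.LoadInt32(&total)
}

func main() {
	fmt.Println(runCycles(3, 4)) // prints 12
}
```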

If a single execution of the Wait method spans two count cycles, it will cause a panic.

For example, while the current goroutine is blocked in a call to the Wait method, another goroutine calls the Done method of that value and brings its counter to 0.

This wakes the current goroutine, which then tries to execute the remaining code in the Wait method. If, at this moment, another goroutine calls the Add method and changes the counter from 0 to a positive integer, the Wait method will panic immediately.

Looking at the last two cases that can cause a panic, we can summarize a taboo on the use of WaitGroup values: do not increase the counter and call the Wait method on the same WaitGroup value concurrently. In other words, we must prevent these two operations on the same WaitGroup value from executing at the same time.
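One common way to respect this taboo is to call Add in the waiting goroutine itself, before each go statement, so that every Add has happened before Wait is called. The sketch below assumes a made-up task (squaring integers) purely for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// squares computes the square of each input in its own goroutine.
func squares(inputs []int) []int {
	results := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, v := range inputs {
		wg.Add(1) // called by the waiting goroutine, before the go statement
		go func(i, v int) {
			defer wg.Done()
			// Calling wg.Add(1) in here instead could run concurrently
			// with wg.Wait below, which is exactly the taboo.
			results[i] = v * v // each goroutine writes only its own slot
		}(i, v)
	}
	wg.Wait() // every Add call has already happened by now
	return results
}

func main() {
	fmt.Println(squares([]int{1, 2, 3})) // prints [1 4 9]
}
```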

Apart from the first case, we usually need to experiment repeatedly to make the methods of a WaitGroup value panic. Again, although this does not happen every time, the probability is not small in a long-running program, so we must take these cases seriously.

If you are interested in reproducing these exceptional cases, you can refer to the waitgroup_test.go file in the sync code package. The test functions whose names start with TestWaitGroupMisuse demonstrate the conditions under which these cases occur. You can imitate them to write your own test code and try running it.

Knowledge Expansion #

Question: How does the Do method of the sync.Once type ensure that the provided function is executed only once? #

Like the sync.WaitGroup type, the sync.Once type is a struct type that is ready to use out of the box and is concurrency-safe. Because this type contains a field of type sync.Mutex, copying a value of this type also breaks its functionality.

The Do method of the Once type only accepts one parameter, which must be of type func(), i.e.: a function with no parameter declaration or result declaration.

This method does not execute every distinct function it is given exactly once. Instead, it executes only the function passed in the first call and never executes any function passed in later calls.

Therefore, if you have multiple functions that need to be executed only once, you should assign a separate Once value for each of them.
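The difference between sharing one Once value and giving each function its own can be shown with a small sketch. The function names sharedOnce and separateOnce are invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedOnce shows that one Once value runs only the first function it is given.
func sharedOnce() []string {
	var calls []string
	var once sync.Once
	once.Do(func() { calls = append(calls, "first") })
	once.Do(func() { calls = append(calls, "second") }) // never runs
	return calls
}

// separateOnce gives each function its own Once value, so each runs exactly once.
func separateOnce() []string {
	var calls []string
	var onceA, onceB sync.Once
	for i := 0; i < 3; i++ {
		onceA.Do(func() { calls = append(calls, "first") })
		onceB.Do(func() { calls = append(calls, "second") })
	}
	return calls
}

func main() {
	fmt.Println(sharedOnce())   // prints [first]
	fmt.Println(separateOnce()) // prints [first second]
}
```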

The Once type also has a field named done, which is of type uint32. Its purpose is to record how many times the Do method of its owning value has been called, but its value can only be 0 or 1: once the first invocation of the Do method completes, the value changes from 0 to 1.

You may wonder why a uint32 type that requires four bytes is used when the value of the done field can only be 0 or 1.

The reason is simple: the operations on it must be “atomic”. The Do method begins by calling the atomic.LoadUint32 function to retrieve the value of this field and, once it finds that the value is 1, it immediately returns. This ensures that “the Do method only executes the function passed in when it is first called”.

However, this sole conditional guarantee is not enough. If two goroutines both call the Do method of the same new Once value and almost simultaneously reach this conditional check code, they will both continue executing the remaining code in the Do method because the result of the check is false.

After this conditional check, the Do method immediately locks the sync.Mutex field m of its associated value. Then, it checks the value of the done field again within the critical section and only calls the provided function and changes the value of done to 1 through atomic operations if the condition is met.

If you are familiar with the Singleton pattern in the GoF design patterns, you will surely notice that the implementation of this Do method has many similarities to that pattern. Both of them first check a key condition outside the critical section, and if the condition is not met, they immediately return. This is often referred to as the “fast path” or “early exit path”.

If the condition is met, the key condition is checked again within the critical section mainly for extra caution. These two condition checks are commonly referred to as the (cross-critical section) “double check”.

Since the mutex m must be locked before entering the critical section, this undoubtedly slows down the code. Therefore, the second condition check and the subsequent operations are called the “slow path” or “regular path”.
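The fast path and slow path described above can be sketched as a simplified imitation of the Do method. The type myOnce below is hypothetical and written only from the description in this article; it is not the standard library source, whose details may differ between Go versions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// myOnce is a hypothetical, simplified imitation of sync.Once.
type myOnce struct {
	done uint32
	m    sync.Mutex
}

// Do runs f only if no earlier call on the same value has completed.
func (o *myOnce) Do(f func()) {
	// Fast path: first check, outside the critical section.
	if atomic.LoadUint32(&o.done) == 1 {
		return
	}
	// Slow path: lock, then check again inside the critical section.
	o.m.Lock()
	defer o.m.Unlock()
	if o.done == 0 {
		defer atomic.StoreUint32(&o.done, 1) // runs even if f panics
		f()
	}
}

// countCalls calls Do from n goroutines and returns how many times f ran.
func countCalls(n int) int32 {
	var o myOnce
	var ran int32
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			o.Do(func() { atomic.AddInt32(&ran, 1) })
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&ran)
}

func main() {
	fmt.Println(countCalls(100)) // prints 1
}
```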

Although the code in the Do method is not much, it applies a very classic programming paradigm. In Go and its standard library, we can find many applications of this classic paradigm and its derived versions.

Next, let’s talk about two characteristics of the Do method in terms of functionality.

The first characteristic is that since the Do method changes the value of the done field to 1 only after the provided function has finished executing, if the execution of the provided function takes a long time or never ends (for example, performing some daemon tasks), it is possible for related goroutines to be simultaneously blocked due to calling this Do method.

For example, if multiple goroutines concurrently call the Do method of the same Once value and the provided function never returns, then all of these goroutines will be blocked in the Do method except the one that got to execute the provided function first. The reason is that the other goroutines are blocked at the line of code that locks the mutex m of this Once value.
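This blocking behavior can be observed with a sketch like the one below. The function secondCallerBlocked, its channels, and the 100 millisecond timeout are all assumptions made for the demonstration; the long-running task is simulated by waiting on a channel.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// secondCallerBlocked reports whether a second Do call is still blocked
// while the function passed to the first Do call is running.
func secondCallerBlocked() bool {
	var once sync.Once
	started := make(chan struct{})
	release := make(chan struct{})
	returned := make(chan struct{})

	go once.Do(func() {
		close(started) // the provided function is now running
		<-release      // simulate a long-running task
	})
	<-started

	go func() {
		once.Do(func() {}) // blocks until the first function returns
		close(returned)
	}()

	blocked := false
	select {
	case <-returned:
	case <-time.After(100 * time.Millisecond):
		blocked = true // the second caller is still waiting
	}
	close(release) // let the first function finish
	<-returned     // the second caller is released only now
	return blocked
}

func main() {
	fmt.Println("second caller blocked:", secondCallerBlocked())
}
```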

The second characteristic is that after the provided function finishes executing, the Do method uses an atomic operation to assign a value to the done field, and this operation is placed in a defer statement. Therefore, no matter how the execution of the provided function ends, the value of the done field will always become 1.

In other words, even if this provided function fails to execute successfully (for example, if it triggers a panic), we cannot re-execute it with the same Once value. So, if you need to set up a retry mechanism for the execution of the provided function, you need to consider timely replacing the Once value.
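The following sketch illustrates the second characteristic together with one possible way to retry, namely using a fresh Once value for each attempt. The helpers doRecover and retryDemo are hypothetical, and replacing the Once value is only one retry strategy among several.

```go
package main

import (
	"fmt"
	"sync"
)

// doRecover runs once.Do(f) and swallows a panic raised by f.
func doRecover(once *sync.Once, f func()) {
	defer func() { _ = recover() }()
	once.Do(f)
}

// retryDemo returns how many times a failing function ran when the same
// Once value was reused, and when a new Once value was used per attempt.
func retryDemo() (runsSameOnce, runsNewOnce int) {
	failing := func(counter *int) func() {
		return func() {
			*counter++
			panic("initialization failed")
		}
	}

	var once sync.Once
	doRecover(&once, failing(&runsSameOnce))
	doRecover(&once, failing(&runsSameOnce)) // not executed: done is already 1

	for i := 0; i < 2; i++ {
		fresh := new(sync.Once) // a replacement Once value for each attempt
		doRecover(fresh, failing(&runsNewOnce))
	}
	return runsSameOnce, runsNewOnce
}

func main() {
	same, fresh := retryDemo()
	fmt.Println(same, fresh) // prints 1 2
}
```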

In many cases, we need to design the flow related to the Do method based on these two characteristics in order to avoid unnecessary program blocking and functional deficiencies.

Summary #

The WaitGroup type and the Once type in the sync package are both very easy-to-use synchronization tools. Both are ready to use out of the box and concurrency-safe.

By using the WaitGroup value, we can easily implement a one-to-many goroutine collaboration process, where one goroutine distributes sub-tasks and multiple goroutines execute the sub-tasks together to complete a larger task.

When using the WaitGroup value, we must be careful not to let the value of its counter go below 0, otherwise it will cause a panic.

Furthermore, it is best to follow the standard approach of “first Add everything, then call Done concurrently, and finally call Wait” when using a WaitGroup value. In particular, do not call Add to increase the counter concurrently with a call to Wait, as this may also cause a panic.

The Once value is simpler to use than the WaitGroup value, as it only has one method called Do. The Do method of the same Once value will only execute the parameter function passed in the first time it is called, regardless of how this function ends.

As long as the parameter function passed to the Do method has not finished executing, any goroutine calling that method on the same Once value will be blocked. Only after the parameter function has finished executing will these goroutines be awakened one by one.

The Once type is implemented with a mutex and atomic operations, while the WaitGroup type is built mainly on atomic operations (together with a runtime semaphore for blocking and waking waiters). Therefore, both can be regarded as higher-level synchronization tools: each builds on basic, general-purpose tools to implement one specific function. The other advanced synchronization tools in the sync package are similar.

Thought Question #

Today’s thought question is: When using the WaitGroup value to achieve collaboration between goroutines in a one-to-many workflow, how can the distributing goroutine obtain the specific execution results of each subtask?
