31 sync #
The mutex, condition variable, and atomic operation we discussed in the past few lessons are all basic and important synchronization tools. In Go, they are also among the most commonly used concurrency-safe tools, aside from channels.
Speaking of channels, I don’t know if you’ve ever thought about it, but in some cases, the way we use channels seems a bit clumsy.
For example: declare a channel, make its capacity equal to the number of goroutines we start manually, then use this channel to make the main goroutine wait for the completion of the other goroutines.
To be more specific, each of the other goroutines sends an element value to this channel before it finishes running, and the main goroutine receives element values from this channel at the end. The number of receives must equal the number of other goroutines.
This is the multi-goroutine coordination process demonstrated in the coordinateWithChan function below.
func coordinateWithChan() {
	sign := make(chan struct{}, 2)
	num := int32(0)
	fmt.Printf("The number: %d [with chan struct{}]\n", num)
	max := int32(10)
	go addNum(&num, 1, max, func() {
		sign <- struct{}{}
	})
	go addNum(&num, 2, max, func() {
		sign <- struct{}{}
	})
	<-sign
	<-sign
}
The declaration of the addNum function is in the demo65.go file. The addNum function runs its last parameter value as a deferred function.
Both of the goroutines I start manually call the addNum function, and the last argument they pass to it (a function with no parameter or result declarations) does only one thing: send an element value to the sign channel.
Did you notice the last two lines of the coordinateWithChan function? The two repeated receive expressions, <-sign, look rather ugly, don’t they?
Introduction: WaitGroup type in the sync package #
In fact, in this scenario, we can choose another synchronization tool: the WaitGroup type in the sync package. It is better suited than channels to this kind of one-to-many goroutine coordination.
The sync.WaitGroup type (WaitGroup for short) is ready to use out of the box and is concurrency-safe. Like the synchronization tools we discussed earlier, it must not be copied once it has been used.
The WaitGroup type has three pointer methods: Add, Done, and Wait. You can imagine that this type has a counter with a default value of 0. We can increase or decrease the value of this counter by calling the Add method of a value of this type.
In general, I use this method to record the number of goroutines to wait for. Correspondingly, the Done method decrements the counter of its owning value by one. We can call it with a defer statement in each goroutine being waited for.
The Wait method blocks the current goroutine until the counter in its owning value becomes 0. If the counter is already 0 when the method is called, it does nothing.
You may have noticed that a value of the WaitGroup type (a WaitGroup value for short) can completely replace the channel sign in the coordinateWithChan function. The modified version is shown below.
func coordinateWithWaitGroup() {
	var wg sync.WaitGroup
	wg.Add(2)
	num := int32(0)
	fmt.Printf("The number: %d [with sync.WaitGroup]\n", num)
	max := int32(10)
	go addNum(&num, 3, max, wg.Done)
	go addNum(&num, 4, max, wg.Done)
	wg.Wait()
}
Clearly, the code is several lines shorter and more concise. Here, I first declare a variable wg of the WaitGroup type. Then, I call its Add method with the argument 2, because I will later start two goroutines that need to be waited for.
Since the Done method of the wg variable is itself a function with no parameter or result declarations, I can pass it directly as the last argument to the addNum function in the go statement.
Finally, in the coordinateWithWaitGroup function, I call the Wait method of wg. As a result, this function does not return until both goroutines have finished running.
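Since addNum lives in demo65.go and is not shown here, the following is a minimal self-contained sketch of the same Add, Done, Wait flow. The helper addTo is a hypothetical stand-in for addNum, not the author's implementation; it only assumes that the last argument is run as a deferred function.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// addTo is a hypothetical stand-in for addNum: it atomically increments
// *num until it reaches max, then runs deferFunc on the way out.
func addTo(num *int32, max int32, deferFunc func()) {
	defer deferFunc()
	for {
		cur := atomic.LoadInt32(num)
		if cur >= max {
			return
		}
		// If another goroutine changed the value in the meantime,
		// the swap fails and the loop simply tries again.
		atomic.CompareAndSwapInt32(num, cur, cur+1)
	}
}

// countTo starts two goroutines and waits for both with a WaitGroup.
func countTo(max int32) int32 {
	var wg sync.WaitGroup
	var num int32
	wg.Add(2) // two goroutines to wait for
	go addTo(&num, max, wg.Done)
	go addTo(&num, max, wg.Done)
	wg.Wait() // blocks until both goroutines have called Done
	return atomic.LoadInt32(&num)
}

func main() {
	fmt.Println(countTo(10)) // prints 10
}
```

Passing wg.Done as a method value works here for the same reason as in coordinateWithWaitGroup: it has no parameters and no results.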
That concludes the most typical use case of the WaitGroup type. However, we should not stop here; it is worth understanding this type further. Let’s explore the following question together.
Question: Can the counter value in a sync.WaitGroup value go below 0?
The typical answer here is: No, it cannot.
Problem Analysis #
Let’s analyze why this is not allowed. The counter of a WaitGroup value cannot be smaller than 0 because any attempt to make it so causes a panic. Both the Done and Add methods cause a panic when called inappropriately on a value of this type. Also, don’t forget that we can pass a negative number to the Add method.
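The panic described above can be observed safely with recover. This is a small sketch; the helper name panicsOnNegative is made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// panicsOnNegative applies delta to the counter of a fresh WaitGroup and
// reports whether that call panicked. The panic is recovered, so the
// program keeps running.
func panicsOnNegative(delta int) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var wg sync.WaitGroup
	wg.Add(delta)
	return false
}

func main() {
	fmt.Println(panicsOnNegative(-1)) // true: the counter would become -1
	fmt.Println(panicsOnNegative(0))  // false: the counter stays at 0
}
```

Calling Done on a fresh WaitGroup value has the same effect as Add(-1), so it panics in the same way.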
In fact, there is more than one way to cause a panic when using a WaitGroup value.
After declaring a variable of this type, we should first call its Add method, based on the number of goroutines or other events we want to wait for, so that the counter is greater than 0. This is a prerequisite for using the value properly later on.
If the first call to the Add method happens at the same time as a call to the Wait method, for example in two goroutines started concurrently, then the Add method may panic.
This situation is not easy to reproduce, which is exactly why we need to pay attention to it. So although a WaitGroup value does not need to be initialized, its counter should still be increased as early as possible.
In addition, you may already know that a WaitGroup value can be reused, but the integrity of its count cycle must be maintained. Here, a count cycle refers to a process in which the counter of the value changes from 0 to a positive integer, goes through a series of changes, and eventually returns to 0.
In other words, any sequence in which the counter starts at 0 and returns to 0 can be considered a count cycle. Over its lifetime, such a value can go through any number of count cycles, but it can start the next count cycle only after completing the current one.
(Figure: count cycle of sync.WaitGroup)
Therefore, if the Wait method of a value of this type is called during one of its count cycles, it immediately blocks the current goroutine until that count cycle is completed. In this case, the next count cycle of the value must not start until this call to Wait has returned.
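To make the idea of consecutive count cycles concrete, here is a sketch that reuses one WaitGroup value for several rounds in sequence. The function name runRounds and its parameters are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runRounds reuses one WaitGroup for several count cycles in sequence.
// Each cycle starts only after Wait for the previous one has returned.
func runRounds(rounds, workers int) int32 {
	var wg sync.WaitGroup
	var total int32
	for r := 0; r < rounds; r++ {
		wg.Add(workers) // counter goes from 0 to workers: a new cycle begins
		for w := 0; w < workers; w++ {
			go func() {
				defer wg.Done()
				atomic.AddInt32(&total, 1)
			}()
		}
		wg.Wait() // counter is back at 0: the cycle is complete
	}
	return atomic.LoadInt32(&total)
}

func main() {
	fmt.Println(runRounds(3, 4)) // prints 12
}
```

Because Add and Wait are both called from the same goroutine here, a new cycle can never begin while a Wait call from the previous cycle is still running.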
If a single execution of the Wait method spans two count cycles, it causes a panic.
For example, while the current goroutine is blocked in a call to the Wait method, another goroutine calls the Done method of that value and brings its counter to 0.
This wakes the current goroutine, which then tries to continue executing the remaining code in the Wait method. But if, at this point, another goroutine calls the Add method and changes the counter from 0 to a positive integer, the Wait method immediately panics.
Looking at the last two cases that can cause a panic, we can summarize a taboo on the use of WaitGroup values: do not increase the counter and call the Wait method on the same WaitGroup value concurrently. In other words, we must prevent these two operations on the same WaitGroup value from executing at the same time.
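One common way to respect this taboo is to call Add in the goroutine that starts the workers, before each go statement, and never inside the new goroutine. The sketch below illustrates that arrangement; runWorkers is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts n goroutines and returns how many of them finished.
func runWorkers(n int) int {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		finished int
	)
	for i := 0; i < n; i++ {
		// Add runs in the launching goroutine, before the go statement,
		// so it can never execute concurrently with the Wait call below.
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			finished++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return finished
}

func main() {
	fmt.Println(runWorkers(5)) // prints 5
}
```

If Add were called inside each new goroutine instead, Wait could run before some of the Add calls, which is exactly the concurrent Add and Wait situation described above.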
Apart from the first case, we usually need to experiment repeatedly to make the methods of a WaitGroup value panic. Again, although it does not happen every time, the probability is not small in long-running programs, so we must take these cases seriously.
If you are interested in reproducing these exceptional cases, you can refer to the waitgroup_test.go file in the sync package. The test functions prefixed with TestWaitGroupMisuse demonstrate the conditions under which they occur. You can imitate these test functions to write your own test code and try running it.
Knowledge Expansion #
Question: How does the Do method of the sync.Once type ensure that the provided function is executed only once? #
Like the sync.WaitGroup type, the sync.Once type is a struct type that is ready to use out of the box and concurrency-safe. Because this type contains a field of type sync.Mutex, copying a value of this type also breaks its functionality.
The Do method of the Once type accepts a single parameter, which must be of type func(), that is, a function with no parameter or result declarations.
This method does not guarantee that each function passed to it is executed once. Rather, it executes only the function passed in the first time it is called, and never executes any function passed in later calls.
Therefore, if you have multiple functions that each need to be executed only once, you should assign a separate Once value to each of them.
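The difference between sharing one Once value and using one per function can be seen in a short sketch. The helper names callTwice and callSeparately are made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// callTwice passes two different functions to the same Once value and
// returns the names of the functions that actually ran.
func callTwice() []string {
	var once sync.Once
	var ran []string
	once.Do(func() { ran = append(ran, "first") })
	once.Do(func() { ran = append(ran, "second") }) // ignored: Do was already called
	return ran
}

// callSeparately gives each function its own Once value, so both run.
func callSeparately() []string {
	var onceA, onceB sync.Once
	var ran []string
	onceA.Do(func() { ran = append(ran, "first") })
	onceB.Do(func() { ran = append(ran, "second") })
	return ran
}

func main() {
	fmt.Println(callTwice())      // [first]
	fmt.Println(callSeparately()) // [first second]
}
```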
The Once type also has a field named done, which is of type uint32. Its purpose is to record whether the first call to the Do method of its owning value has completed, so its value can only be 0 or 1. Once the first call to the Do method completes, its value changes from 0 to 1.
You may wonder why a uint32, which takes four bytes, is used when the value of the done field can only be 0 or 1.
The reason is simple: the operations on it must be “atomic”. The Do method begins by calling the atomic.LoadUint32 function to read the value of this field and returns immediately if the value is 1. This ensures that “the Do method only executes the function passed in when it is first called”.
However, this check alone is not enough. If two goroutines both call the Do method of the same new Once value and reach this check almost simultaneously, both will see that done is still 0 and continue executing the remaining code in the Do method.
After this check, the Do method immediately locks the field m, of type sync.Mutex, of its owning value. Then, inside the critical section, it checks the value of the done field again. Only if done is still 0 does it call the provided function and then set done to 1 with an atomic operation.
If you are familiar with the Singleton pattern from the GoF design patterns, you will notice that the implementation of the Do method closely resembles it. Both first check a key condition outside the critical section and return immediately if the condition is not met. This is often called the “fast path” or “early exit path”.
If the condition is met, the key condition is checked again inside the critical section, because another goroutine may have changed it in the meantime. These two checks are commonly referred to as the (cross-critical-section) “double check”.
Because entering the critical section requires locking the mutex m, this part inevitably slows execution down. For this reason, the second check and the operations that follow are called the “slow path” or “regular path”.
Although the Do method contains little code, it applies a classic programming paradigm. We can find many applications of this paradigm and its variants in Go and its standard library.
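The double check described above can be sketched as follows. The type onceSketch is a hypothetical, simplified imitation written for this example; it follows the description in this article and is not the actual source of sync.Once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// onceSketch is a hypothetical, simplified imitation of sync.Once.
type onceSketch struct {
	done uint32
	m    sync.Mutex
}

// Do runs f only if no earlier call on this value has completed.
func (o *onceSketch) Do(f func()) {
	// Fast path: a single atomic load, no lock.
	if atomic.LoadUint32(&o.done) == 1 {
		return
	}
	// Slow path: take the lock and check again inside the critical section.
	o.m.Lock()
	defer o.m.Unlock()
	if o.done == 0 {
		defer atomic.StoreUint32(&o.done, 1)
		f()
	}
}

// countCalls calls Do from n goroutines and reports how often f ran.
func countCalls(n int) int32 {
	var o onceSketch
	var calls int32
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			o.Do(func() { atomic.AddInt32(&calls, 1) })
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&calls)
}

func main() {
	fmt.Println(countCalls(100)) // prints 1
}
```

Without the second check inside the critical section, every goroutine that passed the fast path before done became 1 would run f once the lock became available.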
Next, let’s talk about two functional characteristics of the Do method.
The first characteristic: since the Do method sets the done field to 1 only after the provided function has finished executing, if that function takes a long time or never returns (for example, because it performs some daemon task), the goroutines calling this Do method may all be blocked at the same time.
For example, if multiple goroutines concurrently call the Do method of the same Once value and the provided function never returns, then all of these goroutines block in Do except the one that got to execute the provided function first. This is because the other goroutines block on the line of code that locks the mutex m protecting this Once value.
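The blocking behavior can be observed with a slow function. In this sketch, waitForSlowInit is a hypothetical helper and the sleep merely simulates long-running work.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitForSlowInit runs a slow function through once.Do in one goroutine,
// then calls Do on the same Once value from the caller's goroutine.
// It reports whether the slow function had finished by the time that
// second call returned.
func waitForSlowInit(delay time.Duration) bool {
	var once sync.Once
	finished := false
	started := make(chan struct{})

	go once.Do(func() {
		close(started)
		time.Sleep(delay) // simulate long-running work
		finished = true
	})

	<-started          // the slow function is now running inside Do
	once.Do(func() {}) // blocks until the first call has completed
	return finished
}

func main() {
	fmt.Println(waitForSlowInit(50 * time.Millisecond)) // prints true
}
```

If the function passed to the first call never returned, the second call to Do would never return either.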
The second characteristic: after the provided function finishes executing, the Do method uses an atomic operation to assign a value to the done field, and this operation is placed in a defer statement. Therefore, no matter how the execution of the provided function ends, the value of the done field always becomes 1.
In other words, even if the provided function does not complete successfully (for example, because it panics), we cannot execute it again with the same Once value. So, if you need a retry mechanism for the provided function, you need to replace the Once value with a new one at an appropriate time.
In many cases, we need to design the flow around the Do method with these two characteristics in mind, to avoid unnecessary blocking and missing functionality.
Summary #
The WaitGroup type and the Once type of the sync package are both very easy-to-use synchronization tools. Both are ready to use out of the box and concurrency-safe.
With a WaitGroup value, we can easily implement a one-to-many goroutine coordination process, in which one goroutine distributes sub-tasks and multiple goroutines execute them together to complete a larger task.
When using a WaitGroup value, we must be careful not to let its counter go below 0; otherwise it causes a panic.
Furthermore, it is best to follow the standard approach of “first Add, then concurrently call Done, and finally call Wait”. In particular, do not call Add to increase the counter while concurrently calling Wait, as this may also cause a panic.
A Once value is simpler to use than a WaitGroup value, as it has only one method, Do. The Do method of a given Once value only executes the function passed in the first time it is called, regardless of how that function ends.
As long as the function passed to that first call has not finished executing, any goroutine calling the Do method of the same Once value is blocked. Only after the function has finished are these goroutines woken up one by one.
The Once type is implemented with a mutex and atomic operations, while the WaitGroup type relies mainly on atomic operations (together with a runtime semaphore to block and wake waiting goroutines). Therefore, they can both be regarded as higher-level synchronization tools: each builds on basic, general-purpose tools to implement one specific function. Other advanced synchronization tools in the sync package work the same way.
Thought Question #
Today’s thought question is: when using a WaitGroup value to coordinate goroutines in a one-to-many workflow, how can the distributing goroutine obtain the specific execution results of each sub-task?