49 | Program Performance Analysis Basics - Part 2 #
Hello, I’m Hao Lin, and today we will continue with the basics of program performance analysis.
In the previous article, we discussed how to sample the CPU profile of a Go program. Today, let’s explore some related questions together.
Knowledge Expansion #
Question 1: How to set the sampling frequency of memory profile? #
Memory profiling samples the heap memory usage of a Go program at a certain rate while it runs. Setting the sampling frequency is simple: just assign a value to the variable runtime.MemProfileRate.

This variable controls, on average, how many bytes are allocated between two samples of heap memory usage. If it is set to 0, the Go runtime stops sampling the memory profile entirely. Its default value is 512 KB.
Note that if you want to change the sampling frequency, it is best to do so as early as possible, and only once; otherwise it may distort the profile collected by the Go runtime. For example, set it once at the very beginning of the main function.
After that, when we want to obtain the memory profile, we call the WriteHeapProfile function in the runtime/pprof package, which writes the collected memory profile to the given writer.
Note that the memory profile obtained through WriteHeapProfile is not real-time: it is a snapshot taken at the completion of the most recent garbage collection. If you want real-time figures, call the runtime.ReadMemStats function instead. Be aware, however, that this function briefly stops the world.
That is a brief answer to the question of how to set the sampling frequency of the memory profile.
Question 2: How to obtain the blocking profile? #
We can set the sampling rate of the blocking profile by calling the SetBlockProfileRate function in the runtime package. This function takes one int parameter named rate.

The meaning of this parameter is: once a blocking event lasts the given number of nanoseconds, it becomes a candidate for sampling. If the value is less than or equal to 0, the Go runtime stops sampling the blocking profile entirely.
The runtime package also has a package-private variable named blockprofilerate, of type uint64. Its meaning is: once a blocking event spans the given number of CPU clock cycles, it becomes a candidate for sampling. That is almost exactly the meaning of the rate parameter above, isn’t it?
In fact, the only difference between the two is the unit of measurement. The SetBlockProfileRate function first converts the value of rate into that unit, performs the necessary type conversion, and then atomically assigns the result to the blockprofilerate variable. Since the default value of that variable is 0, the Go runtime records no blocking events by default.
On the other hand, when we want to obtain the blocking profile, we call the Lookup function in the runtime/pprof package with the argument "block", which returns a *runtime/pprof.Profile value (a Profile value for short). We then call the WriteTo method of this Profile value to have it write the profile information to the given writer.
The WriteTo method has two parameters. The first is the writer just mentioned, of type io.Writer. The second, named debug, controls the level of detail of the profile information and is of type int.
The main values of the debug parameter are 0 and 1. When debug is 0, the profile written by the WriteTo method contains only the memory addresses (in hexadecimal form) needed by the go tool pprof tool, and the whole profile is encoded as a byte stream via protocol buffers. When debug is 1, the corresponding package names, function names, source file paths, and line numbers are added as comments, and WriteTo outputs plain text that we can read directly.
The value of debug can also be 2. In that case the output is plain text as well and usually includes more details. Exactly which details depends on the name we pass when calling the runtime/pprof.Lookup function. Let’s take a look at this function now.
Question 3: What is the correct way to call the runtime/pprof.Lookup function? #
The runtime/pprof.Lookup function (hereinafter the Lookup function) provides the summary information corresponding to a given name, represented by a Profile value. If the function returns nil, no summary information with that name exists.
The runtime/pprof package predefines six summary names, with the collection and output methods behind them already in place and ready for direct use. They are: goroutine, heap, allocs, threadcreate, block, and mutex.
When we pass "goroutine" to the Lookup function, it uses the corresponding method to collect the stack traces of all current goroutines. Note that this collection briefly stops the world.
When the WriteTo method of the returned Profile value is called with a debug value greater than or equal to 2, the method outputs the stack traces of all goroutines. This can be a lot of text; if it exceeds 64 MB, the method truncates the excess.
If the Lookup function receives "heap", it collects sampled information about heap memory allocation and release; this is the memory summary information we discussed earlier. Passing "allocs" triggers a very similar process.
In these two cases, the Profile values returned by the Lookup function are also very similar. The only difference is that when their WriteTo methods are called with debug equal to 0, the output differs slightly.
"heap" makes the output memory summary default to the perspective of “in use space”, while the default perspective for "allocs" is “allocated space”.
“In use space” refers to allocated memory that has not yet been released; in this perspective, the go tool pprof tool ignores the information about released space. In the “allocated space” perspective, all allocations are shown, regardless of whether the memory had been released by the time of sampling.
In addition, whether we use "heap" or "allocs", as long as the debug value passed to the WriteTo method is greater than 0, the format of the output is the same.
The parameter value "threadcreate" makes the Lookup function collect stack traces, each describing a code call chain that led to the creation of a new operating system thread. Such Profile values have only two output formats, depending on whether the value passed to their WriteTo method is greater than 0.
Back to "block" and "mutex". "block" stands for the stack traces of code blocked while contending for synchronization primitives. Remember? This is the blocking summary information we talked about earlier.

By contrast, "mutex" stands for the stack traces of code that held a contended synchronization primitive. Their output again comes in only two formats, depending on whether debug is greater than 0.
The synchronization primitive mentioned here is a low-level synchronization mechanism inside the Go runtime. It operates directly on memory addresses and is implemented with semaphores and atomic operations. Channels, mutexes, condition variables, WaitGroup, and the Go runtime itself all build on it.
Alright, we’ve covered quite a bit on this question. I believe you now have a solid understanding of the usage and meaning of the Lookup function. The demo99.go file contains some sample code for your reference.
Question 4: How to add a performance profiling interface to a network service based on the HTTP protocol? #
This question is actually quite simple. In most cases, all we need to do is import the net/http/pprof package into our program, like this:
import _ "net/http/pprof"
Then, start the network service and begin listening, for example:
log.Println(http.ListenAndServe("localhost:8082", nil))
After running this program, we can open a minimalist webpage by visiting http://localhost:8082/debug/pprof in a web browser. If you read the previous question carefully, you will quickly understand each section on this page.
Many subpaths are available under the /debug/pprof/ URL path, which you can explore through the links on the page. Six of them – allocs, block, goroutine, heap, mutex, and threadcreate – are all handled underneath by the Lookup function, which you should be familiar with by now.
All of these subpaths accept a query parameter called debug, which controls the format and level of detail of the output; its optional values are the ones discussed above, and its default is 0. There is also a query parameter called gc that controls whether a garbage collection is forced before the profile is taken: if its value is greater than 0, the program forces one. This parameter is only effective under the /debug/pprof/heap path, however.
Once the /debug/pprof/profile path is accessed, the program starts sampling CPU profile information. It accepts a query parameter called seconds, which specifies how long the sampling lasts; if it is not given explicitly, sampling lasts 30 seconds. Note that under this path the program responds only with a protobuf-encoded byte stream, which the go tool pprof tool can read directly from the HTTP response, for example:
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=60
Another path worth mentioning is /debug/pprof/trace. Under this path, the program handles our requests mainly through the API of the runtime/trace package.
More specifically, the program first calls the trace.Start function and then, after the duration given by the seconds query parameter, calls the trace.Stop function. The default value of seconds here is 1 second. As for the functionality of the runtime/trace package itself, I’ll leave that for you to explore and discover.
The URL paths above are the fixed default access rules. However, we can also customize them, like this:
// pprof here refers to the net/http/pprof package.
mux := http.NewServeMux()
pathPrefix := "/d/pprof/"
mux.HandleFunc(pathPrefix,
	func(w http.ResponseWriter, r *http.Request) {
		// Strip the custom prefix to recover the profile name,
		// e.g. "/d/pprof/heap" -> "heap".
		name := strings.TrimPrefix(r.URL.Path, pathPrefix)
		if name != "" {
			pprof.Handler(name).ServeHTTP(w, r)
			return
		}
		pprof.Index(w, r)
	})
mux.HandleFunc(pathPrefix+"cmdline", pprof.Cmdline)
mux.HandleFunc(pathPrefix+"profile", pprof.Profile)
mux.HandleFunc(pathPrefix+"symbol", pprof.Symbol)
mux.HandleFunc(pathPrefix+"trace", pprof.Trace)
server := http.Server{
	Addr:    "localhost:8083",
	Handler: mux,
}
log.Println(server.ListenAndServe())
As you can see, with just a few entities from the net/http/pprof package we achieved this customization. This is especially useful when you build on a third-party network service framework.
The access rules registered on our custom HTTP request multiplexer mux are similar to the default rules, except that the URL path prefix is a bit shorter.
The process of customizing mux is similar to what the init function in the net/http/pprof package does. In fact, that init function is the very reason we can access the related paths simply by importing the package.
When writing network service programs, using the net/http/pprof package is much more convenient and practical than using the runtime/pprof package directly. Used properly, it provides strong support for monitoring network services. That is as much as I’ll introduce about this package for now.
Summary #
In these two articles, we mainly talked about the performance analysis of Go programs, and many of the things mentioned are essential knowledge and skills for you. These will help you truly understand the series of operations represented by sampling, collection, and output.
I mentioned several issues related to summary information. What you need to remember is what each type of summary information represents and what kind of content it contains.
You also need to know the correct way to obtain them, including how to start and stop sampling, how to set the sampling frequency, and how to control the format and level of detail of the output.
The correct way to call the Lookup function in the runtime/pprof package is also important: every kind of summary information other than the CPU summary can be obtained through this function.
In addition, I also mentioned an upper-level application, which is to add a performance analysis interface to network services based on the HTTP protocol. This is also a very practical part.
Although the net/http/pprof package provides only a handful of program entities, it lets us embed performance analysis interfaces in different ways: some are extremely simple and ready to use, while others accommodate various custom requirements.
This is the Go knowledge I have presented to you today, and it forms the foundation of program performance analysis. If you run Go programs in a production environment, you will definitely encounter it. I hope you think through all the content and questions mentioned here carefully; only then will you be able to handle them with ease in practice.
Thought Questions #
The thought question I leave you today has actually been revealed already: what does the runtime/trace package do?
Thank you for listening, and we’ll see you next time.
You can view the detailed code accompanying the Go language column articles in the companion repository.