48 Program Performance Analysis Basics Part 1

48 Program Performance Analysis Basics - Part 1 #

As part of our series, today we will discuss the basics of Go program performance analysis.

Go language provides developers with a rich set of performance analysis APIs and very useful standard tools. These APIs are mainly found in the following packages:

  1. runtime/pprof;
  2. net/http/pprof;
  3. runtime/trace.

Apart from these, the runtime package also includes some lower-level APIs. They can be used to collect or output key metrics of the Go program during its execution, and help generate corresponding profile files for subsequent analysis.

As for the standard tools, there are two main ones: go tool pprof and go tool trace. They can parse the information in profile files and present it in a human-readable format.

In addition, the go test command can also generate profile files after the program tests are completed. This allows us to conveniently use the aforementioned tools to read the profile files and analyze the performance of the tested program. This undoubtedly enriches the data for program performance testing and provides more accurate and reliable results.

In Go, there are three types of profile files used for analyzing program performance: CPU profiles, Memory profiles, and Block profiles.

These profile files contain summarized information of various metrics of the Go program obtained through multiple sampling during a certain period of time.

For CPU profiles, each independent segment of summarized information records the Go code being executed on the CPU at the time of a certain sampling.

For Memory profiles, each segment of summarized information records the Go code being executed and the usage of heap memory at a certain sampling point, including the number of bytes and objects allocated and released. As for Block profiles, each segment of summarized information represents a goroutine block event in the Go program.

Note that by default, the information in these profile files is not plain text, but presented in binary format. If you try to view them with a regular text editor, you will see a bunch of “garbled characters”.

This is where the go tool pprof tool comes in handy. We can use it to enter an interactive command-line interface and browse the specified profile file. Here is an example:

$ go tool pprof cpuprofile.out
Type: cpu
Time: Nov 9, 2018 at 4:31pm (CST)
Duration: 7.96s, Total samples = 6.88s (86.38%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

I won’t go into the specific usage of this tool here. Once you enter the interactive interface of this tool, you can simply enter the command help and press Enter to see detailed help documentation.

Now let’s talk about how to generate profile files.

You might ask, since the information in profile files is not plain text, what format are they actually in? A fact that may not be important to most developers is that they are generated as binary data streams, or byte streams, through protocol buffers.

In short, protocol buffers are a data serialization protocol as well as a serialization tool. It can convert a value, such as a struct or a dictionary, into a sequence of bytes.

Alternatively, it can convert a byte stream generated by it back into a value in the program. The former is called serialization, and the latter is called deserialization.

In other words, protocol buffers define and implement a way to “transform data between structural and flat forms”.

Protocol buffers have several advantages. For example, it can compress data while serializing it. As a result, the byte streams generated by protocol buffers are usually much smaller in size compared to other formats (such as XML and JSON) for the same data.

Another advantage is that it allows us to define the format for data serialization and structuring, and also allows us to update this format while ensuring backward compatibility.

Because of these advantages, starting from Go version 1.8, all the information about profiles is generated using protocol buffers. This is the fundamental reason why we can’t see plain text in the aforementioned profile files.

Protocol buffers have wide-ranging applications and are widely used in tasks such as data storage and data transmission. However, I will temporarily stop here regarding protocol buffers. The information you currently know is sufficient. You don’t need to worry about how the runtime/pprof package and the runtime package serialize this profile information.

Returning to the topic of generating profile files, we will explain it through specific questions.

Our question for today is: How do we make the program sample CPU profile information?

A typical answer to this question would be:

This requires using the API in the runtime/pprof package. More specifically, to start sampling CPU profile information in the program, you need to call the StartCPUProfile function in this package. And to stop sampling, you need to call the StopCPUProfile function in the same package.

Problem Analysis #

When the runtime/pprof.StartCPUProfile function (hereinafter referred to as the StartCPUProfile function) is called, it first sets the sampling frequency for CPU profiling and then collects and outputs CPU profiling information in a separate goroutine.

Note that the sampling frequency set by the StartCPUProfile function is always fixed at 100 Hertz. That is, it samples 100 times per second, or once every 10 milliseconds.

Hertz, also known as Hz, is a Chinese word transliterated from the English word “Hertz” (a surname). It is the basic unit of CPU frequency.

The CPU frequency refers to the clock frequency at which the CPU core operates, also commonly referred to as the CPU clock speed. The reciprocal of this clock frequency is the clock cycle, which is the time required for a CPU core to execute one computational instruction, in seconds.

For example, for a CPU with a frequency of 1000 Hz, the time required for a single core to execute one computational instruction is 0.001 seconds, or 1 millisecond. For example, on a modern multi-core CPU with a clock speed of 3.2 GHz, a single core can execute at least three computational instructions in one nanosecond.

The CPU profiling sampling frequency set by the StartCPUProfile function is very low compared to the modern CPU frequencies. There are two main reasons for this.

On the one hand, too high sampling frequency will have a significant negative impact on the efficiency of the Go program. Therefore, when the SetCPUProfileRate function in the runtime package is called, it ensures that the sampling frequency does not exceed 1 MHz (Megahertz), which means it allows sampling at most once every 1 microsecond. The StartCPUProfile function sets the CPU profiling sampling frequency by calling this function.

On the other hand, after a large number of experiments, the Go language team found that 100 Hz is a relatively appropriate setting. Because this not only provides enough useful profiling information, but also does not cause the program to stagnate. In addition, the operating system has limited processing capability for high-frequency sampling, and in general, a sampling frequency higher than 500 Hz is likely to result in a delayed response.

After the StartCPUProfile function is executed, a newly enabled goroutine will be responsible for collecting and outputting the CPU profiling information until the StopCPUProfile function in the runtime/pprof package is successfully called.

The StopCPUProfile function also calls the runtime.SetCPUProfileRate function and sets the parameter value (i.e., the sampling frequency) to 0. This stops the sampling for CPU profiling.

At the same time, it also sends a “signal” to the code responsible for collecting CPU profiling information to notify it to stop the collection work.

Upon receiving this “signal”, that portion of the code will write all the CPU profiling information collected during this time period to the writer specified when calling the StartCPUProfile function. The StopCPUProfile function only returns after all the above operations have been completed.

Now that I have explained this, you should have a certain understanding of the CPU profiling process. You can take a look at the code in the demo96.go file and run it a few times to deepen your understanding of this issue.

Summary #

In these two articles, we discussed the performance analysis of Go programs, which covers some essential knowledge and techniques for this task.

Firstly, we need to know that the APIs related to program performance analysis are mainly found in the code packages runtime, runtime/pprof, and net/http/pprof. These APIs can help us collect relevant performance summary information and output it to the specified location.

The Go runtime system will sample the relevant metrics of the program multiple times according to the requirements and organize and summarize the sampled results to form a complete performance analysis report. This report is the summary information we have been talking about.

Generally, we will output the summary information to a file. Depending on the type of summary information, there are three main types of summary files: CPU profile, memory profile, and block profile.

In this article, I presented a question related to the above-mentioned summary information. In the next article, we will continue to explore this question.

If you have any thoughts or doubts about today’s content, feel free to leave me a message. Thank you for listening, and see you next time.

Click here to view detailed code accompanying the Go Language column article.