38 Performance Analysis: How to Analyze the Performance of Go Code #
Hello, I’m Kong Lingfei.
As developers, we often focus on unit testing for functionality and tend to overlook performance details. However, if we don’t have a comprehensive understanding of the overall performance of our project when it goes live, we may encounter various issues as the request volume increases, such as high CPU usage, high memory utilization, and high request latency. To avoid these performance bottlenecks, we need to use certain methods during the development process to analyze the performance of our programs.
Go has built-in tools and methods for performance optimization and monitoring, which greatly improve the efficiency of profile analysis. With these tools, we can easily analyze the performance of Go programs. In Go development, developers mainly use the built-in pprof package for performance analysis.
During performance analysis, we first use some tools and packages to generate performance data files, and then use the pprof tool to analyze these files, thus analyzing the performance of the code. Now, let's take a look at how to perform these two steps separately.
Generating Performance Data Files #
To view performance data, you need to generate performance data files first. There are three methods to generate performance data files: using the command line, using code, and using the net/http/pprof package. These tools and packages generate CPU and memory performance data respectively.
Next, let’s take a look at how these three methods generate performance data files.
Generating Performance Data Files Using the Command Line #
We can use go test -cpuprofile to generate performance test data. Go to the internal/apiserver/service/v1 directory and execute the following command:
$ go test -bench=".*" -cpuprofile cpu.profile -memprofile mem.profile
goos: linux
goarch: amd64
pkg: github.com/marmotedu/iam/internal/apiserver/service/v1
cpu: AMD EPYC Processor
BenchmarkListUser-8 280 4283077 ns/op
PASS
ok github.com/marmotedu/iam/internal/apiserver/service/v1 1.798s
The above command will generate three files in the current directory:
- v1.test, the compiled binary file for testing, which can be used to resolve symbols during performance analysis.
- cpu.profile, the CPU performance data file.
- mem.profile, the memory performance data file.
Generating Performance Data Files Using Code #
We can also generate performance data files using code, for example, in the pprof.go file:
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	// Create the CPU profile file and start CPU sampling.
	cpuOut, err := os.Create("cpu.out")
	if err != nil {
		log.Fatal(err)
	}
	defer cpuOut.Close()
	if err := pprof.StartCPUProfile(cpuOut); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	// Create the memory profile file; the heap profile is written when main returns.
	memOut, err := os.Create("mem.out")
	if err != nil {
		log.Fatal(err)
	}
	defer memOut.Close()
	defer pprof.WriteHeapProfile(memOut)

	Sum(3, 5)
}

func Sum(a, b int) int {
	return a + b
}
Execute the pprof.go file:
$ go run pprof.go
After running pprof.go, the cpu.out and mem.out performance data files will be generated in the current directory.
Generating Performance Data Files Using net/http/pprof #
If you want to analyze the performance of an HTTP server, you can use the net/http/pprof package to generate performance data files.
In the IAM project, the Gin framework is used as the HTTP engine, so the IAM project uses the github.com/gin-contrib/pprof package to enable HTTP performance analysis. The github.com/gin-contrib/pprof package is a thin wrapper around net/http/pprof that converts the pprof handlers into Gin middleware, allowing the pprof middleware to be loaded as needed.
In the pprof.go file of the github.com/gin-contrib/pprof package, there is the following code:
func Register(r *gin.Engine, prefixOptions ...string) {
	prefix := getPrefix(prefixOptions...)

	prefixRouter := r.Group(prefix)
	{
		...
		prefixRouter.GET("/profile", pprofHandler(pprof.Profile))
		...
	}
}

func pprofHandler(h http.HandlerFunc) gin.HandlerFunc {
	handler := http.HandlerFunc(h)
	return func(c *gin.Context) {
		handler.ServeHTTP(c.Writer, c.Request)
	}
}
From the above code, you can see that the github.com/gin-contrib/pprof package wraps net/http/pprof.Profile into a gin.HandlerFunc, that is, a Gin middleware.
To enable HTTP performance analysis, you only need to register the pprof HTTP handler in the code (located in the internal/pkg/server/genericapiserver.go file):
// install pprof handler
if s.enableProfiling {
	pprof.Register(s.Engine)
}
The above code determines whether to enable HTTP performance analysis based on the --feature.profiling configuration. After enabling it and starting the iam-apiserver HTTP service, you can visit http://x.x.x.x:8080/debug/pprof (where x.x.x.x is the address of the Linux server) to view the profiles information, as shown in the following image:
We can use the following command to get the CPU performance data file:
$ curl http://127.0.0.1:8080/debug/pprof/profile -o cpu.profile
After executing the above command, wait for 30 seconds; pprof collects performance data during this window. In that time, we need to keep sending requests to the server, at a rate chosen according to our scenario. After 30 seconds, the /debug/pprof/profile endpoint returns the CPU profile, which curl saves to the cpu.profile file in the current directory.
Similarly, we can execute the following command to generate the memory performance data file:
$ curl http://127.0.0.1:8080/debug/pprof/heap -o mem.profile
The above command downloads the heap profile, which curl saves to the mem.profile file in the current directory.
We can use the go tool pprof [mem|cpu].profile command to analyze the CPU and memory performance of the HTTP interface. We can also run go tool pprof http://127.0.0.1:8080/debug/pprof/profile or go tool pprof http://127.0.0.1:8080/debug/pprof/heap to enter the interactive shell of the pprof tool directly; go tool pprof will first download and save the CPU or memory performance data file and then analyze it.
Using the above three methods, we have generated cpu.profile and mem.profile. Now we can use go tool pprof to analyze these two performance data files, and with them the CPU and memory performance of our program. Next, I will explain the process of performance analysis in detail.
Performance Analysis #
To analyze performance using go tool pprof, you can refer to the following diagram:
First, let me introduce the pprof tool, then explain how to generate performance data, and finally introduce CPU and memory performance analysis methods.
Introduction to pprof Tool #
pprof is a Go program performance analysis tool that allows you to access and analyze performance data files. It also provides readable output according to our requirements. Go integrates profile sampling tools at the language level, so you can simply import the runtime/pprof or net/http/pprof packages into your code to obtain a program's profile files and perform performance analysis based on them. net/http/pprof is a wrapper around the runtime/pprof package that exposes it on an HTTP port.
Generating Performance Data #
When performing performance analysis, the main focus is on analyzing memory and CPU performance. To analyze the performance of memory and CPU, we need to generate performance data files first. In the IAM source code, there are also performance test cases available. Next, I will use the performance test cases in the IAM source code to demonstrate how to analyze the performance of a program.
Go to the internal/apiserver/service/v1 directory. The user_test.go file contains the performance test function BenchmarkListUser. Execute the following command to generate performance data files:
$ go test -benchtime=30s -benchmem -bench=".*" -cpuprofile cpu.profile -memprofile mem.profile
goos: linux
goarch: amd64
pkg: github.com/marmotedu/iam/internal/apiserver/service/v1
cpu: AMD EPYC Processor
BenchmarkListUser-8 175 204523677 ns/op 15331 B/op 268 allocs/op
PASS
ok github.com/marmotedu/iam/internal/apiserver/service/v1 56.514s
The above command generates the cpu.profile and mem.profile performance data files, along with the v1.test binary, in the current directory. Next, we will use these files to analyze the CPU and memory performance of the code. To obtain enough samples, we set the benchmark time to 30s.
When performing performance analysis, different methods can be used, such as analyzing sampling graphs, analyzing flame graphs, or using the interactive mode of go tool pprof to view CPU and memory consumption data of functions. I will use each of these methods to analyze CPU and memory performance.
CPU Performance Analysis #
By default, Go's runtime samples CPU usage at a frequency of 100 Hz, that is, 100 times per second, or once every 10 milliseconds. Each sample records the call stack that is running at that moment; aggregating these samples produces the CPU performance data.
We have already generated the CPU performance data file cpu.profile. Next, we will use the three methods mentioned above to analyze this performance file and optimize performance.
Method 1: Analyzing the Sampling Graph
The most intuitive way to analyze performance is through graphical representation. Therefore, we first need to generate a sampling graph, which involves two steps.
Step 1, make sure that graphviz is installed on your system:
$ sudo yum -y install graphviz.x86_64
Step 2, generate the call graph by executing go tool pprof:
$ go tool pprof -svg cpu.profile > cpu.svg # svg format
$ go tool pprof -pdf cpu.profile > cpu.pdf # pdf format
$ go tool pprof -png cpu.profile > cpu.png # png format
The above commands generate the cpu.pdf, cpu.svg, and cpu.png files, which contain the function call relationships and other sampling data. The image below shows an example:
This image consists of directed edges and rectangles. The directed edges describe the function call relationships, while the rectangles contain CPU sampling data. An edge points from caller to callee: the end without an arrow calls the end with an arrow, so the graph shows that the v1.(*userService).List function calls the fake.(*policies).List function.

The number 90ms next to the edge indicates that, during the sampling period, the v1.(*userService).List function spent a total of 90ms calling the fake.(*policies).List function. Through the call relationships, we can determine which functions a given function calls and how much time each of those calls consumes.
Here, let's interpret the important information in the call relationships of the graph: of the accumulated sampling time (140ms) of runtime.schedule, 10ms comes from direct calls by the runtime.goschedImpl function, and 70ms comes from direct calls by the runtime.park_m function. These numbers tell us which functions call runtime.schedule and how much time those calls account for. For the same reason, the time the runtime.goschedImpl function spends calling runtime.schedule must be less than or equal to the accumulated sampling time of runtime.schedule.
Now let's look at the sampling data in the rectangles. These rectangles generally contain three types of information:
- Function name/method name: includes the package name, struct name, and function/method name, making it easy to locate the function/method. For example, fake.(*policies).List is the List method of the policies struct in the fake package.
- Local sampling time and its proportion of the total sampling time: the local sampling time is the total time the sampling points fall within this function itself.
- Accumulated sampling time and its proportion of the total sampling time: the accumulated sampling time is the total time the sampling points fall within this function and the functions it calls directly or indirectly.
We can explain the concepts of local sampling time and accumulated sampling time using the OutDir function shown in the image below:
We can consider the total execution time of the entire function as the accumulated sampling time, which includes the time spent on the white part of the code and the time spent on function calls (indicated by the red part). The time spent on the white part of the code can be considered as the local sampling time.
Through the accumulated sampling time, we can determine the total execution time of a function. The larger the accumulated sampling time, the more CPU time calling it consumes. Note, however, that this does not necessarily mean the function itself is problematic; it could also be due to performance bottlenecks in the functions it calls directly or indirectly. In that case, we should follow the call relationships and look for the functions that consume the most CPU time.

If the local sampling time of a function is large, the function itself has a high execution time (excluding the time spent calling other functions). In this case, we need to analyze the code of the function itself, rather than the code of the functions it calls directly or indirectly.

In the sampling graph, the larger a rectangle's area, the longer the function's accumulated sampling time. So if a function occupies a large rectangle, we should analyze it carefully, because there may be performance optimization opportunities.
Method 2: Analyze the Flame Graph
The sampling chart we discussed above may not be very intuitive for performance analysis. Here, we can generate flame graphs to visualize the performance bottlenecks. A flame graph is a tool invented by Brendan Gregg specifically for visualizing sampled stack traces as an intuitive image, named because the entire graph looks like a flickering flame.
The go tool pprof command provides the -http parameter, which allows us to view the sampling graph and flame graph in a web browser. Execute the following command:
$ go tool pprof -http="0.0.0.0:8081" v1.test cpu.profile
Then visit http://x.x.x.x:8081/ (where x.x.x.x is the IP address of the server running the go tool pprof command), and various sampling views will be displayed in the browser, as shown in the following figure:
The above UI page provides different sampling data views:
- Top: similar to the linux top command, sorted from high to low.
- Graph: the default view, showing the call relationships.
- Flame Graph: the pprof flame graph.
- Peek: similar to Top, also sorted from high to low, with caller/callee context for each function.
- Source: source code annotated with sampling data, similar to the list command in interactive mode.
- Disassemble: disassembly annotated with sampling totals.
Next, let’s focus on analyzing the flame graph. Select Flame Graph (VIEW -> Flame Graph) in the UI, and the flame graph will be displayed, as shown in the following figure:
The flame graph has the following features:
- Each column represents a call stack, and each cell represents a function.
- The y-axis shows the depth of the stack, arranged from top to bottom according to the call relationship. The bottom cell represents the function that was occupying the CPU at the time of sampling.
- The call stacks are sorted alphabetically from left to right, and identical call stacks are merged. Therefore, the wider a cell is, the more likely the corresponding function is a bottleneck.
- The color of the cells in the flame graph is randomly warm-toned, making it easy to distinguish between different call information.
When viewing the flame graph, the wider a cell is, the more likely there is a performance issue with the corresponding function. At this point, we can analyze the code of that function to find the problem.
Method 3: Use the go tool pprof interactive mode to view detailed data
We can execute the go tool pprof command to view the CPU performance data file:
$ go tool pprof v1.test cpu.profile
File: v1.test
Type: cpu
Time: Aug 17, 2021 at 2:17pm (CST)
Duration: 56.48s, Total samples = 440ms ( 0.78%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
go tool pprof outputs a lot of information:
- File: the name of the binary executable file.
- Type: the type of the sampling file, such as cpu, mem, etc.
- Time: the time at which the sampling file was generated.
- Duration: the program execution time and the total sample time. In the example above, the program ran for 56.48s, during which 440ms of samples were collected. Because sampling tasks run on multiple cores, the total sample time can even exceed the total execution time.
- (pprof): the command prompt, indicating that we are currently in the pprof command line of go tool; go tool includes multiple commands such as cgo, doc, pprof, and trace.
After executing the go tool pprof command, we enter an interactive shell in which we can run many commands. The three most commonly used are top (show the functions consuming the most resources), list (show a function's source annotated with sampling data), and web (open the call graph in a browser).
In the interactive interface, we execute the top command to view the performance sample data:
(pprof) top
Showing nodes accounting for 350ms, 79.55% of 440ms total
Showing top 10 nodes out of 47
flat flat% sum% cum cum%
110ms 25.00% 25.00% 110ms 25.00% runtime.futex
70ms 15.91% 40.91% 90ms 20.45% github.com/marmotedu/iam/internal/apiserver/store/fake.(*policies).List
40ms 9.09% 50.00% 40ms 9.09% runtime.epollwait
40ms 9.09% 59.09% 180ms 40.91% runtime.findrunnable
30ms 6.82% 65.91% 30ms 6.82% runtime.write1
20ms 4.55% 70.45% 30ms 6.82% runtime.notesleep
10ms 2.27% 72.73% 100ms 22.73% github.com/marmotedu/iam/internal/apiserver/service/v1.(*userService).List
10ms 2.27% 75.00% 10ms 2.27% runtime.checkTimers
10ms 2.27% 77.27% 10ms 2.27% runtime.doaddtimer
10ms 2.27% 79.55% 10ms 2.27% runtime.mallocgc
In the output above, each row represents information about one function. The most important command in pprof is topN, which displays the top N entries in the profile. The top command outputs one line per function, by default sorted by the flat value. The meaning of each column is as follows:
- flat: the total time the sampling points fall within this function.
- flat%: flat as a percentage of the total sampling time.
- sum%: the accumulated flat% of this line and the lines above it.
- cum: the total time the sampling points fall within this function and the functions it calls.
- cum%: cum as a percentage of the total sampling time.
- The function name.
The above information can tell us the execution time and ranking of function performance. Based on this information, we can determine which functions may have performance issues or which functions can be further optimized.
I would like to remind you that if you execute go tool pprof mem.profile, the fields have similar meanings, except that they represent memory allocation sizes in bytes.
By default, the top command sorts by flat. When performing performance analysis, it helps to sort by cum first: looking at cum, we can see at a glance which function has the highest total time consumption, and then use that function's local sampling time and call relationships to determine whether the time is spent in the function itself or in the functions it calls.
The output of top -cum is as follows:
(pprof) top20 -cum
Showing nodes accounting for 280ms, 63.64% of 440ms total
Showing top 20 nodes out of 47
flat flat% sum% cum cum%
0 0% 0% 320ms 72.73% runtime.mcall
0 0% 0% 320ms 72.73% runtime.park_m
0 0% 0% 280ms 63.64% runtime.schedule
40ms 9.09% 9.09% 180ms 40.91% runtime.findrunnable
110ms 25.00% 34.09% 110ms 25.00% runtime.futex
10ms 2.27% 36.36% 100ms 22.73% github.com/marmotedu/iam/internal/apiserver/service/v1.(*userService).List
0 0% 36.36% 100ms 22.73% github.com/marmotedu/iam/internal/apiserver/service/v1.BenchmarkListUser
0 0% 36.36% 100ms 22.73% runtime.futexwakeup
0 0% 36.36% 100ms 22.73% runtime.notewakeup
0 0% 36.36% 100ms 22.73% runtime.resetspinning
0 0% 36.36% 100ms 22.73% runtime.startm
0 0% 36.36% 100ms 22.73% runtime.wakep
0 0% 36.36% 100ms 22.73% testing.(*B).launch
0 0% 36.36% 100ms 22.73% testing.(*B).runN
70ms 15.91% 52.27% 90ms 20.45% github.com/marmotedu/iam/internal/apiserver/store/fake.(*policies).List
10ms 2.27% 54.55% 50ms 11.36% runtime.netpoll
40ms 9.09% 63.64% 40ms 9.09% runtime.epollwait
0 0% 63.64% 40ms 9.09% runtime.modtimer
0 0% 63.64% 40ms 9.09% runtime.resetForSleep
0 0% 63.64% 40ms 9.09% runtime.resettimer (inline)
From the above output, we can see that the local sampling time percentages of v1.BenchmarkListUser, testing.(*B).launch, and testing.(*B).runN are all 0%, but their cumulative sampling time percentages are relatively high, each at 22.73%.
Although the local sampling time percentage is small, the cumulative sampling time percentage is high, indicating that these three functions have high time consumption due to calling other functions, and they themselves have almost no time consumption. We can see the call relationship of the functions based on the sampling graph, as shown in the following image:
From the sampling graph, we can see that v1.BenchmarkListUser ultimately calls the v1.(*userService).List function, a function we wrote ourselves. Its local sampling time percentage is 2.27%, but its cumulative sampling time percentage is as high as 22.73%, which indicates that v1.(*userService).List consumes a large amount of CPU time through the functions it calls.

Observing the sampling graph further, the long runtime of v1.(*userService).List is due to its call to the fake.(*policies).List function. We can also use the list command to view the runtime performance inside the function:
Running list userService.*List lists the runtime performance of the code inside the userService struct's List method. From the image above, it can also be seen that u.store.Policies().List takes the most time. The local sampling time percentage of fake.(*policies).List is 15.91%, indicating that fake.(*policies).List itself may be a bottleneck. Reading the code of fake.(*policies).List shows that it is a database query function, and database queries can be slow. Continuing into the v1.(*userService).List code, we find the following calling logic:
func (u *userService) ListWithBadPerformance(ctx context.Context, opts metav1.ListOptions) (*v1.UserList, error) {
	...
	for _, user := range users.Items {
		policies, err := u.store.Policies().List(ctx, user.Name, metav1.ListOptions{})
		...
	}
	...
}
In the for loop, fake.(*policies).List is called sequentially: the slow fake.(*policies).List function runs once per iteration, so over many iterations the runtime of v1.(*userService).List naturally accumulates.
Now that we have identified the problem, how do we optimize it? We can take advantage of the CPU's multiple cores and start multiple goroutines, so the query time is no longer accumulated serially but is instead bounded by the slowest fake.(*policies).List call. The code of the optimized v1.(*userService).List function can be found in internal/apiserver/service/v1/user.go. Tested with the same performance test case, the results are as follows:
$ go test -benchtime=30s -benchmem -bench=".*" -cpuprofile cpu.profile -memprofile mem.profile
goos: linux
goarch: amd64
pkg: github.com/marmotedu/iam/internal/apiserver/service/v1
cpu: AMD EPYC Processor
BenchmarkListUser-8 8330 4271131 ns/op 26390 B/op 484 allocs/op
PASS
ok github.com/marmotedu/iam/internal/apiserver/service/v1 36.179s
In the output above, ns/op is 4271131 ns/op. Compared with the first result of 204523677 ns/op, performance has improved by 97.91%.
Here, please note that, for your reference, I renamed the original v1.(*userService).List function to v1.(*userService).ListWithBadPerformance.
Memory Performance Analysis #
During the runtime of a Go program, the Go runtime system records heap memory allocations. The heap profiler does not record every allocation; it samples them, recording on average one allocation per runtime.MemProfileRate bytes allocated (512 KB by default), no matter at which moment the profile snapshot is taken.
The method of memory performance analysis is similar to CPU performance analysis, so I won't repeat it here. You can analyze it yourself using the generated memory performance data file mem.profile.
Next, let me show you the effects before and after memory optimization. In the v1.(*userService).List function (located in the internal/apiserver/service/v1/user.go file), we have the following code:
infos := make([]*v1.User, 0)
for _, user := range users.Items {
	info, _ := m.Load(user.ID)
	infos = append(infos, info.(*v1.User))
}
At this point, we run the go test command to measure the memory performance, as the baseline data before optimization for later comparison:
$ go test -benchmem -bench=".*" -cpuprofile cpu.profile -memprofile mem.profile
goos: linux
goarch: amd64
pkg: github.com/marmotedu/iam/internal/apiserver/service/v1
cpu: AMD EPYC Processor
BenchmarkListUser-8 278 4284660 ns/op 27101 B/op 491 allocs/op
PASS
ok github.com/marmotedu/iam/internal/apiserver/service/v1 1.779s
The values for B/op and allocs/op are 27101 B/op and 491 allocs/op, respectively.
By analyzing the code, we found that we can change infos := make([]*v1.User, 0) to infos := make([]*v1.User, 0, len(users.Items)), so the slice's backing array is allocated once up front instead of being repeatedly reallocated and copied as it grows. The optimized code is as follows:
//infos := make([]*v1.User, 0)
infos := make([]*v1.User, 0, len(users.Items))
for _, user := range users.Items {
	info, _ := m.Load(user.ID)
	infos = append(infos, info.(*v1.User))
}
Let's execute go test again to test the performance:
$ go test -benchmem -bench=".*" -cpuprofile cpu.profile -memprofile mem.profile
goos: linux
goarch: amd64
pkg: github.com/marmotedu/iam/internal/apiserver/service/v1
cpu: AMD EPYC Processor
BenchmarkListUser-8 276 4318472 ns/op 26457 B/op 484 allocs/op
PASS
ok github.com/marmotedu/iam/internal/apiserver/service/v1 1.856s
The optimized values for B/op and allocs/op are 26457 B/op and 484 allocs/op, respectively. Compared with the initial values of 27101 B/op and 491 allocs/op, both the bytes allocated per operation and the number of allocations are reduced.
We can use the go tool pprof command to view the memory performance data file:
$ go tool pprof v1.test mem.profile
File: v1.test
Type: alloc_space
Time: Aug 17, 2021 at 8:33pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
This command enters interactive mode, where you can use the top command to view the performance sample data, for example:
(pprof) top
Showing nodes accounting for 10347.32kB, 95.28% of 10859.34kB total
Showing top 10 nodes out of 52
flat flat% sum% cum cum%
3072.56kB 28.29% 28.29% 4096.64kB 37.72% github.com/marmotedu/iam/internal/apiserver/service/v1.(*userService).List.func1
1762.94kB 16.23% 44.53% 1762.94kB 16.23% runtime/pprof.StartCPUProfile
1024.52kB 9.43% 53.96% 1024.52kB 9.43% go.uber.org/zap/buffer.NewPool.func1
1024.08kB 9.43% 63.39% 1024.08kB 9.43% time.Sleep
902.59kB 8.31% 71.70% 902.59kB 8.31% compress/flate.NewWriter
512.20kB 4.72% 76.42% 1536.72kB 14.15% github.com/marmotedu/iam/internal/apiserver/service/v1.(*userService).List
512.19kB 4.72% 81.14% 512.19kB 4.72% runtime.malg
512.12kB 4.72% 85.85% 512.12kB 4.72% regexp.makeOnePass
512.09kB 4.72% 90.57% 512.09kB 4.72% github.com/marmotedu/iam/internal/apiserver/store/fake.FakeUsers
512.04kB 4.72% 95.28% 512.04kB 4.72% runtime/pprof.allFrames
The meanings of the fields in the above memory performance data are as follows:
- flat: the total memory allocated in this function at the sampled points.
- flat%: flat as a percentage of the total sampled allocations.
- sum%: the accumulated flat% of this line and the lines above it.
- cum: the total memory allocated in this function and the functions it calls.
- cum%: cum as a percentage of the total sampled allocations.
- The function name.
Summary #
When the performance of a Go project is low, we need to analyze the problematic code. The go tool pprof tool provided by Go allows us to analyze the performance of the code in two steps: generating performance data files and analyzing performance data files.
There are three ways in Go to generate performance data files: through the command line, through code, and through the net/http/pprof package.
After generating the performance data file, we can use the go tool pprof tool to analyze it. We can obtain performance data for both CPU and memory and identify performance bottlenecks through analysis. There are three ways to analyze performance data files: analyzing sampling graphs, analyzing flame graphs, and using the go tool pprof interactive mode to view detailed data. Because flame graphs are intuitive and efficient, I recommend using them often to analyze performance.
Exercises #
- Consider why the time the function runtime.goschedImpl spends calling runtime.schedule must be less than or equal to the accumulated sampling time of the function runtime.schedule.
- In your own Go projects, what performance analysis ideas and methods have worked well? Please feel free to share them in the comments.
Feel free to communicate and discuss with me in the comments section. See you in the next lecture.