
39 | Performance Analysis: API Server Performance Testing and Optimization in Practice #

Hello, I’m Kong Lingfei.

In the previous lecture, we learned how to analyze the performance of Go code. After mastering the basics of performance analysis, in this lecture, let’s take a look at how to analyze the performance of API interfaces.

Before an API goes live, we need to know its performance in order to understand the maximum request load the API server can handle and identify performance bottlenecks. Based on the business requirements for performance, we can then optimize or scale the API. By doing so, we can provide stable API services to the outside world and ensure that requests are returned in a reasonable amount of time. In this lecture, I will introduce how to use the wrk tool to test the performance of API server interfaces and provide analysis methods and results.

API Performance Metrics #

API performance testing, broadly speaking, includes the performance of the API framework and the performance of specific APIs. However, because the performance of a specific API depends on its specific implementation (such as the presence or absence of a database connection, the presence or absence of complex logic processing, etc.), I believe that discussing the performance of a single API without considering its specific implementation is meaningless. Therefore, this section will only discuss the performance of the API framework.

There are three main metrics used to measure API performance:

  • Concurrent Users: Concurrent users refer to the number of users simultaneously using the system within a certain time range. In a broad sense, concurrent users refer to the number of users simultaneously using the system, and these users may call different APIs. In a strict sense, concurrent users refer to the number of users simultaneously requesting the same API. In this section, we will discuss concurrent users in the strict sense.
  • Queries Per Second (QPS): QPS is a measure of how much traffic a specific query server can handle within a specified time. QPS = Concurrent Users / Average Request Response Time.
  • Time to Last Byte (TTLB): Time to Last Byte refers to the total time from when a client sends a request to when it receives a response. This process starts from when a client initiates a request and ends when the client receives the last byte of the server’s response. In some tools, the request response time is often referred to as TTLB. The unit of request response time is usually “seconds” or “milliseconds”.

Among these three metrics, the most important one for measuring API performance is QPS. However, when stating a QPS figure, you must also state the concurrency at which it was measured; otherwise the number is meaningless, because QPS differs under different levels of concurrency. For example, 100 QPS for a single user and 100 QPS for 100 users are two different concepts. The former means the API can serially execute 100 requests in one second, while the latter means the API can process 100 requests in one second at a concurrency of 100. At the same QPS, the higher the concurrency, the better the API performance and the stronger its concurrent processing capability.
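To make the formula from above concrete: under the strict definition, if 100 concurrent users each see an average response time of 50 ms, then QPS = 100 / 0.05 s = 2000 (the numbers are purely illustrative).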

If the concurrency is set too high, the API has to handle a large number of requests at the same time, context switching becomes frequent, less time is left for actual request processing, and the QPS may drop; the request response time also grows. Every API has a concurrency at which its QPS peaks, but that concurrency is not necessarily the optimal one: you also need to consider the average request response time at that concurrency.

Furthermore, for some API interfaces, TPS (Transactions Per Second) is also measured. A transaction refers to the process of a client sending a request to the server and the server responding: the client starts timing when it sends the request and stops timing when it receives the server's response, in order to calculate the elapsed time and the number of transactions completed.

So what is the difference between TPS and QPS? If it is a performance test of a single query interface (single scenario), and this interface does not make additional requests to other interfaces internally, then TPS = QPS; otherwise, TPS ≠ QPS. If it is a performance test of multiple interfaces (mixed scenario), assuming all N interfaces are query interfaces and none of them makes additional requests to other interfaces internally, then QPS = N × TPS.
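For example, in a mixed scenario covering 3 query interfaces, none of which calls other interfaces internally, QPS = 3 × TPS.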

API Performance Testing Methods #

There are many web performance testing tools available on Linux, including JMeter, ab (Apache Bench), Webbench, and wrk. Each tool has its own characteristics. For the IAM project, we use wrk to perform API performance testing: wrk is very simple to use, easy to install, and produces professional test results, and it supports Lua scripts for building more complex test scenarios. Now, let me introduce how to install and use wrk.

Installation of wrk #

Installing wrk is very simple and can be done in two steps.

Step 1: Clone the wrk repo:

$ git clone https://github.com/wg/wrk

Step 2: Compile and install:

$ cd wrk
$ make
$ sudo cp ./wrk /usr/bin
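If the installation succeeded, running wrk -v should print the wrk version details.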

Introduction to wrk usage #

Now let’s take a look at how to use wrk. wrk is easy to use and you can see all the running parameters by executing wrk --help:

$ wrk --help
Usage: wrk <options> <url>
  Options:
    -c, --connections <N>  Connections to keep open
    -d, --duration    <T>  Duration of test
    -t, --threads     <N>  Number of threads to use

    -s, --script      <S>  Load Lua script file
    -H, --header      <H>  Add header to request
        --latency          Print latency statistics
        --timeout     <T>  Socket/request timeout
    -v, --version          Print version details

  Numeric arguments may include a SI unit (1k, 1M, 1G)
  Time arguments may include a time unit (2s, 2m, 2h)

Some commonly used parameters include:

  • -t: number of threads (the number of threads should not be too high, ideally 2 to 4 times the number of cores, as having too many threads can reduce efficiency due to excessive thread switching).
  • -c: number of concurrent connections.
  • -d: test duration (default is 10s).
  • -T: request timeout.
  • -H: specify HTTP headers for the request. Some APIs require certain headers, which can be passed using the -H parameter of wrk.
  • --latency: print latency statistics.
  • -s: specify a Lua script. Lua scripts can be used to implement more complex requests.

Now let’s take a look at a wrk test result and analyze it.

A simple test looks like this (make sure the iam-apiserver is already started and health checks are enabled):

$ wrk -t144 -c30000 -d30s -T30s --latency http://10.0.4.57:8080/healthz
Running 30s test @ http://10.0.4.57:8080/healthz
  144 threads and 30000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   508.77ms  604.01ms   9.27s    81.59%
    Req/Sec   772.48      0.94k   10.45k    86.82%
  Latency Distribution
     50%  413.35ms
     75%  948.99ms
     90%    1.33s
     99%    2.44s
  2276265 requests in 30.10s, 412.45MB read
  Socket errors: connect 1754, read 40, write 0, timeout 0
Requests/sec:  75613.16
Transfer/sec:     13.70MB

Now let’s analyze the test result:

  • 144 threads and 30000 connections: 144 threads opening 30000 connections in total, corresponding to the -t and -c parameters.
  • Thread Stats: includes latency and Req/Sec.
    • Latency: response time, with the average, standard deviation, maximum, and the percentage of samples that fall within one standard deviation of the mean (+/- Stdev).
    • Req/Sec: requests completed per second by each thread, with the same four statistics.
  • Latency Distribution: response time distribution.
    • 50%: 50% of requests completed within 413.35ms.
    • 75%: 75% of requests completed within 948.99ms.
    • 90%: 90% of requests completed within 1.33s.
    • 99%: 99% of requests completed within 2.44s.
  • 2276265 requests in 30.10s, 412.45MB read: total number of requests completed in 30.10s (2276265) and data read (412.45MB).
  • Socket errors: connect 1754, read 40, write 0, timeout 0: error statistics, including the number of connection failures (1754), read failures, write failures, and timeout requests.
  • Requests/sec: the average number of requests handled per second over the whole test, i.e. the QPS (see the sanity check after this list).
  • Transfer/sec: average data read per second is 13.70MB (throughput).
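As a quick sanity check, Requests/sec is roughly the total number of completed requests divided by the test duration: 2276265 / 30.10 ≈ 75623, which agrees with the reported 75613.16 (the small gap comes from the exact elapsed time wrk measures internally).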

API Server Performance Testing Practice #

Next, let’s test the performance of the API Server. There are many factors that can affect the performance of the API Server, including the iam-apiserver itself, server hardware and configuration, testing methods, and network environment. To facilitate your comparison of performance test results, I will provide my test environment configuration for reference.

  • Client hardware configuration: 1 core, 4GB RAM.
  • Client software configuration: Clean CentOS Linux release 8.2.2004 (Core).
  • Server hardware configuration: 2 cores, 8GB RAM.
  • Server software configuration: Clean CentOS Linux release 8.2.2004 (Core).
  • Test network environment: Access within Tencent Cloud VPC, no resource-intensive business programs other than the performance testing program.

The test architecture is shown in the following diagram:

Introduction to Performance Testing Script #

When performing performance testing on the API Server, we need to execute wrk first to generate performance testing data. In order to visually view the performance data, we also need to display these performance data in the form of charts. In this lecture, I use the gnuplot tool to automate the drawing of these performance charts. To do this, we need to ensure that the gnuplot tool is installed on the Linux server. You can install it using the following command:

$ sudo yum -y install gnuplot

In this test, I will generate the following two charts to observe and analyze the performance of the API Server:

  • QPS & TTLB Chart: The X axis represents concurrency (Concurrent), and the Y axis represents Queries Per Second (QPS) and Time To Last Byte (TTLB) response time.
  • Success Rate Chart: The X axis represents concurrency (Concurrent), and the Y axis represents the success rate of requests.

To facilitate testing the performance of the API interface, I have encapsulated the performance testing and plotting logic in the scripts/wrktest.sh script. You can execute the following command in the root directory of the iam source code to generate performance testing data and performance charts:

$ scripts/wrktest.sh http://10.0.4.57:8080/healthz

The above command will perform performance testing, record performance testing data, and draw QPS and success rate charts based on this data.

Next, let me introduce the performance testing script wrktest.sh, and provide an example of how to use it.

wrktest.sh is a performance testing script used to test the performance of the API Server, record the performance testing data, and draw performance charts using gnuplot based on the performance data.

wrktest.sh can also compare the performance testing results before and after, and display the comparison results through charts. wrktest.sh will automatically calculate the appropriate number of wrk threads to be started (-t) based on the number of CPU cores: CPU cores * 3.
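For example, on the 1-core client used in this test, that works out to 1 × 3 = 3 threads, which is why the wrk commands shown later all use -t3.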

By default, wrktest.sh tests the API performance under multiple concurrency levels; the default concurrency values are 200 500 1000 3000 5000 10000 15000 20000 25000 50000. You need to choose the maximum concurrency to test based on your server configuration. Because my server configuration is not very high (mainly the 8GB of RAM, which is easily exhausted under high concurrency), I chose 50000 as the maximum concurrency. If your server configuration is high enough, you can try testing the API performance at concurrency levels of 100000, 200000, 500000, and 1000000.

The usage of wrktest.sh is as follows:

$ scripts/wrktest.sh -h

Usage: scripts/wrktest.sh [OPTION] [diff] URL
Performance automation test script.

URL                    HTTP request URL, like: http://10.0.4.57:8080/healthz
  diff                   Compare two performance test results

OPTIONS:
  -h                     Usage information
  -n                     Performance test task name, default: apiserver
  -d                     Directory used to store performance data and gnuplot graphic, default: _output/wrk

Report bugs to <...>.

The command line parameters provided by wrktest.sh are as follows.

  • URL: The API interface to be tested.
  • diff: If comparing the results of two tests, execute wrktest.sh diff.
  • -n: The name of the test task, wrktest.sh will generate files based on the task name.
  • -d: Output file storage directory.
  • -h: Print usage information.

Below, I will demonstrate an example of using wrktest.sh.

wrktest.sh has two main functions, which are running performance tests and obtaining results, and comparing performance test results. Below, I will introduce their specific usage methods separately.

  1. Running performance tests and obtaining results

Execute the following command:

$ scripts/wrktest.sh http://10.0.4.57:8080/healthz
Running wrk command: wrk -t3 -d300s -T30s --latency -c 200 http://10.0.4.57:8080/healthz
Running wrk command: wrk -t3 -d300s -T30s --latency -c 500 http://10.0.4.57:8080/healthz
Running wrk command: wrk -t3 -d300s -T30s --latency -c 1000 http://10.0.4.57:8080/healthz
Running wrk command: wrk -t3 -d300s -T30s --latency -c 3000 http://10.0.4.57:8080/healthz
Running wrk command: wrk -t3 -d300s -T30s --latency -c 5000 http://10.0.4.57:8080/healthz
Running wrk command: wrk -t3 -d300s -T30s --latency -c 10000 http://10.0.4.57:8080/healthz
Running wrk command: wrk -t3 -d300s -T30s --latency -c 15000 http://10.0.4.57:8080/healthz
Running wrk command: wrk -t3 -d300s -T30s --latency -c 20000 http://10.0.4.57:8080/healthz
Running wrk command: wrk -t3 -d300s -T30s --latency -c 25000 http://10.0.4.57:8080/healthz
Running wrk command: wrk -t3 -d300s -T30s --latency -c 50000 http://10.0.4.57:8080/healthz

Now plot according to /home/colin/_output/wrk/apiserver.dat
QPS graphic file is: /home/colin/_output/wrk/apiserver_qps_ttlb.png
Success rate graphic file is: /home/colin/_output/wrk/apiserver_successrate.png

The above command will generate 3 files in the _output/wrk/ directory:

  • apiserver.dat: the wrk performance test results, with columns for concurrency, QPS, average response time, and success rate.
  • apiserver_qps_ttlb.png: The QPS & TTLB graph.
  • apiserver_successrate.png: The success rate graph.

Please note that the IP address in the requested URL should be the Tencent Cloud VPC internal address, because accessing via the internal network not only provides the lowest network latency, but also the highest security. Therefore, real businesses usually use internal network access.

  2. Comparing performance test results

Suppose we have run a performance test against another API and saved the test data in the _output/wrk/http.dat file.

Execute the following command to compare the two test results:

$ scripts/wrktest.sh diff _output/wrk/apiserver.dat _output/wrk/http.dat

apiserver.dat and http.dat are the two wrk performance data files to be compared. The above command will generate the following two files in the _output/wrk directory:

  • apiserver_http.qps.ttlb.diff.png, QPS & TTLB comparison chart.
  • apiserver_http.success_rate.diff.png, success rate comparison chart.

Disable Debug Configuration Options #

Before testing, we need to disable some debug options to avoid affecting the performance test.

Perform the following two steps to modify the configuration file of the iam-apiserver:

  • Set server.mode to release and remove the dump and logger middlewares from server.middlewares.
  • Set log.level to info and remove stdout from log.output-paths.

Because we want to analyze the performance of the program during stress testing, we need to set feature.profiling to true to enable performance profiling. After making the modifications, restart the iam-apiserver.
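The /debug/pprof endpoints queried later in this lecture are the standard Go pprof HTTP endpoints. As a minimal, hedged illustration (this is not the iam-apiserver code; the port and program below are made up), a plain net/http server exposes the same endpoints simply by importing net/http/pprof for its side effects:

package main

import (
    "log"
    "net/http"
    // The blank import registers the /debug/pprof/* handlers on the default mux.
    _ "net/http/pprof"
)

func main() {
    // With the handlers registered, a 30s CPU profile can be collected with:
    //   go tool pprof http://127.0.0.1:6060/debug/pprof/profile
    log.Fatal(http.ListenAndServe(":6060", nil))
}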

Use wrktest.sh to test IAM API interface performance #

After disabling the Debug Configuration Options, you can use the wrktest.sh command to test the API performance (the default concurrency values for the tests are 200 500 1000 3000 5000 10000 15000 20000 25000 50000):

$ scripts/wrktest.sh http://10.0.4.57:8080/healthz

The generated QPS & TTLB chart and the success rate chart are shown below:

[Chart: QPS & TTLB]

In the above chart, the X axis represents the concurrency level (Concurrent), and the Y axis represents the queries per second (QPS) and the time-to-last-byte (TTLB).

[Chart: success rate]

In the above chart, the X axis represents the concurrency level (Concurrent), and the Y axis represents the success rate.

From the above two charts, you can see that the API Server achieves the maximum QPS at a concurrency level of 200. At a concurrency level of 500, the average response time is 56.33ms and the success rate is 100.00%. The success rate starts to decline when the concurrency level reaches 1000. Some detailed data cannot be seen from the charts, but you can directly view the apiserver.dat file, which contains specific QPS, TTLB, and success rate data for each concurrency level.

Now that we have the performance data for the API Server, what is the level of its QPS? On the one hand, you can compare it according to your own business needs; on the other hand, you can compare it with web frameworks that have better performance. In any case, you need to have a reference.

Here, we will build the simplest HTTP server using net/http and use the same testing tool and test server to test its performance and make a comparison. The source code of the HTTP service is in the file tools/httptest/main.go:

package main

import (
    "fmt"
    "log"
    "net/http"
)

func main() {
    http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        message := `{"status":"ok"}`
        fmt.Fprint(w, message)
    })

    addr := ":6667"
    fmt.Printf("Serving http service on %s\n", addr)
    log.Fatal(http.ListenAndServe(addr, nil))
}

We set the request path of this HTTP server to /healthz and return {"status":"ok"}, which is exactly the same as the data returned by the API Server. In this way, you can eliminate the performance difference caused by the difference in the size of the returned data.

As you can see, this HTTP server is very simple and only uses the most basic features of the net/http package. In Go, almost all web frameworks are based on the net/http package. Since they are wrappers, they certainly cannot match the performance of the original package. Therefore, we need to compare the performance of this HTTP server started directly with net/http with the performance of the API Server.

We need to perform the same wrk test and compare the results with the test results of the API Server by creating a comparison chart. The specific comparison process can be divided into 3 steps.

Step 1: Start the HTTP server.

Execute the following command in the iam source code root directory:

$ go run tools/httptest/main.go

Step 2: Execute the wrktest.sh script to test the performance of the HTTP server:

$ scripts/wrktest.sh -n http http://10.0.4.57:6667/healthz

The above command will generate the _output/wrk/http.dat file.

Step 3: Compare the two performance test results:

$ scripts/wrktest.sh diff _output/wrk/apiserver.dat _output/wrk/http.dat

The generated two comparison charts are as follows:

[Chart: QPS & TTLB comparison]

[Chart: success rate comparison]

From the above two comparison charts, we can see that the API Server falls short of the native HTTP server in QPS, response time, and success rate. In particular, its maximum QPS is only 13.68% of the native HTTP server's maximum QPS, so its performance needs to be optimized.

API Server Performance Analysis #

Previously, we tested the performance of the API interface. If the performance is not as expected, we need to analyze the performance data and optimize it.

Before analyzing, we need to put the API Server under load, because performance problems are much easier to expose while the server is under pressure. So keep the following command running:

$ scripts/wrktest.sh http://10.0.4.57:8080/healthz

During the above command execution for stress testing, you can open another Linux terminal and use the go tool pprof tool to analyze the HTTP profile file:

$ go tool pprof http://10.0.4.57:8080/debug/pprof/profile

After executing go tool pprof, because performance data needs to be collected, this command will block for 30s.
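By default, the CPU profile is sampled for 30 seconds; if you want a different sampling window, append a seconds parameter to the URL, for example /debug/pprof/profile?seconds=60.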

In the pprof interactive shell, execute top -cum to view functions sorted by cumulative sampling time. Here we execute top30 -cum to observe more functions:

(pprof) top30 -cum
Showing nodes accounting for 11.82s, 20.32% of 58.16s total
Dropped 632 nodes (cum <= 0.29s)
Showing top 30 nodes out of 239
      flat  flat%   sum%        cum   cum%
     0.10s  0.17%  0.17%     51.59s 88.70%  net/http.(*conn).serve
     0.01s 0.017%  0.19%     42.86s 73.69%  net/http.serverHandler.ServeHTTP
     0.04s 0.069%  0.26%     42.83s 73.64%  github.com/gin-gonic/gin.(*Engine).ServeHTTP
     0.01s 0.017%  0.28%     42.67s 73.37%  github.com/gin-gonic/gin.(*Engine).handleHTTPRequest
     0.08s  0.14%  0.41%     42.59s 73.23%  github.com/gin-gonic/gin.(*Context).Next (inline)
     0.03s 0.052%  0.46%     42.58s 73.21%  .../internal/pkg/middleware.RequestID.func1
         0     0%  0.46%     41.02s 70.53%  .../internal/pkg/middleware.Context.func1
     0.01s 0.017%  0.48%     40.97s 70.44%  github.com/gin-gonic/gin.CustomRecoveryWithWriter.func1
     0.03s 0.052%  0.53%     40.95s 70.41%  .../internal/pkg/middleware.LoggerWithConfig.func1
     0.01s 0.017%  0.55%     33.46s 57.53%  .../internal/pkg/middleware.NoCache
     0.08s  0.14%  0.69%     32.58s 56.02%  github.com/tpkeeper/gin-dump.DumpWithOptions.func1
     0.03s 0.052%  0.74%     24.73s 42.52%  github.com/tpkeeper/gin-dump.FormatToBeautifulJson
     0.02s 0.034%  0.77%     22.73s 39.08%  github.com/tpkeeper/gin-dump.BeautifyJsonBytes
     0.08s  0.14%  0.91%     16.39s 28.18%  github.com/tpkeeper/gin-dump.format
     0.21s  0.36%  1.27%     16.38s 28.16%  github.com/tpkeeper/gin-dump.formatMap
     3.75s  6.45%  7.72%     13.71s 23.57%  runtime.mallocgc
     ...

Since the full top30 output is long, only some of the most time-consuming functions are shown here. From the list above, we can see functions such as ServeHTTP, which belong to the net/http package and the Gin framework itself; we do not need to optimize them.

There are also functions such as middleware.RequestID.func1, middleware.Context.func1, gin.CustomRecoveryWithWriter.func1, middleware.LoggerWithConfig.func1, and middleware.NoCache. These time-consuming functions are all Gin middlewares that we have loaded. Since they consume a lot of CPU time, we can load middlewares selectively and remove the ones we do not need in order to improve the API Server's performance.

If we do not need these middlewares for now, we can also set server.middlewares to empty or comment it out in the iam-apiserver configuration file, and then restart iam-apiserver. After restarting, run wrktest.sh again and compare the result with the native HTTP server's performance. The comparison results are shown in the two charts below:

[Chart: QPS & TTLB comparison after removing middleware]

[Chart: success rate comparison after removing middleware]

As you can see, after removing the unused Gin middlewares, the performance of the API Server improves significantly. Performance peaks at a concurrency of 200, with a QPS of 47812, a response time of 4.33 ms, and a success rate of 100.00%. At a concurrency of 50000, its QPS reaches 75.02% of the native HTTP server's.
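For reference, the iam-apiserver decides which Gin middlewares to install based on the server.middlewares configuration option. The following is a simplified sketch of that kind of selective loading (illustrative only, not the actual IAM code; it uses only middlewares that ship with Gin, and the port is arbitrary):

package main

import "github.com/gin-gonic/gin"

// installMiddlewares installs only the middlewares named in the configuration,
// so anything left out of server.middlewares adds no per-request overhead.
func installMiddlewares(g *gin.Engine, names []string) {
    available := map[string]gin.HandlerFunc{
        "recovery": gin.Recovery(),
        "logger":   gin.Logger(),
    }
    for _, name := range names {
        if mw, ok := available[name]; ok {
            g.Use(mw)
        }
    }
}

func main() {
    g := gin.New() // start from an engine with no default middleware
    installMiddlewares(g, []string{"recovery"})
    g.GET("/healthz", func(c *gin.Context) {
        c.JSON(200, gin.H{"status": "ok"})
    })
    g.Run(":8081") // port chosen arbitrarily for this sketch
}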

API Interface Performance Reference #

Different teams have different requirements for the performance of API interfaces. The same team has different performance requirements for each API interface. So there is no unified numerical standard to measure the performance of API interfaces, but it can be certain that the higher the performance, the better. Based on my own development experience, I provide a reference value here (the concurrency can be selected as needed), as shown in the table below:

[Table: API interface performance reference values]

Tips for API Server Performance Testing #

When performing API Server performance testing, it is important to consider the factors that can affect the performance of the API Server. There are many factors that can impact the performance of an API Server, which can be roughly divided into two categories: the performance of the web framework and the performance of the API interfaces. In addition, when conducting performance tests, it is important to ensure that the testing environment is consistent and preferably a clean testing environment.

Performance of the Web Framework #

The performance of the web framework is crucial because it will affect the performance of every API interface.

During the design phase, we will determine the web framework to be used. At this stage, we need to conduct initial testing on the web framework to ensure that the web framework we choose has good performance and stability. After the development of the entire backend service in Go is completed, but before going live, we need to test the web framework again to ensure that it still maintains excellent performance and stability according to the final usage.

Usually, we test the performance of the web framework through API interfaces, such as the health check interface /healthz. We need to ensure that this API interface is simple enough and should not contain any logic. It should just return a small symbolic content. For example, in this lecture, we test the performance of the web framework through the /healthz interface:

s.GET("/healthz", func(c *gin.Context) {
    core.WriteResponse(c, nil, map[string]string{"status": "ok"})
})

In this interface, only the core.WriteResponse function is called, and it returns {"status":"ok"}. We use core.WriteResponse to return the response data rather than returning the string directly, in order to keep the format of API responses uniform.
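To make the uniform-response idea concrete, here is a simplified sketch of what a WriteResponse-style helper can look like. It is not the real core.WriteResponse from the IAM project (the real helper returns a richer error body), but it shows the basic shape:

package core

import (
    "net/http"

    "github.com/gin-gonic/gin"
)

// ErrResponse is a minimal uniform error body; the real project returns
// additional fields such as an error code.
type ErrResponse struct {
    Message string `json:"message"`
}

// WriteResponse writes either the error or the data in one consistent JSON
// shape, so every API response has the same structure.
func WriteResponse(c *gin.Context, err error, data interface{}) {
    if err != nil {
        c.JSON(http.StatusInternalServerError, ErrResponse{Message: err.Error()})
        return
    }
    c.JSON(http.StatusOK, data)
}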

Performance of the API Interfaces #

In addition to testing the performance of the web framework, we may also need to test the performance of certain important API interfaces, or even all API interfaces. In order to test the performance of API interfaces in real-world scenarios, we use HTTP stress testing tools like wrk to simulate multiple API requests and analyze the performance of the APIs.

Since a large number of requests are simulated, testing write-type interfaces such as Create, Update, and Delete can cause problems: for example, a large amount of data may be inserted into the database, filling up the disk or overloading the database. So for write-type interfaces, we can use unit tests to test their performance. Based on my development experience, write-type interfaces usually do not have performance issues; read-type interfaces are more likely to run into performance problems. For read-type interfaces, we can use HTTP stress-testing tools such as wrk.
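As a hedged example of the unit-test approach for write-type interfaces, the following sketch benchmarks a hypothetical create operation against an in-memory fake store with go test -bench, so the benchmark never touches a real database (all names here are made up for illustration):

package user

import (
    "fmt"
    "testing"
)

// fakeUserStore is a hypothetical in-memory store used only for benchmarking,
// so no real database is written to during the test.
type fakeUserStore struct {
    users map[string]struct{}
}

func (s *fakeUserStore) CreateUser(name string) error {
    if _, ok := s.users[name]; ok {
        return fmt.Errorf("user %s already exists", name)
    }
    s.users[name] = struct{}{}
    return nil
}

// BenchmarkCreateUser measures the write path directly; run it with:
//   go test -bench=CreateUser -benchmem
func BenchmarkCreateUser(b *testing.B) {
    store := &fakeUserStore{users: make(map[string]struct{})}
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        if err := store.CreateUser(fmt.Sprintf("user-%d", i)); err != nil {
            b.Fatal(err)
        }
    }
}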

Testing Environment #

When conducting performance/stress testing, in order to avoid affecting the production environment, it is important to ensure that the tests are conducted in a separate testing environment, and the network of the testing environment should not affect the network of the production environment. Moreover, in order to better compare and analyze performance, it is necessary to ensure that our testing methods and testing environment remain consistent. This requires automating performance testing and conducting tests in the same testing environment each time.

Summary #

Before the project goes live, we need to perform performance testing on the API interface. Generally, the performance latency of the API interface should be less than 500ms. If it exceeds this value, we need to consider optimizing the performance. When conducting performance testing, it is necessary to ensure a consistent testing environment for each test so that the data between different tests can be compared. In this lecture, I recommended a great performance testing tool called wrk. We can write shell scripts to automatically plot the performance testing data from wrk into graphs, making it easier for us to view and compare performance.

If the performance of the API interface does not meet our expectations, we can use the go tool pprof tool to analyze the performance. In the go tool pprof interactive interface, execute the top -cum command to view the cumulative sample time. Based on the cumulative sample time, we can determine the code that affects performance and optimize it. After optimization, perform the test again, and if it still doesn’t meet the expectations, continue analyzing the performance of the API interface. Repeat this process until the performance of the API interface meets the expectations.

After-class Exercise #

  1. Choose a project and use the wrktest.sh script to test its API interface, analyze and optimize the performance of the API interface.
  2. Think about whether there are any other good API interface performance analysis methods in your work. Feel free to share and discuss in the comments.

You are welcome to exchange and discuss with me in the comments section. See you in the next lesson.