29 Performance Tests Are the Easiest to Get Wrong: You Need the Industry-Standard Benchmark Tool ‘wrk’ #

Hello, I’m Wen Ming.

In this final section of the testing chapter, let’s talk about performance testing. This topic is not unique to OpenResty; it applies equally to other server-side services.

Performance testing is very common. When we deliver a product, there are usually performance requirements, such as reaching a certain QPS (Queries Per Second) or TPS (Transactions Per Second), keeping latency under a certain number of milliseconds, and supporting a certain number of concurrent connections. For open-source projects, we also run performance tests before releasing a new version, comparing it with the previous one to check for any obvious regression. There are also neutral websites that publish performance comparisons of similar products. Performance testing really is all around us.

In my over ten years of work, I have done many performance tests for different products and encountered quite a few pitfalls along the way. Later, I gradually realized that while performance testing can be simple to do, doing it right is not easy. In fact, many performance test results are inaccurate.

So, how do we conduct a scientific and rigorous performance test? Let me tell you about it in today’s lesson.

Performance Testing Tools #

As the saying goes, to do a good job, one must first sharpen one’s tools. Choosing a capable performance testing tool is half the battle.

You are probably familiar with ab, the Apache benchmarking tool, which is arguably the simplest performance testing tool around. Unfortunately, it is not very useful: most modern server-side applications are built on coroutines and asynchronous I/O, so their performance is quite good, while ab cannot take advantage of the machine’s multiple cores and therefore cannot generate enough request pressure. In that case, the results of an ab run are not accurate; they become a performance test of ab itself.

This gives us a clear criterion for choosing a load-testing tool: the tool itself must be fast enough to generate enough pressure to saturate the server-side program.

Of course, you could also go all out and launch many load-testing clients, turning them into a distributed load-testing system. That is certainly feasible, but don’t forget that the complexity rises accordingly.

Returning to the practice of OpenResty, we recommend using the performance testing tool wrk. Let’s talk about why we choose it.

First of all, wrk meets the criteria for tool selection. The pressure generated by a single wrk instance can easily max out the CPU of Nginx, not to mention other server-side programs.

Secondly, wrk has a lot in common with OpenResty. wrk is not an open-source project written from scratch either: it stands on the shoulders of two giants, LuaJIT and Redis, and makes full use of the system’s multi-core resources to generate requests. In addition, wrk exposes a Lua API, so you can embed your own Lua scripts to customize request headers and bodies, which makes it very flexible.

So how do you use wrk? It’s also very simple, take a look at the following code snippet:

wrk -t12 -c400 -d30s http://127.0.0.1:8080/index.html

This means that wrk will use 12 threads, keep 400 long-lived connections open, and send HTTP requests to the specified endpoint for 30 seconds. If you don’t specify any parameters, wrk starts with 2 threads and 10 long-lived connections by default.
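
To get a taste of the Lua API mentioned above, here is a minimal sketch of a wrk script. It turns every request into a JSON POST; the file name and payload are made up for illustration, but wrk.method, wrk.body, and wrk.headers are part of wrk’s documented scripting interface:

-- post.lua: customize the method, body, and headers of every request
wrk.method = "POST"
wrk.body   = '{"name": "test"}'
wrk.headers["Content-Type"] = "application/json"

You would then pass the script to wrk with the -s option:

wrk -t12 -c400 -d30s -s post.lua http://127.0.0.1:8080/index.html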

Testing Environment #

After finding the testing tools, we can’t start the stress test directly. We also need to check the testing environment. There are four main items that need to be checked in the testing environment, and I will explain them in detail below.

Check Item 1: Disable SELinux #

If you are using the CentOS/RedHat operating system, it is recommended to disable SELinux to avoid encountering strange permission issues.

We can use the following command to check if SELinux is enabled:

$ sestatus
SELinux status: disabled

If it shows that SELinux is enabled (enforcing), you can disable it temporarily by running setenforce 0. You can also disable it permanently by editing the /etc/selinux/config file and changing SELINUX=enforcing to SELINUX=disabled.
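
For reference, here is a sketch of both operations as shell commands; the sed one-liner is just one convenient way to make the permanent edit:

$ sudo setenforce 0   # temporary; reverts to enforcing after a reboot
$ sudo sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config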

Check Item 2: Maximum File Open Limit #

Next, you need to check the global maximum file open limit of the current system using the following command:

$ cat /proc/sys/fs/file-nr
3984 0 3255296

The last number here is the global maximum number of open file handles. If this number is relatively small on your machine, you need to increase it by modifying the /etc/sysctl.conf file:

fs.file-max = 1020000
net.ipv4.ip_conntrack_max = 1020000
net.ipv4.netfilter.ip_conntrack_max = 1020000

After making the changes, reload the kernel parameters for them to take effect:

sudo sysctl -p /etc/sysctl.conf
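
To confirm that the new limit is in effect, you can query the key directly; assuming the value above, the output should look like this:

$ sysctl fs.file-max
fs.file-max = 1020000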

Check Item 3: Process Limit #

In addition to the global maximum file open limit of the system, there is also a limit on the number of files a process can open. You can use the ulimit command to check it:

$ ulimit -n
1024

You will find that this value is 1024 by default, which is very low. Since every client connection corresponds to a file handle, and stress testing generates a huge number of requests, we need to raise this value to the million level. You can change it temporarily with the following command:

$ ulimit -n 1024000

Or you can make the change permanent by editing the /etc/security/limits.conf file; the new limits take effect for subsequent login sessions:

* hard nofile 1024000
* soft nofile 1024000
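
After starting a new login session, you can verify that the new limit is active; the expected output, assuming the values above, is:

$ ulimit -n
1024000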

Check Item 4: Nginx Configuration #

Finally, you need to make a small modification to the Nginx configuration, namely the two lines below:

events {
    worker_connections 10240;
}

This increases the number of connections each worker process can handle; the default value is only 512, which is clearly not enough for heavy stress testing.
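
If you raised the system-wide file-handle limits earlier, it is also worth making sure each Nginx worker is allowed to use them. Here is a sketch of a fuller configuration; worker_processes and worker_rlimit_nofile are standard Nginx core directives, and the numbers are only illustrative:

worker_processes auto;           # one worker per CPU core
worker_rlimit_nofile 102400;     # file handles each worker may open

events {
    worker_connections 10240;    # should not exceed worker_rlimit_nofile
}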

Pre-testing check #

At this point, the test environment is ready. I’m sure some of you are eager to start testing, right? Hold on: before launching the test with wrk, let’s do one final check. After all, people make mistakes, and a cross-check from a different perspective is very important.

This final check can be divided into two steps.

Step one, use the automated tool c1000k. #

It comes from the author of SSDB: https://github.com/ideawu/c1000k. From the name itself, you can see that the purpose of this tool is to check if your environment can meet the requirement of handling 1 million concurrent connections.

This tool is also simple to use. We start a server and a client: the server program listens on port 7000, and the client initiates the stress test against it, simulating the load of a real environment:

./server 7000
./client 127.0.0.1 7000

Next, the client will send requests to the server to check if the current system environment can support 1 million concurrent connections. You can try running it yourself and see the results.

Step two, check if the server program is running properly. #

If the server program is not functioning properly, then the stress test might become an error log refresh test or a 404 response test.

Therefore, the last and most important step of the environment check is to run the server’s unit test suite, or to manually call a few key APIs, to make sure that every interface wrk will hit returns the expected content and HTTP response code, and that no error-level messages appear in logs/error.log.
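
For example, a quick manual spot check might look like the following; I’m using the /hello interface that we will stress test in a moment, and the grep pattern assumes Nginx’s standard error-log format:

$ curl -i http://127.0.0.2:9080/hello    # expect a 200 response and the correct body
$ grep -c '\[error\]' logs/error.log     # expect 0 error-level entries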

Sending Requests #

Alright, everything is ready; all we need now is the east wind. Let’s start the stress test with wrk!

$ wrk -d 30 http://127.0.0.2:9080/hello
Running 30s test @ http://127.0.0.2:9080/hello
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   595.39us  178.51us  22.24ms   90.63%
    Req/Sec     8.33k   642.91     9.46k    59.80%
  499149 requests in 30.10s, 124.22MB read
Requests/sec:  16582.76
Transfer/sec:      4.13MB

Here I didn’t specify any parameters, so wrk starts with its default of 2 threads and 10 connections. In fact, you don’t need to tweak wrk’s thread and connection counts much, as long as the load they generate fully utilizes the target program’s CPU.

However, the duration of the stress test must not be too short. A test lasting only a few seconds is meaningless; the server program will most likely not have finished loading hot data before the test ends. Meanwhile, while the test is running, use a monitoring tool such as top or htop to confirm that the target server program is running at full CPU.

From an observability perspective, if the CPU is fully loaded during the test, and CPU and memory usage drop quickly once the test stops, then congratulations, the stress test was successful. However, if you run into either of the following anomalies, you need to pay special attention as a server-side developer:

  • The CPU is not fully loaded. This is not a wrk problem; it may be a network bottleneck or, more likely, blocking operations in your code. You can confirm this by reviewing the code, or by using an off-CPU flame graph.
  • The CPU stays fully loaded even after the stress test stops. This indicates a hot loop in the code, possibly caused by a regular expression or by a LuaJIT bug; I have run into both in real environments. In this case, use an on-CPU flame graph for further investigation.

Lastly, let’s look at wrk’s statistics. In this output, we usually pay attention to two values:

The first one is QPS, which is Requests/sec: 16582.76. This data is straightforward and represents how many requests the server processed per second.

The second one is latency: Latency 595.39us 178.51us 22.24ms 90.63%. This figure is just as important as QPS, since it reflects the system’s response speed. For example, for gateway applications, we want to keep latency within 1 millisecond.

Additionally, wrk provides the --latency option, which prints a detailed latency percentile distribution, like this:

Latency Distribution
     50%  134.00us
     75%  180.00us
     90%  247.00us
     99%  552.00us
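
To get this distribution, add the --latency flag to the command line, for example:

$ wrk -d 30 --latency http://127.0.0.2:9080/hello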

However, wrk’s latency distribution data is not accurate, because it includes noise introduced by the network and by the tool itself, which inflates the latency figures. This is something you need to keep in mind. You can refer to an article I wrote earlier to learn more about wrk latency distribution.

Conclusion #

Performance testing is a technical job, and there are not many people who can do it right and well. I hope that today’s lesson can give you a more comprehensive understanding of performance testing.

Finally, I have a homework question for you: wrk supports custom Lua scripts for stress testing. So, can you write a simple Lua script based on its documentation? This may be a bit challenging, but by completing it, you will definitely have a deeper understanding of the purpose of wrk’s exposed interface.

Feel free to leave a comment with your answer and thoughts. You are also welcome to share this article with more people as we progress together.