
05 Tool Implementation: Precision Testing Methodology for Performance with JMH #

In the previous lesson, we learned that some external tools can be used to obtain system performance data.

However, sometimes we want to measure the performance of a specific code segment. In such cases, we often intersperse timing code in our logic and perform simple elapsed-time calculations, like the following lines of code:

long start = System.currentTimeMillis();
// the business logic being measured
long cost = System.currentTimeMillis() - start;
System.out.println("Logic cost : " + cost);

Unfortunately, the results produced by this code are often inaccurate. When the JVM runs, the JIT compiler compiles and inlines hot code blocks and frequently executed logic, so the code must loop thousands of times to warm up before it produces a stable result; performance before and after warmup can differ significantly.

In addition, as we learned in Lesson 01, there are many metrics for evaluating performance. If these metric data need to be manually calculated each time, it would be tedious, boring, and inefficient.

JMH - Benchmark Testing Tool #

JMH (Java Microbenchmark Harness) is a tool for running benchmark tests. If you have identified hot code with the external tools introduced in Lesson 04 and want to measure its performance and evaluate the effect of your optimizations, JMH is the tool to rely on. Its measurement precision is very high, down to the nanosecond level.

JMH has been included in the JDK source tree since JDK 12; for other versions, it needs to be imported via Maven. The coordinates are as follows:

<dependencies> 
    <dependency> 
        <groupId>org.openjdk.jmh</groupId> 
        <artifactId>jmh-core</artifactId> 
        <version>1.23</version> 
    </dependency> 
    <dependency> 
        <groupId>org.openjdk.jmh</groupId> 
        <artifactId>jmh-generator-annprocess</artifactId> 
        <version>1.23</version> 
        <scope>provided</scope> 
    </dependency> 
</dependencies>

Next, let’s introduce how to use this tool.

JMH ships as a JAR package and feels much like the unit-testing framework JUnit: basic configuration is done through annotations, and many of the same settings can also be supplied through the OptionsBuilder in the main method.

(Figure: the execution flow of a typical JMH program)

The above figure shows the execution flow of a typical JMH program. By forking multiple processes and running multiple threads, it first performs warmup, then executes the measurement iterations, and finally aggregates all the test data for analysis. Pre-processing and post-processing hooks can also run before and after execution, at several levels of granularity.

A simple piece of JMH code, with the imports it needs, is shown below:

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@Threads(2)
public class BenchmarkTest {
    @Benchmark
    public long shift() {
        long t = 455565655225562L;
        long a = 0;
        for (int i = 0; i < 1000; i++) {
            a = t >> 30;
        }
        return a;
    }

    @Benchmark
    public long div() {
        long t = 455565655225562L;
        long a = 0;
        for (int i = 0; i < 1000; i++) {
            a = t / 1024 / 1024 / 1024;
        }
        return a;
    }

    public static void main(String[] args) throws Exception {
        Options opts = new OptionsBuilder()
                .include(BenchmarkTest.class.getSimpleName())
                .resultFormat(ResultFormatType.JSON)
                .build();
        new Runner(opts).run();
    }
}

Next, let’s introduce the key annotations and parameters one by one.

Key Annotations #

1. @Warmup #

Here is an example:

@Warmup(
    iterations = 5,
    time = 1,
    timeUnit = TimeUnit.SECONDS)

We have mentioned the @Warmup annotation several times. It can be applied to classes or methods to configure warmup, and it has several parameters:

  • timeUnit: the unit for time; the default is seconds.
  • iterations: the number of iterations in the warmup phase.
  • time: the duration of each warmup iteration.
  • batchSize: the batch size, i.e., how many times the method is called per operation (see the sketch after the log below).

The annotation above means the code is warmed up for a total of 5 seconds (5 iterations of 1 second each). Results measured during warmup are discarded.

Let’s take a look at its execution effect:

# Warmup: 3 iterations, 1 s each 
# Warmup Iteration   1: 0.281 ops/ns 
# Warmup Iteration   2: 0.376 ops/ns 
# Warmup Iteration   3: 0.483 ops/ns
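
Since batchSize is easy to overlook, here is a minimal sketch of how it is typically used (the class and method names are made up; the approach follows the official batch-size sample). Each recorded operation covers 1,000 consecutive calls, and SingleShotTime fits naturally because every batch is timed as a whole:

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class BatchSizeBenchmark {
    // One "operation" in the report equals 1000 consecutive calls.
    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @Warmup(iterations = 5, batchSize = 1000)
    @Measurement(iterations = 5, batchSize = 1000)
    public long shiftBatch() {
        return 455565655225562L >> 30;
    }
}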

Generally, benchmarks target relatively small, fast code blocks, which are exactly the ones likely to be JIT-compiled and inlined. It is therefore good practice to keep methods concise when coding. We will discuss the optimization process in detail in Lesson 18.

Speaking of warmup, the warmup of services in distributed environments also deserves a mention. When new service nodes are released, traffic is usually ramped up to them gradually until they reach their optimal state. As shown in the following diagram, the load balancer is responsible for this ramp-up, usually based on percentage weights.

(Figure: a load balancer gradually shifting traffic to newly released nodes by percentage)

2. @Measurement #

An example is shown below:

@Measurement(
    iterations = 5,
    time = 1,
    timeUnit = TimeUnit.SECONDS)

@Measurement takes the same parameters as @Warmup. The difference is that these iterations are actually measured: their results are recorded and contribute to the final statistics.

We can see this execution process from the log:

# Measurement: 5 iterations, 1 s each 
Iteration   1: 1646.000 ns/op 
Iteration   2: 1243.000 ns/op 
Iteration   3: 1273.000 ns/op 
Iteration   4: 1395.000 ns/op 
Iteration   5: 1423.000 ns/op

Although warmed-up code can show its optimal state, a benchmark still differs from real application scenarios. If your test machine is unusually powerful, or conversely its resource utilization is already at its limit, the absolute numbers will be skewed.

Therefore, in most cases I give the machine ample resources during testing to keep the environment stable, and when analyzing the results I pay more attention to the performance differences between code implementations than to the test data itself.

3. @BenchmarkMode #

This annotation specifies the type of benchmark, corresponding to the constants in Mode. It can be applied to both classes and methods. Its value is an array, so multiple statistical dimensions can be configured at once. For example:

@BenchmarkMode({Mode.Throughput, Mode.AverageTime}) will measure both the throughput and the average execution time.

The so-called modes are the metrics we covered in Lesson 01. In JMH, they are divided into the following types (a sketch combining several of them follows this list):

  • Throughput: overall throughput, such as QPS, the number of calls per unit of time.
  • AverageTime: the average time of each execution. If the value is too small to register, reduce the statistical time unit.
  • SampleTime: random sampling, the same concept as the TP (percentile) values from Lesson 01.
  • SingleShotTime: measures a single execution only, such as how long initialization takes; in effect this is no different from timing in a traditional main method.
  • All: computes all of the metrics above; set this value to see everything at once.
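
As a minimal sketch of combining modes (the class and method bodies here are illustrative), the first method gathers two metrics in one run, while the second computes every metric JMH supports:

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

public class ModeBenchmark {
    // Collects throughput and average-time statistics together.
    @Benchmark
    @BenchmarkMode({Mode.Throughput, Mode.AverageTime})
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public long shift() {
        return 455565655225562L >> 30;
    }

    // Computes every supported metric; useful for a first look.
    @Benchmark
    @BenchmarkMode(Mode.All)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public long div() {
        return 455565655225562L / 1024 / 1024 / 1024;
    }
}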

Let’s take average time as an example and look at a rough execution result:

Result "com.github.xjjdog.tuning.BenchmarkTest.shift": 
  2.068 ±(99.9%) 0.038 ns/op [Average] 
  (min, avg, max) = (2.059, 2.068, 2.083), stdev = 0.010 
  CI (99.9%): [2.030, 2.106] (assumes normal distribution)

Since we declared the unit of time as nanoseconds, the average response time of the shift method in this test is 2.068 nanoseconds.

We can also see the final elapsed time:

Benchmark            Mode  Cnt  Score   Error  Units 
BenchmarkTest.div    avgt    5  2.072 ± 0.053  ns/op 
BenchmarkTest.shift  avgt    5  2.068 ± 0.038  ns/op

Since this is an average, the Error value here indicates the fluctuation of the measurements; as the output above shows, it is the half-width of the 99.9% confidence interval.

As you can see, when measuring these metrics, there is a time dimension, which is configured through the @OutputTimeUnit annotation.

This one is relatively simple: it specifies the time unit of the benchmark results. It can be applied to both classes and methods. Seconds, milliseconds, and microseconds are the usual choices; nanoseconds are reserved for extremely fast methods.

For example, the combination of @BenchmarkMode(Mode.Throughput) and @OutputTimeUnit(TimeUnit.MILLISECONDS) represents the throughput per millisecond.

In the following example of throughput results, the calculations are in milliseconds:

Benchmark             Mode  Cnt       Score       Error   Units 
BenchmarkTest.div    thrpt    5  482999.685 ±  6415.832  ops/ms 
BenchmarkTest.shift  thrpt    5  480599.263 ± 20752.609  ops/ms

By switching @OutputTimeUnit to a coarser or finer time level, you can make the results more readable.

4. @Fork #

The value of @Fork is usually set to 1, meaning a single fresh process is used for the test. If the number is greater than 1, several new processes are forked in turn. If it is set to 0, the program still runs, but inside the user’s own JVM process; you will see the prompt below, and this is not recommended.

# Fork: N/A, test runs in the host VM
# *** WARNING: Non-forked runs may silently omit JVM options, mess up profilers, disable compiler hints, etc. ***
# *** WARNING: Use non-forked runs only for debugging purposes, not for actual performance runs. ***

So, does fork run in a process or thread environment?

Tracing the JMH source code shows that each fork runs in a separate, independent process, which provides complete environment isolation and avoids cross-interference.

The forked process’s input and output streams are relayed back to our terminal over socket connections.

(Figure: forked benchmark processes relaying input and output to the main process over sockets)

Here’s a small tip: the @Fork annotation has a parameter called jvmArgsAppend, through which we can pass additional JVM parameters:

@Fork(value = 3, jvmArgsAppend = {"-Xmx2048m", "-server", "-XX:+AggressiveOpts"})

In regular testing, you can also increase the fork value appropriately to reduce measurement error.

5. @Threads #

@Fork is process-oriented, while @Threads is thread-oriented. Specifying this annotation enables parallel testing. If Threads.MAX is configured, JMH uses as many threads as there are processor cores.

This is consistent with our usual coding habits. It does not mean that the more threads we create, the better. Having too many threads will cause the operating system to spend more time on context switching, resulting in a decrease in overall performance.
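
As a minimal sketch (the class and method names are made up), @Threads can also be set per method; Threads.MAX asks JMH to use one thread per available core:

import org.openjdk.jmh.annotations.*;

public class ThreadsBenchmark {
    // Exactly two benchmark threads call this method in parallel.
    @Benchmark
    @Threads(2)
    public long twoThreads() {
        return 455565655225562L >> 30;
    }

    // Threads.MAX scales the thread count to the number of cores.
    @Benchmark
    @Threads(Threads.MAX)
    public long allCores() {
        return 455565655225562L >> 30;
    }
}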

6. @Group #

The @Group annotation can only be added to methods and is used to group test methods, which is handy when a single test file contains a large number of methods that need to be categorized.

The @GroupThreads annotation, associated with @Group, configures additional thread settings on top of that grouping. These two annotations are rarely needed outside very large performance-testing scenarios.
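
These annotations are most useful in asymmetric tests, where different methods run concurrently as one unit. The following minimal sketch is modeled on the official asymmetric sample (the names are illustrative): three reader threads and one writer thread share a single counter:

import java.util.concurrent.atomic.AtomicLong;
import org.openjdk.jmh.annotations.*;

@State(Scope.Group) // one counter instance is shared by the whole group
public class GroupBenchmark {
    private final AtomicLong counter = new AtomicLong();

    @Benchmark
    @Group("rw")
    @GroupThreads(3) // three threads drive the read side
    public long read() {
        return counter.get();
    }

    @Benchmark
    @Group("rw")
    @GroupThreads(1) // one thread drives the write side
    public void write() {
        counter.incrementAndGet();
    }
}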

7. @State #

The @State annotation declares a class as a “state” holder, and its Scope parameter indicates how that state is shared. The annotation must be placed on a class; otherwise JMH reports an error.

There are three possible values for Scope.

  • Benchmark: all threads running the benchmark share a single instance of the state.
  • Thread: each thread gets its own copy of the state; with the @Threads annotation configured, the copies do not affect each other.
  • Group: associated with the @Group annotation; threads within the same group share the same instance.

The official sample file JMHSample_04_DefaultState demonstrates that the default scope of the variable x is Thread. The relevant code is as follows:

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class JMHSample_04_DefaultState {
    double x = Math.PI;

    @Benchmark
    public void measure() {
        x++;
    }
}

8. @Setup and @TearDown #

Similar to the unit-testing framework JUnit, @Setup is used for initialization before the benchmark runs, and @TearDown is used for cleanup afterward.

Both annotations also take a Level value that controls when the method executes. It has three possible values:

  • Trial: the default level; the method runs once around the whole benchmark run.
  • Iteration: the method runs around each iteration.
  • Invocation: the method runs around every single method invocation, which is the finest granularity.

If your initialization is tied to each method call, the Invocation level is the best fit. In most cases, however, the setup concerns global resources, such as a Spring DAO, and the default Trial level, which initializes only once, is sufficient. A minimal sketch follows.
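
Here is a minimal sketch of both annotations under the default Trial level (the class, field, and method names are made up): the list is built once before the run and cleared once after it:

import java.util.ArrayList;
import java.util.List;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class SetupTearDownBenchmark {
    private List<Integer> data;

    @Setup(Level.Trial) // runs once before the whole benchmark run
    public void prepare() {
        data = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            data.add(i);
        }
    }

    @TearDown(Level.Trial) // runs once after the whole benchmark run
    public void cleanup() {
        data.clear();
    }

    @Benchmark
    public long sum() {
        long total = 0;
        for (int value : data) {
            total += value;
        }
        return total;
    }
}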

9. @Param #

The @Param annotation can only be applied to fields. It is used to test how different parameter values affect program performance, and it is used together with the @State annotation, which specifies the scope in which those parameters apply.

Here is some example code:

import java.math.BigInteger;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
public class JMHSample_27_Params {
    @Param({"1", "31", "65", "101", "103"})
    public int arg;

    @Param({"0", "1", "2", "4", "8", "16", "32"})
    public int certainty;

    @Benchmark
    public boolean bench() {
        return BigInteger.valueOf(arg).isProbablePrime(certainty);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
            .include(JMHSample_27_Params.class.getSimpleName())
//          .param("arg", "41", "42") // Use this to selectively constrain/override parameters
            .build();
        new Runner(opt).run();
    }
}

Note that if you configure many parameter values, the benchmark is run once for every combination, which often results in a long runtime. For example, with M values for the first parameter and N values for the second, a total of M × N runs are performed.

Here is a screenshot of the execution result:

(Figure: execution results for each parameter combination)

10. @CompilerControl #

This annotation is a genuinely useful feature.

In Java, method invocation is relatively expensive, especially when the number of calls is large. Take the simple getter/setter methods that are ubiquitous in Java code: each call requires creating a stack frame, and once the required field has been accessed, the frame is popped and execution of the caller resumes.

If the accessed objects and operations are folded into the body of the calling method, one method call is eliminated and execution speeds up. This is the idea of method inlining. As the figure below shows, efficiency improves greatly once the code has been JIT-compiled.

(Figure: method inlining folds the callee's body into the caller after JIT compilation)

This annotation can be used on classes or methods to control how methods are compiled. Three modes are commonly used, demonstrated in the sketch after this list:

  • Force inlining (INLINE)
  • Forbid inlining (DONT_INLINE)
  • Exclude the method from compilation entirely (EXCLUDE)
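
Here is a minimal sketch in the spirit of the official compiler-control sample (the method names are illustrative): the same tiny callee is measured with inlining forced and with inlining forbidden, so the difference between the two benchmarks exposes the bare invocation overhead:

import org.openjdk.jmh.annotations.*;

public class CompilerControlBenchmark {
    // Ask the JIT to always inline this callee.
    @CompilerControl(CompilerControl.Mode.INLINE)
    private long inlinedTarget() {
        return 455565655225562L >> 30;
    }

    // Forbid inlining: every call pays the full stack-frame cost.
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    private long plainTarget() {
        return 455565655225562L >> 30;
    }

    @Benchmark
    public long callInlined() {
        return inlinedTarget();
    }

    @Benchmark
    public long callNotInlined() {
        return plainTarget();
    }
}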

Visualizing the result #

The results of JMH tests can be further processed and visualized to make them more intuitive. By specifying the output format at runtime, you obtain the performance test results in the desired form.

For example, the following code specifies that the output should be in JSON format:

Options opt = new OptionsBuilder() 
    .resultFormat(ResultFormatType.JSON) 
    .build();
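
If you also want the report written to a specific file, OptionsBuilder can set the output path as well; a small sketch, where the file name is arbitrary:

Options opt = new OptionsBuilder()
    .include(BenchmarkTest.class.getSimpleName())
    .resultFormat(ResultFormatType.JSON)
    .result("benchmark-result.json") // where the JSON report is written
    .build();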

1. JMH supports 5 types of result formats #

  • TEXT exports the results as plain text.
  • CSV exports the results as a CSV file.
  • SCSV exports the results as a semicolon-separated (SCSV) file.
  • JSON exports the results as a JSON file.
  • LATEX exports the results as LaTeX, the typesetting system built on TeX.

In general, we export the results to a CSV file and work on it directly in Excel to generate a chart like the one shown below.

(Figure: an Excel chart generated from exported CSV results)

2. Graphical Plotting Tools for Results #

JMH Visualizer

Here is an open-source project called JMH Visualizer. Export a JSON file and upload it to the site to obtain simple statistical views. Personally, I don’t find its presentation ideal, since many details only appear when you hover the mouse over the charts.

JMH Visual Chart

In comparison, JMH Visual Chart is a more intuitive tool.

(Figure: benchmark results rendered by JMH Visual Chart)

meta-chart

Meta-Chart is a general-purpose online chart generator; the CSV files exported by JMH can be processed there to produce attractive charts.

(Figure: a chart produced in Meta-Chart from exported CSV data)

Some continuous integration tools like Jenkins also provide plugins to directly display these test results.

Summary #

This lesson mainly introduced the benchmarking tool JMH. The official JMH project offers a wide range of samples, covering advanced topics such as the impact of false sharing. I have uploaded them to Gitee, where you can import them into IntelliJ IDEA for testing.

JMH is a very useful tool: it lets us back our analysis with precise test data. In general, once hotspot code has been identified, we use benchmarking tools to optimize it in a targeted way until performance improves significantly.

In the next lessons, we will use JMH to verify the details of performance issues and analyze them further.