
03 Common Performance Metrics: Without Quantification, There’s No Improvement #

The previous lesson explained the development process of JDK and how to install a JDK. Before we formally start with the content of JVM, let’s first understand some basic concepts and principles related to performance.


If you were to ask what the hottest JVM knowledge is right now, many students might answer “JVM tuning” or “JVM performance optimization.” But where exactly should we start and how should we proceed?

In fact, “tuning” is a means of diagnosis and treatment; our ultimate goal is to improve the processing capability, or “performance,” of the system. In this process we act like doctors, treating the “application system” as our patient: just as a doctor examines and treats a patient, “performance optimization” aims to cure the minor ailments and bring the system back to full health.

So what is the process a doctor follows when you visit a hospital? First comes the basic intake: is there a fever, how long has the cough lasted, what have you eaten recently, has there been any diarrhea? Then the doctor hands the patient a set of test orders: a blood test, a chest X-ray, a urine test, and so on. The examinations are carried out one by one with various instruments and tools, and the results come back as standardized metrics (in our case, the data we collect from the JVM and turn into various indicators).

The patient then brings the results back to the doctor for diagnosis. The doctor determines which indicators are abnormal and which are normal; the abnormal ones point to specific problems (analyzing and troubleshooting system issues). For example, an elevated white blood cell count (increased system latency and jitter, occasional crashes) may indicate inflammation (such as a misconfigured JVM). The doctor then prescribes medication, such as amoxicillin or cefuroxime (adjusting the JVM configuration), gives instructions on dosage and timing, and may even recommend hospitalization and surgery (restructuring the system), along with some precautions (requirements and suggestions for routine operation and maintenance). After a period of treatment the patient gradually recovers and is finally cured (latency drops, and the jitter and crashes disappear). Understanding the JVM is what gives us this ability to analyze and diagnose, and that is the core theme of this course.

(Figure: a sample “complete blood count” lab report)

“Without quantification, there is no improvement.” So we first need to understand and measure performance metrics, just like the lab report we receive after a medical check-up. Subjective feelings are unreliable, and personal experience cannot be replicated; once quantitative metrics are defined, we have an objective measurement system. Even if the metrics we define at first are not particularly accurate, we can validate them against real scenarios as we use them, then replace or adjust them, gradually improving the quantitative system until it becomes an effective tool that can be replicated and reused. Just like the “complete blood count” report shown in the figure above: once it becomes a standardized metric, the resulting report is valid in the eyes of any doctor and generally leads to consistent judgments.

So what metrics are needed to diagnose system performance? Let’s first consider what the diagnosis involves. When investigating problems that occur while a program or the JVM is running, such as debugging a bug, the emphasis is on correctness: the problem is not necessarily in the JVM itself, so we also need to analyze the business code and logic to find out where the Java program is going wrong. For performance, the diagnosis generally involves the following steps:

  1. Analyze system performance issues: For example, determine if the expected performance indicators have been met, check for resource-level issues, JVM-level issues, and any issues with critical processing workflows or business processes that may need optimization.
  2. Collect system status, logs, and internal performance metrics using tools, including monitoring and gathering key performance indicator data as well as running stress tests to collect related performance analysis data (a small code sketch follows this list).
  3. Based on the analysis results and performance metrics, make resource configuration adjustments and continue monitoring and analysis to optimize performance until system requirements are met and the system achieves its optimal performance state.
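
As a minimal illustration of step 2, and not part of the original text, the sketch below reads a few JVM-level indicators in-process using the standard `java.lang.management` MXBeans (the class and method names are the real JDK API; the choice of indicators is only an example). In practice you would more often rely on external tools such as `jstat` or a monitoring agent.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Minimal sketch: collect a few key JVM indicators from inside the process.
public class JvmMetricsSnapshot {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        // getMax() may be -1 if the maximum heap size is undefined.
        System.out.printf("heap used=%d MB, committed=%d MB, max=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);

        // Cumulative GC counts and total pause time since JVM start, per collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("gc=%s count=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```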

In computer systems, performance-related resources can be broadly classified into the following categories:

  • CPU: The CPU is the most critical computational resource in the system. It is limited in terms of capacity and can easily become a bottleneck due to inefficient processing of business logic. Wasting CPU resources or excessive CPU consumption is not ideal, so relevant metrics need to be monitored.
  • Memory: Memory corresponds to the fast temporary storage space for directly usable data during program execution. It is also limited. The process of allocating and releasing memory over time, as handled by the JVM’s garbage collector (GC), can result in various issues if the GC configuration is not optimal. These issues may include out-of-memory (OOM) crashes. Therefore, memory metrics also need attention.
  • IO (Storage + Networking): After the CPU performs business-logic calculations in memory, the results usually need to be persisted to disk. In addition, if the system runs in a distributed, multi-machine environment or provides network services, many functions use the network directly. IO in these areas is slow compared with CPU and memory operations, which makes it a focus of attention (see the sketch after this list for a quick way to glance at these resources from inside the JVM).
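
To make these categories a little more concrete, here is a rough sketch (an addition of mine, not from the original text) that glances at CPU, memory, and storage from inside a Java process using only standard APIs; the working-directory path used for the disk check is an arbitrary illustrative choice. Real resource monitoring would of course use OS-level tools or a dedicated monitoring system.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Rough sketch: check the three resource categories with standard Java APIs only.
public class ResourceCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        // CPU: logical core count and the recent system load average
        // (the load average may be -1.0 on platforms where it is unavailable).
        System.out.printf("cpus=%d loadAvg=%.2f%n",
                os.getAvailableProcessors(), os.getSystemLoadAverage());

        // Memory: what this JVM is allowed to use and what is currently free.
        Runtime rt = Runtime.getRuntime();
        System.out.printf("jvm maxMem=%d MB freeMem=%d MB%n",
                rt.maxMemory() >> 20, rt.freeMemory() >> 20);

        // IO (storage side): free space on the partition of the working directory.
        File dir = new File(".");
        System.out.printf("disk usable=%d GB of %d GB%n",
                dir.getUsableSpace() >> 30, dir.getTotalSpace() >> 30);
    }
}
```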

Other more specific metrics will be discussed in detail in the sections on tool and command usage.

2.2 Common Approaches to Performance Optimization #

In performance optimization, there are usually bottleneck issues, and these bottlenecks generally follow the 80/20 rule. This means that if we list all the factors in the entire processing flow that are relatively slow and rank them according to their impact on performance, the top 20% of the bottleneck issues will account for at least 80% of the performance impact. In other words, if we first solve the most important issues, the performance will improve significantly.

Typically, we start by checking whether basic resources are the bottleneck: are they sufficient, and if the budget allows, adding more resources is often the fastest and sometimes even the most cost-effective solution. The system resources most relevant to the JVM are CPU and memory; if there are resource alerts or shortages, system capacity needs to be evaluated and analyzed.

Resources such as GPUs, motherboards, and chipsets are rarely a factor in general-purpose computing systems, so their impact is seldom measured. System performance is generally measured along three dimensions:

  • Latency: Usually measured by response time, such as the average response time. Sometimes, however, response times jitter badly, meaning a small share of users see much higher latencies. In that case we generally aim to guarantee an acceptable response time for 95% of users, so that the majority have a good experience; this is the 95th percentile latency (P95: the response time within which 95 out of every 100 requests complete). Likewise there are the 99th percentile, the maximum response time, and so on. The 95th and 99th percentiles are the most commonly used; under heavy traffic, any network jitter can make the maximum response time extremely large, so that metric is hard to control and generally not used. (A short sketch after this list shows how percentiles are computed.)
  • Throughput: For transactional systems, we generally use the number of transactions processed per second (TPS) to measure throughput. For query and search systems, we can also use the number of requests processed per second (QPS).
  • Capacity: Also known as design capacity, this can be understood as hardware configuration and cost constraints.
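
To make the percentile idea concrete, here is a minimal sketch using made-up response-time samples and the simple nearest-rank definition of a percentile; real monitoring systems may use slightly different percentile algorithms.

```java
import java.util.Arrays;

// Minimal sketch: compute average, P95, P99, and max from measured response times.
public class LatencyPercentiles {
    // Nearest-rank percentile: the smallest value such that at least p% of samples are <= it.
    static long percentile(long[] sortedMillis, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sortedMillis.length);
        return sortedMillis[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // 20 made-up response times in milliseconds, including one jittery outlier (95 ms).
        long[] rtMillis = {12, 15, 9, 22, 18, 35, 11, 14, 13, 16,
                           10, 21, 17, 19, 8, 25, 14, 95, 13, 20};
        Arrays.sort(rtMillis);

        System.out.println("avg = " + Arrays.stream(rtMillis).average().orElse(0) + " ms");
        System.out.println("P95 = " + percentile(rtMillis, 95) + " ms");
        System.out.println("P99 = " + percentile(rtMillis, 99) + " ms");
        System.out.println("max = " + rtMillis[rtMillis.length - 1] + " ms");
    }
}
```

With these samples the P95 (35 ms) is well below the maximum (95 ms), which is exactly why P95/P99 are preferred over the maximum response time when there is jitter.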

These three dimensions are related and constrain one another. As long as the system architecture allows, upgrading the hardware generally improves the performance indicators. However, as Moore’s Law slows down, adding hardware beyond a certain point no longer scales performance linearly: on an already high-end machine, doubling the number of CPU cores, the clock frequency, or the memory does not double performance, while the cost more than doubles and the cost-effectiveness drops rapidly. There is also a limit to how much can be added: AWS, the leading cloud provider, only began offering 256-core machines this year, and Alibaba Cloud currently tops out at 104 cores. So at present the most cost-effective approach is a distributed architecture overall, combined with local analysis and optimization of each individual system.

Performance metrics can also be divided into two categories:

  • Business performance metrics: such as throughput (QPS, TPS), response time (RT), concurrent users, business success rate, etc.
  • Resource constraint metrics: such as the consumption of CPU, memory, I/O, etc.

Different kinds of systems emphasize different dimensions. Batch and stream processing systems care more about throughput and can relax latency requirements somewhat, while highly available web systems care about both response time under high concurrency and throughput. As for capacity, the hardware of most systems is generally not too weak, but it is not unlimited either.

For example: “on a 2-core 4GB node, handle 200 requests per second, with a 95th percentile latency of 20 ms and a maximum response time of 40 ms.” From this we can read off the basic performance information: response time (RT < 40 ms), throughput (200 TPS), and the system configuration (2C4G). An implicit condition may be that the number of concurrent requests does not exceed 200.

The means and methods we can use include:

  • Using JDWP or development tools for local/remote debugging
  • System and JVM state monitoring, collecting and analyzing metrics
  • Performance analysis: CPU usage analysis/memory allocation analysis
  • Memory analysis: Dump analysis/GC log analysis
  • Adjusting JVM startup parameters, GC strategies, etc. (a small sketch follows this list)
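
As a small practical aid (not from the original text): before adjusting startup parameters or the GC strategy, it can help to confirm what the running JVM is actually configured with. A minimal sketch using the standard `ManagementFactory` API:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: print the flags this JVM was started with and the active collectors.
public class JvmConfigDump {
    public static void main(String[] args) {
        System.out.println("JVM args: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName());
        }
    }
}
```

Running it with, say, `java -Xms512m -Xmx512m -XX:+UseG1GC JvmConfigDump` (the flag values here are only illustrative) prints those arguments and the collector names corresponding to the chosen GC.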

2.3 Summary of Performance Tuning #

The first step in performance tuning is to define metrics and collect data; the second is to identify the bottlenecks; then we analyze and resolve the bottleneck problems. Through these steps we find the system’s current performance ceiling: the TPS and QPS beyond which load testing and tuning can no longer push the system are its limits. Once the limits are known, we can estimate traffic and system load from business growth, plan capacity, prepare machine resources, and prepare scaling plans in advance. Finally, during day-to-day operation we keep observing and iterating on the steps above to continuously improve system performance.

We often say that “talking about performance without a concrete scenario is pointless.” In real performance analysis and tuning we must consider the specific business scenario, weigh cost against performance, and choose the most suitable approach. If a system optimized to 3,000 TPS already meets the needs of business growth at an affordable cost, pushing it to 3,100 TPS is meaningless, and spending double the cost to reach 5,000 TPS is just as meaningless.

Donald Knuth once said, “Premature optimization is the root of all evil.” We should optimize at the right time. In the early stages of a business, when volume is small, performance is not that important. When building a new system, we first make sure the overall design is sound and the functionality is implemented correctly; only after the basic functionality is largely in place (the overall framework must of course meet the performance baseline, which may need to be verified in a proof-of-concept (POC) phase during project preparation) do we turn to performance optimization. If we think about optimization from the very beginning, we tend to overthink and over-design. Moreover, the main framework and functionality may still change significantly, and optimizing too early means those changes can invalidate the earlier work and force re-optimization, wasting a lot of effort.