
01 Theoretical Analysis: What Metrics to Pay Attention to in Performance Optimization #

This lesson introduces the metrics used in performance optimization, along with the related theoretical methods and points to keep in mind.

Metrics are important references for measuring things and making decisions. For example, when you plan to buy a car, you pay attention to many metrics such as power, fuel economy, braking performance, handling stability, smoothness, passability, emissions, and noise. Each of these metrics has corresponding tests and parameters, and you weigh them one by one.

We all understand this principle, but when it comes to performance optimization, we often pick the wrong direction for lack of a theoretical basis and end up guessing blindly. Whether an optimization achieves its goal cannot be judged by intuition alone; there is a set of metrics for measuring your improvements. If performance gets worse rather than better after a change, it cannot be called performance optimization.

Performance means completing work with limited resources in limited time. The main measure is time, so many metrics can use time as the horizontal axis.

“Slow-loading websites will be punished by search ranking algorithms, leading to a decline in website ranking.” Therefore, the speed of loading is a very intuitive criterion for judging whether performance optimization is reasonable. However, performance metrics not only include the speed of single requests, but also more factors.

Next, let’s take a look at what metrics can help us make decisions.

What are the metrics? #


1. Throughput and Response Time #

In distributed high-concurrency applications, a single request is not the sole basis for evaluation; it is often a statistical result. The most commonly used metrics for performance evaluation are throughput and response time, both of which are critical concepts to consider. To understand the significance of these two metrics, we can liken them to intersections in traffic environments.

In highly congested traffic conditions, intersections are typical bottleneck points, where long traffic light durations often result in long queues forming behind.

The time it takes for a car to pass through the intersection, from the moment we start queuing to the moment we pass the traffic light, is the response time.

Of course, we can adjust the interval of the traffic lights so that for certain vehicles, the time it takes to pass may be shorter. However, if the signal lights change too frequently, it may actually reduce the number of vehicles passing through within a unit of time. From another perspective, we could also say that the throughput of this intersection has decreased.


As often mentioned in our development work, QPS represents the number of queries per second, TPS represents the number of transactions per second, HPS represents the number of HTTP requests per second, and so on; these are all commonly used quantitative indicators related to throughput.

When optimizing performance, we need to be clear about the goal: is it throughput or response time? Sometimes, even though individual response times are relatively slow, overall throughput is still very high, for example with batched database operations or merged buffer writes. Although latency increases, if our goal is throughput, this still counts as a significant performance improvement, as the sketch below illustrates.
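
To make that trade-off concrete, here is a minimal JDBC batching sketch. The in-memory H2 database, the event_log table, and the sample data are illustrative assumptions, not part of the original example: each write waits a little longer in the batch buffer, but the database sees one round trip instead of many.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.List;

// Minimal sketch: a single write becomes slightly slower (it waits in the batch),
// but the database handles one batched round trip instead of N small ones,
// so overall throughput goes up. H2 and the table name are illustrative only.
public class BatchInsertSketch {

    public static void writeBatch(List<String> messages) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS event_log(message VARCHAR(255))");
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO event_log(message) VALUES (?)")) {
                for (String msg : messages) {
                    ps.setString(1, msg); // each message waits in the batch buffer...
                    ps.addBatch();
                }
                ps.executeBatch();        // ...and is flushed in a single round trip
            }
        }
    }

    public static void main(String[] args) throws Exception {
        writeBatch(List.of("a", "b", "c"));
    }
}
```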

In general, we believe that:

  • Response time is improved by optimizing serial execution, that is, by reducing or speeding up the steps a single request has to go through.
  • Throughput is improved through parallel execution, that is, by making full use of computing resources to handle more work at the same time (see the sketch after this list).
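
The two directions can be illustrated with a minimal sketch. The three load methods are placeholders for real work such as RPC calls or database queries, and the thread-pool size is arbitrary; this is an illustration under those assumptions, not a prescribed design.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal sketch of the two optimization directions.
// The "load" methods stand in for real work (RPC calls, DB queries, etc.).
public class SerialVsParallel {
    static String loadUser()    { return "user"; }
    static String loadOrders()  { return "orders"; }
    static String loadCoupons() { return "coupons"; }

    public static void main(String[] args) {
        // Serial: response time is the sum of every step, so we optimize each step.
        String serial = loadUser() + loadOrders() + loadCoupons();

        // Parallel: independent steps run concurrently, so the response time
        // approaches the slowest step and the hardware is used more fully.
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CompletableFuture<String> user    = CompletableFuture.supplyAsync(SerialVsParallel::loadUser, pool);
        CompletableFuture<String> orders  = CompletableFuture.supplyAsync(SerialVsParallel::loadOrders, pool);
        CompletableFuture<String> coupons = CompletableFuture.supplyAsync(SerialVsParallel::loadCoupons, pool);
        String parallel = user.join() + orders.join() + coupons.join();

        pool.shutdown();
        System.out.println(serial + " / " + parallel);
    }
}
```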

Typically, our optimization efforts primarily focus on response time because once the response time improves, the overall throughput naturally increases as well.

However, for high-concurrency internet applications, both response time and throughput are required. These applications are advertised as high-throughput, high-concurrency scenarios, and users have low tolerance for system delays. We need to find a balance using limited hardware resources.

2. Measuring Response Time #

Since response time is so important, let’s take a closer look at how to measure it.

(1) Average Response Time

The most commonly used metric is the Average Response Time (AVG), which reflects the average processing capability of the service interface. It is essentially the sum of all request durations divided by the number of requests. For example, if there are 10 requests, with 2 taking 1ms, 3 taking 5ms, and 5 taking 10ms, then the average duration is (2*1 + 3*5 + 5*10) / 10 = 6.7ms.

Unless there is a serious issue with the service over a period of time, the average response time will usually remain fairly stable. Because high-concurrency applications handle a very large number of requests, the effect of long-tail requests is quickly averaged away: many users may actually be getting slow responses, yet this will not show up in the average duration metric.

To address this issue, another commonly used metric is the Percentile.

(2) Percentile


This is also easy to understand. We define a time window, collect the duration of each request in that window into a list, and sort the durations in ascending order. By taking the duration at a specific percentile position, we get the TP value. As you can see, the TP (Top Percentile) value is a statistical term, of the same family as the median and the average.

Its significance lies in the fact that more than N% of the requests are returned within X time. For example, TP90 = 50ms means that more than 90% of the requests are returned within 50ms.
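
To make the calculation concrete, here is a minimal sketch that computes the average and TP values from a list of request durations. It reuses the made-up durations from the AVG example above and a simple nearest-rank percentile; it is an illustration, not a full metrics library.

```java
import java.util.Arrays;

// Minimal sketch: computing AVG and TP values from recorded request durations.
// The sample data matches the worked example in the text (2x1ms, 3x5ms, 5x10ms).
public class PercentileSketch {

    // Nearest-rank percentile: the duration at or below which
    // `percentile` percent of the sorted requests fall.
    static long tp(long[] sortedDurations, double percentile) {
        int index = (int) Math.ceil(percentile / 100.0 * sortedDurations.length) - 1;
        return sortedDurations[Math.max(index, 0)];
    }

    public static void main(String[] args) {
        long[] durations = {1, 1, 5, 5, 5, 10, 10, 10, 10, 10}; // ms
        Arrays.sort(durations);

        double avg = Arrays.stream(durations).average().orElse(0);
        System.out.println("AVG  = " + avg + " ms");            // 6.7 ms, as in the text
        System.out.println("TP90 = " + tp(durations, 90) + " ms");
        System.out.println("TP99 = " + tp(durations, 99) + " ms");
    }
}
```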

This metric is also very important, as it reflects the overall response behavior of an application interface. For example, if there is a long GC (Garbage Collection) pause during a certain period, the higher percentiles will fluctuate severely, while the lower percentile values will barely change.

We usually divide the percentiles into TP50, TP90, TP95, TP99, TP99.9, etc. The higher the percentile value requirement, the higher the stability requirement for the system’s response capabilities.

For systems with such high stability requirements, the goal is to eliminate the long-tail requests that severely impact the system. When collecting performance data for these interfaces, we rely on more detailed logging rather than on metrics alone. For example, for any call that takes more than 1s, we output its input parameters and execution steps to the logging system in detail.
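
A minimal sketch of that idea follows. The 1-second threshold, the logger name, and the use of java.util.logging are illustrative choices rather than a prescribed implementation.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Minimal sketch: record detailed information only for long-tail requests.
public class SlowRequestLogger {
    private static final Logger LOG = Logger.getLogger("slow-requests");
    private static final long THRESHOLD_MS = 1000;

    public static <T> T callWithSlowLog(String name, Object params, Supplier<T> call) {
        long start = System.currentTimeMillis();
        try {
            return call.get();
        } finally {
            long cost = System.currentTimeMillis() - start;
            if (cost > THRESHOLD_MS) {
                // In a real system this would go to a central log/trace platform.
                LOG.warning(name + " took " + cost + "ms, params=" + params);
            }
        }
    }

    public static void main(String[] args) {
        // Demo call that deliberately exceeds the threshold and gets logged.
        String result = callWithSlowLog("demoCall", "id=42", () -> {
            try { Thread.sleep(1200); } catch (InterruptedException ignored) { }
            return "ok";
        });
        System.out.println(result);
    }
}
```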

3. Concurrency #

Concurrency refers to the number of requests that a system can handle simultaneously, and this metric reflects the system’s load capacity.

In high-concurrency applications, high throughput alone is not enough; it must also be able to provide services to multiple users simultaneously. High concurrency can lead to serious contention issues for shared resources, so we need to reduce resource conflicts and behaviors that occupy resources for a long time.

Designing with response time in mind is generally effective, because reducing response time naturally increases the number of requests that can be handled at the same time. It is worth noting that even in a flash-sale system that has gone through multiple layers of filtering and processing, the concurrency arriving at a particular node is typically only around fifty or sixty. In ordinary designs, unless the expected concurrency is particularly high, we do not need to pay excessive attention to this metric.

4. Page load time #

In the era of mobile internet, especially for the pages in apps, achieving a fast load time is crucial for providing optimal user experience. If a page can be fully loaded within 1 second, users can enjoy a smooth experience without feeling anxious or impatient.

Generally, different page-load standards can be set according to business needs, for example using the percentage of pages that finish loading within 1 second as the page-load metric. Outstanding companies in the industry, such as Taobao, are able to keep this percentage above 80%.

5. Accuracy #

Let me tell you an interesting story. We had a technical team that was conducting tests and found that the API responses were very smooth. Even after increasing the concurrency to 20, the application’s API responses remained very fast.

However, when the application was launched, a major incident occurred because the API responses returned unusable data.

The cause of the problem was easy to identify: the project used circuit breaking. During the load test, the API exceeded its service capacity and the circuit breaker tripped, but the test never checked whether the API responses were actually correct, which was a very basic mistake.

Therefore, when conducting performance evaluations, do not forget the key element of accuracy.
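
As a minimal sketch of that lesson: during a load test, assert on the content of the response, not only on status codes and latency. The endpoint URL and the expected "orderId" field are made-up examples, and java.net.http requires Java 11 or later.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal sketch: verify correctness, not just speed, while load testing.
// A fast 200 response carrying a degraded/fallback payload is still a failure.
public class AccuracyCheck {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/api/order/123")).build();

        for (int i = 0; i < 100; i++) {
            long start = System.nanoTime();
            HttpResponse<String> resp = client.send(request, HttpResponse.BodyHandlers.ofString());
            long costMs = (System.nanoTime() - start) / 1_000_000;

            boolean ok = resp.statusCode() == 200 && resp.body().contains("\"orderId\"");
            System.out.println("cost=" + costMs + "ms, correct=" + ok);
        }
    }
}
```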

What are some theoretical methods? #

There are many theoretical methods for performance optimization, such as the barrel theory, benchmark testing, Amdahl’s Law, etc. Below, we will briefly explain the two most commonly used ones.

1. Barrel Theory #

To maximize the amount of water a barrel can hold, each plank of wood must be of the same length and undamaged. If any plank fails to meet these conditions, the barrel cannot hold the maximum amount of water.

The amount of water that can be held depends on the shortest plank of wood, not the longest.

The barrel theory is also very suitable for explaining system performance. The components that make up a system have varying speeds. The overall performance of the system depends on the slowest component in the system.

For example, in a database application, the most significant constraint on performance is the I/O issue of writing data to disk. In other words, the disk is the bottleneck in this scenario, and our primary task is to address this bottleneck.

2. Benchmark Testing and Warmup #

Benchmark testing is not just a simple performance test, but a test to measure the optimum performance of a program.

Application interfaces often show brief timeouts immediately after startup. Before running the test, you need to warm up the application to eliminate the influence of factors such as JIT compilation. In Java, the JMH benchmarking component can eliminate these differences for you, as the sketch below shows.
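
Here is a minimal JMH sketch showing how warmup is declared: the warmup iterations run first and are discarded, so JIT compilation and class loading do not distort the measured numbers. The benchmarked method body is a placeholder and the iteration counts are arbitrary; JMH must be added to the project as a dependency.

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// Minimal JMH sketch: warmup iterations run first and are discarded,
// so only the warmed-up, JIT-compiled code is measured.
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(1)
public class WarmupBenchmark {

    @Benchmark
    public String concat() {
        // Placeholder workload; replace with the code you actually want to measure.
        return "hello" + System.nanoTime();
    }

    public static void main(String[] args) throws Exception {
        new Runner(new OptionsBuilder()
                .include(WarmupBenchmark.class.getSimpleName())
                .build()).run();
    }
}
```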

Points to note #


1. Rely on numbers instead of speculations #

Some students have a good feel for programming and can identify system bottlenecks through guesswork. While this may be possible, it is highly discouraged. Complex systems often have multiple factors influencing performance. We should prioritize performance analysis over optimization and use intuition as an auxiliary, but not as a tool for drawing conclusions.

When optimizing performance, we usually prioritize the analyzed results based on difficulty and impact. We start by tackling the most impactful points and then address the other factors one by one.

Some optimizations may introduce new performance issues, and sometimes these new issues can lead to even worse performance degradation. You need to evaluate this chain reaction and ensure that such optimizations are indeed necessary. Moreover, it is important to use numbers to measure this process rather than relying on intuition or speculation.

2. Individual Data Points Are Not Reliable Enough #

Have you ever experienced a well-known website loading very slowly, taking x seconds just to open? In fact, it is inappropriate to conclude that the site is “slow” from a single individual request, yet when we conduct performance evaluations we often fall into exactly this misconception.

This is because a small batch of individual requests has limited reference value. Response time may vary depending on the user’s data, as well as on device and network conditions.

A reasonable approach is to find patterns from statistical data, such as the average response time and TP values mentioned above, or even the histogram of response time distribution. All of these can help us evaluate the quality of performance.

3. Avoid Premature Optimization and Excessive Optimization #

Although performance optimization has many benefits, it does not mean that we should strive for perfection in every aspect. Performance optimization also has its limits. It is more difficult to make a program run correctly than to make it run faster.

The pioneer of computer science, Donald Knuth, once said, “Premature optimization is the root of all evil”, and this is the truth.

If an improvement does not bring significant value, why should we spend a lot of effort on it? For example, if an application already meets the throughput and response requirements of users, but some colleagues are obsessed with optimizing the JVM and spend a lot of effort on parameter testing, this kind of optimization is excessive.

Time should be spent on the critical points. We need to find the most urgent performance issues and tackle them. For example, if a system is primarily slow in database queries, but you spend a lot of effort optimizing Java coding conventions, this is a typical case of deviating from the goal.

In general, code optimized for performance is often difficult to understand because it sacrifices readability and compromises on structure. Clearly, premature optimization allows these hard-to-maintain characteristics to enter your project early, and when it comes time for code refactoring, it will require even greater effort to address them.

The correct approach is to consider project development and performance optimization as two separate steps. Performance optimization should be done when the overall architecture and functionality of the project are relatively stable.

4. Maintain good coding habits #

As we mentioned above, we should not optimize too early or over-optimize, but that doesn’t mean we should not consider these issues when coding.

For example, maintaining good coding standards allows for easy code refactoring. Using appropriate design patterns and properly dividing modules allows for targeted optimization for performance and structural issues.

Good habits accumulated in the pursuit of high-performance, high-quality code also build up into lasting professional discipline and quality, which benefits us well beyond any single project.

Summary #

In this lesson, we briefly introduced some performance metrics, such as throughput and response time, and discussed other influencing factors, such as concurrency, page load time, and accuracy.

At the same time, we also covered two theoretical methods, the barrel theory and benchmark testing, and introduced some misconceptions and considerations in performance testing. Now you should have a better understanding of how to describe performance. Professional performance testing tools, such as JMeter and LoadRunner, extend these basic performance metrics. In our daily work, we should also use professional terminology as much as possible in order to assess system performance correctly.

Now that we understand the optimization metrics and have a sense of direction, what should we focus on next? Are there any rules to follow for Java performance optimization?

In the next lesson, we will introduce the various considerations for performance optimization as a whole.