02 How to Formulate Performance Tuning Strategies

Hello, I am Liu Chao.

In the previous lesson, when introducing the importance of performance tuning, I mentioned performance testing. Faced with increasingly complex systems, designing sound performance tests helps us identify performance bottlenecks in advance and then develop targeted tuning strategies. In short, it is a three-step process: test, analyze, tune.

Today, based on this foundation, let’s discuss in depth “how to develop performance tuning strategies for a system”.

Performance Testing Guide #

Performance testing is a necessary measure to identify performance bottlenecks in advance and ensure system performance stability. Below, I will introduce you to two commonly used testing methods to help you test system performance comprehensively.

1. Microbenchmark Performance Testing #

Microbenchmark performance testing can accurately locate performance issues in a specific module or method. It is particularly suitable for comparing the performance of a function module or method under different implementations. For example, comparing the performance of a method using synchronous and asynchronous implementations.
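As a sketch of the idea (the class and method names below are hypothetical, not from this lesson), the snippet times two implementations of the same task. For serious measurements, a dedicated harness such as JMH is preferable, since it handles JIT warm-up, dead-code elimination, and statistical reporting:

```java
// Naive microbenchmark sketch (illustration only): compares two
// implementations of the same string-building task.
public class ConcatBenchmark {

    // Implementation A: String concatenation in a loop.
    static String concatWithPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // Implementation B: StringBuilder.
    static String concatWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    // Times a task over many iterations; returns average nanoseconds per call.
    static long averageNanos(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        int warmup = 1_000, measured = 1_000;
        // Warm up both paths first so the JIT has a chance to compile them.
        averageNanos(() -> concatWithPlus(100), warmup);
        averageNanos(() -> concatWithBuilder(100), warmup);
        System.out.println("plus:    " + averageNanos(() -> concatWithPlus(100), measured) + " ns/op");
        System.out.println("builder: " + averageNanos(() -> concatWithBuilder(100), measured) + " ns/op");
    }
}
```

Comparing the two printed numbers is the essence of a microbenchmark: same input, same workload, two implementations side by side.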

2. Macrobenchmark Performance Testing #

Macrobenchmark performance testing is a comprehensive test that needs to consider the testing environment, testing scenarios, and testing objectives.

First, let’s look at the testing environment. We need to simulate the real environment of the production system.

Next, let’s consider the testing scenarios. We need to determine whether there are other business interfaces running in parallel while testing a specific interface, causing interference. If there is, we need to take it seriously because neglecting such interference can lead to biased test results.

Finally, let’s look at the testing objectives. Our performance testing should have specific goals. We can measure the system’s compliance by throughput and response time. If it doesn’t meet the standards, we need to optimize it. If it does meet the standards, we can increase the concurrency in testing and explore the maximum transactions per second (TPS) of the interface. By doing this, we can gain a deeper understanding of the interface’s performance. In addition to testing the throughput and response time of the interfaces, we also need to cyclically test interfaces that may cause performance issues and observe the CPU, memory, and I/O utilization rates of each server.

These are the detailed explanations of the two testing methods. It is worth noting that performance testing is subject to interference factors, which can make the test results inaccurate. Therefore, when conducting performance testing, we need to pay attention to several issues.

1. Warm-up Issue #

When conducting performance testing, we find that the system runs faster and faster over time: accesses after the initial ones can be several times faster than the very first one. Why is that?

In Java, after a .java file is compiled into a .class file, the machine cannot execute the bytecode in the .class file directly; an interpreter must translate the bytecode into native machine code for it to run. To save memory, when the code is first executed, the interpreter interprets and executes it statement by statement rather than compiling everything up front.

As the code is executed more times, when the virtual machine finds that a method or code block is running frequently, it will consider these codes as hot spot code. In order to improve the efficiency of hot spot code execution, the virtual machine will compile these codes into machine code related to the local platform through a just-in-time (JIT) compiler during runtime. It will also optimize these codes at various levels and store them in memory. After that, every time the code is executed, it can be directly obtained from memory.

Therefore, during the initial running stage, the virtual machine will spend a long time optimizing the code comprehensively, and then it can be executed at the highest performance.

This process is called warm-up. During performance testing, the warm-up phase makes the first accesses slow. If this distorts your results, consider letting the system warm up before taking measurements.
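A minimal way to observe warm-up (illustrative code, not from the original lesson) is to time the same workload in batches; the early batches typically run slower while the hot method is still being interpreted. Running with `-XX:+PrintCompilation` shows when the JIT kicks in:

```java
public class WarmupDemo {
    // A small workload, hot enough to trigger JIT compilation when looped.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        // Time identical batches back to back; the first batches are
        // usually slower because the method has not been compiled yet.
        for (int batch = 1; batch <= 5; batch++) {
            long start = System.nanoTime();
            for (int i = 0; i < 10_000; i++) {
                sumOfSquares(1_000);
            }
            System.out.println("batch " + batch + ": "
                    + (System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }
}
```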

2. Unstable Performance Testing Results #

When we conduct performance testing, we find that although each test processes the same data set, the results vary. This is because there are many unstable factors during testing, such as other processes running on the same machine, network fluctuations, and the JVM being in a different garbage collection phase on each run.

We can conduct multiple tests and average the results or create a curve graph. As long as the average value is within a reasonable range and the fluctuation is not significant, this scenario can be considered as a pass for the performance testing.
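The averaging step can be sketched as follows (hypothetical helper class; the sample values and any pass threshold are assumptions, not standards):

```java
import java.util.List;

public class ResultStats {
    // Average of repeated test results (e.g. TPS from several runs).
    static double mean(List<Double> samples) {
        return samples.stream().mapToDouble(Double::doubleValue).average().orElse(0);
    }

    // Relative fluctuation: (max - min) / mean. A small value suggests
    // the runs are stable enough to trust the average.
    static double fluctuation(List<Double> samples) {
        double max = samples.stream().mapToDouble(Double::doubleValue).max().orElse(0);
        double min = samples.stream().mapToDouble(Double::doubleValue).min().orElse(0);
        return (max - min) / mean(samples);
    }

    public static void main(String[] args) {
        // Five hypothetical TPS measurements of the same scenario.
        List<Double> tpsRuns = List.of(980.0, 1010.0, 995.0, 1005.0, 1000.0);
        System.out.printf("mean TPS: %.1f, fluctuation: %.1f%%%n",
                mean(tpsRuns), fluctuation(tpsRuns) * 100);
    }
}
```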

3. Impact in the Case of Multiple JVMs #

If our server has multiple Java application services deployed in different Tomcat instances, it means that our server will have multiple JVMs. Any JVM has the right to use the system’s resources. If only one JVM is deployed on a machine and the performance test is conducted, the test results may be positive, or the optimization effect may be good. However, in the case of multiple JVMs on a machine in the live environment, the results may not be the same. Therefore, we should try to avoid deploying multiple JVMs on a single machine in the production environment.

Reasonable Analysis Results and Optimization Strategies #

Here I will cover the analysis and optimization steps together, as the remaining two parts of the "test, analyze, tune" three-step process.

After completing the performance test, we need to generate a performance test report to help analyze the system's performance. The report should include the average, maximum, and minimum throughput and response time of the tested interface, the server's CPU, memory, disk I/O, and network I/O utilization, as well as the JVM's garbage collection (GC) frequency.

By examining these metrics, we can identify performance bottlenecks and then analyze and troubleshoot them using a bottom-up approach. First, we check whether CPU, memory, I/O, or network utilization is abnormal at the operating system level, then use commands to search for exception logs, and finally analyze those logs to find the cause of the bottleneck. We can also check the JVM level of the Java application to see whether garbage collection frequency or memory allocation is abnormal, and analyze the relevant logs to locate the cause.

If there are no abnormalities at both the system and JVM levels, we can then check if there are any performance bottlenecks at the application service layer, such as Java programming issues, read/write data bottlenecks, etc.

Analyzing and troubleshooting performance issues is a complex and meticulous process. A performance issue may be caused by a single reason or by multiple factors. We can use a bottom-up approach to troubleshoot and a top-down approach to optimize the system’s performance. Below, I will introduce several optimization strategies from the application layer to the operating system layer.

1. Optimize the code #

Problematic code at the application layer often exposes itself through the system resources it consumes. For example, if a code segment causes a memory overflow, the JVM heap is usually exhausted; system memory is depleted, and the resulting frequent garbage collection keeps CPU utilization persistently high, further consuming system CPU resources.

There are also performance issues caused by code that is not outright wrong; these are harder to identify and require experience to optimize. For example, when using a LinkedList collection, traversing the container with an indexed for loop significantly reduces reading efficiency, yet this slowdown may not show up as any abnormal system performance metric.

Experienced developers will switch to an Iterator to traverse the collection. This is because LinkedList is implemented as a doubly linked list: accessing elements by index in a for loop (via list.get(i)) forces a traversal from one end of the list on every access, reducing reading efficiency.
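A minimal sketch of the two traversal styles (illustrative class name):

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class LinkedListTraversal {
    // Anti-pattern: list.get(i) on a LinkedList walks the chain from one
    // end on every call, so the whole loop is O(n^2).
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Preferred: an Iterator (or a for-each loop, which uses one under the
    // hood) advances node by node, keeping the traversal O(n).
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            sum += it.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 1; i <= 100_000; i++) list.add(i);
        // Same result as sumByIndex, but with far fewer node hops.
        System.out.println(sumByIterator(list));
    }
}
```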

2. Optimize the design #

There are many design patterns in object-oriented programming that can help optimize the design of code at the business and middleware layers. After optimization, code can not only be streamlined but overall performance can also be improved. For example, in scenarios where objects are frequently created and used, the Singleton pattern can be applied to share a single instance, reducing the performance cost of frequent object creation and destruction.
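One common way to express a lazy, thread-safe singleton in Java is the initialization-on-demand holder idiom; the ConnectionManager name below is hypothetical:

```java
public class ConnectionManager {
    // Private constructor prevents outside instantiation.
    private ConnectionManager() { }

    // The JVM guarantees the holder class is initialized exactly once,
    // on first access, so no explicit locking is needed.
    private static class Holder {
        static final ConnectionManager INSTANCE = new ConnectionManager();
    }

    public static ConnectionManager getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        // Every caller shares the same instance.
        System.out.println(getInstance() == getInstance()); // prints true
    }
}
```

The instance is created lazily (only when first requested) and shared afterwards, avoiding the cost of repeatedly creating and destroying the object.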

3. Optimize the algorithm #

Well-designed algorithms can significantly improve system performance. For example, using appropriate search algorithms in different scenarios can reduce time complexity.
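For instance, on sorted data a binary search reduces lookup cost from O(n) to O(log n). A minimal illustrative sketch:

```java
public class SearchDemo {
    // Linear scan: O(n), but works on unsorted data.
    static int linearSearch(int[] a, int key) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == key) return i;
        }
        return -1;
    }

    // Binary search: O(log n), but requires sorted input.
    static int binarySearch(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            if (a[mid] < key) lo = mid + 1;
            else if (a[mid] > key) hi = mid - 1;
            else return mid;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {2, 5, 8, 13, 21, 34};
        System.out.println(binarySearch(sorted, 13)); // prints 3
    }
}
```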

4. Time-for-space trade-off #

Sometimes, the system does not have high speed requirements for queries but has strict requirements for storage space. At such times, we can consider using more time to save space.

For example, as explained in Lesson 03, using the intern() method of the String class stores frequently repeated strings in the string constant pool and reuses the same object, greatly saving memory. However, since the constant pool is implemented as a hash table, storing too much data in it degrades query performance. Therefore, in scenarios where storage capacity requirements are stringent but query speed is not a priority, we can consider the time-for-space trade-off.
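A small demonstration of the sharing that intern() provides (the string values are arbitrary examples):

```java
public class InternDemo {
    // After interning, strings with identical content resolve to the
    // same pooled object (== compares object identity, not content).
    static boolean sameAfterIntern(String x, String y) {
        return x.intern() == y.intern();
    }

    public static void main(String[] args) {
        // Two strings built at runtime occupy two separate heap objects,
        // even though their contents are identical.
        String a = new String("user-status-active");
        String b = new String("user-status-active");
        System.out.println(a == b);                // prints false: distinct objects
        System.out.println(sameAfterIntern(a, b)); // prints true: one pooled copy
    }
}
```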

5. Space-for-time trade-off #

This method utilizes storage space to improve access speed. Many systems nowadays use MySQL databases, and sharding is a typical example of using space-for-time trade-offs.

When storing data in a single MySQL table with over tens of millions of records, the read/write performance will significantly decrease. At this point, we need to split the table data by a specific field hash value or other means. When querying data, the system will determine which table to access based on the hash value of the condition. Due to the reduced amount of data in each table, the query performance is improved.
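A minimal routing sketch (the table name, table count, and sharding key are placeholders):

```java
public class ShardRouter {
    private static final int TABLE_COUNT = 16; // e.g. order_0 .. order_15

    // Route a record to a physical table by hashing the sharding key.
    // Masking with Integer.MAX_VALUE forces the hash non-negative;
    // Math.abs alone is unsafe because Math.abs(Integer.MIN_VALUE)
    // is still negative.
    static String tableFor(String userId) {
        int bucket = (userId.hashCode() & Integer.MAX_VALUE) % TABLE_COUNT;
        return "order_" + bucket;
    }

    public static void main(String[] args) {
        // The same key always maps to the same table, so reads can be
        // routed exactly like writes.
        System.out.println(tableFor("user-10086"));
    }
}
```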

6. Parameter tuning #

Optimizing business layer code alone is not enough. Optimization of JVM, web containers, and operating system parameters is also crucial.

By reasonably setting JVM memory space and garbage collection algorithms based on specific business requirements, system performance can be improved. For example, if our business involves creating a large number of large objects, we can configure the JVM to directly allocate those objects to the tenured generation. This can reduce the frequency of minor garbage collections, decrease CPU usage time, and improve system performance.
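As an illustration only (the flag values below are placeholders, not recommendations; real settings must come from profiling the actual workload):

```shell
# Illustrative JVM startup flags for a large-object-heavy service:
java \
  -Xms4g -Xmx4g \
  -XX:NewRatio=2 \
  -XX:PretenureSizeThreshold=1048576 \
  -jar app.jar

# -Xms/-Xmx: fix the heap size so the JVM never pauses to resize it.
# -XX:NewRatio: ratio of tenured-generation to young-generation space.
# -XX:PretenureSizeThreshold: objects larger than this (here 1 MB) are
#   allocated directly in the tenured generation, reducing minor GCs.
#   Note: this flag is honored by the Serial and ParNew collectors only.
```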

Improper thread pool settings in web containers and kernel parameter settings in Linux operating systems can also cause performance bottlenecks. Optimizing these two aspects based on specific business scenarios can enhance system performance.

Backup strategies to ensure system stability #

All the performance optimization strategies mentioned above are means to improve system performance. However, in this rapidly developing era of the internet, the number of users for a product is constantly changing. No matter how well we optimize our system, there will still be a limit to its capacity. Therefore, to ensure system stability, we need to adopt some backup strategies.

What are backup strategies? #

First, we can implement rate limiting by setting a maximum access limit at the system's entry point; this limit can be based on the TPS (transactions per second) measured during stress testing. At the same time, we can take circuit-breaking measures to gracefully handle the requests that are rejected.
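One simple way to cap in-flight requests with a fast-fail fallback is a counting semaphore; a sketch with hypothetical class and method names:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class EntryRateLimiter {
    // Permits cap concurrent in-flight requests; the limit should come
    // from stress-test results (the value in main is a placeholder).
    private final Semaphore permits;

    EntryRateLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Returns the handler's result, or the fallback when the system is
    // saturated: a simple fast-fail form of circuit breaking.
    String handle(Supplier<String> handler, String fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // reject immediately instead of queueing
        }
        try {
            return handler.get();
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        EntryRateLimiter limiter = new EntryRateLimiter(100);
        System.out.println(limiter.handle(() -> "order placed", "system busy, retry later"));
    }
}
```

Production systems usually reach for a dedicated library or gateway for this, but the principle is the same: requests beyond the tested capacity are turned away quickly and politely rather than allowed to overload the system.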

Second, we can implement intelligent horizontal scaling. Intelligent horizontal scaling ensures that when the traffic exceeds a certain threshold, the system can automatically add new services based on demand.

Third, we can proactively scale up in advance. This approach is typically used for highly concurrent systems, such as flash-sale business systems, because horizontal scaling cannot react to a massive burst of requests in an instant; by the time new instances come online, the flash sale may already be over.

Currently, many companies deploy application services in Docker containers and use Kubernetes as the container management system. Kubernetes can perform both automatic horizontal scaling and proactive scaling of the Docker-based services.

Summary #

After studying this lesson, you should have some understanding of performance testing and performance tuning. Let’s review today’s content through a figure.

[Figure: summary of this lesson's performance testing methods and tuning strategies]

We divide performance testing into microbenchmark performance testing and macrobenchmark performance testing. The former allows us to accurately optimize the business functions of small units, while the latter can simulate the overall online environment by combining internal and external factors to test system performance. The combination of these two methods can comprehensively test system performance.

The test results can help us formulate performance tuning strategies. There are many tuning methods, and I will not go into each in detail here. They do share one common point: the specific strategies vary, but the ideas and core concepts are the same, starting from business optimization, then programming optimization, and finally system optimization.

Lastly, I want to remind you that any tuning should be based on clear scenarios, known issues, and performance goals. Tuning just for the sake of tuning may introduce new bugs and bring risks and disadvantages.

Thought-provoking question #

Suppose you are responsible for an e-commerce system that is about to launch a flash sale for new products. Which specific functionalities would you cover with microbenchmark performance testing, and which with macrobenchmark performance testing?

Looking forward to seeing your answers in the comments section. Feel free to click on “Please share with friends” to share today’s content with your friends and invite them to join the discussion.