03 Core Analysis Logic: All Performance Analyses Rely on These Seven Steps #

Hello, this is Gao Lou.

I have read a number of performance analysis methodologies, such as the SEI Load Testing Planning Process, the RBI Methodology, and the Performance Degradation Curve Analysis Method. Many of them stay at the conceptual level and give no specific implementation details, so after reading them it is still unclear what to do next. In my opinion, such methodologies are completely unnecessary.

Let me extend this point. Once foreign concepts are translated into Chinese, many of them stop at being known and understood and are never widely applied. The methodologies above, for example, are unfamiliar to many people in the performance industry, which shows that they lack an audience regardless of their quality. The few who do know them tend to translate the theory and use it as a label in documents, but when it comes to actual work they do not know how to proceed.

If I simply tell you that these methodologies are not effective, it would be just empty criticism. As educated individuals, we need to have supporting arguments. Let’s now look at some specific content.

If you search for “performance testing methodology” on any search engine (Baidu, Google, 360, and so on), you will find a lot of content copied from one site to another. It mainly describes how a test is carried out, and the process it describes is mostly limited to the testing stage. For example, here are a few descriptions of the “SEI Load Testing Planning Process” (the content is a bit long, but it is not the focus of this lesson, so you don’t have to read it too carefully).

The SEI Load Testing Planning Process is a method that focuses on load testing planning, with the goal of producing a “clear, understandable, and verifiable load testing plan.”

The SEI Load Testing Planning Process includes six areas of focus: objectives, users, use cases, production environment, testing environment, and test scenarios.

  1. Difference between production environment and testing environment: Due to the differences between the load testing environment and the actual production environment, the load testing results on the testing environment may not accurately reflect the actual performance of the application system in the production environment. To mitigate this risk, the testing environment must be carefully designed.

  2. User analysis: Users are the ones who are most concerned about and affected by the performance of the tested application system. Therefore, it is necessary to analyze user behavior and establish use cases and scenarios based on user behavior models.

  3. Use cases: Use cases are the processes of implementing business processes in a certain order and manner. For load testing, the role of use cases is mainly to analyze and break down key business processes, assess the frequency of each business process, and identify the risks of performance problems.

As the description above shows, this content focuses on the execution process of “testing.” The creator of this theory is Mark McWhinney. In 1992, together with John H. Baumert, he wrote a white paper similar to CMMI called “Software Measures and the Capability Maturity Model.”

In this 304-page white paper, Mark McWhinney describes software measures for four maturity levels: the repeatable level, the defined level, the managed level, and the optimizing level. The descriptions cover processes, impacts, costs, quality, stability, and other aspects.

There is no problem with such definitions themselves, but if an enterprise only obtains a certification without actually following it in specific projects, then this theory becomes meaningless.

In the performance industry, if we want to achieve practical results, we cannot obtain specific guidance from SEI, and that is the problem. We need specific guidelines for performance capacity and bottleneck analysis to demonstrate the ultimate value of performance projects. The lack of such guidelines has meant that many performance practitioners lack a reference path for growth. (I will not further analyze other performance methodologies here. If you are interested, you can look them up.)

This is also why, before diving into performance analysis cases, I want to talk about the core logic of performance analysis.

In Lesson 6 of my column “Performance Testing in Practice: 30 Lessons,” I thought I had covered all of the core analysis logic. I felt I had poured my heart and soul into it and would never need to write about analysis logic again. However, while writing this column, I still felt that something was missing.

One of the biggest challenges for performance engineers today is the lack of analytical thinking. Many people know a variety of tools, but assembling the data from those tools into a logical sequence is where they get stuck.

From the perspective of the “testing” industry, complete performance analysis cases are very rare. From the perspective of operations and other positions, some cases do exist. But most of these cases lack an analysis methodology that has been distilled to a higher level.

Therefore, I believe that a performance analysis column must have a section to clearly explain the analysis logic.

However, when writing this section, I no longer had that feeling of pouring out my heart and soul, because the purpose of this section is to “fix” performance analysis thinking in place. Yes, you read that correctly: “fix” in the sense of pinning it down, which means that after this section we will not use any other analysis approach.

I call this fixed analysis approach the “RESAR Performance Analysis Seven-Step Method” (please note that it is only one part of RESAR performance engineering, not the whole of it).

RESAR Performance Analysis Seven-step Method #

Following the RESAR performance engineering theory, our analysis logic is as follows:

Step 1: Stress Scenario Data. #

In my opinion, the two most important curves provided by stress testing tools are TPS (you can also call it by other names like RPS, HPS, CPS, etc., the name is not the key) and response time.

It does not matter which stress testing tool you use, or even whether you write your own multi-threaded tool, as long as it can provide these two curves. Threads or coroutines, either is fine, as long as the tool generates the corresponding stress according to the business logic.

Why do I say that the TPS and response time curves are the most important? What about other curves, such as throughput, click rate, and error rate? I think you will agree that the error rate only needs to be checked when there are errors. Curves such as throughput and click rate will inevitably follow the same trend as the TPS curve, so we don’t need to analyze them separately.

Therefore, in the first step, we only need to obtain the TPS and response time curves from the stress scenario.
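As a rough illustration of how little is needed for this step, the sketch below builds both curves from raw request samples. The sample format (completion time in seconds, elapsed time in milliseconds) and the function name `tps_and_response_time` are assumptions made for this example, not the output format of any particular stress testing tool.

```python
from collections import defaultdict

def tps_and_response_time(samples):
    """Aggregate (end_timestamp_seconds, elapsed_ms) samples into per-second curves.

    Returns a list of (second, tps, mean_response_ms) tuples sorted by time.
    """
    buckets = defaultdict(list)
    for end_ts, elapsed_ms in samples:
        # Group each completed request into the one-second window it finished in.
        buckets[int(end_ts)].append(elapsed_ms)
    return [
        (second, len(times), sum(times) / len(times))
        for second, times in sorted(buckets.items())
    ]

# Hypothetical samples: (completion time in seconds, elapsed milliseconds)
samples = [(0.2, 100), (0.7, 120), (0.9, 110), (1.1, 300), (1.8, 500)]
curves = tps_and_response_time(samples)
for second, tps, mean_ms in curves:
    print(f"t={second}s tps={tps} mean_rt={mean_ms:.1f}ms")
```

In this made-up data, TPS falls from 3 to 2 while the mean response time rises from 110 ms to 400 ms, which is the kind of trend the two curves are meant to expose.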

Step 2: Analyze Architecture Diagram. #

Next, we analyze the architecture diagram. What we need to do in this step is trace the path that the stress traffic takes, mainly to see how the links being analyzed relate to one another. If both the business logic and the deployment are complex, we can draw a business path and a deployment path separately. If they are not complex, one path is enough.

Step 3: Split Response Time. #

Here, I want to emphasize to you that splitting the response time is the key starting point for performance analysis. Many people, when they see a high response time, always start guessing where the system’s performance bottleneck is without further splitting. If you are also like this, you must change this mindset and not always be obsessed with the phenomenon.
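A minimal sketch of what splitting can look like, assuming a simple synchronous call chain where each node reports the total time it observed, including everything downstream. The node names and timings are hypothetical, and real systems with parallel or asynchronous calls would need a tracing tool rather than this subtraction.

```python
def split_response_time(path, total_ms):
    """Return (node, self_time_ms) for each node on a linear call path.

    path: node names in call order.
    total_ms: time measured at each node, including its downstream calls.
    """
    breakdown = []
    for i, node in enumerate(path):
        # Time spent in the next hop is subtracted to get this node's own share.
        downstream = total_ms[path[i + 1]] if i + 1 < len(path) else 0
        breakdown.append((node, total_ms[node] - downstream))
    return breakdown

# Hypothetical path and measurements for one request type.
path = ["gateway", "order-service", "mysql"]
total_ms = {"gateway": 500, "order-service": 480, "mysql": 450}

breakdown = split_response_time(path, total_ms)
slowest = max(breakdown, key=lambda item: item[1])
print(breakdown)
print("look here first:", slowest[0])
```

With these numbers, 450 of the 500 ms are spent in the last hop, so guessing at the gateway or the service would be wasted effort.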

Step 4: Global Monitoring Analysis. #

Note that many tool platforms that appear to offer global monitoring are actually missing some counters. Therefore, we must fill in the missing performance counters according to the performance analysis decision tree. If a counter is hard to obtain on the current platform, supplement it with other tools or commands. Pay special attention to this.
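One way to check for gaps is to treat the decision tree as data and compare its leaves with what the platform actually collects. The sketch below is only an illustration: the tree shape and the counter names are made up for the example and are not the decision tree used in this column.

```python
def leaf_counters(tree):
    """Collect the counter names at the leaves of a nested decision tree."""
    counters = set()
    for value in tree.values():
        if isinstance(value, dict):
            counters |= leaf_counters(value)
        else:
            counters.update(value)
    return counters

# Hypothetical decision tree: module -> area -> counters to watch.
decision_tree = {
    "os": {
        "cpu": ["us", "sy", "wa", "si"],
        "memory": ["free", "swap_in", "swap_out"],
    },
    "jvm": {"gc": ["young_gc_count", "full_gc_count"]},
}

# Hypothetical set of counters the monitoring platform already collects.
monitored = {"us", "sy", "free", "young_gc_count"}

missing = sorted(leaf_counters(decision_tree) - monitored)
print("counters to supplement with other tools or commands:", missing)
```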

When I analyzed a problem for a banking client, they told me they had monitoring data at every level. In reality, the counters related to the problem were missing. This is quite common: many companies focus on high-level coverage and neglect the completeness of specific counters.

The “Global Monitoring Analysis” step is crucial, and the key is to understand the counters you are looking at well enough. If you look at the data and nothing stands out to you, you have not yet reached the level of analysis. At that point you can read specialized columns or books, search on Baidu (although Baidu may not help much here), or give up.

So how do we know if a global counter has a problem? This requires a solid foundation, which is what I often refer to as basic computer knowledge. Performance analysis covers a wide range and not all knowledge related to it will be labeled with the word ‘performance’.

People often ask what a reasonable GC frequency is. This is a difficult question to answer. As long as GC does not affect system capacity, it is acceptable. Therefore, we need to look at the relationship between GC and the system capacity curve first and then make a judgment.
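As a crude first look at that relationship, one can compare TPS in the intervals where GC paused the application with TPS in the intervals where it did not. The sketch below assumes two aligned per-second series that you have already collected; the data and the function name are hypothetical, and a gap between the two averages is a hint to investigate, not proof of cause.

```python
def tps_with_and_without_gc(tps, gc_pause_ms, pause_threshold_ms=0):
    """Return (mean TPS in intervals with GC pauses, mean TPS in intervals without)."""
    with_gc = [t for t, p in zip(tps, gc_pause_ms) if p > pause_threshold_ms]
    without_gc = [t for t, p in zip(tps, gc_pause_ms) if p <= pause_threshold_ms]

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(with_gc), mean(without_gc)

# Hypothetical aligned per-second series from one stress scenario.
tps = [1000, 990, 600, 1010, 580, 1000]
gc_pause_ms = [0, 0, 350, 0, 400, 0]

gc_mean, no_gc_mean = tps_with_and_without_gc(tps, gc_pause_ms)
print(f"mean TPS with GC pauses: {gc_mean}, without: {no_gc_mean}")
```

If the two averages were close, the GC frequency would be acceptable by the standard above, whatever its absolute value.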

In performance analysis, no counter can directly tell us, ‘I am sick!’ We can only judge whether it is sick or not by ourselves.

Step 5: Targeted Monitoring Analysis. #

After looking at the global monitoring counters, we can determine which area has issues and then proceed with targeted monitoring. Don’t start with code analysis, specific parameter adjustments, SQL optimization, and so on. It will not only be chaotic but also may not yield results.

In the ‘Targeted Monitoring Analysis’ step, it is crucial to correspond with the global monitoring counters. When we want to find a stack, we need to know why we are looking for a stack; and when we want to determine if there is an issue with the I/O parameters, we also need to know why we are looking for the I/O parameters.

In this way, the logical relationship between what comes before and what comes after forms what I have always emphasized in RESAR performance engineering: an “evidence chain.”

Step 6: Identify Performance Bottlenecks. #

With an evidence chain in hand, we can pin down the performance bottleneck. For example, to determine whether a stack shows lock contention, we need to find out which threads are waiting for the lock and which thread holds it. Similarly, to determine why an SQL statement is slow, we need to look at its execution process and identify the step that has the problem.
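For the lock example, a small script can pair waiters with holders in a Java thread dump. This is a sketch that assumes the `- locked <address>` and `- waiting to lock <address>` lines printed by jstack-style dumps; the sample dump is shortened and made up, and threads parked in `Object.wait()` or on `java.util.concurrent` locks would need extra handling.

```python
import re

THREAD_RE = re.compile(r'^"([^"]+)"')
LOCK_RE = re.compile(r"- (waiting to lock|locked) <(0x[0-9a-f]+)>")

def lock_contention(dump_text):
    """Map each contended monitor address to (holder thread, [waiting threads])."""
    holders, waiters = {}, {}
    current = None
    for line in dump_text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = header.group(1)  # a new thread section starts here
            continue
        lock = LOCK_RE.search(line)
        if lock and current:
            action, address = lock.groups()
            if action == "locked":
                holders[address] = current
            else:
                waiters.setdefault(address, []).append(current)
    return {addr: (holders.get(addr), names) for addr, names in waiters.items()}

# Hypothetical, shortened thread dump.
SAMPLE_DUMP = '''
"worker-1" #11 prio=5 tid=0x01 nid=0x1a runnable
   java.lang.Thread.State: RUNNABLE
    at com.example.Stock.deduct(Stock.java:42)
    - locked <0x000000076ab62208> (a java.lang.Object)

"worker-2" #12 prio=5 tid=0x02 nid=0x1b waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.example.Stock.deduct(Stock.java:40)
    - waiting to lock <0x000000076ab62208> (a java.lang.Object)

"worker-3" #13 prio=5 tid=0x03 nid=0x1c waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.example.Stock.deduct(Stock.java:40)
    - waiting to lock <0x000000076ab62208> (a java.lang.Object)
'''

contention = lock_contention(SAMPLE_DUMP)
for address, (holder, waiting) in contention.items():
    print(f"{address}: held by {holder}, waited on by {waiting}")
```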

Once we have identified the performance bottleneck, the next step is to find a solution.

Step 7: Determine the Solution. #

Knowing where the bottleneck is does not necessarily mean knowing how to solve it. Someone may see lock contention in a stack without knowing how to resolve it, or know that an SQL statement is slow without knowing how to optimize it. However, this step is the key to demonstrating the value of a performance project. No matter how difficult the previous steps are, providing a solution is always what performance engineers are ultimately there for.

These are the seven steps of RESAR performance analysis, and every performance analysis case uses them. In a specific case we may take only some of the steps, although it is also possible to go through all seven every time. If the analysis has already given us a clear problem point, we do not need to analyze it again.

For example, if we already know the problem point, we can proceed directly to targeted monitoring analysis without going back to Step 4. And if the performance bottleneck shows up as something other than long response times, we may not need Step 3. You will see how these ideas are applied in the case studies in later lessons.

Summary #

The core logic of performance analysis that we discussed in this lesson is the specific performance bottleneck analysis guidance in RESAR performance engineering. Without it, there would be no concrete steps for analysis. However, it loses its value if the core logic is not followed during implementation.

In these seven steps, corresponding knowledge systems are involved. For example, when building a performance analysis decision tree and searching for evidence chains of performance bottlenecks, we need strong technical knowledge to support it. It’s okay if one person doesn’t possess all the foundational knowledge; a team can be organized to work on this together.

For me, the RESAR Performance Analysis Seven-Step Method is the logic I rely on whenever I identify a performance bottleneck. It has helped me solve many problems I had never encountered before. If you want to apply this process, please remember the key point I have always emphasized: in performance analysis, as long as you know what to do next at each step, you will eventually find the specific cause of the bottleneck.

Homework #

Finally, please think about the following two questions:

  1. Why is the RESAR Performance Analysis Seven-Step Method necessary in performance projects?
  2. What kind of analysis logic did you use in the optimization cases you worked on before?

Feel free to write down your thoughts and answers in the comment section and discuss them with me. If you found this lesson helpful, feel free to share it with your friends. Their thoughts might give you even greater insights. See you in the next lesson!
