10 Designing Standard Scenarios: Key Points to Pay Attention To #

Hello, I am Gao Lou.

In previous lessons, we mentioned that in RESAR performance engineering, scenarios are divided into four categories: benchmark, capacity, stability, and abnormal. Each category of scenario corresponds to different goals.

Among them, benchmark scenarios are designed to identify obvious system configurations and software bugs, while also providing benchmark data for capacity scenarios. In the logic of RESAR performance engineering, benchmark scenarios are a very important part and not just an arbitrary experiment to see if the scenario can be run. There should be definitive conclusions.

In this lesson, I will explain several basic questions, such as how to determine the number of threads, the importance of incrementally increasing pressure threads, and how to apply the previously discussed analysis approach to specific analysis cases.

Let’s take a look together.

Performance Scenario Classification #

When designing performance scenarios, the first thing we need to clarify is the goal of the scenario. In some projects, we usually receive the following requirements:

  1. Evaluate the maximum capacity the system can support. The goal here is clear: find out the system's current capacity.
  2. Test and optimize the system so that it can support the online business goals. This requirement clearly calls for optimization.
  3. Test and evaluate whether performance capacity can meet business growth over the next few years. This requirement clearly calls for testing future business scenarios.

These are several types of performance requirements that we often receive. Based on this, I classify the scenarios into three categories according to their goals:

  1. Validation: Evaluate the current system capacity.
  2. Tuning: Evaluate and optimize the current system.
  3. Estimation: Evaluate and estimate the capacity of future systems.

What is the relationship between this classification and the classification by type that we have always emphasized (benchmark, capacity, stability, and abnormal)? Here I draw a figure to illustrate:

From the figure, it can be clearly seen that there is a relationship between these two classifications: we first determine the goal of the performance scenario, and then design the corresponding specific scenario.

You should note that among the three goals in the figure, each lower goal includes the one above it; for example, a scenario whose goal is tuning also includes the work of a validation scenario.

With these basic understandings, let me explain in detail below.

1. Classification by Goal #

For the three types of performance scenarios classified according to the goal, let’s take a look at them in combination with the RESAR performance process diagram.

  • Performance Validation

Performance validation (testing) refers to verifying whether there are performance changes in the current version of the system, the current model, and the current environment. Note that in this stage, we do not perform complex performance monitoring, performance analysis, or tuning.

In the current performance market, most projects are at the performance validation stage. For a system that has been running stably online for a long time, validating each version update simply by comparing data is reasonable. Such a project cycle usually lasts no more than one or two weeks, unless major performance bottlenecks appear.

In performance validation projects, many people only really carry out the "performance scenario execution" and "performance results/reports" steps. The other steps are not skipped outright, but they often amount to tweaking previous documents and going through the motions, on the assumption that no one will read them carefully anyway. As a result, such projects degenerate into running the same script, environment, and data for every version.

When you run such executions often enough, you develop a misconception: it turns out performance work is this boring, round after round, familiar postures and familiar flavors... Many of the performance practitioners I have met entered the field through such projects, came to believe they were quite good at it, and concluded that performance work is not that difficult.

  • Performance Tuning

Performance tuning refers to performing performance monitoring, performance analysis, and performance optimization for the current system, model, and environment, and providing specific conclusions. This is something that most projects should strive for but actually haven’t achieved.

If a project needs to deliver a conclusion like "this is the capacity the system can sustain after going online," then refining this scenario goal is critical.

Currently, many performance projects lack a clear conclusion. What does "providing a conclusion" mean? You may say, "I have reported the TPS and the CPU usage rate. Isn't that a conclusion?" I'm sorry, but I don't think it is. A conclusion should have business significance, such as "the system can support 10 million online users" or "it can support 10,000 concurrent users." Only then can it be called a conclusion.

No matter how many transactions per second (TPS) you provide, when your boss or someone else asks, “Will the system crash after 10 million users are online?” you may feel like you’ve been hit with a big stick and not know how to respond. At this point, the other person’s impression is that the performance has no specific value. No matter how tired and hard you work, the value of performance will be underestimated in this situation.
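As a hedged illustration of how a TPS figure can be translated into a business-level conclusion, Little's law (L = λ × W) relates throughput to the number of users in the system. The figures below are hypothetical, made up purely for illustration, not measurements from this project:

```java
// Illustration only: use Little's law to translate a measured TPS into an
// estimate of supported online users. All numbers here are hypothetical.
public class CapacityConclusion {
    public static void main(String[] args) {
        double tps = 1000.0;       // measured sustainable throughput (hypothetical)
        double responseTime = 0.2; // seconds per transaction (hypothetical)
        double thinkTime = 19.8;   // seconds a real user idles between actions (hypothetical)

        // Each online user issues one request every (responseTime + thinkTime)
        // seconds, so supported online users = TPS x (responseTime + thinkTime).
        double onlineUsers = tps * (responseTime + thinkTime);
        System.out.println("supported online users ~ " + (long) onlineUsers); // 20000
    }
}
```

This is exactly the kind of translation that turns "we measured 1,000 TPS" into a statement a boss can act on, provided the think-time assumption is agreed upon by the whole team.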

Recently, I had a conversation about how to demonstrate the value of performance with a friend who has more than ten years of experience. I said that if it were my project, I would make this commitment: within the scope of the performance scenario I am executing, I guarantee that the system will not crash online. If it does crash, I feel that this performance project should not be charged for.

It’s like buying a phone and then finding out that you can’t make calls. How would you feel? You would either return it or exchange it, all while feeling frustrated.

If that’s the case, why can’t we make such a commitment when it comes to performance? Just think about it: if you finish a project but can’t tell the other person whether the system will work well, why would they need you? They would just cancel the project and save costs.

In addition, looking at the process diagram of RESAR performance engineering, for performance tuning projects, we need to complete the entire process from “performance requirements” to “production operations and maintenance.” Note that this entire process is not just going through the motions; every step needs to be meticulously crafted.

  • Performance Estimation

Performance estimation is for future systems, future models, and future environments. We need to conduct rigorous analysis of business growth models and perform performance monitoring, analysis, and optimization during scenario execution, while providing specific conclusions. Many projects may want to achieve performance estimation, but often they just go through the motions.

In fact, in the scenario goals of performance estimation, if the future time to be estimated is not too far away, we can indeed estimate it based on the trend of business development, and this is also a reasonable scenario. The problem is when we encounter demands with exaggerated requirements, like a system that can go without downtime for ten years.

For projects involving performance estimation, we also need to complete the entire process from “performance requirements” to “production operations and maintenance.” However, there are two aspects that differ from performance tuning projects: “performance requirements” and “performance model.”

In performance estimation projects, performance requirements and performance models are not determined by performance testers, but by the entire team. Everyone, from top management to frontline employees, needs to have a unified understanding. Otherwise, when the project is finished, you won’t be able to answer the boss’s question of “Can it support 10 million concurrent users?”

The above describes the three types of performance scenarios that we divide based on our goals. Here, let me summarize them for you with a diagram. I hope you can have a clear understanding of them.

2. Classification by process #

As we mentioned before, performance scenarios can also be classified by process, which refers to how we should execute the scenarios and where the performance scenarios should start. I don’t know if you remember, but I drew a diagram like this in Lesson 5:

From the diagram, you can see that I have been emphasizing these four types of scenario execution processes:

  • Benchmark scenario
  • Capacity scenario
  • Stability scenario
  • Abnormal scenario

Please remember that these four scenarios are the only ones needed in performance scenarios.

You may ask if it’s that absolute. Yes, I am that stubborn.

In formal performance scenarios (the ones that require result reports), I want to reemphasize two keywords: “incremental” and “continuous.” To emphasize how important these two keywords are, I specially wrote them in red frames and red text below. I hope you can take them seriously.

These two keywords are a must in performance scenarios, because production traffic is never discontinuous, and the number of users in production always grows from few to many. Only these two keywords set the right overall tone for the scenarios. That is why I keep emphasizing them.

Some may say, “Can’t I just try to see if the scenario can be executed? Why do I have to do all this?” Um… in that case, please leave and close the door behind you.

Now let's talk about what we need to focus on during the execution of the benchmark scenario.

Benchmark Scenarios #

When we don’t have any knowledge about a system, the first thing we need to understand is the approximate capacity of the system and where to start. That’s where benchmark scenarios come in.

For example, in our e-commerce system, we need to test 11 different business scenarios. Can we immediately create scripts for all 11 scenarios and start applying pressure? Definitely not, because we don’t know the maximum TPS (Transactions Per Second) each scenario can handle and whether there are any performance bottlenecks. If we apply mixed pressure right away, multiple performance issues will be exposed simultaneously and they may affect each other, making analysis difficult.

So, we need to start with benchmark scenarios for individual interfaces. How exactly do we do that? Let’s take an example. First, we test the basic performance of the login interface with a few users (please note that this process itself is not a benchmark scenario). See the following:

From the graph, we can see that one thread creates approximately 20 TPS.

What capacity should a single interface reach so that it does not limit the capacity of the mixed scenario? Obviously, for the login interface alone, it must exceed 50 TPS at the very least. And based on our testing experience with CRUD operations, even without caching, reaching 500 TPS on an 8C16G machine should not be a problem.

Assuming the maximum 500 TPS is linearly proportional to the number of threads, we would need:

\[Number\ of\ Threads = \frac{500\ TPS}{20\ TPS} = 25\ Threads\]

Additionally, since one pressure thread can generate around 20 TPS, and the TPS curve is still rising rapidly, I would consider extending the Duration (the duration of the scenario) to observe changes in various curves during this gradual increase. This will help determine the subsequent actions and maximum capacity. I would determine the pressure process of the scenario as follows:

In the graph, I went up to 30 threads, but we don’t actually need to go much higher—just exceeding 25 threads would be enough. I set the Ramp-up period to 600 seconds, meaning one thread is created every 20 seconds, resulting in a visible continuous increase.
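The arithmetic above can be sketched in a few lines. The 20 TPS per thread, the 500 TPS estimate, and the 600-second ramp-up are the figures from this example; the 5-thread margin is my own assumption for leaving headroom above the estimated maximum:

```java
// Estimate the thread count and ramp-up interval for a benchmark scenario.
// Inputs: single-thread TPS from a trial run, and a rough estimate of the
// machine's maximum TPS. The margin of extra threads is an assumption.
public class BenchmarkPlan {
    public static void main(String[] args) {
        double tpsPerThread = 20.0;     // measured with a single-thread trial run
        double estimatedMaxTps = 500.0; // rough estimate for an 8C16G machine

        // Threads needed if TPS scales linearly with the number of threads
        int threadsNeeded = (int) Math.ceil(estimatedMaxTps / tpsPerThread);

        // Add a margin so we can watch TPS flatten past the estimated maximum
        int threadsInScenario = threadsNeeded + 5;

        int rampUpSeconds = 600;
        int secondsPerThread = rampUpSeconds / threadsInScenario;

        System.out.println("threads needed: " + threadsNeeded);           // 25
        System.out.println("threads in scenario: " + threadsInScenario);  // 30
        System.out.println("one new thread every " + secondsPerThread + " s"); // 20
    }
}
```

These three outputs map directly onto the JMeter-style settings described above: 30 threads, a 600-second Ramp-up period, one new thread every 20 seconds.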

Now, let’s summarize the entire approach:

  1. First, determine the TPS value when running in a single thread.
  2. Based on the maximum estimated capacity of the system, set the number of threads and incremental parameters in the scenario. It is important to note that if you are unsure about the capacity estimation, you can simply add more threads and observe changes in the curves during the increment process.
  3. Determine the pressure parameters for the formal benchmark scenario.

We will follow this approach for other interfaces as well. Of course, we will need to make continuous adjustments during the testing process.

Now, based on the steps described above, let’s summarize the objectives of benchmark scenarios:

  1. Obtain the maximum TPS for individual interfaces: If the maximum TPS for an interface does not exceed the requirements of the capacity scenario, it must be optimized. But if it exceeds the requirements, does it mean there is no need for optimization? Let’s move on to the second objective.

  2. Address the performance issues encountered in individual interface benchmark scenarios: This means that when we encounter performance bottlenecks in individual interface testing, we must analyze them. This involves performance analysis logic. Therefore, performance analysis can generally be divided into two phases:

  • Phase 1: Exhausting hardware resources. In the benchmark scenario, we want to drive hardware resources (CPU, memory, network, IO, or anything else) to their limit, because a saturated resource makes it easier to observe phenomena in the global monitoring counters and to keep tracking and analyzing them.

  • Phase 2: Optimizing to achieve maximum TPS. In the benchmark scenario, we need to maximize the TPS of individual interfaces to avoid becoming a bottleneck in the capacity scenario.

If we cannot achieve the goal of Phase 1, then there is no need to think further—we must find out where the bottleneck is. If the hardware resources are exhausted and the TPS meets the requirements of the capacity scenario, then from a cost perspective, there is no need to proceed with the project. If the hardware resources are exhausted but the TPS does not meet the requirements of the capacity scenario, then optimization is necessary.
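The decision logic of the two phases can be sketched as follows. The names and thresholds are illustrative, not from the project in this lesson:

```java
// A sketch of the benchmark-scenario decision logic described above.
// Names and numbers are illustrative assumptions.
public class BenchmarkDecision {
    enum Action { FIND_BOTTLENECK, DONE, OPTIMIZE }

    static Action decide(boolean resourcesExhausted, double maxTps, double requiredTps) {
        if (!resourcesExhausted) {
            // Phase 1 not reached: something blocks us before hardware saturates
            return Action.FIND_BOTTLENECK;
        }
        // Phase 1 reached: hardware is the limit; compare against the capacity goal
        return maxTps >= requiredTps ? Action.DONE : Action.OPTIMIZE;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, 300, 500)); // FIND_BOTTLENECK
        System.out.println(decide(true, 600, 500));  // DONE
        System.out.println(decide(true, 300, 500));  // OPTIMIZE
    }
}
```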

Now, let’s execute a scenario for a single interface to see how the above approach is implemented.

Login Interface #

Following the design steps of the benchmark scenario mentioned above, let’s first try to run the benchmark scenario for this interface. Please note that in the benchmark test, the purpose of the trial run is only to check the basic interface response time, not to complete the benchmark scenario.

Oh, what a mess!

From the graph, although the scenario execution time is not long, there are errors reported by 10 threads, and the response time and TPS have reached a disappointing level, with only 12.5 TPS. What should we do?

There is no other way but to analyze this process. The following content is my analysis of the problem. The main focus is on how we apply the performance analysis ideas we mentioned before.

  • Symptoms of the problem

As shown in the above figure, the problem is quite obvious.

  • Analysis process

From the RESAR performance analysis logic I’ve been advocating, for problems with long response time, the first thing we need to do is to split the time. Since this system has already deployed SkyWalking, it is natural for us to use it to see where the time is being wasted.

As you can see in the graph, the SelfDuration of the Token operation exceeds 5 seconds! Ah, open-source projects are full of pitfalls. They implement the functionality, and some even have tens of thousands of stars, but they completely lack performance awareness.

However, this is also a good thing. Now we have something to work on. As people who do performance analysis, we have to deal with such poor systems in order to grow quickly.

Speaking of which, since the Token interface has a long response time and we cannot see the complete invocation stack in SkyWalking, we have two actions to take:

  1. Print a complete stack trace to see the invocation chain.
  2. Do not print the stack trace, but directly connect to the Java process to see the method’s time consumption.

Even if we used the first method to see the invocation chain, we would still need to trace the time consumption of specific methods to complete the evidence chain. So here, I will go straight to the second method.

For the second method, there are many tools we can use to view a method's time consumption, such as jdb, jvisualvm, and Arthas. Here, we will use Arthas to trace it.

First, let’s trace the Token method.

trace com.dunshan.mall.auth.controller.AuthController postAccessToken '#cost > 1000' -n 3
trace org.springframework.security.oauth2.provider.endpoint.TokenEndpoint postAccessToken '#cost > 1000' -n 3
trace org.springframework.security.oauth2.provider.token.AbstractTokenGranter getOAuth2Authentication '#cost > 1000' -n 3
trace org.springframework.security.authentication.AuthenticationManager authenticate '#cost > 500' -n 3
trace org.springframework.security.authentication.ProviderManager authenticate '#cost > 500' -n 3
trace org.springframework.security.authentication.dao.AbstractUserDetailsAuthenticationProvider authenticate '#cost > 500' -n 3

With the above statements, let’s trace step by step, and finally reach here:

Please note that even if we don’t use Arthas, we can achieve the same effect with other tools. So, please don’t get obsessed with tools, if you have to be obsessed, be obsessed with me.

Since the authenticate method takes a long time, let’s open the source code and see what this section does.

Then, let’s debug and trace it, and see the following part:

It turns out that this is a BCrypt encryption algorithm.

  • Optimization plan

Let me explain: BCrypt is a deliberately slow, salted hashing algorithm. Each hash it generates uses a different salt, so the output is different every time, and computing it is particularly expensive!
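BCrypt itself is not in the JDK, so as a stand-in I'll demonstrate the same property (a deliberately slow, salted hash that yields a different value for the same password) with the JDK's built-in PBKDF2. To be clear, PBKDF2 and the iteration count below are my own illustrative choices, not the algorithm or settings used in this project:

```java
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class SlowHashDemo {
    // Hash a password with a given salt and a deliberately high iteration count.
    static byte[] hash(char[] password, byte[] salt, int iterations) throws Exception {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 256);
        return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
    }

    public static void main(String[] args) throws Exception {
        SecureRandom rnd = new SecureRandom();
        char[] pwd = "123456".toCharArray();

        byte[] salt1 = new byte[16];
        byte[] salt2 = new byte[16];
        rnd.nextBytes(salt1);
        rnd.nextBytes(salt2);

        long start = System.nanoTime();
        byte[] h1 = hash(pwd, salt1, 200_000); // the high cost is the whole point
        long ms = (System.nanoTime() - start) / 1_000_000;
        byte[] h2 = hash(pwd, salt2, 200_000);

        // Different salts -> different hashes for the same password
        System.out.println("hashes equal: " + Arrays.equals(h1, h2)); // false
        System.out.println("one hash took roughly " + ms + " ms");
    }
}
```

This is exactly why a login interface that hashes the password on every token request can dominate response time under load: the slowness is a security feature, which is also why removing it is a trade-off to discuss with the team, not a free optimization.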

After tracing to this point, the solution is quite clear, which is to use a faster encryption method or remove this encryption algorithm. We will leave the task of changing the encryption method to the developers. As a performance analyst, I have decided to remove this encryption algorithm directly. Let’s move on for now.

  • Optimization result

The optimization result is as follows:

From the graph, we can see that for the same number of threads, the TPS has increased from 20 to 80.

From this simple analysis logic, you can see that by splitting and tracing the response time, we can identify which method is slow, analyze that method further, and determine the solution. This is the seven-step RESAR performance analysis method in action. It may look as though we skipped the architecture-diagram step during this analysis, but in fact we cannot skip it: whether we are reading the architecture diagram or the invocation chain, we need the architectural logic in mind.

In the benchmark scenario, we will encounter various problems, and I will record them one by one later, hoping to provide you with some inspiration.

Summary #

According to the RESAR performance engineering theory, in performance scenarios, we can divide them into four categories: benchmark scenarios, capacity scenarios, stability scenarios, and abnormal scenarios based on the execution process. Each of these scenarios has its own purpose. In this lesson, we mainly described the logic of benchmark scenarios and provided examples.

Benchmark scenarios have two important purposes:

  1. Obtain the maximum TPS (transactions per second) of a single interface.

  2. Solve performance issues encountered in single interface benchmark scenarios.

Both of these purposes are important for laying the foundation for capacity scenarios.

In this lesson, I mainly wanted you to experience the process of performance analysis. Of course, the optimization effect we achieved at the end has not yet met my requirements for performance. However, rest assured, you will see more analysis logic in the upcoming lessons.

Come, let’s continue our journey.

Homework #

Finally, please reflect on the following:

  1. Why is RESAR performance engineering divided into only four categories?
  2. When analyzing code time, what methods can we use to track Java execution time?

Remember to discuss and exchange your thoughts in the comments section. Each reflection will help you make progress.

If you have gained something from this lesson, feel free to share it with your friends and study and progress together. See you in the next lecture!