31 How to Write a Valuable Performance Report

Hello, I’m Gao Lou.

In a performance project, there are three documents I consider the most important: the performance plan, the performance report, and the optimization report. In Lesson 5, we already covered the complete content of a performance plan. I won’t write a separate optimization report, because the scenario-by-scenario analysis in the earlier lessons already is the optimization content.

Today, let’s look at the performance report. In most projects it is called the “performance test report,” but I prefer to downplay the word “test,” because I include every aspect of the performance project in it, not just testing.

The performance report is the summary of a performance project and the ultimate expression of its value, so it matters a great deal. When people try to lose weight, they often say “70% diet, 30% exercise.” The same goes for performance work: “70% work, 30% report.” In other words, only we ourselves experience the hard work; if the report is poorly done, no matter how exhausted and diligent we were, our efforts will have been in vain.

However, I have noticed that many performance reports in the industry today are written hastily. Either the data collection is incomplete or the conclusions are poorly reasoned, so a project that was actually well executed fails to show its value.

So how can we write a valuable performance report? Let’s take a look together today.

Provide clear conclusions in a performance report #

In a performance report, one section is a nightmare for most people: the clear conclusion.

What is a clear conclusion? Let me list a few common conclusion descriptions for you.

Capacity scenario conclusions:

Description 1: There are obvious performance bottlenecks in server resources; it is recommended to upgrade or add servers. Storage performance is poor; it is recommended to replace it with better-performing storage. A certain service has obvious performance bottlenecks; it is recommended that developers optimize it.

Description 2: Before optimization, throughput was 50 TPS; after optimization, it was 100 TPS. Some versions add that there were errors before optimization and none after. CPU utilization was 90% before optimization and 50% after. Resource consumption was 1000 CPU cores and 2000 GB of memory before optimization, and 500 cores and 1000 GB after.

Description 3: With 100 concurrent users, the average response time of every function in a certain system meets the performance targets. The total TPS across functions is 3000, with a 100% success rate, and average server resource utilization is within the target range. With 200 concurrent users, the system’s processing capacity is 4000 TPS. As the number of concurrent users increases further, processing capacity declines.

Description 4: The system can support 20 million simultaneous online users and 20,000 concurrent users.

Conclusions in stability scenarios:

The system ran stably for 120 hours with 400 concurrent users, the average response time of each transaction was lower than the performance requirement, the TPS showed a stable trend, the success rate of transactions was 100%, and the utilization of server resources remained stable, meeting the performance requirements.

Conclusions in batch scenarios:

Execution times for business A’s online batches (batch transaction IDs 001, 002, 003), business B’s online batches (IDs 004, 005), and business C’s online batch (ID 006) were 130,000 ms, 10,000 ms, 1,000 ms, 12,000 ms, 10,000 ms, and 160,000 ms respectively, meeting the performance requirements. System resource utilization also meets the performance requirements.

In addition, there are many more conclusion descriptions, but I will not list them one by one.

At first glance, these conclusions may all seem reasonable. Looking only at a report’s conclusions, it is hard to see what is wrong with them; at most, we can tell whether they lean toward the technical side or the business side.

Among the capacity scenario conclusions above, Description 1 clearly does not qualify, because it says nothing specific: which resource, which service, and by how much. I do not recommend using words such as “obvious,” “suggested,” “poor,” or “possible” in performance conclusions, because they are not precise enough.

Description 2 looks very precise, but it only describes the technical side and does not say whether the system can support the business. Description 3 is more conventional, but for a phrase like “the average server resource utilization is within the target range,” you must know the specific value behind that “target.” Description 4 states a business conclusion directly, which I think is more reasonable.
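To make the gap in Description 2 concrete, here is a minimal Python sketch. The `Metrics` class and both functions are hypothetical names, not part of the course project. The first function produces what Description 2 already says; the second produces the sentence it is missing, and it cannot run without a business target. Description 2 states no target, so the 80 TPS and 120 TPS below are made-up values used purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """One measurement of a scenario, before or after optimization."""
    tps: float
    cpu_percent: float


def technical_summary(before: Metrics, after: Metrics) -> str:
    """The technical comparison that Description 2 already provides."""
    gain = (after.tps - before.tps) / before.tps * 100
    return (f"TPS {before.tps:g} -> {after.tps:g} ({gain:+.0f}%), "
            f"CPU {before.cpu_percent:g}% -> {after.cpu_percent:g}%")


def business_conclusion(after: Metrics, target_tps: float) -> str:
    """The missing piece: does the optimized system support the business target?"""
    verdict = "meets" if after.tps >= target_tps else "does NOT meet"
    return f"{after.tps:g} TPS {verdict} the business target of {target_tps:g} TPS."


if __name__ == "__main__":
    before = Metrics(tps=50, cpu_percent=90)
    after = Metrics(tps=100, cpu_percent=50)
    print(technical_summary(before, after))  # TPS 50 -> 100 (+100%), CPU 90% -> 50%
    # Hypothetical targets: Description 2 gives none, which is exactly the problem.
    print(business_conclusion(after, target_tps=80))
    print(business_conclusion(after, target_tps=120))
```

The same technical numbers lead to opposite business verdicts depending on the target, which is why a conclusion has to name the target and state the verdict explicitly.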

As for the stability and batch scenario conclusions, I’ll leave them for you to think about rather than comment on each one.

After all this, you may wonder what kind of conclusion a performance report should contain. Be clear about one thing: a performance report expresses the performance conclusions of a business system.

A performance engineer from one company once showed me a report. After reading it, I asked, “Are you trying to show how hard you worked with this report?” He replied, “No, I want to show that this project was done well.” I said, “Your report doesn’t mention what was done well. All I see is how hard you worked…”

Why did this happen? Mainly because his report clearly described how many people were involved, how long it took, and what tasks were performed, but the conclusion was very vague, much like Description 1 of the capacity scenario conclusions above.

So I told him that the boss doesn’t need to see a report like that. A report for the boss should be concise and to the point, with no need for fancy writing. Remember: our reports are not meant to show how hard we worked.

Determine the audience before writing the performance report #

So what should a performance report look like? Here are two examples.

I once worked on a performance project with a clear business objective: completing the entire business process for 60 million users within one hour. At the end of the project, I wrote a detailed Word report of about 80 pages. However, when presenting to the client, I used a PPT deck of fewer than 10 slides.

On the first page of the PPT, I only included two data points:

During the presentation, I said: “Based on our analysis and optimization of the scenario, the current system can support 61 million complete business transactions in one hour, exceeding the business objective of 60 million.”

Then I continued: “If you are interested, I can briefly explain how we achieved this goal.” Note that if the audience does not respond, you can move on without going into detail, giving only a general summary of the technical content.

If, after you finish presenting the first page of the PPT, someone starts talking about where to celebrate later, there is no need to continue. In a presentation setting, technical content may not be well-received. The boss may find it boring, while the business side may be confused. However, if someone is interested in technical details, you can talk a little more.

In summary, when presenting a performance report, it is important to maintain control of the situation and guide the audience rather than being led by them.

I also worked on another performance project that lasted about three months, with overtime almost every day. It was extremely challenging. When preparing the presentation, I first wrote down everything I could think of in logical order, which produced a PPT with over 120 slides (this is my habit when preparing presentations: include everything first, then cut it down).

The night before the presentation, I looked at those 120-plus slides and found even myself getting lost; I hadn’t expected to have so much content to organize. I knew that not all of it belonged in the presentation, so I started cutting. The first pass brought it down to about 60 slides, still too many; the second pass to 40, which still felt like too much; the third pass to 20, which felt reasonable.

So, presenting to an audience of about forty people, I used those 20 slides and spoke for less than 10 minutes. At the end, I said: “These are the conclusions. If any technical staff here are interested in how the project was actually carried out, please refer to the 240-plus-page Word technical report we distributed before the meeting. If there are no questions, that concludes my presentation.”

The audience responded quite well after the presentation.

I share these two examples to illustrate that, in my view, a performance report can take two forms:

A detailed technical report: This type of report is usually in Word, PDF, or HTML format, and the content includes project background, testing scope, performance indicators, tool environment, data volume, business model, scenario execution strategy, scenario result analysis, conclusions, issue summary, recommendations for future performance work, and operations suggestions.
A concise presentation report: This type of report is usually in PPT or Keynote format and includes conclusions, basic information description (summarized in a few simple pages), issue summary, recommendations for future performance work, and operations suggestions.

The first type of report is for technical personnel, while the second type is clearly for presentation situations.

Therefore, when writing a report, it is important to first consider who will be reading it. This point is crucial. For leaders, there is no need to provide excessive detail; for technical staff, avoid being too general.

Additionally, let me remind you to avoid arguments with those who raise objections during the presentation. Even if someone raises sharp questions, you must respond with understanding. During this process, maintain a cooperative attitude without appearing arrogant, and be open to other people’s opinions.

How to write a detailed performance report #

I usually don’t use templates when writing performance reports, because the basic outline is clear: the content I just listed for the technical report is sufficient. The specific details differ from project to project, and insisting on a template can limit your thinking. I suggest writing the report yourself, word by word.

Since the example project in this course is quite complete, let’s use it to see how to write a detailed performance report.

First of all, it is clear that a complete performance report can be divided into two main parts:

The first part is the information before the scenarios are executed, which covers the items from Lesson 5: project background, test scope, business model, performance metrics, system architecture diagram, software and hardware environment, stress testing and monitoring tools, data, scenario design, reporting strategy, and monitoring design.
The second part is the information after executing the scenarios, which includes scenario result summary, scenario result analysis, conclusions, issue summary, suggestions for future performance work, and suggestions for operation and maintenance.

In your own project, you may not need such completeness for the performance report, so you can make appropriate adjustments.

I already covered the first part in detail in Lesson 5, so I won’t repeat it here. Next, let’s focus on the second part.

In the second part, we need to summarize the results of each scenario. The overall structure is as follows:

For the “Scenario Result Summary” and “Scenario Analysis” parts, we have described many of them in the previous case studies of the course, so I won’t repeat them here. When you write the specific project report, you can simply paste the corresponding screenshots and add some descriptions.

Now let’s look at what the “Scenario Conclusion,” “Recommendations for Subsequent Performance Work,” “Production Configuration Recommendations,” and “Recommendations for Operations” sections contain in this course’s example project.

Scenario Conclusion #

Baseline Scenario

Let’s start by drawing a chart showing the TPS comparison between the baseline scenarios before and after optimization:

With this chart, we can clearly see the test results. All the optimizations we made in the baseline scenarios are reflected in this result.

What conclusion should we draw? One sentence is enough: every business’s baseline scenario reaches its target TPS. This tells us that, based on the baseline results, no single business will become a bottleneck in the mixed scenarios, which is the most valuable information the baseline scenarios provide to the capacity scenarios.

Capacity Scenario

Let’s start by drawing a chart showing the TPS comparison between the capacity scenarios before and after optimization:

Through the chart, we can see the effects of all the optimizations in the capacity scenarios. Similarly, we need to give a conclusion: the capacity scenarios can meet the performance targets of the online business.

Where does this conclusion come from? From our earlier estimate of a 1000 TPS target, which the capacity scenario’s maximum of 1700 TPS exceeds. If you are giving a presentation, you can show a chart like this:

With this chart in place, you can describe the technical implementation however you prefer.
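Behind the chart, the technical conclusion reduces to one comparison: the earlier 1000 TPS estimate against the 1700 TPS maximum the capacity scenario reached after optimization (both figures from this lesson). A minimal Python sketch; the function name `capacity_verdict` and the headroom percentage are illustrative additions, not something the course defines.

```python
def capacity_verdict(max_tps: float, target_tps: float) -> tuple[bool, float]:
    """Return (target met?, headroom above the target in percent)."""
    # Multiply before dividing to keep the arithmetic exact for whole-number inputs.
    headroom_pct = (max_tps - target_tps) * 100 / target_tps
    return max_tps >= target_tps, headroom_pct


if __name__ == "__main__":
    # 1000 TPS: the earlier capacity estimate; 1700 TPS: measured maximum.
    met, headroom = capacity_verdict(max_tps=1700, target_tps=1000)
    print(f"Target met: {met}, headroom: {headroom:.0f}%")  # Target met: True, headroom: 70%
```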

However, note that the conclusion “the capacity scenarios can meet the performance targets of the online business” stays at the technical level. To give specific conclusions at the business and user levels, apply the calculation logic for concurrent users, online users, TPS, and concurrency degree from Lesson 8.

Since the example system in this course is a demo, we have no production statistics on online users, concurrency, and so on. To still give you a more direct conclusion, I will use the business model from Lesson 5 and the data from Lesson 8 to walk through the calculation.

According to the business model from Lesson 5, a complete request sequence at the interface level consists of 11 requests, but not every user completes all 11. Based on the proportions in the business model, we can calculate that 100 TPS (one T represents one interface request) supports 54 concurrent users on average. In other words, the average TPS required per user is:

\( 100 \div 54 \approx 1.85 \)

The current maximum TPS is 1700, so the number of concurrent users the system supports is:

\( \text{Concurrent users} = \frac{\text{Max TPS}}{\text{TPS per user}} = \frac{1700}{100 \div 54} = 918 \)

Based on the concurrency degree of 2.4% from Lesson 8 (the ratio of concurrent users to online users), we can calculate the corresponding number of online users:

\( \text{Online users} = \frac{\text{Concurrent users}}{\text{Concurrency degree}} = \frac{918}{2.4\%} = 38250 \)

With these calculations, we can write more specific conclusions: based on the capacity scenario, the system’s maximum TPS is 1700, and it can support at most 918 concurrent users and 38,250 online users.
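The calculation chain above can be reproduced in a few lines. This Python sketch uses `fractions.Fraction` so intermediate values stay exact and the results come out as the whole numbers quoted in the conclusion, without floating-point rounding noise. The function names are hypothetical, and the inputs are the course’s example figures; in a real project they must come from data you collect yourself.

```python
from fractions import Fraction


def tps_per_user(model_tps: Fraction, model_users: Fraction) -> Fraction:
    """Average TPS one concurrent user generates, taken from the business model."""
    return model_tps / model_users


def concurrent_users(max_tps: Fraction, per_user: Fraction) -> Fraction:
    """Concurrent users = max TPS / TPS per user."""
    return max_tps / per_user


def online_users(concurrent: Fraction, concurrency_degree: Fraction) -> Fraction:
    """Online users = concurrent users / concurrency degree."""
    return concurrent / concurrency_degree


if __name__ == "__main__":
    # Course figures: 100 TPS ~ 54 concurrent users (Lesson 5 business model),
    # 1700 TPS measured maximum, 2.4% concurrency degree (Lesson 8).
    per_user = tps_per_user(Fraction(100), Fraction(54))
    concurrent = concurrent_users(Fraction(1700), per_user)
    online = online_users(concurrent, Fraction("2.4") / 100)
    print(f"TPS per user: {float(per_user):.2f}")  # TPS per user: 1.85
    print(f"Max 1700 TPS supports {concurrent} concurrent users "
          f"and {online} online users")  # ... 918 concurrent users and 38250 online users
```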

I want to emphasize one point: the data used in the entire calculation process above comes from the example project of our course. When you do calculations in a real project, you can use this calculation logic, but the specific data needs to be collected by yourselves.

That is enough for the capacity scenario conclusion. If you do want to describe the scenario process, then for this course’s example project you could describe the capacity scenario details like this:

In the capacity scenario, we optimized in four stages. In the first stage, we fixed rising response times caused by parameterized data. In the second, we fixed business table index issues. In the third, we fixed resource imbalance. In the fourth, we fixed slow Redis persistence caused by a slow disk.

After these optimizations, the maximum capacity reaches 1700 TPS, supporting up to 918 concurrent users and 38,250 online users. The highest average response time among the businesses is below 200 ms, which fully meets the performance requirements for online business capacity. Meanwhile, CPU usage of the application services reaches around 80%, and resource usage is balanced.

You can add a general description like this to the conclusion if you wish, but I don’t think anything beyond that is necessary.

Stability Scenario

For the stability scenario, the most important conclusions are the accumulated business volume and the duration. Based on the results in Lesson 27, we can conclude: the stability scenario ran for more than 16 hours, the accumulated business volume exceeded 77 million, and system resource utilization remained stable at around 80%.
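A reader can sanity-check a stability conclusion by dividing the accumulated volume by the duration: the implied average TPS should be plausible for the reported resource level and should not exceed the capacity maximum (1700 TPS in this lesson). A minimal Python sketch, assuming the accumulated volume is counted in the same unit as TPS (one T per interface request, as in this course). Because the conclusion gives lower bounds (“more than 16 hours,” “exceeded 77 million”), the result is only approximate.

```python
def average_tps(accumulated_volume: int, duration_hours: float) -> float:
    """Average throughput implied by a stability run's total volume and duration."""
    return accumulated_volume / (duration_hours * 3600)


if __name__ == "__main__":
    CAPACITY_MAX_TPS = 1700  # maximum TPS from the capacity scenario
    avg = average_tps(77_000_000, 16)  # lower-bound figures from the conclusion
    print(f"Implied average TPS: {avg:.0f}")  # Implied average TPS: 1337
    print("Below capacity maximum:", avg <= CAPACITY_MAX_TPS)  # Below capacity maximum: True
```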

If you have enough time and resources, you can also extend the duration and accumulated business volume of the stability scenario through scheduled, volume-based data archiving and through database and table sharding.

Abnormal Scenario

In the execution of the abnormal scenario, we simulated several types of abnormal issues, such as application abnormalities, operating system abnormalities, container abnormalities, virtual machine abnormalities, etc.

Based on the execution results, I will write a more general conclusion here (if you are interested, you can make it more detailed): in the process of executing the abnormal scenario, the TPS trend is as expected, but the application does not handle abnormal conditions, resulting in end users seeing errors instead of friendly prompts, indicating the existence of bugs that need to be fixed.

As the previous lessons showed, we can construct many abnormal scenarios. When writing conclusions, you can summarize briefly the scenarios that showed no obvious problems, and group common problems into general descriptions as well. Describing every scenario one by one would make the report too long.

Recommendations for Subsequent Performance Work #

In our example project of this course, there are three typical issues that need to be improved in subsequent performance work, and I will briefly describe them:

Separate scheduled tasks from real-time business. This suggestion applies only to the open-source project used in this course; in a real project, it would be unusual not to separate scheduled tasks from real-time business.
Develop a scheduled, volume-based archiving plan and a sharding strategy that suit the business.
Return user-friendly prompts.

Production Configuration Recommendations #

For production configuration recommendations, we can summarize the content of Lesson 30. In this course’s example project, I confirmed the configuration of three parameters, so we can list only three production configuration recommendations:

In a real project, you can confirm all production configurations based on the project-level performance configuration tree I provided, and the content in this table will be much richer than it is now.

Recommendations for Operations #

In fact, our recommendations for operations are already fairly clear:

Design and implement a global monitoring strategy and real-time alerting function for the project.
Implement flow control, degradation, and circuit breaking strategies, and implement automatic capacity expansion.
Based on the project-level performance parameter configuration list, configure the corresponding performance parameters in the production environment to meet the business capacity requirements.
Implement the scheduled, volume-based archiving and sharding strategy in the production environment.

For more specific recommendations, we can create corresponding documents and put them directly in the appendix of the report.

With that, our performance report is complete.

Summary #

Writing a performance report is essentially a summary of all the previous work. Therefore, all the data in the performance report should come from reliable sources. As for the expression, I suggest being straightforward and avoiding unnecessary verbosity. Additionally, it is advisable to use charts instead of tables to illustrate conclusions, as charts provide a more intuitive trend.

The performance report is the most important and impactful deliverable of a performance project, so we must learn to write it well. When writing it, first consider the audience, then choose the report’s form from the audience’s perspective.

When giving a presentation, we should aim to be concise and not overly expressive, but we should also avoid omitting important information.

Homework #

Finally, please take a moment to reflect:

Consider the differences between your previous performance report and the description in this article.
Based on the content of this column, try writing a performance report that you think is appropriate.

Remember to discuss and exchange your thoughts with me in the comment section. Every thought will push you further.

If you have gained something from reading this article, feel free to share it with your friends and learn and progress together. See you in the next lesson!