06 How to extract a business model that fits the real business scenario #

Hello, I’m Gao Lou.

As we know, the business model has always been an important part of performance projects. In capacity scenarios, each business proportion must match the proportion of real business scenarios. If it doesn’t match, the result of the scenario execution would be meaningless.

However, we often see that many performance practitioners lack understanding of the process of extracting business models, or cannot obtain specific data, resulting in a mismatch between the business model and the production business scenario, rendering the entire performance project meaningless.

There are also numerous projects that do not use historical business data for statistics and simply come up with corresponding business models in a very general manner. This is obviously unreasonable. However, this situation is common in industries such as finance and the Internet.

Of course, some people may use production environment requests for replay in order to make the business model match the real business scenario as much as possible. However, even if we replay the requests from the production environment, we cannot guarantee that the business model will be consistent with future business scenarios, as future business scenarios will change with business expansion.

So, when we are creating scenarios, we first need to understand whether the current scenario is simulating historical business scenarios or future business scenarios.

If it is a future business scenario, it should be evaluated by the business team rather than the performance team. However, in the current performance market, it is often demanded by companies that the performance team provide business models. This is clearly not rational. Firstly, the performance team is not as familiar with the business background as the business team, and secondly, their understanding of the business market is not professional enough.

In fact, in real work scenarios, the confirmation of the business model should never be done by a single team, but rather jointly determined by the business team, architecture team, development team, operations team, and performance team, and ultimately confirmed by the top-level leadership of the project.

If a system has historical business data, we will have background data when obtaining the business model. In this case, the performance team should extract the business models for each scenario from the historical business data. If the system does not have historical data, then, just like evaluating future business models, the teams need to collaborate to provide the current business model.

It is precisely because of the various problems mentioned earlier that performance practitioners often ask me how we can extract business models from historical business data. You may also have such confusion, so let’s discuss it in detail below, and at the same time, I will use an example to demonstrate a specific process.

In general, there are two major steps to extract real business models:

  1. Extract production business logs. This step can be implemented through many methods. In this lesson, I will show you two relatively common methods. One is to use the awk command for extraction when there is no log statistics system, and the other is to use ELFK for extraction.
  2. Sort out the business logic.

For the first step, we extract production business logs to obtain the corresponding business proportions. Now let’s take a look at how to use commands to extract production business logs.

Extracting Production Business Logs Using Commands #

Here I will use a small sample of Nginx logs as an example. In Nginx, the log format is usually as follows:

120.220.184.157 - - [26/Oct/2020:14:13:05 +0800] "GET /shopping/static/skin/green/green.css HTTP/1.1" 200 4448 0.004  0.004 "https://www.xxx.cn/shopping/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "124.127.161.254"
120.220.184.203 - - [26/Oct/2020:14:13:05 +0800] "GET /shopping/static/js/manifest.0e5e4fd8f66f2b389f6a.js HTTP/1.1" 200 2019 0.003  0.003 "https://www.xxx.cn/shopping/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "124.127.161.254"
120.220.184.149 - - [26/Oct/2020:14:13:05 +0800] "GET /shopping/static/js/app.cadc2ee9c15a5c1b9eb4.js HTTP/1.1" 200 138296 0.100  0.005 "https://www.xxx.cn/shopping/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "124.127.161.254"
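These fields and their order are controlled by the log_format directive in the Nginx configuration. A directive roughly like the following (a sketch, not necessarily the exact configuration behind these sample logs) would produce lines of this shape:

# Sketch of a log_format matching the sample lines above: client IP, user,
# local time, request line, status, body bytes, request time, upstream
# response time, referer, user agent, and X-Forwarded-For.
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent $request_time $upstream_response_time '
                '"$http_referer" "$http_user_agent" "$http_x_forwarded_for"';

access_log /var/log/nginx/access.log main;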

Since the field order is fixed by the Nginx configuration, our goal — the number of requests per second within a specific time period — only requires extracting the timestamp field and counting how many times each second appears. The command is as follows:

cat 20201026141300.nginx.log|awk '{print $4}' |uniq -c

We get the following result:

5 [26/Oct/2020:14:13:05
3 [26/Oct/2020:14:13:06
14 [26/Oct/2020:14:13:07
4 [26/Oct/2020:14:13:08
1 [26/Oct/2020:14:13:09
2 [26/Oct/2020:14:13:10
1 [26/Oct/2020:14:13:12
2 [26/Oct/2020:14:13:20
14 [26/Oct/2020:14:13:23
1 [26/Oct/2020:14:13:24
2 [26/Oct/2020:14:13:26
2 [26/Oct/2020:14:13:29
9 [26/Oct/2020:14:13:30
9 [26/Oct/2020:14:13:31
1 [26/Oct/2020:14:13:32
13 [26/Oct/2020:14:13:35
2 [26/Oct/2020:14:13:37
20 [26/Oct/2020:14:13:38
2 [26/Oct/2020:14:13:39
33 [26/Oct/2020:14:13:44
17 [26/Oct/2020:14:13:46
5 [26/Oct/2020:14:13:47
23 [26/Oct/2020:14:13:48
29 [26/Oct/2020:14:13:49
4 [26/Oct/2020:14:13:50
29 [26/Oct/2020:14:13:51
26 [26/Oct/2020:14:13:52
22 [26/Oct/2020:14:13:53
57 [26/Oct/2020:14:13:59
1 [26/Oct/2020:14:14:02

This way, we can determine the time period with the highest number of requests. We can also apply this method flexibly: if you want to aggregate at the minute, hour, or day level, just adjust the command accordingly. For example, to aggregate at the minute level, we only need to add a truncation step, as shown below:

cat 20201026141300.nginx.log|awk '{print $4}' |cut -c 2-18|uniq -c

The corresponding result is as follows:

 352 26/Oct/2020:14:13
   1 26/Oct/2020:14:14

The above result means that there is data for two minutes in my log. In the first minute, there are 352 requests, and in the second minute, there is only one request.
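If the log spans a long period and you only want to drill into one particular minute, you can also filter first and then count per second within that minute. A small sketch using the same sample file:

# Per-second request counts within the minute 14:13 only (a sketch;
# adjust the grep pattern and file name to your own log).
grep '26/Oct/2020:14:13:' 20201026141300.nginx.log | awk '{print $4}' | uniq -c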

If you want to perform statistics based on URL requests, you can modify the command as follows:

cat 20201026141300.nginx.log|awk '{print $7}' |cut -c 1-50|uniq -c

The result is as follows:

................
   1 /shopping/checkLogin
   1 /shopping/home/floor
   1 /sso/loginOut
   1 /shopping/home/navigation
   6 /shopping/home/floor
   2 /shopping/home/floorGoods
   1 /shopping/home/sysConfig
   4 /shopping/home/floorGoods
   1 /shopping/home/floor
   1 /sso/loginOut
................

Now we are extracting the seventh field of the log and truncating it before counting. This way, we can know the number of each request within the time period and obtain the corresponding business proportions. Note that uniq -c only merges adjacent identical lines, which is why the same URL can appear more than once in the output above; adding sort before uniq -c gives a single total per URL.

As long as you apply these commands flexibly, there should be no problem in handling files with small amounts of data.
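If you also want the proportion of each URL rather than just raw counts, a short awk program can compute both in one pass. A sketch, assuming the URL is still the seventh field of the log line:

# Per-URL request count and its share of the total, highest count first.
awk '{ count[$7]++; total++ }
     END { for (url in count) printf "%8d  %6.2f%%  %s\n", count[url], 100*count[url]/total, url }' \
    20201026141300.nginx.log | sort -rn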

Extracting Production Business Logs using ELFK #

If you want to extract logs using ELFK, you can follow the steps below:

  1. Install ELFK. ELFK refers to the combination of Elasticsearch, Logstash, Filebeat, and Kibana. You can search for the specific installation steps, as there are many tutorials available online.
  2. After configuring ELFK, you can view the collected information in the Discover interface of Kibana. Please note that each log entry corresponds to one hit.
  3. By selecting a time period, you can see how many requests were made during that time period.
  4. To obtain the percentage of interface requests, click on “Dashboard” in Kibana, create a Lens visualization panel, and select the corresponding URL field to view the percentage of each interface.
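If you would rather script this step than click through Kibana, the same per-interface counts can be pulled with a terms aggregation against Elasticsearch. This is a rough sketch — the index pattern nginx-* and the field name url.path are assumptions, so check the real names in Kibana's Discover view first:

# Count requests per URL within a time window (Elasticsearch terms aggregation).
# Depending on your mapping you may need a keyword field, e.g. url.path.keyword.
curl -s -X POST "localhost:9200/nginx-*/_search" -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "2020-10-26T14:00:00", "lte": "2020-10-26T15:00:00" } }
  },
  "aggs": {
    "per_url": { "terms": { "field": "url.path", "size": 50 } }
  }
}'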

In general, extracting production logs using ELFK to obtain the business model can be divided into two stages.

The first stage is to analyze log information over a large time period and gradually narrow down the range, such as by year, month, day, hour, and minute. This step aims to cover the peak requests of the system.

The second stage is to refine the selected time period. Although we have already narrowed down the time period to minutes in the first stage, we need to further refine it to the TPS (transactions per second) level of the production environment. This allows us to compare the business scenarios in production with those in testing.
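If you prefer to pull these refined numbers directly from Elasticsearch instead of zooming in the Kibana UI, a date_histogram aggregation returns the same per-interval counts. Again a sketch with assumed index and field names; note that fixed_interval requires Elasticsearch 7.2 or later (older versions use interval):

# Per-second hit counts over a five-minute window, which can then be read off as TPS.
curl -s -X POST "localhost:9200/nginx-*/_search" -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "2020-10-26T14:10:00", "lte": "2020-10-26T14:15:00" } }
  },
  "aggs": {
    "per_second": { "date_histogram": { "field": "@timestamp", "fixed_interval": "1s" } }
  }
}'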

Next, I will provide a detailed explanation of these two stages using examples.

Stage 1: Analyzing logs over a large time period #

When viewing data in ELFK, it is recommended to select a time period that covers the entire business scenario as much as possible. For example, if we want to find the peak time period, we initially set a larger time range to avoid missing data, and then narrow it down based on the highs and lows of the bars in the chart.

By doing this, we can determine the overall percentage of various business interfaces in the production environment.

In many enterprises, an effective method for log processing is to output the corresponding logs to ELFK in real time. This method not only allows flexible log retrieval, but also enables long-term log storage and additional post-processing, such as generating visualizations. Let’s try this method in practice.

Example

As shown in the above figure, we captured a segment of logs in Kibana, which contains a total of 6,624,578 requests. You can directly generate a table view like the one below using Kibana:

Table View

By doing so, you can see which requests are more prominent. Why didn't I list the totals here? Because for each request we also need to generate a bar chart of its volume over time: if the concentrated time periods of all the requests coincide, one scenario is enough; if they differ, we need to create multiple scenarios. Let's search for a few of these interfaces.

  • /mall-member/sso/login

Example 1

  • /mall-portal/home/content

Example 2

  • /mall-member/member/address/list

Example 3

Similar to the above examples, there are other interface graphs, but I won’t list them one by one.

Looking at the timestamps of the data points in my example, the peaks of all the requests fall in the same time periods, so we only need to create one scenario to cover them all. But please note that in your actual project this may not be the case: if some request has a high-concurrency time period that differs from the others, you must create multiple scenarios to simulate it, because the business model will differ between those scenarios.

In my example, we included the data volume in the table and calculated the proportion, which is the quantity of a specific request divided by the total number of requests, as shown below:

Proportion

This represents the average proportion of each business interface during this time period.

By completing the first stage, we can identify the time periods with high request volumes. However, the granularity so far is still 5 minutes, which already counts as a concentrated time period for any system, and our work is not finished yet: we not only need to know which time periods have high user activity, we also need to determine the specific TPS reached in the production environment. That requires further refinement.

Stage 2: Refining the time period #

By observing the timestamps in the main interface graph, we can see that the time interval is 5 minutes. We select the time period with the highest requests and click on it.

Refined Graph

Now we have the graph with a time interval of 5 seconds:

5 Second Graph

Then, we calculate the TPS based on the number of hits, resulting in the following:

\(Production\ TPS = \frac{9278}{5} = 1855.6\)

Of course, you can further refine it to the millisecond level:

Millisecond Graph

Using this method, we can determine the peak TPS of the system in the production environment. This value of 1855.6 then becomes the total TPS target for the testing environment; if the tests cannot reach it, further optimization or additional hardware resources may be required.

In general, after obtaining the total TPS, it needs to be handled in one of three ways depending on the testing objectives, which in turn are based on the business objectives:

  1. No changes in business, minor changes in application version (usually minor functional changes or bug fixes): In this case, the calculated TPS can serve as the overall TPS target for performance scenarios.
  2. No changes in business, major changes in application version (such as architectural changes): In this case, the calculated TPS can serve as the overall TPS target for performance scenarios.
  3. Changes in business, major changes in application version: In this case, it is necessary to calculate TPS increment based on the estimated business changes. If the business trend indicates a 20% increase, you can add 20% to the calculated total TPS.
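Taking the third case as an example with the value obtained above: if the business is expected to grow by 20%, the overall TPS target would become roughly

\(1855.6 \times (1 + 20\%) \approx 2226.7\)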

Organizing Business Logic #

After obtaining the business model from the interfaces above, we can organize the business logic according to the proportions of the interfaces in order to simulate real production business scenarios more faithfully. In fact, in the steps above, we have already arranged them in order; you can refer back to the table shown earlier.

So in this example, approximately 58% of the users who log in will complete the entire process. Why 58%? Because the proportion of the login business is 12%, while the proportion of the subsequent order placement is 7%, so:

\(7\% \div 12\% \approx 58\%\)

Overall Process Description #

Finally, let’s summarize the overall process:

Please note that an actual project may contain multiple business scenarios to analyze statistically. Following this approach avoids inconsistency between the performance scenarios and the production scenarios.

Summary #

Finally, let’s review the key points of this lecture together. There are several key points to note when extracting business models:

  1. Extraction time: The extraction time must cover the peak time of the production system.
  2. Extraction scope: The extraction scope should be large enough, as in some scenarios, even if it is not the peak time, there may be significant resource consumption due to a large volume of business.
  3. Implementation of business proportion in the scenario: After obtaining the business model, we must configure the corresponding business proportion in the performance scenario, without significant deviation.

As long as these points are achieved, the performance scenario will basically not differ significantly from the real business scenario.

Homework #

After studying this lesson, please carefully consider the following two questions:

  1. Why is it necessary to simulate real business proportions in performance scenarios?
  2. What are the steps to extract production data and obtain a business model?

You are welcome to communicate and discuss with me in the comment section. Of course, you can also share this lesson with your friends around you. Their ideas may give you even greater gains. See you in the next lesson!
