11 Open Homepage: An Example to Help You Understand the Performance Issues of Basic Hardware Deployment #

Hello, I’m Gao Lou.

In this lesson, I’ll take you through the first part of a complete performance analysis case study. We’ll use the “open homepage” interface as the stress scenario and analyze its performance issues. Through this case, you will see various performance problems at the basic hardware level, such as low TPS caused by virtual machine oversubscription, by the CPU running mode, by high I/O, and by hardware resource exhaustion.

If you are starting a complete project from scratch, these issues are likely to be the first ones you face, and solving them is an essential skill for a performance analyst. You will also see that different counters in the collected data lead to different analysis chains. This analysis chain is what I have always emphasized as the evidence chain; if you are not clear about it, you can review Lesson 3.

Through this lesson, I hope you will understand that some performance issues are not caused by a single factor, and that no matter at what level a performance issue appears, we have to deal with it.

Alright, without further ado, let’s dive deep into the performance bottlenecks of the “open homepage” interface.

Viewing the Architecture Diagram #

Before analyzing performance bottlenecks, I always create a diagram like this to see which services and technology components are involved in this interface. This will greatly help us in the subsequent performance analysis.

If you have a tool that can display this directly, even better. If not, don’t be overconfident and assume you can keep even a simple architecture in your head. Trust me, even a rough sketch on paper can go a long way in guiding your analysis.

Referring back to the above diagram, we can clearly see that the logic for opening the homepage is: User - Gateway (Redis) - Portal - (Redis, MySQL).

Take a look at the logic of the code #

Before running the benchmark scenario for opening the homepage, I suggest you look at the implementation logic of this interface’s code. From the code you can see what actions the interface performs, and based on those actions we can analyze the downstream links they touch.

The logic of this code is very simple. It lists various pieces of information on the homepage and returns a JSON.

public HomeContentResult contentnew() {
    HomeContentResult result = new HomeContentResult();
    if (redisService.get("HomeContent") == null) {
        // Homepage advertisement
        result.setAdvertiseList(getHomeAdvertiseList());
        // Recommended brands
        result.setBrandList(homeDao.getRecommendBrandList(0, 6));
        // Flash sale information
        result.setHomeFlashPromotion(getHomeFlashPromotion());
        // Recommended new products
        result.setNewProductList(homeDao.getNewProductList(0, 4));
        // Recommended popular products
        result.setHotProductList(homeDao.getHotProductList(0, 4));
        // Recommended topics
        result.setSubjectList(homeDao.getRecommendSubjectList(0, 4));
        redisService.set("HomeContent", result);
    }
    Object homeContent = redisService.get("HomeContent");
    result = JSONUtil.toBean(JSONUtil.toJsonPrettyStr(homeContent), HomeContentResult.class);

    return result;
}

We can see that a total of six methods are called here, and these methods directly perform queries in the database. That’s all.

Determining stress data #

After understanding the logic of the code, let’s run a trial with 10 threads and watch the TPS trend as the threads ramp up.

After running, we obtained the following results:

From the results, it can be seen that at the beginning, one thread generates around 40 TPS. Now we need to think about how to set the number of threads, increment strategy, and continuous execution strategy in the stress tool if we want to execute a scenario that can achieve the maximum TPS for the “Open Homepage” interface.

To begin with, let’s take a look at the hardware usage of the machine hosting the Portal application node to understand the relationship between TPS trend and resource usage. The machine configuration for the current Portal node is shown in the following image (note that I skipped the node where the Gateway is located):

As we can see, the machine hosting the current Portal node is 8C16G (a virtual machine), and this machine is basically not under much stress.

Now, let’s not consider other resources for the time being and only focus on the configuration of 8C16G. If TPS increases linearly, when the CPU usage of the machine reaches 100%, the TPS will be approximately 800. Therefore, the number of threads in the stress tool should be set to:

\[ \text{number of threads} = \frac{800\ \text{TPS}}{40\ \text{TPS per thread}} = 20 \text{ threads} \]

In practice, however, TPS will not stay strictly proportional to resource usage throughout the test, because as various resources are consumed the response time increases to some extent, which is normal response-time overhead.

After determining the number of threads in the stress tool, let’s consider how to set the increment strategy.

I would like the increment time to be slower so that we can observe the performance data of each stage. According to the performance analysis decision tree in Lesson 2, in such scenarios, we have many counters that need to be analyzed and viewed. Therefore, I set the interval between each thread increment to 30 seconds, which means the increment cycle is 600 seconds.

After determining the stress parameters, we can set the trial run scenario in JMeter with the following values:

<stringProp name="ThreadGroup.num_threads">20</stringProp>
<stringProp name="ThreadGroup.ramp_time">600</stringProp>
<boolProp name="ThreadGroup.scheduler">true</boolProp>
<stringProp name="ThreadGroup.duration">700</stringProp>

After setting the trial run parameters, we can further increase the number of threads to achieve maximized resource utilization in this scenario.

You may wonder: shouldn’t we use even more threads? If you want to conduct a normal scenario, you don’t really need to use more threads. However, if you just want to see what happens when you add more stress threads, you can give it a try. When I execute performance scenarios, I often try various ways of applying stress.

That said, there is one situation in which we genuinely need to add more stress: the response time has increased, but not by much, and TPS is no longer rising. In that case it is hard to break the response time down, especially on systems that respond quickly, with response times of only a few milliseconds. Adding more threads makes the slow parts of the response time stand out, which makes the time easier to break down.

With this increment strategy set (I calculated above that 20 threads are enough to reach the maximum, but here I start the scenario with 100 stress threads so that we can observe the TPS trend and the rise in response time under greater stress, which makes the time easier to break down), we can see that the response time of this interface does gradually increase, and as the number of threads grows it quickly rises to several hundred milliseconds. This is a clear bottleneck, and obviously one we cannot accept.

Next, we need to analyze where this response time is being consumed.

Time Breakdown #

As mentioned earlier, the logic for opening the homepage is: User - Gateway (Redis) - Portal - (Redis, MySQL). Let’s break down the response time with the help of the distributed tracing tool SkyWalking.

  • Time consumed between User and Gateway

We can see that the time consumed between User and Gateway gradually rises to around 150 milliseconds.

  • Gateway response time

The Gateway itself also consumes around 150 milliseconds, which means the network time between User and Gateway is small, only a few milliseconds.

  • Time consumed between Gateway and Portal

The time consumed between Gateway and Portal is only around 50 milliseconds. Let’s look at the Portal itself.

  • Portal response time

The response time of the Portal is around 50 milliseconds, consistent with the time we saw above.

By breaking down the response time as described above, we can determine that it is the Gateway itself that consumes the time, nearly 100 milliseconds of it. Therefore, our next step is to locate the problem on the Gateway.

Analyzing response time consumption on Gateway #

Stage 1: Analyzing st cpu #

Since the response time consumption on the Gateway is high, we naturally want to find out where the time is being spent on this host.

Our analysis logic is still to start with global monitoring and then move on to targeted monitoring. When looking at global monitoring, we should start from the basic level, and in the analysis process, the most basic level is the operating system.

Through the top command, we can see the resource situation on the Gateway node, as shown below:

Among them, st cpu has reached about 15%. We know that st cpu is the CPU time stolen from this virtual machine by the hypervisor to run other virtual machines or applications on the same host, and such a high value is obviously not normal. Therefore, we need to investigate further what is causing the abnormal st cpu.
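If you want to confirm the steal-time trend from inside the guest first, something as simple as vmstat will do; a minimal sketch:

```
# The last column (st) is the steal time seen from inside this virtual machine
vmstat 1 5
```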

We use the mpstat command to first see the resource performance on the host machine (the physical machine where the virtual machine running the Gateway is located):
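The exact invocation is not shown here; something like the following would give the host-level view described below, with per-CPU %usr, %sys, and %idle:

```
# Sample all CPUs on the host every 2 seconds, 5 times
mpstat -P ALL 2 5
```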

It can be seen that about 20% of the CPU is still idle, so there is some headroom left on the host. However, the CPU usage of the host is already quite high, and the only things consuming it are the applications inside the virtual machines. So we need to check whether the CPU consumption of some particular virtual machine is especially high. The KVM guest list on this host is as follows:

[root@dell-server-3 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
12    vm-jmeter                      running
13    vm-k8s-worker-8                running
14    vm-k8s-worker-7                running
15    vm-k8s-worker-9                running

[root@dell-server-3 ~]#

It can be seen that four virtual machines are running on this host machine, so let’s take a closer look at the resource consumption of these four virtual machines.
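Before logging into each guest, you can also get a rough per-domain view from the host itself. A small sketch, assuming libvirt’s virsh is available (it is, since virsh list worked above):

```
# Total CPU time consumed by each running domain, as seen from the host
for d in $(virsh list --name); do
    echo "== $d =="
    virsh cpu-stats "$d" --total
done
```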

  • vm-jmeter
top - 23:42:49 up 28 days,  8:14,  6 users,  load average: 0.61, 0.48, 0.38
Tasks: 220 total,   1 running, 218 sleeping,   1 stopped,   0 zombie
%Cpu0  :  6.6 us,  3.5 sy,  0.0 ni, 88.5 id,  0.0 wa,  0.0 hi,  0.0 si,  1.4 st
%Cpu1  :  6.5 us,  1.8 sy,  0.0 ni, 88.2 id,  0.0 wa,  0.0 hi,  0.4 si,  3.2 st
KiB Mem :  3880180 total,   920804 free,  1506128 used,  1453248 buff/cache
KiB Swap:  2097148 total,  1256572 free,   840576 used.  2097412 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                  
 7157 root      20   0 3699292 781204  17584 S  27.8 20.1   1:09.44 java                                                                                                                     
    9 root      20   0       0      0      0 S   0.3  0.0  30:25.77 rcu_sched                                                                                                                
  376 root      20   0       0      0      0 S   0.3  0.0  16:40.44 xfsaild/dm-
  • vm-k8s-worker-8

    
    top - 23:43:47 up 5 days, 22:28,  3 users,  load average: 9.21, 6.45, 5.74
    Tasks: 326 total,   1 running, 325 sleeping,   0 stopped,   0 zombie
    %Cpu0  : 20.2 us,  3.7 sy,  0.0 ni, 60.7 id,  0.0 wa,  0.0 hi,  2.9 si, 12.5 st
    %Cpu1  : 27.3 us,  7.4 sy,  0.0 ni, 50.2 id,  0.0 wa,  0.0 hi,  3.7 si, 11.4 st
    %Cpu2  : 29.9 us,  5.6 sy,  0.0 ni, 48.5 id,  0.0 wa,  0.0 hi,  4.9 si, 11.2 st
    %Cpu3  : 31.2 us,  5.6 sy,  0.0 ni, 47.6 id,  0.0 wa,  0.0 hi,  4.5 si, 11.2 st
    %Cpu4  : 25.6 us,  4.3 sy,  0.0 ni, 52.7 id,  0.0 wa,  0.0 hi,  3.6 si, 13.7 st
    %Cpu5  : 26.0 us,  5.2 sy,  0.0 ni, 53.5 id,  0.0 wa,  0.0 hi,  4.1 si, 11.2 st
    %Cpu6  : 19.9 us,  6.2 sy,  0.0 ni, 57.6 id,  0.0 wa,  0.0 hi,  3.6 si, 12.7 st
    %Cpu7  : 27.3 us,  5.0 sy,  0.0 ni, 53.8 id,  0.0 wa,  0.0 hi,  2.3 si, 11.5 st
    KiB Mem : 16265688 total,  6772084 free,  4437840 used,  5055764 buff/cache
    KiB Swap:        0 total,        0 free,        0 used. 11452900 avail Mem 
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                  
    13049 root      20   0 9853712 593464  15752 S 288.4  3.6  67:24.22 java                                                                                                                     
     1116 root      20   0 2469728  57932  16188 S  12.6  0.4 818:40.25 containerd                                                                                                               
     1113 root      20   0 3496336 118048  38048 S  12.3  0.7 692:30.79 kubelet                                                                                                                  
     4961 root      20   0 1780136  40700  17864 S  12.3  0.3 205:51.15 calico-node                                                                                                              
     3830 root      20   0 2170204 114920  33304 S  11.6  0.7 508:00.00 scope                                                                                                                    
     1118 root      20   0 1548060 111768  29336 S  11.3  0.7 685:27.95 dockerd                                                                                                                  
     8216 techstar  20   0 2747240 907080 114836 S   5.0  5.6   1643:33 prometheus                                                                                                               
    21002 root      20   0 9898708 637616  17316 S   3.3  3.9 718:56.99 java                                                                                                                     
     1070 root      20   0 9806964 476716  15756 S   2.0  2.9 137:13.47 java                                                                                                                     
    11492 root      20   0  441996  33204   4236 S   1.3  0.2  38:10.49 gvfs-udisks2-vo  
    
  • vm-k8s-worker-7

top - 23:44:22 up 5 days, 22:26,  3 users,  load average: 2.50, 1.67, 1.13
Tasks: 308 total,   1 running, 307 sleeping,   0 stopped,   0 zombie
%Cpu0  :  4.2 us,  3.5 sy,  0.0 ni, 82.3 id,  0.0 wa,  0.0 hi,  1.7 si,  8.3 st
%Cpu1  :  6.2 us,  2.7 sy,  0.0 ni, 82.8 id,  0.0 wa,  0.0 hi,  1.4 si,  6.9 st
%Cpu2  :  5.2 us,  2.8 sy,  0.0 ni, 84.0 id,  0.0 wa,  0.0 hi,  1.0 si,  6.9 st
%Cpu3  :  4.5 us,  3.8 sy,  0.0 ni, 81.2 id,  0.0 wa,  0.0 hi,  1.4 si,  9.2 st
%Cpu4  :  4.4 us,  2.4 sy,  0.0 ni, 83.3 id,  0.0 wa,  0.0 hi,  1.4 si,  8.5 st
%Cpu5  :  5.5 us,  2.4 sy,  0.0 ni, 84.5 id,  0.0 wa,  0.0 hi,  1.0 si,  6.6 st
%Cpu6  :  3.7 us,  2.7 sy,  0.0 ni, 85.6 id,  0.0 wa,  0.0 hi,  0.7 si,  7.4 st
%Cpu7  :  3.1 us,  1.7 sy,  0.0 ni, 84.7 id,  0.0 wa,  0.0 hi,  1.4 si,  9.0 st
KiB Mem : 16265688 total,  8715820 free,  3848432 used,  3701436 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 12019164 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                  
18592 27        20   0 4588208 271564  12196 S  66.9  1.7 154:58.93 mysqld                                                                                                                   
 1109 root      20   0 2381424 105512  37208 S   9.6  0.6 514:18.00 kubelet                                                                                                                  
 1113 root      20   0 1928952  55556  16024 S   8.9  0.3 567:43.53 containerd                                                                                                               
 1114 root      20   0 1268692 105212  29644 S   8.6  0.6 516:43.38 dockerd                                                                                                                  
 3122 root      20   0 2169692 117212  33416 S   7.0  0.7 408:21.79 scope                                                                                                                    
 4132 root      20   0 1780136  43188  17952 S   6.0  0.3 193:27.58 calico-node                                                                                                              
 3203 nfsnobo+  20   0  116748  19720   5864 S   2.0  0.1  42:43.57 node_exporter                                                                                                            
12089 techstar  20   0 5666480   1.3g  23084 S   1.3  8.5  78:04.61 java                                                                                                                     
 5727 root      20   0  449428  38616   4236 S   1.0  0.2  49:02.98 gvfs-udisks2-vo  
  • vm-k8s-worker-9

top - 23:45:23 up 5 days, 22:21,  4 users,  load average: 12.51, 10.28, 9.19
Tasks: 333 total,   4 running, 329 sleeping,   0 stopped,   0 zombie
%Cpu0  : 20.1 us,  7.5 sy,  0.0 ni, 43.3 id,  0.0 wa,  0.0 hi, 13.4 si, 15.7 st
%Cpu1  : 20.1 us, 11.2 sy,  0.0 ni, 41.4 id,  0.0 wa,  0.0 hi, 11.9 si, 15.3 st
%Cpu2  : 23.8 us, 10.0 sy,  0.0 ni, 35.4 id,  0.0 wa,  0.0 hi, 14.2 si, 16.5 st
%Cpu3  : 15.1 us,  7.7 sy,  0.0 ni, 49.1 id,  0.0 wa,  0.0 hi, 12.2 si, 15.9 st
%Cpu4  : 22.8 us,  6.9 sy,  0.0 ni, 40.5 id,  0.0 wa,  0.0 hi, 14.7 si, 15.1 st
%Cpu5  : 17.5 us,  5.8 sy,  0.0 ni, 50.0 id,  0.0 wa,  0.0 hi, 10.6 si, 16.1 st
%Cpu6  : 22.0 us,  6.6 sy,  0.0 ni, 45.1 id,  0.0 wa,  0.0 hi, 11.0 si, 15.4 st
%Cpu7  : 19.2 us,  8.0 sy,  0.0 ni, 44.9 id,  0.0 wa,  0.0 hi,  9.8 si, 18.1 st
KiB Mem : 16265688 total,  2567932 free,  7138952 used,  6558804 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  8736000 avail Mem
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                  
24122 root      20   0 9890064 612108  16880 S 201.0  3.8   1905:11 java                                                                                                                     
2794 root      20   0 2307652 161224  33464 S  57.7  1.0   1065:54 scope                                                                                                                    
1113 root      20   0 2607908  60552  15484 S  13.8  0.4   1008:04 containerd                                                                                                               
1109 root      20   0 2291748 110768  39140 S  12.8  0.7 722:41.17 kubelet                                                                                                                  
1114 root      20   0 1285500 108664  30112 S  11.1  0.7 826:56.51 dockerd                                                                                                                  
29 root      20   0       0      0      0 S   8.9  0.0  32:09.89 ksoftirqd/4                                                                                                              
6 root      20   0       0      0      0 S   8.2  0.0  41:28.14 ksoftirqd/0                                                                                                              
24 root      20   0       0      0      0 R   8.2  0.0  41:00.46 ksoftirqd/3                                                                                                              
39 root      20   0       0      0      0 R   8.2  0.0  41:08.18 ksoftirqd/6                                                                                                              
19 root      20   0       0      0      0 S   7.9  0.0  39:10.22 ksoftirqd/2                                                                                                              
14 root      20   0       0      0      0 S   6.2  0.0  40:58.25 ksoftirqd/1    

Clearly, the si (CPU usage from servicing interrupts) and st (CPU stolen from this VM by the hypervisor) for worker-9 are not low. This is strange because the virtual machine itself does not have a high CPU utilization. Why is st still high? Does this mean that the CPU can only be utilized to this extent?
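To see where the si portion on worker-9 is going, you can watch the softirq counters inside the guest and sample the per-CPU %soft and %steal columns over time; a simple sketch:

```
# Which softirq types (NET_RX/NET_TX, TIMER, ...) are growing fastest?
watch -d -n 1 cat /proc/softirqs

# Per-CPU %soft and %steal sampled once a second for 10 seconds
mpstat -P ALL 1 10
```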

Let’s continue our investigation.

Stage 2: Checking the CPU operating mode of the physical machines #

In this stage, we first need to check whether anything in the service is blocked. As I mentioned earlier, we should think from the global monitoring perspective and make sure the performance counters we look at are complete, so as to avoid biased judgments. However, when I examined the thread stacks in detail, I did not find anything in the Blocked state, so we have to come back to the configuration of the physical machines.
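For reference, the kind of thread-stack check I mean is along these lines; the process ID here is a placeholder for the Gateway’s Java process:

```
# Dump the JVM thread stacks and look for threads in the BLOCKED state
PID=13049   # placeholder, use the actual Gateway pid
jstack $PID | grep -A 10 "java.lang.Thread.State: BLOCKED"
```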

So, what can we look at on the physical machine’s CPU? Even after thinking it through from the bottom up for a long time and walking through all the logic, we still cannot find where the blocking is. The only thing left is to look at the CPU operating mode of the host machines.

-- Physical Machine 1

[root@hp-server ~]# cpupower frequency-info
analyzing CPU 0:
  driver: pcc-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1.20 GHz - 2.10 GHz
  available cpufreq governors: conservative userspace powersave ondemand performance
  current policy: frequency should be within 1.20 GHz and 2.10 GHz.
                  The governor "conservative" may decide which speed to use
                  within this range.
  current CPU frequency: 1.55 GHz (asserted by call to hardware)
  boost state support:
    Supported: yes
    Active: yes

-- Physical Machine 2

[root@dell-server-2 ~]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
[root@dell-server-2 ~]# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1.20 GHz - 2.20 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 1.20 GHz and 2.20 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: 2.20 GHz (asserted by call to hardware)
  boost state support:
    Supported: no
    Active: no
    2200 MHz max turbo 4 active cores
    2200 MHz max turbo 3 active cores
    2200 MHz max turbo 2 active cores
    2200 MHz max turbo 1 active cores

-- Physical Machine 3
[root@dell-server-3 ~]# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1.20 GHz - 2.20 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 1.20 GHz and 2.20 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: 2.20 GHz (asserted by call to hardware)
  boost state support:
    Supported: no
    Active: no

-- Physical Machine 4
[root@lenvo-nfs-server ~]# cpupower frequency-info
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 10.0 us
  hardware limits: 2.00 GHz - 2.83 GHz
  available frequency steps:  2.83 GHz, 2.00 GHz
  available cpufreq governors: conservative userspace powersave ondemand performance
  current policy: frequency should be within 2.00 GHz and 2.83 GHz.
                  The governor "conservative" may decide which speed to use
                  within this range.
  current CPU frequency: 2.00 GHz (asserted by call to hardware)
  boost state support:
    Supported: no
    Active: no

It can be seen that none of the physical machines are running in performance mode.

Here, we need to understand the CPU operating modes:

![CPU Operating Modes](../images/1fdf7de4851c43439626712155a225ba.jpg)
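To see at a glance which mode all the logical CPUs of a machine are currently in, a quick check like this works (the same information cpupower reports, just aggregated):

```
# Count how many logical CPUs are running under each governor
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c
```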

Since we are doing performance analysis, we naturally want to use the performance mode, so we switch the CPU governor on each physical machine as follows:

-- Physical Machine 1
[root@hp-server ~]# cpupower -c all frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
Setting cpu: 8
Setting cpu: 9
Setting cpu: 10
Setting cpu: 11
Setting cpu: 12
Setting cpu: 13
Setting cpu: 14
Setting cpu: 15
Setting cpu: 16
Setting cpu: 17
Setting cpu: 18
Setting cpu: 19
Setting cpu: 20
Setting cpu: 21
Setting cpu: 22
Setting cpu: 23
Setting cpu: 24
Setting cpu: 25
Setting cpu: 26
Setting cpu: 27
Setting cpu: 28
Setting cpu: 29
Setting cpu: 30
Setting cpu: 31
[root@hp-server ~]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
[root@hp-server ~]#

-- Physical Machine 2
[root@dell-server-2 ~]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
[root@dell-server-2 ~]# cpupower -c all frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
Setting cpu: 8
Setting cpu: 9
Setting cpu: 10
Setting cpu: 11
Setting cpu: 12
Setting cpu: 13
Setting cpu: 14
Setting cpu: 15
[root@dell-server-2 ~]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
[root@dell-server-2 ~]#

-- Physical Machine 3
[root@dell-server-3 ~]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
[root@dell-server-3 ~]#  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
[root@dell-server-3 ~]# cpupower -c all frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
Setting cpu: 8
Setting cpu: 9
Setting cpu: 10
Setting cpu: 11
Setting cpu: 12
Setting cpu: 13
Setting cpu: 14
Setting cpu: 15
[root@dell-server-3 ~]#  cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
[root@dell-server-3 ~]#

-- Physical Machine 4
[root@lenvo-nfs-server ~]# cpupower -c all frequency-set -g performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
[root@lenvo-nfs-server ~]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
[root@lenvo-nfs-server ~]#
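Note that cpupower frequency-set only changes the governor for the running system; after a reboot these machines would drop back to powersave or conservative. If you want the setting to persist, one common approach (assuming the tuned package is installed, which I have not verified on these machines) looks like this:

```
# Switch to a performance-oriented tuned profile so the governor survives reboots
tuned-adm profile throughput-performance
tuned-adm active
```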

After this whole flurry of operations, how did the performance change?

The result is that the performance did not improve. I will not include a screenshot here because the chart is the same as the one shown at the beginning.

What we need to understand here is that the problem analyzed above is clearly not the only one; there are other resource-usage bottlenecks that we have not yet identified. So, unfortunately, all we can do is keep investigating.

Summary #

In this lesson, we spotted the bottleneck by looking at the curves in the stress tool, and then used SkyWalking to break down the response time.

Once we determined where the response time was being consumed, we carried out the analysis in two stages. The evidence chain in the first stage starts from the phenomenon and works downward: since st cpu means CPU is being taken from the virtual machine by other guests on the same host, we examined the other virtual machines on that host. Here, we also need to be clear about how much CPU we can reasonably expect to use. After finding unreasonable resource usage, we moved on to the second stage of analysis.

In the second stage, we analyzed the CPU running mode. On a physical machine, if we do not deliberately restrict it, there is no reason for the CPU not to run at its full capability, so we checked the CPU running mode and switched it to performance.

However, even after analyzing and attempting to fix the problems above, TPS did not change significantly. This shows that although our analysis logic was sound and the optimizations were worth making, there are still other problems in the system. So far we have only fixed one short plank of the barrel; the other short planks are still out there.

Please note that this does not mean the analysis and optimization in this lesson were meaningless. Without solving these problems, the next problem would never have a chance to surface, so this lesson’s work is still very valuable.

In the next lesson, we will continue to look for performance bottlenecks in the home page interface.

Homework #

Lastly, please take some time to reflect on the following questions:

  1. Why do we check the other virtual machines on the host when we see high st cpu inside a virtual machine? And what should we conclude if we see high st cpu on the host machine itself?
  2. What is the logic of CPU operation when it is in powersave mode?

Remember to discuss and exchange your thoughts with me in the comments section. Every thought you have will help you progress further.

If you found this lesson helpful, feel free to share it with your friends and let’s learn and progress together. See you in the next lecture!