
20 Generating Order Information, Part 2: How to Optimize Performance in Complex Business Logic #

Hello, I’m Gao Lou.

In the previous lesson, we analyzed and optimized the interface for generating order information in three stages, which improved the TPS to some extent. However, the system resources have not been fully utilized, indicating that there is still room for optimization in this interface. As Professor Gao has repeatedly mentioned, in the process of performance optimization, we should fully utilize all resources available.

Regarding the theory of “fully utilizing resources”, what I want you to understand is that when optimizing in a performance environment, the purpose of fully utilizing resources is to determine the maximum capacity of the system. However, this does not mean that you can push hardware utilization to such a degree in production environments.

When system capacity is unpredictable, high resource utilization can lead to all sorts of problems. For safety, resource utilization in many production environments is therefore kept very low; if usage climbs past 20%, the operations team starts to panic.

In order to obtain clear and applicable conclusions for configuring production environments, we need to analyze the business capacity in production first. Based on this data, we can determine how much hardware resources should be used when the production business capacity reaches its peak.

However, in our optimization environment, we can assess the software’s capacity by fully utilizing the system. Therefore, we will continue from the previous lesson and proceed to the fourth stage. You will see how we optimize in situations with complex business logic.

Without further ado, let’s get started.

Phase 4 #

After solving the three rather unprofessional problems earlier, we can now analyze where the time is actually going — we have to sort out efficiency before we can tackle the waste of resources. So, let's start by splitting the response time. As before, we will not do a global monitoring analysis first because… I'm tired.

Splitting the Response Time #

We have already used APM to split response time many times, and it no longer feels novel. This time, I will split the time using logs.

  • Gateway:
  
  10.100.79.93 - - [04/Feb/2021:00:13:17 +0800] "POST /mall-order/order/generateOrder HTTP/1.1" 200 726 8201 151 ms
  10.100.79.93 - - [04/Feb/2021:00:13:17 +0800] "POST /mall-order/order/generateOrder HTTP/1.1" 200 726 8201 147 ms
  10.100.79.93 - - [04/Feb/2021:00:13:17 +0800] "POST /mall-order/order/generateOrder HTTP/1.1" 200 726 8201 141 ms
  10.100.79.93 - - [04/Feb/2021:00:13:17 +0800] "POST /mall-order/order/generateOrder HTTP/1.1" 200 726 8201 122 ms
  10.100.79.93 - - [04/Feb/2021:00:13:17 +0800] "POST /mall-order/order/generateOrder HTTP/1.1" 200 726 8201 125 ms
  10.100.79.93 - - [04/Feb/2021:00:13:17 +0800] "POST /mall-order/order/generateOrder HTTP/1.1" 200 726 8201 150 ms
  10.100.79.93 - - [04/Feb/2021:00:13:17 +0800] "POST /mall-order/order/generateOrder HTTP/1.1" 200 726 8201 177 ms
  
  • Order:
  
  10.100.79.106 - - [04/Feb/2021:00:13:31 +0800] "POST /order/generateOrder HTTP/1.1" 200 738 "-" "Apache-HttpClient/4.5.12 (Java/1.8.0_261)" 72 ms 72 ms
  10.100.79.106 - - [04/Feb/2021:00:13:31 +0800] "POST /order/generateOrder HTTP/1.1" 200 738 "-" "Apache-HttpClient/4.5.12 (Java/1.8.0_261)" 94 ms 93 ms
  10.100.79.106 - - [04/Feb/2021:00:13:31 +0800] "POST /order/generateOrder HTTP/1.1" 200 738 "-" "Apache-HttpClient/4.5.12 (Java/1.8.0_261)" 76 ms 76 ms
  10.100.79.106 - - [04/Feb/2021:00:13:31 +0800] "POST /order/generateOrder HTTP/1.1" 200 738 "-" "Apache-HttpClient/4.5.12 (Java/1.8.0_261)" 95 ms 95 ms
  10.100.79.106 - - [04/Feb/2021:00:13:31 +0800] "POST /order/generateOrder HTTP/1.1" 200 738 "-" "Apache-HttpClient/4.5.12 (Java/1.8.0_261)" 90 ms 90 ms
  

We do not need to look at the subsequent services for now, because from this interface they go straight to the database. Let's first check whether the application itself has any issues.

To make it easier for you to see, I have only included a portion of the data here, but that does not mean these are the only entries worth looking at. In a real project, you can use scripts or other means to compute response-time statistics over the full log.
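
For example, a rough one-liner along these lines can summarize the gateway log (a sketch that assumes the milliseconds value sits in the next-to-last field, as in the lines above, and uses a hypothetical file name):

grep 'generateOrder' gateway-access.log | awk '{sum += $(NF-1); n++} END {if (n) printf "count=%d  avg=%.1f ms\n", n, sum/n}'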

From the information above, we can see that this interface takes about 150 ms at the Gateway, of which roughly 90 ms is spent in the Order service. So next we need to analyze why the Order service takes that long.

Targeted Monitoring and Analysis #

To understand the time consumption of the Order service, obviously, we need to know what actions the threads in the Order application are performing. So, let’s start by analyzing the stack of the Order service.

Using Spring Boot Admin, we can view the overall state of the threads:

You see, the threads are indeed quite busy. As for what these threads are doing, we can determine the optimization direction by examining the content of the stack and then further confirming.

However, since the system resources have not yet reached their limit, we need to adjust the thread count of Tomcat first, increasing it a bit, in order to make the Order application utilize the hardware resources.

Original value:
max: 20
Revised value:
max: 100
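
For context, in a Spring Boot service this setting normally lives in application.yml, roughly as follows (a sketch, not the project's actual file; the property name depends on the Spring Boot version):

server:
  tomcat:
    threads:
      max: 100        # Spring Boot 2.3+; older versions use server.tomcat.max-threads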

Let’s take a look at the results after the adjustment:

Unexpectedly, performance got worse… This messy curve is nothing like the elegant one I had imagined!

It turns out that taking shortcuts won’t allow us to avoid pitfalls. We can only look again at where the response time is being consumed.

So, splitting the response time again with the logs of the various services, I found the following entries in the Member service (to make them easier to read, I extracted only the entries with significant time consumption; note that this is after we enlarged Order's Tomcat thread pool):

10.100.69.248 - - [04/Feb/2021:00:37:15 +0800] "GET /sso/feign/info HTTP/1.1" 200 814 "-" "okhttp/3.14.8" 2348 ms 2348 ms
10.100.69.248 - - [04/Feb/2021:00:37:17 +0800] "GET /sso/feign/info HTTP/1.1" 200 816 "-" "okhttp/3.14.8" 4155 ms 4155 ms
10.100.69.248 - - [04/Feb/2021:00:37:17 +0800] "GET /sso/feign/info HTTP/1.1" 200 817 "-" "okhttp/3.14.8" 4968 ms 1813 ms
10.100.69.248 - - [04/Feb/2021:00:37:15 +0800] "GET /sso/feign/info HTTP/1.1" 200 810 "-" "okhttp/3.14.8" 2333 ms 2333 ms
10.100.69.248 - - [04/Feb/2021:00:37:17 +0800] "GET /sso/feign/info HTTP/1.1" 200 815 "-" "okhttp/3.14.8" 5206 ms 4970 ms
10.100.69.248 - - [04/Feb/2021:00:37:20 +0800] "GET /sso/feign/info HTTP/1.1" 200 818 "-" "okhttp/3.14.8" 6362 ms 6362 ms
10.100.69.248 - - [04/Feb/2021:00:37:20 +0800] "GET /sso/feign/info HTTP/1.1" 200 818 "-" "okhttp/3.14.8" 6710 ms 6710 ms
10.100.69.248 - - [04/Feb/2021:00:37:20 +0800] "GET /sso/feign/info HTTP/1.1" 200 817 "-" "okhttp/3.14.8" 6696 ms 6587 ms
10.100.69.248 - - [04/Feb/2021:00:37:21 +0800] "GET /sso/feign/info HTTP/1.1" 200 813 "-" "okhttp/3.14.8" 7987 ms 7976 ms
10.100.69.248 - - [04/Feb/2021:00:37:22 +0800] "GET /sso/feign/info HTTP/1.1" 200 814 "-" "okhttp/3.14.8" 8784 ms 8784 ms
10.100.69.248 - - [04/Feb/2021:00:37:22 +0800] "GET /sso/feign/info HTTP/1.1" 200 817 "-" "okhttp/3.14.8" 9100 ms 8764 ms
10.100.69.248 - - [04/Feb/2021:00:37:22 +0800] "GET /sso/feign/info HTTP/1.1" 200 834 "-" "okhttp/3.14.8" 9126 ms 9013 ms
10.100.69.248 - - [04/Feb/2021:00:37:22 +0800] "GET /sso/feign/info HTTP/1.1" 200 817 "-" "okhttp/3.14.8" 9058 ms 9058 ms
10.100.69.248 - - [04/Feb/2021:00:37:23 +0800] "GET /sso/feign/info HTTP/1.1" 200 820 "-" "okhttp/3.14.8" 9056 ms 9056 ms

Clearly, the response time of the Member service is far too long, and the order-generation interface does call Member, because it needs the Token. Since enlarging Order's Tomcat thread pool pushed more load onto Member, whose slow responses then stretched Order's response time, we have good reason to conclude that Member's performance is poor.

If we want to analyze why Member performs poorly, we could simply print the thread stacks on Member and read them. That is the lazy shortcut Professor Gao sometimes takes.

However, as we have been discussing, the complete analysis logic should be to first look at the global monitoring data, and then look at the targeted monitoring data. So, Professor Gao needs to be diligent here. Let’s use the global monitoring data to examine the overall resource consumption:

Worker-8’s CPU resources are actually being used so heavily! This indicates that increasing the Tomcat thread count for Order was valuable. Now, the bottleneck has shifted to another place, which is our Member service.

Since the resource usage of Worker-8 is high, let’s see what Pods are on it. It’s not hard to see that Member is on worker-8:

[root@k8s-master-2 ~]# kubectl get pods -o wide | grep k8s-worker-8
elasticsearch-client-0                      1/1     Running     0          38h   10.100.231.233   k8s-worker-8   <none>           <none>
monitor-mall-monitor-d8bb58fcb-kfbcj        1/1     Running     0          23d   10.100.231.242   k8s-worker-8   <none>           <none>
skywalking-oap-855f96b777-5nxll             1/1     Running     6          37h   10.100.231.235   k8s-worker-8   <none>           <none>
skywalking-oap-855f96b777-6b7jd             1/1     Running     5          37h   10.100.231.234   k8s-worker-8   <none>           <none>
svc-mall-admin-75ff7dcc9b-8gtr5             1/1     Running     0          17d   10.100.231.208   k8s-worker-8   <none>           <none>
svc-mall-demo-5584dbdc96-fskg9              1/1     Running     0          17d   10.100.231.207   k8s-worker-8   <none>           <none>
svc-mall-member-5fc984b57c-bk2fd            1/1     Running     0          12d   10.100.231.231   k8s-worker-8   <none>           <none>
[root@k8s-master-2 ~]# 

Meanwhile, we can see that there are quite a few services running on this node, and all of these services are CPU-intensive. During the stress process, there were instances when the sy CPU consumption was very high, as well as instances when the us CPU consumption was high. I will show you the data for both scenarios:

  • High sys CPU usage:
[root@k8s-worker-8 ~]# top
top - 00:38:51 up 28 days,  4:27,  3 users,  load average: 78.07, 62.23, 39.14
Tasks: 275 total,  17 running, 257 sleeping,   1 stopped,   0 zombie
%Cpu0  :  4.2 us, 95.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.4 st
%Cpu1  :  1.8 us, 98.2 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  2.1 us, 97.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  1.0 us, 99.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16266296 total,  1819300 free,  7642004 used,  6804992 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  8086580 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                 
12902 root      20   0 1410452  32280  17744 S  48.1  0.2 751:39.59 calico-node -felix                                                                                      
    9 root      20   0       0      0      0 R  34.8  0.0 131:14.01 [rcu_sched]                                                                                             
 3668 techstar  20   0 4816688   1.3g  23056 S  33.9  8.5 111:17.12 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress+
26105 root      20   0  119604   6344   2704 R  25.8  0.0   0:02.36 runc --root /var/run/docker/runtime-runc/moby --log /run/containerd/io.containerd.runtime.v1.linux/moby+
26163 root      20   0   19368    880    636 R  25.2  0.0   0:00.95 iptables-legacy-save -t nat                                                                             
26150 root      20   0   18740   3136   1684 R  21.6  0.0   0:01.18 runc init                                                                                               
26086 root      20   0   18744   5756   2376 R  20.3  0.0   0:03.10 runc --root /var/run/docker/runtime-runc/moby --log /run/containerd/io.containerd.runtime.v1.linux/moby+
  410 root      20   0       0      0      0 S  19.4  0.0  42:42.56 [xfsaild/dm-1]                                                                                          
   14 root      20   0       0      0      0 S  14.8  0.0  54:28.76 [ksoftirqd/1]                                                                                           
    6 root      20   0       0      0      0 S  14.2  0.0  50:58.94 [ksoftirqd/0]                                                                                           
26158 root      20   0   18740   1548    936 R  14.2  0.0   0:00.90 runc --version                                                                                          
31715 nfsnobo+  20   0  129972  19856   9564 S  11.3  0.1  12:41.98 ./kube-rbac-proxy --logtostderr --secure-listen-address=[172.16.106.56]:9100 --tls-cipher-suites=TLS_EC+
10296 root      20   0 3402116 113200  39320 S  10.3  0.7   2936:50 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubern+
   22 root      rt   0       0      0      0 S   8.7  0.0   3:18.08 [watchdog/3]                                                                                            
26162 root      20   0  139592   2792   2508 R   8.4  0.0   0:00.39 /opt/cni/bin/calico                                                                                     
 6843 root      20   0  965824 110244  30364 S   7.7  0.7   1544:20 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock                                  
   24 root      20   0       0      0      0 S   7.4  0.0  49:03.89 [ksoftirqd/3]                                                                                           
 3636 techstar  20   0    4368    364    280 S   6.8  0.0   0:12.19 /tini -- /usr/local/bin/docker-entrypoint.sh eswrapper                                                  
26159 root      20   0   18740    760    552 R   6.5  0.0   0:00.28 runc --version                                                                                          
 1755 root      20   0  411108   5836   4416 S   4.8  0.0  35:39.97 /usr/libexec/packagekitd                       


  • High us CPU usage:
[root@k8s-worker-8 ~]# top
top - 00:43:01 up 28 days,  4:31,  3 users,  load average: 72.51, 68.20, 47.01
Tasks: 263 total,   2 running, 260 sleeping,   1 stopped,   0 zombie
%Cpu0  : 77.2 us, 15.7 sy,  0.0 ni,  2.2 id,  0.0 wa,  0.0 hi,  4.8 si,  0.0 st
%Cpu1  : 77.0 us, 15.7 sy,  0.0 ni,  2.3 id,  0.0 wa,  0.0 hi,  5.0 si,  0.0 st
%Cpu2  : 70.3 us, 20.9 sy,  0.0 ni,  2.9 id,  0.0 wa,  0.0 hi,  5.9 si,  0.0 st
%Cpu3  : 76.6 us, 12.2 sy,  0.0 ni,  5.1 id,  0.0 wa,  0.0 hi,  6.1 si,  0.0 st
KiB Mem : 16266296 total,  1996620 free,  7426512 used,  6843164 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  8302092 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
20072 root      20   0 7944892 689352  15924 S 137.1  4.2   3127:04 java -Dapp.id=svc-mall-member -javaagent:/opt/skywalking/agent/skywalking-agent.jar -Dskywalking.agent.+
29493 root      20   0 3532496 248960  17408 S  98.3  1.5    0:06.70 java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Dmode=no-init -Xmx2g -Xms2g -cl+
28697 root      20   0 3711520   1.0g  18760 S  61.6  6.7  124:41.08 java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Dmode=no-init -Xmx2g -Xms2g -cl+
25885 root      20   0 3716560   1.2g  18908 S  59.3  7.6  183:12.97 java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Dmode=no-init -Xmx2g -Xms2g -cl+
 6843 root      20   0  965824 109568  30364 S   7.6  0.7    1544:49 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
 3668 techstar  20   0 4816688   1.3g  23056 S   6.6  8.5  111:58.56 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress+
10296 root      20   0 3402372 111692  39320 S   6.6  0.7    2937:43 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubern+
   18 root      rt   0       0      0      0 S   5.0  0.0    5:57.54 [migration/2]
    6 root      20   0       0      0      0 S   2.6  0.0   51:21.52 [ksoftirqd/0]
  410 root      20   0       0      0      0 D   2.6  0.0   43:08.23 [xfsaild/dm-1]
28310 root      20   0 7807048 565740  15924 S   2.6  3.5    1036:53 java -Dapp.id=svc-mall-admin -javaagent:/opt/skywalking/agent/skywalking-agent.jar -Dskywalking.agent.s+
29741 root      20   0 7749608 540376  15848 S   2.3  3.3  304:41.47 java -Dapp.id=svc-mall-monitor -javaagent:/opt/skywalking/agent/skywalking-agent.jar -Dskywalking.agent+
12902 root      20   0 1410452  30368  17744 S   2.0  0.2  752:30.32 calico-node -felix
16712 root       0 -20       0      0      0 S   2.0  0.0    1:56.16 [kworker/2:0H]
 6381 root      20   0 7782400 491476  15928 S   1.7  3.0  441:08.96 java -Dapp.id=svc-mall-demo -

From the top data with high sy CPU usage, it is obvious that this node is spending a great deal of CPU in the kernel on scheduling and soft-interrupt work, as the rcu_sched and ksoftirqd kernel threads in top show. This is clearly caused by Kubernetes packing too many workloads onto this node. So, I moved the Member service to another worker, and the TPS then looked like this:

You see, the TPS has increased to over 400, which means our direction is correct.

So why did modifying Order's Tomcat thread count show no improvement? Because the pressure simply shifted to the Member service: resource utilization on the worker hosting Member climbed, Member could no longer respond promptly, and the overall TPS therefore showed no visible gain. Now that we have moved Member away and seen a clear improvement, we know we are still on the right track.
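
Incidentally, one rough way to pin a service onto a different node is a nodeSelector patch like the one below (a sketch only; the Deployment name is inferred from the Pod listing above, and k8s-worker-7 is just a placeholder target node — rescheduling can equally be done by adjusting requests/limits or cordoning the busy node):

kubectl patch deployment svc-mall-member \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"k8s-worker-7"}}}}}'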

Let’s take another look at the overall resource monitoring:

Now none of the workers is anywhere near exhausting its resources, which falls well short of our goal of "using all the resources".

Even under a long period of sustained pressure, the resources still could not be fully utilized — something you can also see from the last chart in the final section of the previous lesson.

By this point we had walked through the performance analysis decision tree several times and adjusted parameters such as the Tomcat connection pool, the JDBC pool, the Jedis pool, and the MQ pool. After these adjustments the TPS showed a slight tendency to rise, but nothing significant, so all we could do was start a new round of troubleshooting.
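
For reference, the kinds of pool parameters touched in this round usually live in each service's configuration, roughly like this (a sketch assuming Spring Boot with HikariCP, Jedis, and RabbitMQ; the actual keys and values depend on the project's stack):

spring:
  datasource:
    hikari:
      maximum-pool-size: 50    # JDBC connection pool
  redis:
    jedis:
      pool:
        max-active: 50         # Jedis connection pool
  rabbitmq:
    listener:
      simple:
        concurrency: 20        # MQ consumer concurrency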

Phase 5 #

Determine Time Consumption #

In the analysis of the previous phase, we split the response time using logs to show you that we can extract the response time in various ways. This is what I have been emphasizing: focus on what you want, rather than how you achieve it.

In this phase, we will take a different approach: tracking the execution process of methods to determine time consumption. I want you to see that in the process of optimization, as long as the mindset remains the same, you can choose any means.

This approach is essentially the same as splitting time with logs. We can use Arthas directly to pinpoint where a method spends its time. Keep in mind that besides Arthas, many other tools — JVisualVM, JMC, BTrace, and so on — can do the job as well.

We already know that the interface method is generateOrder in com.dunshan.mall.order.service.impl.PortalOrderServiceImpl, so we can just trace it directly.
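
The trace command looks roughly like this (a sketch; the 100 ms filter and the capture count are arbitrary choices here):

trace com.dunshan.mall.order.service.impl.PortalOrderServiceImpl generateOrder -n 5 '#cost > 100'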

Note that in this step we need to trace several times to make sure the direction is correct. I admit this is time-consuming, tedious work — a bit like counting sheep to fall asleep, except some people drift off while counting and others only get more worked up.

Now, let’s take a look at the results after tracing multiple times. Since the traced stack is too long, I have simplified the content of multiple traces. The important parts are shown below:

+---[91.314104ms] com.dunshan.mall.order.feign.MemberService:getCurrentMember() #150     
.................... 
+---[189.777528ms] com.dunshan.mall.order.feign.CartItemService:listPromotionnew() #154
....................
+---[47.300765ms] com.dunshan.mall.order.service.impl.PortalOrderServiceImpl:sendDelayMessageCancelOrder() #316  

Why do these methods matter? For the traced output, what we mainly judge is whether the time is consistently consumed in the same methods. If it is not, things get trickier, because the problem then lies not in the methods themselves but in some other resource that affects their execution time. If the time is always spent in the same methods, it is comparatively easy — we just keep tracing into those methods.

After tracing many times, I consistently found the methods above consuming a considerable amount of time. Now that we know where the methods spend their time, global monitoring can no longer help us; we have to move on to targeted monitoring and analysis.

Targeted Monitoring Analysis #

First, let me explain that, following our usual analysis approach, I went over the global monitoring counters repeatedly before this targeted analysis and found no problems with resource usage, nor any bottleneck along the path from the stress testing tool to the database. The whole path looked clear.

Yet we did see the response time concentrated in a few specific methods, and the time those methods consume is not constant — it fluctuates.

After repeated confirmation, I felt it was necessary to look at the business logic itself. For a complex business, if the code path is simply too long, no amount of tuning will help, and in the end the only option is to scale the system out.

However, even if we scale out the system, we need to provide logical reasoning. If the business can be optimized, we should try our best to do so. Because from a cost perspective, optimizing code is a more optimal choice.

Here, let me digress a bit. I have seen many companies that haven’t even done simple optimizations, but instead, they increase the number of machines to find a sense of security, resulting in a huge cost. This is very irrational. From a technical perspective, spending a small amount of time can save a significant amount of resource cost, which is obviously cost-effective. However, some companies are misled by some negative thinking and believe that as long as a problem can be solved by adding machines, it is not a big deal.

For this kind of thinking, we have to consider cost. Many employees figure that it is not their own money, so why worry about how much the boss spends on machines? But from the broader perspective of saving energy and reducing emissions, skipping an optimization that should be done not only wastes the company's money but is not environmentally friendly either! Uh… I seem to have wandered off a bit.

Let's get back to the point. Since we want to optimize the business, we need to know what its call logic looks like. So we open IntelliJ IDEA, find the generateOrder method, generate a sequence diagram with an IDEA plugin, and see a very long business-logic diagram:

If you don’t understand this image, it doesn’t matter. I will give you a brief description of what is in this image:

  1. Get username;
  2. Get shopping cart list;
  3. Get promotion information;
  4. Check inventory;
  5. Check coupons;
  6. Check points;
  7. Calculate amount;
  8. Convert order and insert into database;
  9. Get address information;
  10. Calculate bonus points and growth value;
  11. Insert into order table;
  12. Update coupon status;
  13. Deduct points;
  14. Delete shopping cart items;
  15. Send cancel order message;
  16. Return result.

Doesn't it feel complicated? From the steps I listed, you can imagine how much work a single order placement involves. For such a complex interface, if the business logic genuinely has to be this way, there is little room left for optimization at the interface level. We have already pushed the TPS to over 400, and on this class of hardware that is essentially the limit.

In this lesson, we are not designing the business logic of placing an order because each enterprise’s order logic will have different business constraints. As performance engineers, we do not have the authority to modify the design of the business logic because modifying the business logic requires discussion and agreement from all relevant parties. However, we can provide optimization suggestions based on the results of the analysis.

Here, I have removed the steps of using coupons, points, and sending delayed cancel order messages from the order placement process. Some may ask if these changes are appropriate. Let me emphasize that I am not suggesting changing the business logic. Instead, I want to see if there is an increase in TPS after these changes. If there is an increase, it means we are on the right track and the business logic needs to be redesigned in consultation with all parties involved.

Let’s look at the TPS chart after the modifications:

As you can see, these modifications have indeed had an effect, and our optimization suggestions for the next steps are clearer. If you encounter such an interface in your project, you can provide optimization suggestions as follows:

  1. Sharding;
  2. Utilize caching;
  3. Process non-critical steps asynchronously (see the sketch after this list);
  4. Split large interfaces into smaller ones.
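
To make suggestion 3 concrete, here is a minimal sketch of pushing a non-critical step off the synchronous order path, assuming Spring's @Async support is enabled with @EnableAsync; the class and wiring are illustrative, not the project's actual code:

import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class OrderAsyncTasks {

    /**
     * Non-critical follow-up work, such as the delayed cancel-order message,
     * runs on a separate executor so the user-facing transaction returns sooner.
     */
    @Async
    public void sendDelayMessageCancelOrder(Long orderId) {
        // in the real service this would hand the message to MQ
        // via the project's existing message sender
    }
}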

However, these are ultimately only suggestions. In practice, an enterprise's technical team will weigh the specific business logic and decide, through longer-term technical analysis, how to implement them. If there really is no technical way to optimize further, the last resort is to scale out — and at that point not one specific component, but all the services along the entire chain.

Also, in a business chain, each enterprise typically accumulates technical knowledge and practices according to its development speed. It is not necessarily a good thing if the technical team focuses too much on following the latest trends and incurs high learning costs. Likewise, if they become too outdated, it will result in higher maintenance costs. Therefore, we can only continuously evolve business processes and technical implementations based on the actual development of the business in order to stay on the right path.

It might seem that our optimization ends here, but it is not over yet: the law that things never go entirely according to plan has yet to be broken. So, on to the sixth phase.

Phase 6 #

Identifying the Problem Behind the TPS Drop #

What exactly went wrong? While continuing the load test, I ran into the following problem:

As you can see, the TPS dropped, and I felt heartbroken… Although unexpected things can happen during the optimization process of each project, there seem to be too many unexpected issues with this interface. Anyway, let’s continue investigating.

Following Professor Gao’s emphasis on performance analysis decision trees (if you are not familiar, you can review Lesson 4), we examined each counter and finally found the following data in mysqlreport:

__ InnoDB Lock _________________________________________________________
Waits             889     0.1/s
Current            77
Time acquiring
  Total      36683515 ms
  Average       41263 ms
  Max           51977 ms

Clearly, there are quite a few lock waits at the moment, and they last a long time. To investigate the locks, we first need to know what transactions are currently running, so we checked the innodb_trx table, since MySQL records every executing transaction there. In it we found a large number of transactions stuck in lock_wait:
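
A query along these lines is enough for this check (a sketch assuming MySQL 5.7-style information_schema tables):

SELECT trx_id, trx_state, trx_started, trx_mysql_thread_id, trx_query
  FROM information_schema.innodb_trx
 WHERE trx_state = 'LOCK WAIT';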

Since there are lock waits, we naturally need to check the lock relationships to see what is causing the waits. Before examining the specific lock relationships, let’s also take a look at the application log.

Why do we need to check the application log? Because for databases, locks are used to safeguard data consistency, and the transactions that cause locks naturally come from the application. According to this logic, the transaction SQL statements we see in MySQL should correspond to those in the application. However, at the moment, we only see locks in MySQL and do not know what they look like in the application. Therefore, we need to examine the application log.

A friendly reminder: at this point we need to pay attention to the uniqueness of the parameterized data in the stress testing tool. If the same parameter values are reused too heavily, the update statements in the database will also produce locks.

After reviewing the application log, we found the following information:

[2021-02-06 00:46:59.059] [org.apache.juli.logging.DirectJDKLog] [http-nio-8086-exec-72] [175] [ERROR] Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.dao.CannotAcquireLockException: 
### Error updating database.  Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
### The error may involve com.dunshan.mall.mapper.OmsOrderMapper.insert-Inline
### The error occurred while setting parameters
### SQL: insert into oms_order (member_id, coupon_id, order_sn,        create_time, member_username, total_amount,        pay_amount, freight_amount, promotion_amount,        integration_amount, coupon_amount, discount_amount,        pay_type, source_type, status,        order_type, delivery_company, delivery_sn,        auto_confirm_day, integration, growth,        promotion_info, bill_type, bill_header,        bill_content, bill_receiver_phone, bill_receiver_email,        receiver_name, receiver_phone, receiver_post_code,        receiver_province, receiver_city, receiver_region,        receiver_detail_address, note, confirm_status,        delete_status, use_integration, payment_time,        delivery_time, receive_time, comment_time,        modify_time)     values (?, ?, ?,        ?, ?, ?,        ?, ?, ?,        ?, ?, ?,        ?, ?, ?,        ?, ?, ?,        ?, ?, ?,        ?, ?, ?,        ?, ?, ?,        ?, ?, ?,        ?, ?, ?,        ?, ?, ?,        ?, ?, ?,        ?)
### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
; Lock wait timeout exceeded; try restarting transaction; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction] with root cause
com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction

As you can see, even an insert statement runs into a lock wait, which strongly suggests a table-level lock is involved. Since insert statements by themselves do not take table-level locks, something else must be going on. Continuing through the log, we found the following:

[2021-02-06 01:00:51.051] [org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler] [scheduling-1] [95] [ERROR] Unexpected error occurred in scheduled task
org.springframework.dao.CannotAcquireLockException: 
### Error updating database.  Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
### The error may involve defaultParameterMap
### The error occurred while setting parameters
### SQL: update oms_order         set status=?         where id in          (               ?          )
### Cause: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
; Lock wait timeout exceeded; try restarting transaction; nested exception is com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction

You see, there is an update statement here, so the reasoning goes like this: we know an update locks the rows it touches, but MySQL uses the InnoDB engine, so if the update's condition is a precise, indexed lookup, there should be no table-level lock.

However, if the update covers a large range, there is a problem, because it blocks the insert statements. After a while, you see the following:

We see that all the insert statements are in the LOCK WAIT state, which is the impact of table-level locks on insert. If you check the locks and lock waits, you will see the following information:

We can see that the values in the lock_mode column are all X, i.e., exclusive locks (also called write locks). The lock_type column shows RECORD everywhere, meaning the locks sit on an index — here the hidden clustered index that InnoDB creates — which explains where these lock waits come from.

When a SQL statement uses no index at all, an X lock is placed on every clustered-index record it scans. The effect is much like a table-level lock, even though the underlying mechanism is different; for convenience, we will keep calling it a "table-level lock".
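
To see who is blocking whom, a query along these lines works on MySQL 5.7 (a sketch; on other versions the relevant views differ, e.g. performance_schema.data_lock_waits on MySQL 8):

SELECT waiting_trx_id, waiting_query,
       blocking_trx_id, blocking_query,
       locked_table, locked_index
  FROM sys.innodb_lock_waits;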

Checking the INNODB_LOCK_WAITS information above carefully, we determined that the blocking transaction's ID is 157710723, and its SQL is:

update oms_order set status=4 where id in  (       0  );

Let's find the corresponding update in the code:

/**
 * Batch update order status
 */
int updateOrderStatus(@Param("ids") List<Long> ids,@Param("status") Integer status);

It turns out this update is issued by a scheduled batch task, whose logic looks like this:

The problem with this batch task is that it runs bulk updates against the order table, and the batch query behind it covers a lot of rows. Since the ID above is 0 — meaning the order is unpaid — and there are many unpaid orders in this table, the update ends up scanning a large range, which is what produces the table-level lock.

The design of this batch task is clearly problematic. Think about it: if you are going to run such heavy updates on an order table, they should be precise updates, not range updates. In fact, there are other ways to implement the order-update logic altogether.
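
As a rough illustration (a sketch only, not the project's actual fix — the status values, the 30-minute timeout, and the batch size are all placeholders), a more precise approach first selects a bounded batch of exact primary keys and then updates only those rows:

-- 1) pick a bounded batch of timed-out, unpaid orders by primary key
SELECT id
  FROM oms_order
 WHERE status = 0
   AND create_time < DATE_SUB(NOW(), INTERVAL 30 MINUTE)
 LIMIT 100;

-- 2) close exactly those orders, hitting only the primary key index
UPDATE oms_order
   SET status = 4
 WHERE id IN (1001, 1002, 1003);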

We have found the cause of the lock. Now we need to turn the range update into a precise update so that it no longer produces table-level locks. After the modification, re-running the scenario gives the following result:

From the perspective of optimization results, the TPS has reached more than 700. This is already very good for such a complex interface.

In fact, there is still a lot of room for optimization in this order business, for example:

  1. Asynchronously generate order serial numbers and store them in Redis for on-demand retrieval.
  2. Batch operations need to be redesigned.
  3. After implementing read-write separation, update the business code accordingly.

Because the order logic is a very complex step in e-commerce, I won’t go into further detail, as it goes beyond the scope of performance.

Summary #

We ran into quite a few issues on this interface. Setting their complexity aside, what I want you to take away is that in performance optimization, problems peel open like an onion, layer by layer. Even if a single optimization action happens to produce a good effect, we must not get impatient; each problem has to be analyzed patiently.

Let’s review all the analysis and optimization processes for this interface. In the first phase, we modified the thread pool and achieved some results, but new problems also occurred. In the second phase, we solved the issue of memory exhaustion caused by querying a large amount of data. In the third phase, we addressed the problem with indexes. In the fourth phase, we reallocated resources to make the system scheduling more reasonable.

In the fifth phase, we identified the time-consuming issue with methods. Please note that you should analyze the business logic before making corresponding optimizations. Don’t get caught up in pursuing performance optimization effects blindly.

In the sixth phase, we identified an unreasonably designed batch task. A batch task like this must be designed so that its bulk updates do not take table-level locks that block the business's insert operations.

In short, during analysis we should not draw hard boundaries around the problems; we should simply solve whatever we run into, neither rushing nor holding back.

Homework #

Finally, please consider two questions:

  1. How to quickly identify situations where memory is exhausted?
  2. How to quickly identify situations where the business logic causes low TPS and unutilized resources?

Remember to discuss and exchange your ideas with me in the comment section. Every thought will help you make progress.

If you have gained something from reading this article, feel free to share it with your friends to learn and progress together. See you in the next lecture!