22 How to efficiently solve memory overflow issues caused by for loops in the payment order information interface #
Hello, I am Gao Lou.
Today, we are going to optimize the payment order interface. Through this interface, we will explore how to efficiently solve the memory overflow problem caused by loops.
When a JVM memory overflow or leak occurs, a performance engineer can usually tell that the application has hung. However, it is not easy to go one step further and determine why it hung. Taking a heap dump of a large heap is time-consuming, and, more importantly, it takes plenty of experience and time to associate the objects in the heap with the stack. This is one of the real difficulties.
In this lesson, I will show you how to solve this difficulty.
However, before that, we will first deal with a familiar issue: adding indexes to database tables. Sure enough, we ran into it again when testing this interface. Although I already emphasized this issue in Lesson 16, the full table scan this time is far heavier than before. Through this explanation, I want you to understand that as long as a full table scan exists, CPU consumption can quickly reach 100%, and to see just how much impact a full table scan has on CPU consumption.
Scenario Running Data #
First, let’s run the scenario:
This is a typical performance bottleneck where the TPS is too low and the response time keeps increasing. I have written about the analysis logic for this type of bottleneck many times in the previous lessons, and I believe you have already mastered it. Now let’s take a look at the specific problems.
Architecture Diagram #
The architecture of this interface is relatively simple: User - Gateway - Order - MySQL. It is enough to have a rough understanding of this in mind.
Phase 1 #
Here, I won’t break the response time down in detail; we’ll go straight to the overall monitoring. The problems in the first phase are relatively simple, and the performance bottleneck shows up in a different way than before.
Overall Monitoring Analysis #
The data for overall monitoring is as follows:
Looking at this image, don’t you feel like you finally see a typical performance bottleneck? The CPU utilization is so high! Isn’t it simple? Let’s check the code by looking at the stack!
However, we should first find out what is causing the high CPU utilization on k8s-worker-1. The services running on this worker are as follows:
[root@k8s-master-2 ~]# kubectl get pods -o wide | grep worker-1
mysql-min-6685c9ff76-4m5xr 1/1 Running 0 4d23h 10.100.230.14 k8s-worker-1 <none> <none>
skywalking-es-init-ls7j5 0/1 Completed 0 4d11h 10.100.230.18 k8s-worker-1 <none> <none>
[root@k8s-master-2 ~]#
As seen above, there are two services running on this worker: one is an initialization container and the other is MySQL. The initialization container is already in the Completed state, so the high CPU usage must come from MySQL. Let’s enter the container and run ‘top’ to see how the resources are being consumed.
See that? The CPU utilization really is that high. Since it is, let’s look at MySQL’s overall monitoring data.
Checking MySQL’s overall monitoring data is actually a necessary step when analyzing MySQL. If we jumped straight to the innodb_trx table, there would be a gap in the reasoning, because the overall monitoring data is the link between the previous step and the next. I am writing this process out in more detail so that you don’t get lost.
So, let’s execute the ‘mysqlreport’ command to see what the global monitoring data of MySQL looks like. Here, I have captured some important information:
__ Questions ___________________________________________________________
Total 307.93M 912.0/s
+Unknown 201.91M 598.0/s %Total: 65.57
DMS 43.20M 128.0/s 14.03
Com_ 32.90M 97.5/s 10.69
QC Hits 29.91M 88.6/s 9.71
COM_QUIT 389 0.0/s 0.00
Slow 20 ms 273.66k 0.8/s 0.09 %DMS: 0.63 Log:
DMS 43.20M 128.0/s 14.03
SELECT 32.39M 95.9/s 10.52 74.98
INSERT 10.64M 31.5/s 3.46 24.63
UPDATE 170.15k 0.5/s 0.06 0.39
REPLACE 0 0/s 0.00 0.00
DELETE 0 0/s 0.00 0.00
Com_ 32.90M 97.5/s 10.69
set_option 21.98M 65.1/s 7.14
commit 10.70M 31.7/s 3.48
admin_comma 137.68k 0.4/s 0.04
__ SELECT and Sort _____________________________________________________
Scan 20.04M 59.4/s %SELECT: 61.88
Range 0 0/s 0.00
Full join 32 0.0/s 0.00
Range check 0 0/s 0.00
Full rng join 0 0/s 0.00
Sort scan 120 0.0/s
Sort range 2.41k 0.0/s
Sort mrg pass 0 0/s
You see, ‘SELECT’ accounts for a relatively large share of DMS. If only the SELECT percentage were high, it would not be a big deal in itself. The key is that the data below also shows ‘Scan’ (full table scans), and a full table scan is a typical performance problem.
By now, I think you should be very clear about my next steps: find the SQL, look at its execution plan, and then determine the optimization approach. If you are not, you can review Lesson 16 or Lesson 20, both of which describe the process.
Targeted Monitoring Analysis #
So, now we move on to the targeted monitoring analysis stage. By looking at the innodb_trx table, we see the slow-executing SQL statement, and its execution plan is as follows:
This is a typical full-table scan. Although the amount of data is not large, we still need to add an index. The statement for adding the index is as follows:
Here, you should pay attention: when creating an index on a table with a very large amount of data, the operation may take a long time, depending on the single-core CPU capability of the machine.
Optimization Effect #
After adding the index, let’s directly look at the optimization effect:
You see, the TPS is over a thousand now!
In fact, when SQL optimization only requires adding an index, the step itself is very simple and the effect is excellent: TPS may increase by thousands or even tens of thousands of times. However, once the optimization involves business logic, things become much more troublesome.
If you think this lesson is just about giving you an example of adding an index, then you are a bit naive. Next, let’s look at a more complex problem.
Phase 2 #
As I continued to execute the stress test, I came across a very interesting situation. Let’s dive into it together!
Scenario Execution Data #
Look at this image. After approximately 6 minutes of continuous load, not only is the TPS unstable, it actually drops! Not only does it drop, the curve even breaks off! And despite all that, the response time does not increase!
How do we analyze this? With the response time flat, there is no good reason to break the response time down!
At this point, we have to rely on the performance analysis decision tree and go through the relevant global monitoring counters step by step. Luckily, the logical chain of this interface is not very long.
Global Monitoring Analysis #
Following Teacher Gao’s habit, let’s start by looking at the global monitoring:
From this image, we can’t see anything unusual. So let’s follow the performance analysis decision tree and check it layer by layer. When we get to the GC state of the Order service, we see the following data:
[root@svc-mall-order-568bd9b79-twhcw /]# jstat -gcutil 1 1s
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
100.00 0.00 100.00 100.00 95.06 92.82 1182 34.966 495 3279.704 3314.670
100.00 0.00 100.00 100.00 95.06 92.82 1182 34.966 495 3279.704 3314.670
100.00 0.00 100.00 100.00 95.06 92.82 1182 34.966 495 3279.704 3314.670
100.00 0.00 100.00 100.00 95.06 92.82 1182 34.966 495 3279.704 3314.670
90.88 0.00 100.00 100.00 95.08 92.82 1182 34.966 495 3286.621 3321.58
100.00 0.00 100.00 100.00 95.08 92.82 1182 34.966 496 3286.621 3321.586
100.00 0.00 100.00 100.00 95.08 92.82 1182 34.966 496 3286.621 3321.586
100.00 0.00 100.00 100.00 95.08 92.82 1182 34.966 496 3286.621 3321.586
100.00 0.00 100.00 100.00 95.08 92.82 1182 34.966 496 3286.621 3321.586
100.00 0.00 100.00 100.00 95.08 92.82 1182 34.966 496 3286.621 3321.586
100.00 0.00 100.00 100.00 95.08 92.82 1182 34.966 496 3286.621 3321.586
...........................
Do you feel desperate? There is a memory leak! Both the young generation and old generation memory are at 100%! Even after Full GC, the memory does not get reclaimed. Poor thing.
Targeted Monitoring Analysis #
Since the memory is exhausted, we naturally need to check which objects are using it up. So, let’s go into the container and execute jmap -histo 1|more to take a look:
num #instances #bytes class name
----------------------------------------------
1: 49727866 1397691896 [
2: 12426103 795269200 [[
3: 12426038 397633216 com.mysql.cj.protocol.a.result.ByteArrayRow
4: 2002596 384498432 com.dunshan.mall.order.domain.OmsOrderDetail
5: 12426082 198817312 com.mysql.cj.protocol.a.MysqlTextValueDecoder
6: 2070085 182840264 [Ljava.lang.Object;
7: 6008660 144207840 java.lang.Long
8: 2207452 132116320 [
9: 4072895 97749480 java.util.ArrayList
10: 2002690 80107600 org.apache.ibatis.cache.CacheKey
11: 2039613 65267616 java.util.HashMap$Node
12: 2197616 52742784 java.lang.String
13: 14736 23246672 [Ljava.util.HashMap$Node;
14: 36862 3243856 java.lang.reflect.Method
15: 97195 3110240 java.util.concurrent.ConcurrentHashMap$Node
16: 62224 2986752 java.util.HashMap
17: 19238 2452264 [
18: 21482 2360328 java.lang.Class
19: 26958 1078320 java.util.LinkedHashMap$Entry
...........................
From the above, it seems we have found the problem point. You can see that a large amount of memory is occupied by MySQL result objects, and right below them is the OmsOrderDetail class, which holds the detailed order information queried from the database.
Logically, when we want to see the detailed information of an order, we query the database and put the retrieved data into the application’s memory. Therefore, the more data the MySQL result set brings back, the more JVM memory the application consumes.
You might be wondering: should we look directly at the code for OmsOrderDetail next? You certainly can, but in this case it is not that straightforward. We already know the code and have clarified its logic, so reading it again may not reveal anything new.
So why does the JVM memory usage climb so high? This is where we need to check what the threads are doing:
-- top
[root@k8s-worker-3 ~]# docker exec -it 66d3639cf4a8 /bin/bash
[root@svc-mall-order-568bd9b79-twhcw /]# top
top - 16:10:50 up 11 days, 2:37, 0 users, load average: 3.92, 4.77, 3.35
Tasks: 4 total, 1 running, 3 sleeping, 0 stopped, 0 zombie
%Cpu0 : 46.7 us, 8.6 sy, 0.0 ni, 43.6 id, 0.0 wa, 0.0 hi, 0.7 si, 0.3 st
%Cpu1 : 23.3 us, 9.2 sy, 0.0 ni, 66.1 id, 0.0 wa, 0.0 hi, 1.0 si, 0.3 st
%Cpu2 : 50.0 us, 7.2 sy, 0.3 ni, 41.4 id, 0.0 wa, 0.0 hi, 0.7 si, 0.3 st
%Cpu3 : 46.4 us, 8.5 sy, 0.0 ni, 43.7 id, 0.0 wa, 0.0 hi, 1.0 si, 0.3 st
%Cpu4 : 50.5 us, 8.0 sy, 0.0 ni, 41.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 50.2 us, 3.1 sy, 0.0 ni, 46.1 id, 0.0 wa, 0.0 hi, 0.7 si, 0.0 st
KiB Mem : 16265992 total, 171760 free, 9077080 used, 7017152 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 6676508 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 8788300 4.2g 13860 S 127.9 27.1 115:17.15 java
575 root 20 0 11828 1776 1328 S 0.0 0.0 0:00.01 sh
789 root 20 0 11964 1980 1484 S 0.0 0.0 0:00.02 bash
802 root 20 0 56232 2012 1432 R 0.0 0.0 0:00.05 top
-- top -Hp 1
top - 16:11:39 up 11 days, 2:38, 0 users, load average: 8.87, 6.09, 3.87
Threads: 85 total, 1 running, 84 sleeping, 0 stopped, 0 zombie
%Cpu0 : 55.6 us, 7.1 sy, 0.0 ni, 36.6 id, 0.0 wa, 0.0 hi, 0.3 si, 0.3 st
%Cpu1 : 41.3 us, 3.8 sy, 0.0 ni, 54.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 30.4 us, 9.9 sy, 0.0 ni, 59.4 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu3 : 60.3 us, 6.7 sy, 0.0 ni, 32.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu4 : 21.2 us, 9.2 sy, 0.0 ni, 69.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 45.6 us, 10.1 sy, 0.3 ni, 43.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.3 st
KiB Mem : 16265992 total, 197656 free, 9071444 used, 6996892 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 6681848 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7 root 20 0 8788300 4.2g 13836 R 96.0 27.1 70:13.42 VM Thread
26 root 20 0 8788300 4.2g 13836 S 0.7 27.1 0:05.70 VM Periodic Tas
After executing the above two commands, did you notice that only one thread is consuming CPU? According to the GC status we checked earlier, this thread should be busy doing a Full GC. Let’s print the stack information to confirm, and indeed it is:
"VM Thread" os_prio=0 tid=0x00007fb18c0f4800 nid=0x7 runnable
At this point, it is very clear what our next step should be: capturing a heap dump to see the proportion of objects in memory. So, now we execute the following command to generate a heap dump.
jmap -dump:format=b,file=filename pid
Then, we download the generated heap dump and open it with MAT.
While opening the heap dump, I ran into a small issue that you should also be aware of. If the heap dump file is too large, we will get the following error:
In this case, we need to adjust the JVM maximum heap parameter -Xmx in the MemoryAnalyzer.ini file. The default -Xmx is 1G, and you need to increase it according to the size of the file you want to open. If the -Xmx value is not large enough, you may encounter the following error:
After struggling, we finally managed to open the heap dump and see the following information:
A certain memory segment consumed 2.6G! This suspicious memory point is also flagged in the image above. (Viewing object memory consumption with MAT and with jmap are two different methods; you can use either. The difference is that MAT highlights suspicious memory consumption points for you, while jmap gives no such hint and you have to judge for yourself.) Let’s click into it and see which specific objects make up this suspicious memory point:
As you can see, it is indeed a large amount of data returned by the SQL: the list shown in the image above holds more than 10 million records. Let’s expand the corresponding stack to take a look:
Can you see the OmsOrderDetail class? You might conclude that the payment order information interface itself is the problem! But that is not enough evidence. We still need to determine whether OmsOrderDetail is called or generated by this interface before we can pin the problem on it.
Since the interface under test is paySuccess, let’s look at the calling logic of paySuccess. We find that paySuccess has a function called getDetail. Seeing this “getDetail,” with “Detail” right in its name, don’t you immediately associate it with OmsOrderDetail? Let’s check the code behind getDetail and see how it relates to OmsOrderDetail:
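The actual code is shown only as a screenshot in the lesson, so here is a minimal, purely hypothetical sketch of the relationship it illustrates. Only the names paySuccess, getDetail, and OmsOrderDetail come from the text; the mapper, its method, and everything else are assumptions for illustration.

```java
// Hypothetical sketch -- not the project's actual code.
// It only illustrates the relationship described above:
// paySuccess() calls getDetail(), and getDetail() returns an OmsOrderDetail.
public class PortalOrderServiceSketch {

    private final OmsOrderDetailMapper orderDetailMapper; // assumed MyBatis mapper

    public PortalOrderServiceSketch(OmsOrderDetailMapper orderDetailMapper) {
        this.orderDetailMapper = orderDetailMapper;
    }

    public void paySuccess(Long orderId, Integer payType) {
        // the payment flow first loads the full order detail ...
        OmsOrderDetail detail = getDetail(orderId);
        // ... then updates order status, deducts stock, and so on
    }

    private OmsOrderDetail getDetail(Long orderId) {
        return orderDetailMapper.getDetail(orderId); // precise lookup by primary key
    }
}
```

The real method of course does more than this; the only point that matters for the analysis is that the payment interface does hold OmsOrderDetail objects.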
It is not hard to see that getDetail returns an OmsOrderDetail object, so our interface does indeed use the OmsOrderDetail class. Do you feel the excitement of having caught the culprit? But don’t rush; unfortunate things always happen. Next, let’s look at the SQL logic corresponding to this code block.
After examining the code, we can see that there are two update statements and one select statement in this interface:
<update id="updateByPrimaryKeySelective" parameterType="com.dunshan.mall.model.OmsOrder">
update oms_order
........................
where id = #{id,jdbcType=BIGINT}
</update>
<update id="updateByPrimaryKeySelective" parameterType="com.dunshan.mall.model.PmsSkuStock">
update pms_sku_stock
........................
where id = #{id,jdbcType=BIGINT}
</update>
<select id="selectByPrimaryKey" parameterType="java.lang.Long" resultMap="BaseResultMap">
select
<include refid="Base_Column_List" />
from pms_sku_stock
where id = #{id,jdbcType=BIGINT}
</select>
After checking the SQL, tables, and indexes for these three statements, I found that they are all precise primary-key lookups, and the indexes had already been created in the earlier analysis phase, so they should not produce a large amount of data. Yet we clearly saw earlier that OmsOrderDetail was generating a huge amount of data. What is going on here?
To clarify this issue, let’s check who else is calling OmsOrderDetail.
After searching for a while, we finally found two for loops (you can actually see them in the previous code snippet). When I clicked into them, I realized they belonged to scheduled tasks. At this point, the logic behind the problem became clear: because of these for loops, a certain block of code is executed over and over while the API is under load, which keeps driving JVM memory up. The shape of the problem is roughly as sketched below.
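To make that concrete, here is a minimal sketch of the pattern, under the assumption that the scheduled task queries the order-detail data without any limit. The class, mapper, and method names are hypothetical; only OmsOrderDetail and the for-loop-inside-a-scheduled-task structure come from the analysis above.

```java
import java.util.List;
import org.springframework.scheduling.annotation.Scheduled;

// Hypothetical sketch of the problematic pattern -- not the project's actual code.
public class OrderSyncTaskSketch {

    private final OmsOrderDetailMapper orderDetailMapper; // assumed MyBatis mapper

    public OrderSyncTaskSketch(OmsOrderDetailMapper orderDetailMapper) {
        this.orderDetailMapper = orderDetailMapper;
    }

    @Scheduled(fixedDelay = 60_000) // assumed: runs one minute after the previous run finishes
    public void syncOrders() {
        // No LIMIT on the query: as the table grows under load, the whole
        // result set is materialized in the JVM heap in one go.
        List<OmsOrderDetail> details = orderDetailMapper.selectAllDetails();
        for (OmsOrderDetail detail : details) { // the "for loop" found in the analysis
            process(detail);                    // the full list stays referenced the whole time
        }
    }

    private void process(OmsOrderDetail detail) {
        // business handling of a single order (irrelevant here)
    }
}
```

Under constant load the table keeps growing, so every run of the task loads a bigger list than the previous one, which is exactly the kind of steady memory climb we saw in the jstat output.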
Do you still remember? In the previous lesson we had actually already identified the problem of “slow SQL executed inside for loops,” but because the load did not last long enough, memory was never exhausted, so the memory-depletion problem never showed itself. Once the scenario ran for longer, the strange phenomenon of intermittent TPS appeared.
Optimization Effect #
At this point, our optimization approach is very clear: do not query so much data in one go in the scheduled task. So I added a limit here, restricting each query to 500 rows at a time. If the processing still cannot keep up, we can write a separate service and handle it with multiple threads. The rough shape of the change is sketched below.
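For reference, here is a minimal sketch of what such a batched query might look like, assuming a hypothetical mapper method backed by an id-cursor plus LIMIT query. The names are illustrative, not the project’s actual code.

```java
import java.util.List;
import org.springframework.scheduling.annotation.Scheduled;

// Hypothetical sketch of the fix -- not the project's actual code.
// Each run walks the table in batches of 500 rows instead of loading everything at once.
public class OrderSyncTaskBatchedSketch {

    private static final int BATCH_SIZE = 500;

    private final OmsOrderDetailMapper orderDetailMapper; // assumed MyBatis mapper

    public OrderSyncTaskBatchedSketch(OmsOrderDetailMapper orderDetailMapper) {
        this.orderDetailMapper = orderDetailMapper;
    }

    @Scheduled(fixedDelay = 60_000) // assumed schedule
    public void syncOrders() {
        long lastId = 0;
        while (true) {
            // assumed mapper method backed by SQL roughly like:
            //   SELECT ... WHERE id > #{lastId} ORDER BY id LIMIT 500
            List<OmsOrderDetail> batch = orderDetailMapper.selectBatchAfter(lastId, BATCH_SIZE);
            if (batch.isEmpty()) {
                break; // nothing left to process in this run
            }
            for (OmsOrderDetail detail : batch) {
                process(detail); // handle one order at a time
            }
            lastId = batch.get(batch.size() - 1).getId(); // advance the cursor (getId() assumed)
            // the previous batch becomes unreachable here, so heap usage stays bounded
        }
    }

    private void process(OmsOrderDetail detail) {
        // business handling of a single order
    }
}
```

If even this cannot keep up with the data volume, the batches can be handed off to a dedicated service or a thread pool, which is the multi-threaded option mentioned above.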
After the modification, let’s take a look at the JVM memory consumption of the Order service again:
[root@svc-mall-order-f7b6d6f7-xl2kp /]# jstat -gcutil 1 1s
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
0.00 5.44 100.00 96.03 95.03 93.17 43710 1044.757 161 83.635 1128.39
0.00 3.18 82.83 96.21 95.03 93.17 43713 1044.797 161 83.635 1128.43
2.09 0.00 4.54 96.21 95.03 93.17 43718 1044.850 161 83.635 1128.48
1.99 0.00 44.92 96.21 95.03 93.17 43722 1044.891 161 83.635 1128.52
0.00 2.24 1.51 96.22 95.03 93.17 43727 1044.936 161 83.635 1128.57
2.23 0.00 0.00 96.22 95.03 93.17 43732 1044.987 161 83.635 1128.62
40.97 0.00 76.46 96.22 95.03 93.17 43736 1045.051 161 83.635 1128.68
0.00 41.76 47.74 98.81 95.03 93.17 43741 1045.136 161 83.635 1128.77
45.59 0.00 77.61 98.81 95.03 93.17 43746 1045.210 161 83.635 1128.84
0.00 0.00 51.01 52.55 95.03 93.17 43749 1045.270 162 84.021 1129.29
52.34 0.00 60.57 53.23 95.03 93.17 43754 1045.353 162 84.021 1129.37
0.00 51.85 0.00 56.32 95.03 93.17 43759 1045.450 162 84.021 1129.47
0.00 57.97 98.59 58.79 95.03 93.17 43764 1045.526 162 84.021 1129.54
42.02 0.00 83.01 60.83 95.03 93.17 43768 1045.602 162 84.021 1129.62
0.00 42.51 72.69 60.83 95.03 93.17 43773 1045.668 162 84.021 1129.68
28.95 0.00 67.94 61.52 95.03 93.17 43778 1045.735 162 84.021 1129.75
0.00 4.50 32.29 62.74 95.03 93.17 43783 1045.788 162 84.021 1129.80
62.60 0.00 27.04 62.80 95.03 93.17 43788 1045.866 162 84.021 1129.88
0.00 45.52 0.00 65.14 95.03 93.17 43793 1045.950 162 84.021 1129.97
0.00 47.10 71.13 65.14 95.03 93.17 43797 1046.015 162 84.021 1130.03
49.36 0.00 33.30 65.14 95.03 93.17 43802 1046.080 162 84.021 1130.10
5.92 0.00 35.47 67.33 95.03 93.17 43806 1046.132 162 84.021 1130.15
0.00 50.15 65.90 67.37 95.03 93.17 43811 1046.209 162 84.021 1130.23
46.75 0.00 10.39 69.71 95.03 93.17 43816 1046.305 162 84.021 1130.32
47.27 0.00 54.91 69.71 95.03 93.17 43820 1046.364 162 84.021 1130.38
0.00 45.69 46.99 69.71 95.03 93.17 43825 1046.430 162 84.021 1130.45
3.25 0.00 13.65 71.93 95.03 93.17 43830 1046.488 162 84.021 1130.50
0.00 38.00 46.98 71.94 95.03 93.17 43835 1046.551 162 84.021 1130.57
43.74 0.00 37.69 74.44 95.03 93.17 43840 1046.634 162 84.021 1130.65
0.00 42.88 15.64 74.44 95.03 93.17 43845 1046.702 162 84.021 1130.72
0.00 44.13 12.90 74.44 95.03 93.17 43849 1046.756 162 84.021 1130.77
0.00 16.42 33.96 75.79 95.03 93.17 43853 1046.813 162 84.021 1130.83
4.25 0.00 20.10 76.45 95.03 93.17 43858 1046.863 162 84.021 1130.88
0.00 3.22 0.00 76.46 95.03 93.17 43863 1046.914 162 84.021 1130.93
From the GC state, memory reclamation has clearly improved. I should point out that the JVM data above was not captured at the beginning of the scenario but after a long period of execution. Since JVM memory can now be reclaimed normally, the data above shows that there is no longer a memory overflow.
However, there is still an issue here; I don’t know whether you noticed it. The YGC frequency is too high. You can see it in the YGC column: I am printing once per second, and there are four or five YGCs per second, each taking roughly 10 to 20 milliseconds on average. Although this has not yet had a noticeable impact on TPS, it is on the edge of becoming dangerous. Since it does not noticeably affect TPS at the moment, I will not deal with the frequent-YGC issue here.
Let’s re-run the scenario, and the results of the scenario execution are as follows:
See, hasn’t the TPS improved? And the drop we saw at the beginning of Phase 2 is gone. Previously, the problem appeared after about 6 minutes; this time we have been running for more than 30 minutes without a drop.
You can cheer now, right? Yes! We don’t have to wait for the car to crash anymore.
Summary #
In this lesson, we discussed two problems in two stages.
The first problem is relatively simple: a full table scan caused by a missing index. We have covered it before. The reason for revisiting it in this lesson is that the SQL consumed all of the CPU resources, and that is a common performance problem caused by full table scans.
The second problem is a memory overflow. For a Java application, locating a memory overflow requires understanding the logic of the code: knowing where variables are defined, who references them, and what the call hierarchy looks like. Today’s development tools are very friendly and can show us the call chain of the code.
In this example, the challenge was not to find the point of memory overflow but to identify its cause. From the heap dump we can see where the overflow occurs and who references that class; there may be several places where the class is used, and we need to check them one by one.
What makes this example special is that the interface itself does use the pms_sku_stock table and the OmsOrderDetail class, yet judging from its SQL statements, this interface should not cause a memory overflow. Therefore, we had to find out who else was using the pms_sku_stock table and the OmsOrderDetail class and causing the overflow. That was the key turning point.
I hope you have benefited from this lesson.
Homework #
Finally, I have two questions for you to ponder:
- Can you quickly find the SQL that needs to be indexed?
- How can you quickly identify which object is causing the continuous increase in memory?
Remember to discuss and exchange your ideas with me in the comments section. Each thought will help you make further progress.
If you have gained something from reading this article, feel free to share it with your friends to learn and improve together. See you in the next lesson!