22 How to Efficiently Solve Memory Overflow Issues Produced by for Loops in Payment Order Information #

Hello, I am Gao Lou.

Today, we are going to optimize the payment order interface. Through this interface, we will explore how to efficiently solve the memory overflow problem caused by loops.

When it comes to JVM memory overflows or leaks, performance engineers can usually tell that an application has hung. However, it is not easy to go one step further and determine why it hung. Dumping a large heap is time-consuming, and more importantly, it takes considerable experience and time to connect the objects in the heap with the stacks. This is one of the real difficulties.

In this lesson, I will show you how to solve this difficulty.

Before that, however, we will first deal with a familiar issue: adding indexes to database tables. We ran into it yet again while testing this interface. Although I already emphasized it in Lesson 16, the full table scan this time is far more severe than before. Through this example, I want you to see that as long as a full table scan exists, CPU consumption can quickly climb to 100%, and to appreciate just how much a full table scan costs in CPU.

Scenario Running Data #

First, let’s run the scenario:

This is a typical performance bottleneck where the TPS is too low and the response time keeps increasing. I have written about the analysis logic for this type of bottleneck many times in the previous lessons, and I believe you have already mastered it. Now let’s take a look at the specific problems.

Architecture Diagram #

The architecture of this interface is relatively simple: User - Gateway - Order - MySQL. It is enough to have a rough understanding of this in mind.

Phase 1 #

Here I won't break the response time down in detail; we will go straight to the overall monitoring. The problem in the first phase is relatively simple, and the performance bottleneck shows up in a different way than before.

Overall Monitoring Analysis #

The data for overall monitoring is as follows:

Looking at this image, don’t you feel like you finally see a typical performance bottleneck? The CPU utilization is so high! Isn’t it simple? Let’s check the code by looking at the stack!

However, we should first find out what is causing the high CPU utilization on k8s-worker-1. The services running on this worker are as follows:

[root@k8s-master-2 ~]# kubectl get pods -o wide | grep worker-1
mysql-min-6685c9ff76-4m5xr                  1/1     Running     0          4d23h   10.100.230.14    k8s-worker-1   <none>           <none>
skywalking-es-init-ls7j5                    0/1     Completed   0          4d11h   10.100.230.18    k8s-worker-1   <none>           <none>
[root@k8s-master-2 ~]# 

As seen above, there are two services running on this worker: one is an initialization container and the other is MySQL. The initialization container is already in the Completed state, so the high CPU usage must come from MySQL. Let's enter the container and run top to see how much resource it is consuming.
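For reference, entering the pod can be done roughly like this (a minimal sketch; the pod name comes from the listing above, and the default namespace is assumed):

# enter the MySQL pod found above
kubectl exec -it mysql-min-6685c9ff76-4m5xr -- /bin/bash
# then, inside the container, check resource consumption
top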

What is going on? Why is the CPU utilization this high? Since it is, let's take a look at the overall monitoring data of MySQL.

Checking MySQL's global monitoring data is a necessary step when analyzing MySQL. If we jumped straight to the innodb_trx table, there would be a gap in the reasoning, because the global monitoring data is the link between the previous step and the next. I am writing this process out in more detail so that you don't get lost.

So, let’s execute the ‘mysqlreport’ command to see what the global monitoring data of MySQL looks like. Here, I have captured some important information:

__ Questions ___________________________________________________________
Total         307.93M   912.0/s
  +Unknown    201.91M   598.0/s  %Total:  65.57
  DMS          43.20M   128.0/s           14.03
  Com_         32.90M    97.5/s           10.69
  QC Hits      29.91M    88.6/s            9.71
  COM_QUIT        389     0.0/s            0.00
Slow 20 ms    273.66k     0.8/s            0.09  %DMS:   0.63  Log:
DMS            43.20M   128.0/s           14.03
  SELECT       32.39M    95.9/s           10.52         74.98
  INSERT       10.64M    31.5/s            3.46         24.63
  UPDATE      170.15k     0.5/s            0.06          0.39
  REPLACE           0       0/s            0.00          0.00
  DELETE            0       0/s            0.00          0.00
Com_           32.90M    97.5/s           10.69
  set_option   21.98M    65.1/s            7.14
  commit       10.70M    31.7/s            3.48
  admin_comma 137.68k     0.4/s            0.04


__ SELECT and Sort _____________________________________________________
Scan           20.04M    59.4/s %SELECT:  61.88
Range               0       0/s            0.00
Full join          32     0.0/s            0.00
Range check         0       0/s            0.00
Full rng join       0       0/s            0.00
Sort scan         120     0.0/s
Sort range      2.41k     0.0/s
Sort mrg pass       0       0/s

You see, the ‘SELECT’ in DMS accounts for a relatively large percentage. In fact, if only the percentage of ‘SELECT’ is relatively large, it’s not a big deal. The key is that there is also a ‘Scan’ (full-table scan) in the data below, and a full-table scan is a typical performance issue.

By now, I think you should be very clear about my next steps: find the SQL, look at its execution plan, and then decide on the optimization. If you are not clear, you can review Lesson 16 or Lesson 20, both of which describe this process.

Targeted Monitoring Analysis #

So, now we move on to the targeted monitoring analysis stage. By looking at the innodb_trx table, we see the slow-executing SQL statement, and its execution plan is as follows:
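For reference, pulling the slow statement out of innodb_trx and checking its plan can be sketched like this (the EXPLAIN line is only a hypothetical example; the real statement is the one shown in the screenshot):

-- list the currently running transactions and their SQL
SELECT trx_id, trx_started, trx_mysql_thread_id, trx_query
  FROM information_schema.innodb_trx
 ORDER BY trx_started;

-- then check the execution plan of the statement found above, e.g.:
-- EXPLAIN SELECT ... FROM oms_order WHERE ...;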

This is a typical full-table scan. Although the amount of data is not large, we still need to add an index. The statement for adding the index is as follows:
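Since the concrete statement appears only in the screenshot, here is just the general form, with a hypothetical table and column:

-- hypothetical example: index the column used in the WHERE condition of the slow SQL
ALTER TABLE oms_order ADD INDEX idx_member_id (member_id);
-- or equivalently:
-- CREATE INDEX idx_member_id ON oms_order (member_id);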

One thing to pay attention to: when the table holds a lot of data, creating the index can take a long time, depending on the single-core CPU performance of the machine.

Optimization Effect #

After adding the index, let’s directly look at the optimization effect:

You see, the TPS is over a thousand now!

In fact, for SQL optimization, if we only add an index, it is a very simple step, and the effect will be very good. It is possible to increase the TPS by thousands of times, or even tens of thousands of times. However, if the optimization involves business logic, it becomes more troublesome.

If you think this lesson is just about giving you an example of adding an index, then you are a bit naive. Next, let’s look at a more complex problem.

Phase 2 #

As I continued to execute the stress test, I came across a very interesting situation. Let’s dive into it together!

Scenario Execution Data #

Look at this image. After roughly 6 minutes of continuous load, the TPS is not only unstable, it actually drops! And not only does it drop, the curve even breaks off entirely! Yet despite all that, the response time does not increase!

How do we analyze this? Since the response time does not grow, there is no good reason to break it down!

At this point, we have to fall back on my performance analysis decision tree and go through the relevant global monitoring counters step by step. Luckily, the logical chain of this interface is not very long.

Global Monitoring Analysis #

Following Teacher Gao’s habit, let’s start by looking at the global monitoring:

This image doesn't tell us anything. So let's walk down the performance analysis decision tree layer by layer. When we get to the GC state of the Order service, we see the following data:

[root@svc-mall-order-568bd9b79-twhcw /]# jstat -gcutil 1 1s
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
100.00   0.00 100.00 100.00  95.06  92.82   1182   34.966   495 3279.704 3314.670
100.00   0.00 100.00 100.00  95.06  92.82   1182   34.966   495 3279.704 3314.670
100.00   0.00 100.00 100.00  95.06  92.82   1182   34.966   495 3279.704 3314.670
100.00   0.00 100.00 100.00  95.06  92.82   1182   34.966   495 3279.704 3314.670
 90.88   0.00 100.00 100.00  95.08  92.82   1182   34.966   495 3286.621 3321.58
100.00   0.00 100.00 100.00  95.08  92.82   1182   34.966   496 3286.621 3321.586
100.00   0.00 100.00 100.00  95.08  92.82   1182   34.966   496 3286.621 3321.586
100.00   0.00 100.00 100.00  95.08  92.82   1182   34.966   496 3286.621 3321.586
100.00   0.00 100.00 100.00  95.08  92.82   1182   34.966   496 3286.621 3321.586
100.00   0.00 100.00 100.00  95.08  92.82   1182   34.966   496 3286.621 3321.586
100.00   0.00 100.00 100.00  95.08  92.82   1182   34.966   496 3286.621 3321.586
...........................

Do you feel desperate? There is a memory leak! Both the young generation and old generation memory are at 100%! Even after Full GC, the memory does not get reclaimed. Poor thing.

Targeted Monitoring Analysis #

Since the memory is exhausted, naturally we need to check what type of object is using up the memory. So, let’s go into the container and execute jmap -histo 1|more to have a look:

num     #instances         #bytes  class name
----------------------------------------------
   1:      49727866     1397691896  [B
   2:      12426103      795269200  [[B
   3:      12426038      397633216  com.mysql.cj.protocol.a.result.ByteArrayRow
   4:       2002596      384498432  com.dunshan.mall.order.domain.OmsOrderDetail
   5:      12426082      198817312  com.mysql.cj.protocol.a.MysqlTextValueDecoder
   6:       2070085      182840264  [Ljava.lang.Object;
   7:       6008660      144207840  java.lang.Long
   8:       2207452      132116320  [
   9:       4072895       97749480  java.util.ArrayList
  10:       2002690       80107600  org.apache.ibatis.cache.CacheKey
  11:       2039613       65267616  java.util.HashMap$Node
  12:       2197616       52742784  java.lang.String
  13:         14736       23246672  [Ljava.util.HashMap$Node;
  14:         36862        3243856  java.lang.reflect.Method
  15:         97195        3110240  java.util.concurrent.ConcurrentHashMap$Node
  16:         62224        2986752  java.util.HashMap
  17:         19238        2452264  [
  18:         21482        2360328  java.lang.Class
  19:         26958        1078320  java.util.LinkedHashMap$Entry
...........................

From the above, it seems that we have found the problematic point. You see, there is a large amount of memory occupied by the MySQL result, and below it we also see the OmsOrderDetail class, which is used to query detailed order information from the database.

Logically, when we want to see the detailed information of an order, we are actually querying for data from the database and then putting the retrieved data into the application’s memory. Therefore, the more data the MySQL result fetches, the more JVM memory the application consumes.

You might be wondering: should we go straight to the code related to OmsOrderDetail next? You certainly can, but in this case it is not that straightforward. We already know this code and understand its logic, so reading it again is unlikely to reveal anything new.

So why does the JVM memory usage become high? This is where we need to check what the thread is doing:

-- top
[root@k8s-worker-3 ~]# docker exec -it 66d3639cf4a8 /bin/bash
[root@svc-mall-order-568bd9b79-twhcw /]# top
top - 16:10:50 up 11 days,  2:37,  0 users,  load average: 3.92, 4.77, 3.35
Tasks:   4 total,   1 running,   3 sleeping,   0 stopped,   0 zombie
%Cpu0  : 46.7 us,  8.6 sy,  0.0 ni, 43.6 id,  0.0 wa,  0.0 hi,  0.7 si,  0.3 st
%Cpu1  : 23.3 us,  9.2 sy,  0.0 ni, 66.1 id,  0.0 wa,  0.0 hi,  1.0 si,  0.3 st
%Cpu2  : 50.0 us,  7.2 sy,  0.3 ni, 41.4 id,  0.0 wa,  0.0 hi,  0.7 si,  0.3 st
%Cpu3  : 46.4 us,  8.5 sy,  0.0 ni, 43.7 id,  0.0 wa,  0.0 hi,  1.0 si,  0.3 st
%Cpu4  : 50.5 us,  8.0 sy,  0.0 ni, 41.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 50.2 us,  3.1 sy,  0.0 ni, 46.1 id,  0.0 wa,  0.0 hi,  0.7 si,  0.0 st
KiB Mem : 16265992 total,   171760 free,  9077080 used,  7017152 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  6676508 avail Mem 


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0 8788300   4.2g  13860 S 127.9 27.1 115:17.15 java
  575 root      20   0   11828   1776   1328 S   0.0  0.0   0:00.01 sh
  789 root      20   0   11964   1980   1484 S   0.0  0.0   0:00.02 bash
  802 root      20   0   56232   2012   1432 R   0.0  0.0   0:00.05 top


-- top -Hp 1
top - 16:11:39 up 11 days,  2:38,  0 users,  load average: 8.87, 6.09, 3.87
Threads:  85 total,   1 running,  84 sleeping,   0 stopped,   0 zombie
%Cpu0  : 55.6 us,  7.1 sy,  0.0 ni, 36.6 id,  0.0 wa,  0.0 hi,  0.3 si,  0.3 st
%Cpu1  : 41.3 us,  3.8 sy,  0.0 ni, 54.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 30.4 us,  9.9 sy,  0.0 ni, 59.4 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu3  : 60.3 us,  6.7 sy,  0.0 ni, 32.7 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
%Cpu4  : 21.2 us,  9.2 sy,  0.0 ni, 69.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 45.6 us, 10.1 sy,  0.3 ni, 43.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.3 st
KiB Mem : 16265992 total,   197656 free,  9071444 used,  6996892 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  6681848 avail Mem 


  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
    7 root      20   0 8788300   4.2g  13836 R 96.0 27.1  70:13.42 VM Thread
   26 root      20   0 8788300   4.2g  13836 S  0.7 27.1   0:05.70 VM Periodic Task

After executing the above two commands, did you notice that only one thread is consuming CPU? According to the GC status we checked earlier, this thread should be busy doing a Full GC. Let’s print the stack information to confirm, and indeed it is:

"VM Thread" os_prio=0 tid=0x00007fb18c0f4800 nid=0x7 runnable

At this point, it is very clear what our next step should be: capturing a heap dump to see the proportion of objects in memory. So, now we execute the following command to generate a heap dump.

jmap -dump:format=b,file=filename pid

Then, we download the generated heap dump and open it with MAT.
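Concretely, one way to do this is sketched below (the dump file path is an assumption; the container ID is the one used in the docker exec above):

# inside the Order container: write the heap dump to a file
jmap -dump:format=b,file=/tmp/order.hprof 1

# back on the worker node: copy the dump out of the container so it can be downloaded
docker cp 66d3639cf4a8:/tmp/order.hprof .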

While opening the heap dump, I ran into a small issue that you should also be aware of. If the heap dump file is too large, we will get the following error:

In this case, we need to adjust the JVM maximum heap parameter -Xmx in the MemoryAnalyzer.ini file. The default -Xmx is 1G, and you need to increase it according to the size of the file you want to open. If the -Xmx value is still not large enough, you may encounter the following error:
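For reference, the heap setting sits after the -vmargs line near the end of MemoryAnalyzer.ini and looks roughly like this (the 8g value is only an assumption sized for a dump of about 2.6G; defaults may vary by MAT version):

-vmargs
-Xmx8g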

After struggling, we finally managed to open the heap dump and see the following information:

One memory segment consumed 2.6G! The suspicious memory point is also flagged in the image above. (MAT and jmap are simply two different ways of viewing the memory consumed by objects; you can use either. The difference is that MAT points out suspicious memory consumption for you, while jmap gives no such hint and you have to judge for yourself.) Let's click into it and see which objects make up this suspicious memory point:

As you can see, it is indeed a large amount of data returned by the SQL. There are more than 10 million records in the list shown in the above image. Let’s expand the corresponding stack to take a look:

Can you see the OmsOrderDetail class? You might immediately think there is a problem with the payment order information interface! However, that is not enough evidence yet. We still need to determine whether OmsOrderDetail is actually called or created by this interface before we can blame the interface.

Since the interface under test is paySuccess, let's look at the calling logic of paySuccess. We find that paySuccess calls a function named getDetail. Seeing that name, with the word "Detail" in it, don't you already want to associate it with OmsOrderDetail? Let's open the code behind getDetail and see how the two are related:

It is not hard to see that getDetail deals in OmsOrderDetail objects; in other words, our interface does indeed use the OmsOrderDetail class. Do you feel a surge of excitement at having caught the culprit? Don't rush; unfortunate things always happen. Next, let's look at the SQL logic behind this block of code.

After examining the code, we can see that there are two update statements and one select statement in this interface:

<update id="updateByPrimaryKeySelective" parameterType="com.dunshan.mall.model.OmsOrder">
    update oms_orde
    ........................
    where id = #{id,jdbcType=BIGINT}
  </update>
  
  
<update id="updateByPrimaryKeySelective" parameterType="com.dunshan.mall.model.PmsSkuStock">
    update pms_sku_stoc
    ........................
    where id = #{id,jdbcType=BIGINT}
  </update>
   
<select id="selectByPrimaryKey" parameterType="java.lang.Long" resultMap="BaseResultMap">
    select
    <include refid="Base_Column_List" />
    from pms_sku_stoc
    where id = #{id,jdbcType=BIGINT}
</select>

After checking the SQL, tables, and indexes behind these three statements, I found that they are all precise lookups by primary key, and the relevant indexes were already created in the earlier analysis, so they should not produce large amounts of data. Yet we clearly saw OmsOrderDetail generating a huge amount of data. What is going on?

To clarify this issue, let’s check who else is calling OmsOrderDetail.

After searching for a while, we finally found two for loops (which can actually be seen in the previous code snippet). When I clicked into them, I realized they belong to scheduled tasks. At this point, the logic behind the problem became clear: because of these for loops, a block of code that queries order data gets executed over and over again, which keeps pushing JVM memory usage up.
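The actual task code is only shown in the screenshots, so the following is just a minimal sketch of the pattern being described, with assumed class, method, mapper, and cron names: a scheduled task whose loop pulls the entire result set into memory on every run.

import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import com.dunshan.mall.order.domain.OmsOrderDetail;

// Hypothetical sketch of the problematic pattern, not the project's actual code.
@Component
public class OrderSettleTask {

    @Autowired
    private OmsOrderDetailMapper orderDetailMapper; // assumed mapper name

    @Scheduled(cron = "0 0/5 * * * ?") // assumed schedule
    public void settleOrders() {
        // No LIMIT on the query: every run loads ALL matching rows into the JVM,
        // which is exactly the kind of data the heap dump was full of.
        List<OmsOrderDetail> orders = orderDetailMapper.selectUnfinishedOrders();
        for (OmsOrderDetail order : orders) {
            // ... per-order processing ...
        }
    }
}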

Do you still remember? In the previous lesson we already identified the problem of "slow SQL caused by for loops", but because the load did not run long enough back then, memory was never exhausted and the problem did not show itself. Once the scenario ran for longer, the strange phenomenon of intermittent TPS appeared.

Optimization Effect #

At this point, the optimization direction is very clear: the scheduled tasks must not query that much data in one go. So I added a LIMIT here to restrict each query to 500 rows at a time. If the processing then cannot keep up, we can write a separate service that handles it with multiple threads.
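The concrete change is only shown in the screenshots, so here is just a sketch of the idea in the same mapper style as above (the statement id and WHERE condition are assumptions; the point is simply the added limit 500):

<select id="selectUnfinishedOrders" resultMap="BaseResultMap">
    select
    <include refid="Base_Column_List" />
    from oms_order
    where status = 0
    limit 500
</select>

Each scheduled run then processes at most 500 rows and the next run picks up the rest, so the JVM never has to hold millions of rows at once.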

After the modification, let’s take a look at the JVM memory consumption of the Order service again:

[root@svc-mall-order-f7b6d6f7-xl2kp /]# jstat -gcutil 1 1s
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00   5.44 100.00  96.03  95.03  93.17  43710 1044.757   161   83.635 1128.39
  0.00   3.18  82.83  96.21  95.03  93.17  43713 1044.797   161   83.635 1128.43
  2.09   0.00   4.54  96.21  95.03  93.17  43718 1044.850   161   83.635 1128.48
  1.99   0.00  44.92  96.21  95.03  93.17  43722 1044.891   161   83.635 1128.52
  0.00   2.24   1.51  96.22  95.03  93.17  43727 1044.936   161   83.635 1128.57
  2.23   0.00   0.00  96.22  95.03  93.17  43732 1044.987   161   83.635 1128.62
 40.97   0.00  76.46  96.22  95.03  93.17  43736 1045.051   161   83.635 1128.68
  0.00  41.76  47.74  98.81  95.03  93.17  43741 1045.136   161   83.635 1128.77
 45.59   0.00  77.61  98.81  95.03  93.17  43746 1045.210   161   83.635 1128.84
  0.00   0.00  51.01  52.55  95.03  93.17  43749 1045.270   162   84.021 1129.29
 52.34   0.00  60.57  53.23  95.03  93.17  43754 1045.353   162   84.021 1129.37
  0.00  51.85   0.00  56.32  95.03  93.17  43759 1045.450   162   84.021 1129.47
  0.00  57.97  98.59  58.79  95.03  93.17  43764 1045.526   162   84.021 1129.54
 42.02   0.00  83.01  60.83  95.03  93.17  43768 1045.602   162   84.021 1129.62
  0.00  42.51  72.69  60.83  95.03  93.17  43773 1045.668   162   84.021 1129.68
 28.95   0.00  67.94  61.52  95.03  93.17  43778 1045.735   162   84.021 1129.75
   0.00   4.50  32.29  62.74  95.03  93.17  43783 1045.788   162   84.021 1129.80
  62.60   0.00  27.04  62.80  95.03  93.17  43788 1045.866   162   84.021 1129.88
   0.00  45.52   0.00  65.14  95.03  93.17  43793 1045.950   162   84.021 1129.97
   0.00  47.10  71.13  65.14  95.03  93.17  43797 1046.015   162   84.021 1130.03
  49.36   0.00  33.30  65.14  95.03  93.17  43802 1046.080   162   84.021 1130.10
   5.92   0.00  35.47  67.33  95.03  93.17  43806 1046.132   162   84.021 1130.15
   0.00  50.15  65.90  67.37  95.03  93.17  43811 1046.209   162   84.021 1130.23
  46.75   0.00  10.39  69.71  95.03  93.17  43816 1046.305   162   84.021 1130.32
  47.27   0.00  54.91  69.71  95.03  93.17  43820 1046.364   162   84.021 1130.38
   0.00  45.69  46.99  69.71  95.03  93.17  43825 1046.430   162   84.021 1130.45
   3.25   0.00  13.65  71.93  95.03  93.17  43830 1046.488   162   84.021 1130.50
   0.00  38.00  46.98  71.94  95.03  93.17  43835 1046.551   162   84.021 1130.57
  43.74   0.00  37.69  74.44  95.03  93.17  43840 1046.634   162   84.021 1130.65
   0.00  42.88  15.64  74.44  95.03  93.17  43845 1046.702   162   84.021 1130.72
   0.00  44.13  12.90  74.44  95.03  93.17  43849 1046.756   162   84.021 1130.77
   0.00  16.42  33.96  75.79  95.03  93.17  43853 1046.813   162   84.021 1130.83
   4.25   0.00  20.10  76.45  95.03  93.17  43858 1046.863   162   84.021 1130.88
   0.00   3.22   0.00  76.46  95.03  93.17  43863 1046.914   162   84.021 1130.93

From the GC status, memory reclamation has clearly improved. I should point out that the JVM data above was not captured at the beginning of the scenario but after it had been running for quite a while. Since the JVM memory can now be reclaimed normally, this data tells us that there is no longer a memory overflow.

However, there is still an issue here; I don't know whether you noticed it. The young GCs are happening too frequently. You can see it in the YGC column: I am printing once per second, and YGC increases by four or five per second, with an average young GC taking roughly 10 to 20 milliseconds (YGC climbs from about 43,710 to 43,863 in this window of roughly 35 seconds, while YGCT grows by about 2.2 seconds, i.e. around 14 ms per collection). Although this has not yet had a visible impact on TPS, it is on the edge of becoming dangerous. Since it does not affect TPS significantly for now, I will not deal with the frequent young GCs here.

Let’s re-run the scenario, and the results of the scenario execution are as follows:

See, hasn't the TPS improved? And the drop we saw at the beginning of Phase 2 is gone. Previously the problem appeared after about 6 minutes; here we have been running for more than 30 minutes without any drop.

You can cheer now, right? Yes! We no longer have to sit there waiting for the crash to happen.

Summary #

In this lesson, we discussed two problems in two stages.

The first problem is relatively simple: a full table scan caused by a missing index. We have covered it before. The reason for revisiting it in this lesson is that this one SQL statement consumed all of the CPU resources, and a full table scan eating up the CPU is a very common performance problem.

The second problem is related to memory overflow. For Java applications, in order to locate a memory overflow, we must understand the logic of the code: where variables are defined, who accesses them, and how they relate to each other. Today's development tools are very friendly and can show us the call relationships in the code.

In this example, the difficulty is not finding the point of memory overflow but identifying its cause. From the heap dump we can see where the overflow occurs and who references the offending class. The class may be used in several places, and we have to check them one by one.

What makes this example special is that our interface itself does use the pms_sku_stock table and the OmsOrderDetail class, yet judging from its SQL statements, the interface alone should not cause a memory overflow. So we had to find out who else was using the pms_sku_stock table and the OmsOrderDetail class and causing the overflow. That was the key turning point.

I hope you have benefited from this lesson.

Homework #

Finally, I have two questions for you to ponder:

  1. Can you quickly find the SQL that needs to be indexed?

  2. How can you quickly identify which object is causing the continuous increase in memory?

Remember to discuss and exchange your ideas with me in the comments section. Each thought will help you make further progress.

If you have gained something from reading this article, feel free to share it with your friends to learn and improve together. See you in the next lesson!