24 Continued Memory Increase: How Should I Troubleshoot the Issue? #

Hello, I am Liu Chao.

You have most likely run into memory overflow or high memory usage at some point. When memory keeps climbing, it is hard to pin down the specific problem from business logs alone. So, with multiple processes and a large number of business threads running, how can we accurately find the underlying cause?

Common Memory Monitoring and Diagnostic Tools #

To do a good job, one must first sharpen one’s tools. During the routine investigation of memory performance bottlenecks, we often need to use Linux command-line or JDK tools to assist us in monitoring the usage of system or virtual machine memory. Here, I will introduce several useful and commonly used tools.

Linux Command-Line Tool: top Command #

The top command is one of the most commonly used commands in Linux. It can display real-time information such as CPU usage, memory usage, and system load of the processes currently running. The upper part of the output shows system statistics, while the lower part shows process usage statistics.

img

In addition to the simple top command, we can also use top -Hp pid to view the specific thread’s usage of system resources:

img
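
For reference, the two invocations are simply the following, where <pid> is a placeholder for the target process ID; the memory-related columns to watch in the output are VIRT (virtual memory), RES (resident memory), and %MEM:

top            # system summary plus per-process statistics; pressing M sorts by memory in most builds of top
top -Hp <pid>  # per-thread view of a single process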

Linux Command-Line Tool: vmstat Command #

vmstat is a monitoring tool that lets us specify a sampling interval and count. It can monitor not only memory usage but also CPU usage and swap activity. In practice, however, vmstat is used less for inspecting memory than for observing process context switching.
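
For example, a minimal invocation that samples once per second for three rounds looks like this (the two trailing numbers are the interval in seconds and the sample count):

vmstat 1 3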

img

  • r: The number of runnable processes (running or waiting for run time).
  • b: The number of processes in uninterruptible sleep state.
  • swpd: Virtual memory usage.
  • free: Free memory.
  • buff: Memory used as buffers.
  • si: Memory pages swapped in from disk.
  • so: Memory pages swapped out to disk.
  • bi: Blocks received from a block device (blocks read in) per second.
  • bo: Blocks sent to a block device (blocks written out) per second.
  • in: Interrupts per second.
  • cs: Context switches per second.
  • us: User CPU time.
  • sy: System CPU time.
  • id: Idle time.
  • wa: I/O wait time.
  • st: Time stolen from a running virtual machine.

Linux Command-Line Tool: pidstat Command #

pidstat is a component of the Sysstat package. It is a powerful performance monitoring tool. You can install Sysstat by running the command: yum install sysstat. While the top and vmstat commands monitor process memory, CPU, and I/O usage, the pidstat command goes deep into the thread level.

By running the pidstat --help command, we can see several commonly used options for monitoring thread performance:

img

Common parameters:

  • -u: Default option; displays CPU usage of each process.
  • -r: Displays memory usage of each process.
  • -d: Displays I/O usage of each process.
  • -w: Displays context switching of each process.
  • -p: Specifies a process ID.
  • -t: Displays statistics for the threads in a process.

We can obtain the relevant process ID through related commands (such as ps or jps), and then run the following command to monitor the memory usage of that process:

img

In the command, the -p option of pidstat specifies the process ID, -r asks for memory statistics, 1 means sample every second, and 3 means take three samples.
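
Putting that together, the invocation is roughly the following sketch, with <pid> standing in for the process ID obtained from ps or jps:

pidstat -p <pid> -r 1 3   # memory statistics for the process, sampled every second, three times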

The meanings of several important indicators displayed are:

  • Minflt/s: The minor faults the task commits per second, which don’t require loading pages from disk.
  • Majflt/s: The major faults the task commits per second, which require loading pages from disk.
  • VSZ: Virtual size of the process in kilobytes.
  • RSS: Resident set size, which is the non-swapped physical memory used by the task in kilobytes.

If we need to go further and view the memory usage of the threads in this process, we can add the -t option to the command:

img
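
A sketch of the thread-level variant, under the same assumptions:

pidstat -p <pid> -r -t 1 3   # per-thread memory statistics for the process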

We know that Java runs on top of the JVM, and most of an application's memory is allocated inside the JVM's own memory space. Therefore, besides using the Linux commands above to monitor the server's overall memory usage, we also need to know how memory is used inside the JVM. The JDK ships with a number of command-line tools that can monitor memory allocation and usage in the JVM.

JDK Tools: jstat command #

jstat is used to monitor the real-time running status of Java applications, including heap memory information and garbage collection information. We can run jstat -help to view some key parameter information:

img

Then we can use jstat -options to see which statistics jstat can report:

img

  • -class: displays class loading statistics;
  • -compiler: displays information related to JIT compilation;
  • -gc: displays heap information related to garbage collection;
  • -gccapacity: displays the capacity and usage of each generation;
  • -gcmetacapacity: displays the size of Metaspace;
  • -gcnew: displays information about the young generation;
  • -gcnewcapacity: displays the size and usage of the young generation;
  • -gcold: displays information about the old generation and Metaspace;
  • -gcoldcapacity: displays the size of the old generation;
  • -gcutil: displays a summary of garbage collection statistics, as percentages of each area's capacity;
  • -gccause: displays garbage collection information (similar to -gcutil), and also shows the cause of the last or current garbage collection;
  • -printcompilation: outputs information about JIT compilation.

It has multiple functions. Here, I will give an example of a commonly used function: how to use jstat to view the usage of heap memory. We can use jstat -gc pid to view it:

img

  • S0C: capacity of survivor space 0 in the young generation (in KB);
  • S1C: capacity of survivor space 1 in the young generation (in KB);
  • S0U: current usage of survivor space 0 (in KB);
  • S1U: current usage of survivor space 1 (in KB);
  • EC: capacity of Eden in the young generation (in KB);
  • EU: current usage of Eden in the young generation (in KB);
  • OC: capacity of the old generation (in KB);
  • OU: current usage of the old generation (in KB);
  • MC: capacity of Metaspace (in KB);
  • MU: current usage of Metaspace (in KB);
  • YGC: number of young generation garbage collections since the application started;
  • YGCT: time spent in young generation garbage collections since the application started (in seconds);
  • FGC: number of old generation (full GC) garbage collections since the application started;
  • FGCT: time spent in old generation (full GC) garbage collections since the application started (in seconds);
  • GCT: total time spent in garbage collection since the application started (in seconds).
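
jstat can also sample repeatedly: the trailing arguments are the sampling interval in milliseconds and the number of samples. For example, the sketch below (with <pid> as a placeholder) prints the GC summary once per second, five times:

jstat -gc <pid> 1000 5
jstat -gcutil <pid> 1000 5   # same sampling, shown as percentages of each area's capacity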

JDK Tools: jstack command #

This tool was introduced in the Q&A session of Module 3. It is a thread stack analysis tool, and the most commonly used function is to use the jstack pid command to view the stack information of threads. It is usually used in conjunction with top -Hp pid or pidstat -p pid -t to view the specific status of threads, and it is also often used to troubleshoot deadlock exceptions.

img

In the information of each thread stack, we can view the thread ID, the thread’s status (wait, sleep, running, etc.), and whether it holds a lock, etc.
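
One common workflow when combining jstack with top -Hp: the thread IDs shown by top are decimal, while jstack labels each thread with its native ID (nid) in hexadecimal, so the ID has to be converted before searching the stack dump. A rough sketch, with <pid> and <tid> as placeholders:

top -Hp <pid>                                  # note the TID of the suspicious thread
printf '%x\n' <tid>                            # convert the decimal TID to hex
jstack <pid> | grep -A 20 'nid=0x<hex-tid>'    # locate that thread's stack frames in the dump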

JDK Tools: jmap command #

In Lecture 23, we used jmap to view the initial configuration information of the heap memory and its usage. Besides this function, we can also use jmap to output the object information in the heap memory, including which objects have been generated and how many objects there are.

We can use jmap to view the initial configuration information of the heap memory and its usage:

img
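
On JDK 8 the invocation is typically the following (on JDK 9 and later the equivalent is jhsdb jmap --heap --pid <pid>):

jmap -heap <pid>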

We can use jmap -histo[:live] pid to view a histogram of the number and size of objects in the heap memory. If “live” is included, only live objects will be counted:

img
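
Since the histogram is sorted by occupied size and can be very long, it is common to pipe it through head, for example:

jmap -histo:live <pid> | head -n 20   # top entries of the live-object histogram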

We can use the jmap command to dump the usage of the heap memory to a file:

img
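
A sketch of the dump command; the output path is only illustrative:

jmap -dump:live,format=b,file=/tmp/heapdump.hprof <pid>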

We can download the file and analyze it using the MAT tool:

img

Next, let’s use a practical case to integrate the use of the tools mentioned above and analyze a memory leak problem.

Practical Exercise #

The memory overflow problems we encounter in our daily work generally fall into two categories. One is caused by the lack of flow control during peak periods, resulting in the instant creation of a large number of objects and memory overflow. The other is caused by memory leaks leading to memory overflow.

We can generally solve the first type of memory overflow problem by implementing flow control. However, in many cases, memory overflow is actually caused by memory leaks, which are bugs in the program. We need to identify the problem code in a timely manner.

Below, I have simulated a memory overflow case caused by a memory leak. Let’s practice it.

As we know, the purpose of ThreadLocal is to provide thread-private variables. Such a variable lives throughout the lifecycle of its thread and reduces the complexity of passing state between the functions or classes running in that thread. However, if ThreadLocal is used improperly, it may cause memory leaks.

In this case, the scenario involves ThreadLocal. Now let’s create 100 threads. Run the following code, and after a while the system will throw an OutOfMemoryError:

final static ThreadPoolExecutor poolExecutor = new ThreadPoolExecutor(100, 100, 1, TimeUnit.MINUTES,
        new LinkedBlockingQueue<>()); // Thread pool with 100 core/max threads, so the worker threads stay alive

final static ThreadLocal<Byte[]> localVariable = new ThreadLocal<>(); // Declare the thread-local variable

@RequestMapping(value = "/test0")
public String test0(HttpServletRequest request) {
    poolExecutor.execute(new Runnable() {
        public void run() {
            Byte[] c = new Byte[4096 * 1024];
            localVariable.set(c); // Bind the large array to the current pool thread; it is never removed
        }
    });
    return "success";
}

@RequestMapping(value = "/test1")
public String test1(HttpServletRequest request) {
    List<Byte[]> temp1 = new ArrayList<Byte[]>();

    Byte[] b = new Byte[1024 * 20];
    temp1.add(b); // Local allocation; eligible for GC once the request returns

    return "success";
}

Before starting the application, we can set the HeapDumpOnOutOfMemoryError and HeapDumpPath parameters so that the JVM dumps the heap to a file when an OutOfMemoryError occurs. Start the application with the following command:

java -Xms1000m -Xmx4000m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/tmp/heapTest.log -jar heapTest-0.0.1-SNAPSHOT.jar

First, request the “test0” link 10,000 times, and then request the “test1” link 10,000 times. At this point, the request to the “test1” interface will throw an exception.
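
How the requests are issued does not matter; a simple sketch using curl, assuming the application listens on the Spring Boot default port 8080, could look like this:

for i in $(seq 1 10000); do curl -s http://localhost:8080/test0 > /dev/null; done
for i in $(seq 1 10000); do curl -s http://localhost:8080/test1 > /dev/null; done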

img

From the logs, we can clearly see that this is a memory overflow exception. We can first use the Linux system command to view the memory usage of the process in the system. The easiest way is to use the “top” command.

img

Checking the process with the “top” command, we can see that on a machine with only 8GB of memory, where the Java process was given a 4GB heap, the Java process is already using 55% of the system’s memory. We can then check how its individual threads use system resources with the “top -Hp pid” command.

img

Using the “jstack pid” command to inspect the threads’ stack information, we can see that the threads are in the TIMED_WAITING state. Meanwhile, CPU utilization and load are not abnormal, so we can rule out deadlock and I/O blocking.

img

By using the “jmap” command to check the usage of heap memory, we can see that the usage of the old generation is almost full and the memory is not being released.

img

Based on the heap memory situation above, we can conclude that a memory leak has occurred. Now we need to find out which object cannot be garbage collected and what caused the memory leak.

We need to check the specific objects in the heap memory to see which ones occupy the most space. We can use the “jmap -histo:live” command to view the number and size of live objects.

img

It is evident that the Byte[] objects occupy an abnormal amount of memory, which points to a leak of Byte objects in the code. Since we set the dump options when starting the application, a heap dump file was produced when the error occurred. Opening this file with MAT, we can see that MAT has already flagged the Byte memory as suspicious.

img

Clicking into the Histogram view, we can see the objects sorted by count, with the Byte[] array ranked first. After selecting that entry, right-click and choose “List objects > with incoming references” to see which objects hold references to it.

img

Here we can clearly see that the problematic code is in the area of ThreadLocal.

img

Summary #

In some relatively simple business scenarios, it is relatively easy to troubleshoot system performance issues and find the specific causes. However, in complex business scenarios or when dealing with open source framework source code issues, it becomes much more difficult to troubleshoot. Sometimes, tools can only make educated guesses about where the problem might be, and actual troubleshooting requires a detailed analysis combined with the source code.

It can be said that there are no shortcuts when it comes to troubleshooting performance issues in production. It is not a simple task. In addition to mastering the tools introduced today, we also need to continuously accumulate experience in order to truly achieve performance optimization.

Questions for Reflection #

Besides the tools I mentioned above for troubleshooting memory performance bottlenecks, do you know the commonly used methods for monitoring the JVM memory in code?

Question:

Could you please explain how to avoid memory leaks caused by ThreadLocal?

We know that ThreadLocal is implemented on top of ThreadLocalMap, and the Entry in this map extends WeakReference: the ThreadLocal object used as the Entry’s key is held only through a weak reference, so once no other strong reference to it exists, it can be reclaimed at the next GC.

When a thread calls the set method of ThreadLocal, a new entry is added to that thread’s ThreadLocalMap. If the ThreadLocal object later becomes otherwise unreachable and a garbage collection runs, the key is reclaimed, but the value remains in memory: because the thread is still alive, its ThreadLocalMap continues to reference the entry and its value.

Such stale entries sit on a strong reference chain that exists for as long as the thread does: Thread -> ThreadLocalMap -> Entry -> Value. This chain prevents the Entry and its value from being reclaimed even though the key has already been collected, which is exactly a memory leak.

To prevent the leak, we only need to call the remove method once we have finished using the variable, typically in a finally block, so the stale entry is cleared explicitly.

What is the specific difference between memory leaks and memory overflow?

A memory leak means that objects which are no longer needed cannot be reclaimed in time, so they keep occupying memory and waste space. For example, as discussed in an earlier lecture, the substring method in Java 6 could cause a leak: the String it returned reused the char array of the original string, so if we took only a small slice of a very large string and kept a reference to the returned object, the entire original char array could never be garbage collected, leaking memory.

Memory overflow, on the other hand, refers to the occurrence of an OutOfMemoryError. Many situations can cause it, such as insufficient heap space, stack space, or method area space.

The relationship between memory leaks and memory overflow: Memory leaks can easily lead to memory overflow, but memory overflow is not necessarily caused by memory leaks.