09 Analysis How to Perform Basic Analysis of Kernel Memory Leaks

Hello, I am Yao Fangshao.

If you are an application developer, you should be familiar with memory leaks caused by application programs. However, have you ever considered that memory leaks could also be caused by issues within the operating system (kernel) itself? This is an area that many application developers and operation and maintenance personnel often neglect or are relatively unfamiliar with.

However, unfamiliarity doesn’t mean there are no problems. If a problem occurs in an unfamiliar area and you are accustomed to analyzing only the application program itself, you may spend a lot of time on analysis and still gain nothing. Therefore, application developers and operation and maintenance personnel need to master the basic methods of kernel memory leak analysis. This way, when you encounter such a problem, you can make a preliminary judgment instead of being at a loss.

Kernel memory leaks are often serious problems, which usually require restarting the server to resolve. We definitely don’t want to rely solely on restarting the server to solve it, as that would mean endless cycles of restarts. What we hope for is that after a memory leak occurs, we can determine whether it is caused by the kernel, find the root cause of the problem, or seek help from more experienced kernel developers to identify the root cause and completely resolve it, so as to avoid restarting the server again.

So, how do we determine if a memory leak is caused by the kernel? In this lesson, we will discuss the basic analysis methods for kernel memory leaks.

What is kernel memory leakage? #

Before we dive into the specifics of the analysis, we need to have a basic understanding of what kernel memory leakage refers to. This requires us to first understand the basic methods of kernel space memory allocation.

As we discussed in the Fundamentals chapter, a process’s virtual address space includes both the user address space and the kernel address space. To put it simply, the memory allocated by a process in user mode corresponds to the user address space, while the memory allocated by a process in kernel mode corresponds to the kernel address space, as shown in the following diagram:

Applications can allocate and release memory in user mode using malloc() and free(), respectively. Similarly, memory can be allocated and released in kernel mode using kmalloc()/kfree() and vmalloc()/vfree(). Of course, there are other methods for memory allocation and release, but they can be roughly classified into these two categories.

By examining the physical memory on the far right, you can see the main difference between these two types of memory allocation methods. Memory from kmalloc() is physically contiguous, while memory from vmalloc() is contiguous only in the virtual address space, and its physical pages may be scattered. Both kinds of memory can also be observed through /proc/meminfo:

$ cat /proc/meminfo
...
Slab:            2400284 kB
SReclaimable:      47248 kB
SUnreclaim:      2353036 kB
...
VmallocTotal:   34359738367 kB
VmallocUsed:     1065948 kB
...

The amount of memory allocated by vmalloc is reflected in the VmallocUsed field, which represents the size of the used Vmalloc area. On the other hand, memory allocated by kmalloc is reflected in the Slab field, which is further divided into two parts: SReclaimable refers to memory that can be reclaimed when memory is scarce, while SUnreclaim refers to memory that cannot be reclaimed and can only be released actively.

The reason why the kernel exports information about kmalloc and vmalloc through /proc/meminfo is to allow us to investigate when they cause problems. Before we delve into specific cases and troubleshooting methods, let’s take a look at how memory allocation and release are performed in the kernel space using a simple program.

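/* kmem_test */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/vmalloc.h>

#define SIZE (1024 * 1024 * 1024)

/* Address of the vmalloc'ed buffer, kept so that it can be freed on exit. */
char *kaddr;

char *kmem_alloc(unsigned long size)
{
        char *p;

        p = vmalloc(size);
        if (!p)
                pr_info("[kmem_test]: vmalloc failed\n");
        return p;
}

void kmem_free(const void *addr)
{
        /* vfree() accepts NULL; the check only makes the intent explicit. */
        if (addr)
                vfree(addr);
}

int __init kmem_init(void)
{
        pr_info("[kmem_test]: kernel memory init\n");
        kaddr = kmem_alloc(SIZE);
        /* Fail the module load if the allocation did not succeed. */
        if (!kaddr)
                return -ENOMEM;
        return 0;
}

void __exit kmem_exit(void)
{
        kmem_free(kaddr);
        pr_info("[kmem_test]: kernel memory exit\n");
}

module_init(kmem_init);
module_exit(kmem_exit);

/* "GPL v2" is a license string the kernel recognizes; "GPLv2" is not. */
MODULE_LICENSE("GPL v2");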

This is a typical kernel module. In this kernel module, we use vmalloc to allocate 1GB of memory space, and then free it using vfree when the module exits. This is similar to the process of allocating and releasing memory in applications, except that the interface functions for memory allocation and release are different.

We need to use a Makefile to compile this kernel module:

obj-m = kmem_test.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(shell pwd) modules
clean:
	rm -f *.o *.ko *.mod.c *.mod *.a modules.order Module.symvers

After executing the make command, a kernel module file named kmem_test.ko will be generated. The module can then be loaded, as root, with the insmod command:

$ insmod kmem_test.ko

The module can be unloaded using the rmmod command:

$ rmmod kmem_test

This example program demonstrates the basic method of kernel space memory allocation. You can observe the change of VmallocUsed before and after inserting/removing the module to better understand its meaning.

So, when does kernel space memory leak occur?

Similar to user space memory leaks, kernel space memory leaks also refer to the situation where memory is allocated but not released. For example, if we do not call kmem_free in the kmem_exit function, a memory leak problem will occur.

Then, how is kernel space memory leak different from user space memory leak? We know that the lifetime of user space memory is consistent with the user process. This part of memory will be automatically released when the process exits. However, the lifetime of kernel space memory is consistent with the kernel itself, not with the kernel module. In other words, the memory allocated by a kernel module will not be released when the module exits, and will only be released when the kernel is restarted (i.e. when the server is restarted).

In summary, once a kernel memory leak occurs, it is difficult to find an elegant way to resolve it. Often the only option is to restart the server, which is obviously a serious problem. I recommend that you try this out and observe the behavior yourself, but be prepared to restart the server afterwards.

The usage of kmalloc is slightly different from vmalloc. You can refer to the kmalloc API and kfree API to modify the test program above, and then observe the relationship between kmalloc memory and the items in /proc/meminfo. I won’t demonstrate it here, leaving it as a homework assignment for you.

Kernel memory leaks often occur in some driver programs, such as network card drivers and SSD drivers, as well as in our own developed drivers. Because these drivers do not undergo extensive functional verification and testing like the Linux kernel, they are relatively prone to hidden problems.

We have encountered many memory leak issues caused by third-party drivers in production environments, and troubleshooting them can be time-consuming. As someone who has dealt with many such problems, my advice to you is that when you discover a kernel memory leak, the first thing you should question is the third-party driver programs in your system, as well as the drivers you have developed yourself.

So, how can we observe kernel memory leaks?

How to observe kernel memory leaks? #

As mentioned earlier, we can observe the allocation of kernel memory by using /proc/meminfo, which provides a convenient method for observing kernel memory:

If the kernel memory in /proc/meminfo (such as VmallocUsed and SUnreclaim) is unusually large for your workload, it is likely that a kernel memory leak has occurred.
In addition, you can periodically observe the changes in VmallocUsed and SUnreclaim. If they continue to increase without decreasing, it may indicate a kernel memory leak.

/proc/meminfo only provides an overview of system memory usage. If we want to see which specific module is using memory, what should we do?

This can also be viewed through /proc, so once again, when you are unsure of how to analyze, you can try looking at the files under the /proc directory. Using the example above, after installing the kernel module kmem_test, we can see its memory usage through /proc/vmallocinfo:

$ cat /proc/vmallocinfo | grep kmem_test
0xffffc9008a003000-0xffffc900ca004000 1073745920 kmem_alloc+0x13/0x30 [kmem_test] pages=262144 vmalloc vpages N0=262144

As we can see, within the [kmem_test] module, 262144 pages have been allocated through the kmem_alloc function, which amounts to a total of 1GB of memory. Assuming we suspect there is a problem with the kmem_test module, we can examine whether the memory allocated by the kmem_alloc function has been released.

The above test program is relatively simple, so it is easy to see if there are any issues based on the information in /proc/vmallocinfo. However, for drivers or kernel modules running in production environments, the logic can be much more complex, and it can be difficult to determine at a glance whether there are memory leaks. This often requires extensive analysis.

So, what is the basic analysis approach for kernel memory leak problems in complex scenarios?

Analysis Approach for Kernel Memory Leaks in Complex Scenarios #

If we want to do some basic analysis on kernel memory leaks, it is best to use some kernel memory leak analysis tools. The most commonly used analysis tool is kmemleak.

Kmemleak is a powerful tool for kernel memory leak detection. However, it has a drawback: enabling it causes a noticeable performance loss. Therefore, kernels in production environments usually have this feature disabled (it is controlled by the CONFIG_DEBUG_KMEMLEAK build option). We generally only use it in test environments, where we run the driver programs and other kernel modules that need to be analyzed.

Similar to other memory leak detection tools, kmemleak tracks the allocation and release of kernel memory and periodically scans memory for references to the allocated blocks. A block that is still allocated but that nothing points to any more is reported as an unreferenced object. The information about these leaks is exported to users through the /sys/kernel/debug/kmemleak file, and writing scan to that file triggers a scan immediately. Taking our previous program as an example, the check result is as follows:

unreferenced object 0xffffc9008a003000 (size 1073741824):
  comm "insmod", pid 11247, jiffies 4344145825 (age 3719.606s)
  hex dump (first 32 bytes):
    38 40 18 ba 80 88 ff ff 00 00 00 00 00 00 00 00  8@..............
    f0 13 c9 73 80 88 ff ff 18 40 18 ba 80 88 ff ff  ...s.....@......
  backtrace:
    [<00000000fbd7cb65>] __vmalloc_node_range+0x22f/0x2a0
    [<000000008c0afaef>] vmalloc+0x45/0x50
    [<000000004f3750a2>] 0xffffffffa0937013
    [<0000000078198a11>] 0xffffffffa093c01a
    [<000000002041c0ec>] do_one_initcall+0x4a/0x200
    [<000000008d10d1ed>] do_init_module+0x60/0x220
    [<000000003c285703>] load_module+0x156c/0x17f0
    [<00000000c428a5fe>] __do_sys_finit_module+0xbd/0x120
    [<00000000bc613a5a>] __x64_sys_finit_module+0x1a/0x20
    [<000000004b0870a2>] do_syscall_64+0x52/0x90
    [<000000002f458917>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Because the memory that the program allocated with vmalloc is no longer referenced, kmemleak marks it as an “unreferenced object”. Memory like this must be released once it is no longer needed.

If we want to observe kernel memory leaks in a production environment, we cannot use kmemleak. Are there any other methods?

We can use the tracepoints that the kernel provides for memory allocation and release (the kmem event group, for example kmem:kmalloc and kmem:kfree) to dynamically observe the kernel’s memory usage.

When we enable these tracepoints, we can observe the dynamic allocation and release of memory. However, this analysis process is not as efficient as kmemleak.

When we want to observe the allocation and release of specific kernel structures and there are no corresponding tracepoints, we need to use kprobe or systemtap to trace the specific functions for allocating and releasing these kernel structures. Here is a specific case in our production environment.

The business team reported that the available memory in their Docker containers kept decreasing, and they did not know why. After determining from /proc/slabinfo that a large amount of memory was being consumed by dentry objects, we wrote a systemtap script to observe the allocation and release of dentries:

# dalloc_dfree.stp
# usage : stap -x pid dalloc_dfree.stp
global free = 0;
global alloc = 0;

probe kernel.function("d_free") {
        if (target() == pid()) {
                free++;
        }   
}

probe kernel.function("d_alloc").return {
        if (target() == pid()) {
                alloc++;
        }   
}

probe end {
        printf("alloc %d free %d\n", alloc, free);
}

We ran this script several times and found that far more dentries were allocated than freed:

alloc 2041 free 1882
alloc 18137 free 6852
alloc 22505 free 10834
alloc 33118 free 20531

Therefore, we determined that there was a problem with the reclaim of dentries in the container environment. Eventually, we identified it as a bug in the 3.10 kernel: if the memory usage inside a Docker container reached its limit while a lot of global memory was still available, the slab memory charged to that container could not be reclaimed. Of course, this bug has been fixed in newer kernel versions.

That’s all for this lesson.

Class Summary #

In this class, we discussed a more difficult type of memory leak with greater consequences: kernel memory leaks. We also talked about common analysis methods for addressing these memory leaks:

You can analyze the usage of kernel memory by checking the information in /proc/meminfo and make some basic judgments based on the information. If the kernel memory is too large, it is worth suspecting a memory leak.
kmemleak is a powerful tool for analyzing kernel memory, but it is generally only used in testing environments because it has a noticeable impact on performance.
In a production environment, you can use tracepoints or kprobes to trace the allocation and release of specific types of kernel memory, which can help you determine whether there is a memory leak. However, this often requires specialized knowledge, so ask kernel experts for advice when something is unclear.
Kernel memory leaks are usually caused by third-party drivers or kernel modules that you have written. When encountering kernel memory leaks, you should prioritize investigating them.

Homework #

The content we covered in this lesson may be a bit challenging for application developers, but operations personnel need to master it. Therefore, our homework assignment mainly targets operations personnel or beginners in kernel programming: please write a SystemTap script to trace memory allocation and deallocation in the kernel. Feel free to discuss it with me in the comment section.

Thank you for reading, and if you found this lesson helpful, please consider sharing it with your friends. See you in the next lecture.