20 Case Study: Why Did the System's Swap Usage Increase? (Part Two) #

Hello, I’m Ni Pengfei.

In the previous section, we learned in detail about Linux memory reclamation, particularly the principles of Swap. Let’s briefly recap.

When memory resources run low, Linux frees up file pages and anonymous pages through direct memory reclamation and kswapd's periodic scanning, so that memory can be allocated to the processes that need it more.

  • File pages are relatively easy to reclaim: clean pages can simply be dropped, while dirty pages must first be written back to disk before the cache is released.

  • Infrequently accessed anonymous pages, on the other hand, have to be swapped out to disk; when they are accessed again, they can be swapped back into memory from disk.

After enabling Swap, you can set /proc/sys/vm/min_free_kbytes to adjust the thresholds that trigger periodic memory reclamation, and /proc/sys/vm/swappiness to adjust the preference between reclaiming file pages and anonymous pages.
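
For reference, both knobs are plain files under /proc/sys/vm, so you can inspect them and adjust them on the fly. A minimal sketch (the value written below is just the default, shown for illustration, not a recommendation from this case):

# Current reclamation threshold and Swap preference
$ cat /proc/sys/vm/min_free_kbytes
$ cat /proc/sys/vm/swappiness
# Adjust swappiness temporarily, e.g. back to the default of 60 (lost after a reboot)
$ sysctl -w vm.swappiness=60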

So, when Swap usage rises, how do we locate and analyze the problem? Next, let's look at a case study involving disk I/O and practice our analysis skills.

Case Study #

The following case study is based on Ubuntu 18.04 but is also applicable to other Linux systems.

  • Machine configuration: 2 CPUs, 8GB memory
  • You need to install tools such as sysstat in advance, for example: apt install sysstat

First, open two terminals, SSH into the machine in each of them, and install the tools mentioned above.

As with previous case studies, all the following commands are assumed to be run as the root user. If you are logged into the system as a regular user, please run the command sudo su root to switch to the root user.

If you encounter any problems during the installation process, I encourage you to search for a solution first. If you cannot find a solution, you can ask me in the comments section.

Next, run the free command in the terminal to check the usage of Swap. For example, on my machine, the output is as follows:

$ free
             total        used        free      shared  buff/cache   available
Mem:        8169348      331668     6715972         696     1121708     7522896
Swap:             0           0           0

From this free output, you can see that the size of Swap is 0, indicating that my machine does not have Swap configured.

In order to proceed with the Swap case study, we need to configure and enable Swap. If Swap is already enabled in your environment, you can skip the following steps and move on.
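
If you are not sure whether Swap is already enabled, you can also check quickly before continuing; a small sketch (empty output means no Swap area is active):

# Either of these lists the active Swap areas
$ swapon --show
$ cat /proc/swaps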

To enable Swap, you first need to know that Linux supports two forms of Swap: a Swap partition and a Swap file. Taking a Swap file as an example, run the following commands in the first terminal to enable Swap. Here I configure a Swap file of 8GB:

# Create a Swap file
$ fallocate -l 8G /mnt/swapfile
# Modify permissions so that only the root user can access it
$ chmod 600 /mnt/swapfile
# Configure the Swap file
$ mkswap /mnt/swapfile
# Enable Swap
$ swapon /mnt/swapfile
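
If you would rather use a Swap partition instead of a Swap file, the steps are similar. A minimal sketch, assuming a spare partition (/dev/sdb1 here is purely an illustration; double-check the device name, because mkswap destroys whatever is on it):

# Format the spare partition as Swap (this wipes its contents)
$ mkswap /dev/sdb1
# Enable it
$ swapon /dev/sdb1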

Then, execute the free command again to confirm that Swap is configured successfully:

$ free
             total        used        free      shared  buff/cache   available
Mem:        8169348      331668     6715972         696     1121708     7522896
Swap:       8388604           0     8388604

Now, in the free output, both the total and the free Swap space have changed from 0 to 8GB, indicating that Swap has been enabled successfully.

Next, in the first terminal, run the following dd command to simulate reading a large file:

# Read from the disk and write to the null device, so this generates only disk read requests
$ dd if=/dev/sda1 of=/dev/null bs=1G count=2048

Then, in the second terminal, run the sar command to view the changes in various memory metrics. You can observe for a while to see the changes in these metrics.

# Output a set of data every 1 second
# -r displays memory usage, -S displays Swap usage
$ sar -r -S 1
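
Your exact numbers will differ, but the header rows of the two tables look roughly like this (the layout can vary slightly between sysstat versions, and each output row is prefixed with a timestamp):

kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty

kbswpfree kbswpused  %swpused  kbswpcad   %swpcad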

We can see that the output of sar consists of two tables: the first table represents memory usage and the second table represents Swap usage. The kb prefix before the metric names indicates that these metrics are in kilobytes.

After removing the prefix, you will find that you have already seen most of the metrics, and a few new metrics have appeared. Let me briefly explain them:

  • kbcommit represents the amount of memory the current workload needs. It is an estimate of how much memory is required to guarantee that the system does not run out. %commit is this value as a percentage of total memory.
  • kbactive represents active memory, which is the recently used memory and is generally not reclaimed by the system.
  • kbinact represents inactive memory, which is infrequently accessed memory and may be reclaimed by the system.

Now that we understand the meaning of these metrics, let's analyze the related phenomena based on the actual values. You can clearly see that the overall memory usage (%memused) keeps increasing, from an initial 23% to 98%, and that most of the memory is occupied by the buffer (kbbuffers). Specifically:

  • Initially, the free memory (kbmemfree) keeps decreasing while the buffer (kbbuffers) keeps increasing, indicating that the remaining memory is being allocated to the buffer.
  • After a while, very little free memory remains while the buffer occupies most of the memory. At this point, Swap usage gradually increases, and the buffer and free memory only fluctuate within a small range.

You may wonder why the buffer keeps growing, and which process is causing it.

Clearly, we need to look at how processes are using the cache. In the earlier caching case study, we learned that cachetop can answer exactly this question, so let's try cachetop.

In the second terminal, press Ctrl+C to stop the sar command, and then run the cachetop command below to observe the usage of the cache:

$ cachetop 5
12:28:28 Buffers MB: 6349 / Cached MB: 87 / Sort: HITS / Order: ascending
PID      UID      CMD              HITS     MISSES   DIRTIES  READ_HIT%  WRITE_HIT%
   18280 root     python                 22        0        0     100.0%       0.0%
   18279 root     dd                  41088    41022        0      50.0%      50.0%

From the output of cachetop, we can see that the read/write requests of the dd process only have a 50% hit rate, and there are 41,022 cache pages that were missed. This indicates that it is the dd process, which was run at the beginning of the case, that caused the increase in buffer usage.

You might then ask: why does Swap usage increase as well? Intuitively, since the buffer occupies most of the system's memory and counts as reclaimable memory, shouldn't the buffer be reclaimed first when memory runs short?

To understand this situation, we need to further observe the remaining memory, memory thresholds, and the activity of anonymous and file pages by examining /proc/zoneinfo.

In the second terminal, press Ctrl+C to stop the cachetop command, and then run the command below to observe the changes in the following indicators in /proc/zoneinfo:

# -d makes watch highlight the fields that change between updates
# -A 15 makes grep show the matched 'Normal' line plus the 15 lines after it
$ watch -d grep -A 15 'Normal' /proc/zoneinfo
Node 0, zone   Normal
  pages free     21328
        min      14896
        low      18620
        high     22344
        spanned  1835008
        present  1835008
        managed  1796710
        protection: (0, 0, 0, 0, 0)
      nr_free_pages 21328
      nr_zone_inactive_anon 79776
      nr_zone_active_anon 206854
      nr_zone_inactive_file 918561
      nr_zone_active_file 496695
      nr_zone_unevictable 2251
      nr_zone_write_pending 0

You can see that the remaining memory (pages_free) keeps fluctuating within a small range: whenever it drops below the page low threshold (pages_low), it suddenly jumps back up above the page high threshold (pages_high).
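
Note that these zoneinfo values are counted in pages rather than kilobytes. Assuming the usual 4 KB page size (which you can confirm as shown below), the thresholds above correspond to roughly 58 MB, 73 MB, and 87 MB:

# Check the page size (typically 4096 bytes on x86_64)
$ getconf PAGESIZE
# e.g. min  = 14896 pages * 4 KB ≈ 58 MB
#      low  = 18620 pages * 4 KB ≈ 73 MB
#      high = 22344 pages * 4 KB ≈ 87 MB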

By combining the changes in remaining memory and buffer usage observed using sar, we can deduce that the fluctuation of remaining memory and buffer usage is due to the cycle of memory reclamation and cache reallocation.

  • When the remaining memory is less than the low threshold, the system will reclaim some cache and anonymous memory to increase the remaining memory. Among them, the reclamation of cache causes a decrease in the buffer in sar, while the reclamation of anonymous memory leads to an increase in swap usage.

  • Immediately after that, due to dd still running, the remaining memory is reallocated to the buffer, resulting in a decrease in remaining memory and an increase in the buffer.

In fact, there is another interesting phenomenon. If you run dd and sar several times, you may find that across the runs Swap is sometimes used heavily and sometimes barely at all, while the buffer fluctuates much more.

In other words, when the system reclaims memory, sometimes it reclaims more file pages, and sometimes it reclaims more anonymous pages.

Apparently, the system's preference between reclaiming the two types of memory is not fixed. You should recall swappiness, mentioned in the previous lesson, which is exactly the option for tuning this preference.

Still in the second terminal, press Ctrl+C to stop the watch command, and then run the command below to view the configuration of swappiness:

$ cat /proc/sys/vm/swappiness
60

The swappiness value displayed is the default value of 60, which is a relatively neutral configuration. Therefore, the system will choose the appropriate reclamation type based on the actual runtime conditions, such as reclaiming inactive anonymous pages or inactive file pages.

By now, we have found the root cause of the swap usage. Another question is which applications were affected by the swap? In other words, which processes had their memory swapped out?

I recommend using the proc file system to check how much virtual memory each process has swapped out, which is recorded in the VmSwap field of /proc/pid/status (I also recommend running man proc to look up the meanings of the other fields).
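
For a single process, you can read this field directly. For example, for PID 1 (the init process, which always exists):

$ grep VmSwap /proc/1/status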

Run the command below in the second terminal to list the processes using the most Swap. Note that the for loop, awk, and sort used here are common shell tools; if you are not familiar with them, check their man pages or look up a tutorial online.

# Sort processes based on VmSwap usage and output process name, process ID, and SWAP usage
$ for file in /proc/*/status ; do awk '/VmSwap|Name|^Pid/{printf $2 " " $3}END{ print ""}' $file; done | sort -k 3 -n -r | head
dockerd 2226 10728 kB
docker-containe 2251 8516 kB
snapd 936 4020 kB
networkd-dispat 911 836 kB
polkitd 1004 44 kB

From here, you can see that the processes using the most Swap are dockerd and docker-containe (process names in /proc are truncated to 15 characters; this one is docker-containerd). So when dockerd accesses its swapped-out memory again, it will also be slower.
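
As an optional alternative, if the smem tool is available in your distribution's repositories, it can sort processes by Swap usage directly. A sketch (output columns may differ between versions):

# Install smem (package name on Ubuntu/Debian)
$ apt install smem
# List processes sorted by swap usage, largest first
$ smem -s swap -r | head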

This also illustrates a point: although the cache is reclaimable memory, in scenarios such as this kind of large-file I/O, the system still uses Swap to reclaim anonymous memory rather than only reclaiming the file pages that occupy most of the memory.

Finally, if you configured swap at the beginning, don’t forget to disable it after the case is completed. You can run the command below to disable swap:

$ swapoff -a

In practice, disabling and then re-enabling swap is also a common method to clean up swap space, for example:

$ swapoff -a && swapon -a

Summary #

When memory resources are scarce, Linux uses Swap to swap out infrequently accessed anonymous pages to disk and then swap them back into memory when accessed again. You can adjust the threshold for regular memory reclamation by setting /proc/sys/vm/min_free_kbytes, and you can adjust the tendency of swapping file pages and anonymous pages by setting /proc/sys/vm/swappiness.

When the Swap usage increases, you can use methods such as sar, /proc/zoneinfo, and /proc/pid/status to check the memory usage of the system and processes, and then identify the root cause and affected processes of the increased Swap.

On the other hand, generally, reducing Swap usage can improve overall system performance. How can you achieve this? Here, I have also summarized several common methods to reduce Swap usage.

  • Disable Swap. Servers now usually have plenty of memory, so unless it is really necessary, simply disable Swap. With the popularity of cloud computing, most virtual machines on cloud platforms come with Swap disabled by default.

  • If Swap really is necessary, you can try lowering the value of swappiness to reduce the tendency to use Swap during memory reclamation (see the sketch after this list for making such a change persistent).

  • For latency-sensitive applications that may run on servers with Swap enabled, you can also lock the memory using the library functions mlock() or mlockall() to prevent their memory from being swapped out.
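
If you do lower swappiness, here is a minimal sketch for making the change persistent across reboots (the file name and the value 10 are illustrative choices, not recommendations from this case):

# Apply the new value immediately
$ sysctl -w vm.swappiness=10
# Persist it across reboots
$ echo 'vm.swappiness = 10' > /etc/sysctl.d/90-swappiness.conf
$ sysctl --system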

Reflection #

Finally, I have a question for you to ponder.

In today's case study, the swappiness parameter was left at its default value of 60. If we set it to 0 instead, will swapping still occur, and why or why not?

I encourage you to try it out yourself, pay close attention to the output of sar, and record and summarize your observations based on today’s content.

Feel free to leave a comment and discuss it with me. You can also share this article with your colleagues and friends. Let’s practice in real-world scenarios and improve through communication.