
16 Fundamentals Understanding Buffer and Cache in Memory #

Hello, I am Ni Pengfei.

In the previous section, we reviewed the basic principles of Linux memory management and learned how to use tools like free and top to view the memory usage of the system and processes.

The relationship between memory and CPU is very close, and memory management itself is a complex mechanism, so it is normal to find the knowledge difficult to grasp. However, as I mentioned before, when you are starting out, you don’t have to understand every single detail. Keep moving forward, try to understand the relevant concepts, and practice them. When you review the material later on, it will be much easier. Of course, the fundamentals should not be neglected.

Before we begin today’s content, let’s first review the memory usage of the system, such as the following free output:

# Please note that the output of `free` may vary between different versions
$ free
              total        used        free      shared  buff/cache   available
Mem:        8169348      263524     6875352         668     1030472     7611064
Swap:             0           0           0

Obviously, this display includes the specific usage of physical memory (Mem) and swap space (Swap), such as total memory, used memory, cache, and available memory. The buff/cache column here is the sum of two parts: Buffers and Cache.

Most of these indicators are relatively easy to understand, but Buffers and Cache can be hard to tell apart. Taken literally, a Buffer is a staging area for data in transit, while a Cache is a copy of data kept around for faster access; both are temporary storage for data in memory. So, do you know the difference between these two kinds of “temporary storage”?

Note: in the rest of this article, I will capitalize “Buffer” and “Cache” when referring to these two specific metrics, and use the lowercase word “cache” for temporary in-memory storage in general.

Source of free data #

Before I explain the two concepts in detail, take a moment to think about whether you already have a way to investigate them further. Beyond guessing from the names themselves, remember that Buffer and Cache are metrics we obtained with the free command.

Do you remember what I said before about what to do when you encounter an unfamiliar metric?

I’m sure you remember: if you don’t understand something, consult the manual. Query the documentation for free with the man command, and you will find detailed explanations of the corresponding metrics. For example, running man free shows the following:

buffers
       Memory used by kernel buffers (Buffers in /proc/meminfo)

cache  Memory used by the page cache and slabs (Cached and SReclaimable in /proc/meminfo)

buff/cache
       Sum of buffers and cache

From the free manual, you can see the explanation of buffer and cache.

  • Buffers are memory used by the kernel buffers, corresponding to the value “Buffers” in /proc/meminfo.

  • Cache is memory used by the page cache and slabs, corresponding to the sum of “Cached” and “SReclaimable” in /proc/meminfo.

This explanation tells us that these values all come from /proc/meminfo, but the specific meanings of Buffers, Cached, and SReclaimable are still not clear.
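Since free derives these numbers from /proc/meminfo, you can verify the arithmetic yourself. The sketch below, which assumes a Linux system with the procps version of free, reads the three source fields and compares their sum against free's buff/cache column (the two readings may differ slightly because memory usage changes between them):

```shell
# Extract the three source fields from /proc/meminfo (values are in KB)
buffers=$(awk '/^Buffers:/ {print $2}' /proc/meminfo)
cached=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
sreclaimable=$(awk '/^SReclaimable:/ {print $2}' /proc/meminfo)

# free's buff/cache column should equal Buffers + Cached + SReclaimable
echo "Buffers + Cached + SReclaimable = $((buffers + cached + sreclaimable)) KB"
free -k | awk '/^Mem:/ {print "free buff/cache                 = " $6 " KB"}'
```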

To understand what they really are, your first reaction is probably to search on Baidu or Google. Although an internet search can give you an answer in most cases, don’t forget that the accuracy of that answer is hard to guarantee.

Please note that conclusions found online may be correct, yet still not apply to your environment. In short, the exact meaning of the same metric can vary considerably with the kernel version and the version of the performance tool. That’s why I always emphasize general approaches and methods in this column rather than asking you to memorize specific conclusions; and for the case studies, remember that any conclusion is tied to the specific machine environment it was observed in.

So, is there a simpler and more accurate way to query their meanings?

proc file system #

I have mentioned the /proc file system in the previous section on CPU performance. /proc is a special file system provided by the Linux kernel that serves as the interface for user interaction with the kernel. Users can query the kernel’s running status and configuration options, as well as the running status and statistical data of processes, through /proc. Of course, you can also use /proc to modify the kernel’s configuration.

The proc file system is also the ultimate source of data for many performance tools. For example, as we just saw, the command free reads /proc/meminfo to obtain memory usage information.
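You can even confirm for yourself that free reads /proc/meminfo by tracing the files it opens. This is a sketch assuming strace is installed; depending on your libc version, the syscall shown may be open or openat:

```shell
# strace prints its trace on stderr; discard free's own output and
# keep only the trace lines that mention meminfo
strace -e trace=open,openat free 2>&1 > /dev/null | grep meminfo
```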

Speaking of /proc/meminfo, since the metrics Buffers, Cached, and SReclaimable are not easy to understand, we need to continue to explore the proc file system to get their detailed definitions.

By running man proc, you can access the detailed documentation of the proc file system.

Please note that this document is quite long, so it would be better to search for specific keywords (e.g., search for meminfo) to quickly locate the memory-related section.

Buffers %lu
       Relatively temporary storage for raw disk blocks that shouldn't get tremendously large (20MB or so).

Cached %lu
       In-memory cache for files read from the disk (the page cache).  Doesn't include SwapCached.
...
SReclaimable %lu (since Linux 2.6.19)
       Part of Slab, that might be reclaimed, such as caches.

SUnreclaim %lu (since Linux 2.6.19)
       Part of Slab, that cannot be reclaimed on memory pressure.

From this document, we can see the following:

  • Buffers are relatively temporary storage for raw disk blocks, which means they are used to cache disk data and usually do not get very large (around 20MB). In this way, the kernel can consolidate scattered writes and optimize disk writes, for example, by combining multiple small writes into a single large write.

  • Cached is the in-memory cache for files read from the disk, which means it is used to cache data read from files. This allows for quick access to these file data from memory on subsequent access, without the need to access the slower disk.

  • SReclaimable is part of the Slab. Slab consists of two parts, with the reclaimable part recorded by SReclaimable, while the unreclaimable part is recorded by SUnreclaim.
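You can check this Slab split directly in /proc/meminfo: the Slab field should equal SReclaimable plus SUnreclaim. A quick sketch:

```shell
# Print the Slab total alongside the sum of its two parts (all in KB);
# Slab should equal SReclaimable + SUnreclaim
awk '/^Slab:/         {slab=$2}
     /^SReclaimable:/ {sr=$2}
     /^SUnreclaim:/   {su=$2}
     END {printf "Slab=%d SReclaimable=%d SUnreclaim=%d sum=%d\n", slab, sr, su, sr+su}' /proc/meminfo
```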

Finally, we have found the detailed definitions of these three metrics. At this point, you may feel relieved and satisfied, thinking that you finally understand Buffer and Cache. But do these definitions mean you truly understand them? I have two questions for you; think about whether you can answer them.

The first question: the documentation for Buffer does not say whether it caches data being read from disk, data being written to disk, or both, while many search results claim that Buffer only caches data about to be written to disk. So, can it also cache data read from the disk?

The second question: the documentation says that Cache caches data read from files. Does it also cache data written to files?

To answer these two questions, let’s take a look at several scenarios to demonstrate the usage of Buffer and Cache.

Case Study #

Your Preparation #

Like the previous experiments, today’s cases are based on Ubuntu 18.04, but they apply to other Linux systems as well. Here is my case environment.

  • Machine configuration: 2 CPUs, 8GB RAM.

  • Pre-install the sysstat package, such as apt install sysstat.

The reason for installing sysstat is that we will use vmstat to observe the changes in Buffer and Cache. Although the same results can also be obtained from /proc/meminfo, the results from vmstat are more intuitive.

In addition, these cases use dd to simulate disk and file I/O, so we also need to observe the changes in I/O.

After installing the above tools, you can open two terminals and connect to the Ubuntu machine.

For the last step of the preparation phase, in order to minimize the effect of cache, remember to run the following command in the first terminal to clear the system cache:

# Clear various caches such as file pages, directory entries, and inodes
$ echo 3 > /proc/sys/vm/drop_caches

Here, /proc/sys/vm/drop_caches is an example of modifying kernel behavior through the proc filesystem. Writing 3 indicates clearing various caches such as file pages, directory entries, and inodes. You don’t need to worry about the differences between these caches for now; we will talk about them later.
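To see the effect of this command, you can record the buff/cache value before and after dropping the caches. A minimal sketch (it must be run as root, and sync is called first so that dirty pages are written back, since only clean pages can be dropped):

```shell
# Compare buff/cache before and after dropping the caches (run as root)
free -k | awk '/^Mem:/ {print "before: buff/cache = " $6 " KB"}'
sync                               # flush dirty pages so clean caches can be freed
echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries, and inodes
free -k | awk '/^Mem:/ {print "after:  buff/cache = " $6 " KB"}'
```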

Scenario 1: Disk and File Write Case #

Let’s simulate the first scenario. First, in the first terminal, run the following vmstat command:

# Output 1 set of data every 1 second
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 7743608   1112  92168    0    0     0     0   52  152  0  1 100  0  0
 0  0      0 7743608   1112  92168    0    0     0     0   36   92  0  0 100  0  0

In the output interface, the buff and cache in the memory section, as well as the bi and bo in the io section, are the key points we need to focus on.

  • buff and cache correspond to the Buffers and Cache we saw earlier, in units of KB.

  • bi and bo respectively indicate the size of block device read and write in blocks per second. Because the block size in Linux is 1KB, this unit is equivalent to KB/s.

Under normal circumstances, in an idle system, you should see that these values remain unchanged in multiple results.

Next, go to the second terminal and execute the dd command to generate a 500MB file by reading a random device:

$ dd if=/dev/urandom of=/tmp/file bs=1M count=500

Then go back to the first terminal and observe the changes in Buffer and Cache:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 7499460   1344 230484    0    0     0     0   29  145  0  0 100  0  0
 1  0      0 7338088   1752 390512    0    0   488     0   39  558  0 47 53  0  0
 1  0      0 7158872   1752 568800    0    0     0     4   30  376  1 50 49  0  0
 1  0      0 6980308   1752 747860    0    0     0     0   24  360  0 50 50  0  0
 0  0      0 6977448   1752 752072    0    0     0     0   29  138  0  0 100  0  0
 0  0      0 6977440   1760 752080    0    0     0   152   42  212  0  1 99  1  0
...
 0  1      0 6977216   1768 752104    0    0     4 122880   33  234  0  1 51 49  0
 0  1      0 6977440   1768 752108    0    0     0 10240   38  196  0  0 50 50  0

By observing the output of vmstat, we can see that during the execution of the dd command, the Cache keeps growing while the Buffer remains relatively unchanged.

Further observing the I/O situation, you will see:

  • When the Cache begins to grow, there is very little block device I/O: bi appears only once, at 488 KB/s, and bo only once, at 4 KB. After a while, however, a large burst of block device writes appears; for example, bo reaches 122880.

  • After the dd command finishes, the Cache stops growing, but the block device writes continue for a while. Added together, the multiple write operations amount to the 500MB of data that dd wrote.
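This lag between Cache growth and the actual block writes is the kernel’s write-back mechanism at work: written pages first accumulate in memory as dirty pages and are flushed to disk later. You can watch this through the Dirty field of /proc/meminfo; a sketch to run while the dd file write is in progress:

```shell
# Dirty is file data sitting in memory, waiting to be written back to disk.
# It grows while dd runs, then drains toward zero after dd exits
# (or immediately, if you force a flush with `sync`).
for i in 1 2 3 4 5; do
    awk '/^Dirty:/ {print "Dirty: " $2 " kB"}' /proc/meminfo
    sleep 1
done
```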

Comparing this result with the definition of Cache we just read, you may be a bit confused: the documentation said Cache is the page cache for file reads, so why is it clearly involved in writing files as well?

Let’s temporarily set aside this question and move on to another disk write case. After finishing these two cases, we will analyze them together.

However, I must emphasize one point for the upcoming case:

The following commands place very high demands on the environment. They require your system to have multiple disks, with the disk partition /dev/sdb1 unused. If you have only one disk, do not run them, as they will destroy the data on that partition.

If your system meets the requirements, continue in the second terminal: after clearing the cache, run the dd command to write 2GB of random data to the disk partition /dev/sdb1:

# First, clear the cache
$ echo 3 > /proc/sys/vm/drop_caches
# Then, run the dd command to write 2GB of random data to disk partition /dev/sdb1
$ dd if=/dev/urandom of=/dev/sdb1 bs=1M count=2048

Next, return to Terminal 1 and observe the changes in memory and I/O:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 7584780 153592  97436    0    0   684     0   31  423  1 48 50  2  0
 1  0      0 7418580 315384 101668    0    0     0     0   32  144  0 50 50  0  0
 1  0      0 7253664 475844 106208    0    0     0     0   20  137  0 50 50  0  0
 1  0      0 7093352 631800 110520    0    0     0     0   23  223  0 50 50  0  0
 1  1      0 6930056 790520 114980    0    0     0 12804   23  168  0 50 42  9  0
 1  0      0 6757204 949240 119396    0    0     0 183804   24  191  0 53 26 21  0
 1  1      0 6591516 1107960 123840    0    0     0 77316   22  232  0 52 16 33  0

From this, you can see that although both involve writing data, writing to disk and writing to a file have different effects. When writing to disk (where bo is greater than 0), both the Buffer and Cache increase, but the Buffer increases much faster.

This indicates that writing to disk requires a large amount of Buffer, which is consistent with the definition we found in the documentation.

Comparing the two scenarios, we find that when writing to a file, Cache is used to cache the data, and when writing to disk, Buffer is used to cache the data. Therefore, returning to the previous question, although the documentation only mentions that Cache is used for file reading caching, in reality, Cache is also used to cache data when writing files.

Scenario 2: Disk and File Reading #

Now that we understand the situation with disk and file writing, let’s reverse our thinking and consider disk and file reading.

Return to Terminal 2 and run the following commands. After clearing the cache, read data from the file /tmp/file and write it to the null device /dev/null:

# First, clear the cache
$ echo 3 > /proc/sys/vm/drop_caches
# Run the dd command to read the file data
$ dd if=/tmp/file of=/dev/null

Then, return to Terminal 1 and observe the changes in memory and I/O:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1      0 7724164   2380 110844    0    0 16576     0   62  360  2  2 76 21  0
 0  1      0 7691544   2380 143472    0    0 32640     0   46  439  1  3 50 46  0
 0  1      0 7658736   2380 176204    0    0 32640     0   54  407  1  4 50 46  0
 0  1      0 7626052   2380 208908    0    0 32640    40   44  422  2  2 50 46  0

By observing the output of vmstat, you will find that when reading files (i.e., when bi is greater than 0), the buffer remains unchanged while the cache keeps growing. This is consistent with the definition we found, which states that “cache is the page cache for reading files”.
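You can also feel the benefit of this page cache directly: once /tmp/file is cached, reading it again is far faster, because no disk I/O is needed. A sketch, assuming the /tmp/file created earlier still exists:

```shell
# Optionally clear the cache first (as root): echo 3 > /proc/sys/vm/drop_caches
# First read: the data comes from disk and fills the page cache
time dd if=/tmp/file of=/dev/null bs=1M 2> /dev/null
# Second read: the same pages are served straight from the page cache
time dd if=/tmp/file of=/dev/null bs=1M 2> /dev/null
```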

So, what happens during disk reads? Let’s run the second example and find out.

First, go back to the second terminal and run the following commands. After clearing the cache, read data from the disk partition /dev/sda1 and write it to the null device /dev/null:

# Clear the cache first
$ echo 3 > /proc/sys/vm/drop_caches
# Run the dd command to read data from the disk partition
$ dd if=/dev/sda1 of=/dev/null bs=1M count=1024

Then, go back to Terminal 1 and observe the changes in memory and I/O:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 7225880   2716 608184    0    0     0     0   48  159  0  0 100  0  0
 0  1      0 7199420  28644 608228    0    0 25928     0   60  252  0  1 65 35  0
 0  1      0 7167092  60900 608312    0    0 32256     0   54  269  0  1 50 49  0
 0  1      0 7134416  93572 608376    0    0 32672     0   53  253  0  0 51 49  0
 0  1      0 7101484 126320 608480    0    0 32748     0   80  414  0  1 50 49  0

By observing the output of vmstat, you will find that when reading from the disk (i.e., when bi is greater than 0), both the buffer and cache are growing, but the buffer is growing much faster. This indicates that when reading from the disk, the data is cached in the buffer.

After analyzing the two examples in the previous scenario, you may have already come to the conclusion: when reading files, the data is cached in the cache, while when reading from the disk, the data is cached in the buffer.

By now, you should have realized that although the documentation provides explanations for the buffer and cache, it still does not cover all the details. For example, today we have learned:

  • The buffer can be used as a cache for both data to be written to the disk and data read from the disk.
  • The cache can be used as a page cache for data read from files and as a page cache for data written to files.

Thus, we have answered the two questions posed before the case study.

In simple terms, the buffer is a cache for disk data, while the cache is a cache for file data, and they are used in both read and write requests.

Summary #

Today, we explored in detail the meaning of Buffer and Cache in memory performance. Buffer and Cache respectively cache the read and write data of disks and file systems.

From the write side, this caching not only optimizes disk and file writes, it also benefits the application: the write call can return before the data actually reaches the disk, letting the application get on with other work.

From the read side, it accelerates access to frequently used data and reduces the pressure that frequent I/O puts on the disk.

Beyond the content itself, this process of exploration should also be instructive. When troubleshooting performance issues, there are far too many metrics across the various resources to memorize the detailed meaning of every one. An accurate and efficient method, consulting the documentation, is therefore very important.

You should develop the habit of consulting documentation and learn to interpret the detailed meanings of these performance metrics. The proc file system is also a good helper: it exposes the system’s internal running state and serves as the data source for many performance tools, making it a valuable aid when troubleshooting performance issues.

Reflection #

Lastly, I’d like to leave you with a question to ponder.

We already know that we can use commands like ps, top, or the proc file system to obtain information about a process’s memory usage. But how can we calculate the total physical memory usage of all processes?

Hint: memory such as the page cache and shared memory is shared among multiple processes, so simply adding up the per-process numbers reported by ps or top would count that memory more than once.

In this case, I recommend starting with the /proc/<pid>/smaps file. I haven’t directly explained the meaning of each metric in the /proc/<pid>/smaps file in the previous content, so you will need to consult the documentation of the proc file system and interpret the information to answer this question.
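As a pointer in one possible direction (not the full answer): among the metrics in /proc/&lt;pid&gt;/smaps is Pss, the proportional set size, which charges each shared page to the processes mapping it in equal fractions, so summing it avoids most of the double counting. A rough sketch, which needs root to read every process’s smaps:

```shell
# Sum Pss (proportional set size) over all processes, in KB.
# Pss divides each shared page evenly among the processes that map it,
# so the total is not inflated by shared memory the way a sum of RSS would be.
grep -h '^Pss:' /proc/[0-9]*/smaps 2> /dev/null |
    awk '{total += $2} END {print total " kB"}'
```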

Please feel free to discuss with me in the comments section, and you are also welcome to share this article with your colleagues and friends. Let’s practice in real scenarios and progress through communication.