16 Fundamentals: Understanding Buffer and Cache in Memory #
Hello, I am Ni Pengfei.
In the previous section, we reviewed the basic principles of Linux memory management and learned how to use tools like `free` and `top` to view the memory usage of the system and processes.
The relationship between memory and CPU is very close, and memory management itself is a complex mechanism, so it is normal to find the knowledge difficult to grasp. However, as I mentioned before, when you are starting out, you don’t have to understand every single detail. Keep moving forward, try to understand the relevant concepts, and practice them. When you review the material later on, it will be much easier. Of course, the fundamentals should not be neglected.
Before we begin today's content, let's first review the system's memory usage with the following `free` output:
# Please note that the output of `free` may vary between different versions
$ free
total used free shared buff/cache available
Mem: 8169348 263524 6875352 668 1030472 7611064
Swap: 0 0 0
Obviously, this display includes the specific usage of physical memory (`Mem`) and swap space (`Swap`), such as total memory, used memory, cache, and available memory. The cache here is the sum of two parts: Buffers and Cache.
Most of the metrics here are relatively easy to understand, but Buffer and Cache may be harder to tell apart. Literally, a buffer is a staging area for data in transit, while a cache is a copy of data kept for faster access; both are temporary storage for data in memory. So, do you know the difference between these two kinds of "temporary storage"?
Note: In the following sections, I will use the English terms Buffer and Cache directly to avoid ambiguity; both refer to temporary storage of data in memory.
Source of free data #
Before I explain the two concepts in detail, take a moment to think about whether you have any means to investigate them further. Beyond their literal meanings, remember that Buffer and Cache are metrics we obtain using the `free` command.
Do you remember what I said before about what to do when you encounter an unfamiliar metric?
I'm sure you remember: if you don't understand something, consult the manual. Query the documentation for `free` with the `man` command, and you will find detailed explanations of the corresponding metrics. For example, executing `man free` shows the following:
buffers
       Memory used by kernel buffers (Buffers in /proc/meminfo)
cache  Memory used by the page cache and slabs (Cached and SReclaimable in /proc/meminfo)
buff/cache
       Sum of buffers and cache
From the `free` manual, you can see the explanations of Buffer and Cache:

- Buffers are memory used by kernel buffers, corresponding to the "Buffers" value in /proc/meminfo.
- Cache is memory used by the page cache and slabs, corresponding to the sum of the "Cached" and "SReclaimable" values in /proc/meminfo.
This explanation tells us that these values all come from /proc/meminfo, but the specific meanings of Buffers, Cached, and SReclaimable are still not clear.
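Since `free` simply reads these fields, you can reproduce its buff/cache column yourself. Here is a minimal sketch, assuming a Linux system with /proc mounted:

```shell
# Print the three /proc/meminfo fields behind free's buff/cache column,
# then sum them; the total should match free's buff/cache value (in kB)
awk '/^Buffers:|^Cached:|^SReclaimable:/ { print; sum += $2 }
     END { printf "buff/cache total: %d kB\n", sum }' /proc/meminfo
```

Comparing this total against the buff/cache column of `free` is a quick way to confirm where the number comes from.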
To understand what they really are, your first instinct is probably to search Baidu or Google. Although an internet search can give you an answer in most cases, the accuracy of that answer is hard to guarantee.
Please note that the conclusions found online may be correct, but they may not apply to your environment. In short, the specific meaning of the same metric can vary considerably across kernel versions and performance tool versions. That is why I always emphasize general approaches and methods in this column, rather than asking you to memorize specific conclusions. For hands-on cases, the machine environment is our biggest constraint.
So, is there a simpler and more accurate way to query their meanings?
proc file system #
I have mentioned the /proc file system in the previous section on CPU performance. /proc is a special file system provided by the Linux kernel that serves as the interface for user interaction with the kernel. Users can query the kernel’s running status and configuration options, as well as the running status and statistical data of processes, through /proc. Of course, you can also use /proc to modify the kernel’s configuration.
The proc file system is also the ultimate source of data for many performance tools. For example, as we just saw, `free` reads /proc/meminfo to obtain memory usage information.
Speaking of /proc/meminfo, since the metrics Buffers, Cached, and SReclaimable are not easy to understand, we need to keep exploring the proc documentation for their detailed definitions.
Running `man proc` gives you the detailed documentation of the proc file system.
Please note that this document is quite long; it is best to search for a keyword (e.g., meminfo) to quickly locate the memory-related section.
Buffers %lu
Relatively temporary storage for raw disk blocks that shouldn't get tremendously large (20MB or so).
Cached %lu
In-memory cache for files read from the disk (the page cache). Doesn't include SwapCached.
...
SReclaimable %lu (since Linux 2.6.19)
Part of Slab, that might be reclaimed, such as caches.
SUnreclaim %lu (since Linux 2.6.19)
Part of Slab, that cannot be reclaimed on memory pressure.
From this document, we can see the following:

- Buffers are relatively temporary storage for raw disk blocks; in other words, they cache disk data and usually do not grow very large (around 20MB). This lets the kernel consolidate scattered writes and optimize disk access, for example by combining multiple small writes into a single large one.
- Cached is the in-memory cache for files read from the disk, i.e., the page cache. It allows subsequent accesses to fetch file data quickly from memory, without touching the slower disk.
- SReclaimable is part of the Slab. The Slab consists of two parts: the reclaimable part, recorded as SReclaimable, and the unreclaimable part, recorded as SUnreclaim.
Finally, we have found the detailed definitions of these three metrics. At this point you may feel relieved, thinking you finally understand Buffer and Cache. But do these definitions mean you truly understand them? Here are two questions; think about whether you can answer them.
The first question: the Buffer documentation does not say whether it caches data read from disk, data written to disk, or both, yet many search results online claim that Buffer only caches data to be written to disk. Can it also cache data read from disk?
The second question: the documentation says Cache caches data read from files; does it also cache data written to files?
To answer these two questions, let’s take a look at several scenarios to demonstrate the usage of Buffer and Cache.
Case Study #
Your Preparation #
Just like the previous experiment, today's cases are based on Ubuntu 18.04, but they apply to other Linux systems as well. Here is my case environment.
- Machine configuration: 2 CPUs, 8 GB RAM.
- Pre-install the sysstat package, for example with apt install sysstat.
We install `sysstat` because we will use `vmstat` to observe changes in Buffer and Cache. Although the same data is available from /proc/meminfo, the `vmstat` output is more intuitive.
In addition, these cases use `dd` to simulate disk and file I/O, so we also need to observe the changes in I/O.
After installing the above tools, you can open two terminals and connect to the Ubuntu machine.
For the last step of the preparation phase, in order to minimize the effect of cache, remember to run the following command in the first terminal to clear the system cache:
# Clear various caches such as file pages, directory entries, and inodes
$ echo 3 > /proc/sys/vm/drop_caches
Here, /proc/sys/vm/drop_caches is an example of modifying kernel behavior through the proc file system. Writing 3 clears various caches: file pages, directory entries, and inodes. You don't need to worry about the differences between these caches for now; we will cover them later.
Scenario 1: Disk and File Write Case #
Let's simulate the first scenario. First, run the following `vmstat` command in the first terminal:
# Output 1 set of data every 1 second
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 7743608 1112 92168 0 0 0 0 52 152 0 1 100 0 0
0 0 0 7743608 1112 92168 0 0 0 0 36 92 0 0 100 0 0
In the output interface, the buff and cache in the memory section, as well as the bi and bo in the io section, are the key points we need to focus on.
- buff and cache are the same as the Buffers and Cache we saw earlier, in KB.
- bi and bo indicate the number of blocks read from and written to block devices per second. Since the block size here is 1 KB, this unit is equivalent to KB/s.
Under normal circumstances, in an idle system, you should see that these values remain unchanged in multiple results.
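If you only want the two columns we care about, you can pull buff and cache out of a single `vmstat` sample; a small sketch, assuming the procps-ng `vmstat` output layout shown above:

```shell
# Take two 1-second samples and print only the buff and cache columns
# of the last (steadier) sample; they are fields 5 and 6 in vmstat's
# data lines: r b swpd free buff cache si so bi bo in cs us sy id wa st
vmstat 1 2 | awk 'END { print "buff:", $5, "kB  cache:", $6, "kB" }'
```

This is handy when you want to log just these values over time, for instance from a loop or a cron job.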
Next, go to the second terminal and execute the following `dd` command to generate a 500MB file by reading from the random device:
$ dd if=/dev/urandom of=/tmp/file bs=1M count=500
Then go back to the first terminal and observe the changes in Buffer and Cache:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 7499460 1344 230484 0 0 0 0 29 145 0 0 100 0 0
1 0 0 7338088 1752 390512 0 0 488 0 39 558 0 47 53 0 0
1 0 0 7158872 1752 568800 0 0 0 4 30 376 1 50 49 0 0
1 0 0 6980308 1752 747860 0 0 0 0 24 360 0 50 50 0 0
0 0 0 6977448 1752 752072 0 0 0 0 29 138 0 0 100 0 0
0 0 0 6977440 1760 752080 0 0 0 152 42 212 0 1 99 1 0
...
0 1 0 6977216 1768 752104 0 0 4 122880 33 234 0 1 51 49 0
0 1 0 6977440 1768 752108 0 0 0 10240 38 196 0 0 50 50 0
From the `vmstat` output, we can see that while the `dd` command runs, the Cache keeps growing while the Buffer remains essentially unchanged.
Looking further at the I/O, you will see:

- While the Cache is growing, there is very little block device I/O: bi appears only once at 488 KB/s, and bo shows a single 4 KB write. Only after a while do large block device writes appear, with bo reaching 122880, for example.
- After the `dd` command finishes, the Cache stops growing, but block device writes continue for some time. Adding up the multiple writes gives the 500MB of data that `dd` needed to write.
Comparing this result with the definition of Cache we just learned, you may be a bit confused. Why did the document say that Cache is a page cache for file reads, but now it also has a role in writing files?
Let’s temporarily set aside this question and move on to another disk write case. After finishing these two cases, we will analyze them together.
However, I must emphasize one point for the upcoming case:
The following commands place very high requirements on the environment: your system must have multiple disks, and the disk partition /dev/sdb1 must be unused. If you have only one disk, do not attempt this, as it will corrupt your disk partition.
If your system meets the requirements, continue in the second terminal: after clearing the cache, run the following command to write 2GB of random data to the disk partition /dev/sdb1:
# First, clear the cache
$ echo 3 > /proc/sys/vm/drop_caches
# Then, run the dd command to write 2GB of random data to disk partition /dev/sdb1
$ dd if=/dev/urandom of=/dev/sdb1 bs=1M count=2048
Next, return to Terminal 1 and observe the changes in memory and I/O:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 7584780 153592 97436 0 0 684 0 31 423 1 48 50 2 0
1 0 0 7418580 315384 101668 0 0 0 0 32 144 0 50 50 0 0
1 0 0 7253664 475844 106208 0 0 0 0 20 137 0 50 50 0 0
1 0 0 7093352 631800 110520 0 0 0 0 23 223 0 50 50 0 0
1 1 0 6930056 790520 114980 0 0 0 12804 23 168 0 50 42 9 0
1 0 0 6757204 949240 119396 0 0 0 183804 24 191 0 53 26 21 0
1 1 0 6591516 1107960 123840 0 0 0 77316 22 232 0 52 16 33 0
From this, you can see that although both involve writing data, writing to disk and writing to a file have different effects. When writing to disk (where bo is greater than 0), both the Buffer and Cache increase, but the Buffer increases much faster.
This indicates that writing to disk requires a large amount of Buffer, which is consistent with the definition we found in the documentation.
Comparing the two scenarios, we find that when writing to a file, Cache is used to cache the data, and when writing to disk, Buffer is used to cache the data. Therefore, returning to the previous question, although the documentation only mentions that Cache is used for file reading caching, in reality, Cache is also used to cache data when writing files.
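You can reproduce the file-write conclusion in miniature by watching the Cached field of /proc/meminfo around a small write. A sketch, assuming /tmp/cache_demo is a throwaway path; the exact growth depends on what else the system is doing:

```shell
# Record Cached before and after writing a 64MB file to /tmp;
# the written pages land in the page cache, so Cached should grow
before=$(awk '/^Cached:/ { print $2 }' /proc/meminfo)
dd if=/dev/zero of=/tmp/cache_demo bs=1M count=64 2>/dev/null
after=$(awk '/^Cached:/ { print $2 }' /proc/meminfo)
echo "Cached grew by $((after - before)) kB"   # typically close to 65536 kB
rm -f /tmp/cache_demo
```

On a busy machine the delta can differ, since other activity also adds to and reclaims from the page cache.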
Scenario 2: Disk and File Reading #
Now that we understand the situation with disk and file writing, let’s reverse our thinking and consider disk and file reading.
Return to Terminal 2 and run the following commands. After clearing the cache, read data from the file /tmp/file and write it to the null device:
# First, clear the cache
$ echo 3 > /proc/sys/vm/drop_caches
# Run the dd command to read the file data
$ dd if=/tmp/file of=/dev/null
Then, return to Terminal 1 and observe the changes in memory and I/O:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 1 0 7724164 2380 110844 0 0 16576 0 62 360 2 2 76 21 0
0 1 0 7691544 2380 143472 0 0 32640 0 46 439 1 3 50 46 0
0 1 0 7658736 2380 176204 0 0 32640 0 54 407 1 4 50 46 0
0 1 0 7626052 2380 208908 0 0 32640 40 44 422 2 2 50 46 0
From the `vmstat` output, you will find that when reading a file (i.e., when bi is greater than 0), the Buffer remains unchanged while the Cache keeps growing. This is consistent with the definition we found: Cached is "the page cache for files read from the disk".
So, what happens during disk reads? Let’s run the second example and find out.
First, go back to the second terminal and run the following command. After clearing the cache, read data from the disk partition /dev/sda1 and write it to the null device:
# Clear the cache first
$ echo 3 > /proc/sys/vm/drop_caches
# Run the dd command to read from the disk
$ dd if=/dev/sda1 of=/dev/null bs=1M count=1024
Then, go back to terminal one and observe the changes in memory and I/O:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 7225880 2716 608184 0 0 0 0 48 159 0 0 100 0 0
0 1 0 7199420 28644 608228 0 0 25928 0 60 252 0 1 65 35 0
0 1 0 7167092 60900 608312 0 0 32256 0 54 269 0 1 50 49 0
0 1 0 7134416 93572 608376 0 0 32672 0 53 253 0 0 51 49 0
0 1 0 7101484 126320 608480 0 0 32748 0 80 414 0 1 50 49 0
From the `vmstat` output, you will find that when reading from the disk (i.e., when bi is greater than 0), both Buffer and Cache grow, but Buffer grows much faster. This indicates that data read from the disk is cached in the Buffer.
After working through these two examples, you may have already reached the conclusion: when reading a file, the data is cached in the Cache, while when reading from the disk, the data is cached in the Buffer.
By now, you should have realized that although the documentation provides explanations for the buffer and cache, it still does not cover all the details. For example, today we have learned:
- The buffer can be used as a cache for both data to be written to the disk and data read from the disk.
- The cache can be used as a page cache for data read from files and as a page cache for data written to files.
Thus, we have answered the two questions at the beginning of the scenario.
In simple terms, the buffer is a cache for disk data, while the cache is a cache for file data, and they are used in both read and write requests.
Summary #
Today, we explored in detail the meaning of Buffer and Cache in memory performance. Buffer and Cache respectively cache the read and write data of disks and file systems.
From a writing perspective, they not only optimize writes to disks and files, but also benefit applications, which can move on to other work before the data actually reaches the disk.
From a reading perspective, it can accelerate the reading of frequently accessed data and reduce the pressure on the disk from frequent I/O operations.
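The read-side speedup is easy to feel directly: read the same file twice and compare the timings. A sketch, assuming /tmp/read_demo is a throwaway path; dropping the cache needs root, and without it the first read may already be partly cached:

```shell
# Create a 128MB test file, sync it out, then read it twice;
# the second read is served from the page cache
dd if=/dev/zero of=/tmp/read_demo bs=1M count=128 2>/dev/null
sync
echo 3 > /proc/sys/vm/drop_caches 2>/dev/null || true       # needs root; evicts the fresh copy
time dd if=/tmp/read_demo of=/dev/null bs=1M 2>/dev/null    # cold read, hits the disk
time dd if=/tmp/read_demo of=/dev/null bs=1M 2>/dev/null    # warm read, from the cache
rm -f /tmp/read_demo
```

On a machine with a spinning disk, the warm read is typically an order of magnitude faster than the cold one.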
Beyond the conclusions themselves, this exploration process should also inspire you. When troubleshooting performance issues, there are so many metrics across the various resources that it is impossible to remember the detailed meaning of every one. So an accurate and efficient method, consulting the documentation, is very important.
You must develop the habit of consulting documentation and learn to interpret the detailed meanings of these performance metrics. In addition, the proc file system is a good helper: it exposes the internal running state of the system and is the data source for many performance tools, which makes it a useful aid when troubleshooting performance issues.
Reflection #
Lastly, I’d like to leave you with a question to ponder.
We already know that commands like `ps` and `top`, or the proc file system, can show a process's memory usage. But how can we calculate the total physical memory usage of all processes?
Hint: multiple processes can share memory, such as the page cache or shared memory, so simply adding up the numbers from `ps` or `top` would count that memory more than once.
In this case, I recommend starting with the /proc/&lt;pid&gt;/smaps file. I have not explained the meaning of each metric in /proc/&lt;pid&gt;/smaps in the previous content, so you will need to consult the proc file system documentation and interpret it to answer this question.
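As a starting-point sketch rather than a full answer: the Pss ("proportional set size") lines in smaps divide each shared page evenly among the processes that map it, so summing Pss across all processes avoids the double counting mentioned above. Reading other processes' smaps generally requires root; unreadable files are silently skipped here.

```shell
# Sum the Pss of every readable process; grep -s suppresses errors
# for smaps files we are not allowed to read, -h drops filenames
grep -sh '^Pss:' /proc/[0-9]*/smaps |
  awk '{ total += $2 } END { printf "Total Pss: %d kB\n", total }'
```

Run as root, this gives a reasonable estimate of total physical memory used by processes; run as an ordinary user, it only covers your own processes.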
Please feel free to discuss with me in the comments section, and you are also welcome to share this article with your colleagues and friends. Let’s practice in real scenarios and progress through communication.