12 Caching: The Database Has Become a Bottleneck - How to Accelerate Queries of Dynamic Data #

Hello, I’m Tang Yang.

Through the previous section on databases, you have learned how the database layer evolves and what to consider when splitting databases and tables under high concurrency and heavy traffic. After applying master-slave separation and sharding to your vertical e-commerce system, it can now support tens of thousands of DAU, and the overall architecture of the system looks like this:

[Image: overall system architecture after master-slave separation and sharding]

Overall, the database is split into a master and slaves, and the data is sharded across multiple database nodes. However, as concurrency rises and the volume of stored data grows, the database's disk I/O gradually becomes the system's bottleneck. We need a faster component for accessing data in order to reduce request response time and improve overall system performance. This is where caching comes in. So what is a cache, and how do we make the most of its advantages?

This lesson is an overview of the caching module. I will walk you through the design ideas behind caching from three angles: what a cache is, the common types of cache, and the advantages and limitations of caching. Then, in the remaining four lessons, I will show you how to use caches correctly in specific scenarios, so that you can apply them to improve overall system performance in your day-to-day work.

Now, let’s move on to today’s lesson!

What Is a Cache #

A cache is a component that stores data so that subsequent requests for that data can be served more quickly.

We usually keep caches in memory, so some people equate memory with caching, but that is a misconception. As practitioners, you should know that in certain scenarios we may also use SSDs as a cache for cold data. For example, 360's open-source Pika stores data on SSD, which works around Redis's capacity bottleneck.

In fact, any structure that bridges the gap between two components that exchange data at very different speeds can be called a cache. So, to design a good solution, we need to know the latency of common hardware components and build an intuitive feel for these delays. Fortunately, people in the industry have already summarized this data, and I have organized it here for you.

[Image: latency figures for common hardware components]

From these numbers you can see that a main-memory access takes about 100 ns, while a disk seek takes about 10 ms, roughly 100,000 times longer. If a memory access were a short break between classes, a disk seek would last an entire college semester. Clearly, using memory as the storage medium for a cache improves performance by several orders of magnitude compared with the disk-backed storage of a database, and it can also sustain higher concurrency. That is why memory is the most common medium for caching data.

Cache, as a common performance optimization technique that trades space for time, is applied in many places. Let’s look at a few examples, which I believe you will be familiar with.

1. Cache Examples #

Linux memory management relies on a hardware component called the Memory Management Unit (MMU) to translate virtual addresses into physical addresses. If this translation had to be computed from scratch on every access, it would undoubtedly hurt performance. Therefore, a component called the Translation Lookaside Buffer (TLB) caches recently translated virtual addresses along with their corresponding physical addresses. The TLB is a cache that stores the results of an expensive computation. It is like cooking a bowl of delicious, fragrant noodles, which can be complicated; but if the cooked noodles are turned into instant noodles, the next serving becomes simple and fast. This cache component is quite low-level, and you only need a basic understanding of it.

Most laptops, desktop computers, and servers have one or more TLB components, which help speed up the address translation process.
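To make the "cache the result of an expensive computation" idea concrete, here is a minimal Java sketch of the same pattern in software; the class, the fake translation, and the map-based cache are purely illustrative stand-ins for what the TLB does in hardware.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TranslationCache {
    private final Map<Long, Long> cache = new ConcurrentHashMap<>();

    // Stand-in for an expensive translation, e.g. a multi-level page-table walk.
    private long slowTranslate(long virtualAddress) {
        return virtualAddress ^ 0xCAFEL; // placeholder computation
    }

    // Cached translation: compute the result once, then reuse it, TLB-style.
    public long translate(long virtualAddress) {
        return cache.computeIfAbsent(virtualAddress, this::slowTranslate);
    }
}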

Think about the short videos you often watch on Douyin (TikTok). The platform plays them with a built-in network player. The player receives the data stream, downloads the data, demultiplexes the audio and video streams, decodes them, and outputs the result to the peripheral devices for playback. If the player only started downloading data when you opened a video, it would take noticeably longer for the video to start (what we call the first-play time), and playback would stutter with buffering. Therefore, the video player usually includes some caching components: before a video is opened, it caches part of the video data. For example, when you open Douyin, the server may return three videos at once; while you watch the first one, the player has already cached part of the data for the second and third. This way, when the user moves to the second video, they get an "instant play" experience.

In addition, the well-known HTTP protocol also has a caching mechanism. When we request a static resource for the first time, such as an image, the server returns the image and also includes an "Etag" field in the response header. The browser caches both the image and the value of this field. The next time it requests the image, it adds an "If-None-Match" field to the request header carrying the cached "Etag" value. The server compares this value with the resource's current Etag to determine whether the image has changed. If it has not, the server returns a 304 (Not Modified) status code, and the browser keeps using its cached copy. This cache-negotiation approach reduces the amount of data transferred over the network, thereby improving page rendering performance.

[Image: HTTP cache negotiation using Etag and If-None-Match]
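As a rough illustration of this negotiation from the client side, here is a minimal sketch using Java's built-in HttpClient (Java 11+); the URL is hypothetical and error handling is omitted, so treat it as a sketch rather than production code.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EtagDemo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        URI imageUri = URI.create("https://example.com/logo.png"); // hypothetical resource

        // First request: the server returns the image body plus an Etag header.
        HttpResponse<byte[]> first = client.send(
                HttpRequest.newBuilder(imageUri).GET().build(),
                HttpResponse.BodyHandlers.ofByteArray());
        byte[] cachedBody = first.body();
        String etag = first.headers().firstValue("ETag").orElse(null);

        // Revalidation: send the cached Etag back in the If-None-Match header.
        if (etag != null) {
            HttpRequest revalidate = HttpRequest.newBuilder(imageUri)
                    .header("If-None-Match", etag).GET().build();
            HttpResponse<byte[]> second =
                    client.send(revalidate, HttpResponse.BodyHandlers.ofByteArray());
            if (second.statusCode() == 304) {
                System.out.println("Not modified, reuse " + cachedBody.length + " cached bytes");
            } else {
                cachedBody = second.body(); // resource changed, replace the cached copy
            }
        }
    }
}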

2. Cache and Buffer #

After discussing so many caching cases, you probably already have an intuitive and vivid understanding of caching. Besides caching, we often hear a similar term in our daily development process - buffer. So, what is a buffer? What is the difference between buffering and caching?

We know that caching can improve the access speed of slow devices or reduce the performance issues caused by complex and time-consuming computations. In theory, we can solve all “slow” issues through caching, such as slow data reading from a disk or slow data querying from a database, but with different storage costs in different scenarios.

A buffer, on the other hand, is an area where data is staged temporarily before being transferred to another device. It is closer in spirit to the message queue covered in the upcoming "Message Queue" section, and it compensates for the speed difference between a fast device and a slow one during communication. For example, when data is written to disk, it is not written directly; it first goes into a buffer, and the kernel marks that buffer as dirty. After a certain amount of time, or when the proportion of dirty pages reaches a threshold, a separate kernel thread flushes the dirty pages to disk. This avoids the performance cost of hitting the disk on every single write.

[Image: writes staged in a buffer and flushed to disk asynchronously]
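The same buffering idea also exists in user space. As a minimal Java sketch (the file name is arbitrary), BufferedOutputStream collects small writes in an in-memory buffer and only passes them to the underlying file in larger chunks:

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class BufferDemo {
    public static void main(String[] args) throws IOException {
        // Each write() lands in an 8 KB in-memory buffer instead of triggering a
        // write to the file immediately; the buffer is written through when it
        // fills up or when the stream is closed.
        try (BufferedOutputStream out =
                     new BufferedOutputStream(new FileOutputStream("access.log"))) {
            for (int i = 0; i < 100_000; i++) {
                out.write(("log line " + i + "\n").getBytes(StandardCharsets.UTF_8));
            }
        } // close() flushes any remaining buffered bytes
    }
}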

This is the difference between buffering and caching. From this perspective, the naming of TLB mentioned earlier is incorrect - it should be called caching instead of buffering.

Now that you understand the meaning of caching, what are the commonly used caches and how can we maximize their advantages?

Cache Categories #

In our daily development, the common types of cache mainly include static cache, distributed cache, and hot spot local cache.

Static cache was hugely popular in the Web 1.0 era. It is generally implemented by generating static HTML files, for example from Velocity templates, and deploying them on Nginx to reduce the load on the backend application servers. A typical case is a content management system: editors enter large numbers of articles in the back office, and the frontend displays the article content on the website, much like portal sites such as Sina and NetEase.

Of course, we can also store articles in the database and query the database to obtain data for frontend display. However, this will put a heavy burden on the database. Even if we use distributed cache to handle read requests, it is still not cost-effective for large-scale portal websites with tens of billions of daily page views.

Therefore, our solution is to render each article as a static page when it is entered and place it on all frontend web servers such as Nginx or Squid. This way, when users visit the website, they will prioritize accessing the static pages on the web servers. After implementing certain cache cleaning strategies for older articles, a cache hit rate of over 99% can still be guaranteed.
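As a rough sketch of this "render on publish" idea, the following Java snippet writes an article out as a static HTML file under a web server's document root; the path, the inline template, and the method name are all hypothetical, and a real system would typically use a template engine such as Velocity.

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StaticPageRenderer {
    // Called when an editor publishes or updates an article: regenerate its static page
    // so that Nginx (or Squid) can serve it without touching the application or database.
    public static void render(long articleId, String title, String body) throws Exception {
        String html = "<html><head><title>" + title + "</title></head>"
                + "<body><h1>" + title + "</h1><div>" + body + "</div></body></html>";
        Path target = Path.of("/data/nginx/html/articles/" + articleId + ".html"); // hypothetical doc root
        Files.createDirectories(target.getParent());
        Files.writeString(target, html, StandardCharsets.UTF_8);
    }
}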

This type of cache can only cache static data and is powerless for dynamic requests. So, how do we cache dynamic requests? This is where distributed cache comes in.

Distributed cache hardly needs an introduction: Memcached and Redis, which we are all familiar with, are typical examples. They deliver strong performance, and by forming clusters with distributed solutions they can break through the limits of a single machine. Therefore, distributed cache plays a pivotal role in the overall architecture (in the upcoming lessons, I will cover the usage techniques and high-availability solutions for distributed cache specifically, so that you can apply it skillfully in your work).

For caching static resources, you can choose static cache. For caching dynamic requests, you can choose distributed cache. So, when should we consider using hot spot local cache?

The answer is when we encounter extreme hot spot data queries. Hot spot local cache is mainly deployed in the code of the application server to prevent hot spot queries from putting pressure on distributed cache nodes or databases.

For example, when a celebrity on Weibo becomes a hot topic, the “onlookers” will flock to their Weibo homepage, which will trigger a hot spot query for this user’s information. These queries usually hit a specific cache node or a specific database partition, resulting in a high volume of hot spot queries within a short period of time.

In this case, we will use some local cache solutions in the code, such as HashMap, Guava Cache, or Ehcache, which are deployed in the same process as the application. The advantage is that they do not require cross-network scheduling and are extremely fast. Therefore, they can block short-term hot spot queries. Let’s look at an example.

For example, let’s say your vertical e-commerce system’s homepage has some recommended products, and the information of these products is entered and modified by editors in the background. You analyze that there can be some delay in displaying the newly entered or modified product information on the webpage, such as a 30-second delay. The homepage has the highest number of requests, and even with distributed cache, it is difficult to handle the load. Therefore, you decide to use Guava Cache to cache all the information of recommended products and set it to reload all the latest products from the database every 30 seconds.

First, we initialize Guava's LoadingCache:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Build a LoadingCache capped at maxSize entries that records hit-rate statistics
// and refreshes an entry 30 seconds after it was written.
LoadingCache<String, List<Product>> cache = CacheBuilder.newBuilder()
        .maximumSize(maxSize)                     // set the maximum cache size
        .recordStats()                            // record statistics such as the hit rate
        .refreshAfterWrite(30, TimeUnit.SECONDS)  // set the refresh interval
        .build(new CacheLoader<String, List<Product>>() {
            @Override
            public List<Product> load(String key) throws Exception {
                return productService.loadAll();  // load all recommended products from the database
            }
        });

This way, when you want to fetch all the product information, you call the LoadingCache's get method, which reads from the local cache first. If the entry is not in the local cache, the CacheLoader logic above is invoked to load all products from the database.
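For example, a read might look like this (the cache key is arbitrary here, since the loader ignores it and always loads the full recommendation list):

// Reads hit the local cache first; once the 30-second refresh interval has passed,
// the next read triggers the CacheLoader above to reload the products from the database.
List<Product> recommended = cache.getUnchecked("recommended_products");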

Because the local cache lives on the application servers, and we usually deploy many of them, we cannot easily know which servers' local caches hold a given entry when the data is updated. Updating or deleting the cache on every server is not a practical option, so we usually just let the entries expire. For this reason, the expiration time of such caches is kept very short, typically minutes or even seconds, to avoid returning stale data to the frontend.

Limitations of Caching #

From the content above, it is not hard to see that the main benefit of caching is higher access speed, which in turn allows the system to withstand higher concurrency. So can caching solve every problem? Obviously not. Everything has two sides, and caching is no exception. We need to understand its limitations as well as its advantages in order to put it to the best use.

First, caching is better suited to read-heavy, write-light business scenarios, and ideally the data should have hot-spot characteristics. This is because a cache is limited by its storage medium and cannot hold all of the data; only when the data has hot spots can a reasonable cache hit rate be guaranteed. For example, in scenarios like Weibo or WeChat Moments, 20% of the content accounts for 80% of the traffic. Conversely, once a scenario is write-heavy with few reads, or has no obvious hot spots, such as search, where each person searches for different terms, caching does not help much.

Second, caching will bring complexity to the overall system and there is a risk of data inconsistency. In the scenario where the database is updated successfully but the cache update fails, dirty data will exist in the cache. For such scenarios, we can consider using shorter expiration times or manual clearing to solve the problem.

Third, as mentioned earlier, caches typically use memory as the storage medium, and memory is not unlimited. Therefore, when using a cache, we need to estimate the volume of data to be stored; for data that can be expected to incur a large storage cost, use caching with caution. At the same time, cached entries should be given expiration times so that what stays in the cache is the hot data.

Finally, caching will also bring certain costs to operations and maintenance. Operations and maintenance need to have a certain understanding of the caching component and consider it when troubleshooting.

Although there are many limitations, the performance improvement provided by caching is undeniable. When designing architecture, we also need to consider caching. However, when designing specific solutions, more careful thinking about cache design is required to maximize the advantages of caching.

Summary of the Course #

In this lesson, I introduced you to the definition of caching, common types of caching, and the drawbacks of caching. I would like to emphasize the following key points:

  • Caching can have multiple layers. For example, as mentioned earlier, static caching is located at the load balancing layer, distributed caching is located between the application layer and the database layer, and local caching is located at the application layer. We need to block the requests at higher layers as much as possible because the lower the layer, the poorer the concurrent processing capacity.
  • Cache hit rate is the most important monitoring metric for caching. The hotter the data, the higher the cache hit rate.

Another thing you need to understand is that caching is not just the name of a component, but also a design philosophy. You can think of any component or design solution that can accelerate read requests as an embodiment of caching. This acceleration is usually achieved in two ways:

  • Using faster media, such as the memory mentioned in the course.
  • Caching the results of complex calculations, such as the example of TLB caching address translation.

So, when you encounter a “slow” problem in your actual work, caching should be the first thing you consider.