43 How to Use Cache to Optimize System Performance #

Hello, I’m Liu Chao.

Caching is an essential technology for improving system performance, whether it is in the frontend or the backend. Using caching in the frontend can reduce the pressure of multiple requests to the server, while using caching in the backend can reduce the workload on the database and improve the performance of data retrieval.

Today, we will explore the implementation of caching at various levels, from the frontend to the server, and understand the advantages, disadvantages, and application scenarios of different types of caching.

Frontend Caching Techniques #

As a Java developer, you may wonder if it is necessary to understand frontend technologies.

As the saying goes, a soldier who doesn’t aspire to be a general is not a good soldier; similarly, as a technical professional it pays to understand frontend technologies, because it helps us design and optimize the overall system. Caching in the frontend can lighten the server’s load, reduce bandwidth consumption, and improve query performance on the frontend.

1. Local Cache #

When capturing traffic with a tool such as Fiddler, or debugging in the browser’s developer tools, you will often come across responses with a 304 status code and the text “Not Modified”, as shown in the image below, taken from the Geektime web homepage.

[Image: 304 Not Modified responses captured on the Geektime homepage]

Without understanding frontend caching techniques, you may find this confusing. One commonly used caching mechanism in browsers is based on the 304 response status, which is known as conditional caching.

Conditional caching, as the name suggests, involves negotiating with the server to determine whether to use local cache.

Generally, conditional caching can be implemented based on the If-Modified-Since field in the request header and the Last-Modified field in the response header. It can also be implemented based on the If-None-Match field in the request header and the ETag field in the response header.

Both implementation methods work similarly. The former is based on time, while the latter is based on a unique identifier. The latter is more accurate in determining whether the file content has been modified, and avoids the unreliability that clock skew or manual time changes can introduce. Let’s walk through the entire caching process (a minimal server-side sketch follows the list):

  • When the browser requests a server resource for the first time, the server will add a unique ETag identifier to the response header along with the resource being returned. This unique identifier is generated based on the current request resource.
  • When the browser requests the same resource from the server again, it will add the If-None-Match field to the request header, and the value of this field is the ETag identifier obtained from the response header.
  • Upon receiving the request, the server compares the value in the If-None-Match field with the unique identifier generated for the current request resource. If they are the same, the server returns a 304 Not Modified status code. If they are different, the server adds a new ETag identifier to the response header and returns the resource.
  • If the browser receives a 304 response status code, it will load the resource from the local cache; otherwise, it will update the resource.
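To make this flow concrete, here is a minimal server-side sketch of ETag-based conditional caching written against the Servlet API; the way the ETag is derived (a hash of the resource bytes) and the loadResource() helper are illustrative assumptions, not a prescribed implementation.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ConditionalCacheServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        byte[] body = loadResource(); // hypothetical helper that reads the requested resource
        // Derive a fingerprint of the current content to serve as the ETag (illustrative only).
        String eTag = "\"" + Integer.toHexString(Arrays.hashCode(body)) + "\"";

        // Compare the client's If-None-Match value with the ETag of the current content.
        String ifNoneMatch = req.getHeader("If-None-Match");
        if (eTag.equals(ifNoneMatch)) {
            resp.setStatus(HttpServletResponse.SC_NOT_MODIFIED); // 304: browser uses its local copy
            return;
        }

        // First request, or content changed: return the resource with a fresh ETag.
        resp.setHeader("ETag", eTag);
        resp.getOutputStream().write(body);
    }

    private byte[] loadResource() {
        return "hello".getBytes(StandardCharsets.UTF_8); // placeholder content
    }
}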

In addition to conditional caching, the browser’s local cache also supports another mechanism known as strong caching.

Strong caching refers to directly using the browser’s local cache as long as it is determined that the cache has not expired. In the image below, the response is given a 200 status code, but the “size” field indicates “memory cache”.

[Image: a 200 response whose size field shows “memory cache”]

Strong caching is implemented using two HTTP response headers: Expires and Cache-Control. These headers are used to indicate the expiration period for caching on the client side.

Expires specifies an absolute time, while Cache-Control specifies a relative time, that is, an expiration period. Because Expires depends on the client’s clock, strong caching based on it can run into the same time-related reliability problems as conditional caching based on Last-Modified, so I recommend implementing strong caching with Cache-Control. The specific process is as follows (see the sketch after the list):

  • When the browser requests a server resource for the first time, the server will add the Cache-Control header to the response header, which specifies the expiration period for caching.
  • When the browser requests the same resource from the server again, it first compares the elapsed time against the expiration period specified in the Cache-Control header to determine whether the cached copy has expired. If it has not, the resource is served from the local cache; otherwise, a request is sent to the server.
  • Upon receiving the request again, the server updates the Cache-Control header in the response.
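As a minimal sketch (again assuming the Servlet API), the server side of strong caching mostly comes down to attaching the Cache-Control header; the one-hour max-age below is just an example value.

import javax.servlet.http.HttpServletResponse;

public class StrongCacheHeaders {
    // Attach a relative expiration so the browser can answer repeat requests
    // from its local cache for the next 3600 seconds (value is illustrative).
    public static void addCacheHeaders(HttpServletResponse resp) {
        resp.setHeader("Cache-Control", "max-age=3600");
        // An Expires header (absolute time) could also be set, but Cache-Control takes precedence.
    }
}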

2. Gateway Cache #

In addition to local caching, we can also set up caching in the gateway, which is commonly known as a CDN.

CDN caching involves caching resource copies in cache nodes located in different locations. When a user requests a particular resource, the nearest CDN node is called to return the requested resource. This caching mechanism is commonly used for caching video resources.

Service Layer Caching Techniques #

Frontend caching is generally used for constant data or resource files that are rarely modified. Data returned by backend APIs, by contrast, is mostly cached on the server side, where the cached data is easier to manage.

The purpose of server-side caching is to improve system performance. For example, if a database is under heavy concurrent query pressure, caching can be used to relieve it. In back-office management systems, some data, such as report results, requires extensive computation on every request and consumes CPU resources; in that case we can cache the computed results.

Server-side caching can be divided into process (in-process) caching and distributed caching. In Java, process caching lives in the JVM heap, and common implementations include container classes such as ArrayList and ConcurrentHashMap; distributed caching is typically built on Redis.

1. Process Caching #

With process caching, reads and writes are more efficient, but JVM heap memory is limited, and it is difficult to keep cache updates in sync across services in a distributed environment. We therefore generally use it for small volumes of data that are updated infrequently. Common implementations look like this:

// Static constant
public static final String url = "https://time.geekbang.org";
// List container
public static List<String> cacheList = new Vector<>();
// Map container
private static final Map<String, Object> cacheMap = new ConcurrentHashMap<>();

In addition to the built-in container classes in Java, we can also use Guava Cache, a memory caching component provided by Google, to implement process caching.

Guava Cache is suitable for high-concurrency multi-threaded caching. Similar to ConcurrentHashMap, it is implemented based on segmented locks for concurrent caching.

Guava Cache also implements data eviction. When we set a maximum size for the cache, if the stored data exceeds the maximum, it will evict data using the LRU (Least Recently Used) algorithm. The following code provides an example of Guava Cache implementation:

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class GuavaCacheDemo {
    public static void main(String[] args) {
        // Cap the cache at two entries; storing a third triggers eviction.
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .maximumSize(2)
                .build();
        cache.put("key1", "value1");
        cache.put("key2", "value2");
        cache.put("key3", "value3"); // exceeds maximumSize, so an earlier entry is evicted
        System.out.println("Value of the first key: " + cache.getIfPresent("key1"));
        System.out.println("Value of the second key: " + cache.getIfPresent("key2"));
        System.out.println("Value of the third key: " + cache.getIfPresent("key3"));
    }
}

Output:

Value of the first key: null
Value of the second key: value2
Value of the third key: value3

But what if we have a large amount of data that is frequently updated, and we want to use JVM heap memory as the cache in a distributed deployment? In this case, how should we implement it?

Ehcache is a good choice. Ehcache is an open-source Java caching library (released under the Apache 2.0 license) that stores data in JVM heap memory. It is often used with Hibernate, primarily as a second-level cache for query results, and it supports multiple eviction policies, disk persistence, and a distributed caching mechanism.
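As a rough sketch of using Ehcache directly (this assumes the Ehcache 2.x API; the cache name, sizes, and TTL values are illustrative):

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EhcacheDemo {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.newInstance();
        // Up to 1000 in-heap elements, no disk overflow, not eternal, 60s time-to-live, 30s time-to-idle.
        Cache userCache = new Cache("userCache", 1000, false, false, 60, 30);
        manager.addCache(userCache);

        userCache.put(new Element("userId:1001", "Liu Chao"));
        Element hit = userCache.get("userId:1001");
        System.out.println(hit == null ? null : hit.getObjectValue());

        manager.shutdown();
    }
}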

2. Distributed Cache #

Because high-concurrency scenarios often come with strict data-consistency requirements, I generally do not recommend Ehcache for data that must stay consistent. For distributed caching, I recommend Redis. Redis is an in-memory database with extremely high query performance, capable of well over 100,000 reads per second, thanks to its pure in-memory operation and single-threaded command execution.

In addition to its high performance, Redis also supports different types of data structures, including strings, lists, sets, hashes, etc. It also supports data eviction policies, data persistence, and transactions.
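As a brief sketch of what caching with these data structures looks like through the Jedis client (the host, port, key names, and TTL are illustrative assumptions):

import redis.clients.jedis.Jedis;

public class RedisCacheDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // String: cache a single value with a 60-second expiration.
            jedis.set("product:1001:name", "T-shirt");
            jedis.expire("product:1001:name", 60);

            // Hash: cache an object's fields under one key.
            jedis.hset("product:1001", "price", "9.90");
            jedis.hset("product:1001", "stock", "200");

            System.out.println(jedis.get("product:1001:name"));
            System.out.println(jedis.hgetAll("product:1001"));
        }
    }
}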

Now that we have discussed the two types of caching, let’s take a look at the potential issues that may arise.

Database and Cache Consistency Issues #

When querying cached data, we first read from the cache. If the data is not found in the cache, we then query the database and store the retrieved data in the cache.
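A minimal sketch of this read path (often called cache-aside), assuming Redis via the Jedis client; queryDatabase() stands in for a real DAO call, and the key format and TTL are assumptions:

import redis.clients.jedis.Jedis;

public class CacheAsideReader {
    private final Jedis jedis = new Jedis("localhost", 6379);

    public String getUserName(long userId) {
        String key = "user:name:" + userId;
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;                      // cache hit
        }
        String fromDb = queryDatabase(userId);  // cache miss: load from the database
        if (fromDb != null) {
            jedis.setex(key, 300, fromDb);      // backfill the cache with a 5-minute TTL (illustrative)
        }
        return fromDb;
    }

    private String queryDatabase(long userId) {
        return "user-" + userId;                // placeholder for a real DB query
    }
}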

Once data is cached, if it is later modified or deleted, we have to operate on both the cache and the database (a modification typically also means deleting the cached entry). Because the two operations cannot be made atomic, data inconsistency can arise.

For example, in a concurrent scenario, suppose operation A deletes a piece of data: it first deletes the entry from the cache and then deletes the row from the database. If, before the database delete completes, a query B arrives, it misses the cache, reads the still-present row from the database, and writes it back into the cache. The database row is then deleted, leaving stale data in the cache and the two stores inconsistent.

What if we delete the database first and then the cache?

Let’s give it a try. In a concurrent scenario, when operation A leads to a deletion of data, the operation will first delete the data from the database, and then delete the data from the cache. If the deletion from the cache fails, the data will not be deleted from the cache, while the data in the database has already been deleted. Again, data inconsistency occurs.

Therefore, we still need to perform the cache deletion operation first before completing the database operation. So how can we avoid data inconsistency caused by data update or deletion operations in high concurrency?

The typical solution is to use a thread-safe queue to hold the updated or deleted entries. When operation A modifies a piece of data, it first deletes the cached entry, then puts the entry into the queue in a thread-safe manner, and a separate thread performs the database delete.

When another query request B comes in, if it finds that the data is not in the cache, it will first check the queue to see if the data is being updated or deleted. If the data is in the queue, it will block and wait until operation A successfully completes the database operation, and then it will be awakened to query the data from the database.
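Below is a highly simplified, single-JVM sketch of this coordination. Instead of scanning a literal queue, it tracks pending deletions per key with a latch that waiting readers can block on; deleteFromDb() and queryDb() are placeholders, not real APIs.

import java.util.Map;
import java.util.concurrent.*;

public class DeleteCoordinator {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, CountDownLatch> pending = new ConcurrentHashMap<>();
    private final ExecutorService dbWorker = Executors.newSingleThreadExecutor();

    // Operation A: delete the cache entry first, then delete from the DB asynchronously.
    public void delete(String key) {
        cache.remove(key);
        CountDownLatch latch = new CountDownLatch(1);
        pending.put(key, latch);
        dbWorker.submit(() -> {
            try {
                deleteFromDb(key);
            } finally {
                pending.remove(key);
                latch.countDown();      // wake up readers waiting on this key
            }
        });
    }

    // Operation B: on a cache miss, wait for any pending delete before reading the DB.
    public String get(String key) throws InterruptedException {
        String value = cache.get(key);
        if (value != null) {
            return value;
        }
        CountDownLatch latch = pending.get(key);
        if (latch != null) {
            latch.await(200, TimeUnit.MILLISECONDS);   // bounded wait to avoid blocking readers for long
        }
        value = queryDb(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    private void deleteFromDb(String key) { /* placeholder DB delete */ }
    private String queryDb(String key) { return null; /* placeholder DB query */ }
}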

However, this implementation also has many flaws, such as the possibility of long reading request blocking and low throughput in high concurrency scenarios. Therefore, when considering caching, I usually do not recommend using it if the data is updated frequently and has certain consistency requirements.

Cache Penetration, Cache Breakdown, Cache Avalanche #

In addition to data inconsistency, a distributed cache holding large volumes of data also has to contend with cache penetration, cache breakdown, and cache avalanche. When writing caching code, we should consider these problems thoroughly.

Cache penetration refers to a large number of queries that do not hit the cache and directly query the database. If the query volume is large, it will cause a high database query traffic, resulting in pressure on the database.

There are usually two solutions. One is to cache the null result of the first query and give it a relatively short expiration time. However, this approach has a security weakness: if attackers flood the system with a large number of keys that are never cached (because they do not exist), caching all those null values can fill up and overflow the cache memory.

The other solution is a Bloom filter. A Bloom filter answers membership queries with one of two results: “possibly exists” or “definitely does not exist”, which makes it well suited to defending against deliberate cache-penetration attacks. When data is cached for the first time, its key is also recorded in the Bloom filter’s bit array. When a key is queried, if the filter says it definitely does not exist, we return null directly; if it possibly exists, we check the cache, and only on a cache miss do we query the database.
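Guava, already used above for process caching, also ships a BloomFilter class; here is a minimal sketch of the check, with illustrative capacity and false-positive rate:

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class BloomFilterGuard {
    // Sized for about one million keys with a 1% false-positive rate (illustrative values).
    private final BloomFilter<String> keys =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    // Record a key when its data is cached for the first time.
    public void register(String key) {
        keys.put(key);
    }

    // false -> definitely does not exist: return null immediately, without touching cache or DB.
    // true  -> possibly exists: go on to check the cache, then the database.
    public boolean mayExist(String key) {
        return keys.mightContain(key);
    }
}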

The implementation principle of a Bloom filter is similar to Redis’s BitMap. First, initialize a bit array of length m with every bit set to 0. To insert an element, compute n values with n hash functions, treat them as positions in the array, and set those positions to 1.
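A minimal, hand-rolled illustration of that principle (two toy hash functions derived from hashCode with different seeds; real implementations use stronger hash functions):

import java.util.BitSet;

public class SimpleBloomFilter {
    private final int m;
    private final BitSet bits;
    private final int[] seeds = {7, 31};   // one seed per hash function (n = 2 here)

    public SimpleBloomFilter(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    public void add(String key) {
        for (int seed : seeds) {
            bits.set(position(key, seed));  // set each hashed position to 1
        }
    }

    // false -> definitely does not exist; true -> possibly exists.
    public boolean mightContain(String key) {
        for (int seed : seeds) {
            if (!bits.get(position(key, seed))) {
                return false;
            }
        }
        return true;
    }

    private int position(String key, int seed) {
        return Math.floorMod(seed * key.hashCode(), m);  // map the hash into the m-bit array
    }
}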

Let’s assume we insert two elements, 20 and 28. After hashing each with two hash functions and taking the result modulo m, we get positions 4 and 9 for 20, and 14 and 19 for 28, so bits 4, 9, 14, and 19 are all set to 1.

[Image: bit array with positions 4, 9, 14, and 19 set to 1]

So why does Bloom Filter return “possibly exists” and “definitely does not exist”?

Suppose we look up element 25, and the hash functions map it to positions 1, 9, and 14. Since position 1 is still 0, the element definitely does not exist in the bit array. But when we look up element 21, whose hash functions map it to positions 9 and 14, both bits are 1, so the filter returns “possibly exists” even though 21 was never inserted.

A Bloom filter does not allow elements to be deleted. Why is that? Suppose elements 20, 25, and 28 have all been inserted, occupying positions 4, 9; 1, 9, 14; and 14, 19 respectively. To delete element 25 we would have to reset positions 1, 9, and 14 to 0, but position 9 is shared with element 20 and position 14 with element 28, so deleting 25 would corrupt their entries as well.

This is why deletion is disallowed; as a consequence, elements whose underlying data has been removed will still be reported as “possibly exists”, which gradually reduces the accuracy of the filter’s answers. The usual remedy is to rebuild a fresh bit array.

So what is cache breakdown? In high-concurrency situations, when many requests query the same key at the same moment and that key happens to be unavailable (its expiration time has been reached, or the cache service is down), all of those requests fall through to the database. This typically happens with hot keys. The usual remedy is a mutex (exclusive lock), so that the database is queried in an orderly fashion and the concurrent pressure on it is reduced.
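A hedged sketch of the mutex approach within a single JVM follows; queryDatabase(), the key format, and the TTL are assumptions, and across multiple service instances a distributed lock (for example one built on Redis SETNX) would be needed instead.

import java.util.concurrent.locks.ReentrantLock;
import redis.clients.jedis.Jedis;

public class HotKeyLoader {
    private final Jedis jedis = new Jedis("localhost", 6379);
    private final ReentrantLock rebuildLock = new ReentrantLock();

    public String get(String key) {
        String value = jedis.get(key);
        if (value != null) {
            return value;
        }
        rebuildLock.lock();                       // serialize database access for the missing hot key
        try {
            value = jedis.get(key);               // double-check: another thread may have rebuilt it
            if (value == null) {
                value = queryDatabase(key);
                jedis.setex(key, 300, value);     // restore the cache (TTL is illustrative)
            }
            return value;
        } finally {
            rebuildLock.unlock();
        }
    }

    private String queryDatabase(String key) {
        return "value-of-" + key;                 // placeholder DB query
    }
}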

Cache avalanche is similar to cache breakdown; the difference lies in the scale of the failure. An avalanche is a large-scale cache failure, for example when a large number of keys expire at the same moment or when the cache service goes down. For many keys expiring at once, we can disperse the expiration times; for a cache service outage, we can deploy the cache service as a distributed cluster.
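Dispersing expiration times can be as simple as adding random jitter to the TTL when writing a key; a small sketch, with an illustrative base TTL and jitter range:

import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class JitteredTtlWriter {
    private static final int BASE_TTL_SECONDS = 600;    // illustrative base expiration
    private static final int MAX_JITTER_SECONDS = 120;  // illustrative jitter range

    // Keys written at the same moment get slightly different TTLs, so they do not all expire at once.
    public static void cache(Jedis jedis, String key, String value) {
        int ttl = BASE_TTL_SECONDS + ThreadLocalRandom.current().nextInt(MAX_JITTER_SECONDS);
        jedis.setex(key, ttl, value);
    }
}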

Summary #

From front-end to back-end, we can cache some infrequently changing data to improve query efficiency and reduce the pressure on the back-end. For the front-end, some static resource files are cached in the browser. In addition to static resource files, we can also cache some constant data, such as product information.

Server-side caching includes using JVM heap memory as a cache and implementing distributed caching with Redis. If the data is not frequently modified, the data volume is small, and there is no strict consistency requirement for the cached data, we can use heap memory to cache the data, which is simple to implement and efficient for querying. If the data volume is relatively large, it is frequently modified, or there is a strict consistency requirement for the cached data, we can use distributed caching to store the data.

When using back-end caching, we should pay attention to the inconsistency between the database and the cached data caused by data modification. If there is a very strict consistency requirement for the cached and database data, I do not recommend using caching.

At the same time, we should take precautions for interfaces with a large number of cached requests to prevent problems such as cache penetration, cache breakdown, and cache avalanche.

Thought Question #

In a distributed cache implemented based on Redis, why do we recommend deleting the data in the cache directly when updating it, instead of updating the data in the cache?