07 Case Analysis: The Treasures of Omnipresent Caching in High-Concurrency Systems #

In the previous lesson, we introduced “buffering”. In this lesson, I will introduce its twin brother: “caching”.

Similar to buffering, caching is perhaps the most widely used optimization technique in software. For example, CPUs have multiple levels of cache, and to smooth out the speed difference between memory and slower storage, caching frameworks such as Redis keep emerging.

Caching delivers excellent optimization results: it can make pages that used to load slowly open instantly, and it can relieve immense pressure on databases.

Caching, essentially, is used to coordinate components whose speeds differ greatly. As shown in the diagram below, by adding an intermediate layer, frequently used data can be stored in a relatively high-speed device.

Drawing 1.png

In our usual application development, caching is generally divided into in-process caching and out-of-process caching based on the physical location of the cache.

In this lesson, we will mainly focus on in-process caching. In Java, in-process caching refers to what we commonly call heap caching. Spring’s default cache support covers implementations such as Ehcache, JCache, Caffeine, and Guava Cache.

Guava’s LoadingCache #

Guava is a commonly used utility package. Its LoadingCache (abbreviated as LC below) is a very handy heap caching tool. By learning the structure of LC, we can understand the general ideas of heap caching design.

Caching is generally an expensive component with limited capacity. Setting the capacity too small or too large will both affect the caching performance:

  • If the cache space is too small, frequently hit elements will keep being evicted, defeating the purpose of caching.
  • If the cache space is too large, it not only wastes valuable caching resources but also puts extra pressure on garbage collection.

Through Maven, you can import Guava’s jar package:

<dependency> 
    <groupId>com.google.guava</groupId> 
    <artifactId>guava</artifactId> 
    <version>29.0-jre</version> 
</dependency>

Now, let’s introduce the commonly used operations of LC:

Drawing 3.png

1. Cache Initialization #

First, we can set the size of LC using the following parameters. Generally, we only need to provide an upper limit for the cache.

  • maximumSize: This parameter sets the maximum capacity of the cache pool. Once this limit is reached, existing entries will be evicted to make room.
  • initialCapacity: The default value is 16, indicating the initial size.
  • concurrencyLevel: The default value is 4, which, in combination with the initial size, indicates that the cache memory will be divided into 4 segments to support high concurrent access.
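Putting these three parameters together, a minimal initialization sketch might look like the following (the concrete numbers and the loader body are placeholder assumptions):

LoadingCache<String, String> cache = CacheBuilder.newBuilder()
        .initialCapacity(16)      // initial table size
        .concurrencyLevel(4)      // number of internal segments for concurrent access
        .maximumSize(1_000)       // capacity limit; entries are evicted once it is reached
        .build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                return "value-of-" + key;  // placeholder loader
            }
        });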

2. Cache Operations #

So how are the cached data added? There are two modes:

  • Manually handle it using the put method. For example, I query a User object from the database and then manually put it into the cache using code.
  • Load automatically by providing an implementation of CacheLoader (this is also where the “Loading” in the name comes from). The object is then loaded lazily the first time it is requested.
public static void main(String[] args) throws Exception {
    LoadingCache<String, String> lc = CacheBuilder
            .newBuilder()
            .build(new CacheLoader<String, String>() {
                @Override
                public String load(String key) throws Exception {
                    return slowMethod(key);   // slowMethod simulates a slow data source
                }
            });
    System.out.println(lc.get("a"));   // first call loads from the source; later calls hit the cache
}

The code above is an example of the automatic loading mode. You use the get method to retrieve the cached value. For example, when we execute lc.get("a"), the first call is slower because the data has to be fetched from the data source; the second call returns instantly, indicating a cache hit. The specific sequence can be seen in the following figure.

Drawing 5.png

In addition to the built-in eviction strategy of LC, we can also manually delete an element using the invalidate method. Of course, these deletion operations on data can also be monitored by setting a listener, as shown in the following code:

.removalListener(notification -> System.out.println(notification))
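As a rough sketch of how these pieces fit together, the manual put mode, manual invalidation, and a removal listener could be combined like this (the key and value are placeholders):

RemovalListener<String, String> listener = notification ->
        System.out.println(notification.getKey() + " removed: " + notification.getCause());

Cache<String, String> cache = CacheBuilder.newBuilder()
        .maximumSize(100)
        .removalListener(listener)
        .build();

cache.put("a", "1");       // manual mode: we query the data ourselves and put it in
cache.invalidate("a");     // manual deletion; the removal listener above is notified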

3. Eviction Strategies #

The size of the cache is limited. What should we do when it is full? This requires an eviction strategy. Next, I will introduce three eviction strategies.

(1) The first eviction strategy is based on capacity.

This is easy to understand: when the cache is full, existing entries are evicted according to an LRU-style algorithm.

(2) The second eviction strategy is based on time.

  • One way is to use the expireAfterWrite method to set a time for the data to expire after it is written.
  • Another way is to use the expireAfterAccess method, which expires an entry a fixed amount of time after it was last accessed (read or written).

(3) The third eviction strategy is based on JVM’s garbage collection.

We all know that there are four levels of object references: strong, soft, weak, and phantom. You can choose the reference level for keys or values using functions such as weakKeys. When the JVM performs garbage collection, these entries will be cleaned up automatically.
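As a rough sketch, the three kinds of eviction strategies map onto builder methods like this (the limits and durations are arbitrary placeholders, and in practice you would pick only the ones you need):

Cache<String, String> cache = CacheBuilder.newBuilder()
        .maximumSize(1_000)                        // (1) capacity based: LRU-style eviction
        .expireAfterWrite(10, TimeUnit.MINUTES)    // (2) time based: expire 10 minutes after write
        .expireAfterAccess(5, TimeUnit.MINUTES)    //     or expire 5 minutes after the last access
        .softValues()                              // (3) GC based: values held via soft references
        .build();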

Regarding the third eviction strategy, there is a frequently asked interview question: What will happen if you set both weakKeys and weakValues functions at the same time in LC?

Answer: If both functions are set, an entry becomes eligible for removal as soon as either its key or its value no longer has any strong references elsewhere; the whole cache entry is then cleaned up on garbage collection. These two functions are often misunderstood.

4. Memory Issues Caused by Caches #

LC can monitor the loading and hit rate of the cache using the recordStats function.
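For example, a small sketch of reading those statistics (stats collection adds a little overhead, so it has to be enabled explicitly):

LoadingCache<String, String> lc = CacheBuilder.newBuilder()
        .maximumSize(1_000)
        .recordStats()                 // enable hit/miss/load statistics
        .build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                return slowMethod(key);   // the slow data source from the earlier example
            }
        });

lc.getUnchecked("a");
lc.getUnchecked("a");
CacheStats stats = lc.stats();
System.out.println(stats.hitRate());     // 0.5 here: one miss (which triggered a load) and one hit
System.out.println(stats.loadCount());   // number of times the loader was invoked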

It is worth noting that LC is based on the number of data entries rather than the physical size of the cache. Therefore, if the objects you cache are particularly large, it will cause unpredictable memory usage.

Around this point, I would like to share a common memory issue caused by incorrect use of caches.

Many heap caches hold their entries through weak or soft references so that, when memory runs low, the cache space can be reclaimed first to make room for other objects. The intention behind this approach is good, but it can lead to problems.

When your cache is heavily used and has a large amount of data, the cache can occupy a large amount of memory. If garbage collection (GC) occurs at this time, the cache space will be released but quickly filled again, triggering another round of garbage collection. This back and forth can consume a large amount of CPU resources, making the cache lose its meaning.

Therefore, in this case, setting the cache smaller to lighten the burden on the JVM is a good solution.

Cache Algorithms #

1. Introduction to Algorithms #

The most commonly used algorithms for heap-based caches are FIFO, LRU, and LFU.

  • FIFO

This is a First-In-First-Out pattern: when the cache capacity is full, the earliest added element is removed. It is simple to implement, but few access patterns actually fit FIFO semantics, so it is rarely used.

  • LRU

LRU stands for Least Recently Used. When the cache capacity is reached, the least recently used data is evicted first. LRU is currently the most commonly used caching algorithm, and we will implement a simple version with the Java API later in this lesson.

  • LFU

LFU stands for Least Frequently Used. Compared to the time dimension of LRU, LFU adds a dimension of access frequency. If the cache is full, it will prioritize removing the element with the least access frequency; and when there are multiple elements with the same access frequency, it will prioritize removing the least recently used element.

2. Implementing an LRU algorithm #

There are multiple ways to implement an LRU algorithm in Java; the most common one is based on LinkedHashMap, which is also a frequently asked interview topic worth paying attention to.

First, let’s take a look at the constructor of LinkedHashMap:

public LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder)

The accessOrder parameter is key to implementing LRU. When the value of accessOrder is true, the objects will be sorted based on their access order; when the value of accessOrder is false, the objects will be sorted based on their insertion order. As mentioned above, sorting based on access order is essentially LRU.
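A tiny demonstration of the difference (a sketch with placeholder keys; the iteration order is what makes LRU possible):

LinkedHashMap<String, String> map = new LinkedHashMap<>(16, 0.75f, true);  // accessOrder = true
map.put("a", "1");
map.put("b", "2");
map.put("c", "3");
map.get("a");                       // accessing "a" moves it to the end of the iteration order
System.out.println(map.keySet());   // prints [b, c, a]; with accessOrder = false it would be [a, b, c]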

Drawing 7.png

As shown in the above diagram, following the general design of caching, similar to LC, when you add a new object to LinkedHashMap, the removeEldestEntry method will be called. By default, this method returns false, indicating that it will never expire. We only need to override this method and return true when the capacity is exceeded, triggering the removal action. The key code is as follows:

public class LRU<K, V> extends LinkedHashMap<K, V> {
    int capacity;
    public LRU(int capacity) {
        // accessOrder = true: entries are ordered by access, which is the basis of LRU
        super(16, 0.75f, true);
        this.capacity = capacity;
    }
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the eldest (least recently used) entry once the capacity is exceeded
        return size() > capacity;
    }
}

Compared to LC, this code implements much simpler functionality and is not even thread-safe, but it embodies the general idea of cache design and is the simplest way to implement LRU in Java.
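A quick usage sketch of the class above (keys and values are placeholders):

LRU<String, String> lru = new LRU<>(2);   // capacity of two entries
lru.put("a", "1");
lru.put("b", "2");
lru.get("a");                             // touch "a" so that "b" becomes the eldest entry
lru.put("c", "3");                        // exceeds the capacity, so "b" is evicted
System.out.println(lru.keySet());         // prints [a, c]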

Further Acceleration #

In the Linux system, the free command can be used to see the usage status of system memory. Among them, there is an area called cached, which occupies a large amount of memory space.

Drawing 8.png

As shown in the figure, this area actually stores the operating system’s file cache, so when the application needs it again, it doesn’t have to go through the disk again but can be quickly loaded from the memory.

The operating system does even more for file-read caching. Disks are good at sequential reads and writes but inefficient at random access, so the operating system uses an intelligent readahead algorithm to load data from disk into the cache ahead of time.

The readahead algorithm has three key points:

  • Predictive: It predicts, based on the application’s access pattern, which data will be requested next.
  • Ahead: It loads that data into the cache in advance to ensure a high hit rate.
  • Batch: It merges small, frequent read operations into sequential batch reads to improve performance.

Preloading techniques are generally intelligent enough to cover most subsequent read operations. As an extreme example, if the data set is relatively small and accessed very frequently, we can replace lazy loading with full loading: load all the data into the cache at system startup, as sketched below.
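A sketch of that full-loading approach, where User and loadAllUsersFromDb are hypothetical names for the cached type and the bulk query that reads the whole, small data set:

Cache<String, User> userCache = CacheBuilder.newBuilder().build();

// Warm the cache once at system startup instead of loading entries lazily.
Map<String, User> allUsers = loadAllUsersFromDb();   // hypothetical bulk query
userCache.putAll(allUsers);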

General Ideas for Cache Optimization #

Generally, caching mainly targets read operations. When your functionality encounters the following scenarios, you can choose to use a caching component for performance optimization:

  • There is hot spot data that is frequently used and can be cached.
  • Read operations are significantly more frequent than write operations.
  • There is a significant performance difference between downstream functionalities, and the downstream service has limited capacity.
  • After adding the cache, it will not affect the correctness of the program or introduce unpredictable complexity.

Cache components are similar to buffering. They are introduced as an intermediate layer when the speed of two components does not match, but they serve different purposes:

  • Buffering: The data is generally used only once and the flush operation is performed when the buffer is full.
  • Caching: After the data is loaded, it can be used multiple times and the data will be shared multiple times.

The most important indicator of a cache is the hit rate, which is affected by the following factors.

(1) Cache Capacity

The capacity of the cache is always limited, so cold data has to be evicted at some point. However, a bigger cache is not always better: it should not visibly eat into the memory needed by the business itself.

(2) Data Set Type

If the cached data is non-hot spot data or cold data that is no longer used after a few operations, the hit rate will be low, and the cache will lose its function.

(3) Cache Invalidation Strategy

The cache algorithm also affects the hit rate and performance. The most efficient algorithm at present is W-TinyLFU, used by Caffeine; it achieves a very high hit rate with a smaller memory footprint. The latest versions of spring-cache already support Caffeine by default.
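If you choose Caffeine, its builder API is very similar to Guava’s. A minimal sketch (sizes and durations are placeholders; Cache here is Caffeine’s own Cache type, not Guava’s):

Cache<String, String> cache = Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(10, TimeUnit.MINUTES)
        .recordStats()                     // expose hit rate and other metrics, as with Guava
        .build();

cache.put("a", "1");
System.out.println(cache.stats().hitRate());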

The following diagram shows the performance of this algorithm, and you can find JMH test code from the official GitHub repository.

Drawing 9.png

It is recommended to use Guava Cache or Caffeine as the heap cache solution, and then adjust the size and content of the cache through a series of monitoring indicators provided by them. Generally speaking:

  • When the cache hit rate exceeds 50%, its effectiveness starts to become significant.
  • If the cache hit rate is below 10%, it is necessary to consider the necessity of the caching component.

The introduction of caching components can significantly improve system performance, but it can also introduce new problems. One of the most typical and frequently asked interview questions is: How to ensure the synchronization between the cache and the source data? We will explain this in the next lesson.

Summary #

Finally, let’s summarize the key points of this lesson.

We first used Guava’s LoadingCache as an example to explain some design ideas behind in-process caching, and we also looked at a memory problem caused by improper use of a cache, which is a frequent interview topic. Then we explained three commonly used caching algorithms: LRU, LFU, and FIFO, and implemented the simplest possible LRU cache based on LinkedHashMap.

This lesson also mentioned techniques such as preloading and readahead that accelerate applications even further; they are widely used in operating systems and databases and provide significant performance improvements.

Finally, we mentioned that you can tune the cache by using the monitoring data provided by the cache framework. A hit rate of 50% or more is generally considered a good result.

Next, let’s briefly discuss two examples of cache application.

  • The first one is the HTTP 304 status code, which means “Not Modified”. The browser sends a conditional request with an If-Modified-Since header; if the server determines the resource has not changed, it returns 304, and the browser uses its locally cached copy instead of downloading the file again.
  • The other one is about CDN, which is a kind of disguised cache. Users read file content from the nearest and fastest node. If the node does not cache the file, the CDN node will retrieve one from the origin server, so that next time when there is a request for reading the same content, it can be returned quickly.

Cache applications are very common, and in your everyday work, you can also try to summarize and make analogies.