
02 How to Choose Caching Patterns and Components Based on Business #

Hello, I am your caching instructor, Chen Bo. Welcome to Lesson 2, “Cache Read-Write Modes and Classification”. In this lesson, we will focus on cache read/write modes and the classification of caches.

Cache Read-Write Modes #

There are three modes for reading and writing caches in a business system:

  • Cache Aside
  • Read/Write Through
  • Write Behind Caching

[Figure: overview of the three cache read/write modes]

Cache Aside #

[Figure: the Cache Aside pattern]

As shown in the figure above, in the Cache Aside mode, for write operations the business application updates the database (DB) directly and then deletes the corresponding key from the cache; subsequent cache updates are driven by the DB (for example, lazily on the next read miss, or by a component that consumes the DB change log). For read operations, the cache is checked first; on a miss, the data is read from the DB, written back to the cache, and then returned.

The characteristic of this mode is that the business side handles all data-access details itself. Following the idea of lazy loading, consistency is maintained by updating the DB and simply deleting the cache entry, which greatly reduces the probability of inconsistency between the cache and the DB.

Cache Aside is a good fit when there is no dedicated storage service, when the business has relatively high consistency requirements, or when cache updates are complex. For example, in the early days of Weibo, many services used this mode: the cached data had to be computed from multiple original data sources, so when any of those sources changed, the cache entry was simply deleted. At the same time, a Trigger component read the DB change log in real time, recomputed the data, and updated the cache. If a read arrived before the Trigger had written the cache, the caller would load and compute from the DB itself and then write the result to the cache.
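The Cache Aside read and write paths described above can be sketched as follows. This is a minimal illustration: the two dicts stand in for a real cache (e.g. Memcached or Redis) and a real DB, and all names are hypothetical.

```python
# Minimal Cache Aside sketch. `cache` and `db` are stand-ins for real
# cache and database services; names are illustrative only.
cache = {}
db = {"user:1": "Alice"}

def read(key):
    # Read path: try the cache first; on a miss, load from the DB
    # and write the loaded value back into the cache.
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value

def write(key, value):
    # Write path: update the DB first, then delete the cached key so
    # the next read lazily reloads the fresh value.
    db[key] = value
    cache.pop(key, None)
```

Note that the write path deletes the cache entry rather than updating it; this is the lazy-loading choice that keeps the window for cache/DB inconsistency small.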

Read/Write Through #

[Figure: the Read/Write Through pattern]

As shown in the figure above, in the Cache Aside mode the business application has to manage both the cache and the DB itself, which is cumbersome. The Read/Write Through mode was devised to address this: the business application deals with a single storage service, and that service handles all reads and writes to both the cache and the DB. When the storage service receives a write request, it first checks the cache: if the key is not cached, it updates only the DB; if the key is cached, it updates the cache first and then the DB. When it receives a read request, a cache hit is returned directly; on a miss, the data is loaded from the DB, written back to the cache, and then returned.

The characteristic of this mode is that the storage service encapsulates all data-handling details, so the business application can focus purely on business logic, giving the system better isolation. In addition, on writes, keys that are not already cached are never written into the cache, which improves memory efficiency. The outbox vector of the Weibo feed (i.e., a user's list of latest posts) uses this pattern: when a less active user with few followers posts, the Vector service first queries the Vector Cache; if the user's outbox record is not in the cache, no cache entry is written and only the DB is updated; only if the entry already exists is it updated, via a CAS operation.
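The Read/Write Through flow above can be sketched as a single storage-service class that hides the cache and DB behind one interface. This is a hedged illustration, not a real service: the dicts and class name are hypothetical stand-ins.

```python
# Minimal Read/Write Through sketch: the business application talks only
# to StorageService, which manages both the cache and the DB internally.
class StorageService:
    def __init__(self):
        self.cache = {}  # stand-in for a real cache
        self.db = {}     # stand-in for a real database

    def read(self, key):
        # Cache hit returns directly; a miss loads from the DB and
        # fills the cache before returning.
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Update the cache only if the key is already cached (keys never
        # read are not cached, saving memory), then update the DB.
        if key in self.cache:
            self.cache[key] = value
        self.db[key] = value
```

From the application's point of view there is now only one storage endpoint; the cache/DB coordination logic lives entirely inside the service.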

Write Behind Caching #

[Figure: the Write Behind Caching pattern]

The Write Behind Caching mode resembles Read/Write Through in that the storage service manages all reads and writes to the cache and the DB. The difference is that on a write, only the cache is updated; the DB is updated asynchronously, in batches.

This gives the highest write performance of the three modes, making it especially suitable for workloads with very frequent changes, and above all for those whose writes can be merged. For counting services, for instance, if a feed item is liked 10,000 times, issuing 10,000 separate DB updates is costly, whereas merging them into a single request that adds 10,000 is very lightweight.

The significant drawback is weaker consistency, and in extreme scenarios even data loss: if the system or machine crashes before some data has been persisted to the DB, that data is lost. Write Behind Caching is therefore suited to businesses with particularly high change frequency and relaxed consistency requirements, where writes can be batched asynchronously to the DB to reduce DB pressure.
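The counting example above can be sketched as a write-behind counter that merges increments per key and flushes them to the DB in one batch. This is an illustrative sketch (a real implementation would flush on a timer or size threshold, and must accept that unflushed deltas are lost on a crash); the class and field names are hypothetical.

```python
# Minimal Write Behind sketch for a counter-like workload: writes hit
# only the cache and are merged per key, then flushed to the DB in one
# batched update. Unflushed deltas are lost if the process crashes.
class WriteBehindCounter:
    def __init__(self):
        self.cache = {}    # current counts, always up to date
        self.pending = {}  # unflushed deltas, merged per key
        self.db = {}       # stand-in for a real database

    def incr(self, key, delta=1):
        # Write path touches only in-memory state; no DB round trip.
        self.cache[key] = self.cache.get(key, 0) + delta
        self.pending[key] = self.pending.get(key, 0) + delta

    def flush(self):
        # One batched DB update per key, however many increments
        # were merged since the last flush.
        for key, delta in self.pending.items():
            self.db[key] = self.db.get(key, 0) + delta
        self.pending.clear()
```

Here 10,000 `incr` calls on the same key collapse into a single DB write at flush time, which is exactly the merge-then-batch behavior that makes this mode so fast for hot counters.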

That covers the three cache read/write modes. As you can see, each has its own advantages and disadvantages; there is no single best mode, and in fact no perfect mode can be designed. Just like the space-time trade-off and the latency-for-cost trade-off mentioned earlier, high performance and strong consistency are always in tension. System design is about making trade-offs and choices at every turn, an idea that runs through this entire course. Learning how to make better trade-offs for a given business scenario, and thereby design better service systems, may well be another takeaway from this course.

Classification of Cache and Introduction to Common Caches #

The basic ideas, advantages, costs, and read/write modes of caching have now been introduced. Next, let's look at the cache classifications commonly used at Internet companies.

Classification by Hosting Level #

If classified by hosting level, caches are generally divided into Local Cache, Interprocess Cache, and Remote Cache.

  • Local Cache refers to the cache within the business process. This type of cache has extremely high read and write performance and no network overhead since it is within the business system process. However, the drawback is that it will be lost with the restart of the business system.
  • Interprocess Cache is an independent cache running locally. This type of cache has high read and write performance, will not lose data with the restart of the business system, and can greatly reduce network overhead. However, the business system and cache reside on the same host, making it operationally complex and prone to resource competition.
  • Remote Cache refers to the cache deployed across machines. This type of cache has a large capacity and is easy to scale because it is deployed on independent devices. However, remote cache requires cross-machine access, and under high read and write pressure, bandwidth can easily become a bottleneck.

Cache components for Local Cache include Ehcache, Guava Cache, and others; developers can also easily build their own dedicated Local Cache using Map, Set, and similar data structures. The cache components for Interprocess Cache and Remote Cache are the same; the only difference is the deployment location. These components include Memcached, Redis, Pika, etc.
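As the text notes, a dedicated Local Cache can be built from plain map structures. A minimal, capacity-bounded sketch with LRU eviction might look like this (the class name and defaults are illustrative, not from any library mentioned above):

```python
# A tiny in-process local cache built on OrderedDict, with LRU
# eviction once capacity is exceeded. Illustrative sketch only.
from collections import OrderedDict

class LocalLRUCache:
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

Because it lives inside the business process, reads and writes are plain method calls with no network overhead, but the contents are lost whenever the process restarts, exactly the Local Cache trade-off described above.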

Classification by Storage Medium #

Another common classification is based on the storage medium, which can be divided into in-memory cache and persistent cache.

  • In-memory cache stores data in memory and has high read/write performance. However, data in memory will be lost after the cache system restarts or crashes.
  • Persistent cache stores data on SSD or Fusion-IO drives. At the same cost, its capacity can be more than an order of magnitude larger than that of an in-memory cache, and data survives a restart because it is persisted; however, read/write performance is one to two orders of magnitude lower. Memcached is a typical in-memory cache, while Pika and other cache components built on RocksDB are persistent caches.