05 How to Deal with Cache Data Inconsistency and Race Conditions #

Hello, I am your caching instructor Chen Bo. Welcome to Lesson 5, “Classic Issues Related to Cache Data”.

Inconsistent Data #
Problem Description #

The fourth classic problem among the seven caching problems is inconsistent data. The same set of data can exist simultaneously in both the database (DB) and the cache, and it is possible for the data in the DB and the cache to be inconsistent. If there are multiple copies of the cache, inconsistencies can also occur among the data in different cache copies.

Analysis of Causes #

Inconsistency issues are mostly related to abnormal cache updates. For example, if the cache fails to be updated after the DB is updated, old data remains in the cache. In addition, if the system adopts consistent hash distribution together with an automatic rehash migration strategy, dirty data can be generated after nodes go online and offline several times. When there are multiple cache copies, a failure to update a particular copy will also leave that copy holding old data.

Business Scenarios #

There are many business scenarios that can cause data inconsistency. As shown in the diagram below, when the bandwidth of the cache machine is saturated, or when the network in the data center fluctuates, cache updates may fail, so new data is not written to the cache and the cache and the DB become inconsistent. During cache rehashing, if a cache machine experiences repeated failures and switches online and offline several times, rehashing is triggered multiple times, and as a result several cache nodes end up holding dirty data.

img

Solution #

Data consistency should be preserved as far as possible. Here are three solutions that can be selected based on the actual situation:

  • The first solution is to retry cache updates. If the retries also fail, write the failed key to a queue service. Once cache access is restored, delete these keys from the cache; when they are queried again, they are reloaded from the DB, which restores consistency.
  • The second solution is to appropriately shorten the cache expiration time, so that cached data expires earlier and is reloaded from the DB sooner, ensuring eventual consistency.
  • The third solution is not to use a rehash migration strategy, but to adopt a cache tiering strategy instead, avoiding the generation of dirty data as much as possible.

img
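The first solution can be sketched as follows. This is a minimal in-memory simulation, not a production client: the dicts standing in for the cache and the DB, the `cache_healthy` flag that simulates a cache outage, and all function names are assumptions for illustration.

```python
import queue

# Hypothetical stand-ins for the cache, the DB, and a queue service.
cache = {"user:1": "old-profile"}       # cache currently holds stale data
db = {"user:1": "old-profile"}
failed_keys = queue.Queue()             # keys whose cache update failed

def update(key, value, cache_healthy=True, retries=2):
    """Update the DB first, then try (with retries) to update the cache.
    If every attempt fails, record the key in the failure queue."""
    db[key] = value
    for _ in range(retries + 1):
        if cache_healthy:               # simulated cache write
            cache[key] = value
            return True
    failed_keys.put(key)                # all retries failed: remember the dirty key
    return False

def purge_failed_keys():
    """Run after the cache recovers: delete the failed keys so the next
    read misses and reloads fresh data from the DB."""
    while not failed_keys.empty():
        cache.pop(failed_keys.get(), None)

def read(key):
    if key in cache:
        return cache[key]
    value = db[key]                     # cache miss: reload from DB
    cache[key] = value
    return value
```

With `cache_healthy=False` the cache keeps the stale value until `purge_failed_keys()` runs, after which the next `read` pulls the new value from the DB.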

Concurrent Competition for Data #
Problem Description #

The fifth classic problem is concurrent competition for data. In high-traffic internet systems, it is easy to encounter concurrent competition for data in cache access. Concurrent competition for data refers to a situation where, in a highly concurrent access scenario, when cache access fails to find the data, a large number of requests concurrently query the DB, resulting in a significant increase in DB pressure.

Concurrent competition for data occurs mainly when many processes/threads concurrently request the same data while the cache does not contain that key, for reasons such as expiration or eviction. Since these processes/threads have no coordination among them, they all query the DB concurrently for the same key, ultimately causing a significant increase in DB load, as shown in the diagram below.

img

Business Scenarios #

Concurrent competition for data is also quite common in high-traffic systems. For example, in a ticketing system, the cached information for a certain train route may expire while a large number of users are still querying that route. Similarly, in a microblogging system, a certain microblog may have just been evicted from the cache while it is still receiving a large number of forwards, comments, and likes. Both situations cause concurrent competition when reading the cache for that route information or that microblog.

Solution #

To solve concurrent competition, there are two solutions:

  • The first solution is to use a global lock. As shown in the diagram below, when a cache request misses, the caller first attempts to acquire a global lock for that key. Only the thread that successfully acquires the lock is allowed to query the DB and load the data. Other processes/threads that miss the cache and find the key locked simply wait until the lock holder has stored the data back into the cache, then read it from the cache again.

img
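A sketch of the global-lock approach, using a per-key `threading.Lock` as a stand-in for a distributed lock (in practice this would be something like Redis `SET key value NX EX ttl`). The cache dict, the `load_from_db` function, and the polling interval are all illustrative assumptions.

```python
import threading
import time

cache = {}                 # hypothetical in-memory cache
db_calls = []              # records which requests actually hit the DB
locks = {}                 # per-key "global" locks
locks_guard = threading.Lock()

def load_from_db(key):
    db_calls.append(key)   # simulated slow DB query
    time.sleep(0.05)
    return f"value-of-{key}"

def get(key):
    """Read-through with a per-key lock: only the lock winner queries the
    DB; the other threads poll the cache until the winner has filled it."""
    if key in cache:
        return cache[key]
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    if lock.acquire(blocking=False):       # winner: load and backfill
        try:
            if key not in cache:           # re-check after winning the lock
                cache[key] = load_from_db(key)
            return cache[key]
        finally:
            lock.release()
    while key not in cache:                # losers: wait for the backfill
        time.sleep(0.01)
    return cache[key]
```

If ten threads call `get("route:42")` concurrently, only one DB query is issued; the other nine wait and then read the backfilled cache entry.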

  • The second solution is to maintain multiple backups of the cache data. Even if the data in one backup expires or is evicted, access to other backups is still possible, reducing the occurrence of data concurrent competition, as shown in the diagram below.

img
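The multiple-backup idea can be sketched like this. The copy count, the staggered TTLs, and the in-memory dicts are assumptions for illustration; staggering the expiry times keeps all copies from expiring at once, so at least one copy usually remains readable.

```python
import random

COPIES = 3
caches = [dict() for _ in range(COPIES)]   # several independent cache copies
db = {"status:99": "hot-post"}

def write_through(key, value, now):
    """Write to every copy, staggering the TTLs so they expire at different times."""
    for i, copy in enumerate(caches):
        copy[key] = (value, now + 60 + i * 10)   # (value, expiry timestamp)

def read(key, now):
    """Try the copies in random order; fall back to the DB only if all have expired."""
    for copy in random.sample(caches, COPIES):
        entry = copy.get(key)
        if entry and entry[1] > now:
            return entry[0]
    value = db[key]                # every copy expired: reload and rewrite
    write_through(key, value, now)
    return value
```

Even after the first copy's entry expires, reads still succeed against the remaining copies without touching the DB, which is exactly how the extra backups reduce concurrent competition.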