25 Cache Anomaly How to Solve Data Inconsistency between Cache and Database #

In the practical application of Redis cache, we often encounter several common exceptions: inconsistency between the cache and the database, cache avalanche, cache penetration, and cache breakdown.

Whenever we use Redis cache, we inevitably face the problem of ensuring consistency between the cache and the database. This can be considered a “must-answer question” in Redis cache application. Most importantly, if the data is inconsistent, the data read by the business application from the cache will not be the latest, which can lead to serious errors. For example, if we cache the inventory information of e-commerce products in Redis, incorrect inventory information may cause errors in the order placement operation at the business layer, which is unacceptable. Therefore, in this lesson, I will focus on discussing this problem. I will introduce the issues of cache avalanche, penetration, and breakdown in the next lesson.

Next, let’s take a look at how the inconsistency between the cache and the database occurs.

How does data inconsistency occur between cache and database? #

First of all, let’s clarify what “data consistency” means here. Actually, “consistency” here includes two situations:

  • If there is data in the cache, the cached value must be the same as the value in the database.
  • If there is no data in the cache, the value in the database must be the latest value.

Anything that does not meet these two conditions is a data inconsistency problem between the cache and the database. However, how cache inconsistency arises, and how we respond to it, depends on the cache read/write mode, so let's first look at inconsistency in each mode. In Lecture 23, I mentioned that we can divide caches into read-write caches and read-only caches based on whether they accept write requests.

For a read-write cache, if you want to perform data creation, deletion, or modification, you need to do it in the cache and decide whether to synchronize the write to the database based on the write-back strategy adopted.

Synchronous direct write strategy: When writing to the cache, also write to the database synchronously, so the data in the cache and the database are consistent.

Asynchronous write-back strategy: When writing to the cache, do not write to the database synchronously. Instead, write back to the database when the data is evicted from the cache. When using this strategy, if the cache fails before the data is written back to the database, the database will not have the latest data.

Therefore, for a read-write cache to ensure data consistency between the cache and the database, the synchronous direct write strategy must be adopted. However, note that with this strategy, the cache and the database need to be updated together. We therefore need to use a transaction mechanism in the business application to guarantee that the cache update and the database update are atomic: either both succeed, or neither does and an error is returned so the operation can be retried. Otherwise, synchronous direct writing cannot be achieved.
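As a concrete illustration, here is a minimal sketch of the synchronous direct write idea in Python. Plain dicts stand in for Redis and MySQL, and the helper names (`db_update`, `write_through`) and failure flag are hypothetical, for demonstration only:

```python
# Minimal sketch of synchronous direct write: if the database write fails,
# the cache write is rolled back, so the two stores either both change or
# neither does.

cache = {}
database = {"X": 10}

def db_update(key, value, fail=False):
    if fail:                        # simulated database failure
        raise IOError("database write failed")
    database[key] = value

def write_through(key, value, db_fail=False):
    old = cache.get(key)            # remember the old cached value
    cache[key] = value              # write the cache...
    try:
        db_update(key, value, fail=db_fail)  # ...and the database, synchronously
    except IOError:
        if old is None:             # roll the cache back on failure
            cache.pop(key, None)
        else:
            cache[key] = old
        raise                       # surface the error so the caller can retry
```

A real implementation would wrap the two writes in a database transaction plus a compensating cache operation; the rollback here only illustrates the all-or-nothing requirement.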

Of course, in some scenarios, our requirements for data consistency may not be so high. For example, if the cache is for non-critical attributes of e-commerce products or the creation/modification time of short videos, we can use the asynchronous write-back strategy.

Now let’s talk about the read-only cache. For a read-only cache, if there is new data, it will be directly written to the database. If there is data modification or deletion, the data in the read-only cache needs to be marked as invalid. This way, when the application accesses this modified or deleted data again, since there is no corresponding data in the cache, cache miss will occur. Then the application will read the data from the database into the cache, so that subsequent data accesses can be directly retrieved from the cache.
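The read-only cache flow described above can be sketched in a few lines of Python; the dicts and helper names are stand-ins, not a real Redis client:

```python
# Read-only cache: writes go straight to the database and invalidate the
# cached copy; reads fill the cache on a miss.
cache = {}
database = {}

def write(key, value):
    database[key] = value       # new or modified data goes to the database
    cache.pop(key, None)        # the cached copy, if any, is invalidated

def read(key):
    if key in cache:            # cache hit
        return cache[key]
    value = database[key]       # cache miss: read from the database
    cache[key] = value          # fill the cache for subsequent reads
    return value
```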

Next, let’s explain how the add, modify, and delete operations of data are performed by taking the example of Tomcat writing and modifying data in MySQL. The figure below shows the process:

From the figure, we can see that whether the application running on Tomcat inserts (Insert), updates (Update), or deletes (Delete) data X, it performs the operation directly in the database. If the operation is an update or deletion, the corresponding data in the cache is also deleted. However, can data inconsistency occur in this process? Since the situation differs between inserting new data and updating or deleting existing data, let's look at them separately.

  1. New data

If it is new data, it will be directly written to the database without any operation on the cache. At this time, there is no new data in the cache itself, and the database has the latest value. This situation meets the second case of consistency mentioned earlier, so the data in the cache and the database are consistent.

  2. Modified or deleted data

If a modification or deletion occurs, the application needs to both update the database and delete the cached value. If the atomicity of these two operations cannot be guaranteed (that is, if one of them completes while the other does not), data inconsistency will occur. This problem is more complex, so let's analyze it.

Let’s assume that the application first deletes the cache and then updates the database. If the cache deletion is successful but the database update fails, when the application accesses the data again, there will be a cache miss because there is no data in the cache. Then the application will access the database, but the value in the database will be the old value, so the application will read the old value.

Let me give you an example to illustrate. Please take a look at the image below.

The application wants to update the value of data X from 10 to 3. It first deletes the cache value of X in Redis, but the database update fails. If there are other concurrent requests to access X at this time, they will find a cache miss in Redis, and then the request will access the database and read the old value 10.

You may wonder if we update the database first and then delete the cache value, can we solve this problem? Let’s analyze it further.

If the application successfully updates the database first but fails to delete the cache value, the database will have the new value, but the cache will have the old value. This is definitely inconsistent. At this time, if other clients send requests to access X, according to the normal cache access process, they will first query the cache and find a cache hit, but they will read the old value.

Let me use an example to explain this again.

The application wants to update the value of data X from 10 to 3. It first successfully updates the database, and then deletes the cache value of X in Redis, but this operation fails. At this time, the new value of X in the database is 3, and the cached value of X in Redis is 10, which is inconsistent. If there happens to be another client sending a request to access X at this time, it will first query Redis and find a cache hit, but it will read the old value 10.

Alright, so far, we can see that in the process of updating the database and deleting the cache value, regardless of the order of these two operations, as long as one operation fails, it will cause the client to read the old value. I have created the following table to summarize the two situations just mentioned.
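The two single-failure scenarios can be replayed with a few lines of Python; the dicts and the `read` helper are hypothetical stand-ins for Redis, MySQL, and the application's read path:

```python
# Replay of the two single-failure scenarios from the text.
def read(cache, database, key):
    if key in cache:
        return cache[key]           # cache hit
    value = database[key]           # cache miss: fall back to the database
    cache[key] = value
    return value

# Order 1: delete the cache first; the database update then fails.
cache, database = {"X": 10}, {"X": 10}
cache.pop("X", None)                   # deletion succeeds
# ...the database update to 3 fails, so database["X"] is still 10
stale_1 = read(cache, database, "X")   # cache miss, reads the old value

# Order 2: update the database first; the cache deletion then fails.
cache, database = {"X": 10}, {"X": 10}
database["X"] = 3                      # update succeeds
# ...the cache deletion fails, so cache["X"] is still 10
stale_2 = read(cache, database, "X")   # cache hit, reads the old value
```

In both orders the reader ends up with the old value 10 instead of the new value 3.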

Now that we know the cause of the problem, how can we solve it?

How to Solve Data Inconsistency Issues? #

First, let me introduce you to a method: retry mechanism.

Specifically, you can temporarily store the cache value to be deleted or the database value to be updated in a message queue (such as using Kafka message queue). When the application fails to successfully delete the cache value or update the database value, you can retrieve these values from the message queue and try to delete or update them again.

If the delete or update operation succeeds, we must remove the corresponding message from the queue to avoid repeating the operation; at this point, consistency between the database and the cache is restored. Otherwise, we retry. If the retries exceed a certain number of times without success, we need to report an error to the business layer.
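Here is a sketch of this retry mechanism in Python. `queue.Queue` stands in for a message queue such as Kafka, and all helper names and failure flags are hypothetical:

```python
import queue

# Retry mechanism: a failed cache deletion is parked in a message queue
# and retried until it succeeds or the retry budget is exhausted.
cache = {"X": 10}
database = {"X": 10}
retry_queue = queue.Queue()
MAX_RETRIES = 3

def delete_cache(key, fail=False):
    if fail:                           # simulated cache failure
        raise IOError("cache delete failed")
    cache.pop(key, None)

def update(key, value, first_delete_fails=False):
    database[key] = value              # the database update succeeds
    try:
        delete_cache(key, fail=first_delete_fails)
    except IOError:
        retry_queue.put(key)           # park the failed deletion for retry

def drain_retry_queue():
    while not retry_queue.empty():
        key = retry_queue.get()
        for _ in range(MAX_RETRIES):
            try:
                delete_cache(key)      # retry the deletion
                break                  # success: the message is consumed
            except IOError:
                pass
        else:                          # retries exhausted
            raise RuntimeError(f"failed to delete {key!r}; notify the business layer")
```

In production the consumer that drains the queue would run as a separate process, so the retry does not block the request path.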

The following image shows a scenario where the cache deletion fails initially, but succeeds after a retry. You can take a look.

So far, we have discussed the situation where one of the operations (deleting the cache value and updating the database value) fails. In reality, even if both operations are successful on the first execution, when there are a large number of concurrent requests, the application may still read inconsistent data.

Similarly, let’s consider two cases based on different deletion and update orders. In these two cases, our approaches to solving the issues are also different.

Case 1: Delete the cache first, then update the database.

Suppose thread A has deleted the cache value but has not yet updated the database (for example, because of network latency), and thread B starts reading the data at this point. Thread B will find the cache missing and can only read from the database. This leads to two problems:

  • Thread B reads an outdated value.
  • Thread B reads from the database in the absence of cache, so it will also write the outdated value to the cache, which may cause other threads to read the outdated value from the cache.

Only after thread B has finished reading from the database and written the value back into the cache does thread A update the database. At this point, the data in the cache is outdated while the data in the database is the latest, and the two are inconsistent.
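To make this interleaving concrete, here is a small deterministic replay in Python (no real threads; dicts stand in for Redis and MySQL, and the `read` helper is hypothetical):

```python
# Deterministic replay of the delete-first race.
cache = {"X": 10}
database = {"X": 10}

def read(key):
    if key in cache:
        return cache[key]
    value = database[key]       # cache miss: read from the database
    cache[key] = value          # ...and write what was read into the cache
    return value

cache.pop("X", None)            # thread A deletes the cache value, then stalls
b_value = read("X")             # thread B misses, reads old 10, refills the cache
database["X"] = 3               # thread A finally updates the database
# Outcome: the cache holds the stale 10 while the database holds 3.
```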

I have summarized this situation in a table below.

What can we do in this case? Let me provide you with a solution.

After thread A updates the database value, we can make it sleep for a short period of time before performing the cache deletion operation again.

The reason for adding this sleep time is to allow thread B to read data from the database, write the missing data into the cache, and then thread A can perform deletion. Therefore, the sleep time of thread A needs to be greater than the time it takes for thread B to read data and write into the cache. How to determine this time? I suggest you measure the time for reading data and writing cache operations during the runtime of the business program and use that as an estimation basis.

By doing this, when other threads read the data afterwards, they will find the cache missing and read the latest value from the database. Because this solution performs a second, delayed deletion after the first deletion of the cache value, it is also called "delayed double delete".

The following pseudocode is an example of the “delayed double delete” solution, you can take a look:

```
redis.delKey(X)    // first deletion of the cached value
db.update(X)       // update the database value
Thread.sleep(N)    // sleep for N, long enough for a concurrent read plus cache fill
redis.delKey(X)    // second deletion clears any stale value a reader cached
```
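Under the assumption of in-memory stand-ins for Redis and MySQL, the pseudocode can be fleshed out into runnable Python. The `between_delete_and_update` hook is an artificial seam that lets a "concurrent" reader run inside the race window deterministically:

```python
import time

# Runnable version of the delayed double delete pseudocode.
cache = {"X": 10}
database = {"X": 10}
SLEEP_SECONDS = 0.01            # in practice: a measured read-plus-cache-fill time

def read(key):
    if key in cache:
        return cache[key]
    value = database[key]
    cache[key] = value          # a racing reader refills the cache here
    return value

def delayed_double_delete(key, value, between_delete_and_update=lambda: None):
    cache.pop(key, None)               # redis.delKey(X): first deletion
    between_delete_and_update()        # a reader may refill the old value here
    database[key] = value              # db.update(X)
    time.sleep(SLEEP_SECONDS)          # Thread.sleep(N)
    cache.pop(key, None)               # redis.delKey(X): second deletion
```

Calling `delayed_double_delete("X", 3, between_delete_and_update=lambda: read("X"))` leaves the cache empty and the database at 3, so the next read returns the new value even though a reader refilled the stale 10 mid-update.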

Case 2: Update the database value first, then delete the cache value.

If thread A has updated the value in the database but has not yet removed the cache value, thread B starts reading the data at this point. In this case, when thread B queries the cache, it finds a cache hit and directly reads the outdated value from the cache. However, if there are not many concurrent read requests for this data, then not many requests will read the outdated value. Moreover, thread A will usually delete the cache value quickly, so when other threads read again, they will find the cache missing and retrieve the latest value from the database. Thus, this case has a relatively small impact on the business.

Let me summarize the scenario of updating the database value first and then deleting the cache value in a table.

Alright, by now we have learned that the inconsistency between the cache and the database is generally caused by two reasons, and I have provided corresponding solutions.

  • If the cache value deletion or database update fails, you can use the retry mechanism to ensure the success of the deletion or update operation.
  • In the process of deleting the cache value and updating the database, if there are other concurrent read operations, leading to other threads reading outdated values, the corresponding solution is the “delayed double delete”.

Summary #

In this lesson, we learned about the most common problem encountered when using Redis caching, which is the inconsistency between the cache and the database. To address this issue, we can analyze it in two scenarios: read-write cache and read-only cache.

For a read-write cache, if we adopt the synchronous direct write strategy, we can ensure consistency between the cache and the database. The situation with a read-only cache is more complex, and I have summarized a table to help you better understand the causes, symptoms, and solutions for data inconsistency.

Data Inconsistency Table

I hope you can include this table I summarized in your study notes and review it from time to time.

Finally, I would like to add a few more points. In most business scenarios, we use Redis as a read-only cache. For read-only cache, we can either delete the cache value first and then update the database, or update the database first and then delete the cache. My suggestion is to prioritize the method of updating the database first and then deleting the cache for two main reasons:

  1. Deleting the cache value first and then updating the database may cause requests to access the database due to cache miss, resulting in pressure on the database.
  2. If the time for reading the database and writing the cache is difficult to estimate in the business application, setting the wait time for delayed double deletion is not easy.

However, when updating the database first and then deleting the cache, there is one thing to note. If the business layer requires strictly consistent reads, concurrent read requests need to be temporarily held back on the client side while the database is being updated, and served only after the database has been updated and the cache value has been deleted, so that data consistency is guaranteed.

One Question per Lesson #

As usual, I have a little question for you. In this lesson, I mentioned that when performing delete or update operations on data in a read-only cache, we need to remove the corresponding cached value in the cache. I’d like you to think about what benefits and drawbacks there might be if we don’t delete the cached value but instead directly update it during this process.

Feel free to write down your thoughts and answers in the comment section. Let’s communicate and discuss together. If you found today’s content helpful, feel free to share it with your friends or colleagues. See you in the next lesson.