Extra Lesson: What Is Data's Strong and Weak Consistency? #

Hello, I’m Liu Chao.

In [Lesson 17], when discussing concurrent containers, I mentioned “strong consistency” and “weak consistency”. Many students left comments saying these concepts are unfamiliar or only vaguely understood, so in this extra lesson I will explain them in detail.

Speaking of consistency, it actually comes up in many parts of a system: besides the consistency of shared variables in concurrent programming, there is also the consistency in a database’s ACID properties and the consistency in the CAP theory of distributed systems. Here we will focus on “the consistency of shared variables in concurrent programming”.

In concurrent programming, Java uses shared memory to implement operations on shared variables, so it involves the issue of data consistency in multi-threaded programming.

Let me start with a classic example of the problems that can arise when multiple threads operate on a shared variable. Assume two threads (Thread 1 and Thread 2) both execute the following method, where x is a shared variable:

// Code 1
public class Example {
    int x = 0;
    public void count() {
        x++;                     //1
        System.out.println(x);   //2
    }
}

[Diagram: Thread 1 and Thread 2 both calling count() on the shared variable x]

If both threads run concurrently, the values printed by the two threads can be any of the following three results: (1,2), (2,1), or (1,1):

[Diagram: the three possible printed results — (1,2), (2,1), and (1,1)]
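The (1,1) result becomes less mysterious once you remember that x++ is not atomic: it is a read, an increment, and a write-back. Here is a minimal sketch (the class and method names are hypothetical) that replays the unlucky interleaving deterministically by spelling out those three steps for each thread:

```java
public class LostUpdateDemo {
    static int x = 0; // shared variable, as in Code 1

    // Replays the unlucky interleaving of two concurrent x++ operations.
    static int raceOnce() {
        int r1 = x;      // Thread 1 reads 0
        int r2 = x;      // Thread 2 reads 0 before Thread 1 writes back
        x = r1 + 1;      // Thread 1 writes 1
        x = r2 + 1;      // Thread 2 also writes 1 -- Thread 1's update is lost
        return x;
    }

    public static void main(String[] args) {
        System.out.println(raceOnce()); // prints 1, not 2
    }
}
```

Both threads print the value they wrote, which is 1 in each case, giving the (1,1) result.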

Java Memory Model #

It is easy to see how the results (2,1) and (1,2) arise, but why would (1,1) appear?

We know that Java uses a shared memory model to achieve information exchange and data synchronization between multiple threads. Before explaining why this result occurs, let’s first take a simple look at Java’s memory model (which will be explained in detail in Lesson 21). When a program is running, local variables will be stored in the Java virtual machine stack, while shared variables will be stored in the heap memory.

[Diagram: Java memory layout — per-thread virtual machine stacks, with the heap and method area shared by all threads]

Local variables live in stack frames on a thread’s own virtual machine stack, created and destroyed along with each method call. As the diagram above shows, stack data is not shared between threads, so we don’t need to worry about its consistency.

Shared variables are stored in the heap memory or method area. As shown in the above diagram, data in the heap memory and method area is shared by threads. When shared variables in the heap are accessed by different threads, each thread loads them into its own working memory, which corresponds roughly to the CPU’s caches and registers.

The CPU cache is divided into the L1, L2, and L3 caches. In a typical (inclusive) hierarchy, the data held at each level is a subset of the data at the next level down. When the CPU reads data, it first looks in the L1 cache; on a miss it looks in the L2 cache, then the L3 cache, and finally main memory.
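The lookup order can be modeled with a toy sketch, where each cache level is simply a map from address to value (the class name, addresses, and values here are all hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class CacheLookupDemo {
    // Toy model: each cache level maps an address to a cached value.
    static Map<Integer, Integer> l1 = new HashMap<>();
    static Map<Integer, Integer> l2 = new HashMap<>();
    static Map<Integer, Integer> l3 = new HashMap<>();
    static Map<Integer, Integer> mainMemory = new HashMap<>();

    // Searches L1 first, then L2, then L3, falling back to main memory.
    static int read(int address) {
        if (l1.containsKey(address)) return l1.get(address);
        if (l2.containsKey(address)) return l2.get(address);
        if (l3.containsKey(address)) return l3.get(address);
        return mainMemory.get(address);
    }

    public static void main(String[] args) {
        mainMemory.put(0x10, 42); // present only in main memory
        l2.put(0x20, 7);          // present in the L2 cache
        System.out.println(read(0x10)); // 42: misses every cache level
        System.out.println(read(0x20)); // 7: hit at L2, never reaches L3
    }
}
```

Real hardware does this in fixed-size cache lines with tag matching, of course; the sketch only captures the search order.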

If multiple threads run on a single-core CPU and access shared data, they all share the same cache: after the CPU loads a shared variable into the cache, every thread maps to the same cache location. Even when a thread switch occurs, the cached copy remains valid.

If multiple threads are running on a multicore CPU, each core has its own L1 cache. If multiple threads running on different cores access shared variables, each core’s L1 cache will cache a copy of the shared variable.

Suppose Thread A loads a shared variable, whose value in the heap is 0, into the L1 cache of its core. After Thread A’s operation, the cached value becomes 1, which is then flushed back to the heap.

Before that flush happens, Thread B, running on another core, loads the same variable, still 0, into its own L1 cache. Thread A then flushes 1 to the heap, but Thread B is still working with the stale value 0 in its cache.

This is where the inconsistency arises: the data in one core’s cache no longer matches the data in the heap, and when Thread B later flushes its cache back to the heap, it overwrites the modification made by Thread A.

[Diagram: Thread A and Thread B on different cores each caching x = 0; Thread A’s write of 1 is overwritten by Thread B]

With the memory model in mind, we can now see how the (1,1) result of the first code snippet is produced: both threads read x = 0 into their own caches, each increments its cached copy to 1, and each prints 1.

[Diagram: timing of the (1,1) result]
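One common fix, sketched below, is to replace the plain int with AtomicInteger, which turns the read-increment-write into a single atomic operation (the class and helper method are hypothetical, not from the lesson):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicExample {
    // AtomicInteger makes x++ a single atomic read-modify-write,
    // so the two threads can no longer both observe 0.
    static AtomicInteger x = new AtomicInteger(0);

    public static void count() {
        System.out.println(x.incrementAndGet()); // each call prints a distinct value
    }

    public static int runBothThreads() throws InterruptedException {
        Thread t1 = new Thread(AtomicExample::count);
        Thread t2 = new Thread(AtomicExample::count);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return x.get(); // always 2: no increment can be lost
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runBothThreads()); // prints 2
    }
}
```

The two threads print 1 and 2 in some order, and the (1,1) outcome is impossible.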

Reordering #

In addition, there is also a problem of reordering in the Java Memory Model. Please take a look at the following code:

// Code 2
public class Example {
    int x = 0;
    boolean flag = false;
    public void writer() {
        x = 1;                //1
        flag = true;          //2
    }

    public void reader() {
        if (flag) {           //3
            int r1 = x;       //4
            System.out.println(r1);
        }
    }
}

[Diagram: Thread 1 calling writer() while Thread 2 calls reader()]

If the two threads run concurrently, r1 in Thread 2 can take two possible values:

[Diagram: the two possible results — r1 = 1 and r1 = 0]

Now let’s take a look at the result of r1=1, as shown in the following diagram:

[Diagram: timing that produces r1 = 1]

How is r1=0 obtained? Let’s take a look at another timing diagram:

[Diagram: timing that produces r1 = 0, with operations 1 and 2 reordered]

As long as the computed result is unaffected, the compiler may change the order in which instructions execute, especially where it sees an optimization opportunity.

For example, to minimize register reads and writes, the compiler tries to reuse values already held in registers. Without reordering, the code below executes in the order of steps 1, 2, 3. After reordering optimization during compilation, the order may become 1, 3, 2 or 2, 1, 3, so that the value of x loaded in step 1 is still in a register when step 3 runs and does not need to be reloaded, reducing the number of register accesses.

int x = 1; // Step 1: Load the memory address of x and the value 1 into registers, then the CPU writes 1 to x's memory via a mov instruction
boolean flag = true; // Step 2: Load the memory address of flag and the value true into registers, then the CPU writes true to flag's memory via a mov instruction
int y = x + 1; // Step 3: Reload the memory address and value of x into registers, compute x + 1 = 2, then the CPU writes 2 to y's memory via a mov instruction

In the JVM, reordering is a very important topic, especially in concurrent programming. If the JVM could reorder instructions arbitrarily, it could cause a series of problems in concurrent programming, including the consistency issues above.

Happens-before Rule #

To solve this problem, Java proposes the Happens-before rule to regulate the execution order of threads:

  • Program order rule: within a single thread, code appears to execute in order. Instructions may be reordered, but the final result is the same as sequential execution.
  • Locking rule: an unlock of a monitor happens-before every subsequent lock of that same monitor, so a thread can acquire a lock only after the previous holder releases it.
  • Volatile variable rule: a write to a volatile variable happens-before every subsequent read of that variable, so readers always see the latest write.
  • Thread start rule: a call to a Thread object’s start() method happens-before every action in the started thread.
  • Thread termination rule: all actions in a thread happen-before any other thread detects that thread’s termination (for example, via join()).
  • Finalizer rule: the completion of an object’s initialization happens-before the start of its finalize() method.
  • Transitivity: if operation A happens-before operation B, and operation B happens-before operation C, then operation A happens-before operation C.
  • Thread interruption rule: a call to a thread’s interrupt() method happens-before the interrupted thread detects the interruption.
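The thread start and termination rules can be seen in a small sketch (the class and field names are hypothetical): the field below is a plain int with no volatile or locks, yet visibility is still guaranteed by start() and join() alone.

```java
public class HappensBeforeDemo {
    static int data = 0;      // plain field: no volatile, no locks
    static int observed = -1;

    // Relies only on the thread start and termination rules for visibility.
    public static void run() throws InterruptedException {
        data = 42;            // written before start(): visible inside the thread
        Thread t = new Thread(() -> {
            observed = data;  // thread start rule: guaranteed to see 42
            data = 7;         // written inside the thread
        });
        t.start();
        t.join();             // termination rule: all of t's writes are now visible
    }

    public static void main(String[] args) throws InterruptedException {
        run();
        System.out.println(observed); // 42
        System.out.println(data);     // 7
    }
}
```

Without the join(), the main thread would have no happens-before edge to the worker’s write of 7, and reading data could return a stale value.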

Based on these rules, we can classify consistency into the following levels:

Strict consistency: all read and write operations execute in the order of a single global clock, and at any given moment every thread reads the same value for a shared variable. Hashtable, whose methods are all synchronized, is an example of strict consistency.

[Diagram: strict consistency — all threads’ reads and writes serialized on a global timeline]
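A quick sketch of why Hashtable behaves this way: every public method is synchronized on the table’s own monitor, so by the locking rule each put happens-before any later get (the class name below is hypothetical):

```java
import java.util.Hashtable;

public class StrictConsistencyDemo {
    public static void main(String[] args) throws InterruptedException {
        // Every public method of Hashtable is synchronized on the table's
        // monitor, so reads and writes from all threads run one at a time.
        Hashtable<String, Integer> table = new Hashtable<>();

        Thread writer = new Thread(() -> table.put("x", 1));
        writer.start();
        writer.join();

        // By the locking rule, the unlock after put() happens-before the
        // lock taken by get(), so this read must observe the write.
        System.out.println(table.get("x")); // prints 1
    }
}
```

The price of this serialization is that all threads contend for one lock, which is exactly why Hashtable scales poorly compared with the weakly consistent containers from Lesson 17.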

Sequential consistency: the operations of multiple threads may interleave in any overall order, but each individual thread’s operations execute in program order, and any read sees the most recent write in that interleaving. Because the volatile keyword forbids instruction reordering, a program whose shared variables are volatile achieves sequential consistency for those variables.

[Diagram: sequential consistency — per-thread program order preserved within a single global interleaving]
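Applied to Code 2, marking flag as volatile rules out the r1 = 0 outcome. A sketch (with reader() reshaped to return a value, a change of mine rather than the lesson’s code):

```java
public class VolatileExample {
    int x = 0;
    volatile boolean flag = false; // the volatile write/read pair orders x

    public void writer() {
        x = 1;           // 1: cannot be reordered after the volatile write below
        flag = true;     // 2: volatile write publishes x
    }

    // Returns x if flag was seen, or -1 if the write is not yet visible.
    public int reader() {
        if (flag) {      // 3: volatile read
            return x;    // 4: by the volatile rule, must observe x = 1
        }
        return -1;
    }
}
```

Once a reader observes flag == true, the volatile variable rule guarantees it also observes x = 1, so r1 can only be 1.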

Weak consistency: a read is not guaranteed to return the most recently written value, but every written value is guaranteed to be read eventually. Typical implementations combine a single write lock with lock-free reads.
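The single-write-lock, lock-free-read pattern can be sketched as a small counter class (a hypothetical illustration, not code from the lesson):

```java
public class WeakConsistencyCounter {
    // A racing reader may briefly see a value that is about to be replaced,
    // but since writes are serialized and the field is volatile, every
    // written value eventually becomes visible to all readers.
    private volatile int value = 0;

    public synchronized void increment() { // single write lock serializes writers
        value = value + 1;
    }

    public int get() {                     // lock-free read: no lock acquired
        return value;
    }
}
```

Readers never block writers and vice versa, which is the trade-off that makes weakly consistent containers like those in Lesson 17 so much faster under read-heavy load than a fully synchronized Hashtable.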