27 Distributed Transactions - Should We Use 2PC or Not #

Hello, I am Jiang Chengyao. Earlier, we learned about the sharding design, index design, middleware selection, and striping design in distributed databases. However, we have been avoiding the most troublesome problem in distributed databases, which is distributed transactions.

Today, let’s learn about the concept of distributed transactions and how to implement them in massive internet businesses.

Concept of Distributed Transactions #

I believe you are already very familiar with the concept of transactions. A transaction should satisfy the ACID properties, which can be summarized as follows:

  • A (Atomicity): All operations within a transaction must be either all done or none done.
  • C (Consistency): The integrity of data is not compromised from the start of the transaction to its end. This includes uniqueness constraints, foreign key constraints, etc.
  • I (Isolation): The operations performed by one transaction are not visible to another transaction; it appears as if they are executed serially.
  • D (Durability): Once a transaction is committed, the data modifications are permanent. Even in the event of a crash, the data can be recovered.
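To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module (the table and amounts are illustrative, not from the lecture): a transfer that violates a constraint mid-transaction is rolled back in full, so the earlier update within the same transaction does not survive either.

```python
import sqlite3

# In-memory database with a simple accounts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()

# Transfer more than the balance: the CHECK constraint fails mid-transaction,
# and the rollback also undoes the credit that had already been applied (atomicity).
try:
    with conn:  # opens a transaction; rolls back automatically on exception
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE id = 2")
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE id = 1")
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
print(balances)  # {1: 100, 2: 0} -- neither update survived
```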

It is important to note that the default transaction isolation levels in today's databases do not fully satisfy the isolation requirement. This applies to relational databases such as MySQL, Oracle, and PostgreSQL. In most deployments the transaction isolation level is set to READ-COMMITTED, which does not prevent non-repeatable reads or phantom reads.

However, in the majority of business scenarios these two anomalies are rarely encountered, and achieving full isolation comes at a significant performance cost. Therefore, most relational databases settle on a trade-off between performance and absolute isolation.

So, what exactly is a distributed transaction? Simply put, it is about achieving the ACID properties of transactions in a distributed database architecture.

Earlier, we discussed a principle of distributed database architecture design: most operations should be executable as a unit within a single shard. For example, queries on a user's order details can be completed within a single shard because the shard key is the customer ID. In this case, the ACID properties of transactions can be satisfied.
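The routing idea can be sketched in a few lines of Python (the shard count and the modulo rule are illustrative assumptions, not the lecture's actual sharding rule): every table sharded by customer ID places one customer's rows on the same shard, so a per-customer transaction stays local.

```python
# Minimal sketch of shard-key routing: orders and lineitem rows for the same
# customer land on the same shard, so a per-customer transaction is local.
SHARDS = 4  # illustrative shard count

def shard_for(customer_id: int) -> int:
    """Route a row to a shard by its shard key (the customer ID)."""
    return customer_id % SHARDS

# An order-detail query (orders JOIN lineitem) for one customer runs on one shard.
customer_id = 10086
assert shard_for(customer_id) == shard_for(customer_id)  # co-located tables
print(shard_for(customer_id))  # shard 2
```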

However, consider the following core business logic of an e-commerce system. It cannot be completed within a single shard, as it involves a user purchasing a product:

START TRANSACTION;

INSERT INTO orders VALUES (...);
INSERT INTO lineitem VALUES (...);
UPDATE stock SET count = count - 1 WHERE sku_id = ?;

COMMIT;

As we can see, in a distributed database architecture, the shard keys for tables orders and lineitem are user IDs. However, the stock table represents inventory and is not associated with user IDs. Therefore, the sharding rules for stock are definitely different from orders and lineitem.

Thus, the above transaction cannot be completed as a unit within a single shard. This is a distributed transaction: the changes to the user-dimension tables (orders and lineitem) and the product-dimension table (stock) must either all succeed or all fail together.

The common implementation of distributed transactions is through the use of Two-Phase Commit (2PC). Now let’s take a look at 2PC.

Implementation of Distributed Transactions with 2PC #

2PC is a strongly consistent implementation of distributed transactions at the database level. In 2PC, a transaction coordinator role is introduced to coordinate and manage the commit and rollback of each participant (also known as a local resource). The "two phases" in 2PC refer to the prepare phase and the commit phase.

In the implementation of 2PC, the participants are the individual MySQL database instances. So, who is the transaction coordinator? This depends on the architecture of the distributed database. If the architecture of the distributed database involves direct connection to the shards through sharding rules, then the transaction coordinator is the business program itself. As shown in the following diagram:

Image 1

If a database middleware is employed as part of the architecture, then the transaction coordinator is the database middleware. As shown in the following diagram:

Image 2

As the diagrams show, when a database middleware is used in a distributed database architecture, the implementation of distributed transactions can be hidden from the upper-layer services. A service does not need to care whether the underlying transaction is local or distributed; using it is just like using a single-machine transaction.

2PC requires that all participants successfully complete the first phase, the prepare operation, for the distributed transaction to be committed and achieve persistence. The code logic for 2PC is illustrated in the following diagram:

Image 3

The above is the Java code implementation of 2PC. As we can see, only when every participant successfully completes the first phase, prepare, can the distributed transaction be committed. One difficulty with 2PC, however, is this: if the prepare phase succeeds but a node fails during the second-phase commit, then after the failed node recovers (or after a master-slave switchover), the transaction that was successfully prepared on that node must be handled manually. Such a transaction is called a "suspended transaction".
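The diagram's Java code is not reproduced here, but the coordinator logic it illustrates can be sketched in Python (the class names and the in-memory participants are illustrative stand-ins for real MySQL shards): commit everywhere only if every participant prepares successfully, otherwise roll back everywhere.

```python
class Participant:
    """A stand-in for one MySQL shard taking part in the transaction."""
    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.state = "idle"

    def prepare(self):
        # Phase 1: persist the transaction so it can survive a crash.
        if self.fail_on_prepare:
            self.state = "aborted"
            return False
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"      # Phase 2: make it permanent

    def rollback(self):
        self.state = "rolled_back"    # Phase 2: undo the prepared work

def two_phase_commit(participants):
    # Commit only if EVERY participant prepared successfully.
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False

shards = [Participant("orders_shard"), Participant("stock_shard")]
print(two_phase_commit(shards))  # True: both prepared, so both committed
```

Note that a real coordinator must also log its decision durably before phase 2, which is exactly where the "suspended transaction" problem described above comes from.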

Users can use the command XA RECOVER to view suspended transactions on the node:

Image 4

If there are suspended transactions, the lock resources held by these transactions are not released. They can be viewed with the command SHOW ENGINE INNODB STATUS:

Image 5

From the above figure, we can see that transaction 5136 has been in the PREPARE state for 218 seconds. It is a suspended transaction, and it is holding two row-lock objects.

You can use the command XA COMMIT to manually commit it (or XA ROLLBACK to roll it back):

Image 6

By now, you should have a clear understanding of how distributed transactions are implemented and used with 2PC. It is a strongly consistent transaction solution implemented at the database layer. Its advantages are that it is simple to use and that most current languages support it; if middleware is used, the business does not need to care whether a transaction is distributed.

However, its disadvantage is that the overhead of committing a transaction doubles, from a single COMMIT to a PREPARE followed by a COMMIT. For internet businesses with massive transaction volumes, the performance of 2PC is unacceptable. This is why distributed transactions are also implemented at the business level, as so-called flexible (or "soft") transactions.

Flexible Transactions #

Flexible transactions refer to distributed transactions implemented at the business layer, achieving the work of distributed transactions through eventual consistency. It can be said that by sacrificing a certain level of consistency, the performance requirements of distributed transactions are met.

Well-known flexible-transaction options in the industry include patterns such as TCC and Saga, frameworks such as Seata, and implementations based on a message table. These solutions achieve eventual consistency through compensation mechanisms; the difficulty of flexible transactions lies in handling the error logic.

For simplicity, let’s use a message table as an example to explain flexible transactions. For the core e-commerce order placement logic mentioned above, it can be divided into 3 stages using a message table:

Stage 1:

START TRANSACTION;

# order number, order status
INSERT INTO orders VALUES (...);
INSERT INTO lineitem VALUES (...);

COMMIT;

Stage 2:

START TRANSACTION;

UPDATE stock SET count = count - 1 WHERE sku_id = ?;

# o_orderkey is the primary key of the message table, with a unique constraint
INSERT INTO stock_message VALUES (o_orderkey, ...);

COMMIT;

Stage 3:

UPDATE orders SET o_orderstatus = 'F' WHERE o_orderkey = ?;

In the above flexible transaction, the column o_orderstatus in the orders table records whether the flexible transaction has completed; its initial state is unfinished. The table stock_message records whether the inventory for the corresponding order has been deducted. Once Stage 2 has completed, the flexible transaction is guaranteed to complete eventually; Stage 3 marks it as completed, ensuring eventual consistency.
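The three stages can be sketched end to end with Python's built-in sqlite3 module. This is a minimal sketch under simplifying assumptions: one in-memory database stands in for both shards (in production, Stage 1 and Stages 2-3 would run on different MySQL instances), 'P' is an assumed "pending" status value, and the schemas are cut down to the columns used here.

```python
import sqlite3

# One in-memory database stands in for both shards; schemas are minimal sketches.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_orderstatus TEXT);
CREATE TABLE lineitem (o_orderkey INTEGER, sku_id INTEGER);
CREATE TABLE stock (sku_id INTEGER PRIMARY KEY, count INTEGER);
CREATE TABLE stock_message (o_orderkey INTEGER PRIMARY KEY);  -- unique: one deduction per order
INSERT INTO stock VALUES (42, 10);
""")

orderkey, sku = 1, 42

# Stage 1: create the order in the "unfinished" state ('P' for pending).
with db:
    db.execute("INSERT INTO orders VALUES (?, 'P')", (orderkey,))
    db.execute("INSERT INTO lineitem VALUES (?, ?)", (orderkey, sku))

# Stage 2: deduct stock and record the deduction in the message table in ONE
# local transaction -- the unique key on o_orderkey makes retries idempotent.
with db:
    db.execute("UPDATE stock SET count = count - 1 WHERE sku_id = ?", (sku,))
    db.execute("INSERT INTO stock_message VALUES (?)", (orderkey,))

# Stage 3: mark the flexible transaction finished ('F').
with db:
    db.execute("UPDATE orders SET o_orderstatus = 'F' WHERE o_orderkey = ?", (orderkey,))

status, count = db.execute(
    "SELECT o_orderstatus, count FROM orders, stock"
    " WHERE o_orderkey = ? AND sku_id = ?", (orderkey, sku)).fetchone()
print(status, count)  # F 9
```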

Next, let's consider what happens if Stage 2 fails, i.e., a node fails during execution. The background compensation logic scans the orders table for orders whose o_orderstatus is still unfinished and checks whether a corresponding record exists in stock_message. If it does, Stage 3 is executed; if not, the user can be informed that the order has failed.

If Stage 3 fails, the handling is similar. The difference is that Stage 2 must already have completed, so only Stage 3 needs to be re-executed.

Therefore, the compensation logic here is a program that scans overdue orders in real time or periodically, decides from the message table whether each flexible transaction should be completed or failed, and performs the necessary business handling when a transaction fails.
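That compensation job can be sketched as follows, again with sqlite3 standing in for the real shards. The 'P' (pending) and 'X' (failed) status values are illustrative assumptions, as is the seeded data: order 1 has a stock_message record (Stage 2 succeeded, Stage 3 never ran), order 2 has none (Stage 2 failed).

```python
import sqlite3

# Minimal tables for the compensation job; a stand-in for the real shards.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_orderstatus TEXT);
CREATE TABLE stock_message (o_orderkey INTEGER PRIMARY KEY);
-- Order 1: Stage 2 succeeded (message exists) but Stage 3 never ran.
-- Order 2: Stage 2 failed (no message), so the order must be failed.
INSERT INTO orders VALUES (1, 'P'), (2, 'P');
INSERT INTO stock_message VALUES (1);
""")

def compensate(db):
    """Scan unfinished ('P') orders and drive each flexible transaction to an end state."""
    pending = db.execute(
        "SELECT o_orderkey FROM orders WHERE o_orderstatus = 'P'").fetchall()
    for (orderkey,) in pending:
        deducted = db.execute(
            "SELECT 1 FROM stock_message WHERE o_orderkey = ?", (orderkey,)).fetchone()
        # Stock was deducted: finish the transaction (run Stage 3).
        # Stock was never deducted: fail the order and notify the user.
        new_status = "F" if deducted else "X"
        with db:
            db.execute("UPDATE orders SET o_orderstatus = ? WHERE o_orderkey = ?",
                       (new_status, orderkey))

compensate(db)
result = dict(db.execute(
    "SELECT o_orderkey, o_orderstatus FROM orders ORDER BY o_orderkey"))
print(result)  # {1: 'F', 2: 'X'}
```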

The above is roughly how the frameworks mentioned earlier implement flexible transactions as well, with differences in their compensation logic and with more general-purpose interfaces.

For internet businesses with massive transaction volumes, flexible transactions perform far better. This is why businesses such as Alipay and Taobao use flexible transactions to implement distributed transactions.