18 Financial High Availability Architecture - Indispensable Data Core #

In Lesson 17, we learned about the three major architectural designs for high availability: data-based high availability, business-based high availability, and integrated high availability architecture.

These architectures only solve the problem of business continuity: when a server becomes unavailable for whatever reason and the MySQL database goes down with it, the business can be restored quickly. For a stateful database service, however, core business systems such as e-commerce and finance must also guarantee data consistency.

In this context, “data consistency” means that no data may be lost in any disaster scenario (the replication mode that guarantees this is generally referred to as “strong synchronization”).

Today, let’s look at how to design a highly available MySQL architecture for this most demanding business requirement: data consistency.

Selection of Replication Type #

In Lesson 15, we discussed that core businesses such as banking, insurance, and securities must guarantee data consistency. To achieve strong synchronization of data, replication must be configured in lossless semi-synchronous mode.

In MySQL, this means setting the parameter rpl_semi_sync_master_wait_point to AFTER_SYNC.
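The following is a minimal configuration sketch, assuming MySQL 5.7/8.0 with the classic semi-sync plugin names (later 8.0 releases also provide renamed rpl_semi_sync_source_* equivalents); the values are illustrative:

```sql
-- Master: install the semi-synchronous plugin and enable lossless mode
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled    = 1;
SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC';  -- wait for the slave ACK before the commit becomes visible

-- Slave: install the slave-side plugin and enable it
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
```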

However, in high availability design, after the database FAILOVER is complete, additional operations must sometimes be performed on the original master to ensure the complete consistency of the master-slave data.

Let’s take a look at this picture:

1.png

From the picture, we can see that even with lossless semi-synchronous replication enabled, the last group of transactions may still fail to reach the slave when the master goes down. In the picture, the crashed master has committed transactions up to 101, but the slave has only received up to transaction 100. If a failover occurs at this point and the slave is promoted to master, then:

2.png

As you can see, after the master-slave switch completes, the new master starts writing transaction 102. If the old master then recovers from the crash, transaction 101 exists only on the old master and was never sent to the new master, resulting in inconsistent data between the two servers.

The advantage of lossless semi-synchronous replication with AFTER_SYNC, however, is that although transaction 101 was committed on the original master, it was never made visible to users: the commit is acknowledged to the client only after the slave has received the log and returned an ACK. So from the user’s point of view, transaction 101 was never committed.

Therefore, when designing a highly available solution with strong data consistency, choose lossless semi-synchronous replication. In addition, when a failover occurs after a crash and the old master later recovers, the transactions that were committed on the old master but never sent to the slave require additional handling.
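As a rough illustration of that extra handling, assuming GTID-based replication, the transactions present only on the recovered old master can be located by subtracting the new master’s executed GTID set (the GTID value below is illustrative, not a complete procedure):

```sql
-- Run on the new master and copy the result:
--   SELECT @@GLOBAL.gtid_executed;

-- Run on the recovered old master, pasting in the value from above,
-- to list the transactions (e.g. transaction 101) that only the old master has:
SELECT GTID_SUBTRACT(@@GLOBAL.gtid_executed,
                     'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-100') AS extra_transactions;

-- Typical handling: extract these transactions from the old master's binary log for review,
-- then rebuild (or flash back) the old master from the new master before it rejoins as a slave.
```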

Disaster Recovery Level #

High availability is meant to handle all kinds of failures, which can be divided into server failures, data-center-level failures, and even city-wide failures.

  • Data center-level failure: the fiber links between data centers are cut, or an entire data center loses power (even the redundant power supply is unavailable).
  • City-wide failure: generally refers to the failure of the whole city’s inbound and outbound networks and backbone switches (this situation is very rare).

Once these factors are taken into account, high availability becomes a disaster recovery mechanism, and the evaluation criteria for the corresponding high availability architecture are raised accordingly:

  • Data center-level disaster recovery: A database server in a data center is unavailable, and it switches to a database instance in the same data center to ensure business continuity.
  • City-wide disaster recovery: The data center is unavailable, and it switches to a database instance in a data center in the same city to ensure business continuity.
  • Cross-city disaster recovery: None of the data centers in a single city are available, and it switches to a database instance in a data center in another city to ensure business continuity.

The high availability designs we discussed earlier only cover disaster recovery within a single data center; that is, the master and slave servers sit in the same data center. Now let’s look at the disaster recovery designs within and across cities. (A reminder: whether it is data-center-level, city-level, or cross-city disaster recovery, all of them are built on MySQL’s lossless semi-synchronous replication; only the physical deployment differs, in order to solve different problems.)

For city-level disaster recovery, I have seen many designs like this:

3.png

This design does not take network jitter between data centers into account. If there is jitter between Data Center 1 and Data Center 2, transaction commits will hang, because each commit must wait for the transaction log to be received by a slave server in Data Center 2.

Network jitter between data centers is very common, so city-level disaster recovery for core businesses needs to adopt a three-data-center architecture, as shown in the following figure:

4.png

This architecture is called the “three-data-center architecture”. If all three data centers are in one city, it is called “one city, three centers”; if they span adjacent cities, it is called “two cities, three centers”. Either way, this kind of same-city or near-city disaster recovery requires the latency between data centers to be no more than 5 ms.

In the three-data-center architecture, data is stored in all three data centers and replicated between them using semi-synchronous replication. The semi-synchronous parameter rpl_semi_sync_master_wait_for_slave_count is set to 1, meaning that as soon as any one semi-synchronous slave has received the log, the transaction on the master can commit.
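A sketch of the relevant setting on the primary in Data Center 1 (using the MySQL 5.7/8.0 semi-sync parameter names):

```sql
-- Commit succeeds once ANY one semi-synchronous slave (in Data Center 2 or 3)
-- has received and acknowledged the transaction log.
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1;

-- Optional check: how many semi-synchronous slaves are currently connected
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_clients';
```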

This design guarantees that at least one complete copy of the data exists in a data center other than the primary one. Moreover, even if there is network jitter between Data Center 1 and Data Center 2, transaction commits on the primary are unaffected, because the network between Data Center 1 and Data Center 3 is healthy. And if the exit switch or fiber of Data Center 1 fails, the high availability suite can fail over to Data Center 2 or Data Center 3, since at least one intact copy of the data exists there.

The slaves in Data Center 2 and Data Center 3 exist purely to guarantee data consistency. If read-write separation or backups are also required, asynchronous replication nodes need to be introduced, so the overall architecture is adjusted as follows:

5.png

From the figure, we can see that two asynchronous replication nodes are added to implement read-write separation for the business. In addition, a delayed asynchronous replica is hung off the slave in Data Center 3, used to recover from accidental data deletions.
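A minimal sketch of the delayed standby configuration, assuming classic MySQL replication syntax (MySQL 8.0.23+ offers the equivalent CHANGE REPLICATION SOURCE TO ... SOURCE_DELAY); the delay value is illustrative:

```sql
-- On the delayed standby that replicates from the slave in Data Center 3:
-- apply relay logs only after they are at least one hour old, leaving a window
-- in which replication can be stopped to recover rows lost to an accidental DELETE/DROP.
STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = 3600;   -- delay in seconds; choose a window that fits your recovery SLA
START SLAVE;
```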

If you design an architecture along these lines, your same-city disaster recovery architecture can be considered up to standard!

One more important point: because the primary server in Data Center 1 has to ship logs to four slave servers, the network card can become a bottleneck, so a 10 Gbps network card should be configured.

Once you understand the three-data-center architecture, implementing cross-city disaster recovery is simple: just place the three data centers in different cities. However, in that design, when the primary server goes down the database switches across cities, and inter-city network latency exceeds 25 ms. Therefore, cross-city disaster recovery is generally designed as a “three-city, five-center” architecture, as shown in the figure below:

6.png

In the above figure: Data Center 1 and Data Center 2 are located in City 1; Data Center 3 and Data Center 4 are located in City 2; Data Center 5 is located in City 3. The distance between the three cities is more than 200 kilometers, and the latency exceeds 25 ms.

Because there are five data centers, the semi-synchronous ACK count is set to 2, ensuring that, in addition to the primary, at least two other data centers hold a copy of each committed transaction. This way, when a city-level failure occurs, at least one complete copy of the data remains in City 2 or City 3.
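On the primary of the “three-city, five-center” layout, the corresponding setting might look like this (again using the MySQL 5.7/8.0 semi-sync parameter names):

```sql
-- A commit returns only after TWO semi-synchronous slaves have acknowledged the log;
-- with one slave per data center, at least two data centers besides the primary
-- hold every committed transaction.
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 2;

-- Optional check: count of transactions acknowledged by semi-sync slaves
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master_yes_tx';
```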

In real-world Internet business scenarios, the “three-city, five-center” architecture is not as widely used as the three-data-center architecture, because a latency of 25 ms has a significant impact on business operations. It is generally reserved for read-heavy, write-light scenarios such as a user center.

In addition, in real-world Internet business scenarios, the implementation of cross-city disaster recovery is generally based on the same-city disaster recovery architecture, and then the data consistency across cities is guaranteed by the business layer.

Safety-net strategy: Data verification #

So far, our high availability is based on MySQL replication technology. But have you ever thought about these questions:

  • What if there is a bug in the database replication, resulting in logical inconsistency in the final data?
  • Are the master-slave data always consistent? How do you determine consistency?

Therefore, in addition to the high availability architecture design, we need to add a safety-net service to judge the consistency of the data. Here, we need to introduce data verification to solve the following two problems:

  • Logical consistency of data in business operations: This ensures that the business is correct.
  • Consistency of data between master and slave servers: This ensures that the data on the slave servers is safe and can be switched to.

Business-logic verification is handled by the business teams, who write reconciliation logic for their own business. For example, verifying whether “today’s inventory consumption” equals the “sum of order details”, or whether “in-transit deliveries” plus “received deliveries” equals the “sum of ordered deliveries”. In short, it is reconciliation logic written against the business.
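As an illustration only, a reconciliation check of the first kind might look like this (table and column names are hypothetical):

```sql
-- Hypothetical check: today's recorded inventory consumption per SKU should equal
-- the quantity summed from today's order details; any row returned is a mismatch.
SELECT c.sku_id,
       c.consumed_qty,
       SUM(d.quantity) AS order_qty
FROM   inventory_consumption c
JOIN   order_detail d
       ON d.sku_id = c.sku_id
      AND d.created_at >= CURRENT_DATE
WHERE  c.stat_date = CURRENT_DATE
GROUP  BY c.sku_id, c.consumed_qty
HAVING c.consumed_qty <> SUM(d.quantity);
```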

The verification between master and slave servers is carried out by the database team. An additional verification service needs to be written to ensure the consistency of master-slave data. This verification does not rely on replication itself and is also a kind of logical verification. The idea is to compare the recently changed records on the master server with the records on the slave server to verify their consistency. The implementation is shown in the following figure:

7.png

The challenge now is how to determine the recently changed records on the master server. There are two approaches:

  • The table structure design specification requires every table to have a last_modify_date field recording when each record was last modified. Filtering on this field retrieves the most recently updated records, which are then compared one by one.
  • The verification service scans the recent binary logs, extracts the tables and primary keys of recently updated records, and then verifies those rows. This approach is harder to implement, but it avoids querying the master to find out which records changed.

If, during verification, a record changes again on the master before it has been synchronized to the slave, we can add recheck logic: after several rechecks, the master and slave data should normally converge. If they are still inconsistent after multiple rechecks, then most likely the master and slave data really are inconsistent. A minimal sketch of this verification flow is shown below.
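In this sketch the table, columns, and time window are illustrative, and an external script is assumed to run the second query against both the master and the slave, rechecking any mismatching rows:

```sql
-- Step 1 (run on the master): collect primary keys of recently changed rows,
-- using the last_modify_date column required by the table design specification.
SELECT id
FROM   orders
WHERE  last_modify_date >= NOW() - INTERVAL 10 MINUTE;

-- Step 2 (run on BOTH master and slave for the ids from step 1): compare a per-row checksum.
-- Rows whose checksum differs are rechecked several times before being reported,
-- in case the change simply has not replicated yet.
SELECT id,
       CRC32(CONCAT_WS('#', order_no, user_id, amount, status, last_modify_date)) AS row_checksum
FROM   orders
WHERE  id IN (1, 2, 3);   -- ids collected in step 1
```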

The logic of the verification service is relatively simple, but verifying live business data online does involve some challenges. Isn’t that exactly where we DBAs add value?

Summary #

Today, we have learned about high availability architecture design for financial-grade systems. The content is very practical, and I recommend reading it repeatedly. Although the principles are not complex, the implementation details require continuous refinement, and you are welcome to discuss them with me as you design your own architectures. To summarize:

  1. For core business replication, make sure lossless semi-synchronous replication is configured.
  2. For same-city disaster recovery, use a three-data-center architecture (“one city, three centers” or “two cities, three centers”), with network latency between data centers not exceeding 5 ms.
  3. For cross-city disaster recovery, use a “three-city, five-center” architecture, with cross-city data centers more than 200 kilometers apart and latency exceeding 25 ms.
  4. Cross-city disaster recovery architecture is generally used in scenarios where there are more reads than writes, such as user centers, due to the high network latency involved.
  5. In addition to replication for data synchronization, an additional verification program is needed for logical verification.
  6. Database-level logical verification can use the last_modify_date field to retrieve the recently modified records.