19 High Availability Components How to Choose Among So Many Options

19 High Availability Components - How to Choose Among So Many Options #

In lessons 17 and 18, we learned about high availability solutions for MySQL databases and how to design a three-data center disaster recovery and a cross-data center disaster recovery for financial businesses using lossless semi-synchronous replication.

However, in the event of a database failure, MySQL’s master-slave replication does not automatically switch over. This requires a high availability suite to manage the master-slave replication.

In this lesson, we will learn about commonly used high availability suites for MySQL. After learning the content today, we hope you will understand the implementation principles of high availability suites and be able to use them in your production environment.

High Availability Suites #

MySQL’s high availability suites are responsible for database failover operations. When a database failure occurs, MySQL can remove the original host, select a new host, and continue providing services to ensure business continuity.

As we can see, MySQL replication is the technical foundation for high availability, used to synchronize data to the slave in real-time. The high availability suite is the solution for implementing high availability in MySQL, responsible for switching to a new master.

To ensure that the business does not perceive the database failover, VIP (Virtual IP) technology is used here. VIP is not a real physical IP, but can be bound to any server.

When accessing the database, it is not the physical IP bound to the network card on the server that the business accesses, but the VIP on this server. When a database server fails, the high availability suite will transfer the VIP to a new server. After failover, the business still accesses the VIP, so using VIP can achieve transparency to the business.

The following image shows how the business accesses the database through VIP:

Drawing 0.png

From the image above, we can see that the IP address of MySQL’s primary server is 192.168.1.10, and the IP addresses of the two slave servers are 192.168.1.20 and 192.168.1.30 respectively.

The upper-layer service does not directly access the physical IP 192.168.1.10, but accesses the VIP with the address 192.168.1.100. When the MySQL database primary server fails, the following processing occurs:

Drawing 1.png

We can see that after failover, because the upper-layer service accesses the VIP 192.168.1.100, the switch is transparent to the service, only that during the switch, the service will receive a notification of connection failure to the database. However, through the retry mechanism, when the underlying database completes the switch, the service can continue to be used. Therefore, the upper-layer service must have good error retry logic, otherwise even if the VIP is enabled, transparent switching cannot be achieved.

However, VIP also has limitations, limited to IP settings within the same data center and network segment. If it is the three-data center disaster recovery architecture we designed before, VIP cannot be used. In this case, a name service is required, and the commonly used name service is DNS (Domain Name Service), as shown below:

Drawing 2.png

From the image above, we can see that the domain name m1.insidemysql.com is mapped to the IP 192.168.1.10, and the upper-layer business accesses it through the domain name. When a failure occurs and a data center-level switch is performed, the result is as follows:

Drawing 3.png

We can see that after failover, the high availability suite will change the DNS to point to the new MySQL primary server, with an IP address of 202.177.54.20, which also achieves transparency to the upper-layer service.

Although using a domain name or other name service can solve the problem of switching across data centers, it introduces new components. The high availability of the new component also needs to be paid special attention. When designing the architecture, please consult the team that provides the name service and design the high availability disaster recovery architecture together with them.

Having understood the above transparent switching mechanism for high availability, let’s take a look at several commonly used high availability suites for MySQL in the industry.

MHA #

MHA (Master High Availability) is an open-source MySQL high availability program that provides automating master failover for MySQL database master-slave replication architectures.

MHA was developed by Yoshinori Matsunobu, a renowned engineer at Facebook. The open-source address is: https://github.com/yoshinorim/mha4mysql-manager. It consists of two major components, MHA Manager and MHA Node.

The MHA Manager is usually deployed on a server to determine the availability of multiple MySQL high availability groups. When a master server fails, it initiates failover operations. The MHA Manager can be regarded as the central control server for failover.

The MHA Node is deployed on each MySQL server, and the MHA Manager completes the failover switch operation by executing the scripts of the Node component. The communication between MHA Manager and MHA Node is done via SSH, which means that the SSH policy needs to be established between MHA Manager and all MySQL nodes in the production environment. However, this creates potential security risks.

Furthermore, SSH communication is not particularly efficient. Therefore, MHA is more suitable for smaller companies with no more than 20 MySQL database servers.

Drawing 4.png!

Orchestrator #

Orchestrator is another open-source MySQL high availability suite that not only supports failover, but also allows for simple replication management operations. The open-source address for Orchestrator is: https://github.com/openark/orchestrator

You can think of Orchestrator as an upgraded version of MHA, which provides an HTTP interface for managing databases. This is more convenient than having to log in to the MHA Manager server each time.

The following diagram shows the high availability design architecture of Orchestrator:

Drawing 5.png

The basic implementation principle is the same as MHA, except that the metadata information is stored in a metadata database and provides access through both HTTP interfaces and commands, making it more user-friendly.

However, because the management of the MySQL database from the control node to the underlying MySQL databases still uses SSH, similar weaknesses to MHA still exist. In summary, I would like to remind you that Orchestrator is still only recommended for use in small-scale database clusters.

Database Management Platform #

Of course, although both MHA and Orchestrator can perform failover operations for MySQL high availability, in a production environment where thousands or even tens of thousands of database servers need to be managed, the communication method using only SSH does not meet the security and performance requirements.

As a result, almost every internet company develops their own database management platform to manage all database clusters and handle database failover tasks.

Next, I would like to provide you with a detailed introduction to the architecture of a database management platform. The following diagram shows a rough framework of the database management platform:

Drawing 6.png

The database management platform shown in the diagram is the user’s entry point for operating the database. Most database operations, such as database initialization, data querying, and data backup, can be completed on this platform without the need to log in to the database servers. This greatly improves the efficiency of database operations.

The database management platform provides HTTP APIs to support various access methods such as web and mobile through a front-end and back-end separation.

The metadata database is used to store all node information of the managed MySQL databases, such as IP addresses, ports, and domain names.

The manager of the database management platform is responsible for controlling all the MySQL nodes below it. Communication between the manager and the backend MySQL is done through agents deployed on the MySQL servers. They communicate using the gRPC protocol with a binary packet (BP). This solves the security and performance issues associated with SSH.

The agent is responsible for reporting the status of each database node to the manager. The manager determines whether a database is down and needs to be switched, and to which node it should be switched based on the reported information.

With this design, a basic database management platform can be implemented. Additionally, each company has its own specific requirements, which can also be integrated into the database management platform, such as security requirements, auditing needs, and ticketing systems.

Therefore, with a database management platform, database failover, daily management, and access can all be automatically handled by the platform. It is only with a database management platform that true autonomous management of databases can be achieved.

Summary #

In this lesson, we primarily learned about high availability suites for MySQL databases. We know that MySQL replication is the technical foundation for achieving high availability, but failover operations can only be performed with the help of high availability suites. MySQL replication technology itself cannot achieve failover functionality.

To achieve transparent data switching, a virtual IP (VIP) and a name service mechanism can be used. VIPs are only used within the same data center and same network segment, while name servers, such as domain names, can be used to switch between different data centers.

Commonly used high availability suites for MySQL include MHA and Orchestrator, both of which can perform failover operations. However, due to the management node’s communication with MySQL using the SSH protocol, security is not high and performance is average. It is generally recommended to use them in environments with no more than 20 database nodes.

For scenarios where a large number of MySQL databases need to be managed, developing a self-built database management platform is recommended. This can combine the specific characteristics of each company and design an automated management platform for MySQL databases, thereby freeing up the productivity of DBAs to focus on optimizing business operations.