
21 Database Backup - Check Backup Files as Well #

In the previous lessons, we learned about high availability architecture design. Remember that high availability only ensures the continuity of your business: when a disaster strikes, a MySQL database built on the replication or InnoDB Cluster technology discussed in Lesson 20 can fail over to a surviving node.

In addition to high availability design, an architect should also focus on designing a backup architecture. This is because we need to prevent unexpected incidents, such as hackers deleting all the core data from the database or an employee intentionally or unintentionally deleting data from the production environment.

Incidents of deliberate data deletion are not uncommon and make headlines every few months. Take, for example, the Weimei incident of 2020:

On the night of February 23rd, a core employee of Weimei deleted the company’s databases, and its SaaS business suddenly collapsed. The business applications relying on Weimei all went offline, leaving 3 million merchants almost unable to conduct business. Weimei itself also suffered huge losses, shedding over 2 billion Hong Kong dollars in market value within a few days.

We can see that destructive data deletion not only affects business continuity but also causes immeasurable economic damage to the company. Therefore, in this lesson, we will learn about “how to design a complete backup system”.

Database Backup #

Replication or InnoDB Cluster technology is only responsible for the availability of the business. To ensure data security, we need to build a complete offline backup system in addition to the online replica databases. This way, even if the online database is completely destroyed, users can still recover data from the offline backup.

Therefore, the first step is to: isolate the permissions between the online database and the offline backup system.

In other words, whoever has access to the online database should not have access to the offline backup system, and vice versa. Otherwise, a single compromised or malicious account could destroy both copies of the data, leaving nothing to recover from.

For MySQL databases, database backups can be classified into full backups and incremental backups.

Full Backup #

A full backup refers to backing up all the data in the database at a given point in time. Depending on the backup content, there are two types of full backups: logical backup and physical backup.

  • Logical Backup

A logical backup refers to backing up the logical content of the database, which means backing up the content of each table using INSERT statements.

MySQL provides logical backup tools such as mysqldump and mysqlpump. To perform a backup using mysqldump, run the following command:

mysqldump -A --single-transaction > backup.sql

The above command performs a full logical backup using mysqldump:

  1. The -A parameter backs up all databases.
  2. The --single-transaction parameter performs a consistent backup.

I want to emphasize that the --single-transaction parameter is a must-have. Without this parameter, the content of the backup file will be inconsistent, and such a backup is almost meaningless.

If you often forget the --single-transaction parameter, you can add the following reminder to the MySQL configuration file:

# my.cnf
[mysqldump]
single-transaction

With the above configuration, whenever the mysqldump command is executed on the server, the --single-transaction parameter will be automatically added, and you will no longer forget.

In the above command, the final backup file is named backup.sql. Opening this file, we will see content similar to the following:

-- MySQL dump 10.13  Distrib 8.0.23, for Linux (x86_64)
--
-- Host: localhost    Database:
-- ------------------------------------------------------
-- Server version       8.0.23

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!50503 SET NAMES utf8mb4 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!50606 SET @OLD_INNODB_STATS_AUTO_RECALC=@@INNODB_STATS_AUTO_RECALC */;
/*!50606 SET GLOBAL INNODB_STATS_AUTO_RECALC=OFF */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Current Database: `mysql`
--

CREATE DATABASE /*!32312 IF NOT EXISTS*/ `mysql` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci */ /*!80016 DEFAULT ENCRYPTION='N' */;

USE `mysql`;

...

As you can see, backup.sql is essentially a text file containing one SQL statement after another; this is what we call a logical backup. Restoring a logical backup is very simple: pipe the file into the mysql client with the following command:

mysql < backup.sql
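In practice, the dump is usually compressed and stamped with a date. Below is a minimal sketch of such a wrapper; the backup directory and file naming are my own illustrative choices, and credentials are assumed to come from the [client] section of ~/.my.cnf:

```shell
#!/bin/sh
# Hypothetical mysqldump wrapper; paths and names are illustrative.
DAY=$(date +%F)        # e.g. 2024-01-31
BACKUP_DIR=/bak        # hypothetical backup directory

# --routines and --events also dump stored programs, which a plain
# data dump does not include by default.
mysqldump -A --single-transaction --routines --events \
  | gzip > "$BACKUP_DIR/full-$DAY.sql.gz"
```

The compressed file restores just as easily: `gunzip -c full-2024-01-31.sql.gz | mysql`.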

Although mysqldump is simple and easy to use, it performs backups in a single-threaded manner, which means it can be slow. Therefore, MySQL introduced the mysqlpump tool.

The mysqlpump command is used almost the same way as mysqldump, with the only difference being that it allows you to set the number of backup threads. For example:

mysqlpump -A --single-transaction --default-parallelism=8 > backup.sql

The above command demonstrates a backup using mysqlpump. The --default-parallelism parameter sets the number of parallel backup threads. In addition, unlike mysqldump, mysqlpump displays backup progress while it runs. However, I do not recommend using mysqlpump in a real production environment, because it cannot create a consistent backup when the number of concurrent backup threads is greater than 1; mysqlpump itself prints a warning to that effect.

Furthermore, mysqlpump’s multithreading parallelizes across tables, so a single very large table is still backed up by one thread. Is there, then, a logical backup tool that supports consistent parallel backups at the record level?

Yes, there is, and that is the open-source tool called mydumper, available at: https://github.com/maxbube/mydumper. Mydumper has the following powerful features:

  1. Supports consistent backups.
  2. Allows for shard-based backups based on table records, enabling multithreaded backups.
  3. Supports multithreaded recovery operations.
  4. Allows for multithreaded recovery on a per-table basis.

As you can see, mydumper is almost a perfect logical backup tool and is the preferred tool for building backup systems. I will provide you with a simple example of how to use mydumper:

mydumper -o /bak -r 100000 --trx-consistency-only -t 8

In the above command, the backup files are saved to the directory /bak, where:

  1. The -r parameter splits each table into chunks of 100,000 records per output file.
  2. The --trx-consistency-only parameter requests a consistent backup.
  3. The -t parameter runs 8 backup threads in parallel.

Even a single large table can thus be backed up by 8 threads, each exporting chunks of 100,000 records at a time, which greatly improves backup performance.
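Restoring a mydumper backup uses its companion tool myloader, which replays the per-table backup files in parallel. A minimal sketch, mirroring the directory and thread count of the backup command above:

```shell
# Restore a mydumper backup from /bak with 8 parallel threads.
# -o (--overwrite-tables) drops and recreates tables that already
# exist on the target instance.
myloader -d /bak -t 8 -o
```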

  • Physical Backup

Logical backups are useful, but they can take a long time: fundamentally, a logical backup reads every row with SELECT statements, and a restore replays them all as INSERT statements.

In contrast, physical backups directly backup the database’s physical tablespace files and redo logs, without the need to retrieve data through logical SELECT statements. Therefore, physical backups are generally faster than logical backups and also have faster recovery speeds.

However, unlike mydumper, a physical backup can only restore an entire instance’s data; it cannot restore individual tables. For physical backups, MySQL 8.0 provides the official Clone Plugin.

The Clone Plugin is a physical backup tool introduced in MySQL 8.0.17. To use it, install the plugin and add the following settings to the configuration file:

[mysqld]
plugin-load-add=mysql_clone.so
clone=FORCE_PLUS_PERMANENT

After that, you can perform physical backups using the following command:

mysql> CLONE LOCAL DATA DIRECTORY = '/path/to/clone_dir';

As you can see, running the CLONE LOCAL statement in the mysql command-line client produces a physical backup of the local instance.

The strength of the Clone Plugin lies in its ability to perform remote physical backups. The command for remote physical backups is shown below:

CLONE INSTANCE FROM 'user'@'host':port
IDENTIFIED BY 'password'
[DATA DIRECTORY [=] 'clone_dir']
[REQUIRE [NO] SSL];

As the syntax shows, the Clone Plugin can connect to a remote instance with a username and password and clone its data to the current server. This makes it easy to build a backup system on top of the Clone Plugin.
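A remote clone needs a bit of setup on both sides: the donor must grant the BACKUP_ADMIN privilege to the backup user, and the recipient must whitelist the donor in clone_valid_donor_list before issuing CLONE INSTANCE. A hedged sketch, where the host names, user, and password are placeholders:

```shell
# On the donor (the server being backed up): create a backup user
# with the BACKUP_ADMIN privilege that clone requires.
mysql -e "CREATE USER IF NOT EXISTS 'clone_user'@'%' IDENTIFIED BY 'clone_pass';"
mysql -e "GRANT BACKUP_ADMIN ON *.* TO 'clone_user'@'%';"

# On the recipient: the donor must be whitelisted, and the account
# running CLONE INSTANCE needs the CLONE_ADMIN privilege.
mysql -e "SET GLOBAL clone_valid_donor_list = 'donor-host:3306';"
mysql -e "CLONE INSTANCE FROM 'clone_user'@'donor-host':3306 IDENTIFIED BY 'clone_pass';"
```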

For versions of MySQL before 8.0, we can use the third-party open-source tool Xtrabackup, official website: https://github.com/percona/percona-xtrabackup.

However, the mechanism of physical backup is more complex than logical backup, and it requires an in-depth understanding of the implementation of the MySQL database kernel. I strongly recommend using the official MySQL physical backup tool, and open-source third-party physical backup tools should only be used as auxiliary means in certain scenarios.

Incremental Backup #

The logical and physical backups we have covered so far are all full backups, meaning they back up the entire database. However, the data in the database changes continuously, and it is not practical to take a full backup of the entire database every minute.

Therefore, we need to build a complete backup strategy using the “full backup + incremental backup” approach. Incremental backup is about backing up the log files, which in MySQL are the binary log files.

Since the binary log files contain all the modifications to the database, “full backup + incremental backup” allows us to achieve point-in-time recovery, which means we can restore to any point in time using “full backup + incremental backup”.

During a full backup, the corresponding point-in-time is recorded, usually in the form of a GTID position. Incremental backups can replay the logs after this point, enabling recovery to a specific point in time.

In case any destructive operations are present in the binary logs, these operations can be skipped, and the subsequent binary logs can be replayed, allowing for disaster recovery in extreme scenarios.
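For example, if the binary log position of the destructive statement is known, you can replay everything before it and everything after it, skipping the bad event. The positions below are hypothetical; read the real ones from the output of mysqlbinlog on the affected log file:

```shell
# Replay events up to, but not including, the destructive statement...
mysqlbinlog --stop-position=1234 binlog.000002 | mysql -u root -p

# ...then resume replay from the first event after it.
mysqlbinlog --start-position=1456 binlog.000002 | mysql -u root -p
```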

To perform incremental backups of MySQL’s binary logs in near real-time, we can use the following command:

mysqlbinlog --read-from-remote-server --host=host_name --raw --stop-never binlog.000001

As we can see, incremental backup is done using the mysqlbinlog tool we learned about earlier, but this time with the additional parameter --read-from-remote-server, which indicates that binary logs can be pulled from a remote MySQL server specified by the --host parameter.

The --raw parameter indicates that the backup should be done in binary format, and the --stop-never parameter indicates that the process should never stop, meaning it will continue pulling and saving the logs indefinitely. The binlog.000001 parameter indicates the starting point for pulling the logs.

The essence of MySQL incremental backup is that mysqlbinlog simulates a replica server: the source server continuously streams its binary logs to this pseudo-replica, achieving incremental backup through the replication technology discussed earlier.

The recovery of an incremental backup involves parsing the binary logs using mysqlbinlog and then performing the recovery, for example:

mysqlbinlog binlog.000001 binlog.000002 | mysql -u root -p
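To stop the replay at a specific moment instead of at the end of the logs, mysqlbinlog’s --stop-datetime option can be used; the timestamp below is illustrative:

```shell
# Point-in-time recovery: replay the incremental logs only up to the
# moment just before the disaster.
mysqlbinlog --stop-datetime="2024-01-31 09:59:59" \
  binlog.000001 binlog.000002 | mysql -u root -p
```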

Backup Strategy #

With the knowledge of full and incremental backups, we can now build our own backup strategy.

First, we need to set the frequency of full backups. Since full backups can be large, it is recommended to perform a full backup once a week and have a real-time incremental backup. This means that in the worst-case scenario, we would need to restore a full backup from 7 days ago and then apply the incremental backups from the past 7 days.

Backup files also need to be backed up. We should not assume that the storage medium for backup files will never fail. Therefore, backup files should be stored on at least 2 different storage servers in different data centers, meaning that backup files should have at least 2 copies. As for the retention period of backup files, it depends on the requirements of each company (some may require permanent retention, while others may require retention for at least the past 3 months).
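As a sketch, the weekly full backup and the second offsite copy can be driven by something as simple as cron. The schedule, paths, hosts, and the backup script itself are all hypothetical:

```shell
# /etc/cron.d/mysql-backup  (hypothetical schedule and paths)

# Full backup every Sunday at 02:00; full_backup.sh is assumed to run
# mysqldump/mydumper/CLONE and write its output to /bak.
0 2 * * 0  mysql  /usr/local/bin/full_backup.sh

# Every day at 06:00, copy the backup files to a second storage
# server in another data center, so at least 2 copies exist.
0 6 * * *  mysql  rsync -a /bak/ backup2.dc2.example.com:/bak/
```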

All these backup strategies require their own backup systems for scheduling, and there are no particularly good open-source projects for this. It needs to be customized and developed according to the specific business needs.

Backup File Verification #

In my opinion, the backup system is critical, no less important in production than the high availability system.

In Lesson 18, we talked about the high availability architecture of online master-slave replication, which likewise requires data verification between the master and the slave to ensure consistency.

Similarly, backup files also need to be verified to ensure their correctness so that they can be used for recovery in case of a real disaster. Therefore, the backup system also needs a backup file verification feature.

The general logic of backup file verification is to perform a full restore, then perform a recovery using incremental backups, connect the recovered MySQL instance to the online MySQL server as a slave server, and then perform data verification again.
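The flow above can be sketched as a script: restore the full backup into a scratch instance, apply the incremental logs, then compare checksums against production. All host names, credentials, and the sample table below are illustrative, and a dedicated tool such as pt-table-checksum is commonly used instead of the manual comparison shown here:

```shell
#!/bin/sh
# Hypothetical backup-verification flow for a scratch MySQL instance.

# 1. Restore the latest full backup into the scratch instance.
gunzip -c /bak/full-latest.sql.gz | mysql -h verify-host

# 2. Apply the incremental binary logs on top of it.
mysqlbinlog /bak/binlog.* | mysql -h verify-host

# 3. Compare a table checksum between production and the restored copy.
mysql -h prod-host   -N -e "CHECKSUM TABLE shop.orders" > /tmp/prod.sum
mysql -h verify-host -N -e "CHECKSUM TABLE shop.orders" > /tmp/bak.sum
diff /tmp/prod.sum /tmp/bak.sum && echo "backup verified OK"
```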

Remember, only a backup that has passed verification can be relied on for recovery, so backup files themselves must also be checked.

Conclusion #

Today we learned how to build a backup strategy for MySQL. First, we discussed the logical and physical backups of the MySQL database. Then we learned about incremental backups using mysqlbinlog. By combining full backups and incremental backups, we can build a complete backup strategy. Finally, we emphasized the need to check backup files to ensure their integrity.

I hope that after completing this lecture, you can apply what you have learned to your production environment and build a stable and reliable backup system that can recover data even in the event of a catastrophic scenario like accidental database deletion.