
21 Case Study: Distributed MySQL Cluster Tool Vitess Practical Analysis #

Of all the stateful applications deployed on Kubernetes, the most challenging example is arguably a MySQL cluster. Over the past decade, MySQL has been the database most widely used by developers, and almost everyone is familiar with the complexity of setting up a MySQL cluster with master/slave replication. When we deploy a MySQL cluster on Kubernetes, we have to consider how to leverage cloud-native features to build it. Rather than analyzing how to install a MySQL cluster by hand through YAML, this article summarizes real-world practice in building a cloud-native MySQL cluster with the distributed cluster tool Vitess.

Introduction to Vitess Tool #

Vitess is positioned as a tool for horizontally scaling MySQL database clusters. It first drew our attention when JD.com (one of the largest e-commerce companies in China) went fully cloud native for the 618 Shopping Festival, using Vitess for database sharding and cluster management. Next, let's quickly walk through using Vitess on Kubernetes.

Setting Up the Environment #

Deploy a single node with more than 8 GB of memory on AWS, and use K3s to quickly set up a Kubernetes environment.

# Set up a single-node Kubernetes cluster
curl https://releases.rancher.com/install-docker/19.03.sh | sh
curl -sfL https://get.k3s.io | sh -

# Download and install kubectl
curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.14.9/bin/linux/amd64/kubectl
chmod +x kubectl && mv kubectl /usr/local/bin/
# Install MySQL client
apt install mysql-client

# Download and install the latest version of vtctlclient
wget https://github.com/vitessio/vitess/releases/download/v8.0.0/vitess-8.0.0-7e09d0c.tar.gz
tar zxvf vitess-8.0.0-7e09d0c.tar.gz && cp vitess-8.0.0-7e09d0c/bin/vtctlclient /usr/local/bin/

# Download the Vitess operator example
git clone https://github.com/vitessio/vitess.git
cd vitess/examples/operator
k3s kubectl apply -f operator.yaml

root@ip-172-31-27-203:~/vitess/examples/operator# k3s kubectl get po
NAME                               READY   STATUS    RESTARTS   AGE                                                                                                               
vitess-operator-784458658c-mzhzx   1/1     Running   0          59s 

# Initialize the cluster
root@ip-172-31-27-203:~/vitess/examples/operator# k3s kubectl apply -f 101_initial_cluster.yaml
vitesscluster.planetscale.com/example created
secret/example-cluster-config created

root@ip-172-31-27-203:~/vitess/examples/operator# k3s kubectl get pods
NAME                                             READY   STATUS    RESTARTS   AGE
vitess-operator-784458658c-mzhzx                 1/1     Running   0          3m38s
example-etcd-faf13de3-2                          1/1     Running   0          111s
example-etcd-faf13de3-1                          1/1     Running   0          111s
example-etcd-faf13de3-3                          1/1     Running   0          111s
example-zone1-vtctld-1d4dcad0-68484d7b88-428dc   1/1     Running   2          111s
example-zone1-vtgate-bc6cde92-c6499cf87-w86rz    1/1     Running   2          111s
example-vttablet-zone1-2469782763-bfadd780       3/3     Running   2          111s
example-vttablet-zone1-2548885007-46a852d0       3/3     Running   2          111s

To facilitate connecting to the Vitess proxy, we need to initialize the port forwarding environment:

./pf.sh &
alias vtctlclient="vtctlclient -server=localhost:15999"
alias mysql="mysql -h 127.0.0.1 -P 15306 -u user"

Load the database table structure:

vtctlclient ApplySchema -sql="$(cat create_commerce_schema.sql)" commerce
vtctlclient ApplyVSchema -vschema="$(cat vschema_commerce_initial.json)" commerce

Access the MySQL server through the Vitess proxy using the mysql client:

~/vitess/examples/operator$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.9-Vitess MySQL Community Server (GPL)

... (MySQL console output)

mysql> show databases;
+-----------+
| Database  |
+-----------+
| commerce  |
+-----------+
1 row in set (0.00 sec)

So far, our experience is similar to installing a local MySQL Server. This transparent experience makes us want to explore more advanced features.

The following diagram illustrates the architectural components of Vitess, and we need to become familiar with these terms:

(Figure: Vitess architecture)

Topology

The topology service is a metadata storage object that contains information about running servers, sharding schemes, and the replication graph. The topology is backed by a consistent data store, with etcd2 being the default plugin. You can view topology information using vtctl (command line) and vtctld (web).

VTGate

VTGate is a lightweight proxy server that routes traffic to the correct VTTablet and returns merged results to the client. Applications make queries to VTGate. The client usage is very simple—it only needs to be able to find an instance of VTGate to use Vitess.

VTTablet

VTTablet is a proxy server that sits in front of a MySQL database and tries to maximize throughput while protecting MySQL from harmful queries. Its features include connection pooling, query rewriting, and deduplication of identical concurrent queries.

Keyspace

A keyspace is a logical database. If sharding is used, a keyspace maps to multiple MySQL databases; if sharding is not used, a keyspace directly maps to a MySQL database name. From the perspective of the application, a keyspace appears as a single database.

Reading data from a keyspace is similar to reading data from a MySQL database. However, depending on the consistency requirements of the read operation, Vitess may fetch data from the master database or a replica. By routing each query to the appropriate database, Vitess allows your code to be structured as if it were reading from a MySQL database.
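As a sketch of this transparency, VTGate lets a client address either the whole keyspace or, for debugging, a single shard, using a standard USE statement. The statements below are illustrative only; the backtick `keyspace/shard` syntax follows the Vitess documentation, and a shard name such as `-80` only exists once the keyspace has actually been sharded (which happens later in this walkthrough):

```sql
-- Address the keyspace as one logical database;
-- VTGate routes or scatters the query across shards as needed.
USE customer;
SELECT customer_id, email FROM customer;

-- Address one specific shard explicitly (mainly useful for debugging).
USE `customer/-80`;
SELECT customer_id, email FROM customer;
```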

Introduction to Advanced Features of Vitess #

Sharding is a method of horizontally partitioning a database to store data on two or more database servers. Here, we will explain how sharding works in Vitess and the types of sharding supported by Vitess.

In Vitess, a keyspace can be sharded or unsharded. An unsharded keyspace can be directly mapped to a MySQL database. If it is sharded, the rows of the keyspace are split across different databases with the same schema.
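The mapping from rows to shards is declared in the keyspace's VSchema. The fragment below is a minimal sketch in the spirit of the example's vschema_customer_sharded.json, assuming a hash vindex on the customer_id column (field names follow the Vitess VSchema format):

```json
{
  "sharded": true,
  "vindexes": {
    "hash": { "type": "hash" }
  },
  "tables": {
    "customer": {
      "column_vindexes": [
        { "column": "customer_id", "name": "hash" }
      ]
    }
  }
}
```

Here the hash vindex maps each customer_id to a keyspace id, and the keyspace id in turn determines which shard (e.g. `-80` or `80-`) owns the row.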

For example, if the "User" keyspace of an application is divided into two shards, each shard contains approximately half of the application's records. Similarly, each user's information is stored in only one shard.

Note that sharding is orthogonal to (MySQL) replication. A Vitess shard typically consists of a primary MySQL instance and multiple replica instances. The primary instance handles write operations, while the replicas handle read traffic, batch operations, and other tasks. Apart from some replication lag, each MySQL instance within a shard should have the same data.

| Requirement | Action |
| --- | --- |
| Increase read capacity uniformly | Add replicas or split shards |
| Increase write capacity uniformly | Split shards |
| Reclaim excess resources | Merge shards or keyspaces |
| Increase geographical diversity | Add new cells and replicas |
| Handle hot tables | Add extra replicas or split shards for read-heavy tables; split shards directly for write-heavy tables |

Applying the new VSchema tells Vitess that the keyspace is sharded, which may cause some complex queries to be rejected. It is best to validate this step before proceeding. If you notice that certain queries start failing, you can always revert to the old VSchema temporarily. Make sure to fix all failing queries before starting the Reshard process.

vtctlclient ApplySchema -sql="$(cat create_commerce_seq.sql)" commerce
vtctlclient ApplyVSchema -vschema="$(cat vschema_commerce_seq.json)" commerce
vtctlclient ApplySchema -sql="$(cat create_customer_sharded.sql)" customer
vtctlclient ApplyVSchema -vschema="$(cat vschema_customer_sharded.json)" customer

At this point, you have finalized your sharded VSchema and reviewed all queries to ensure they remain valid. Now it’s time for Resharding.

Resharding splits existing shards into smaller ones, which is the kind of resharding Vitess handles best. In some cases you may want to introduce a new shard and direct newly created rows to it; in Vitess, this too is achieved by splitting a shard.
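In this walkthrough the split is driven by 302_new_shards.yaml, which updates the VitessCluster object so the customer keyspace gains the target shards. The fragment below is a heavily abridged sketch; the field names follow the PlanetScale Vitess operator CRD, and the real file in vitess/examples/operator carries the full tablet pool and replication spec that is elided here:

```yaml
apiVersion: planetscale.com/v2
kind: VitessCluster
metadata:
  name: example
spec:
  keyspaces:
    - name: customer
      partitionings:
        # Original unsharded partitioning, kept while traffic is migrated.
        - equal:
            parts: 1
            shardTemplate: {}   # abridged: tablet pools, replication, etc.
        # New two-way partitioning, producing shards -80 and 80-.
        - equal:
            parts: 2
            shardTemplate: {}   # abridged
```

Once both partitionings exist, the operator runs tablets for the old and new shards side by side, and the Reshard workflow below copies and then keeps the data in sync until traffic is switched.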

kubectl apply -f 302_new_shards.yaml

# Restart the port forwards so they pick up the new pods
killall kubectl
./pf.sh &

# Start the Reshard through the operator
vtctlclient Reshard customer.cust2cust '-' '-80,80-'

After Resharding is completed, we can use VDiff to check the integrity of the data and ensure that our source and target shards are consistent.

vtctlclient VDiff customer.cust2cust
# Returns the following content
Summary for customer: {ProcessedRows:5 MatchingRows:5 MismatchedRows:0 ExtraRowsSource:0 ExtraRowsTarget:0}
Summary for corder: {ProcessedRows:5 MatchingRows:5 MismatchedRows:0 ExtraRowsSource:0 ExtraRowsTarget:0}

Manually switch read and write operations to the new shards. Ensure that the database is functioning properly:

vtctlclient SwitchReads -tablet_type=rdonly customer.cust2cust
vtctlclient SwitchReads -tablet_type=replica customer.cust2cust

vtctlclient SwitchWrites customer.cust2cust

mysql --table < ../common/select_customer-80_data.sql
Using customer/-80
Customer
+-------------+--------------------+
| customer_id | email              |
+-------------+--------------------+
|           1 | alice@domain.com   |
|           2 | bob@domain.com     |
|           3 | charlie@domain.com |
|           5 | eve@domain.com     |
+-------------+--------------------+
COrder
+----------+-------------+----------+-------+
| order_id | customer_id | sku      | price |
+----------+-------------+----------+-------+
|        1 |           1 | SKU-1001 |   100 |
|        2 |           2 | SKU-1002 |    30 |
|        3 |           3 | SKU-1002 |    30 |
|        5 |           5 | SKU-1002 |    30 |
+----------+-------------+----------+-------+

mysql --table < ../common/select_customer80-_data.sql
Using customer/80-
Customer
+-------------+----------------+
| customer_id | email          |
+-------------+----------------+
|           4 | dan@domain.com |
+-------------+----------------+
COrder
+----------+-------------+----------+-------+
| order_id | customer_id | sku      | price |
+----------+-------------+----------+-------+
|        4 |           4 | SKU-1002 |    30 |
+----------+-------------+----------+-------+

Summary #

After adopting the Vitess Operator, the biggest benefit is no longer having to worry about the architecture of a MySQL replication cluster. The operator manages high availability and database sharding, shielding us from the complexity of distributed deployment and operations. One caveat for operators: because Vitess is a proxy, its protocol differs in places from native MySQL, and applications may need some adaptation. With JD.com relying on Vitess to support its database clusters during the 618 Shopping Festival, we can adopt it with confidence.
