23 Practical Implementation of Disaster Recovery for Storage Objects in Kubernetes (K8s) Clusters #
When it comes to disaster recovery for storage objects, imagine a scenario where the cluster machines crash while a pod with mounted volumes is running. How should we handle the fault tolerance of storage objects? While high availability of applications is the ideal, disaster recovery has always been the last barrier. In many extreme cases, fault-tolerant backups are what keep services running.
In the era of virtual machines, we improved the reliability of business operations by evenly distributing applications to various virtual machines and regularly executing data backups. Now, as we upgrade to the Kubernetes era, all businesses are managed by Kubernetes, and the cluster can quickly schedule and self-maintain the container status of applications. It can easily scale resources to cope with unexpected situations.
As I say this, it seems like there is no need to worry much about storage. As long as the mounted volume is network storage, even if the application cluster fails, the data is still there and there doesn’t seem to be much of an issue. So, what is the significance of learning about storage object disaster recovery?
Well, things are not as simple as they seem. We need to consider scenarios close to the real business environment and test, by deliberately disrupting the cluster’s state, whether the storage objects can survive destructive failures.
Our goal in upgrading from the virtual machine era to the Kubernetes era is to use dynamically scalable resources to minimize business downtime: we want applications to scale and self-heal on demand. So, in the Kubernetes era, the question is not only whether data is lost, but whether we can make business recovery fast enough to be imperceptible to users. Is this achievable?
I believe that Kubernetes is already close to achieving this goal through the constantly improving resource objects. So, here, I will share with you the practical experience of implementing various storage object disaster recovery in Kubernetes, so that we are prepared when the need arises.
Experience in Implementing Disaster Recovery for NFS Storage Objects #
First, we need to understand how to create a PV/PVC for an NFS network volume, paying particular attention to the `mountOptions` parameter. Use the following example as a reference:
```yaml
### nfs-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: nfs
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /opt/k8s-pods/data   # Specify the mount point of NFS
    server: 192.168.1.40       # Specify the address of the NFS server
---
### nfs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  storageClassName: nfs
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
```
In this example, the `PersistentVolume` is of type NFS, so the node needs the helper program `/sbin/mount.nfs` to mount the NFS file system.
```bash
[kadmin@k8s-master ~]$ kubectl get pvc nfs-pvc
NAME      STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nfs-pvc   Bound    nfs-pv   10Gi       RWX            nfs            3m54s
[kadmin@k8s-master ~]$
[kadmin@k8s-master ~]$ kubectl get pv nfs-pv
NAME     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS   REASON   AGE
nfs-pv   10Gi       RWX            Recycle          Bound    default/nfs-pvc   nfs                     18m
```
Now, create a pod that mounts the NFS volume:
```yaml
### nfs-pv-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pv-pod
spec:
  volumes:
  - name: nginx-pv-storage
    persistentVolumeClaim:
      claimName: nfs-pvc
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
      name: "nginx-server"
    volumeMounts:
    - mountPath: "/usr/share/nginx/html"
      name: nginx-pv-storage
```
Execute the following command to create the pod:
```bash
[kadmin@k8s-master ~]$ kubectl create -f nfs-pv-pod.yaml
pod/nginx-pv-pod created
[kadmin@k8s-master ~]$
[kadmin@k8s-master ~]$ kubectl get pod nginx-pv-pod -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
nginx-pv-pod   1/1     Running   0          66s   172.16.140.28   k8s-worker-2   <none>           <none>
[kadmin@k8s-master ~]$ curl http://172.16.140.28
Hello, NFS Storage NGINX
```
When you mount an NFS volume in a Pod, you need to consider how to back up the data. Velero is a cloud-native backup and restore tool that can help us back up persistent data objects. An example of using Velero is as follows:
```bash
velero backup create backupName --include-cluster-resources=true --ordered-resources 'pods=ns1/pod1,ns1/pod2;persistentvolumes=pv4,pv8' --include-namespaces=ns1
```
Note that by default Velero cannot back up volume data itself; it integrates the open-source component restic to support volume backups. Be aware that this integration is still experimental and should not be used in a production environment.
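Beyond one-off backups, Velero can also run recurring backups through its Schedule custom resource. The following is only a sketch: the name, target namespace, cron expression, and retention period are illustrative, and the exact spec fields should be checked against your Velero version’s CRDs.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-ns1-backup
  namespace: velero          # the namespace where Velero is installed
spec:
  schedule: "0 3 * * *"      # every day at 03:00
  template:                  # same fields as a Backup spec
    includedNamespaces:
    - ns1
    ttl: 720h0m0s            # keep each backup for 30 days
```

Each run produces a named Backup object that can later be restored with `velero restore create --from-backup <backup-name>`.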
Ceph Data Backup and Restore #
Rook is a cloud-native management system for Ceph clusters. In a previous lesson, I showed you how to create a Ceph cluster using Rook. Now, let’s assume that the Ceph cluster has crashed and needs to be repaired manually. The steps are as follows:
Step 1: Stop the Ceph operator and turn off the controller for the Ceph cluster to prevent it from automatically recovering itself.
```bash
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0
```
Step 2: The Ceph monmap tracks the cluster’s monitors. Suppose the healthy monitor instance is rook-ceph-mon-b, while the unhealthy instances are rook-ceph-mon-a and rook-ceph-mon-c. First, back up the Deployment object of rook-ceph-mon-b:
```bash
kubectl -n rook-ceph get deployment rook-ceph-mon-b -o yaml > rook-ceph-mon-b-deployment.yaml
```
Patch the monitor Deployment so the mon container just sleeps, which lets us work inside it without the daemon running:

```bash
kubectl -n rook-ceph patch deployment rook-ceph-mon-b -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'
```
Enter the healthy monitoring instance:

```bash
kubectl -n rook-ceph exec -it <mon-pod> -- bash
```
Set a few simple variables:
```bash
cluster_namespace=rook-ceph
good_mon_id=b
monmap_path=/tmp/monmap
```
Extract the monmap to a file by running `ceph-mon` from the good mon deployment with the `--extract-monmap=${monmap_path}` flag:
```bash
ceph-mon \
  --fsid=41a537f2-f282-428e-989f-a9e07be32e47 \
  --keyring=/etc/ceph/keyring-store/keyring \
  --log-to-stderr=true \
  --err-to-stderr=true \
  --mon-cluster-log-to-stderr=true \
  --log-stderr-prefix=debug \
  --default-log-to-file=false \
  --default-mon-cluster-log-to-file=false \
  --mon-host=$ROOK_CEPH_MON_HOST \
  --mon-initial-members=$ROOK_CEPH_MON_INITIAL_MEMBERS \
  --id=b \
  --setuser=ceph \
  --setgroup=ceph \
  --foreground \
  --public-addr=10.100.13.242 \
  --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db \
  --public-bind-addr=$ROOK_POD_IP \
  --extract-monmap=${monmap_path}
```
Review the contents of the monmap:
```bash
monmaptool --print /tmp/monmap
```
Remove the bad mon(s) from the monmap:
```bash
monmaptool ${monmap_path} --rm <bad_mon>
```
In this example, we remove mons a and c:
```bash
monmaptool ${monmap_path} --rm a
monmaptool ${monmap_path} --rm c
```
Inject the modified monmap into the good mon by running `ceph-mon` with the `--inject-monmap=${monmap_path}` flag:
```bash
ceph-mon \
  --fsid=41a537f2-f282-428e-989f-a9e07be32e47 \
  --keyring=/etc/ceph/keyring-store/keyring \
  --log-to-stderr=true \
  --err-to-stderr=true \
  --mon-cluster-log-to-stderr=true \
  --log-stderr-prefix=debug \
  --default-log-to-file=false \
  --default-mon-cluster-log-to-file=false \
  --mon-host=$ROOK_CEPH_MON_HOST \
  --mon-initial-members=$ROOK_CEPH_MON_INITIAL_MEMBERS \
  --id=b \
  --setuser=ceph \
  --setgroup=ceph \
  --foreground \
  --public-addr=10.100.13.242 \
  --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db \
  --public-bind-addr=$ROOK_POD_IP \
  --inject-monmap=${monmap_path}
```
Edit the Rook configmap file:
```bash
kubectl -n rook-ceph edit configmap rook-ceph-mon-endpoints
```
Remove the expired a and c entries from the `data` field, keeping only the healthy mon:

```yaml
data: b=10.100.13.242:6789
```
Update the secret configuration:
```bash
mon_host=$(kubectl -n rook-ceph get svc rook-ceph-mon-b -o jsonpath='{.spec.clusterIP}')
kubectl -n rook-ceph patch secret rook-ceph-config -p '{"stringData": {"mon_host": "[v2:'"${mon_host}"':3300,v1:'"${mon_host}"':6789]", "mon_initial_members": "'"${good_mon_id}"'"}}'
```
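The quoting in that patch command is easy to get wrong, because the JSON payload alternates between single quotes (literal text) and double quotes (variable expansion). As a sanity check, you can expand the payload locally before handing it to kubectl; the IP below is the illustrative mon address from this example:

```shell
#!/bin/sh
# Expand the secret patch payload locally to verify the quoting.
# mon_host and good_mon_id mirror the values used in the example above.
mon_host=10.100.13.242
good_mon_id=b

payload='{"stringData": {"mon_host": "[v2:'"${mon_host}"':3300,v1:'"${mon_host}"':6789]", "mon_initial_members": "'"${good_mon_id}"'"}}'

echo "$payload"
# prints: {"stringData": {"mon_host": "[v2:10.100.13.242:3300,v1:10.100.13.242:6789]", "mon_initial_members": "b"}}
```

Once the printed JSON looks right, substitute the expanded payload back into the `kubectl patch secret` command.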
Restart the monitor instance:
```bash
kubectl replace --force -f rook-ceph-mon-b-deployment.yaml
```
Restart the operator:
```bash
# Scale the operator back up. It is safe to ignore errors that
# a number of resources already exist.
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1
```
Restoring Data for a Jenkins PVC Application #
If the Jenkins data is corrupted and you want to restore the Jenkins data directory, you can mount the PVC in a temporary pod and use `kubectl cp` to copy the backup in. Here are the steps:
- Get the security context of the current Jenkins container:

```bash
kubectl --namespace=cje-cluster-example get pods cjoc-0 -o jsonpath='{.spec.securityContext}'
map[fsGroup:1000]
```
- Shut down the container:

```bash
kubectl --namespace=cje-cluster-example scale statefulset/cjoc --replicas=0
statefulset.apps "cjoc" scaled
```
- Check the PVC:

```bash
kubectl --namespace=cje-cluster-example get pvc
```
- Mount the PVC in a temporary pod for data recovery:

```bash
cat <<EOF | kubectl --namespace=cje-cluster-example create -f -
kind: Pod
apiVersion: v1
metadata:
  name: rescue-pod
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 1000
  volumes:
  - name: rescue-storage
    persistentVolumeClaim:
      claimName: jenkins-home-cjoc-0
  containers:
  - name: rescue-container
    image: nginx
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello; sleep 10; done"]
    volumeMounts:
    - mountPath: "/tmp/jenkins-home"
      name: rescue-storage
EOF
pod "rescue-pod" created
```
- Copy the backup data into the temporary pod:

```bash
kubectl --namespace=cje-cluster-example cp oc-jenkins-home.backup.tar.gz rescue-pod:/tmp/
```
- Extract the archive into the mounted PVC:

```bash
kubectl exec --namespace=cje-cluster-example rescue-pod -it -- tar -xzf /tmp/oc-jenkins-home.backup.tar.gz -C /tmp/jenkins-home
```
- Delete the temporary pod:

```bash
kubectl --namespace=cje-cluster-example delete pod rescue-pod
```
- Restore the Jenkins container:

```bash
kubectl --namespace=cje-cluster-example scale statefulset/cjoc --replicas=1
```
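The steps above can be wrapped in a small helper that prints the restore plan for review before anything touches the cluster. This is only a sketch of a dry-run runbook: the namespace, StatefulSet, rescue pod, archive name, and mount path are the illustrative values used above, and `rescue-pod.yaml` stands in for the heredoc manifest.

```shell
#!/bin/sh
# Print a dry-run restore plan for a StatefulSet whose PVC is being
# repaired via a rescue pod. Nothing is executed; the operator reviews
# the plan and runs each command by hand.
print_restore_plan() {
  ns=$1; sts=$2; rescue_pod=$3; archive=$4; mount_path=$5
  echo "kubectl --namespace=${ns} scale statefulset/${sts} --replicas=0"
  echo "kubectl --namespace=${ns} create -f rescue-pod.yaml"
  echo "kubectl --namespace=${ns} cp ${archive} ${rescue_pod}:/tmp/"
  echo "kubectl --namespace=${ns} exec ${rescue_pod} -- tar -xzf /tmp/${archive} -C ${mount_path}"
  echo "kubectl --namespace=${ns} delete pod ${rescue_pod}"
  echo "kubectl --namespace=${ns} scale statefulset/${sts} --replicas=1"
}

print_restore_plan cje-cluster-example cjoc rescue-pod \
  oc-jenkins-home.backup.tar.gz /tmp/jenkins-home
```

Printing the plan first is deliberate: restores are rare and destructive, so a reviewable transcript is safer than an immediately executing script.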
Backing up a Kubernetes Cluster #
A Kubernetes cluster is a distributed system, and the primary reasons for backing up the cluster metadata are as follows:
- Quick recovery of control nodes rather than compute nodes
- Recovery of application containers
The backup steps on a single control-plane node are as follows:
```bash
# Backup certificates
sudo cp -r /etc/kubernetes/pki backup/

# Make etcd snapshot
sudo docker run --rm -v $(pwd)/backup:/backup \
  --network host \
  -v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd \
  --env ETCDCTL_API=3 \
  k8s.gcr.io/etcd:3.4.3-0 \
  etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  snapshot save /backup/etcd-snapshot-latest.db

# Backup kubeadm-config
sudo cp /etc/kubeadm/kubeadm-config.yaml backup/
```
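The manual snapshot above can also be scheduled inside the cluster as a CronJob. The sketch below is an assumption-heavy illustration, not a drop-in manifest: the image tag, hostPath locations, node-selector label, and schedule must be adapted to your cluster, and older clusters use `apiVersion: batch/v1beta1` instead of `batch/v1`.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 3 * * *"            # nightly snapshot at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true         # reach etcd on 127.0.0.1
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
            effect: NoSchedule
          containers:
          - name: etcdctl
            image: k8s.gcr.io/etcd:3.4.3-0
            env:
            - name: ETCDCTL_API
              value: "3"
            command:
            - /bin/sh
            - -c
            - >-
              etcdctl --endpoints=https://127.0.0.1:2379
              --cacert=/etc/kubernetes/pki/etcd/ca.crt
              --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt
              --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
              snapshot save /backup/etcd-snapshot-$(date +%Y%m%d).db
            volumeMounts:
            - name: etcd-certs
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
            - name: backup
              mountPath: /backup
          restartPolicy: OnFailure
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
          - name: backup
            hostPath:
              path: /opt/etcd-backup   # assumed backup directory on the host
```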
To restore data on a control node, perform the following operations:
```bash
# Restore certificates
sudo cp -r backup/pki /etc/kubernetes/

# Restore etcd backup
sudo mkdir -p /var/lib/etcd
sudo docker run --rm \
  -v $(pwd)/backup:/backup \
  -v /var/lib/etcd:/var/lib/etcd \
  --env ETCDCTL_API=3 \
  k8s.gcr.io/etcd:3.4.3-0 \
  /bin/sh -c "etcdctl snapshot restore '/backup/etcd-snapshot-latest.db' ; \
  mv /default.etcd/member/ /var/lib/etcd/"

# Restore kubeadm-config
sudo mkdir /etc/kubeadm
sudo cp backup/kubeadm-config.yaml /etc/kubeadm/

# Initialize the master with backup
sudo kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd \
  --config /etc/kubeadm/kubeadm-config.yaml
```
Using Kubernetes’ native data-copy capability `kubectl cp` together with CronJobs, we can handle most data backup and recovery tasks. When dealing with backup and recovery in distributed systems, in most cases we are not so much restoring data as removing the faulty nodes so the healthy ones can let the cluster heal itself. Self-healing is a defining strength of distributed systems. Their weakness is that healing is conditional: if the number of faulty nodes exceeds what the quorum can tolerate, even the most intelligent recovery efforts are useless. Therefore, backup remains the last line of defense, and regular, redundant data backups are essential.
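Regular backups only stay useful if old snapshots are rotated so the backup directory does not grow without bound. Below is a minimal pruning sketch; the directory layout, the `etcd-snapshot-*.db` naming pattern (matching the snapshot command above), and the retention count are assumptions to adapt.

```shell
#!/bin/sh
# Keep only the newest $keep etcd snapshot files in $backup_dir; delete the rest.
prune_snapshots() {
  backup_dir=$1
  keep=$2
  # ls -t sorts newest first; everything after the first $keep entries goes away.
  ls -t "${backup_dir}"/etcd-snapshot-*.db 2>/dev/null | tail -n +"$((keep + 1))" | while read -r f; do
    rm -f "$f"
  done
}

# Example: create five dummy snapshots, then keep only the three newest.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do
  touch "${dir}/etcd-snapshot-0${i}.db"
done
prune_snapshots "$dir" 3
ls "$dir" | wc -l   # three files remain
```

A line like this in the backup host’s crontab, run after each snapshot, keeps retention predictable without any extra tooling.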