23 Practical Implementation of Disaster Recovery for Storage Objects in Kubernetes (K8s) Clusters #

When it comes to disaster recovery for storage objects, imagine a scenario where the cluster machines crash just as you start a pod with mounted volumes. How should we handle the fault tolerance of storage objects? High availability of applications is the ideal, but disaster recovery has always been the last barrier: in many extreme cases, only fault-tolerant backups guarantee that we can keep providing service.

In the era of virtual machines, we improved the reliability of business operations by spreading applications evenly across virtual machines and running regular data backups. Now that we have moved to the Kubernetes era, all workloads are managed by Kubernetes: the cluster can quickly schedule applications, maintain their container state on its own, and easily scale resources to cope with unexpected situations.

Put that way, it seems there is little to worry about for storage. As long as the mounted volume is network storage, the data is still there even if the application cluster fails, so there does not seem to be much of an issue. What, then, is the point of learning about disaster recovery for storage objects?

Well, things are not as simple as they seem. We need to consider scenarios close to the real business environment and verify, by deliberately disrupting the cluster's state, whether the storage objects our applications read from and write to can survive destructive failures.

Our goal in upgrading from the virtual machine era to the Kubernetes era is to use dynamically scalable resources to minimize business downtime: we want applications to scale and self-heal on demand. So in the Kubernetes era, the question is not only whether data is lost, but whether we can make recovery fast enough that the interruption is short, and ideally imperceptible to users. Is this achievable?

I believe Kubernetes is already close to this goal thanks to its constantly improving resource objects. Here, I will share practical experience of implementing disaster recovery for various storage objects in Kubernetes, so that we are prepared when the need arises.

Experience in Implementing Disaster Recovery for NFS Storage Objects #

First, we need to understand how to create a PV/PVC pair for an NFS network volume, paying particular attention to the mountOptions parameter. Use the following example as a reference:

### nfs-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: nfs
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /opt/k8s-pods/data   # Specify the mount point of NFS
    server: 192.168.1.40  # Specify the address of the NFS server
---
### nfs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  storageClassName: nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

In this example, the PersistentVolume is of type NFS, so every node that may run the pod needs the helper program /sbin/mount.nfs to mount the NFS file system.
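
Each node must therefore have the NFS client utilities installed before the pod can mount the volume. A minimal check, assuming CentOS/RHEL nodes (on Debian/Ubuntu the client package is nfs-common instead):

# Install the NFS client utilities that provide /sbin/mount.nfs
sudo yum install -y nfs-utils
# Verify that the export used by the PV is reachable from the node
showmount -e 192.168.1.40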

[kadmin@k8s-master ~]$ kubectl get pvc nfs-pvc
NAME      STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nfs-pvc   Bound    nfs-pv   10Gi       RWX            nfs            3m54s
[kadmin@k8s-master ~]$
[kadmin@k8s-master ~]$ kubectl get pv nfs-pv
NAME     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS   REASON   AGE
nfs-pv   10Gi       RWX            Recycle          Bound    default/nfs-pvc   nfs                     18m

Now, create a pod that mounts the NFS volume:

### nfs-pv-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pv-pod
spec:
  volumes:
    - name: nginx-pv-storage
      persistentVolumeClaim:
        claimName: nfs-pvc
  containers:
    - name: nginx
      image: nginx
      ports:
        - containerPort: 80
          name: "nginx-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: nginx-pv-storage

Execute the following command to create the pod:

[kadmin@k8s-master ~]$ kubectl create -f nfs-pv-pod.yaml
pod/nginx-pv-pod created
[kadmin@k8s-master ~]$
[kadmin@k8s-master ~]$ kubectl get pod nginx-pv-pod -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
nginx-pv-pod   1/1     Running   0          66s   172.16.140.28   k8s-worker-2   <none>           <none>

[kadmin@k8s-master ~]$ curl http://172.16.140.28
Hello, NFS Storage NGINX

When you mount an NFS volume in a Pod, you need to consider how to back up the data. Velero is a cloud-native backup and restore tool that can help us back up persistent data objects. An example of using Velero is as follows:

velero backup create backupName --include-cluster-resources=true --ordered-resources 'pods=ns1/pod1,ns1/pod2;persistentvolumes=pv4,pv8' --include-namespaces=ns1

Note that Velero does not back up the contents of persistent volumes by default; it integrates with the open-source component restic to support file-system-level volume backups. Please note that this integration is still in the experimental stage and should not be relied on in a production environment.
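
With the restic integration enabled, a pod's volumes can be opted into a backup with the backup.velero.io/backup-volumes annotation and later restored from that backup. A minimal sketch based on the nginx-pv-pod example above (the backup name is illustrative):

# Opt the pod's NFS volume into file-system backup via restic
kubectl annotate pod nginx-pv-pod backup.velero.io/backup-volumes=nginx-pv-storage
# Back up the namespace, including the annotated volume data
velero backup create nginx-backup --include-namespaces=default
# Later, restore the workload and its volume data from that backup
velero restore create --from-backup nginx-backup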

Ceph Data Backup and Restore #

Rook is a cloud-native storage orchestrator for managing Ceph clusters. In a previous lesson, I showed you how to create a Ceph cluster using Rook. Now, let's assume the Ceph cluster has crashed and needs to be repaired. Yes, we need to fix it manually. The steps are as follows:

Step 1: Stop the Ceph operator so that its controller does not try to automatically recover the Ceph cluster while we repair it.

kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0

Step 2: The Ceph monmap tracks the cluster's monitor members and their quorum. In this example, the healthy monitor is rook-ceph-mon-b, while rook-ceph-mon-a and rook-ceph-mon-c are unhealthy. First, back up the Deployment object of rook-ceph-mon-b:

kubectl -n rook-ceph get deployment rook-ceph-mon-b -o yaml > rook-ceph-mon-b-deployment.yaml

Patch the healthy monitor's Deployment so that the mon container sleeps instead of running ceph-mon:

kubectl -n rook-ceph patch deployment rook-ceph-mon-b -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'

Exec into the healthy monitor pod:

kubectl -n rook-ceph exec -it <mon-pod> -- bash

Set a few simple variables:

cluster_namespace=rook-ceph
good_mon_id=b
monmap_path=/tmp/monmap

Extract the monmap to a file: inside the good mon deployment, run the ceph-mon command with the --extract-monmap=${monmap_path} flag added:

ceph-mon \
    --fsid=41a537f2-f282-428e-989f-a9e07be32e47 \
    --keyring=/etc/ceph/keyring-store/keyring \
    --log-to-stderr=true \
    --err-to-stderr=true \
    --mon-cluster-log-to-stderr=true \
    --log-stderr-prefix=debug \
    --default-log-to-file=false \
    --default-mon-cluster-log-to-file=false \
    --mon-host=$ROOK_CEPH_MON_HOST \
    --mon-initial-members=$ROOK_CEPH_MON_INITIAL_MEMBERS \
    --id=b \
    --setuser=ceph \
    --setgroup=ceph \
    --foreground \
    --public-addr=10.100.13.242 \
    --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db \
    --public-bind-addr=$ROOK_POD_IP \
    --extract-monmap=${monmap_path}

Review the contents of the monmap:

monmaptool --print /tmp/monmap

Remove the bad mon(s) from the monmap:

monmaptool ${monmap_path} --rm <bad_mon>

In this example, we remove mons a and c:

monmaptool ${monmap_path} --rm a
monmaptool ${monmap_path} --rm c

Inject the modified monmap into the good mon by running the ceph-mon command with the --inject-monmap=${monmap_path} flag:

ceph-mon \
    --fsid=41a537f2-f282-428e-989f-a9e07be32e47 \
    --keyring=/etc/ceph/keyring-store/keyring \
    --log-to-stderr=true \
    --err-to-stderr=true \
    --mon-cluster-log-to-stderr=true \
    --log-stderr-prefix=debug \
    --default-log-to-file=false \
    --default-mon-cluster-log-to-file=false \
    --mon-host=$ROOK_CEPH_MON_HOST \
    --mon-initial-members=$ROOK_CEPH_MON_INITIAL_MEMBERS \
    --id=b \
    --setuser=ceph \
    --setgroup=ceph \
    --foreground \
    --public-addr=10.100.13.242 \
    --setuser-match-path=/var/lib/ceph/mon/ceph-b/store.db \
    --public-bind-addr=$ROOK_POD_IP \
    --inject-monmap=${monmap_path}

Edit the Rook ConfigMap that records the mon endpoints:

kubectl -n rook-ceph edit configmap rook-ceph-mon-endpoints

Remove the stale a and c entries from the data field, keeping only the healthy b:

data: b=10.100.13.242:6789

Update the secret configuration:

mon_host=$(kubectl -n rook-ceph get svc rook-ceph-mon-b -o jsonpath='{.spec.clusterIP}')
kubectl -n rook-ceph patch secret rook-ceph-config -p '{"stringData": {"mon_host": "[v2:'"${mon_host}"':3300,v1:'"${mon_host}"':6789]", "mon_initial_members": "'"${good_mon_id}"'"}}'

Restore the original monitor Deployment from the backup taken earlier:

kubectl replace --force -f rook-ceph-mon-b-deployment.yaml

Restart the operator:

# scale the operator back up. it is safe to ignore errors that a number of resources already exist.
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1
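
Once the operator is running again, confirm that the remaining monitor has formed a quorum and the cluster is recovering. A quick check, assuming the standard Rook toolbox Deployment rook-ceph-tools is installed:

kubectl -n rook-ceph get pods
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status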

Restoring Data for a Jenkins PVC Application #

If the Jenkins data is corrupted and you want to restore the Jenkins data directory, you can mount the PVC in a temporary rescue pod and use kubectl cp to copy a backup into it. Here are the steps:

  1. Check the security context that the current Jenkins container runs with:
kubectl --namespace=cje-cluster-example get pods cjoc-0 -o jsonpath='{.spec.securityContext}'
map[fsGroup:1000]
  2. Shut down the container:
kubectl --namespace=cje-cluster-example scale statefulset/cjoc --replicas=0
statefulset.apps "cjoc" scaled
  3. Check the PVC:
kubectl --namespace=cje-cluster-example get pvc
  4. Mount the PVC in a temporary rescue pod for data recovery:
cat <<EOF | kubectl --namespace=cje-cluster-example create -f -
kind: Pod
apiVersion: v1
metadata:
  name: rescue-pod
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 1000
  volumes:
    - name: rescue-storage
      persistentVolumeClaim:
       claimName: jenkins-home-cjoc-0
  containers:
    - name: rescue-container
      image: nginx
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo hello; sleep 10;done"]
      volumeMounts:
        - mountPath: "/tmp/jenkins-home"
          name: rescue-storage
EOF
pod "rescue-pod" created
  5. Copy the backup data into the rescue pod:
kubectl cp oc-jenkins-home.backup.tar.gz rescue-pod:/tmp/
  6. Extract the backup into the mounted PVC:
kubectl exec --namespace=cje-cluster-example rescue-pod -it -- tar -xzf /tmp/oc-jenkins-home.backup.tar.gz -C /tmp/jenkins-home
  7. Delete the rescue pod:
kubectl --namespace=cje-cluster-example delete pod rescue-pod
  8. Restore the Jenkins container:
kubectl --namespace=cje-cluster-example scale statefulset/cjoc --replicas=1
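
Finally, verify that Jenkins came back up with the restored data. A quick sanity check, assuming the Jenkins home is mounted at /var/jenkins_home inside the cjoc-0 pod:

kubectl --namespace=cje-cluster-example get pods cjoc-0
kubectl --namespace=cje-cluster-example exec cjoc-0 -- ls /var/jenkins_home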

Backing up a Kubernetes Cluster #

A Kubernetes cluster is a distributed system, and the primary reasons for backing up the cluster metadata are as follows:

  • Quick recovery of control-plane nodes (rather than worker nodes)
  • Recovery of application containers

The backup steps on a single control-plane node are as follows:

# Backup certificates
sudo cp -r /etc/kubernetes/pki backup/
# Make etcd snapshot
sudo docker run --rm -v $(pwd)/backup:/backup \
    --network host \
    -v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd \
    --env ETCDCTL_API=3 \
    k8s.gcr.io/etcd:3.4.3-0 \
    etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
    snapshot save /backup/etcd-snapshot-latest.db

# Backup kubeadm-config
sudo cp /etc/kubeadm/kubeadm-config.yaml backup/

To restore data on a control node, perform the following operations:

# Restore certificates
sudo cp -r backup/pki /etc/kubernetes/

# Restore etcd backup
sudo mkdir -p /var/lib/etcd
sudo docker run --rm \
    -v $(pwd)/backup:/backup \
    -v /var/lib/etcd:/var/lib/etcd \
    --env ETCDCTL_API=3 \
    k8s.gcr.io/etcd:3.4.3-0 \
    /bin/sh -c "etcdctl snapshot restore '/backup/etcd-snapshot-latest.db' ; \
    mv /default.etcd/member/ /var/lib/etcd/"

# Restore kubeadm-config
sudo mkdir /etc/kubeadm
sudo cp backup/kubeadm-config.yaml /etc/kubeadm/

# Initialize the master with backup
sudo kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd \
    --config /etc/kubeadm/kubeadm-config.yaml

Using Kubernetes-native tools such as kubectl cp together with CronJobs, we can handle most data backup and recovery tasks. When dealing with backup and recovery in distributed systems, in most cases we are not restoring data at all; instead we remove the faulty nodes so that the healthy ones can let the cluster heal itself. Self-healing is a strength of distributed systems, but it is conditional: if so many nodes fail that the remaining healthy nodes can no longer form a quorum, even the most intelligent recovery mechanism is useless. Backup therefore remains the last line of defense, and regular, redundant data backups are essential.
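
As a concrete illustration of the CronJob approach, here is a minimal sketch that periodically archives the nfs-pvc volume from the earlier example into a second claim; the backup-pvc claim, schedule, and image are illustrative assumptions to adapt to your environment (on clusters older than v1.21, use apiVersion batch/v1beta1):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nfs-data-backup
spec:
  schedule: "0 2 * * *"          # run every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: busybox
              command: ["/bin/sh", "-c"]
              args:
                - "tar -czf /backup/data-`date +%F`.tar.gz -C /data ."
              volumeMounts:
                - name: data
                  mountPath: /data
                  readOnly: true
                - name: backup
                  mountPath: /backup
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: nfs-pvc        # the PVC created earlier in this lesson
            - name: backup
              persistentVolumeClaim:
                claimName: backup-pvc     # hypothetical PVC that stores the archives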

References #