19 Using Rook to Build a Production-Ready Storage Environment - Practical Implementation #

Rook is a storage service framework built on top of Kubernetes. It supports the creation and management of various underlying storage systems, such as Ceph and NFS, and helps system administrators automate the entire storage lifecycle: deployment, startup, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management. This sounds like a lot of work, but Rook aims to reduce operational complexity by letting Kubernetes and the Rook operator handle these tasks for you.

Managing Ceph Cluster with Rook #

Ceph distributed storage is the first storage engine officially marked as Stable in Rook. While verifying how to operate Ceph through Rook, I found that the community documentation and scripts are mixed together, which makes it hard for beginners to walk through setting up Ceph with Rook step by step. This reflects how long distributed storage has had to iterate on technical difficulty and compatibility. Rook's original intention was to lower the barrier to deploying and managing Ceph clusters, but it has not fully delivered on that: the initial experience is not user-friendly, and the official documentation hides many unknown issues.

(Figure: Rook architecture, 16-1-rook-arch.png)

Before installing Ceph, note that the latest version of Ceph only supports the BlueStore backend, which works only with raw devices and cannot create storage on top of a local file system. Because the Rook documentation is confusing, we first have to locate the installation script directory ourselves; it lives at:

https://github.com/rook/rook/tree/master/cluster/examples/kubernetes/ceph
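
One quick way to confirm that a disk really is raw (has no file system on it) before handing it to the OSDs, assuming a candidate device such as /dev/sdb:

$ lsblk -f
# A device or partition can back a BlueStore OSD only if its FSTYPE column is empty,
# i.e. /dev/sdb shows no file system and no mount point.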

$ git clone https://github.com/rook/rook.git
$ cd rook
$ git checkout release-1.4
$ cd cluster/examples/kubernetes/ceph

$ kubectl create -f common.yaml
# Check if the namespace "rook-ceph" exists
$ kubectl get namespace
$ kubectl create -f operator.yaml
# The pods created above must all be in a Running or Completed state before you continue,
# otherwise the next step is likely to fail. This may take some time.
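# A hedged alternative to polling by hand: wait for the operator pod to become Ready
# (the label assumes the default app=rook-ceph-operator set in operator.yaml)
$ kubectl -n rook-ceph wait --for=condition=Ready pod -l app=rook-ceph-operator --timeout=300s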
$ kubectl create -f cluster.yaml
# Wait for the Ceph cluster to be created

$ kubectl -n rook-ceph get pods
# Expect: 1 mgr pod, 3 mon pods,
# rook-ceph-crashcollector (one per node),
# rook-ceph-osd (one per disk, indexed from 0)
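
For reference, cluster.yaml boils down to a single CephCluster custom resource. The following is a minimal sketch, assuming Rook should consume every raw device on every node; the Ceph image tag is an assumption, use the one pinned by your release branch:

# cluster.yaml (minimal sketch)
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v15.2.4   # assumption: pin to the image of your release branch
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  dashboard:
    enabled: true
    ssl: true
  storage:
    useAllNodes: true
    useAllDevices: true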

Troubleshooting Ceph often requires the toolbox pod to inspect the cluster state. Deploy the toolbox as follows:

$ kubectl create -f toolbox.yaml
$ kubectl -n rook-ceph get pods | grep ceph-tools
rook-ceph-tools-649c4dd574-gw8tx   1/1  Running  0   3m20s
$ kubectl -n rook-ceph exec -it rook-ceph-tools-649c4dd574-gw8tx -- bash
$ ceph -s
  cluster:
    id:     9ca03dd5-05bc-467f-89a8-d3dfce3b9430
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,d,e (age 12m)
    mgr: a(active, since 8m)
    osd: 44 osds: 44 up (since 13m), 44 in (since 13m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   45 GiB used, 19 TiB / 19 TiB avail
    pgs:     1 active+clean
# Available capacity in the Ceph cluster
$ ceph df
# Distribution of Ceph OSDs across nodes
$ ceph osd tree
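# Expanded explanation whenever the cluster is not HEALTH_OK
$ ceph health detail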
# Delete the Ceph toolbox
$ kubectl delete -f toolbox.yaml

Using the Dashboard to view the status of Ceph:

$ vim dashboard-external-https.yaml
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-https
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
spec:
  ports:
  - name: dashboard
    port: 8443
    protocol: TCP
    targetPort: 8443
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
  sessionAffinity: None
  type: NodePort
$ kubectl create -f dashboard-external-https.yaml
$ kubectl -n rook-ceph get service
rook-ceph-mgr-dashboard-external-https   NodePort    10.107.117.151   <none>        8443:31955/TCP      8m23s

The service is exposed on NodePort 31955, so the dashboard can be reached at https://master_ip:31955. The username is admin, and the password can be retrieved from the cluster:

$ kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
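
If you prefer not to read the port off the service listing, the NodePort can also be pulled with jsonpath (using the service name created above):

$ kubectl -n rook-ceph get service rook-ceph-mgr-dashboard-external-https -o jsonpath='{.spec.ports[0].nodePort}' && echo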

Clean up Ceph:

$ cd rook/cluster/examples/kubernetes/ceph
# Delete the CephCluster custom resource (the resource that cluster.yaml created)
$ kubectl -n rook-ceph delete cephcluster rook-ceph
$ kubectl -n rook-ceph get cephcluster
# Confirm that rook-ceph has been deleted, then remove the operator and common resources
$ kubectl delete -f operator.yaml
$ kubectl delete -f common.yaml
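
If the nodes will be reused for a fresh cluster later, the Rook teardown guide also has you wipe the Ceph data directory and the OSD disks on every storage node. A minimal sketch, assuming the default dataDirHostPath and an example OSD device /dev/sdb:

# Run on each storage node after the cluster resources are gone
$ rm -rf /var/lib/rook          # default dataDirHostPath
$ sgdisk --zap-all /dev/sdb     # assumption: /dev/sdb was used as an OSD device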

Managing NFS File System with Rook #

NFS is still a common storage solution in many enterprises. Using Rook to manage NFS greatly simplifies providing a storage environment for developers. Before installing Rook, you need to install the NFS client packages on every node: nfs-utils on CentOS and nfs-common on Ubuntu. Then install Rook as follows:
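
For example:

# CentOS / RHEL nodes
yum install -y nfs-utils
# Ubuntu / Debian nodes
apt-get install -y nfs-common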

git clone --single-branch --branch v1.4.6 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/nfs
kubectl create -f common.yaml
kubectl create -f provisioner.yaml
kubectl create -f operator.yaml

# Check the status
[root@dev-mng-temp ~]# kubectl -n rook-nfs-system get pod
NAME                                   READY   STATUS    RESTARTS   AGE
rook-nfs-operator-59fb455d77-2cxn4     1/1     Running   0          75m
rook-nfs-provisioner-b4bbf4cc4-qrzqd   1/1     Running   1          75m

Create the RBAC permissions; rbac.yaml contains the following:

---
apiVersion: v1
kind: Namespace
metadata:
  name: rook-nfs
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rook-nfs-server
  namespace: rook-nfs
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rook-nfs-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get"]
  - apiGroups: ["policy"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["rook-nfs-policy"]
    verbs: ["use"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups:
    - nfs.rook.io
    resources:
    - "*"
    verbs:
    - "*"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rook-nfs-provisioner-runner
subjects:
  - kind: ServiceAccount
    name: rook-nfs-server
    namespace: rook-nfs
roleRef:
  kind: ClusterRole
  name: rook-nfs-provisioner-runner
  apiGroup: rbac.authorization.k8s.io


Apply the YAML to create the permissions:


kubectl create -f rbac.yaml


The current mainstream practice is to create the NFSServer with dynamically provisioned resources. The steps are as follows:


kubectl create -f nfs.yaml
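
The nfs.yaml in the examples directory defines the NFSServer and the PVC that backs its export. Below is a minimal sketch modeled on the upstream Rook v1.4 example; it assumes a default StorageClass exists to provision the backing claim, and the export name and server name must match what sc.yaml references below:

# nfs.yaml (minimal sketch)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-default-claim
  namespace: rook-nfs
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
---
apiVersion: nfs.rook.io/v1alpha1
kind: NFSServer
metadata:
  name: rook-nfs
  namespace: rook-nfs
spec:
  replicas: 1
  exports:
  - name: share1
    server:
      accessMode: ReadWrite
      squash: "none"
    persistentVolumeClaim:
      claimName: nfs-default-claim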


# sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  labels:
    app: rook-nfs
  name: rook-nfs-share1
parameters:
  exportName: share1
  nfsServerName: rook-nfs
  nfsServerNamespace: rook-nfs
provisioner: rook.io/nfs-provisioner
reclaimPolicy: Delete
volumeBindingMode: Immediate


`kubectl create -f sc.yaml` will create a StorageClass, and then resources can be requested:


apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rook-nfs-pv-claim
spec:
  storageClassName: "rook-nfs-share1"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi


`kubectl create -f pvc.yaml` will create a file volume. Verify the result:


[root@dev-mng-temp nfs]# kubectl get pvc
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rook-nfs-pv-claim   Bound    pvc-504eb26d-1b6f-4ad8-9318-75e637ab50c7   1Mi        RWX            rook-nfs-share1   7m5s
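
Any workload can now mount the claim like a regular PVC. A minimal sketch; the pod name and mount path below are made up for illustration:

apiVersion: v1
kind: Pod
metadata:
  name: nfs-demo-writer            # hypothetical name
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello > /mnt/nfs/hello.txt && sleep 3600"]
    volumeMounts:
    - name: nfs-data
      mountPath: /mnt/nfs
  volumes:
  - name: nfs-data
    persistentVolumeClaim:
      claimName: rook-nfs-pv-claim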


Test the setup with the demo workloads from the same examples directory:


> kubectl create -f busybox-rc.yaml
> kubectl create -f web-rc.yaml

> kubectl get pod -l app=nfs-demo

> kubectl create -f web-service.yaml

> echo; kubectl exec $(kubectl get pod -l app=nfs-demo,role=busybox -o jsonpath='{.items[0].metadata.name}') -- wget -qO- http://$(kubectl get services nfs-web -o jsonpath='{.spec.clusterIP}'); echo

Thu Oct 22 19:28:55 UTC 2015
nfs-busybox-w3s4t


If you find that the NFS server is not running, use the following command to inspect the problem:


kubectl -n rook-nfs-system logs -l app=rook-nfs-operator
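
It can also help to inspect the NFSServer resource and the pods in its namespace (the resource name assumes the rook-nfs server created earlier):

kubectl -n rook-nfs describe nfsservers.nfs.rook.io rook-nfs
kubectl -n rook-nfs get pod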


Summary #


Since I first started following the Rook project, its positioning has stayed accurate: it genuinely solves the pain point of simplifying Ceph installation and configuration. Building on that Ceph experience, it has begun adding more storage drivers, such as the NFS driver. Using Rook is not complicated, but its documentation is poor. Few people in the community maintain the docs, so many descriptions are out of date, it is hard to know how to configure things correctly, and configuration mistakes come easily. When using Rook, read through the YAML files carefully and understand what each one does before installing; the results will be much better. This imperfection is also an opportunity for open source enthusiasts, since improving the documentation is an easy way to participate in the Rook project. After sorting out the steps for the latest version, I can deploy a distributed storage environment in minutes. Rook is indeed efficient, and it is well worth recommending and practicing widely.


References #

- <https://draveness.me/papers-ceph/>
- <https://rook.io/docs/rook/v1.4/nfs.html>