25 PersistentVolume and NFS: How to Use Network Shared Storage #

Hello, I’m Chrono.

In the previous lesson, we saw PersistentVolume, PersistentVolumeClaim, and StorageClass in Kubernetes. By combining them, we can mount a “virtual disk” for pods to read and write data.

However, the storage volume we used was HostPath, which can only be used on the local machine. Since pods in Kubernetes often “drift” in the cluster, this method is not very practical.

To truly enable storage volumes to be mounted by pods, we need to change the way storage is done. Instead of being limited to local disks, we need to switch to network storage, allowing pods to access the storage devices through network communication as long as they know the IP address or domain name.

Network storage is a very active field with many well-known products, such as AWS, Azure, and Ceph, and Kubernetes also defines a dedicated specification for them: CSI (Container Storage Interface). However, these storage systems are relatively complex to install and use, which makes them difficult to deploy in our experimental environment.

Therefore, in today’s lesson, I have chosen NFS (Network File System), which is relatively simple, as an example to explain how to use network storage in Kubernetes, as well as the concepts of static storage volume and dynamic storage volume.

How to Install NFS Server #

As a classic network storage system, NFS has a development history of nearly 40 years and has basically become the standard configuration for various UNIX systems. Linux naturally provides support for it.

NFS adopts the Client/Server architecture, which requires selecting a host as the server and installing the NFS server; other hosts that want to use the storage should install the NFS client tools.

Therefore, next, let’s add a server named Storage in our Kubernetes cluster and install NFS on it to achieve the functionality of network storage and shared network disk. However, this Storage is only a logical concept, and when actually installing and deploying it, we can completely merge it into one of the hosts in the cluster. For example, here I reuse the Console mentioned in [Lesson 17].

The new network architecture is shown in the following figure:

image

It is very easy to install the NFS server on the Ubuntu system using apt:

sudo apt -y install nfs-kernel-server

After installation, you need to specify a storage location for NFS, which is the network shared directory. Generally, a dedicated /data directory should be created. For simplicity, here I use the temporary directory /tmp/nfs:

mkdir -p /tmp/nfs

Next, you need to configure NFS so that the shared directory can be accessed over the network. Modify /etc/exports and specify the directory name, the subnets allowed to access it, and the permission parameters. These rules are somewhat tedious and not closely related to our Kubernetes course, so I won’t explain them in detail. You only need to add the following line; note that you should change the directory name and network address to match your own environment:

/tmp/nfs 192.168.10.0/24(rw,sync,no_subtree_check,no_root_squash,insecure)

After making the modifications, you need to use exportfs -ra to notify NFS to make the configuration take effect, and then use exportfs -v to verify the effect:

sudo exportfs -ra
sudo exportfs -v

image

Now, you can use systemctl to start the NFS server:

sudo systemctl start nfs-server
sudo systemctl enable nfs-server
sudo systemctl status nfs-server

image

You can also use the showmount command to check the network mount status of NFS:

showmount -e 127.0.0.1

image

How to Install NFS Client #

After setting up the NFS server, we need to install the NFS client on every node so that the Kubernetes cluster can access the NFS storage service.

This can be done with a single apt command without any additional configuration:

sudo apt -y install nfs-common

Similarly, on each node, you can use the showmount command to check if NFS can be mounted correctly. Note that you should replace the IP address with the address of the NFS server. In this case, let’s assume the address is “192.168.10.208”:

Image
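In plain text, the check on each node is simply the following command (using the server address assumed above):

showmount -e 192.168.10.208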

Now let’s try to manually mount the NFS network storage. First, create a directory /tmp/test as the mount point:

mkdir -p /tmp/test

Then, use the mount command to mount the shared directory from the NFS server to the local directory we just created:

sudo mount -t nfs 192.168.10.208:/tmp/nfs /tmp/test

Finally, let’s test if it works. Create a file x.yml in the /tmp/test directory:

touch /tmp/test/x.yml

Go back to the NFS server and check the shared directory /tmp/nfs. You should see the same file x.yml there, which indicates that NFS installation is successful. From now on, any node in the cluster can write data to the NFS server through the NFS client, achieving network storage.
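A quick way to verify on the server side is to list the shared directory (path as assumed above); the file x.yml should appear there:

ls /tmp/nfs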

How to Use NFS Storage Volumes #

Now that we have configured an NFS storage system for Kubernetes, we can use it to create new PV (PersistentVolume) objects.

Let’s start by manually allocating a storage volume. We need to set storageClassName to nfs, and accessModes can be set to ReadWriteMany, which follows from the characteristics of NFS: it supports simultaneous access to a shared directory from multiple nodes.

Since this storage volume is using NFS, we also need to add the nfs field in the YAML file, specifying the IP address of the NFS server and the shared directory name.

In this case, I have created a new directory 1g-pv inside the /tmp/nfs directory on the NFS server, representing an allocation of 1GB of available storage space. Accordingly, the capacity field in the PV should also be set to the same value, i.e., 1Gi.
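On the NFS server, this directory can be created in advance, for example (path as assumed above):

mkdir -p /tmp/nfs/1g-pv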

After organizing these fields, we have a YAML description file using NFS network storage:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-1g-pv

spec:
  storageClassName: nfs
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 1Gi

  nfs:
    path: /tmp/nfs/1g-pv
    server: 192.168.10.208

Now we can use the kubectl apply command to create the PV object and then use kubectl get pv to check its status:

kubectl apply -f nfs-static-pv.yml
kubectl get pv

Image

I would like to remind you again to make sure that the IP address in spec.nfs is correct and that the path already exists (pre-created); otherwise Kubernetes will be unable to mount the NFS shared directory described by the PV, and Pods that use it will be stuck and unable to start.

With the PV in place, we can define a PVC (PersistentVolumeClaim) object to request storage. Its content is similar to that of the PV, but it does not involve any NFS storage details. We only need to use resources.requests to state the desired capacity, which I have set to 1Gi, the same as the capacity of the PV:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-static-pvc

spec:
  storageClassName: nfs
  accessModes:
    - ReadWriteMany

  resources:
    requests:
      storage: 1Gi

After creating the PVC object, Kubernetes will find the most suitable PV based on the description of the PVC and “bind” them together, i.e., the storage allocation is successful:

Image
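On the command line, the steps are roughly as follows (the PVC file name is my own choice); the PVC’s STATUS column should show Bound:

kubectl apply -f nfs-static-pvc.yml
kubectl get pvc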

Let’s create a Pod that mounts the PVC as one of its volumes. The process is the same as in the previous lesson. Just use persistentVolumeClaim to specify the name of the PVC:

apiVersion: v1
kind: Pod
metadata:
  name: nfs-static-pod

spec:
  volumes:
  - name: nfs-pvc-vol
    persistentVolumeClaim:
      claimName: nfs-static-pvc

  containers:
    - name: nfs-pvc-test
      image: nginx:alpine
      ports:
      - containerPort: 80

      volumeMounts:
        - name: nfs-pvc-vol
          mountPath: /tmp

The relationship between Pod, PVC, PV, and NFS storage can be represented visually, as shown in the following image. You can compare it with the usage of HostPath PV to see the difference:

Image

Since we have specified storageClassName as nfs in PV/PVC, and NFS client is also installed on the node, Kubernetes will automatically perform the NFS mount action, mounting the NFS shared directory /tmp/nfs/1g-pv to /tmp in the Pod. No manual management is required.

Finally, let’s test it. After creating the Pod using kubectl apply, we can enter the Pod using kubectl exec and try to operate on the NFS shared directory:

Image
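In text form, the steps would be roughly as follows (the Pod file name and the test file are my own choices):

kubectl apply -f nfs-static-pod.yml
kubectl exec -it nfs-static-pod -- sh
# now inside the Pod: write a test file into the mounted directory
cd /tmp
echo "hello nfs" > a.txt
ls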

If we exit the Pod and take a look at the /tmp/nfs/1g-pv directory on the NFS server, we will see that the file created in the Pod has indeed been written to the shared directory:

Image

What’s more, since NFS is a network service, it is not affected by Pod scheduling. As long as the network is reachable, this PV object will always be available, and the data is truly persisted.

How to Deploy NFS Provisioner #

Now that we have a network storage system like NFS, do you think the problem of data persistence in Kubernetes has been solved?

Regarding this issue, I think we can use a popular phrase: “It’s solved, but not completely solved.”

By saying “It’s solved,” I mean that the network storage system can indeed allow Pods in the cluster to access data freely. The data still exists even after a Pod is destroyed, and newly created Pods can mount and read data that was previously written. The entire process is fully automated.

By saying “It’s not completely solved,” I mean that PV (Persistent Volume) still requires manual management. Various storage devices must be maintained manually by the system administrator, and PVs need to be created one by one based on development needs. Moreover, it is difficult to precisely control the size of PVs, which can result in cases of insufficient space or wasted space.

In our experimental environment, there are only a few PV requirements, so the administrator can quickly allocate PV storage volumes. However, in a large cluster, there may be hundreds or thousands of applications needing PV storage every day. If storage allocation continues to be manually managed, the administrator may be overwhelmed with work, causing a significant backlog in storage allocation tasks.

So, can we automate the creation of PVs? In other words, can we let computers allocate storage volumes instead of humans?

This concept in Kubernetes is known as a "dynamic storage volume": a StorageClass is bound to a Provisioner object, and the Provisioner is an application that can automatically manage storage and create PVs, replacing the system administrator’s manual work.

With the concept of “dynamic storage volume,” the manually created PVs we mentioned earlier can be referred to as “static storage volumes.”

Currently, each type of storage device in Kubernetes has a corresponding Provisioner object. For NFS, its Provisioner is called “NFS subdir external provisioner”, and you can find this project on GitHub (https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner).

The NFS Provisioner also runs in Kubernetes in the form of a Pod. Within the GitHub deploy directory, you will find the YAML files required to deploy it. There are three YAML files in total, namely rbac.yaml, class.yaml, and deployment.yaml.
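One way to get these files is simply to clone the repository and work inside its deploy directory:

git clone https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner.git
cd nfs-subdir-external-provisioner/deploy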

However, these three files are just examples, and to truly run them in our cluster, two of the files need to be modified.

The first file to modify is rbac.yaml, which uses the default namespace default. You should change it to another namespace to avoid mixing the Provisioner with ordinary applications. You can use the "find and replace" method to uniformly change it to kube-system.
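For example, a one-line sed is one possible way to do the replacement (assuming the file uses namespace: default, as in the upstream example):

sed -i 's/namespace: default/namespace: kube-system/g' rbac.yaml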

The second file to modify is deployment.yaml, which requires more changes. First, you need to change the namespace to the same one used in rbac.yaml, such as kube-system. Then, you need to focus on modifying the IP address and shared directory name in volumes and env to match the NFS server configuration in the cluster.

Based on our current environment, the IP address should be changed to 192.168.10.208, and the directory name should be changed to /tmp/nfs:

spec:
  template:
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
          ...
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              value: 192.168.10.208        # Change IP address
            - name: NFS_PATH
              value: /tmp/nfs              # Change shared directory name
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.10.208         # Change IP address
            path: /tmp/nfs                 # Change shared directory name

There is one more troublesome issue. The deployment.yaml file pulls its image from gcr.io, which is difficult to reach and not available on Chinese mirror sites. To keep the experiment running smoothly, I had to take a roundabout approach and copied the image to Docker Hub.

Therefore, you also need to change the image name from the original "k8s.gcr.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2" to "chronolaw/nfs-subdir-external-provisioner:v4.0.2", which essentially only changes the user name part of the image.
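After this change, the container’s image field in deployment.yaml would read something like this:

          image: chronolaw/nfs-subdir-external-provisioner:v4.0.2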

Once you have modified these two YAML files, you can create the NFS Provisioner in Kubernetes:

kubectl apply -f rbac.yaml
kubectl apply -f class.yaml
kubectl apply -f deployment.yaml

By using the kubectl get command, along with the namespace limitation -n kube-system, you can observe that the NFS Provisioner is running in Kubernetes.
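For example, a quick check could look like this; the Provisioner’s Deployment and Pod should appear in the output:

kubectl get deploy -n kube-system
kubectl get pod -n kube-system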

Image

How to Use NFS Dynamic Storage Volumes #

Compared to static storage volumes, dynamic storage volumes are much easier to use. With the provisioner, we no longer need to manually define PV objects. Instead, we only need to specify the StorageClass object in the PVC, which will then be associated with the provisioner.

Let’s take a look at the default StorageClass definition for NFS:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client

provisioner: k8s-sigs.io/nfs-subdir-external-provisioner 
parameters:
  archiveOnDelete: "false"

The key field in this YAML is provisioner, which specifies which Provisioner to use. The other field, parameters, tunes how the Provisioner runs; you need to refer to its documentation for the specific values. Here, archiveOnDelete: "false" means that the storage is automatically reclaimed (the data directory is deleted rather than archived) when the PVC is deleted.

After understanding the YAML for StorageClass, you can also customize your own StorageClass with different storage features according to your needs. For example, you can add the field onDelete: "retain" to temporarily retain the allocated storage and manually delete it later:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client-retained

provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  onDelete: "retain"

Next, let’s define a PVC to request 10MB of storage space from the system, using the default StorageClass nfs-client:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-dyn-10m-pvc

spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteMany

  resources:
    requests:
      storage: 10Mi

With the PVC defined, we can still use volumes and volumeMounts in the Pod to mount it. Kubernetes will automatically find the NFS provisioner and create the appropriate PV object on the NFS shared directory:

apiVersion: v1
kind: Pod
metadata:
  name: nfs-dyn-pod

spec:
  volumes:
  - name: nfs-dyn-10m-vol
    persistentVolumeClaim:
      claimName: nfs-dyn-10m-pvc

  containers:
    - name: nfs-dyn-test
      image: nginx:alpine
      ports:
      - containerPort: 80

      volumeMounts:
        - name: nfs-dyn-10m-vol
          mountPath: /tmp

Create the PVC and Pod using kubectl apply, and let’s take a look at the PV status in the cluster:

Image

From the screenshot, you can see that although we did not directly define the PV object, the NFS provisioner automatically creates a PV with a size of 10MB, which matches the request in the PVC.
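If you are following along on the command line, the corresponding steps would be roughly as follows (the file names are my own choices):

kubectl apply -f nfs-dyn-pvc.yml
kubectl apply -f nfs-dyn-pod.yml
kubectl get pv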

If you check the shared directory on the NFS server, you will also find an additional directory with the same name as the automatically created PV, but with the namespace and PVC prefix added:

Image

I have also created a diagram to illustrate the relationships between the Pod, PVC, StorageClass, and provisioner. You can clearly see the associations between these objects and how the Pod ultimately finds the storage device:

Image

Summary #

Alright, in today’s class we continued our study of PV/PVC and introduced the network storage system. We used NFS as an example to explore the usage of static storage volumes and dynamic storage volumes, with the core objects being StorageClass and Provisioner.

Let me summarize today’s main points:

  1. In a Kubernetes cluster, the network storage system is more suitable for data persistence. NFS is the easiest-to-use network storage system and needs to be installed on both the server and client in advance.
  2. You can manually define an NFS static storage volume by writing a PV and specifying the IP address of the NFS server and the shared directory name.
  3. In order to use NFS dynamic storage volumes, you need to deploy the corresponding Provisioner and correctly configure the NFS server in the YAML file.
  4. Dynamic storage volumes do not require manually defining PVs. Instead, you need to define a StorageClass, and the associated Provisioner will automatically create and bind the PV.

Homework #

Finally, it’s time for homework. I have two questions for you to think about:

  1. What are the advantages and disadvantages of dynamic storage volumes compared to static storage volumes?
  2. What role does StorageClass play in the allocation process of dynamic storage volumes?

I look forward to your thoughts. If you find it helpful, feel free to share it with your friends for discussion. See you in the next class.
