09 Application Storage and Persistence: Data Volumes Core Knowledge #

Introduction to Volumes #

Pod Volumes #

First, let’s take a look at the use cases of Pod Volumes:

  • Use case 1: How can we ensure that important data generated by a container in a pod is not lost if the container crashes and is restarted by the kubelet?
  • Use case 2: How can multiple containers in the same pod share data?

Both of these use cases can be effectively solved using Volumes. Next, let’s take a look at the common types of Pod Volumes:

  1. Local storage: commonly used types are emptyDir and hostPath.
  2. Network storage: There are currently two implementation methods for network storage. One is in-tree, which means the implementation code is included in the Kubernetes code repository. As Kubernetes continues to support more storage types, this method will impose a heavy burden on the maintenance and development of Kubernetes itself. The second implementation method is out-of-tree, which allows Kubernetes to decouple from storage implementations by abstracting the interface and separating the driver implementations for different types of storage from the Kubernetes code repository. Therefore, out-of-tree is the recommended method for implementing network storage plugins in the community.
  3. Projected Volumes: This is a way to mount configuration information, such as Secrets or ConfigMaps, as volumes in a container, allowing programs in the container to access the configuration data through the POSIX file interface (see the sketch after this list).
  4. PV and PVC are the main focus of today’s discussion.
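
To make the projected-volume idea concrete, here is a minimal, hedged sketch of a pod that mounts a Secret and a ConfigMap through a single projected volume; the names `mysecret` and `myconfig` are hypothetical and would have to exist in the cluster before the pod starts:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: projected-demo            # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: config-vol
      mountPath: /etc/config      # configuration appears here as ordinary files
      readOnly: true
  volumes:
  - name: config-vol
    projected:                    # several sources projected into one directory
      sources:
      - secret:
          name: mysecret          # assumes a Secret named "mysecret" exists
      - configMap:
          name: myconfig          # assumes a ConfigMap named "myconfig" exists
```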

Persistent Volumes #

(figure: Persistent Volumes)

Next, let’s take a look at Persistent Volumes (PV). Since we already have Pod Volumes, why do we need to introduce PV? We know that the lifecycle of a volume declared in a pod is the same as that of the pod itself. There are several common scenarios:

  • Scenario 1: When a pod is rebuilt or destroyed, such as during an image upgrade process managed by a Deployment, a new pod is created and the old pod is deleted. How can the data be reused between the new and old pods?
  • Scenario 2: When a host machine fails, the pods on it need to be migrated to other machines together with their data, as StatefulSets already do for their volumes. This cannot be achieved with Pod Volumes alone.
  • Scenario 3: How can multiple pods share data? We know that multiple containers in the same pod can use Pod Volumes to share data, but it is difficult to express this semantic when multiple pods want to share data.
  • Scenario 4: How can we extend the functionality of data volumes, such as implementing features like snapshot and resize?

In the above scenarios, it is difficult to accurately express the reuse/shared semantics using Pod Volumes, and it is also difficult to extend their functionality. Therefore, Kubernetes introduces the concept of Persistent Volumes, which separates storage and computation, and manages storage resources and compute resources through different components, thereby decoupling the lifecycle association between pods and volumes. This way, when a pod is deleted, the PV it uses still exists and can be reused by newly created pods.

Purpose of PVC #

(figure: PVC and PV)

After understanding PV, how should it be used?

When users use PV, they actually use PVC. Why was PVC introduced despite the existence of PV? The main reason is to simplify the way Kubernetes users interact with storage and achieve separation of responsibilities. Typically, when users use storage, they only need to declare the required storage size and access mode.

What is an access mode? It describes whether the storage can be shared by multiple nodes or accessed only by a single node (note that this is node level, not pod level), and whether access is read-only or read-write. These are the only things users need to care about; the implementation details of the storage are not their concern.

By introducing the concepts of PVC and PV, user requirements and implementation details are decoupled. Users only need to declare their storage requirements through PVC. PVs are operated and managed by cluster administrators and storage teams, which simplifies the way users use storage. As seen from this, the design of PV and PVC is somewhat similar to the relationship between interfaces and implementations in object-oriented programming. Users only need to care about the interface when using the functionality, without concerning themselves with its complex internal implementation details.

Since PVs are managed by cluster administrators, let’s see how this PV object is created.

Static Volume Provisioning #

The first approach: static provisioning.

(figure: static volume provisioning)

Static Provisioning: Cluster administrators plan in advance how storage will be used in the cluster and pre-allocate some storage, that is, they create PVs ahead of time. When users submit their storage requirements (i.e. PVCs), internal Kubernetes components bind the PVCs to the PVs. When a pod then uses the storage, it finds the corresponding PV through the PVC and can start using it.

What are the drawbacks of static provisioning? It requires cluster administrators to pre-allocate storage, and it is difficult to predict users' actual needs. A simple example: if a user needs 20 GB, but the administrator has only pre-created PVs of 80 GB or 100 GB and none of 20 GB, the request either cannot be satisfied or must be bound to a much larger PV, which wastes resources. Is there a better way?

Dynamic Volume Provisioning #

The second approach: dynamic provisioning.

(figure: dynamic volume provisioning)

What does dynamic provisioning mean? It means that now cluster administrators do not pre-allocate PVs. Instead, they create a template file that represents the parameters required to create a certain type of storage (e.g. block storage, file storage), which users do not need to care about as they are related to the implementation of the storage itself. Users only need to submit their storage requirements, i.e. PVC files, and specify the storage template (StorageClass) to be used in the PVC.

The control components in the Kubernetes cluster, combined with the information from PVCs and StorageClasses, dynamically generate the storage (PV) that the user needs. After binding the PVC and PV, pods can use the PV. StorageClass is used to configure the storage template required to create storage, and then PV objects are dynamically created based on the user’s requirements, achieving on-demand allocation. This not only avoids increasing the difficulty for users but also frees cluster administrators from operational and maintenance work.

Use Case Interpretation #

Next, let’s take a look at how Pod Volumes, PV, PVC, and StorageClass are used.

Use of Pod Volumes #

(figure: Pod Volumes yaml example)

First, let’s look at the use of Pod Volumes. As shown on the left side of the figure, we declare the name and type of each volume in the volumes field of the pod yaml. Two volumes are declared here: one uses emptyDir and the other uses hostPath, both local volumes. How is a volume used inside a container? Through the volumeMounts field: name refers to the volume being mounted, and mountPath is the mount path inside the container.

What is subPath for?

Here, both containers use the same volume, cache-volume. When multiple containers share one volume, subPath can isolate their data: two subdirectories are created inside the volume, so the data container 1 writes to its cache path actually lands in the subdirectory cache1, while the data container 2 writes ends up in the subdirectory cache2.

There is also a readOnly field, which makes the mount read-only: no data can be written through that mount point.

Both emptyDir and hostPath are local storage, so what is the subtle difference between them? An emptyDir is a temporary directory created when the pod is created; when the pod is deleted, the directory is deleted with it and its data is cleared. A hostPath, as the name suggests, is a path on the host machine; after the pod is deleted, the directory still exists and its data is not lost. That is the subtle difference between the two.
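
Since the yaml from the figure is not reproduced here, the following is a hedged reconstruction of such a pod: two containers share `cache-volume` (an emptyDir) isolated by subPath, and a hostPath volume is mounted read-only. The pod name, image, and paths are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-volumes-demo          # hypothetical pod name
spec:
  containers:
  - name: container-1
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: cache-volume          # must match a volume declared below
      mountPath: /cache
      subPath: cache1             # container-1's writes land in <volume>/cache1
  - name: container-2
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: cache-volume
      mountPath: /cache
      subPath: cache2             # container-2's writes land in <volume>/cache2
    - name: data-volume
      mountPath: /data
      readOnly: true              # read-only mount: writes to /data will fail
  volumes:
  - name: cache-volume
    emptyDir: {}                  # created with the pod, deleted with the pod
  - name: data-volume
    hostPath:
      path: /tmp/data             # a host path; survives pod deletion
```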

Static PV Usage #

(figure: static PV usage flow)

Next, let’s take a look at how PV and PVC are used.

Let’s first look at the static PV creation method. Static PVs are created by administrators. Here, we take NAS, which is Alibaba Cloud File Storage, as an example. First, I need to create NAS storage on the Alibaba Cloud File Storage console, and then fill in the relevant information of the NAS storage in the PV object. After the PV object is pre-created, users can declare their own storage requirements through PVC and then create pods. When creating pods, use the fields explained earlier to mount the storage to a mount point in a container.

Next, let’s see how to write the yaml file. The cluster administrator first creates the storage at the cloud storage vendor, and then fills in the corresponding information in the PV object.

(figure: PV yaml for Alibaba Cloud NAS)

The PV for the Alibaba Cloud NAS file storage just created has several important fields: capacity, the size of the created storage, and accessModes, its access mode (we will explain later which access modes exist).

Then there is reclaimPolicy, the PV’s reclaim policy: after the pod and the PVC using this storage are deleted, should the PV itself be deleted or retained?

Next, let’s see how users use this PV object. To consume storage, a user creates a PVC object that specifies only the storage requirements; the implementation details of the storage are not the user’s concern. What are the requirements? First, the required size, i.e. resources.requests.storage; second, the access mode, declared here as ReadWriteMany, i.e. multi-node read-write access, a typical feature of file storage.

(figure: PVC yaml and pod yaml)

On the left side of the figure you can see this declaration: its size and access mode match the PV we just created statically. When the user submits the PVC, the relevant components of the K8s cluster bind the PVC to the PV. Afterwards, when submitting the pod yaml, the user writes a PVC entry under volumes and selects the desired PVC through claimName; the mounting then works exactly as described earlier. Once the yaml is submitted, the pod finds the bound PV through the PVC and can use that storage. That is the whole path from static provisioning to consumption by a pod.
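
Pulling the pieces together, here is a hedged end-to-end sketch of static provisioning. An in-tree NFS volume source stands in for the NAS details, and the server address, names, and sizes are illustrative:

```yaml
# PV: pre-created by the cluster administrator
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nas-pv
spec:
  capacity:
    storage: 5Gi                  # size of the pre-created storage
  accessModes:
  - ReadWriteMany                 # multi-node read-write, typical of file storage
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.0.2           # hypothetical NFS/NAS server address
    path: /share
---
# PVC: the user states only size and access mode
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nas-pvc
spec:
  storageClassName: ""            # empty class avoids triggering dynamic provisioning
  accessModes:
  - ReadWriteMany                 # must be satisfied by the PV for binding
  resources:
    requests:
      storage: 5Gi
---
# Pod: selects the PVC by claimName; mounting works as shown earlier
apiVersion: v1
kind: Pod
metadata:
  name: static-pv-demo
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: nas-volume
      mountPath: /data
  volumes:
  - name: nas-volume
    persistentVolumeClaim:
      claimName: nas-pvc          # finds the bound PV through this PVC
```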

Dynamic PV Usage #

Then let’s take a look at dynamic Provisioning. As mentioned above, system administrators no longer pre-allocate PVs, but only create a template file.

(figure: StorageClass yaml)

This template file is called a StorageClass. A StorageClass contains several important pieces of information that must be filled in. The first is provisioner. What is a provisioner? It determines which storage plugin will be used to create the PV and the underlying storage.

Then come the parameters: detailed settings needed when creating the storage in K8s, such as regionId, zoneId, fsType, and type, which users do not need to care about. reclaimPolicy has the same meaning as in the PV discussed earlier: it determines what happens to the dynamically created PV once it is no longer in use and its pod and PVC have been deleted. Here it is set to Delete, meaning the PV is also deleted after the user deletes the pod and PVC.
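
As a sketch, a StorageClass along these lines would express what the figure shows; the provisioner name and the parameter values follow the Alibaba Cloud disk CSI plugin but should be treated as assumptions, not a verbatim copy of the original file:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-disk
provisioner: diskplugin.csi.alibabacloud.com  # assumed plugin name; decides who creates the PV
parameters:                       # storage-specific details users need not care about
  regionId: cn-hangzhou           # illustrative values
  zoneId: cn-hangzhou-b
  fsType: ext4
  type: cloud_ssd
reclaimPolicy: Delete             # dynamically created PVs are deleted with their PVC
```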

Next, let’s look at how users consume storage after the cluster administrator has submitted the StorageClass, i.e. the template for creating PVs. As before, the user starts by writing a PVC file.

(figure: PVC yaml with storageClassName)

The storage size and access mode in the PVC file remain unchanged. What is new is the storageClassName field, which names the template file to be used for dynamically creating the PV. Here storageClassName is set to csi-disk, the StorageClass declared above.

After the PVC is submitted, the related components in the K8s cluster will dynamically generate this PV and bind it to the PVC according to the PVC and its corresponding StorageClass. After that, when users submit their own yaml, the usage and the following process are the same as the static usage method described earlier. They can find the dynamically created PV through the PVC and then mount it to the corresponding container for use.

Analysis of Important Fields in PV Spec #

Next, let’s explain some important fields in PV:

(figure: important fields in the PV spec)

  • Capacity: This is easy to understand, it refers to the storage object’s size.
  • AccessModes: This is also something users need to care about; it specifies how the PV may be used. There are three modes:
  • ReadWriteOnce: read-write access on a single node;
  • ReadOnlyMany: read-only access on multiple nodes, a common way to share data;
  • ReadWriteMany: read-write access on multiple nodes.

When users submit a PVC, the two most important fields are Capacity and AccessModes. After a PVC is submitted, how do the related components in the K8s cluster find an appropriate PV? First, using the AccessModes index established for PVs, they find all PVs that satisfy the AccessModes in the user’s PVC; then they filter further by the PVC’s Capacity, StorageClassName, and Label Selector. If several PVs still qualify, the one with the smallest size and the shortest AccessModes list is chosen, following a smallest-fit principle.

  • ReclaimPolicy: As mentioned earlier, this determines what happens to the PV once the PVC bound to it is deleted. There are three policies:
  • Recycle, which is deprecated and no longer recommended in K8s;
  • Delete, meaning the PV is deleted after the PVC is deleted;
  • Retain, meaning the PV is kept and must afterwards be handled manually by the administrator.
  • StorageClassName: As mentioned just now, this field must be specified for dynamic provisioning; it names the template file used to generate the PV.
  • NodeAffinity: This restricts which nodes the created PV can be mounted on, and thereby also restricts the scheduling of pods that use the PV: a pod must be scheduled onto a node that can access the PV in order to use it. This field will be explained in detail in the next lecture on storage topology scheduling. A consolidated example follows this list.
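
The sketch below gathers these fields into one PV spec. The driver name, volume handle, and zone label are illustrative assumptions; the nodeAffinity block is what restricts both where the PV can be mounted and where pods using it can be scheduled:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: fields-demo-pv            # hypothetical name
spec:
  capacity:
    storage: 10Gi                 # Capacity
  accessModes:
  - ReadWriteOnce                 # AccessModes: or ReadOnlyMany / ReadWriteMany
  persistentVolumeReclaimPolicy: Retain   # ReclaimPolicy: or Delete (Recycle is deprecated)
  storageClassName: csi-disk      # StorageClassName
  nodeAffinity:                   # only nodes matching these terms may mount this PV
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - cn-hangzhou-b         # illustrative zone
  csi:
    driver: diskplugin.csi.alibabacloud.com  # assumed driver name
    volumeHandle: d-example-disk-id          # hypothetical cloud disk ID
```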

PV State Transition #

(figure: PV state transitions)

Next, let’s take a look at the state transitions of a PV. After a PV object is created, it is briefly in the Pending state; once creation completes, it enters the Available state.

The Available state means the PV is ready to be used. After a user submits a PVC, the related K8s components bind the PV and PVC together, and both are then in the Bound state. When the user finishes with the PVC and deletes it, the PV enters the Released state. Whether the PV is then deleted or retained depends on the ReclaimPolicy just discussed.

One thing needs explaining here: a PV already in the Released state cannot return directly to Available, which means it cannot be bound by a new PVC. If we want to reuse a released PV, there are two options. The first is to create a new PV object and copy the information of the previously released PV into it, so the underlying storage can be bound to a new PVC. The second is not to delete the PVC object after deleting the pod, so the PVC bound to the PV still exists; the next pod can then reuse the storage directly through that PVC. This is how K8s implements storage migration for pods managed by a StatefulSet.

Operation Demonstration #

Next, I will demonstrate the specific operation methods for static provisioning and dynamic provisioning in the actual environment.

Example of Static Provisioning #

Static provisioning mainly uses Alibaba Cloud NAS file storage, while dynamic provisioning mainly uses Alibaba Cloud cloud disks. They require corresponding storage plugins, which I have already deployed in my K8s cluster (csi-nasplugin* is the plugin required for using Alibaba Cloud NAS in K8s, and csi-disk* is the plugin required for using Alibaba Cloud cloud disks in K8s).

(screenshot: CSI plugin pods deployed in the cluster)

Next, let’s take a look at the PV yaml file for static provisioning.

(screenshot: static provisioning PV yaml)

volumeAttributes contains the details of the NAS file system that I pre-created in the Alibaba Cloud NAS console. The key details to note are the capacity of 5Gi, the accessModes allowing multi-node read-write access, the reclaimPolicy of Retain (meaning the PV is retained after the associated PVC is deleted), and the driver used to operate this volume.
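
The original file is shown only as an image, so here is a hedged reconstruction. The driver name follows the Alibaba Cloud NAS CSI plugin, and the volumeHandle and server address are placeholders rather than values from the demo:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nas-csi-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany                 # multi-node read-write access
  persistentVolumeReclaimPolicy: Retain   # keep the PV after its PVC is deleted
  csi:
    driver: nasplugin.csi.alibabacloud.com  # served by the csi-nasplugin* pods
    volumeHandle: nas-csi-pv                # hypothetical unique volume ID
    volumeAttributes:
      server: "xxxx.cn-hangzhou.nas.aliyuncs.com"  # placeholder NAS mount target
      path: "/"
```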

Then, we create the corresponding PV:

(screenshot: the PV created and in the Available state)

Let’s take a look at the status of the PV in the above image. It is already in the Available state, which means it is ready to be used.

Next, we create the nas-pvc:

(screenshot: the PVC created and bound to the PV)

As seen above, the PVC has been created and it is already bound to the PV we created earlier. Let’s take a look at what is written in the PVC yaml.

(screenshot: PVC yaml)

It’s actually quite simple - just the size I need and the accessModes I need. After submitting, it will match the existing PVs in our cluster, and once the match is successful, it will be bound.

Next, we create a pod using nas-fs:

(screenshot: the two pods in the Running state)

As seen above, both pods are already in the running state.

Let’s take a look at the pod yaml:

(screenshot: Deployment yaml for the pods)

The pod yaml declares the PVC object we just created and mounts it at /data in the nas-container container. Here we use a Deployment, discussed in previous lessons, to create two replicas of the pod and schedule them onto different nodes through anti-affinity.
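
A hedged reconstruction of such a Deployment is below. The image and label names are illustrative; the podAntiAffinity term is what forces the two replicas onto different nodes:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nas-deployment            # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nas-demo
  template:
    metadata:
      labels:
        app: nas-demo
    spec:
      affinity:
        podAntiAffinity:          # never co-locate two replicas on one node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: nas-demo
            topologyKey: kubernetes.io/hostname
      containers:
      - name: nas-container
        image: nginx              # illustrative image
        volumeMounts:
        - name: nas-volume
          mountPath: /data        # the NAS volume appears here
      volumes:
      - name: nas-volume
        persistentVolumeClaim:
          claimName: nas-pvc      # the PVC created in the previous step
```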

(screenshot: the two pods scheduled on different nodes)

As seen above, the two pods are on different host machines.

As shown in the following figure: we log in to the first pod and check the mount information with findmnt. The volume is indeed mounted on nas-fs as declared earlier. We then touch a file named test.test.test, and next we will log in to the other container to see whether it is shared.

(screenshot: findmnt output and creation of the test file in the first pod)

We log out and then log in to the second pod (we logged into the first one just now).

As shown below: we also use findmnt to check, and we can see that the remote mount path of these two pods is the same, which means that we are using the same NAS PV. Let’s take a look at whether the file we created earlier still exists.

(screenshot: findmnt output in the second pod)

We can see that the file exists here as well, which shows that the two pods running on different nodes are sharing the same NAS storage.

Next, let’s look at what happens after deleting the two pods. First we delete the pods, then the corresponding PVC (Kubernetes protects PVC objects: while a pod is still using a PVC, the PVC cannot be deleted). This may take a moment.

(screenshot: deleting the pods and the PVC)

Let’s see if the corresponding PVC in the picture below has been deleted.

(screenshot: the PVC deleted; the PV in the Released state)

As shown in the above picture, it has been deleted. Now look at the NAS PV from earlier: it is still there, with a status of Released, meaning the PVC that was using it has been deleted and the PV has been released. And because our RECLAIM POLICY is Retain, this PV is retained.

Dynamic Provisioning Example #

Next, let’s look at the second example, the dynamic provisioning example. First, manually delete the retained PV, and you can see that there are no PVs in the cluster. Next, let’s demonstrate dynamic provisioning.

First, create a template file to generate the PV, which is the storageclass. Let’s take a look at the content of the storageclass, which is actually very simple.

(screenshot: StorageClass yaml)

As shown in the above picture, the provisioner specifies the volume plugin used to create the storage (the Alibaba Cloud disk plugin developed by the Alibaba Cloud team). The parameters section contains parameters needed to create the storage, which users don’t need to worry about. Then there is reclaimPolicy, which determines whether a PV created by this StorageClass is retained or deleted once the PVC bound to it is deleted.

(screenshot: no PVs in the cluster; the PVC yaml)

As shown in the above picture: there is currently no PV in this cluster. We now submit a PVC file, so let’s look at it. Its accessModes is ReadWriteOnce (an Alibaba Cloud disk can only be read and written by a single node, so we declare it this way). Its storage size requirement is 30G, and its storageClassName is csi-disk, the StorageClass we just created, specifying that the PV should be generated from this template.
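
Reconstructed from the description above (the PVC name is assumed), the file would look roughly like this:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: disk-pvc                  # assumed name; the text later calls it "the disk PVC"
spec:
  accessModes:
  - ReadWriteOnce                 # a cloud disk is readable/writable by one node only
  resources:
    requests:
      storage: 30Gi
  storageClassName: csi-disk      # the StorageClass (template) submitted above
```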

(screenshot: the PVC in the Pending state)

This PVC is currently in the pending state, which means that the corresponding PV is still being created.

(screenshot: the dynamically created PV bound to the PVC)

After a while, we see that a new PV has been generated. This PV is actually dynamically generated based on the PVC we submitted and the storageclass specified in the PVC. Afterwards, k8s will bind the generated PV and the PVC we submitted, which is the disk PVC, and then we can use it by creating pods.

Let’s take a look at the pod yaml:

(screenshot: pod yaml)

The pod yaml is very simple: it declares the use of this PVC and the mount point. With that in place, we can create the pod.

(screenshot: the pod created and running)

As shown in the picture below, let’s take a closer look at the Events. First, the pod is scheduled by the scheduler. After scheduling, the attach-detach controller performs the attach operation for the disk: it attaches the corresponding PV to the node chosen by the scheduler, after which the pod’s container can start and use the disk.

(screenshot: pod Events showing scheduling and the attach operation)

Next, I will delete the PVC to see whether the PV is deleted according to our reclaimPolicy. First, note that at this point the PVC still exists, and so does the corresponding PV.

(screenshot: the PVC and PV before deletion)

Then we delete the PVC. Afterwards, our PV has been deleted as well: in line with the reclaimPolicy, deleting the PVC also deletes the PV.

(screenshot: the PV deleted along with the PVC)

That’s all for our demonstration.

Architecture Design #

Processing Flow of PV and PVC #

Let’s take a look at the complete processing flow of the PV and PVC system in K8s. First, let’s examine the CSI components in the lower right part of the figure below.

(figure: complete processing flow of PV and PVC with CSI)

What is CSI? CSI stands for Container Storage Interface; it is the K8s community’s officially recommended way to implement out-of-tree storage plugins. A CSI implementation can be divided into two parts:

  • The first part is the general part implemented by the K8s community, such as csi-provisioner and csi-attacher controllers mentioned in this image;
  • The other part is implemented by cloud storage vendors: it integrates with the vendor’s OpenAPI and mainly implements operations such as create/delete/mount/unmount for the storage. This corresponds to csi-controller-server and csi-node-server in the figure.

Next, let’s look at what happens inside K8s when a user submits a YAML file. When a user submits a PVC YAML, a PVC object is first created in the cluster. The csi-provisioner controller watches PVC objects; it combines the PVC with the StorageClass declared in it and makes a gRPC call to csi-controller-server, which creates the actual storage at the cloud storage service and then creates a PV object. Once the PV controller in the cluster binds the PVC and PV objects, the PV can be used.

When a user submits a pod, the scheduler first places it on a suitable node. The kubelet on that node then uses csi-node-server to mount the PV created earlier onto a path the pod can use, after which the kubelet creates and starts all the containers in the pod.

Processing Flow of PV, PVC, and Storage Using CSI #

Let’s take a more detailed look at the complete flow of PV, PVC, and storage using CSI with another image.

(figure: the three CSI stages: create, attach, mount)

It mainly consists of three stages:

  • The first stage (the create stage) occurs when a user submits a PVC: the csi-provisioner creates the storage and generates a PV object, and the PV controller then binds the PVC and the generated PV. Once binding is done, the create stage is complete.
  • Next, when a user submits a pod YAML, the pod is scheduled to a suitable node. Once the running node is selected, the AD Controller watches the pod, checks which PVs it uses, and generates an internal object called VolumeAttachment. This triggers the csi-attacher to invoke csi-controller-server, which performs the actual attach operation by calling the cloud storage vendor’s OpenAPI, attaching the storage to the node where the pod will run. That completes the second stage, the attach stage.
  • The third stage occurs while the kubelet creates the pod: the kubelet first performs a mount operation, which further mounts the attached disk onto a specific path the pod can use, and then creates and starts the containers. This is the mount stage of creating and using storage through PV, PVC, and CSI.

In summary, there are three stages: the first stage, create stage, mainly focuses on creating the storage; the second stage, attach stage, involves attaching the storage to the node (typically loading the storage under “/dev” on the node); the third stage, mount stage, involves further mounting the corresponding storage to a path that the pod can use. This is the complete flow of PVC, PV, and the usage of storage implemented through CSI.
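
To make the attach stage concrete, here is roughly what the VolumeAttachment object generated by the AD Controller looks like; all names here are illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  name: csi-0a1b2c                # generated name, illustrative
spec:
  attacher: diskplugin.csi.alibabacloud.com  # which CSI driver should perform the attach
  nodeName: node-1                # the node the pod was scheduled to
  source:
    persistentVolumeName: disk-pv # the PV used by the pod
```

The csi-attacher watches these objects and translates them into the actual attach call against csi-controller-server.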

Conclusion #

And that concludes our content for today. In the next section, I will share with you the knowledge and specific processing flow related to Volume Snapshot and Volume Topology-aware Scheduling. Thank you everyone~

Summary of this Section #

The main content of this section has come to an end. Here is a brief summary for everyone.

  • K8s Volume is an important interface and means for user Pods to store business data.
  • The PVC and PV system enhances the ability of K8s Volumes to share, migrate, and scale storage in multi-Pod scenarios.
  • The two provisioning modes of PV (static and dynamic) can supply storage to pods in the cluster in different ways.