12 Pod: How to Understand This Core Concept in Kubernetes

Hello, I am Chrono.

In the past few days, we have learned about YAML, the working language in the Kubernetes world, and have written a brief YAML file that describes an API object: the Pod. The Pod contains the definition of a container in the spec field.

So why doesn’t Kubernetes use mature, stable containers directly? Why does it abstract a separate object called the Pod? And why does almost everyone say that the Pod is the most core and fundamental concept in Kubernetes?

Today, I will answer these questions one by one, hoping that by the end of this lesson, you will have a clear answer in your mind.

Why Do We Need Pods #

The word “Pod” originally means “pea pod,” and by extension it can also mean a “compartment” or “space capsule.” You can take a look at this picture, which vividly illustrates that a Pod is a structure that holds several components and members.

I believe you are already familiar with container technology, which allows processes to run in a “sandbox” environment with good isolation. It is a great encapsulation for applications.

However, when container technology is deployed in real production environments, this isolation can cause some trouble. Applications rarely run completely on their own; often several processes need to cooperate to complete a task. For example, when we set up a WordPress website in the “Getting Started” section, we needed three containers (Nginx, WordPress, and MariaDB) to work together.

The relationship between the three applications in the WordPress example is relatively loose. They can be scheduled separately and can communicate with each other using IP addresses even if they run on different machines.

However, there are some special cases where multiple applications are tightly coupled and cannot be separated. For example, some applications require other applications to initialize some configurations before they can run, or a log proxy needs to read files stored on the local disk of another application and forward them. If these applications are forcibly separated into two containers, cutting off the connection, they will not work properly.

Can we put all these applications in one container and run them together?

Of course, it is possible, but it is not good practice. A container is meant to encapsulate a single application independently, so it should contain only one process, one application. Putting multiple applications inside one container not only violates the original intention of containerization but also makes the container harder to manage.

To solve the problem of running multiple applications together without breaking the isolation of containers, we need to create a “holding bay” outside the container, allowing multiple containers to remain relatively independent while sharing network, storage, and other resources in a small scope. Moreover, they should always be in a “bound-together” state.

Therefore, the concept of Pods emerged. The containers inside a Pod are like the tiny peas inside a “pea pod.” You can see in the YAML of a Pod that the field “spec.containers” is actually an array, allowing the definition of multiple containers.
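To make the “pea pod” idea concrete, here is a minimal sketch of the log-proxy case described earlier, assuming an Nginx container as the main application and a BusyBox container standing in for a log agent. The Pod name log-pod, the volume name logs, the container names, and the image tags are hypothetical choices for illustration. The two containers are simply two entries in the spec.containers array, and they share files through an emptyDir volume that lives exactly as long as the Pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: log-pod                       # hypothetical name
spec:
  volumes:
  - name: logs                        # scratch space shared by both containers
    emptyDir: {}
  containers:
  - image: nginx:alpine               # the main application
    name: web
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx       # Nginx writes access.log and error.log here
  - image: busybox:latest             # stands in for a log agent
    name: log-agent
    command:
      - /bin/sh
      - -c
    args:
      # wait until Nginx has created its log file, then follow it
      - "until [ -f /logs/access.log ]; do sleep 1; done; exec tail -f /logs/access.log"
    volumeMounts:
    - name: logs
      mountPath: /logs                # same volume, seen under a different path
```

Because both containers belong to one Pod, Kubernetes always schedules them onto the same node. They also share the Pod's network namespace, so the log-agent container could reach Nginx at localhost:80 without knowing any IP address.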

If we continue the “small modular house” analogy from earlier, a Pod is like a complete home assembled from prefabricated rooms such as a living room, bedroom, and kitchen. It is not only easy to take apart and relocate, but also far more capable than a standalone “studio apartment,” so processes can “live” in it more comfortably.

Why is a Pod the core object of Kubernetes? #

The Pod is the core object of Kubernetes because it is the “unit” that packages containers: the containers inside a Pod are always scheduled and run together and are never separated. Moreover, the Pod belongs to Kubernetes, so Kubernetes can customize and modify it freely without touching the underlying containers. With the Pod abstraction, Kubernetes can manage applications at the cluster level with ease.

Kubernetes orchestrates and manages containers through Pods, treating them as the smallest unit for application scheduling and deployment. As a result, Pods have become the “atom” of the Kubernetes world (although the internals of this “atom” have structure and are not a monolithic block). Based on Pods, more complex and diverse business scenarios can be created.

The following diagram may look familiar, as it illustrates how Kubernetes extends from Pods to other important API objects, such as ConfigMap for configuration information, Job for batch processing, Deployment for multiple instance deployment, and more, representing various practical operational needs.

However, although this diagram is classic and highly valuable as a reference, it is somewhat dated and does not fully capture all Kubernetes resource objects as the platform has evolved over time.

Inspired by this diagram, I have created a new version that focuses on the relationship between Pods and other Kubernetes resource objects, including additional concepts. We will use this updated diagram to explore the various functionalities of Kubernetes in the future.

From these two diagrams, you should be able to see that all Kubernetes resources are directly or indirectly attached to Pods. All Kubernetes functionalities rely on Pods as the foundation, making Pods the natural core object of Kubernetes.
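One way to see this “attached to Pods” relationship is that higher-level objects embed a Pod description instead of replacing it. The sketch below is a hypothetical Job (the name echo-job and the message are made up) whose spec.template wraps a single busybox container that runs /bin/echo. The Job object only adds batch-processing behavior around an ordinary Pod template.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: echo-job                 # hypothetical name
spec:
  template:                      # a Pod template: the same fields as a Pod's spec
    spec:
      restartPolicy: OnFailure   # Jobs only accept OnFailure or Never here
      containers:
      - image: busybox:latest
        name: echo-job
        imagePullPolicy: IfNotPresent
        command:
          - /bin/echo
        args:
          - "hello from a Job"
```

Everything under spec.template.spec uses exactly the fields a Pod uses; the Job controller creates Pods from this template and tracks whether they finish successfully.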

How to Describe a Pod Using YAML #

Since Pods are so important, we need to understand them in detail. Once you have mastered the Pod, you have completed half of your Kubernetes learning journey.

Remember, we can always use the command kubectl explain to see detailed explanations of any field. So, next, I will briefly talk about some commonly used fields in the Pod YAML.

Because the Pod is also an API object, it must have the four basic components: apiVersion, kind, metadata, and spec.

The fields “apiVersion” and “kind” are very simple. For Pods, they have fixed values: v1 and Pod respectively. Generally, the “metadata” should include the fields name and labels.

When creating a container using Docker, we can omit giving the container a name. However, in Kubernetes, a Pod must have a name. This is a convention for all resource objects in Kubernetes. In this course, I usually add the suffix “pod” to the names of Pods, so they can be easily distinguished from other resource types.

“name” is just a basic identifier with limited information, so the “labels” field comes in handy. It can add any number of key-value pairs to “tag” the Pod for classification. Combining it with the name field makes it even easier to identify and manage Pods.

For example, we can label Pods by runtime environment with env=dev/test/prod, by the data center they run in with region=north/south, or by their tier in the system with tier=front/middle/back, and so on. Just use your imagination.

The following YAML code describes a simple Pod named “busy-pod”, along with some additional labels:

apiVersion: v1
kind: Pod
metadata:
  name: busy-pod
  labels:
    owner: chrono
    env: demo
    region: north
    tier: back

In general, “metadata” only needs to include name and labels. As for the “spec” field, it contains a lot of crucial information since it needs to manage and maintain the Pod, which is the basic scheduling unit of Kubernetes. Today, I will introduce the most important part, which is “containers”. You can look up the documentation to learn about other fields such as hostname and restartPolicy.

“containers” is an array in which each element is a container object describing one container.

Similar to Pods, a container object must also have a name to identify it. And of course, it should have an image field to specify the image it uses. These two fields are mandatory, otherwise Kubernetes will report a data validation error.

Other fields of the container object can mostly correspond to what we learned in the “Getting Started” section about Docker and container technology. They are not difficult to understand, so I will just list a few:

ports: Lists the ports exposed by the container to the outside world, similar to the -p parameter in Docker.
imagePullPolicy: Specifies when to pull the image: Always, Never, or IfNotPresent. IfNotPresent pulls the image from the remote registry only if it does not exist locally, which reduces network traffic. It is the default for images with a specific tag, but if the tag is latest or omitted, the default becomes Always, so it is worth setting explicitly in that case.
env: Defines environment variables for the container, similar to the ENV instruction in a Dockerfile. However, these variables are specified at run time rather than baked into the image, which makes them more flexible and configurable.
command: Defines the command to be executed when the container starts, equivalent to the ENTRYPOINT instruction in a Dockerfile.
args: The arguments passed to the command at run time, equivalent to the CMD instruction in a Dockerfile. Be careful: the naming differs from Docker. The command given to docker run replaces CMD, whereas the Kubernetes command field replaces ENTRYPOINT.
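Since the busy-pod example in this lesson does not expose any ports, here is a separate minimal sketch of the ports and imagePullPolicy fields on an Nginx container. The Pod name ngx-port-pod, the container name ngx, and the port name http are hypothetical. Note that containerPort mainly declares which port the container listens on; unlike docker run -p, it does not by itself publish the port on the node.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ngx-port-pod                # hypothetical name
spec:
  containers:
  - image: nginx:alpine
    name: ngx
    imagePullPolicy: IfNotPresent   # pull only if the image is not cached on the node
    ports:
    - containerPort: 80             # port Nginx listens on inside the container
      name: http                    # optional name that other objects can refer to
      protocol: TCP                 # TCP is the default protocol
```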

Now let’s write the “spec” section for the “busy-pod” and add fields such as env, command, args, etc.:

spec:
  containers:
  - image: busybox:latest
    name: busy
    imagePullPolicy: IfNotPresent
    env:
      - name: os
        value: "ubuntu"
      - name: debug
        value: "on"
    command:
      - /bin/echo
    args:
      - "$(os), $(debug)"

Here, I specified that the Pod should use the image busybox:latest with the image pull policy IfNotPresent, defined two environment variables, os and debug, and set the startup command to /bin/echo. The args use the $(os) and $(debug) syntax, which Kubernetes replaces with the values of those environment variables before running the command, so the Pod prints them.

By comparing this YAML file with the Docker command, you can see that the YAML clearly and accurately describes the running state of the container in a “declarative” manner. It is much cleaner and more friendly to humans and machines compared to the long command line of docker run.
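The $(os) syntax in args is expanded by Kubernetes itself, not by a shell: /bin/echo receives the already-substituted string. If you prefer ordinary shell-style $os expansion, one option, sketched below under that assumption, is to run the command through sh -c. Only the spec section is shown, and the container name busy-sh is hypothetical.

```yaml
spec:
  containers:
  - image: busybox:latest
    name: busy-sh                   # hypothetical name
    imagePullPolicy: IfNotPresent
    env:
      - name: os
        value: "ubuntu"
      - name: debug
        value: "on"
    command:
      - /bin/sh                     # let the shell do the variable expansion
      - -c
    args:
      - 'echo "$os, $debug"'        # $os is expanded by sh at run time, not by Kubernetes
```

Both versions print the same text to standard output. Keep in mind that Kubernetes only expands the $(NAME) form, and only for variables that are defined; an unresolvable reference is passed through unchanged.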

How to use kubectl to operate on Pods #

With the YAML file describing the Pod, let me introduce the kubectl commands used to operate on Pods.

The commands kubectl apply and kubectl delete were introduced in the last lesson. With the -f parameter, they create or delete Pods from a YAML file, for example:

kubectl apply -f busy-pod.yml
kubectl delete -f busy-pod.yml

However, because we defined the “name” field in the YAML, we can also specify the name directly for deletion:

kubectl delete pod busy-pod

Unlike Docker containers, Kubernetes Pods never run in the foreground; they always run in the background (as if docker run's -d parameter were used by default), so we cannot see their output directly. Instead, we can use kubectl logs, which displays the Pod's standard output. In this case, it shows the values of the two environment variables defined earlier:

kubectl logs busy-pod

The command kubectl get pod can be used to view the list of Pods and their running status:

kubectl get pod

You will find that this Pod is in the “CrashLoopBackOff” state instead of running normally. In this case, we can use the kubectl describe command to check its detailed status, which is very useful for debugging and troubleshooting:

kubectl describe pod busy-pod

Typically, you need to pay attention to the “Events” section at the end, which lists the important events that occurred during the Pod's run. For busy-pod, the container executes a single echo command and then exits, and since Kubernetes restarts exited containers by default, the Pod falls into an error loop of stop-start-stop-start.
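This restart behavior comes from the Pod-level restartPolicy field, which defaults to Always. For a one-shot task like busy-pod, a minimal sketch of one fix, assuming you keep the same container definition, is to set restartPolicy to OnFailure (Never would also work), so that a clean exit ends the Pod instead of restarting it.

```yaml
spec:
  restartPolicy: OnFailure   # restart only if the container exits with a non-zero code
  containers:
  - image: busybox:latest
    name: busy
    imagePullPolicy: IfNotPresent
    env:
      - name: os
        value: "ubuntu"
      - name: debug
        value: "on"
    command:
      - /bin/echo
    args:
      - "$(os), $(debug)"
```

With this change, kubectl get pod should report the Pod as Completed once echo exits with status 0. A running Pod's restartPolicy cannot be changed in place, so delete busy-pod first and then apply the modified file.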

Most applications running in Kubernetes are long-running services that never exit on their own, and this is how most Pods work. So let's delete busy-pod and start an Nginx service using the ngx-pod.yml created in the previous lesson:

kubectl apply -f ngx-pod.yml

After starting it, we can use kubectl get pod to check its status; it should be in the “Running” state:

kubectl get pod

The kubectl logs command can also output Nginx's running logs:

kubectl logs ngx-pod

In addition, kubectl also provides commands similar to Docker, namely cp and exec. kubectl cp can copy files from the local file system to a Pod, while kubectl exec is used to execute shell commands inside a Pod, and their usage is similar to that of Docker.

For example, if I have a file named “a.txt”, I can use kubectl cp to copy it into the Pod's “/tmp” directory:

echo 'aaa' > a.txt
kubectl cp a.txt ngx-pod:/tmp

However, the kubectl exec command has a slightly different format from Docker: you must add -- after the Pod name to separate kubectl's own arguments from the command to run inside the container. Please be careful when using it:

kubectl exec -it ngx-pod -- sh

Summary #

Alright, today we learned the most core and fundamental concept in Kubernetes, the Pod. We learned how to customize Pods using YAML and how to create, delete, view, and debug Pods using the kubectl command.

The Pod hides some low-level details of containers while still providing sufficient control and management capabilities. Compared with the fine granularity of containers and the coarse granularity of virtual machines, the Pod is a flexible, lightweight, “medium-grained” unit. This makes it well suited as the basic unit of application scheduling in cloud computing, and it has become the “atom” from which all business workloads in the Kubernetes world are built.

I have briefly listed today’s key points below:

In real-world scenarios, there are often applications that require close collaboration between multiple processes to complete tasks. It is difficult to describe this relationship using only containers. Hence, Pods were introduced to “package” one or more containers and ensure that the processes inside can be scheduled as a whole.
Pod is the smallest unit for managing applications in Kubernetes, and all other concepts are derived from Pods.
Pods should also be described using YAML in a “declarative” manner, with the key field being “spec.containers” where you list the names, images, ports, and other elements that define the running state of containers inside.
Many commands for operating Pods are similar to Docker, such as kubectl run, kubectl cp, kubectl exec, etc. However, there may be some minor differences in these commands, so be aware when using them.

Although Pods are a core and very important concept in Kubernetes, in practice we usually do not create Pods directly. A Pod is only a thin wrapper around containers; it is somewhat fragile and still falls short of what complex business requirements demand. To put Pods into production use, we need other objects such as Jobs, CronJobs, and Deployments that add more functionality on top of them.

Homework #

Finally, it’s time for homework. Here are two questions for you to ponder:

What difficulties would arise if applications were managed directly using containers without Pods?
What do you think are the differences and connections between Pods and containers?

Feel free to leave comments to participate in the discussion. If you have gained any insights, feel free to share them with your friends for learning together. See you in the next class.