11 YAML: The Universal Language in the World of Kubernetes #

Hello, I’m Chrono.

In our last class, we explored the internal architecture and components of Kubernetes and saw that it is divided into the control plane and the data plane. The control plane manages the cluster, while the data plane runs the business applications. The Master node runs components such as the apiserver, etcd, and scheduler, and every node runs kubelet and kube-proxy; together they maintain the stability of the entire cluster.

This unique Master/Node architecture is the foundation of Kubernetes. But is this “internal skill” alone enough to let it roam the world freely?

Clearly not. Just like the characters in many martial arts or fantasy works, Kubernetes also needs a “technique manual” in order to fully utilize its “internal skills”. Only by cultivating both internal and external skills can one reach the realm of being unrivaled in the world.

And this “technique manual” is the standard working language in the Kubernetes world, YAML. So today, I will talk about why we need YAML, what it looks like, and how to use it.

What are Declarative and Imperative? #

The YAML language used by Kubernetes has a very important feature called “declarative”, which corresponds to another term: “imperative”.

So before we delve into YAML, we need to take a look at these two working modes, “declarative” and “imperative”. Their relationship in the computing world is somewhat like that between the “Sword sect” and the “Qi sect” in martial arts novels.

The Docker commands and Dockerfile we learned in the introductory part belong to the “imperative” mode, and most programming languages also belong to the imperative mode. Its characteristics are strong interactivity, emphasis on sequence and process. You have to “tell” the computer what to do at each step, and list all the steps clearly, so that the program can proceed step by step and finally complete the task, making the computer appear a bit “dumb”.

“Declarative” was relatively rare before the advent of Kubernetes. It is completely opposite to “imperative”. It doesn’t care about the specific process, but focuses more on the result. We don’t need to “teach” the computer how to do something, we just need to tell it a target state, and it will figure out how to complete the task on its own. In comparison, it is more automated and intelligent.

These two concepts are quite abstract and not easy to understand, which is also one of the difficulties that Kubernetes beginners often encounter. The official Kubernetes website deliberately uses air conditioning as an example to explain the principle of “declarative”, but I feel that it is still not very clear. So here, I will use the example of “hailing a taxi” to illustrate the difference between “imperative” and “declarative”.

Suppose you need to take a taxi to the high-speed rail station, but the driver is not familiar with the road conditions, so you have to tell him which route to take, where to turn at which intersection, where to enter and exit the main road, and where to stop. Although you eventually reach your destination, you have to give countless “commands” along the way. Obviously, this journey is “imperative”.

Now let’s change the approach. Still going to the high-speed rail station, but the driver is experienced and knows where there is congestion, which roads have more traffic lights, which sections have temporary control, and where shortcuts can be taken. At this time, if you keep talking, it will undoubtedly disturb his normal driving. So all you need to do is give him a “declaration”: I want to go to the high-speed rail station. Then you can comfortably lie in the back seat and rest, and arrive at your destination smoothly.

In this example of “hailing a taxi”, Kubernetes is such an experienced driver. The Master/Node architecture allows it to be well aware of the state of the entire cluster, and its numerous components and plugins can automatically monitor and manage applications.

At this time, it is not appropriate to deal with it in an “imperative” way, because it knows more comprehensive information than us and does not need us as outsiders to guide the insider. So it is best for us to be a “hands-off manager” and use “declarative” to tell it the goal of the task, such as which image to use and when to run, and let it handle the details of the execution process on its own.
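To make the contrast a little more concrete, here is a minimal Python sketch of the “declarative” idea: we only state the desired state, and a function works out the steps. It is a toy model, not how Kubernetes is actually implemented; the names reconcile, desired_state, and actual_state are hypothetical and chosen only for illustration.

```python
# A toy "declarative" loop: we declare how many copies of each application
# we want, and reconcile() figures out the steps. All names are hypothetical.

def reconcile(desired: dict, actual: dict) -> list:
    """Return the actions needed to move `actual` toward `desired`."""
    actions = []
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} x {name}")
        elif have > want:
            actions.append(f"stop {have - want} x {name}")
    # Anything that is running but was never declared should be removed.
    for name, have in actual.items():
        if name not in desired and have > 0:
            actions.append(f"stop {have} x {name}")
    return actions


desired_state = {"ngx": 2}            # the "declaration": what we want
actual_state = {"ngx": 1, "old": 1}   # what is running right now

for action in reconcile(desired_state, actual_state):
    print(action)
```

The caller never lists the steps. In the “imperative” style, we would have to write “start one more ngx, then stop old” ourselves, and rewrite those steps whenever the current state changed.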

So, what is the best way to send a “declaration” to Kubernetes?

In container technology, Shell scripts and Dockerfiles describe “imperative” steps well, but they are not suitable for “declarative” descriptions. For that, we need a specialized language: YAML.

What is YAML #

YAML was created in 2001, three years after XML. You should be familiar with XML. It is a tag-based language similar to HTML, with many intricacies. Although YAML’s name is reminiscent of XML, it is fundamentally different: it is easier for humans to read and still easy for computers to parse.

The official website of YAML (https://yaml.org/) provides a complete introduction to the language specification, so I won’t list the details of the language here. I will only discuss some key points related to Kubernetes to help you quickly grasp it.

You need to know that YAML is a superset of JSON, supporting data types such as integers, floats, booleans, strings, arrays, and objects. In other words, any valid JSON document is also a YAML document. If you are familiar with JSON, learning YAML will be much easier.

However, compared to JSON, YAML has a simpler syntax and a clearer, more compact format. For example:

Indentation and whitespace are used to represent hierarchy (similar to Python), and curly braces and square brackets are optional.
Comments can be written using #, which is a big improvement over JSON.
The format of objects (dictionaries) is basically the same as JSON, but the keys do not need to be enclosed in double quotation marks.
Arrays (lists) are represented as lists starting with - (similar to Markdown).
There must be a space after : for objects and after - for arrays.
--- can be used to separate multiple YAML objects in a file.

Now let’s look at some simple examples of YAML.

First is an array that lists three operating systems using -:

# YAML array (list)
OS:
  - linux
  - macOS
  - Windows

The equivalent JSON for this YAML is as follows:

{
  "OS": ["linux", "macOS", "Windows"]
}

Comparing the two, we can see that YAML is simpler in format. It doesn’t have the hassle of closing curly braces and square brackets, and each element doesn’t need a comma after it.

Next, let’s look at a YAML object that declares one Master node and three Worker nodes:

# YAML object (dictionary)
Kubernetes:
  master: 1
  worker: 3

Its equivalent JSON is as follows:

{
  "Kubernetes": {
    "master": 1,
    "worker": 3
  }
}

Notice that in YAML, double quotation marks are not required for keys, which makes it more comfortable to read.

By combining arrays and objects in YAML, we can describe any Kubernetes resource object. The third example is slightly more complex. You can try to interpret it yourself:

# Complex example, combining arrays and objects
Kubernetes:
  master:
    - apiserver: running
    - etcd: running
  node:
    - kubelet: running
    - kube-proxy: down
    - container-runtime: [docker, containerd, cri-o]
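If you want to check your reading of this example, one option is to write the same data in Python and print it as JSON, since YAML objects correspond to dictionaries and YAML arrays to lists. This is only a sketch using the standard json module; the variable names cluster and as_json are my own.

```python
import json

# The third example as plain Python data:
# YAML objects become dicts, YAML arrays become lists.
cluster = {
    "Kubernetes": {
        "master": [
            {"apiserver": "running"},
            {"etcd": "running"},
        ],
        "node": [
            {"kubelet": "running"},
            {"kube-proxy": "down"},
            {"container-runtime": ["docker", "containerd", "cri-o"]},
        ],
    }
}

# Print the equivalent JSON document.
as_json = json.dumps(cluster, indent=2)
print(as_json)
```

Notice that each - item under master and node is its own object with a single key, and that the last item holds an array written inline with square brackets.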

I won’t go into the other details of the YAML language here. They are summarized in the accompanying figure; you can also refer to the YAML official website and pick them up gradually in later lessons.

What is an API object #

What we have learned so far is not enough, because the YAML language only gives us the “grammar”. To communicate with Kubernetes, we also need enough “vocabulary” to express “semantics”.

So what should be declared in Kubernetes to make it understand what we mean?

As a cluster operating system, Kubernetes distills Google’s years of experience into a set of abstract concepts that describe how a system is managed and operated. These concepts are called “API objects”. The name may remind you of the apiserver component mentioned in the last class, and that is indeed where it comes from.

Because the apiserver is the only entry point of the Kubernetes system, external users and internal components must all communicate with it. It adopts the URL resource concept of the HTTP protocol, and its API is RESTful, using methods such as GET, POST, and DELETE. These concepts are therefore naturally called “API objects”.

So what are the API objects?

You can use kubectl api-resources to view all the objects supported by the current Kubernetes version:

kubectl api-resources

The “NAME” column of the output shows the resource name of each object, such as configmaps, pods, and services, and the “KIND” column shows the corresponding kind, such as ConfigMap, Pod, and Service. The second column, “SHORTNAMES”, is the abbreviation of the resource, which saves typing when we use the kubectl command. For example, Pod can be abbreviated as po, and Service as svc.

When using the kubectl command, you can also add a --v=9 parameter, which will display the detailed command execution process. You can clearly see the issued HTTP request, such as:

kubectl get pod --v=9

In the output, you can see that the kubectl client is effectively doing what curl would do: it sends an HTTP GET request to port 8443 of the apiserver, and the URL path is /api/v1/namespaces/default/pods.

As of version 1.23, Kubernetes has more than 50 kinds of API objects, which comprehensively describe the cluster’s nodes, applications, configurations, services, accounts, and other information. The apiserver stores them all in the etcd database, and the kubelet, scheduler, controller-manager, and other components operate on them through the apiserver, so that the entire cluster is managed at the abstraction layer of API objects.

How to describe API objects #

Now let’s take a look at how to describe and create API objects in Kubernetes using the declarative approach with YAML.

Do you remember the command we used to run Nginx? It was kubectl run, which is an “imperative” approach, similar to Docker:

kubectl run ngx --image=nginx:alpine

Let’s rewrite it in a “declarative” YAML format, specifying the desired state of the Nginx application, and let Kubernetes decide how to pull the image and run it:

apiVersion: v1
kind: Pod
metadata:
  name: ngx-pod
  labels:
    env: demo
    owner: chrono

spec:
  containers:
  - image: nginx:alpine
    name: ngx
    ports:
    - containerPort: 80

With a basic understanding of YAML, you should be able to read this file. It defines a Pod that uses the nginx:alpine image to create a container and declares port 80, along with the other fields that Kubernetes requires.

Since API objects use the standard HTTP protocol, we can borrow the structure of an HTTP message to divide the description of an API object into “header” and “body” sections.

The “header” contains basic information about the API object, with three fields: apiVersion, kind, and metadata.

apiVersion represents the API version used to operate on this resource. Due to Kubernetes’ rapid iteration, different versions may create objects with differences. To distinguish between these versions, the apiVersion field is used, such as v1, v1alpha1, v1beta1, and so on.
kind represents the type of resource object. This should be self-explanatory, such as Pod, Node, Job, Service, etc.
metadata, as the name suggests, represents some “metadata” about the resource, used to tag and manage objects in Kubernetes.

For example, in the YAML example above, there are two pieces of metadata: name, which assigns the name ngx-pod to the Pod, and labels, which adds some tags (env and owner) for easy searching.

apiVersion, kind, and metadata are all used by kubectl to generate the HTTP requests sent to the API server. You can see them in the request URL by using the --v=9 parameter, such as:

https://192.168.49.2:8443/api/v1/namespaces/default/pods/ngx-pod
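To see how these fields end up in the URL, here is a minimal Python sketch that builds the path part of the request from a manifest. It is an illustration under stated assumptions, not kubectl’s actual code: the PLURALS table and the resource_path function are hypothetical names, a real client discovers resource names from the apiserver, and the sketch only handles namespaced objects, falling back to the default namespace.

```python
# Hypothetical lookup from kind to the plural resource name used in URLs.
# A real client asks the apiserver for this information instead.
PLURALS = {"Pod": "pods", "Service": "services", "Ingress": "ingresses"}


def resource_path(manifest: dict) -> str:
    """Build the URL path for a namespaced object from its header fields."""
    api_version = manifest["apiVersion"]
    # The core group ("v1") lives under /api, named groups under /apis.
    prefix = "/apis" if "/" in api_version else "/api"
    metadata = manifest["metadata"]
    namespace = metadata.get("namespace", "default")
    plural = PLURALS[manifest["kind"]]
    return f"{prefix}/{api_version}/namespaces/{namespace}/{plural}/{metadata['name']}"


pod = {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "ngx-pod"}}
print(resource_path(pod))
```

For the Pod above, the printed path matches the one in the URL: apiVersion supplies /api/v1, kind supplies pods, and metadata supplies the namespace and the name.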

Similar to the HTTP protocol, the apiVersion, kind, and metadata fields in the “header” are required for any object, while the “body” section is object-specific and contains different specification definitions for each object. In YAML, the spec field represents the “desired state” of the object.

Looking at the Pod example again, the spec contains an array of containers, where each element is an object that specifies the name, image, ports, and other information:

spec:
  containers:
  - image: nginx:alpine
    name: ngx
    ports:
    - containerPort: 80

Now, by combining these fields, you can see that this YAML document fully describes an API object of type Pod, using the v1 version of the API, with its name and labels stored in metadata and its desired state stored in spec.
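The “header” and “body” split can also be expressed in a few lines of Python. This is only a sketch of the idea described above, with the manifest held as a dictionary; HEADER_FIELDS and split_manifest are hypothetical names, and Kubernetes itself performs far more thorough validation.

```python
# The three fields that form the "header" of every API object.
HEADER_FIELDS = ("apiVersion", "kind", "metadata")


def split_manifest(manifest: dict) -> tuple:
    """Split a manifest into (header, body); fail if a header field is missing."""
    missing = [f for f in HEADER_FIELDS if f not in manifest]
    if missing:
        raise ValueError(f"missing header fields: {', '.join(missing)}")
    header = {f: manifest[f] for f in HEADER_FIELDS}
    body = {k: v for k, v in manifest.items() if k not in HEADER_FIELDS}
    return header, body


ngx_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "ngx-pod", "labels": {"env": "demo", "owner": "chrono"}},
    "spec": {
        "containers": [
            {"image": "nginx:alpine", "name": "ngx", "ports": [{"containerPort": 80}]}
        ]
    },
}

header, body = split_manifest(ngx_pod)
print(sorted(header))  # the three header fields
print(sorted(body))    # everything else, here only "spec"
```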

You can use kubectl apply or kubectl delete commands with the -f parameter to create or delete objects using this YAML file:

kubectl apply -f ngx-pod.yml
kubectl delete -f ngx-pod.yml

When Kubernetes receives this declarative data through HTTP requests such as POST or DELETE, it manages the resource object automatically. You don’t need to worry about which node the object is on, how it is created, or how it is deleted.

How to write YAML #

By now, I believe you should have a general understanding of how to use YAML to communicate with Kubernetes. However, you may still have some questions: with so many API objects, how do we know which apiVersion and kind to use? What fields should be included in metadata and spec? Also, YAML may seem simple, but it can be a bit cumbersome to write as it is easy to make indentation mistakes. Is there a simpler way?

The most authoritative answer to these questions is undoubtedly the official Kubernetes documentation (https://kubernetes.io/docs/reference/kubernetes-api/). All the fields for API objects can be found there. However, the official documentation is extensive and detailed, which can make it difficult to navigate. Therefore, I will introduce a few simple and practical tips below.

The first tip has actually been mentioned before, which is the kubectl api-resources command. It displays the corresponding API versions and types for resource objects. For example, the version for Pod is “v1” and the version for Ingress is “networking.k8s.io/v1”. By following this information, you can never go wrong.

The second tip is the kubectl explain command, which is like the built-in API documentation for Kubernetes. It provides detailed explanations of object fields, saving you the trouble of searching online. For example, if you want to see how to write fields for Pod, you can do the following:

kubectl explain pod
kubectl explain pod.metadata
kubectl explain pod.spec
kubectl explain pod.spec.containers

Using the first two tips, writing YAML becomes relatively easy.

However, we can let kubectl do the work for us and generate a “template” document, eliminating the need for us to type and align the format ourselves. The third tip involves using two special parameters of kubectl: --dry-run=client and -o yaml. The former performs a dry run and the latter generates the output in YAML format. By combining these two parameters, kubectl will not actually create the object but only generate the YAML file.

For example, if you want to generate a YAML template example for a Pod, you can add these two parameters after kubectl run:

kubectl run ngx --image=nginx:alpine --dry-run=client -o yaml

This will generate a perfectly valid YAML file:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: ngx
  name: ngx
spec:
  containers:
  - image: nginx:alpine
    name: ngx
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Next, all you need to do is refer to the documentation of the object, add or remove fields to customize this YAML file.
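As one example of “removing fields”, the generated template contains a few empty placeholders (creationTimestamp: null, resources: {}, status: {}). The Python sketch below strips such values from a manifest held as a dictionary. The prune_empty function is a hypothetical helper of my own, and it is only meant for this template: some empty values are meaningful in Kubernetes (for example emptyDir: {} in a volume), so it should not be applied blindly.

```python
def prune_empty(value):
    """Recursively drop None values and empty dicts/lists from dict entries."""
    if isinstance(value, dict):
        cleaned = {k: prune_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, {}, [])}
    if isinstance(value, list):
        # List items are cleaned but never removed, so positions are kept.
        return [prune_empty(v) for v in value]
    return value


# The template printed by kubectl, written as Python data.
template = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"creationTimestamp": None, "labels": {"run": "ngx"}, "name": "ngx"},
    "spec": {
        "containers": [{"image": "nginx:alpine", "name": "ngx", "resources": {}}],
        "dnsPolicy": "ClusterFirst",
        "restartPolicy": "Always",
    },
    "status": {},
}

slim = prune_empty(template)
print(sorted(slim))
```

After pruning, only apiVersion, kind, metadata, and spec remain at the top level, which is a cleaner starting point for adding your own fields.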

You can further enhance this tip by defining these parameter strings as shell variables (you can name it whatever you like, such as $do/$go, but here I’ll use $out for illustration purposes). This will make it more convenient to use. For example:

export out="--dry-run=client -o yaml"
kubectl run ngx --image=nginx:alpine $out

In the future, except for some special cases, we will no longer use commands like kubectl run to directly create Pods. Instead, we will write YAML files to describe objects declaratively and use kubectl apply to deploy the YAML and create objects.

Summary #

Well, that’s it for today. We have learned about the differences between “declarative” and “imperative” approaches, the syntax of the YAML language, how to use YAML to describe API objects, and some tips for writing YAML files.

One of the distinguishing features of Kubernetes is its use of YAML as the working language. The declarative nature of the language allows for a more accurate and clearer description of the system’s state, avoiding the introduction of cumbersome steps that could disrupt the system. This aligns well with Kubernetes’ highly automated internal structure. Additionally, the plain text format of YAML makes it easy to version and suitable for CI/CD.

To summarize the key points of today’s content:

YAML is a superset of JSON that supports arrays and objects and is able to describe complex states. It also has good readability.
Kubernetes defines everything in the cluster as API objects and manages them through RESTful interfaces. Describing API objects requires the use of the YAML language, with the mandatory fields being apiVersion, kind, metadata.
The command kubectl api-resources can be used to view the apiVersion and kind of objects, while the command kubectl explain can be used to view the documentation for object fields.
The commands kubectl apply and kubectl delete are used to manage API objects by sending HTTP requests.
Using the --dry-run=client -o yaml option can generate YAML templates for objects, simplifying the writing process.

Homework #

Finally, it’s time for homework. Here are two questions for you to think about:

How do you understand “imperative” and “declarative”? Why is the air conditioner considered “declarative”?
Using the --v=9 parameter, try to explain how YAML is transformed into an HTTP request by kubectl.

Please feel free to share your thoughts in the comments area. Starting from today, we will get used to writing YAML to create objects. If you have any questions during the learning process, please don’t hesitate to leave a comment and ask. I will reply to you as soon as possible. See you in the next class.