03 Macroscopic Understanding of the Overall Architecture #

To do a good job, one must first sharpen one’s tools. In this section, we will build a high-level picture of the overall architecture of K8S, so that further exploration and practice can build on this foundation.

Client/Server Architecture #

From a high-level perspective, K8S follows a Client/Server (C/S) architecture, which can be represented with the following diagram:

                                   +-------------+                              
                                   |             |                              
                                   |             |               +---------------+
                                   |             |       +-----> |     Node 1    |
                                   | Kubernetes  |       |       +---------------+
    +-----------------+            |   Server    |       |                      
    |       CLI       |            |             |       |       +---------------+
    |    (Kubectl)    |----------->| ( Master )  |<------+-----> |     Node 2    |
    |                 |            |             |       |       +---------------+
    +-----------------+            |             |       |       
                                   |             |       |       +---------------+
                                   |             |       +-----> |     Node 3    |
                                   |             |               +---------------+
                                   +-------------+               

On the left side is an official CLI (Command Line Interface) tool called kubectl, which is used to manage clusters and manipulate objects using the Kubernetes API.
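
kubectl is only one client of this API; any program can call the same endpoints. As a rough illustration, the Go sketch below uses a recent version of the official client-go library to list Pods in the default namespace, which is roughly what running “kubectl get pods” does. The kubeconfig path is only an assumption for illustration.

    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Load the same kubeconfig that kubectl uses (path assumed for illustration).
        config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
        if err != nil {
            panic(err)
        }

        // Build a typed client for the Kubernetes API.
        clientset, err := kubernetes.NewForConfig(config)
        if err != nil {
            panic(err)
        }

        // Roughly what "kubectl get pods -n default" does: list Pods via the API.
        pods, err := clientset.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
        if err != nil {
            panic(err)
        }
        for _, p := range pods.Items {
            fmt.Println(p.Name)
        }
    }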

On the right side are the backend services of the K8S cluster and the API they expose. As discussed in the previous section, Nodes are the machines that do the work, while Master is a role representing the components, running on a Node, that manage the cluster. For a detailed analysis of each of these components, refer to Section 11.

Of course, here we only draw one Master. In production environments, to ensure high availability of the cluster, we usually deploy multiple Masters.

Master #

Now let’s break it down layer by layer, starting with the Master. Here we introduce only the components involved in managing the cluster. The Master is the “brain” of the entire Kubernetes cluster and, like a brain, it has several important functions:

  • Reception: Handles external requests and internal notifications
  • Direction: Manages the scheduling and administration of the entire cluster
  • Storage: Persists the cluster’s state

These functions are achieved through several components that work together, typically referred to as the control plane. As shown in the diagram below:

+----------------------------------------------------------+
| Master                                                   |
|              +-------------------------+                 |
|     +------->|        API Server       |<--------+       |
|     |        |                         |         |       |
|     v        +-------------------------+         v       |
|   +----------------+     ^      +--------------------+   |
|   |                |     |      |                    |   |
|   |   Scheduler    |     |      | Controller Manager |   |
|   |                |     |      |                    |   |
|   +----------------+     v      +--------------------+   |
| +------------------------------------------------------+ |
| |                                                      | |
| |                Cluster state store                   | |
| |                                                      | |
| +------------------------------------------------------+ |
+----------------------------------------------------------+

Let’s look at each of these components in turn.

Cluster state store #

The cluster state store holds all of the cluster’s persistent state and supports the watch feature, so that components can be notified of any changes quickly.

Currently, Kubernetes uses etcd as its storage layer, so etcd is generally referred to directly as the cluster state store; in other words, all cluster state is kept in etcd instances.
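
To get a feel for the watch feature, the sketch below uses the etcd Go client (clientv3) to watch a key prefix and print every change as it arrives. The endpoint is an assumption for illustration, and /registry/ is the default prefix under which the API Server stores objects; note that in a real cluster only the API Server talks to etcd directly, while other components watch through the API Server instead.

    package main

    import (
        "context"
        "fmt"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        // Connect to an etcd instance (endpoint assumed for illustration).
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"127.0.0.1:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        // Watch every key under the prefix and print each change as it happens.
        for resp := range cli.Watch(context.Background(), "/registry/", clientv3.WithPrefix()) {
            for _, ev := range resp.Events {
                fmt.Printf("%s %s\n", ev.Type, ev.Kv.Key)
            }
        }
    }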

As we said before, the Master is like the brain of the Kubernetes cluster; looking more closely, etcd can be considered the core of that brain. Why? You can refer to the later chapters for a detailed analysis; in this chapter we stay with the high-level view of the cluster architecture.

You may ask: is etcd required? At present, yes, mainly because the storage layer of Kubernetes is implemented directly on top of it.

Around 2014, the community proposed abstracting the storage layer and treating backend storage as a plug-in. Consul, which also provides key-value storage, was a widely anticipated alternative.

However, thanks to etcd’s active development team and the significant improvements made in response to feedback from the Kubernetes community, etcd remains the only option to this day.

If you look at the Kubernetes source code now, you will find that the storage-layer code is relatively simple and clear. If the community has the bandwidth in the future, this part might be made pluggable as well.

API Server #

The API Server is the entry point of the entire cluster, comparable to the sensory organs of a human body: it receives signals and requests from the outside and writes the relevant information into etcd.

The actual processing logic is even simpler than a TCP three-way handshake:

  • A request arrives at the API Server: “Hi, I want to put something into etcd.”
  • The API Server receives it: “Who are you? Why should I listen to you?”
  • The request presents its identity credentials (usually a certificate): “It’s me, your master. Please put these things in.”
  • Now it depends on the content. If the API Server understands it, it stores it in etcd and replies “Okay, master, I put it in”; if it doesn’t understand, it replies “Sorry, master, I don’t understand.”

As you can see, the API Server provides authentication to decide whether an operation should be permitted. It supports multiple authentication methods, but usually we authenticate with x509 certificates.

The goal of the API Server is to be an extremely simple server, only providing REST operations to update etcd and acting as the gateway for the cluster. Other business logic is handled through plugins or in other components. For detailed implementation of this part, you can refer to the relevant chapters on API Server analysis.
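
To make the flow above concrete, here is a deliberately naive, self-contained Go sketch of it. Every type and function in it is a hypothetical stand-in; the real API Server splits these steps into chains of authentication, authorization, admission, and validation plugins before persisting to etcd.

    package main

    import (
        "errors"
        "fmt"
    )

    // All of the pieces below are hypothetical stand-ins for the real
    // authentication, authorization, validation, and storage layers.

    type request struct {
        clientCert string // identity credentials, usually an x509 certificate
        body       string // the object the client wants stored
    }

    var store = map[string]string{} // stand-in for etcd

    func authenticate(cert string) (string, error) {
        if cert == "" {
            return "", errors.New("no credentials")
        }
        return "master", nil // pretend the certificate maps to a user
    }

    func authorized(user string) bool {
        return user == "master" // pretend only "master" may write
    }

    func decodeAndValidate(body string) (string, error) {
        if body == "" {
            return "", errors.New("cannot understand an empty object")
        }
        return "/registry/demo/" + body, nil
    }

    func handleRequest(req request) string {
        // "Who are you? Why should I listen to you?"
        user, err := authenticate(req.clientCert)
        if err != nil {
            return "Sorry, I don't know who you are"
        }
        if !authorized(user) {
            return "Sorry, you are not allowed to do that"
        }

        // "Do I understand what you are asking for?"
        key, err := decodeAndValidate(req.body)
        if err != nil {
            return "Sorry, master, I don't understand"
        }

        // Persist the object into the cluster state store (etcd in the real system).
        store[key] = req.body
        return "Okay, master, I put it in"
    }

    func main() {
        fmt.Println(handleRequest(request{clientCert: "cert", body: "my-pod"}))
        fmt.Println(handleRequest(request{clientCert: "cert", body: ""}))
    }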

Controller Manager #

The Controller Manager is probably the busiest part of the Kubernetes cluster. It runs multiple controller processes in the background to regulate the state of the cluster.

Whenever the cluster’s configuration changes, the controllers work to move the actual state toward the desired state.
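
Every controller follows the same reconciliation pattern: observe the current state, compare it with the desired state, and act to close the gap. Below is a minimal, self-contained Go sketch of that loop for a hypothetical replica count; all the names and numbers are made up for illustration.

    package main

    import "fmt"

    // Hypothetical stand-ins for cluster state; a real controller reads both
    // the desired spec and the observed status through the API Server.
    var desiredReplicas = 3
    var runningReplicas = 1

    // reconcile performs one pass of the control loop: observe, compare, act.
    func reconcile() {
        switch {
        case runningReplicas < desiredReplicas:
            runningReplicas++ // stand-in for creating a Pod
            fmt.Println("created a pod")
        case runningReplicas > desiredReplicas:
            runningReplicas-- // stand-in for deleting a Pod
            fmt.Println("deleted a pod")
        default:
            fmt.Println("nothing to do")
        }
    }

    func main() {
        // A controller runs this loop forever (usually driven by watch events);
        // here we just run a few passes.
        for i := 0; i < 4; i++ {
            reconcile()
        }
    }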

Scheduler #

As the name suggests, the Scheduler is responsible for scheduling within the cluster. It continuously watches for unscheduled Pods and, based on various conditions such as resource availability, node affinity, and other constraints, schedules/binds them to Nodes through the binding API.

During this process, the scheduler generally considers only the state of the Nodes when scheduling starts and does not account for changes in Node state that happen while scheduling is in progress; as of v1.11.2, features of that kind (such as node affinity) had not yet reached stable status.
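
The core idea can be sketched in a few lines of Go: filter out Nodes that cannot fit the Pod, score the survivors, and pick the best one to bind to. Everything below (the resource numbers, the single CPU check, the scoring rule) is a made-up simplification; the real scheduler applies many predicates and priorities and records the result through the binding API.

    package main

    import "fmt"

    type node struct {
        name    string
        freeCPU int // millicores left on the node (hypothetical bookkeeping)
    }

    type pod struct {
        name       string
        requestCPU int
    }

    // schedule returns the name of the chosen node, or "" if none fits.
    func schedule(p pod, nodes []node) string {
        best := -1
        for i, n := range nodes {
            // Filter: skip nodes that do not have enough free CPU.
            if n.freeCPU < p.requestCPU {
                continue
            }
            // Score: prefer the node with the most free CPU left.
            if best == -1 || n.freeCPU > nodes[best].freeCPU {
                best = i
            }
        }
        if best == -1 {
            return ""
        }
        return nodes[best].name
    }

    func main() {
        nodes := []node{{"node-1", 200}, {"node-2", 800}, {"node-3", 500}}
        p := pod{name: "web-1", requestCPU: 400}

        if chosen := schedule(p, nodes); chosen != "" {
            // In the real system this would be a Binding written via the API Server.
            fmt.Printf("bind pod %s to %s\n", p.name, chosen)
        } else {
            fmt.Printf("pod %s stays unscheduled\n", p.name)
        }
    }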

Node #

We have mentioned the concept of Node in the previous section, so we won’t go into much detail here. Simply put, a Node is a machine that joins a cluster.

How does a Node join the cluster, receive scheduling, and run services? This is made possible by several core components running on the Node. Let’s take a look at the overall structure:

+--------------------------------------------------------+
| +---------------------+        +---------------------+ |
| |      kubelet        |        |     kube-proxy      | |
| |                     |        |                     | |
| +---------------------+        +---------------------+ |
| +----------------------------------------------------+ |
| | Container Runtime (Docker)                         | |
| | +---------------------+    +---------------------+ | |
| | |Pod                  |    |Pod                  | | |
| | | +-----+ +-----+     |    |+-----++-----++-----+| | |
| | | |C1   | |C2   |     |    ||C1   ||C2   ||C3   || | |
| | | |     | |     |     |    ||     ||     ||     || | |
| | | +-----+ +-----+     |    |+-----++-----++-----+| | |
| | +---------------------+    +---------------------+ | |
| +----------------------------------------------------+ |
+--------------------------------------------------------+

Kubelet #

The Kubelet implements the most important control functions for Nodes and Pods in the cluster. Without the Kubelet, Kubernetes would be little more than an application performing CRUD operations through the API Server.

K8S’s native mode of execution operates on an application’s containers, rather than directly on a package or a process as in traditional approaches. In this mode applications are isolated from one another and do not interfere with each other. Moreover, because it operates on containers, an application is also isolated from the host: it does not depend on the host and can be deployed and run anywhere a container runtime (such as Docker) is available.

As mentioned in the previous section, Pods can be a group of containers (including storage volumes). K8S treats Pods as the basic unit for scheduling, separating the concerns during construction and deployment:

  • During construction, the focus is on whether a container can be built correctly and how to build it quickly.
  • During deployment, the concern is whether an application’s service is available, whether it meets expectations, and whether the relevant resources it depends on are accessible.

This isolation model decouples applications from the underlying infrastructure, greatly improving the flexibility of scaling the cluster in and out and of migrating workloads.

Earlier we mentioned the “Scheduler” component on the Master, which schedules unbound Pods onto Nodes that meet the required conditions. However, whether a Pod can ultimately run on a Node is decided by the Kubelet. The specific principles of the Kubelet will be discussed in detail in later sections.
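
Conceptually, the Kubelet keeps asking “which Pods are bound to my Node, and are all of their containers actually running?” and then uses the container runtime to close any gap. The following self-contained Go sketch illustrates that sync loop with entirely hypothetical stand-ins for the API Server and the runtime.

    package main

    import "fmt"

    type podSpec struct {
        name       string
        containers []string
    }

    // Hypothetical stand-in: the real Kubelet watches the API Server for Pods
    // bound to its Node.
    func podsBoundToThisNode() []podSpec {
        return []podSpec{{name: "web-1", containers: []string{"nginx", "sidecar"}}}
    }

    var running = map[string]bool{}

    // Hypothetical stand-in: the real Kubelet talks to a container runtime.
    func startContainer(pod, container string) {
        running[pod+"/"+container] = true
        fmt.Printf("started %s/%s\n", pod, container)
    }

    // syncPods is one pass of the loop: make sure every container of every Pod
    // assigned to this Node is actually running.
    func syncPods() {
        for _, p := range podsBoundToThisNode() {
            for _, c := range p.containers {
                if !running[p.name+"/"+c] {
                    startContainer(p.name, c)
                }
            }
        }
    }

    func main() {
        syncPods()
        syncPods() // a second pass finds nothing left to start
    }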

Container runtime #

The main functions of a container runtime are to download images and run containers. The implementation we encounter most often is Docker, but there are others, such as rkt and CRI-O.

K8S provides a universal Container Runtime Interface (CRI) that allows any container runtime implementation that complies with this standard to be used in K8S.
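
Roughly speaking, the CRI splits into an image service (pulling and listing images) and a runtime service (managing Pod sandboxes and their containers). The Go interfaces below are only a rough sketch of that shape with simplified, hypothetical signatures; the real CRI is a set of gRPC services defined in the Kubernetes source.

    // A rough, hypothetical sketch of the two halves of the CRI.
    package cri

    // ImageService covers downloading and inspecting images.
    type ImageService interface {
        PullImage(image string) error
        ListImages() ([]string, error)
    }

    // RuntimeService covers the lifecycle of Pod sandboxes and their containers.
    type RuntimeService interface {
        RunPodSandbox(podName string) (sandboxID string, err error)
        CreateContainer(sandboxID, image string) (containerID string, err error)
        StartContainer(containerID string) error
        StopPodSandbox(sandboxID string) error
    }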

Kube Proxy #

We all know that a service can be accessed either by domain name or by IP address. Each Pod gets an IP of its own after it is created, and K8S adds an abstraction on top of this called Service. kube-proxy provides the proxying that lets you reach the backing Pods through a Service.

The way this actually works is that a kube-proxy process runs on each Node and uses iptables rules to redirect traffic destined for a Service to the Pods behind it. A more in-depth analysis will be covered in a later section.
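
Conceptually, what those rules express is a mapping from a Service’s virtual address to the set of Pod endpoints behind it, with one endpoint chosen per connection. The Go sketch below illustrates only that idea; the addresses are made up, and in iptables mode the real kube-proxy programs kernel rules rather than forwarding traffic itself.

    package main

    import "fmt"

    // Hypothetical Service-to-endpoints table: a virtual Service address maps
    // to the real Pod addresses behind it.
    var endpoints = map[string][]string{
        "10.96.0.10:80": {"172.17.0.4:8080", "172.17.0.5:8080"},
    }

    var next = map[string]int{}

    // pickEndpoint chooses a backend Pod for a connection to a Service address.
    // iptables-mode kube-proxy achieves the same effect with DNAT rules in the
    // kernel instead of proxying in user space.
    func pickEndpoint(serviceAddr string) (string, bool) {
        backends, ok := endpoints[serviceAddr]
        if !ok || len(backends) == 0 {
            return "", false
        }
        chosen := backends[next[serviceAddr]%len(backends)]
        next[serviceAddr]++
        return chosen, true
    }

    func main() {
        for i := 0; i < 3; i++ {
            if backend, ok := pickEndpoint("10.96.0.10:80"); ok {
                fmt.Println("connection goes to", backend)
            }
        }
    }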

Summary #

In this section, we learned that K8S follows a client/server architecture, and that the cluster’s Master runs several important components: the cluster state store (etcd), the API Server, the Controller Manager, and the Scheduler.

Each Node, in turn, runs three necessary components: the kubelet, a container runtime (usually Docker), and kube-proxy.

Through the collaboration of all the components, K8S achieves container orchestration and scheduling.

Now that we have completed this section, let’s start building our own cluster.