30 System Monitoring: How to Use Metrics Server and Prometheus #

Hello, I am Chrono.

In the previous two lessons, we learned about some management methods for Pods and clusters, focusing on how to set resource quotas so that Kubernetes users can share system resources fairly and reasonably.

Beyond these methods, effectively managing and utilizing Pods and clusters also depends on another capability: observability. In other words, we want to install “check probes” for the cluster to observe resource utilization and other metrics, making the overall operation of the cluster “transparent and visible” to us. This way, we can operate and maintain the cluster accurately and conveniently.

However, observing a cluster cannot be done with mere “probes”. So today, I will show you two system-level monitoring projects for Kubernetes clusters, Metrics Server and Prometheus, as well as the HorizontalPodAutoscaler that builds on them.

Metrics Server #

If you are familiar with Linux systems, you probably know the top command, which displays real-time CPU and memory usage of the current system and is a basic, very useful tool for performance analysis and tuning. Kubernetes provides a similar command, kubectl top. However, it does not work by default; you first need to install a plugin called Metrics Server.

Metrics Server is a tool specifically designed to collect core resource metrics of Kubernetes. It collects information from the kubelet on all nodes at regular intervals, but has minimal impact on the overall cluster performance. It only consumes about 1m of CPU and 2MB of memory per node, making it very cost-effective.

The following image from the Kubernetes official website gives you a rough idea of how Metrics Server works: it calls the kubelet API to obtain node and pod metrics, and then sends this information to the apiserver, allowing kubectl and HPA to read the metrics using the apiserver:

Image
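
Once Metrics Server is installed (the steps follow below), you can verify this data path yourself by reading the aggregated Metrics API through the apiserver. A quick sketch, assuming the standard metrics.k8s.io/v1beta1 group that Metrics Server registers:

# Read node and Pod metrics straight from the aggregated Metrics API
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods

kubectl top is essentially a friendlier front end to these same endpoints.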

You can find the documentation and installation steps of Metrics Server on its project website (https://github.com/kubernetes-sigs/metrics-server). However, if you have followed the steps to deploy a Kubernetes cluster using kubeadm as described in [Lesson 17], you already have all the prerequisites, and you only need to perform a few simple operations to complete the installation.

Everything needed to deploy Metrics Server is defined in a single YAML file, which you can download using wget or curl:

wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

But before you use kubectl apply to create the objects, we have two preparations to make.

The first task is to modify the YAML file. You need to add an additional runtime parameter --kubelet-insecure-tls to the Deployment object of Metrics Server, like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  ... ...
  template:
    spec:
      containers:
      - args:
        - --kubelet-insecure-tls
        ... ...

This is because Metrics Server communicates with the kubelet over the TLS protocol by default and verifies its certificate. We don’t need that in our experimental environment, and adding this parameter to skip the verification makes our deployment much simpler (but be cautious about using it in production environments).
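
If you have already applied the manifest without this change, a JSON patch can add the flag afterwards. This is only a convenience sketch, assuming Metrics Server is the first (and only) container in its Pod template:

kubectl patch deployment metrics-server -n kube-system --type=json \
  -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'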

The second task is to pre-download the Metrics Server image. If you look at the YAML file, you will notice that Metrics Server pulls its image from gcr.io, which is difficult to reach from China. Fortunately, there are mirror sites in China. You can download the image using the method described in [Lesson 17], re-tag it, and then load it onto the nodes in your cluster.

Here is a shell script you can refer to:

# mirror repository in China for the gcr.io image
repo=registry.aliyuncs.com/google_containers

# image name expected by the Metrics Server YAML, and its name in the mirror
name=k8s.gcr.io/metrics-server/metrics-server:v0.6.1
src_name=metrics-server:v0.6.1

# pull from the mirror, re-tag to the expected name, then drop the mirror tag
docker pull $repo/$src_name

docker tag $repo/$src_name $name
docker rmi $repo/$src_name
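
To get the re-tagged image onto the other nodes, you can export it with docker save and load it on each node, as in [Lesson 17]. A rough sketch, where the host name worker is only a placeholder for your own node:

# export the image, copy it to a node, and load it there
docker save k8s.gcr.io/metrics-server/metrics-server:v0.6.1 -o metrics-server.tar
scp metrics-server.tar worker:~/
ssh worker docker load -i metrics-server.tar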

After completing these two preparations, we can deploy Metrics Server using the YAML file:

kubectl apply -f components.yaml

Metrics Server belongs to the “kube-system” namespace, and you can use kubectl get pod with the -n parameter to check if it is running properly:

kubectl get pod -n kube-system

Image

Now that we have the Metrics Server plugin, we can use the kubectl top command to view the current resource status of the Kubernetes cluster. It has two subcommands: node, to view the resource utilization of nodes, and pod, to view the resource utilization of Pods.

Since Metrics Server takes some time to collect information, we need to wait for a while before executing the commands to view the status of nodes and pods in the cluster:

kubectl top node
kubectl top pod -n kube-system

Image

From this screenshot, you can see that:

  • CPU usage of both nodes in the cluster is not high, with 8% and 4%, but memory usage is high. The master node is using almost half of its memory (48%), while the worker node is almost fully utilized (89%).
  • There are many pods in the “kube-system” namespace, and among them, the apiserver consumes the most resources, using 75m of CPU and 363MB of memory.

HorizontalPodAutoscaler #

With Metrics Server, we can easily view the resource usage of the cluster. But its even more important role is to support “horizontal autoscaling” of applications.

In [Lesson 18], we mentioned the kubectl scale command, which can increase or decrease the number of Pods in a Deployment, that is, horizontal scaling. However, manually adjusting the number of application instances is cumbersome, requires human involvement, and makes it hard to time the changes well or react to sudden traffic spikes in production. So it is best to automate both the “scale up” and “scale down” operations.

For this purpose, Kubernetes defines a new API object called “HorizontalPodAutoscaler”, abbreviated as “HPA”. As the name suggests, it is specifically designed to automatically scale the number of Pods and is applicable to Deployments and StatefulSets, but not to DaemonSets (a DaemonSet runs exactly one Pod per node, so there is no replica count to adjust).

The ability of the HorizontalPodAutoscaler is based entirely on the Metrics Server. It retrieves the current running metrics of the application from the Metrics Server, mainly CPU usage, and then increases or decreases the number of Pods based on the predefined strategy.

Now let’s take a look at how to use HorizontalPodAutoscaler. First, we need to define a Deployment and a Service, and create an Nginx application as the target object for autoscaling:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ngx-hpa-dep

spec:
  replicas: 1
  selector:
    matchLabels:
      app: ngx-hpa-dep

  template:
    metadata:
      labels:
        app: ngx-hpa-dep
    spec:
      containers:
      - image: nginx:alpine
        name: nginx
        ports:
        - containerPort: 80

        resources:
          requests:
            cpu: 50m
            memory: 10Mi
          limits:
            cpu: 100m
            memory: 20Mi
---

apiVersion: v1
kind: Service
metadata:
  name: ngx-hpa-svc
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: ngx-hpa-dep

In this YAML, I deployed only one Nginx instance, named “ngx-hpa-dep”. Note that the resources field must be specified in its spec to clearly define the resource quotas; otherwise the HorizontalPodAutoscaler cannot obtain the Pod’s metrics (CPU utilization is calculated as a percentage of the requested CPU) and will not be able to scale automatically.

Next, we need to use the command kubectl autoscale to create a sample YAML file for the HorizontalPodAutoscaler. It has three parameters:

  • min: the minimum number of Pods, which is the lower limit for scaling down.
  • max: the maximum number of Pods, which is the upper limit for scaling up.
  • cpu-percent: the target CPU utilization, measured as a percentage of each Pod’s CPU request. When the observed average is above this value, the HPA scales up; when it falls below, it scales down (the rough calculation is sketched right after this list).
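
Behind these parameters is a simple calculation (the standard HPA algorithm, slightly simplified): the controller compares the observed average CPU utilization of the Pods, measured against their CPU requests, with the target value and computes

desiredReplicas = ceil( currentReplicas × currentUtilization / targetUtilization )

For example, if 2 Pods are averaging 25% CPU against a 5% target, the HPA asks for ceil(2 × 25 / 5) = 10 Pods, capped by the max limit.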

Now, let’s create a HorizontalPodAutoscaler for the Nginx application we just created. We will specify a minimum of 2 Pods, a maximum of 10 Pods, and set the CPU usage metric to a lower value, 5%, for easier observation of scaling:

export out="--dry-run=client -o yaml"              # Define a shell variable
kubectl autoscale deploy ngx-hpa-dep --min=2 --max=10 --cpu-percent=5 $out

The generated YAML description file is as follows:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ngx-hpa

spec:
  maxReplicas: 10
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ngx-hpa-dep
  targetCPUUtilizationPercentage: 5

After using the kubectl apply command to create this HorizontalPodAutoscaler, it will find that there is only 1 instance in the Deployment, which does not meet the requirement of the defined minimum value. Therefore, it will first scale up to 2 instances:

Image

From this screenshot, you can see that the HorizontalPodAutoscaler will adjust the number of Pods to 2 according to the description in the YAML, and then continuously monitor the CPU usage of the Pods through the Metrics Server.
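
The concrete commands for this step look roughly like this (ngx-hpa.yaml is just an assumed name for the file holding the HPA YAML above):

kubectl apply -f ngx-hpa.yaml              # assumed file name for the HPA YAML above
kubectl get hpa ngx-hpa
kubectl get deploy ngx-hpa-dep             # READY should soon go from 1/1 to 2/2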

Next, let’s put some pressure on Nginx. Run a test Pod using the “httpd:alpine” image, which contains the HTTP performance testing tool ab (Apache Bench):

kubectl run test -it --image=httpd:alpine -- sh

Image

Then we’ll send one million requests to Nginx, lasting for 1 minute. We’ll use kubectl get hpa to observe the status of the HorizontalPodAutoscaler:

ab -c 10 -t 60 -n 1000000 'http://ngx-hpa-svc/'

Image

Since the Metrics Server collects data about every 15 seconds, the automated scaling up and down performed by the HorizontalPodAutoscaler is also done gradually based on these data points.

When it detects that the target’s CPU usage exceeds the predefined 5% threshold, it starts scaling up, roughly doubling the number of Pods each step, until the upper limit is reached. It then keeps monitoring for a while, and if the CPU usage drops, it scales back down to the minimum.
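
To watch the scaling happen in real time, it helps to keep an eye on the HPA and the Pods from another terminal while ab is running:

kubectl get hpa ngx-hpa --watch
kubectl get pod -l app=ngx-hpa-dep --watch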

Prometheus #

Clearly, with the help of Metrics Server and HorizontalPodAutoscaler, our application management has become easier. However, Metrics Server can only provide limited metrics, such as CPU and memory. To monitor a more comprehensive range of application running status, we need to turn to the authoritative project called “Prometheus”.

In fact, Prometheus predates Kubernetes. It started in 2012 as an open-source project created by former Google engineers, inspired by Borg’s monitoring system, BorgMon. In 2016 it became the second project to join the CNCF, and in 2018 it graduated successfully, making it the second graduated project after Kubernetes and the “de facto standard” in the cloud-native monitoring field.

Image

Like Kubernetes, Prometheus is also a massive system. Here, I’ll provide a brief introduction.

The following diagram is the official architecture diagram of Prometheus, which is used in almost all articles about Prometheus, so I can’t help but include it:

Image

The core of the Prometheus system is its Server, which contains a time-series database (TSDB) for storing monitoring data and a Retrieval component that collects (pulls) data from the various targets; the collected data is then exposed to the outside world through the built-in HTTP Server.

In addition to Prometheus Server, there are three other important components:

  • Push Gateway, which adapts to some special monitoring targets and converts the default pull mode into push mode.
  • Alert Manager, the alarm center, sets up rules in advance and sends alarms via email or other methods when problems are detected.
  • Grafana, a graphical interface that allows the customization of intuitive monitoring dashboards.

Since Prometheus is also under CNCF, it is naturally considered “cloud-native” and running in Kubernetes is a logical choice. However, it contains too many components, making the installation a bit complicated. Here, I have chosen the “kube-prometheus” project (https://github.com/prometheus-operator/kube-prometheus/) because it appears to be relatively easy to operate.

Now let’s follow me to experience Prometheus in a Kubernetes experimental environment.

First, we need to download the kube-prometheus source code package, and the latest version currently is 0.11:

wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.11.0.tar.gz

After extracting the package, all the Prometheus deployment-related YAML files can be found in the manifests directory, totaling nearly 100 files. You can have a general look at them first.

Like Metrics Server, we also need to do some preparation work before installing Prometheus.

The first step is to modify prometheus-service.yaml and grafana-service.yaml.

These two files define the Prometheus and Grafana Service objects. We can add type: NodePort to them (refer to [Lesson 20]), which allows direct access through a node’s IP address (of course, you can also configure an Ingress instead).
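
As a reference, the change to prometheus-service.yaml looks roughly like this; the field names below follow the kube-prometheus v0.11 manifests as far as I recall, so check them against your own copy (grafana-service.yaml is modified the same way):

apiVersion: v1
kind: Service
metadata:
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort        # added: expose the Service on each node's IP
  ports:
  - name: web
    port: 9090
    targetPort: web
  ... ...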

The second step is to modify kubeStateMetrics-deployment.yaml and prometheusAdapter-deployment.yaml, because they reference two images hosted on gcr.io that are hard to download.

Unfortunately, I couldn’t find mirrors for them in China. To make sure the installation succeeds, I pulled them and uploaded copies to Docker Hub. So you need to modify the image names, changing their prefixes to chronolaw:

# original images hosted on gcr.io
image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.5.0
image: k8s.gcr.io/prometheus-adapter/prometheus-adapter:v0.9.1

# replaced with copies on Docker Hub
image: chronolaw/kube-state-metrics:v2.5.0
image: chronolaw/prometheus-adapter:v0.9.1
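
If you would rather push the images to your own Docker Hub account instead of using mine, the process is roughly this, run on a machine that can reach gcr.io (your-account is only a placeholder):

# mirror kube-state-metrics to your own Docker Hub account
docker pull k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.5.0
docker tag k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.5.0 your-account/kube-state-metrics:v2.5.0
docker push your-account/kube-state-metrics:v2.5.0
# repeat the same pull/tag/push for prometheus-adapter:v0.9.1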

After completing these two preparation steps, we deploy Prometheus with two kubectl create commands: first apply the manifests/setup directory, which creates the basic objects such as the namespace and the CRDs, and then apply the manifests directory itself:

kubectl create -f manifests/setup
kubectl create -f manifests
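
If the second command complains that some custom resource types do not exist yet, the CRDs created by the setup step may not have finished registering; you can wait for them explicitly and then retry:

kubectl wait --for condition=Established --all CustomResourceDefinition
kubectl create -f manifests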

The Prometheus objects all live in the namespace “monitoring”. After creation, we can check their statuses with kubectl get pod -n monitoring:

Image

After confirming that these Pods are running properly, let’s take a look at their service ports:

kubectl get svc -n monitoring

Image

Since we modified the Service objects of Grafana and Prometheus, both services are now exposed on the nodes. Grafana uses node port “30358”, while Prometheus shows two ports; the one mapped from its web port “9090” is “30827”.

Enter the IP address of the node in the browser (in my case, it is “http://192.168.10.210”), followed by the port number “30827”, and we can see the built-in web interface of Prometheus:

Image

The web interface of Prometheus has a query box that allows us to use PromQL to query metrics and generate visual charts. In this screenshot, I have selected the metric “node_memory_Active_bytes”, which indicates the currently used memory capacity.
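
The same PromQL can also be sent to Prometheus’s HTTP query API; a small sketch using the node address and NodePort from my environment (substitute your own):

# ask Prometheus for the current value of node_memory_Active_bytes
curl 'http://192.168.10.210:30827/api/v1/query?query=node_memory_Active_bytes'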

The web interface of Prometheus is relatively simple and is usually used for debugging and testing. Let’s also take a look at Grafana. Access the port “30358” of the node (in my case, “http://192.168.10.210:30358”), and it will ask you to log in first. The default username and password are both “admin”:

Image

Grafana has already provided many powerful and easy-to-use dashboards, which you can freely choose from in the left menu under “Dashboards - Browse”:

Image

For example, I selected the dashboard “Kubernetes / Compute Resources / Namespace (Pods)”, which presents the data in polished charts that are far more intuitive than the output of kubectl top and give a clear overview at a glance:

Image

This is only a brief introduction to Prometheus; going further would stray from our Kubernetes theme. If you are interested in Prometheus, you can refer to the documentation on its official website or other learning materials and explore it further on your own.

Summary #

In the era of cloud native, system transparency and observability are very important. Today, we learned about two system monitoring projects for Kubernetes: the command-line-oriented Metrics Server and the graphical Prometheus. Used well, they keep us constantly aware of the running state of the Kubernetes cluster, so that no detail escapes us.

To summarize today’s content:

  1. Metrics Server is a Kubernetes plugin that can collect core resource metrics of the system, and the relevant command is kubectl top.
  2. Prometheus is the “de facto standard” in the cloud-native monitoring field. It uses the PromQL language to query data and can be used with Grafana to display intuitive graphical interfaces, making monitoring easier.
  3. HorizontalPodAutoscaler implements the automatic scaling feature of applications. It gets the running metrics of applications from Metrics Server and adjusts the number of pods in real-time, which can effectively handle sudden traffic.

Homework #

It’s time for homework. I have two questions for you to think about:

  1. After deploying HorizontalPodAutoscaler, what will happen if you manually scale using kubectl scale?
  2. Do you have any experience with application monitoring? What are the important metrics to consider?

I am looking forward to seeing your comments in the discussion section, where we can discuss with other classmates. See you in the next class.

Image