17 Practical Implementation of Graceful Traffic Switching for Applications #

Kubernetes deployments are mostly performed as rolling updates, which are meant to guarantee zero downtime. In practice, however, zero-downtime deployment has prerequisites, and meeting them is harder than it looks. To achieve a truly zero-downtime deployment in Kubernetes, one that neither interrupts nor loses any in-flight requests, we need to dig into the operational details of how applications are deployed and analyze the root causes of lost requests. This article builds on the earlier material and provides a more in-depth summary of zero-downtime deployment methods.

Getting to the bottom of it #

Rolling Updates #

Let’s first talk about the issues with rolling updates. By default, Kubernetes Deployments use the rolling update strategy to update the version of a Pod’s containers. The idea behind this strategy is to keep some of the old instances up and running throughout the update, thereby preventing service disruption. With this strategy, an old Pod is only shut down once a new Pod has started successfully and is ready to handle traffic.

Kubernetes provides a strategy parameter to control how multiple replicas behave during the update. Depending on the workload and the available compute resources, the rolling update strategy lets us fine-tune the number of extra Pods that may run (maxSurge) and the number of Pods that may be unavailable (maxUnavailable). For example, given a Deployment that requires three replicas, should we immediately create three new Pods, wait for them all to start and then terminate the old Pods, or update them one by one? The following manifest shows a Deployment named demo that uses the default RollingUpdate strategy, allowing at most one extra Pod (maxSurge: 1) and no unavailable Pods (maxUnavailable: 0).

kind: Deployment
apiVersion: apps/v1
metadata:
  name: demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      # containers with image docker.example.com/demo:1
      # ...
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
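
Tuning these two fields selects between the behaviors described above. For instance, the following strategy block is only an illustrative sketch (the values are assumptions, not part of the demo manifest); it would create all three replacement Pods at once while never reducing the number of available replicas:

  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 3        # start all three replacement Pods immediately
      maxUnavailable: 0  # never drop below the desired replica count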

Figure: animation of the rolling update process (15-1-rolling-udpate.gif)

With maxSurge: 1 and maxUnavailable: 0, the demo Deployment creates one new-version Pod at a time, waits for it to start and become ready, then triggers the termination of one of the old Pods, and proceeds to the next new Pod until all replicas are updated. The following shows the output of kubectl get pods and how the old and new Pods change over time.

$ kubectl get pods
NAME                    READY     STATUS             RESTARTS   AGE
demo-5444dd6d45-hbvql   1/1       Running            0          3m
demo-5444dd6d45-31f9a   1/1       Running            0          3m
demo-5444dd6d45-fa1bc   1/1       Running            0          3m
...

demo-5444dd6d45-hbvql   1/1       Running            0          3m
demo-5444dd6d45-31f9a   1/1       Running            0          3m
demo-5444dd6d45-fa1bc   1/1       Running            0          3m
demo-8dca50f432-bd431   0/1       ContainerCreating  0          12s
...

demo-5444dd6d45-hbvql   1/1       Running            0          4m
demo-5444dd6d45-31f9a   1/1       Running            0          4m
demo-5444dd6d45-fa1bc   0/1       Terminating        0          4m
demo-8dca50f432-bd431   1/1       Running            0          1m
...

demo-5444dd6d45-hbvql   1/1       Running            0          5m
demo-5444dd6d45-31f9a   1/1       Running            0          5m
demo-8dca50f432-bd431   1/1       Running            0          1m
demo-8dca50f432-ce9f1   0/1       ContainerCreating  0          10s
...

...

demo-8dca50f432-bd431   1/1       Running            0          2m
demo-8dca50f432-ce9f1   1/1       Running            0          1m
demo-8dca50f432-491fa   1/1       Running            0          30s

The Gap Between the Ideal and Reality of Application Availability #

From the example above, the rolling update from the old version to the new one looks smooth. In reality, however, the transition is not always seamless: some client requests can be dropped while the old Pods are being replaced, and that is unacceptable.

To find out whether requests are really lost when an instance is taken out of service, we need to load-test the service and collect the results. The main point of interest is whether incoming HTTP requests are handled correctly, including whether existing HTTP connections are kept alive.

Here we can use Fortio, a simple load-testing tool, to continuously hit the HTTP endpoint of the demo service with 50 concurrent connections/goroutines at a rate of 500 requests per second for 20 seconds.

fortio load -a -c 50 -qps 500 -t 20s "http://example.com/demo"

We run this test while the rolling update is in progress. The resulting report shows a number of failed requests:

Fortio 1.1.0 running at 500 queries per second, 4->4 procs, for 20s
Starting at 500 qps with 50 thread(s) [gomax 4] for 20s: 200 calls each (total 10000)
08:49:55 W http_client.go:673> Parsed non-ok code 502 (HTTP/1.1 502)
[...]
Code 200: 9933 (99.3%)
Code 502: 67 (0.7%)
Response Header Sizes: count 10000 avg 158.469 +/- 13.03 min 0 max 160 sum 1584692
Response Body/Total Sizes: count 10000 avg 169.786 +/- 12.1 min 161 max 314 sum 1697861
[...]

The output indicates that not all requests were successfully processed.

Understanding the Root Cause #

Now let’s try to understand what happens when Kubernetes performs a rolling update and traffic is rerouted from the old Pod instances to the new ones, starting with how Kubernetes manages connections to its workloads.

Assuming that our clients connect to the demo Service directly from within the cluster, they typically use the Service’s virtual IP address, resolved via the cluster DNS. Requests to that virtual IP are then forwarded to individual Pod instances by kube-proxy, which runs on every Kubernetes node and dynamically updates iptables rules so that traffic is routed to the Pods’ IP addresses. Kubernetes updates the Endpoints object according to each Pod’s readiness, so the demo Service only includes Pods that are ready to handle traffic.
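
For reference, a minimal Service fronting the demo Pods might look like the sketch below; the label and port numbers are illustrative assumptions rather than values taken from this article:

kind: Service
apiVersion: v1
metadata:
  name: demo
spec:
  selector:
    app: demo        # matches the Pod labels of the demo Deployment
  ports:
  - port: 80         # port exposed on the Service virtual IP
    targetPort: 8080 # illustrative container port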

There is also the case where client traffic reaches the Pod instances through an Ingress, and the behavior of application requests during a rolling update can differ there. For example, the NGINX Ingress controller watches the Endpoints object for Pod IP addresses directly and reloads the NGINX configuration when they change, which can interrupt traffic.
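
For context, a minimal Ingress that routes external traffic to the demo Service could be sketched as follows; the host name, path, and port are assumptions made purely for illustration:

kind: Ingress
apiVersion: networking.k8s.io/v1
metadata:
  name: demo
spec:
  ingressClassName: nginx    # assumes the NGINX Ingress controller
  rules:
  - host: demo.example.com   # illustrative host name
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: demo       # the Service sketched above
            port:
              number: 80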

It is important to note that Kubernetes aims to minimize service disruption during a rolling update. Once a new Pod is alive and ready to serve, Kubernetes takes an old Pod out of the Service: it sets the Pod’s status to Terminating, removes it from the Endpoints object, and sends the container a SIGTERM. The SIGTERM signal asks the application to shut down gracefully (assuming it handles the signal) and to stop accepting new connections. Once the Pod has been removed from the Endpoints object, load balancers route traffic to the remaining (new) Pods.

However, removing the Pod’s record from the Endpoints object and refreshing the load balancer configuration happen asynchronously: the load balancer only reconfigures itself after it notices the change. There is therefore no guarantee about ordering, and some requests may still be routed to a terminating Pod. This is the real reason for poor application availability during deployments.

Implementing Zero Downtime Deployment #

Our goal now is to enhance our application to achieve true zero-downtime updates.

Firstly, a prerequisite is that our containers handle termination signals correctly, so that the process shuts down gracefully on SIGTERM. Best practices for graceful application shutdown are well documented online and are beyond the scope of this article.
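
One container-level pitfall is still worth pointing out: if the container’s entrypoint is a shell script, the shell runs as PID 1 and does not forward SIGTERM to the application by default, so the application never gets the chance to shut down gracefully. The following is only a sketch of a signal-forwarding entrypoint, with a hypothetical /app/demo-server binary:

#!/bin/sh
# Hypothetical entrypoint: forward SIGTERM to the application process
# so it can drain in-flight requests before exiting.
/app/demo-server &            # illustrative application binary
PID=$!
trap 'kill -TERM "$PID"' TERM
wait "$PID"                   # returns early when SIGTERM arrives and the trap fires
wait "$PID"                   # wait again for the application to finish and exit

The simpler alternative is to exec the application directly (for example, exec /app/demo-server, or the exec form of the image’s entrypoint) so that it becomes PID 1 and receives SIGTERM itself.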

The next step is to introduce readiness probes that check whether our application is ready to handle traffic. Ideally, the probe also checks the state of anything that needs warming up, such as caches or database connections.
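
A readiness probe for the demo container might look like the sketch below; the /ready path, port, and timings are illustrative assumptions:

        readinessProbe:
          httpGet:
            path: /ready           # hypothetical readiness endpoint
            port: 8080             # illustrative container port
          initialDelaySeconds: 5   # give the application time to warm up
          periodSeconds: 5
          failureThreshold: 3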

To address the problem that Pod termination does not block and wait until the load balancer has been reconfigured, we add a preStop lifecycle hook. The hook is invoked before the container terminates and runs synchronously, so it must complete before the final termination signal is sent to the container.

In the example below, the preStop hook waits for 120 seconds before Kubernetes sends SIGTERM to the application process. While the hook is waiting, Kubernetes removes the Pod from the Endpoints object, so the load balancer has time to refresh its configuration. Note that the Pod’s termination grace period (terminationGracePeriodSeconds, 30 seconds by default) starts counting as soon as the preStop hook starts, so it must be set longer than the hook’s wait; otherwise the container is killed before it ever receives SIGTERM.

To achieve this behavior, define a preStop hook in the demo application deployment as follows:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      terminationGracePeriodSeconds: 150  # must exceed the preStop wait (default is 30s)
      containers:
      - name: zero-downtime
        image: docker.example.com/demo:1
        livenessProbe:
          # ...
        readinessProbe:
          # ...
        lifecycle:
          preStop:
            exec:
              command: ["/bin/bash", "-c", "sleep 120"]
  strategy:
    # ...
Rerun the load test during a rolling update, and you will find that no requests fail: we have finally achieved a seamless traffic switch.

Fortio 1.1.0 running at 500 queries per second, 4->4 procs, for 20s
Starting at 500 qps with 50 thread(s) [gomax 4] for 20s: 200 calls each (total 10000)
[...]
Code 200: 10000 (100.0%)
Response Header Sizes: count 10000 avg 159.530 +/- 0.706 min 154 max 160 sum 1595305
Response Body/Total Sizes: count 10000 avg 168.852 +/- 2.52 min 161 max 171 sum 1688525
[...]

Conclusion #

Graceful, atomic traffic switching is the foundation of rolling updates for applications and services. Only when Kubernetes handles rolling updates correctly can we achieve seamless traffic switching. On this basis, smooth traffic switching can also be implemented by deploying multiple sets of Ingress resources, and because Helm supports deploying multiple versions of an application, traffic can be switched quickly by selecting a version. All of these techniques rest on the same fundamental requirement: Pods must be replaced without interrupting in-flight requests.

References #