06 Traffic Shifting: How Do Functions Execute Under Different Scenarios? #

Hello, I am Jingyuan.

In previous lessons, I walked you through cold start and the scaling up and down of function instances, two core features of Serverless; it is fair to say that no discussion of Serverless gets far without mentioning them. But have you ever wondered what actually drives function instances to scale from 0 to 1, from 1 to N, and then back down to 0?

This is exactly the topic of this lesson: the traffic mechanism, or more precisely, the traffic forwarding mechanism behind these scenarios. Through this lesson, I hope you will understand how traffic is forwarded in scenarios such as cold and warm starts, regular scale-up and scale-down, asynchronous requests, and multiple concurrency, and build a mental picture of the end-to-end traffic forwarding topology in Serverless.

In this lesson, I have chosen Knative as the open-source Serverless engine framework to introduce traffic forwarding for cold starts and the load balancing mechanism. As for the detailed analysis of the open-source engine and the overall approach to running it in a private environment, I will cover them in Module 3: Advanced Practices.

Knowledge Reserve #

Before discussing traffic forwarding, let's first review Knative. It mainly includes three components: Eventing, Serving, and Build. Of the three, Serving is the one primarily responsible for traffic management.

Knative Serving defines a set of specific objects and implements these objects using Kubernetes Custom Resource Definitions (CRDs). Let’s take a look at a simple diagram from the Knative official documentation:

Image

When a user creates a Knative Service, its controller will automatically create corresponding Route and Configuration resources. The Configuration represents the image configuration for the business logic, and whenever the Configuration is updated, a corresponding Revision version is created.

Each Revision represents a snapshot of the Configuration at a specific moment, and each Revision has its own Deployment. If the traffic policy defines forwarding to different Revisions, then it means that the traffic is being forwarded to the Deployments of these Revisions.

The Route inherits the traffic policy configuration from the Service and determines the traffic forwarding direction. If no traffic policy is defined in the Service, then all traffic will be forwarded to the newest Revision by default.
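
To make these objects concrete, here is a minimal sketch of a Knative Service definition (the name and image are placeholders). Applying it is enough for Knative to create the Route, Configuration, Revision, and Deployment described above:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: jy-demo                       # placeholder service name
  namespace: default
spec:
  template:
    spec:
      containers:
      - image: example.com/jy-demo:v1   # placeholder image containing the business logic

Since this definition has no traffic policy, all traffic is routed to the latest Revision by default.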

Traffic forwarding #

With the basic understanding of Knative Serving, let’s start with the entry point of traffic.

Entry Point Traffic #

At the gateway level, Knative was designed for extensibility from the very beginning: it abstracts networking through the Knative Ingress resource so that different networking layers can be plugged in. Generally, I recommend integrating Kourier, Istio, or Contour. If Knative uses Istio as the networking component, external traffic is sent to the istio-ingressgateway, which forwards it to the user service based on the Gateway and the VirtualService created by Knative Serving.

Let’s take a look at the specific roles and collaboration between Gateway and VirtualService.

  • Gateway: Used to expose ports and domains for external services, determining which traffic can pass through. For example, the following example defines that traffic with the domain jy.geekbang.com and port 80 can pass through this Gateway.

    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: jy-gateway
    spec:
      selector:
        app: istio-ingressgateway
      servers:
      - port:
          number: 80
          name: http
          protocol: HTTP
        hosts:
        - jy.geekbang.com
    
  • VirtualService: Used to configure traffic routing, similar to Nginx’s server configuration. It determines the final destination of the traffic by associating with the Gateway. Here’s an example that, combined with the previous Gateway, indicates that traffic coming from the gateway will be routed to jy-demo-v1.default.svc.cluster.local.

    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: jingyuan-vs
    spec:
      hosts:
      - jy.geekbang.com
      gateways:
      - jy-gateway
      http:
      - route:
        - destination:
            host: jy-demo-v1.default.svc.cluster.local
    

Traffic Forwarding during Cold Start #

Now let’s take a look at how traffic is handled during a cold start in Knative.

Cold Start Traffic Flow

First, a clarification: to help you focus on how traffic is forwarded during a cold start, I have temporarily omitted some interactions that are unrelated to traffic, such as the ServerlessService (SKS) mode transition and AutoScaler metric reporting.

Additionally, in the diagram, I have differentiated object resources using colors and line styles. Solid rectangles represent Pod objects, which are the actual processes handling the requests, while dashed rectangles represent non-Pod type resources.

Here, we will focus on the Public Service and Private Service, which determine whether the traffic is directed to the Pod IP or the Activator. Let’s look at each of them.

  • Public Service: Managed by Knative, and its endpoints are mutable. If the current Revision has User Pods, the Public Service's endpoints are kept consistent with those of the Private Service and point to those instances; if there are no User Pods, as during a cold start, the endpoints point to the Activator's IP address.
  • Private Service: Uses a label selector to select the User Pods created by the corresponding Deployment shown in the diagram. Other components can watch the Private Service's endpoints to determine whether the current Revision has any User Pods. Since it does not participate in traffic forwarding itself, it is drawn in gray in the diagram.

Cold Start Traffic Flow Diagram

Now that we understand what each block in the cold start diagram means, let's walk through the actual request flow.

Step 0 (preparation): After you package the function image and create the Knative Service, Knative creates the Route and Configuration (cfg) resources from that definition. The cfg then creates the corresponding Revision, which in turn creates a Deployment to provide the service.

Step 1: When a request reaches the Istio Ingress Gateway, the gateway forwards it to the Public Service according to the Gateway and VirtualService configurations.

Step 2: Since there are no User Pods in the cluster at this point (the User Pods in the diagram appear only after the AutoScaler scales up), the Public Service's endpoints point to the Activator, so the Activator receives the request.

Step 3: The Activator will temporarily cache the received requests, while also tracking the actual traffic for the corresponding Revision and reporting it to the AutoScaler. In addition, the Activator will continuously monitor the Private Service to check if User Pods have been created.

Step 4: The AutoScaler controls the corresponding Revision’s Deployment to scale up the User Pods based on the traffic. During the cold start process, at least one Pod will be created.

Step 5: The Deployment creates a new User Pod and updates the Private Service’s endpoints to point to the newly generated User Pod.

Step 6: Once the Activator observes that the Private Service's endpoints are ready, it forwards the cached requests directly to the new User Pod, which then executes the business logic.

In fact, there is one more detail in the 0-to-1 process. After the User Pod is successfully created, the AutoScaler switches the SKS (ServerlessService) into Serve mode, which syncs the Public Service's endpoints with the Private Service's endpoints and returns traffic handling to the regular mode. When new traffic arrives, it reaches the User Pod directly via the Public Service without going through the Activator proxy.
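
For reference, the SKS is itself a CRD object managed by Knative Serving. The following is an illustrative sketch of what one roughly looks like; the exact fields can vary across Knative versions, and the names here are placeholders:

apiVersion: networking.internal.knative.dev/v1alpha1
kind: ServerlessService
metadata:
  name: traffic-demo-v1                  # placeholder: one SKS per Revision
spec:
  mode: Serve                            # Proxy during cold start (endpoints point to the Activator), Serve once User Pods exist
  objectRef:                             # the scale target whose Pods back the Private Service
    apiVersion: apps/v1
    kind: Deployment
    name: traffic-demo-v1-deployment     # placeholder Deployment name
  protocolType: http1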

However, Knative also introduced the Target Burst Capacity (TBC) to control under what circumstances the Activator is removed from the traffic path. The benefit is that a sudden burst of traffic cannot overload individual Pods unevenly, which could otherwise lead to lost requests.
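
TBC can be tuned per Revision through an annotation on the Revision template. Below is a sketch with placeholder names and an arbitrary value; roughly speaking, -1 keeps the Activator in the path permanently, 0 removes it as soon as User Pods exist, and a positive value keeps it in until that much spare capacity is available:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: tbc-demo                     # placeholder service name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target-burst-capacity: "200"   # example value
    spec:
      containers:
      - image: example.com/demo:v1   # placeholder image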

This is Knative's cold start process and mechanism. As we can see, Knative accounts for sudden traffic bursts on the 0-to-1 path, while keeping the traffic entry point conveniently extensible and reusing existing networking components.

Then, how does Knative build traffic governance on top of existing networking components and support canary (small-traffic) releases?

Traffic Routing Mechanism #

From the background knowledge, we know that the Service creates the Route and Configuration resources, and the Configuration in turn creates Revisions; traffic can be routed to specified Revisions through the Service. In the routing process, the core component is the Route, which maps a network endpoint to one or more Revisions and can manage traffic in several ways, including canary traffic and named routes.

Let's first look at a typical Knative Service definition with two versions, traffic-demo-v1 and traffic-demo-v2, tagged tg1 and tg2 respectively, where traffic is split between the Revisions in an 80:20 ratio. The traffic strategy is defined as follows:

traffic:
  - tag: tg1
    revisionName: traffic-demo-v1
    percent: 80
  - tag: tg2
    revisionName: traffic-demo-v2
    percent: 20
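
For context, this traffic stanza sits under spec in the Knative Service definition; a sketch of the full object (the image is a placeholder) might look like this:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: traffic-demo
spec:
  template:
    spec:
      containers:
      - image: example.com/traffic-demo:v2   # placeholder: the image behind the newest Revision
  traffic:
  - tag: tg1
    revisionName: traffic-demo-v1
    percent: 80
  - tag: tg2
    revisionName: traffic-demo-v2
    percent: 20

The tags also give each Revision its own addressable sub-route, which is handy for testing a version before shifting more traffic to it.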

After the Knative Service is deployed to the default namespace, Knative will create a Route object. Since our Knative uses Istio as the network component, it will continue to create an Istio VirtualService (traffic-demo-ingress).

In the diagram, you can see that the first VirtualService is associated with two gateways. The second, "mesh", is not a real Gateway resource but a reserved keyword in Istio, meaning that the VirtualService's rules are applied to all sidecars in the cluster; we can ignore it for now.

Image

Let’s take a look at the generated route of traffic-demo-ingress. Since there is a lot of configuration information, I have listed the relevant parts for you to review more clearly:

route:
  - destination:
      host: traffic-demo-v1.default.svc.cluster.local
      port:
        number: 80
    headers:
      request:
        set:
          Knative-Serving-Namespace: default
          Knative-Serving-Revision: traffic-demo-v1
    weight: 80
  - destination:
      host: traffic-demo-v2.default.svc.cluster.local
      port:
        number: 80
    headers:
      request:
        set:
          Knative-Serving-Namespace: default
          Knative-Serving-Revision: traffic-demo-v2
    weight: 20

You may wonder why the host format of the destination is named as traffic-demo-v1.default.svc.cluster.local.

Here's a brief explanation: this is the standard DNS name of a Kubernetes Service, used for service discovery inside the cluster. The name takes the form <service-name>.<namespace>.svc.<zone>, where the zone is usually cluster.local. For example, if a traffic-demo-v1 Service is deployed in the default namespace, its DNS name is traffic-demo-v1.default.svc.cluster.local.

In Knative, we can usually customize a domain suffix by modifying the config-domain configuration so that external requests can reach the service.
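
That configuration is a ConfigMap in the knative-serving namespace; a minimal sketch, with example.com standing in for a real domain:

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-domain
  namespace: knative-serving
data:
  example.com: ""   # services are then exposed as <name>.<namespace>.example.com

With the domain in place, the Istio ingress gateway implements traffic routing based on the strategies configured in the VirtualService, resulting in the following diagram: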

Image

This is how Knative achieves canary (small-traffic) releases. That said, cloud service providers, given the breadth of their platforms, often have their own in-house small-traffic experimentation services, which usually sample and distribute traffic according to specific policies.

Common Features #

So far, we have started from the traffic entry point and learned how a Serverless engine forwards traffic during cold starts and canary releases. What enhancements do cloud vendors add on top of this? Let's talk about two common features: the traffic forwarding mechanisms for asynchronous invocation and for single-instance multi-concurrency.

Traffic scheduling under asynchronous invocation #

From what we have learned so far, synchronous invocation requests can be routed and load-balanced. So how does asynchronous invocation differ from synchronous invocation?

Offline tasks such as log cleaning, scheduled jobs, and audio/video processing are often well suited to being taken out of the synchronous path and processed asynchronously. Mainstream function compute platforms all provide asynchronous processing capabilities for functions.

When you invoke a function through the asynchronous API, you do not need to wait for a response from the function code; you simply hand the request over to the platform for processing. Of course, in most cases asynchronous invocation is used together with various triggers. For example, as we have seen before, an object storage trigger with matching rules on buckets and objects can drive audio/video transcoding or ETL business logic.

Here, let’s briefly explain the process of traffic scheduling under asynchronous invocation.

Requests first arrive at the asynchronous processing module and are queued in order; this is why the asynchronous processing module is usually backed by a message queue. When a request's turn comes, the asynchronous processing service dispatches it to the corresponding function instance for execution and writes the execution result to the monitoring logs you have configured. Beyond this, mainstream function compute platforms such as AWS and Alibaba Cloud also support asynchronous execution policies: based on the final result of each asynchronous execution, you can configure downstream services for "on success" and "on failure".

In general, asynchronous invocation mainly buffers requests through an event or message queue and then processes them with the same mechanism as synchronous invocation. Apart from this difference at the entry point, the subsequent flow is the same as described above.

Traffic forwarding under multiple concurrency #

Now let’s take a look at the traffic scheduling process under single instance, multiple concurrency.

First, let’s look at two concepts:

  • QPS: the number of requests a function instance (Pod) can respond to per second.
  • Concurrency: the number of requests a function instance (Pod) is handling at the same moment.

From these two concepts, we can see that increasing concurrency does not necessarily increase QPS. For example, if each request needs 100 ms of CPU-bound work, raising concurrency from 1 to 10 mostly makes requests queue up inside the Pod instead of completing faster. With that consensus, let's see what key points we need to pay attention to when implementing concurrency in function compute.

First, your service needs to be of the HttpServer type.

If you are using a cloud vendor's standard runtime, you need to check whether it supports concurrency. In the lesson on runtimes, I will walk you through code snippets from the Go runtime: when a request arrives, besides running as an HttpServer, it is also handled in a go func goroutine, which can further improve the function's concurrent processing capability to some extent.

Second, your service needs to expose a port in the framework for traffic forwarding.

Third, you need a component to control concurrency. Since cloud vendors have not disclosed their implementations, let's return to Knative and see what it offers: the combination of queue-proxy and the AutoScaler neatly solves the traffic concurrency problem.

queue-proxy is a sidecar container that runs alongside the user container in the same Pod. Every request passes through the queue-proxy container before reaching the user container.

Its main purpose is to count and limit the number of concurrent requests reaching the user container. When you set a concurrency limit of, say, 10 for a Revision, queue-proxy ensures that no more than 10 requests reach the user container at the same time.

So what if more than 10 requests arrive? The excess requests are temporarily held in queue-proxy's queue. At the same time, queue-proxy reports the current user Pod's concurrency statistics to the AutoScaler via port 9090 as a reference for scaling. In Knative, this limit is configured as shown below.
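
A minimal sketch, assuming placeholder names: the hard limit enforced by queue-proxy is set via containerConcurrency on the Revision template:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: concurrency-demo             # placeholder service name
spec:
  template:
    spec:
      containerConcurrency: 10       # queue-proxy admits at most 10 concurrent requests to the user container
      containers:
      - image: example.com/demo:v1   # placeholder image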

Conclusion #

To wrap up, let's summarize. Today I introduced how traffic is forwarded in the Serverless paradigm, covering cold start, canary releases, asynchronous invocation, and traffic scheduling under multiple concurrent requests.

First, we chose Knative, which is widely used by the community and experienced customers, as our starting point. To help you understand traffic forwarding, we began with the key resource objects in Knative and their roles. From the walkthrough of traffic ingress, we can see that the extensibility of Knative's gateway layer is worth borrowing when designing cloud-native architectures.

Next, I analyzed in detail how a request is executed when traffic takes a container instance from 0 to 1. At the core is the Activator, which bridges both 0-to-1 and N-to-0 and can cache and distribute requests based on request volume. The resources that work together with the Activator include the ServerlessService (SKS) and the AutoScaler.

At the same time, we can define traffic ratios for Revisions in a Knative Service, which Knative turns into an Istio VirtualService to implement version-based traffic splitting strategies.

In asynchronous scenarios, we can also use queues to shave traffic peaks and fill valleys, further enhancing the capabilities of the Serverless platform.

Finally, I mentioned the elements needed to achieve concurrent responses, where an HttpServer-based service plus a sidecar traffic proxy makes a good combination. This traffic forwarding capability lets you enjoy high-concurrency processing in the Serverless paradigm.

I believe that after studying today's lesson, you will be better equipped to learn from open-source engine frameworks and to make sense of cloud vendors' documentation. In the advanced practice module, I will discuss with you how to build a Serverless platform based on open-source engines. Stay tuned.

Thought Question #

Alright, the lesson comes to an end here, and I have a thought question for you.

How does Knative respond to sudden surges in traffic? What parameters can be controlled?

Please feel free to write your thoughts and answers in the comments section. Let’s exchange ideas and discuss together. Thank you for reading, and you’re welcome to share this lesson with more friends for learning and discussion.

Further Reading #

You can further explore the routing mechanism of Knative Serving by referring to the official documentation.