19 Knative-based Low-Cost Online Application Elastic Auto-Scaling #

Why Knative is Needed #


Serverless is widely seen as a promising direction for the future. Various surveys show that enterprises and developers are already using Serverless to build online services, and this proportion is still growing.

Against this backdrop, let’s look at how IaaS architectures have evolved. Initially, enterprises migrated to the cloud and consumed cloud resources through VMs, deploying online services directly into those VMs with tools such as Ansible, SaltStack, Puppet, or Chef. Deploying applications directly in VMs tightly coupled online services to each VM’s environment configuration. With the rise of container technology, people began to run applications in VMs as containers.

However, when dozens or even hundreds of applications need to be deployed, rapidly deploying and upgrading them across hundreds or thousands of VMs becomes a painful task. Kubernetes addresses these problems very well, so people now consume cloud resources through Kubernetes. With the popularity of Kubernetes, major cloud vendors have begun to offer Serverless Kubernetes services, allowing users to tap cloud capabilities without maintaining Kubernetes clusters themselves.

Since Kubernetes is already very good, why do we still need Knative? To answer this question, let’s first summarize the common characteristics of Serverless applications:

  • On-demand usage, automatic elasticity

Cloud resources are consumed on demand: capacity scales out automatically when business volume increases and scales in when it decreases, so automatic elasticity is required.

  • Canary release

Support for managing multiple versions, so that various canary release strategies can be applied when upgrading an application.

  • Traffic management

The ability to manage north-south traffic and to perform canary releases across versions based on traffic percentages.

  • Load balancing, service discovery

Elasticity automatically adds and removes instances, and traffic management must still route requests to the right ones, so load balancing and service discovery are required.

  • Gateway

When multiple applications are deployed in the same cluster, a gateway at the access layer is needed to manage traffic for multiple applications and different versions of the same application.

With the rise of Kubernetes and cloud-native concepts, the first instinct might be to directly deploy Serverless applications on Kubernetes. So, if we want to deploy Serverless applications on native Kubernetes, what would we do?

image.png

First, we need a Deployment to manage the workload, and a Service to expose it and provide service discovery. When the application undergoes a major change and a new version is released, we may want to pause and observe, and only continue to raise the canary percentage once everything looks fine. This requires two Deployments.

The v1 Deployment represents the old version, whose instance count is gradually reduced during the canary release; the v2 Deployment represents the new version, whose instance count is gradually increased. The HPA (Horizontal Pod Autoscaler) provides the elasticity, with each Deployment owning its own HPA and scaling configuration.
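As a rough sketch of that setup (names like myapp-v1 and the CPU target are hypothetical), each version gets its own HPA bound to its own Deployment:

```yaml
# Hypothetical HPA for the v1 Deployment; the v2 Deployment would have an
# otherwise-identical HPA of its own. Each HPA scales its Deployment independently.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-v1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-v1              # old version; myapp-v2 gets its own HPA
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```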

There is an inherent conflict in this approach. Say the v1 Deployment originally had three pods and one pod is upgraded to v2 during the canary, so roughly 1/3 of the traffic is routed to v2. When the business peak arrives, however, both versions scale out under their own HPA configurations, and because the two HPAs act independently, the pod counts of v1 and v2 no longer preserve the initial 1/3 split.

Canary release strategies based on Deployment instance counts therefore conflict with elasticity configuration by nature. If the canary is instead driven by traffic proportions, the problem disappears, but this may require introducing Istio’s capabilities.
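For illustration, a traffic-proportion canary with Istio might look roughly like the following sketch; the host name and the myapp-v1 / myapp-v2 Services are hypothetical, and a real setup would typically also define an Istio Gateway and DestinationRules.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp.example.com          # hypothetical host
  http:
    - route:
        - destination:
            host: myapp-v1       # Service in front of the v1 Deployment
          weight: 80
        - destination:
            host: myapp-v2       # Service in front of the v2 Deployment
          weight: 20             # the canary share stays at 20% regardless of pod counts
```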


With Istio introduced as the Gateway component, we can manage not only canary traffic within a single application but also traffic across different applications. This looks good, but let’s analyze what problems remain. First, let’s list what has to be maintained by hand to run Serverless applications on native Kubernetes:

  • Deployment
  • Service
  • HPA
  • Ingress
  • Istio
      • VirtualService
      • Gateway

Each application needs its own copy of these resources, and with multiple applications there are multiple copies to maintain. Because the resources are scattered across Kubernetes, there is no single object that represents the application as a whole, and managing them all is cumbersome.


What Serverless applications need are application-oriented management operations such as hosting, upgrade, rollback, canary release, traffic management, and elasticity, whereas what Kubernetes provides is an abstraction over IaaS usage. What is missing, then, is an application orchestration abstraction between Kubernetes and Serverless applications.

Knative is a Serverless application orchestration framework built on top of Kubernetes. The community also has several FaaS-like orchestration frameworks, but the applications they orchestrate follow no unified standard: each framework has its own specification, and none is compatible with the Kubernetes API, which makes them hard to adopt and hard to port. One of the core standards of cloud native is the Kubernetes API itself, and Knative’s cloud-native quality lies precisely in the fact that Serverless applications managed by Knative keep the semantics of the Kubernetes API unchanged.

What is Knative? #


Knative’s main goal is to provide a common Serverless orchestration and scheduling layer on top of Kubernetes, offering atomic operations to higher-level Serverless applications, and to expose its service APIs as Kubernetes-native APIs so that it integrates seamlessly with the Kubernetes ecosystem. Knative consists of two core modules: Eventing and Serving. This article focuses on the core architecture of Knative Serving.

Introduction to Knative Serving #


The core of Serving is the Knative Service. The Knative Controller automatically operates Kubernetes Service and Deployment based on the configuration of the Service, thereby simplifying application management.

A Knative Service corresponds to a resource called Configuration. Whenever the Service changes in a way that requires a new workload, the Configuration is updated, and every update to the Configuration creates a unique Revision. A Revision can be thought of as version control for the Configuration: in principle, a Revision is never modified once created.
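For reference, a minimal Knative Service might look like the sketch below (the name helloworld and the image are hypothetical); every change to spec.template stamps out a new Revision.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld                  # hypothetical application name
spec:
  template:                         # each change to this template produces a new Revision
    metadata:
      name: helloworld-v1           # optional: name the Revision explicitly
    spec:
      containers:
        - image: registry.example.com/helloworld:v1   # hypothetical image
          ports:
            - containerPort: 8080
```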

The Route is primarily responsible for traffic management in Knative. The Knative Route Controller automatically generates Knative Ingress configuration based on the configuration of the Route. The Ingress Controller implements route management based on Ingress policies.

Knative Serving orchestrates serverless workloads based on traffic. Traffic first arrives at the Knative Gateway, which splits it across Revisions according to the percentages specified in the Route configuration. Each Revision has its own independent scaling policy: when the traffic it receives increases, that Revision scales out automatically without affecting the others.
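As a sketch of how this looks in the API, the traffic block of the same Knative Service can pin percentages to specific Revisions (the revision names are hypothetical, and the tag is optional):

```yaml
spec:
  traffic:
    - revisionName: helloworld-v1
      percent: 80
    - revisionName: helloworld-v2
      percent: 20
      tag: canary                  # optionally exposes the new Revision at its own tagged URL
```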

Because each Revision receives its share of traffic and scales independently, Knative Serving ties traffic management, elasticity, and canary release together through traffic control. Next, let’s explore the details of the Knative Serving API.

(Figure: working mechanism of the Knative Autoscaler)

The diagram above illustrates the working mechanism of the Knative Autoscaler. The Route handles incoming traffic, while the Autoscaler handles elasticity. When there are no incoming requests, the system scales down to zero; once scaled to zero, incoming requests are routed to the Activator. When the first request arrives, the Activator holds the HTTP connection and notifies the Autoscaler to scale up. After the Autoscaler brings up the first pod, the Activator forwards the traffic to it, which ensures that no traffic is lost even after scaling down to zero.
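In community Knative, the scaling behavior of each Revision is tuned through annotations on the Revision template; a hedged sketch follows (annotation keys and defaults can differ slightly across Knative versions):

```yaml
spec:
  template:
    metadata:
      annotations:
        # Target concurrency per pod: the Autoscaler adds pods once in-flight
        # requests per pod exceed this value and removes them as load drops.
        autoscaling.knative.dev/target: "100"
        # Upper bound on the number of pods for this Revision.
        autoscaling.knative.dev/maxScale: "10"
```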

With this, the core modules and basic principles of Knative Serving have been covered, and you should now have a preliminary understanding of Knative. Along the way you may also have realized that using Knative means running quite a few Controller components plus a Gateway component (such as Istio), which brings ongoing IaaS and operational costs.


If the Gateway is implemented with Istio, Istio itself requires more than ten Controllers, and for high availability this may grow to more than twenty. The Knative Serving Controller likewise needs more than ten instances for high availability. The IaaS and operational costs of all these Controllers are considerable. On top of that, the cold start problem is evident: although scale-to-zero reduces cost during traffic troughs, the first batch of requests may still time out.

Perfect Integration of Knative and Cloud #

To address the issues mentioned above, we have deeply integrated Knative with Alibaba Cloud. Users can still use Knative’s native semantics but with Controllers and Gateways deeply embedded in the Alibaba Cloud ecosystem. This ensures that users can utilize cloud resources using the Knative API without vendor lock-in risks, while also enjoying the existing advantages of Alibaba Cloud’s infrastructure.


The first part of the integration is between the Gateway and the cloud: we use Alibaba Cloud SLB (Server Load Balancer) directly as the Gateway. The benefits of using SLB include:

  • Provides support at the cloud product level, with SLA guarantee.
  • Pay-as-you-go, no need to provision IaaS resources.
  • Users do not need to bear operational costs and do not need to consider high availability issues; cloud products come with built-in high availability capabilities.


In addition to the Gateway component, the Knative Serving Controller also incurs costs, so we have integrated the Knative Serving Controller with Alibaba Cloud Container Service. Users only need to have a Serverless Kubernetes cluster and enable Knative functionality to leverage cloud capabilities based on the Knative API, without any costs for the Knative Controller.


Let’s now analyze the cold start problem.

A traditional application without elastic configuration runs a fixed number of instances, whereas Serverless applications managed by Knative carry elastic policies by default and scale down to zero when there is no traffic. The traditional application keeps the same instance count even during traffic troughs with no requests to process, which wastes resources, but the upside is that requests never time out and can be handled whenever they arrive. With scale-to-zero, by contrast, scaling is only triggered when the first request arrives.

In Knative, scaling from 0 to 1 requires five sequential steps, and only after all five complete can the first request be processed, by which point the request has often already timed out. So although scale-to-zero cuts the cost of resident resources, the cold start hit on the first batch of requests is very noticeable. Elasticity is ultimately a balance between cost and efficiency.


To solve the cold start problem of the first instance, we introduced the Reserved Instance feature, which is unique to Alibaba Cloud Container Service for Knative. Community Knative scales down to zero by default when there is no traffic, but the resulting 0-to-1 cold start is hard to eliminate: besides IaaS resource allocation, Kubernetes scheduling, and image pulling, it also includes the application’s own startup time, which can range from milliseconds to minutes and is entirely a business-level behavior that the underlying platform can hardly control.

ASK Knative’s answer is to balance cost against cold start with low-priced reserved instances. Alibaba Cloud Elastic Container Instance (ECI) offers instances in many specifications with different compute capabilities and prices; below is a price comparison between a computing instance and a burstable instance at the 2c4G specification.

(Figure: price comparison between 2c4G computing and burstable instances)

As the comparison shows, the burstable instance is 46% cheaper than the computing instance. Serving traffic troughs with burstable instances therefore not only mitigates the cold start problem but also saves a great deal of cost.

Beyond the price advantage, burstable instances have another eye-catching feature: CPU credits. A burstable instance continuously accrues CPU credits, and when its baseline performance cannot meet the load, it can seamlessly boost compute performance by consuming the accumulated credits, without affecting the environment or the applications deployed on the instance. CPU credits let you allocate compute from the perspective of the overall business, smoothly shifting spare capacity from off-peak hours to peak hours (you can loosely think of it as a hybrid vehicle). For more details about burstable instances, see here.

ASK Knative’s strategy, then, is to use burstable instances as reserved instances during traffic troughs and to seamlessly switch to standard computing instances when the first request arrives. This lowers the cost of the troughs, and the CPU credits accumulated during them can be spent when the business peak arrives, so none of the money a user pays is wasted.

Using burstable instances as reserved instances is only the default policy; users can specify other instance types as the reserved-instance specification. They can also specify a minimum number of standard instances to keep resident, which effectively disables the reserved-instance feature.
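In community Knative terms, keeping a minimum number of standard instances resident corresponds to setting a minimum scale on the Revision template, which is roughly what opting out of the reserved-instance feature amounts to. The sketch below shows only the community annotation; the ASK-specific annotations for choosing the reserved-instance type are not shown here, and the key may vary by Knative version.

```yaml
spec:
  template:
    metadata:
      annotations:
        # Keep at least one pod resident so the Revision never scales to zero;
        # with standard instances always running, no reserved-instance switch-over is needed.
        autoscaling.knative.dev/minScale: "1"
```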

Summary #

Knative is the most popular Serverless orchestration framework in the Kubernetes ecosystem, but community Knative needs dedicated Controller and Gateway components to provide its services. Beyond the IaaS cost, these resident components carry a significant operational burden, which raises the bar for making applications Serverless. In ASK we therefore fully manage Knative Serving, delivering an out-of-the-box, truly Serverless experience.