01 Life Cycle: The Basic Flow of How Functions Are Executed #

Hello, I am Jingyuan.

After reading the previous section on mental preparation, I believe your understanding of Serverless has become clearer. It is a service design concept that lets you focus solely on developing business logic, without worrying about environment operations or machine management, while offering features such as elasticity and pay-as-you-go billing.

FaaS (Function as a Service) is the first technology I recommend as your entry into the world of Serverless. Today, I will introduce the basic process of FaaS, so that you gain a preliminary end-to-end understanding of it and form an outline in your mind of how functions are developed, debugged, deployed, and executed.

That way, when you run into questions later, such as “Where does the uploaded function actually go?” or “Why did the function time out during execution?”, you will quickly know which knowledge point to dig into.

In this lesson, I will take “Hello Serverless” as an example and use Baidu Intelligent Cloud Function Compute (CFC) as the operating platform. I will walk you through the process and principles of FaaS execution from both the user's and the platform's perspectives, covering the entire lifecycle of a function: creation, storage, execution, and so on. Along the way, I will also guide you through developing and running a function yourself.

For Your First Experience, Use a Cloud Provider's Platform #

Generally, when you first encounter Serverless, I recommend starting with the tutorials provided by public cloud providers to build understanding from the usage side, and only then looking at open source frameworks or other materials for in-depth study.

Why do I recommend this?

Firstly, cloud providers' Function Compute platforms offer a variety of runtimes (Python, Java, PHP, Node.js, Golang, etc.), which can meet the development needs of different technology stacks. You can experiment in whichever language you are proficient in, avoiding the cost of switching languages. If you choose an interpreted language, you can develop, debug, and deploy directly on the cloud platform, which is very convenient. Through this low-cost learning, you can quickly understand the product form of Serverless.

Secondly, if you want to deploy an open source framework yourself, you need machine resources, and cloud providers usually offer free quotas. Taking Function Compute as an example, providers such as Baidu Intelligent Cloud, Alibaba Cloud, and Huawei Cloud offer a free monthly quota of roughly 1 million function invocations, plus 400,000 GB-seconds of memory usage, which is plenty for experimentation. Tencent Cloud has adjusted its policy and offers new users a certain free quota for the first three months. Of course, it is best to read the provider's usage instructions in advance to avoid spending unnecessary money.
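
To make the quota concrete, here is a back-of-the-envelope GB-second calculation. The numbers are illustrative assumptions, not any provider's actual billing rules:

# Rough GB-second estimate for a 128MB function running 100ms per invocation.
memory_gb = 128 / 1024      # memory allocated to the function, in GB
duration_s = 0.1            # assumed execution time per invocation, in seconds
invocations = 1_000_000     # one month of invocations

gb_seconds = memory_gb * duration_s * invocations
print(gb_seconds)           # 12500.0 GB-s, well within a 400,000 GB-s free quota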

Finally, the various APIs/SDKs, trigger integrations, and development tools provided by cloud providers make it easy to understand the broader Serverless ecosystem and its solutions. As the saying goes, “when encountering something new, first see how others do it”; this is also a shortcut to learning.

Now that you know the recommended way to get started, let's look at how Function Compute is used from a user's perspective.

From a User’s Perspective: The Lifecycle of Function Computation #

The entire lifecycle of a function goes through five stages: “Development Settings,” “Packaging and Uploading,” “Event Binding and Triggering,” “Elastic Execution,” and “Instance Termination.” The following image shows the function lifecycle from a user's perspective. Let's walk through it together.

Image

First, you need to write the function code. We will take Python 3.6 as an example and write a “Hello Serverless” demo. Select Python 3.6 as the runtime, set the execution memory to 128 MB, the timeout to 3 s, and the concurrency to 1. The image below shows an overview of the function information after it is created.

Image

Then, click “Edit Function” to enter the online editing mode and write the demo code. You can also download the code locally for development.

# -*- coding: utf-8 -*-
# event carries the trigger's payload; context provides runtime information.
def handler(event, context):
    return "Hello Serverless"

After completing the code, you still need to specify the entry point for function execution. For example, filling in “index.handler” means calling the handler method defined in the main program file index.py; when the function is triggered, execution starts from that method.

The second step is to upload the code to the Function Compute platform. You can submit and save your code directly in the console, or package it into a zip file first. Upload channels include the Function Compute API/SDK, the web console, and the command-line CLI tool.
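
For reference, here is a minimal sketch of packaging the code into a zip archive locally before uploading, using only Python's standard library (index.py is the main program file from our demo):

# Package the function code into a zip archive for upload.
import zipfile

with zipfile.ZipFile("function.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("index.py")  # the main program file containing handler()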

The third step is to execute the function you just uploaded. Generally, you can invoke it through the API/SDK, or click to execute it manually in the console. You can also trigger execution through various triggers.

You may be unfamiliar with triggers, so let me explain briefly here.

FaaS can connect various upstream and downstream services through event triggers. When the trigger source service sends a request, the function responds by running and processing it. Taking the HTTP trigger as an example, when a user visits the URL of the HTTP trigger, an HTTP processing request will be sent to the specified cloud function, and then the platform will start a function instance to process the request.

Let's continue with our “helloServerless” function. Select an HTTP trigger on the platform, set the URL path to “/hello/serverless”, and choose “GET” as the HTTP request method.

Image

After creating the trigger, the Function Compute platform will generate an accessible URL address for you to trigger the function execution. If you are in a production environment, it is best to add authentication to ensure the security and reliability of the service.

Image

The fourth step: when the function finishes executing, the Function Compute platform returns the execution result. You can usually view it through the logs or the response to the request.

curl https://$HTTP_TRIGGER_URL/hello/serverless

# Output
Hello Serverless

After going through these steps, you have completed the small “Hello Serverless” demo. As a user, you only needed to focus on the code itself, not on deploying and maintaining an environment. You may wonder: the result looks no different from running traditional code, so why choose FaaS?

In fact, the greatest feature of FaaS is its ability to scale dynamically, all the way down to 0. When the function is not being invoked, no instances are running and no costs are incurred; from the moment you create and upload a function, nothing is charged until invocations start. When traffic reaches a threshold, the system automatically scales up; when traffic decreases, it automatically scales down.

In addition, most cloud vendors provide a certain amount of free quota for FaaS. If your application is based on event triggers or experiences significant fluctuations in traffic, then FaaS is definitely the wise choice.

The Execution Process of Function Compute from the Platform's Perspective #

Earlier, we learned about the lifecycle of Function Compute from the developer's perspective, but I believe you won't be satisfied with surface-level usage alone. So how is Function Compute actually implemented internally?

Simply put, when an event request arrives, it first reaches the routing service, which checks its cache for a ready instance. If one exists, this is a warm start: the function is executed directly on that instance. If not, the request enters the cold start path: the Function Compute engine initializes a container, doing the preparation work of downloading the function's code package or image, preparing the network environment, and loading the runtime, and then executes the function. The instance information is then stored in the cache so that subsequent requests take the warm start path.
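
To make this flow concrete, here is a simplified sketch of the routing decision. The names (instance_cache, instance_pool, and so on) are illustrative, not any vendor's actual implementation:

# Simplified routing: warm start if a ready instance exists, cold start otherwise.
def route(request, instance_cache, instance_pool):
    instance = instance_cache.get(request.function_id)
    if instance is not None:
        return instance.invoke(request)             # warm start: reuse the ready instance
    # Cold start: download the code package or image, prepare the network,
    # load the runtime, then execute the function.
    instance = instance_pool.cold_start(request.function_id)
    instance_cache[request.function_id] = instance  # later requests take the warm path
    return instance.invoke(request)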

After execution is complete, the instance is kept for a certain amount of time (usually 1-2 minutes) before it is recycled.

The above describes the normal execution flow. When traffic suddenly increases to a certain threshold, the Function Compute service quickly scales up instances to meet the increased concurrency. When there are too many idle instances, instances are scaled down.

At this point, some of the new terms may feel unfamiliar, such as cold start, warm start, and runtime. Below, I will explain the lifecycle of Function Compute from the development-time and runtime perspectives, which will help you understand what they mean.

Development Mode of Function Compute #

After we upload the code to the FaaS platform, the backend service saves the code package to object storage and records the function's metadata, such as the link to the code, the runtime, the memory size, and the timeout.
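
The stored metadata might look something like the sketch below; the field names are illustrative, not any vendor's actual schema:

# Illustrative function metadata kept by the control plane after upload.
function_meta = {
    "name": "helloServerless",
    "runtime": "python3.6",
    "handler": "index.handler",   # entry point: handler() in index.py
    "memory_mb": 128,
    "timeout_s": 3,
    "code_link": "object-storage://<bucket>/helloServerless.zip",  # hypothetical link
}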

When we modify the function-related information again or write the function code online, the FaaS platform will retrieve the stored code and related information and display them on the interface for you to modify.

It is worth noting that major cloud vendors currently only support online editing and debugging for interpreted languages. For compiled languages, you still need to develop in your local environment. Fortunately, some vendors (such as Alibaba Cloud) have released tools that support both local and cloud-side debugging, which goes some way toward enabling quick local development, debugging, and deployment. In later chapters, I will introduce the implementation of this technology in detail.

Image

Runtime of Function Compute #

After you upload the code, how does the FaaS platform execute the function code? Let’s continue using “Hello Serverless” as an example to explain.

At the beginning, we created a function, wrote the corresponding code, and saved it on the cloud platform.

Let's explain the execution process using the HTTP trigger as an example. When an event request hits the trigger's URL, the request is routed to the relevant function instance. Depending on whether it is the first request, there are two situations: cold start and warm start. In addition, dynamic scaling is performed based on the amount of traffic.

I have abstracted this process into the following Function Compute architecture diagram. Through this diagram, let’s take a look at how the “Hello Serverless” function is executed.

Image

Traffic Forwarding: Cold Start and Warm Start #

First, when an HTTP event request arrives, the traffic forwarding service is responsible for receiving and forwarding the request, which is the Route service in the diagram. When Route receives the request, it first checks in its cache to see if there is already corresponding information about the Hello Serverless function and instance.

If it exists, the request retrieves the execution instance directly from the instance pool based on the stored information; this is a warm start. What exactly is a warm start? After your function finishes executing, the container instance is kept alive for 1-2 minutes. If another execution is triggered within this window, no new instance needs to be created and no runtime needs to be mounted; the existing instance is reused directly, so the response is much faster.

What if the relevant information cannot be found? Then an activator-like component (such as Knative's Activator) is used to create and request an instance, execute the current request, and store the instance's information in the cache. This is the cold start process.

What operations are performed during a cold start? Generally: instance scheduling and container creation, code download and decompression, preparation of the function execution environment, mounting of user code, VPC network preparation, and initialization of the runtime and user code. Only after this series of steps does the function start executing. The time consumed by a cold start is therefore affected by many factors, mainly:

  • Different languages have different cold start durations: generally, Golang and Python start faster, while Java is relatively slower.
  • Code package size: The process of downloading and decompressing code is time-consuming during cold start. The larger the code size, the more time it takes.
  • Container creation speed and VPC network preparation: The time consumed by this process often depends on the cloud service provider, and the speed may vary among different platforms.

Of course, cloud vendors are continuously optimizing cold starts and have introduced methods such as reserved instances, caching to accelerate code download, VPC proxies, and IP tunneling to solve cold start problems. You can also solve this problem on your own by:

  • Reducing the code package size and eliminating unnecessary configurations and dependencies.
  • Using warm-up requests to keep instances resident in the container pool, for example, using a timer to trigger the function so that it quickly returns an empty response (see the sketch after this list).
  • Choosing languages with shorter cold start durations where possible, avoiding slower-starting languages and runtimes such as Java.
  • Choosing larger memory where possible: the larger the function's memory, the faster the cold start. In the later chapter on cold starts, I will discuss the technical details and practical experience in depth.
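
As an example of the warm-up approach mentioned in the list above, here is a minimal handler sketch. The shape of the warm-up event is an assumption; match it to whatever your timer trigger actually sends:

# Short-circuit warm-up pings so the instance stays resident at minimal cost.
def handler(event, context):
    # Assumed warm-up payload, e.g. a timer trigger sending {"warmup": true}.
    if isinstance(event, dict) and event.get("warmup"):
        return ""  # respond immediately; the point is only to keep the instance warm
    return "Hello Serverless"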

Dynamic Scaling #

When do we need to scale up or down? When we send the first request through the HTTP trigger, there is no pre-warmed pod in the function instance pool, so a 0-to-1 container scaling process is required.

At this point, the Hello Serverless code package is loaded from object storage into the container and run. After the result is returned, the FaaS platform generally keeps the function instance alive for a period of time before destroying it. If subsequent requests arrive within that window, they can reuse the instance directly without scaling. However, if the concurrency exceeds the limit of 1 that we set earlier, the Function Compute engine will automatically scale up once it observes the relevant metrics.

Of course, the scenario I mentioned is relatively extreme. Generally, the function compute engine will scale in advance based on the set monitoring threshold.

The scaling algorithm operates at both the node level and the pod level. Nodes and pods are generally watched through custom metrics, and when the metrics change, scaling operations are performed accordingly. For example, Kubernetes' HPA relies on the metrics-server monitoring component to supply resource metrics such as CPU and memory, keeping them within a manageable range. A little teaser: can Function Compute scaling be done directly with Kubernetes HPA, without any modification? Think about it first; we will discuss it in detail in the scaling section.
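
As a reference point, the core of the Kubernetes HPA algorithm is a simple ratio between the observed metric and its target. Here is a toy illustration:

# Kubernetes HPA core formula:
# desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)
import math

def desired_replicas(current: int, current_metric: float, target_metric: float) -> int:
    return math.ceil(current * current_metric / target_metric)

print(desired_replicas(2, 180, 100))  # -> 4: the metric nearly doubled, so replicas double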

When it comes to node-level scaling, the number of nodes is usually determined based on the overall utilization of the nodes. When scaling is needed, a scaling request is sent to the scheduler, which calls the relevant interfaces to carry out the scaling operation.

Function Instance Termination #

Finally, I want to mention that every run has an end. When a function is executed and there is no further execution within 1-2 minutes, the FaaS platform will recycle that instance.

The reclamation window varies among cloud providers, which is worth noting when developing on a particular platform. Knowing it lets us optimize our functions in advance so that requests are served by warm instances whenever possible.

Runtime #

By going through the previous steps, you now have a function instance to execute the function you wrote, “Hello Serverless”. Now let’s take a closer look at the key foundation of function execution: the runtime. The runtime provides a running framework for functions and actually executes the functions.

Cloud providers usually package the execution environments for different languages as base images. A container image consists of multiple layers: the first layer is a base file-system image such as Ubuntu or Alpine, and the second layer holds the code's dependencies, such as pip packages for Python or npm packages for Node.js. Some Function Compute engines even support running Docker images directly.

Let's use Python 3 as an example to illustrate how the runtime executes “Hello Serverless”. The Python 3 runtime usually exposes a handler interface for developers to implement their business logic. When a request arrives, the Python runtime dynamically loads your code file and calls the method you defined earlier.
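
Here is a minimal sketch of that dynamic loading, assuming the “index.handler” entry point from our demo (real runtimes add validation, sandboxing, and error handling):

# Resolve "index.handler" to the handler function and invoke it.
import importlib

def load_entry_point(entry: str):
    module_name, func_name = entry.rsplit(".", 1)  # "index.handler" -> ("index", "handler")
    module = importlib.import_module(module_name)  # imports index.py
    return getattr(module, func_name)

handler = load_entry_point("index.handler")
print(handler({}, None))                           # -> "Hello Serverless"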

It's worth noting that for compiled languages, you need to import the code libraries provided by the FaaS platform and develop your business logic on top of a ready-made framework. The idea is the same; only the way of running differs.

I will discuss the specific implementation process with you in more detail in the runtime section.

Summary #

Finally, let’s summarize what we have learned in today’s lesson on the workings of function compute from different perspectives.

From the user's perspective, through the hands-on walkthrough of the four steps, we learned that as a business developer you need to pay attention to development settings, packaging and uploading, event binding and triggering, and pay-as-you-go billing once the function goes live. The series of actions performed by the Function Compute engine, such as how functions are executed, scaled, and terminated, can be left to the platform operators.

From the platform's perspective, on the development side we saw that the Function Compute control plane's job is to give business developers a convenient operating platform. On the runtime side, we gained an understanding of how the various stages of execution work together.

On the one hand, when an event is triggered for the first time, the function compute platform goes through a scaling process from 0 to 1. As the traffic increases, the platform continues to scale to ensure the normal execution of requests. As the requests decrease, the platform scales down by releasing instances.

On the other hand, the runtime that actually supports function execution also has different implementation methods, mainly based on the different characteristics of the languages themselves.

Image

Through today's introduction, I believe you have gained an overall understanding of Function Compute and a good feel for the field.

In the upcoming lessons, I will discuss these technical points mentioned in this section in detail with you.

Thought Exercise #

Well, that’s the end of this lesson. Finally, I have a question for you.

FaaS improves the productivity of developers and enables products to be quickly launched into the market for trial and error. Have you come into contact with FaaS? Which businesses are already using it? Have you encountered any problems?

Feel free to write down your thoughts and answers in the comments section, and let’s exchange ideas. Thank you for reading, and please feel free to share this article with more friends for discussion and progress.