18 Practical Guide Experience from a Serverless Sandbox Veteran #

Hello, I’m Jingyuan.

Congratulations on completing the study of the core technologies and extension capabilities of Serverless. From this section onwards, we will apply the knowledge we have learned and engage in practical exercises.

By now, you should have a solid grasp of trigger mechanisms, cold starts, scaling, runtimes, observability, and related topics, and you should be able to answer questions about each of them quickly. But how do we combine these individual techniques in real-world scenarios? Developing on a Serverless architecture differs greatly from developing a traditional application. Which pitfalls deserve special attention?

Today, I will share the experience I have accumulated, along with common challenges my clients have faced. I will organize it around several aspects, including solution selection, resource evaluation, invocation speed, development techniques, and online operations, and discuss how to get the most out of Serverless.

I hope that this section will inspire you in choosing appropriate solutions and development techniques, and truly enable you to enjoy the benefits brought by Serverless.

Solution Selection #

When selecting a solution, it is important to avoid making the wrong choices in terms of scenarios and technology direction. Making the wrong choices can not only lead to unstable services but also increase the amount of repair work. So, when is it suitable to use Serverless technology?

Scenarios well suited to Serverless technology usually fall into five dimensions: data processing and computation, event-triggered processing, application backend services, stateful services, and traditional service upgrading.

[Image: the five scenario dimensions suited to Serverless]

Next, we can summarize the characteristics of each business scenario and examine the different focuses of the five dimensions. This will help you enrich your own technological selection framework in the ever-evolving Serverless landscape.

  1. Data processing and computation scenarios

If you have high-volume data without significant peak-and-valley characteristics, technologies such as Spark and Flink can handle it well on their own. Serverless's "extreme elasticity" only pays off when traffic swings significantly. In other words, Serverless suits data scenarios with large traffic variation and a need for dynamic resource allocation.

  2. Event-triggered scenarios

This is a field that has been widely used since the birth of FaaS. Its characteristic is “unstable traffic” - triggering when there is traffic and scaling down to zero when there is none. If your business scenario is relatively lightweight and traffic is unstable, then “lean cost” (on-demand usage and pay-as-you-go) Serverless is your best choice.

  3. Application backend services

Different from traditional microservices, application backend services are more complementary to microservices and are more often used for relatively independent and simple architectural business applications. For example, scenarios like mini programs, H5 event pages, and BFF (Backend For Frontend) applications demonstrate the “rapid delivery” feature of Serverless.

I have marked these three scenarios in blue on the chart. All of them can be supported by FaaS, whether you write function code directly or bring a custom image. The yellow part represents extensions of Serverless capabilities: with the maturing of the underlying technology and products in recent years, Serverless can now support even more scenarios. Let's continue.

  4. Stateful scenarios

Stateful scenarios break the earlier, narrow definition of Serverless as purely stateless, allowing long-running processes, multi-task workflows, and other stateful requirements to be completed on Serverless.

  5. Traditional service upgrading

When we encounter heavy historical baggage, such as integrating frameworks like Spring Boot, Spring Cloud, Dubbo, etc., we can consider transitioning to a Serverless service hosting mode. Additionally, if you wish to eliminate the operational and capacity planning costs of Kubernetes, you can consider the container transformation of Serverless, which can also be categorized under this type of scenario.

Apart from the five dimensions on this chart, let’s think about which situations are not recommended for Serverless technology.

All the scenarios above share characteristics such as "high operational thresholds, calling for a fully managed, operations-free platform," "large traffic variation, calling for elastic scaling," "lightweight applications, calling for rapid delivery," and "uncertain traffic, calling for scale-to-zero and pay-as-you-go usage." Conversely, scenarios that are latency-sensitive or demand extremely high SLAs, such as web search or bank transactions, are not recommended for Serverless technology.

Here, I will also share a tip with you: When you first encounter Serverless, you can refer to the case studies on the cloud service provider’s official website, submit work orders, and communicate with their pre-sales engineers. This can help you make better selections and effectively use their products. Although I have already provided many first-hand case studies and experiences in the previous courses, the development of Serverless is rapidly changing, and staying abreast of the official website is one of the best choices.

Resource Evaluation #

After the solution is determined, we need to perform resource evaluation. Although Serverless has the characteristics of on-demand usage and pay-per-use, this does not mean that the cost of Serverless architecture is lower than traditional services.

You can try to evaluate according to these two steps:

First, check if there are obvious idle periods or significant “peak-valley” phenomena in your service’s daily operation.

Second, after considering the first point, check if you have selected appropriate resource specifications to calculate and compare the costs of different solutions.

These two points apply to all products built on Serverless technology. Let's take Function Compute as an example. Its charges are based on the number of invocations, resource usage, and outbound traffic. From the pricing tables of the major cloud vendors, we can see that the main cost comes from resource usage, i.e., "Resource Usage = Memory Size × Execution Time".

From this formula, the cost of resource usage is directly proportional to both the memory size and the execution time. In the resource evaluation stage, let's first discuss memory size.

I once worked with a client whose functions were invoked roughly 500,000 times per day, each execution taking about 3 seconds, yet the cost remained high.

Through platform monitoring, we found that he had configured 1 GB of memory to process a small amount of data, even though fewer than 1,000 of those invocations actually needed that much memory.

What do you think? Before answering, let's look at the cost under these two memory sizes, calculated using Alibaba Cloud Function Compute's active-instance resource prices:

[Image: cost calculation at 256 MB vs. 1 GB memory]

We can see that the cost differs by a factor of 4 between the two memory configurations. Our suggestion to the client was therefore to check the file size in the first function, Func1: only if it exceeds what a 256 MB instance can handle should it be sent asynchronously to a second function, Func2, configured with 1 GB. This way his business processing completes at a much lower cost.
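The GB-second arithmetic behind this comparison can be sketched in a few lines. The unit price below is an assumption for illustration only, not an actual quote from any vendor:

```python
# Rough sketch of the GB-second cost arithmetic described above.
# PRICE_PER_GB_SECOND is a hypothetical active-instance unit price.
PRICE_PER_GB_SECOND = 0.000110

def daily_cost(invocations, duration_s, memory_gb):
    """Cost = invocations * execution time * memory size * unit price."""
    return invocations * duration_s * memory_gb * PRICE_PER_GB_SECOND

small = daily_cost(500_000, 3, 0.25)  # 256 MB configuration
large = daily_cost(500_000, 3, 1.0)   # 1 GB configuration
print(f"256MB: {small:.2f}, 1GB: {large:.2f}, ratio: {large / small:.0f}x")
```

Whatever the actual unit price, the ratio between the two configurations is fixed at 4×, since memory size enters the formula linearly.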

Let’s take a look at the cost comparison:

[Image: cost comparison after splitting into Func1 and Func2]

Therefore, when performing resource evaluation, it is also crucial to split functions reasonably based on the situation.

Call Speed #

The second key factor in the formula we just discussed is execution time, which is tied to call speed. What factors affect call speed? Or, put another way, what slows it down?

Based on my experience working with clients, there are several common points that are often overlooked in code development, especially in the Serverless development model.

  • Large code packages and unnecessary dependencies : particularly common among Java developers. Use dependency exclusions (for example, Maven's exclude) to strip out irrelevant dependencies.
  • Runtime impact : one of my clients hit occasional timeout errors because of Java's longer cold starts; after switching to Python, the problem went away. Although cloud providers such as Alibaba Cloud have optimized the Java runtime, as we compared in a previous section, Java is still slower than Golang in both the cold-start and invocation stages.
  • Slow responses from downstream services : remember that functions are billed by usage and duration. Beyond that, a downstream service that responds too slowly will cause timeout errors, which not only wastes cost but also degrades the service experience.
  • Coding practices : introducing an unneeded framework may speed up development, but it also slows the service. Likewise, unnecessary loops and sleep calls deserve careful scrutiny when designing function code architecture and logic.

To overcome these issues, two things need to be done: optimize the functions themselves, and optimize the coordination between functions. So, as a user of a Serverless platform, what specifically can you do?

For optimizing the function itself, you can do the following six things:

  1. Minimize unnecessary code dependencies and optimize code size to improve code download and loading speed.
  2. Choose high-performance runtimes to improve function startup speed, such as Golang.
  3. Properly use local caching to cache large data in mounted paths when necessary.
  4. Reserve instances. As we know, Serverless instances are scaled down when there is no incoming traffic. The next time there is traffic, it will go through the cold start process. You can reserve and dynamically load some instances for functions that are sensitive to delay within the acceptable cost range.
  5. Request pre-warming. You can configure a scheduled trigger to proactively invoke functions for pre-warming. The timing interval can be flexibly configured based on the cloud provider’s instance recycling time.
  6. Development style. Keep your processing logic flat: avoid deep nesting, loops, and pauses, which are common habits carried over from microservices or scripts.
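Point 5 above, request pre-warming, can be sketched as follows. The `"prewarm"` marker is a hypothetical field that your scheduled trigger would place in its event payload; the handler short-circuits on it so warm-up invocations skip the business logic:

```python
# Sketch of request pre-warming: a scheduled trigger sends a synthetic
# event, and the handler returns early without running business logic.
# The "source"/"prewarm" marker is an assumed convention, not a platform API.
def handler(event, context):
    if isinstance(event, dict) and event.get("source") == "prewarm":
        return "warmed"  # instance is now hot; no business work was done
    # ... real business logic goes here ...
    return "ok"
```

Set the trigger interval slightly below the cloud provider's instance recycling time so an instance is always kept alive.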

To optimize coordination between functions, you can do the following to improve call speed:

  1. Use asynchronous calls whenever possible, especially for cascading function calls, to avoid the time delay caused by synchronous calls to downstream services. Therefore, it is important to carefully evaluate when selecting the appropriate approach.
  2. Have one function do one thing, which means splitting functions into reasonable granularity based on business requirements. This is necessary considering cost, reusability, and performance.

What benefits does such optimization bring? Having studied cold starts, you can probably answer immediately: a better service experience. If you run an HTTP website, the slower your responses, the higher the likelihood of user churn, which in turn hurts page views and traffic, and ultimately revenue.

Development Tips #

Next, let's discuss the development stage, starting with the two aspects we touch most: the choice of development tools and coding skills.

Selection of Tools #

Let’s first talk about the selection of tools. Cloud vendors usually provide various convenient tools to support project development, debugging, packaging, deployment, and operation and maintenance, enabling the management of the entire project lifecycle. Examples include Alibaba Cloud’s Serverless Devs, Tencent Cloud’s Serverless Framework CLI, and Baidu Smart Cloud’s BSAM.

[Image: Serverless development tools from major cloud vendors]

Development tools are generally divided into three categories: local development tools CLI, WebIDE, and IDE plugin integration. Among them, WebIDE is more suitable for supporting lightweight online development management, while IDE plugins are more suitable for technical personnel who use mature IDEs, such as developers using VSCode.

So how should we choose these tools?

  • If your business is relatively simple and you are using an interpreted language, WebIDE is the preferred choice. You can review the advantages and benefits of WebIDE in the 11th lesson to further deepen your memory.
  • If your business is more complex or you are using a compiled language, I recommend using local CLI tools or plugin integration.

The difference between these tool types comes down to preference: if you like a richer interactive experience and already work in an IDE such as VSCode, binding the plugin to your editor is a good choice. If you are a command-line enthusiast with a geek spirit, tools like Serverless Devs offer commands rich enough to support your entire project-management workflow.

Coding Skills #

Now that we have the development tools, what should we pay attention to when coding? On one hand, we need to consider the idea of resource initialization and reuse. On the other hand, we need to pay attention to the invocation and configuration details of function business. Let’s look at an example.

As I mentioned in the section on runtime, functions built on the standard runtime are actually running based on a code framework. The platform provides an entry function called the handler, where you can directly write your business code logic. However, if the function needs to interact with resources, you need to establish a connection before performing storage operations.

If we connect to the database inside the handler function, as in the code below, can you think of any problems? Won't the connection code be executed every time a request comes in? For an occasionally triggered function you may never notice, but in high-concurrency scenarios it badly hurts performance and can even push the database past its connection-handle limit.

import os
import pymysql

def handler(event, context):
  # A new connection is established on every single invocation.
  db = pymysql.connect(
               host=os.getenv("MYSQL_HOST"),
               port=int(os.getenv("MYSQL_PORT")),
               user=os.getenv("MYSQL_USER"),
               passwd=os.getenv("MYSQL_PASSWORD"),
               charset='utf8',
               db=os.getenv("MYSQL_DBNAME")
  )
  cursor = db.cursor()

The correct approach is to move the connection setup out of the handler and run it once, for example at module scope or in a dedicated initializer. Then, as long as the function instance has not been recycled, new requests reuse the existing connection directly.

This solution builds on the FaaS platform's instance reuse, reusing the resource object across invocations on the same instance. To go further, check whether the database object "db" is null before creating or using it, which makes the code more robust. Middleware resource objects such as a kafkaClient or redisClient can be initialized the same way.

Resource initialization is a crucial issue. If handled improperly, it may cause problems such as exceeding the connection limit of resource services. In addition to this, based on common business logic issues encountered by customers, I have selected the top 5 commonly used and efficient techniques for you to use flexibly.

First, keep log volume down when using print statements. On a cloud platform, printing an entire structure such as the "event" object may exceed the log limit, so print only the important business fields.

Second, rely on the platform's asynchronous strategies rather than in-process asynchronous frameworks. If your native microservice uses Python's Tornado framework, you will find it hard to exploit its asynchrony inside FaaS; instead, use the asynchronous processing capability the FaaS platform provides, as mentioned in the third lesson on advanced applications.

Third, set the timeout of downstream service calls to be smaller than the function's own timeout. Otherwise, a poorly performing downstream service can run the whole request into a timeout, which not only wastes cost but also destabilizes the service.

Furthermore, move common code to layers. This is one of the essential advanced skills that I mentioned in the section on advanced properties, similar to the common libraries we usually use, which can make the specific function code package lighter.

Finally, raise the concurrency ceiling in advance. Cloud vendors usually set default instance concurrency limits, such as Alibaba Cloud Function Compute's 300 * InstanceConcurrency, and Baidu Smart Cloud CFC's limit of 100 for a single instance. This is insufficient for a production application with relatively high concurrency; set the limit in advance and apply to have the quota increased.
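The third tip above, bounding downstream timeouts, can be sketched as follows. Both timeout values are assumptions for illustration; tune them to your function's actual configuration:

```python
# Sketch: give downstream calls a timeout well below the function's own
# timeout, so a slow dependency fails fast instead of running the whole
# invocation into a timeout. Both values below are illustrative assumptions.
import urllib.request

FUNCTION_TIMEOUT_S = 10    # assumed function-level timeout
DOWNSTREAM_TIMEOUT_S = 3   # keep this comfortably below the function timeout

def fetch(url):
    """Call a downstream HTTP service, returning None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=DOWNSTREAM_TIMEOUT_S) as resp:
            return resp.read()
    except Exception:
        return None  # fail fast; the function can still return a clean error
```

Failing fast this way keeps billed execution time short and lets the function return a controlled error to the caller.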

Online Operations and Maintenance #

After developing the code and deploying it online, do you often assume that Serverless means “maintenance-free”? Although we have repeatedly emphasized the maintenance-free features of Serverless, we still need to pay attention to the stability of the deployed service, access metrics, and observability. Let’s focus on stability and observability.

In terms of stability, the asynchronous invocation strategy set by cloud providers ensures that failed requests are retried, and reserved instances ensure timely response to traffic spikes.

In addition, one client of mine used a simple method worth borrowing. The client runs an online website and performs active health checks on important interfaces through polling with simulated requests. When a check fails N times in a row, an alert fires and traffic is switched to a backup cluster or region.

You will find that while Serverless cloud platforms are still maturing, you as a developer can add a layer of "maintenance insurance" on top of "maintenance-free" to safeguard service stability. The approach may not be optimal, but it is practical: in our follow-up exchanges it proved notably effective at protecting service stability and customer experience.
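The client's N-consecutive-failures logic can be sketched like this. Check results are passed in as booleans so the sketch stays self-contained; in practice each boolean would come from a simulated request against the interface:

```python
# Sketch of the active health check described above: after N consecutive
# failed probes, fire the failover/alert hook. FAILURE_THRESHOLD is the "N"
# from the text; tune it to your tolerance for transient errors.
FAILURE_THRESHOLD = 3

def should_failover(check_results, threshold=FAILURE_THRESHOLD):
    """Return True once `threshold` consecutive checks have failed."""
    consecutive_failures = 0
    for healthy in check_results:
        consecutive_failures = 0 if healthy else consecutive_failures + 1
        if consecutive_failures >= threshold:
            return True  # alert and switch to the backup cluster/region here
    return False
```

Resetting the counter on any success avoids failing over on scattered transient errors, which is the point of requiring the failures to be consecutive.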

In terms of observability, besides the discussion on metrics, logs, and traces that I mentioned in the observability section, based on my experience, I also recommend leveraging the associated support services provided by cloud providers. Their monitoring dashboards are sufficient to support your commonly used metrics, and you can also conduct a more comprehensive diagnosis of your service through the associated alarm services and trace tracking services.

Summary #

Let’s recap what we have learned. Today I have introduced to you some practical experience in Serverless applications, including solution selection, resource evaluation, invocation speed, development techniques, and operation and maintenance throughout the complete lifecycle.

You may have noticed that in this lesson, I have referred to some topics covered in Modules 1 and 2. For example, in the section on invocation speed, you can refer to the lessons on Cold Start and Function Invocation. In terms of development techniques, you can focus on the lessons on WebIDE and Advanced Applications. For operation and maintenance, you can pay more attention to the observability content.

This lesson may not be able to solve all the problems you encounter in practical applications, but it can certainly serve as your knowledge map, helping you identify the knowledge points that need continuous improvement behind each confusion. I hope it can connect the technical details we have discussed and form your own knowledge network before you start the upcoming advanced practical courses.

As Serverless continues to evolve, many future products will definitely transform into the form of Serverless. In your future work, you will encounter more business transformation demands and challenges. I hope you can find better solutions one by one and share your methods with more Serverless enthusiasts.

In the following practical courses, we will practice the 5 core scenarios of Serverless technology together: connecting cloud ecosystems, cross-platform development, migration of traditional services, private deployment and selection of engine platforms, and building platforms based on engines. Are you ready?

Reflection Questions #

Alright, this class is coming to an end, and I’ve prepared a reflection question for you.

What are some of the “pitfalls” you’ve encountered while using Serverless, or what doubts do you have when it comes to choosing the right technology?

Feel free to share your thoughts and answers in the comments section, and let’s have a discussion together.

Thank you for reading, and please feel free to share this class with more friends to read along.