11 Function Compute Observability

Overview #

What is observability? According to Wikipedia, observability is the measure of how well internal states of a system can be inferred from knowledge of its external outputs.

In application development, observability lets us assess the internal health of a system. When the system misbehaves, it helps us diagnose, troubleshoot, and analyze the problem. When the system is running smoothly, it helps us evaluate risks and anticipate potential issues. Evaluating risk is much like reading a weather forecast: if rain is predicted for tomorrow, you take an umbrella when you go out. In Function Compute development, if you observe that a function's concurrency keeps rising, a likely cause is business growth, for example a promotion run by the business team. To avoid hitting the concurrency limit and triggering flow control (throttling), developers need to raise the concurrency limit in advance.

Observability includes three aspects: Logging, Metrics, and Tracing.

Logging: logs record key information during the execution of a function. Each entry is discrete and specific, so reading error logs alongside the function code can quickly pinpoint an issue.
Metrics: metrics are aggregated data, usually presented as charts. Core metrics such as TPS (transactions per second) and error rate reflect how a function is running and how healthy it is.
Tracing: tracing follows individual requests. It shows how much time a request spends in each module of a distributed system, which helps you find performance bottlenecks.

Logging/Metrics/Tracing in Function Compute #

1. Logging #

How do we view function logs in Function Compute? In traditional server development, logs can be recorded in a file on a disk and collected through log collection tools. However, in Function Compute, developers no longer need to maintain servers. So how can they collect logs printed by code?

1) Configure Logs

Function Compute integrates with Log Service, so function logs can be recorded in a Logstore that you provide. Log configuration is part of the service configuration, where you set the Log Project and Logstore. Everything that any function in the service prints to stdout is collected into that Logstore.

2) Record Logs

How do we print logs? Are logs printed directly with console.log or print in code collected? The answer is yes. The usual printing calls in each language write to stdout: for example, console.log() in Node.js, print() in Python, and fmt.Println() in Go. Function Compute collects everything printed to stdout and uploads it to the Logstore.

Function Compute is invoked per request: each invocation corresponds to one request and has its own request ID. When the request volume is large, the number of logs is massive. How can you tell which logs belong to which request? The request ID has to be recorded in the logs as well. Function Compute provides built-in log statements that prepend the request ID to every log line, which makes logs easy to filter by request.
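To make the idea of request-tagged logs concrete, here is a minimal local sketch in Python. It does not use Function Compute's built-in log statements. It only imitates their effect with the standard logging module. The handler(event, context) shape and the context.request_id attribute are assumptions about the runtime, and the logger name fc_demo and the stream parameter are hypothetical helpers added for illustration and local testing.

```python
import io
import logging
import sys
from types import SimpleNamespace


def make_request_logger(request_id, stream=None):
    """Return a logger that prefixes every line with the given request ID."""
    logger = logging.getLogger("fc_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate lines when called repeatedly
    out = logging.StreamHandler(stream or sys.stdout)  # log to stdout by default
    out.setFormatter(
        logging.Formatter("%(request_id)s [%(levelname)s] %(message)s")
    )
    logger.addHandler(out)
    # LoggerAdapter injects request_id into every record it emits.
    return logging.LoggerAdapter(logger, {"request_id": request_id})


def handler(event, context, stream=None):
    # Assumption: the runtime passes a context object exposing request_id.
    log = make_request_logger(context.request_id, stream)
    log.info("received event: %s", event)
    return "ok"


if __name__ == "__main__":
    # Local stand-in for the context object supplied by the platform.
    fake_context = SimpleNamespace(request_id="req-123")
    handler("hello", fake_context)
```

Because every line starts with the request ID, filtering the logs of one request becomes a simple prefix or keyword match.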

3) View Logs

When function logs are collected to the logstore in Log Service, you can log in to the Log Service console to view the logs.

The Function Compute console also integrates with Log Service, allowing you to view logs on the Function Compute console. There are two query options in the console:

Simple Query: Simple query lists logs corresponding to each request ID and allows you to filter logs by request ID.
Advanced Query: Advanced query is integrated with Log Service and allows you to query logs using SQL statements.
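What the simple query does can be pictured as grouping log lines by request ID. The sketch below is only a local illustration and not the console's implementation. It assumes each log line has the form "<request ID> <message>", which is a simplification of the real log format.

```python
from collections import defaultdict


def group_by_request(lines):
    """Group log messages by the request ID at the start of each line."""
    grouped = defaultdict(list)
    for line in lines:
        # Assumed line format: "<request ID> <message>"
        request_id, _, message = line.partition(" ")
        grouped[request_id].append(message)
    return dict(grouped)


def logs_for_request(lines, request_id):
    """Return only the messages that belong to one request."""
    return group_by_request(lines).get(request_id, [])
```

For example, logs_for_request(["req-1 start", "req-2 start", "req-1 done"], "req-1") returns the two messages of req-1 in their original order.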

Click the following link to view a demo: https://developer.aliyun.com/lesson202418996

2. Metrics #

Ways to view metrics:

View monitoring metrics in function details: Function Compute provides various system metrics that can be viewed in the Function Compute console without any configuration.
Configure log dashboard: The log dashboard not only allows you to view monitoring metrics provided by Function Compute but also enables you to associate these metrics with developer logs to generate custom monitoring metrics.
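Metrics are aggregates over many discrete records. As a rough sketch of how core metrics such as TPS and error rate can be derived from per-request records, consider the following. The Invocation record and the summarize function are hypothetical names used for illustration. The real metrics are computed by Function Compute and Log Service, not by code like this.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    """One request record (hypothetical shape, for illustration only)."""
    request_id: str
    ok: bool  # True if the invocation succeeded


def summarize(invocations, window_seconds):
    """Aggregate per-request records into TPS and error rate for one window."""
    total = len(invocations)
    errors = sum(1 for inv in invocations if not inv.ok)
    return {
        "tps": total / window_seconds,
        "error_rate": errors / total if total else 0.0,
    }
```

Four invocations in a two-second window, one of them failed, give a TPS of 2.0 and an error rate of 0.25.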

3. Tracing #

(Waterfall chart showing the delay of requests in various links)

Tracing is an important part of troubleshooting in distributed systems. It lets you analyze how much time a request spends at each hop of a distributed call chain. There are several scenarios:

Function Compute is one hop in the overall call chain, so you can see the time a request spends in Function Compute. This includes the system startup time and the actual execution time of the request, which helps you analyze performance bottlenecks.
If the function code calls the FC SDK, the latency of each SDK API call is shown by default.
If the function code accesses a database or another product, you can manually add tracing points in the function to measure the latency of that part.
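The idea of a manual tracing point can be sketched as a timed block around the code you want to measure. The Trace class below is a hypothetical stand-in written with the standard library only. It is not the tracing SDK used by Function Compute, which would also report spans to a tracing backend.

```python
import time
from contextlib import contextmanager


class Trace:
    """Collects (name, duration) pairs for manually marked code sections."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the sketch can be tested
        self.spans = []

    @contextmanager
    def span(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the wrapped code raises.
            self.spans.append((name, self._clock() - start))

    def slowest(self):
        """Return the recorded span with the longest duration, or None."""
        return max(self.spans, key=lambda s: s[1], default=None)


def handler(event, context, trace):
    with trace.span("query_database"):
        time.sleep(0.01)  # stand-in for a real database call
    with trace.span("build_response"):
        result = "ok"
    return result
```

After a call to handler, trace.spans holds one entry per tracing point, and trace.slowest() points at the section worth optimizing first. A real waterfall chart shows the same information per request, across services.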

Problem Troubleshooting #

Function Compute provides many observability-related features. How do we use them to locate issues in practice? Let’s take a look at a few scenarios.

Scenario 1: After a new version is released, the error rate of the function increases

Whenever you release a new version, watch the function's metrics. If the error rate rises, roll back immediately to limit the impact. Then use the function logs to find the cause of the errors, fix the problem, and deploy again.

Scenario 2: The function performs poorly, with consistently long execution times and even timeouts

Enable tracing and add tracing points around the parts of the function that may be time-consuming. Then view the waterfall chart of a request to find what is causing the long execution time, and fix the problem.

Scenario 3: The business expands rapidly and the concurrency is about to reach the concurrency limit

Use metrics to check the current concurrency. If it keeps rising, promptly contact the Function Compute team to raise the concurrency limit.
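One simple way to judge how urgent a limit increase is: extrapolate the observed concurrency. The sketch below assumes evenly spaced samples and roughly linear growth, which real traffic may not follow. The function name and its inputs are hypothetical, and the samples would come from the concurrency metric.

```python
def intervals_until_limit(samples, limit):
    """Estimate how many sampling intervals remain before concurrency
    reaches the limit, assuming linear growth.

    Returns None when there is too little data or concurrency is not growing.
    """
    if len(samples) < 2:
        return None
    # Average growth per sampling interval across the observed window.
    growth = (samples[-1] - samples[0]) / (len(samples) - 1)
    if growth <= 0:
        return None
    remaining = max(limit - samples[-1], 0)
    return remaining / growth
```

With samples of 100, 120, 140, and 160 and a limit of 300, the estimate is 7 intervals. If one interval is a day, that leaves about a week to get the limit raised.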
