
19 Advanced Practical I: How Powerful Is the Serverless Connector, Exactly? #

Hello, I am Jingyuan.

In the earlier concept-building lesson, I mentioned that the basic formula of Serverless is “FaaS + BaaS”. Thanks to its lightweight and flexible nature, its open input and output, and its integration with a wide range of cloud ecosystem capabilities, FaaS deserves to be called the “connector” of Serverless.

So, just how powerful is this connector? And how does it link up with cloud products to deliver strong Serverless capabilities?

In today’s lesson, I will walk through the connection methods and how they can be combined, and then use an ETL (Extract, Transform, Load) scenario to introduce the different ways the “connector” can be used.

Rediscovering “Connectors” #

We know that no single cloud product can cover every business solution; the ability of cloud services to “interconnect and communicate” with one another is a key factor in moving business to the cloud.

Based on how they are implemented, we can group Function Compute’s connection capabilities into three types: trigger-based connections, access to VPC resources, and access to public network services. Through these connection capabilities, Function Compute can link up with all kinds of cloud products and services to form a complete Serverless ecosystem.

Image

Trigger Connection Capabilities #

Triggers should already be familiar to you. A trigger provides an event-driven computing model and acts as the bridge between an event source service and a function. We covered the execution flow and implementation principles of triggers in detail in Lesson 2. But do you know in which scenarios trigger-based connections can be applied?

Unlike the earlier classification by integration method in the trigger lesson, here we can divide triggers by application scenario and by how tightly they are linked to other services, into three categories: cloud-service-integrated triggers, native triggers, and custom triggers.

First, let’s look at cloud-service-integrated triggers. As the name suggests, these rely on other cloud services to provide the “connection” capability. For example, the familiar Object Storage trigger can be used for audio and video transcoding: when an audio or video file is uploaded, the trigger invokes a function to process it. A message queue trigger can be used for asynchronous message notification: when new data appears in the queue, a function can be invoked to consume it.

In this pattern, either Function Compute integrates the cloud service, or the cloud service integrates Function Compute. The two sides are usually pre-integrated through SDKs/APIs and interact according to protocols agreed between the services (such as event-driven delivery and callbacks).

What about native triggers? These are triggers that Function Compute implements on its own, without relying on other cloud services. For example, a time trigger only needs to be configured with a schedule, and the cloud function can then handle tasks such as scheduled email notifications or scheduled weather forecasts.
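
To make the time trigger scenario concrete, here is a minimal sketch of a Python cloud function bound to a time trigger. It assumes the JSON event format used by Alibaba FC time triggers (triggerTime, triggerName, payload); the send_daily_report helper is a hypothetical placeholder for your scheduled task.

import json
import logging

logger = logging.getLogger()

def handler(event, context):
    # The time trigger delivers a small JSON event; the field names below
    # (triggerTime, triggerName, payload) are assumed from the platform docs.
    evt = json.loads(event)
    logger.info("Fired at %s by trigger %s", evt.get("triggerTime"), evt.get("triggerName"))

    # Put the scheduled work here, e.g. rendering and sending a daily report email.
    # send_daily_report(evt.get("payload"))  # hypothetical helper
    return "done"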

In terms of linkage and usage scenarios, HTTP triggers can also be counted as native triggers, although I tend to see them more as an extended way to access cloud functions: the URL generated by an HTTP trigger can serve as the exposed entry point of a web service.
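
As an illustration, here is a minimal sketch of an HTTP-triggered Python function. It assumes Alibaba FC’s WSGI-style handler signature for HTTP triggers; adapt it if your platform uses a different convention.

def handler(environ, start_response):
    # environ carries the request method, path, headers and body,
    # just like in any WSGI application.
    path = environ.get('PATH_INFO', '/')
    body = ('hello from ' + path).encode('utf-8')

    # Return a plain-text response through the URL exposed by the HTTP trigger.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [body]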

Lastly, to give users more possibilities, most cloud vendors also provide custom triggers, which let you feed in events from any service, for example connecting self-built storage or services to your functions. This greatly enriches the usage scenarios of Function Compute.

Cross-VPC Connection Capabilities #

Another manifestation of Function Compute’s connection capabilities is cross-VPC access. By establishing connections through VPC, functions can access users’ private network resources in a more secure manner.

For example, by enabling cross-VPC access, functions can access cloud databases and message queues, thereby implementing more complex background processing logic. In addition, by establishing cross-VPC connections in advance, distributed file systems can be directly mounted on function instances, enabling cloud functions to handle large files.
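
As a sketch of the database case: assuming the function’s VPC configuration matches the database’s VPC, a handler could reach the database through its private endpoint roughly as below. The environment variable names and the pymysql dependency are assumptions; package the driver with your function code or swap in your own client library.

import os
import pymysql  # assumed to be packaged with the function code

def handler(event, context):
    # With the function and the database in the same VPC, the private
    # (intranet) endpoint in DB_HOST becomes reachable from the function.
    conn = pymysql.connect(
        host=os.getenv("DB_HOST"),
        user=os.getenv("DB_USER"),
        password=os.getenv("DB_PASSWORD"),
        database=os.getenv("DB_NAME"),
        connect_timeout=5,
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
            return cur.fetchone()
    finally:
        conn.close()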

As for why VPC and cross-VPC access matter, and how to optimize function access speed in cross-VPC scenarios, you can review Lesson 10 to refresh your memory.

Public Network Access Capabilities #

Some cloud services expose public network interfaces, and cloud functions can access them without any special configuration. For security reasons, however, authentication and identity verification still need to be set up.

Object Storage, which we have mentioned most often in previous lessons, can be accessed directly without configuring a VPC. Likewise, some artificial intelligence services, such as natural language processing and image recognition, can be invoked by calling their public network interfaces directly.
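
For example, a function could call such a public interface roughly as in the sketch below. The endpoint, environment variables, and token-style authentication are hypothetical placeholders; real services define their own authentication schemes (AK/SK signing, tokens, and so on), and the requests library is assumed to be packaged with the function.

import os
import requests  # assumed to be packaged with the function code

def handler(event, context):
    # Call a publicly exposed NLP-style service over HTTPS.
    resp = requests.post(
        os.getenv("NLP_API_URL"),  # hypothetical public endpoint
        headers={"Authorization": "Bearer " + os.getenv("NLP_API_TOKEN", "")},
        json={"text": "a sentence to analyze"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()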

It is worth noting that VPC access and public network access overlap: which one you use depends on which access methods the service itself supports. Alibaba Cloud’s Message Queue for Apache Kafka, for example, supports both public network and VPC access.

Combination of Connectors #

Whether event-driven or actively requested, these connection capabilities are not isolated from one another. In practice, a complete solution is usually built by combining them in different ways.

Image

Let’s give two examples.

Do you still remember “video transcoding through a workflow” from the Lesson 16 hands-on exercise? That problem is solved by combining trigger and public network access capabilities to tie together the workflow service, object storage, and cloud functions. When one or more videos are uploaded to object storage, the trigger fires and the function calls the workflow to process them in batches; the workflow then categorizes and transcodes the videos, and finally the object storage interface is called synchronously to write back the transcoded files.

Image

The web applications and mini-program backends we build day to day can also be deployed on cloud functions. With a VPC connection, the functions can reach cloud databases, and combined with HTTP triggers or an API gateway, users can access them easily through the exposed URLs.

By now I believe you have a sense of what the connector can do. Next, I will take you through a practical ETL scenario so you can truly feel its power.

ETL Practice #

In many edge computing scenarios, log information cannot be reported in real time, so logs are usually uploaded as files at regular intervals for later processing and analysis.

Take vehicle logs, a very common example: a large number of in-car terminals continuously generate driving logs. Analyzing this driving data in real time is difficult, so a certain amount is usually collected and reported as a single file. This not only improves transmission efficiency but also keeps the whole log file around for later tracing.

Common Combinations of Connectors #

For this kind of processing, my customers often use an object storage trigger as the input to Function Compute and a message queue as its output. The diagram below illustrates the complete flow. Next, I will use the relevant Alibaba Cloud products to walk through the experiment.

Image

First, to receive uploaded log files, we need to create a Bucket in Object Storage Service (OSS) in advance. At the same time, we also need to create an instance on the Message Queue (Kafka) service. For details, you can refer to this Quick Start document.

Next, in Function Compute (FC), I create a cloud function that processes log information. Do you remember the function template we used before? Here, we can also choose the Python 3.6 runtime environment and select the template “Deliver Messages to Kafka” to create the function. As for the part that interacts with OSS, you can refer to the OSS Operation Template.

Image

Next, we need to configure the Kafka access address and topic for the function. Since the template already reads them from environment variables, we can set them directly in the function’s environment variable configuration.

Image

The Kafka service provides two access methods: public network and VPC. Public network access uses SSL and requires downloading certificates, which is more cumbersome, so I recommend VPC access for a quick experiment.

Image

Note that VPC access requires additional configuration of the VPC under the service where the function is located, so that the function service and the Kafka instance are in the same VPC. For specific configuration, you can refer to the following image:

Image

After creating the function, you also need to bind an OSS trigger and associate it with the bucket created just now. You can set the prefix and suffix according to your own business needs. You have experienced the configuration of triggers before, so I won’t repeat it here.

Finally, let’s take a look at the main code processing logic based on our design idea:

from confluent_kafka import Producer
import logging
import sys
import os
import oss2, json

logger = logging.getLogger()

def initialize(context):
    global p, TOPIC_NAME

    """ Get the environment variables """
    BOOTSTRAP_SERVERS = os.getenv("BOOTSTRAP_SERVERS")
    TOPIC_NAME = os.getenv("TOPIC_NAME")

    p = Producer({'bootstrap.servers': BOOTSTRAP_SERVERS})


def deliveryReport(err, msg):
    """ Called once for each message produced to indicate delivery result.
        Triggered by poll() or flush(). """
    if err is not None:
        logger.error('Message delivery failed: {}'.format(err))
        raise Exception('Message delivery failed: {}'.format(err))
    else:
        logger.info('Message delivered to {} [{}]'.format(msg.topic(), msg.partition()))


def handler(event, context):

    creds = context.credentials

    # Setup auth, required by OSS sdk
    auth=oss2.StsAuth(
        creds.access_key_id,
        creds.access_key_secret,
        creds.security_token)

    # Load event content
    oss_raw_data = json.loads(event)
    # Get oss event related parameters passed by oss trigger
    oss_info_map = oss_raw_data['events'][0]['oss']
    # Get oss bucket name
    bucket_name = oss_info_map['bucket']['name']
    # Set oss service endpoint
    endpoint = 'oss-' +  oss_raw_data['events'][0]['region'] + '-internal.aliyuncs.com'
    # Initiate oss client
    bucket = oss2.Bucket(auth, endpoint, bucket_name)
    # Get the uploaded object key from the event
    object_name = oss_info_map['object']['key']

    # Download the log file for local processing
    bucket.get_object_to_file(object_name, "/tmp/tmp.log")

    # Read line by line
    f = open("/tmp/tmp.log", 'rb')
    line = f.readline()
    while line:
        content = json.loads(line)
        info = {}

        # Extract key information
        info['speed'] = content['speed']
        info['carID'] = content['carID']

        info_str = json.dumps(info)

        # Send to Kafka
        p.produce(TOPIC_NAME, info_str.encode('utf-8'), callback=deliveryReport)
        p.poll(0)

        line = f.readline()
    f.close()

    """ Flush the internal queue, wait for message deliveries before returning """
    p.flush()

    return {}

This code has three main points: resource initialization, data input, and result output.

First, the resource initialization. The initialize function performs global initialization, mainly setting up the Kafka producer from the configured connection information. This is also why I recommend starting from the template: it already bundles the Kafka dependency libraries.

Next, how do we get the data, process it, and output the result? Look at the handler logic: since the uploaded log file has to be downloaded before it can be processed, the first step is to initialize an OSS client.

Then, using the created client and the object information in the event, we call the client’s interface to download the log file to local disk (the function’s /tmp directory). Once we have the log file, the processing is simple: iterate through the file contents, extract the key information, and push it to Kafka. In my code, I extract speed and carID and call p.produce to send them to Kafka for downstream services to consume.

You might also notice the callback parameter of p.produce: deliveryReport is a callback that runs after a message is delivered to Kafka and logs a message based on the delivery result.

After that, we can test whether the configuration and code above run correctly. I have prepared a simulated log file containing some basic speed and trip information, named it driving.log, and uploaded it to OSS.

{"speed":12,"carID":1234,"manufacturer":"BenZ","zone":"China","distance":102,"endAt":"Beijing","startAt":"Shanghai"}
{"speed":24,"carID":3512,"manufacturer":"BenZ","zone":"China","distance":102,"endAt":"Beijing","startAt":"Shanghai"}

After the upload completes, we can check the function’s execution status and see that it has indeed been triggered.

Image

Looking at the Kafka monitoring, we can also see that data is being written.

Image

With the steps above, you have implemented the basic log processing functionality. If you feel it is not polished enough, you can further improve the code structure and robustness before using it in production.

Extensibility of Connectors #

So, besides optimizing our code, does Function Compute itself have some extension capabilities to make the system more stable and robust?

We know that OSS triggers invoke functions asynchronously, and asynchronous invocations can be protected against failures by configuring an asynchronous invocation policy, which helps ensure high data reliability.

In other words, we can set a timeout for the function and, in the asynchronous policy, configure actions based on the execution result. For example, the policy I configured calls a downstream notification function when execution fails, to report the task failure; you can configure it to suit your own needs.

Image
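
As a sketch of what that downstream notification function might look like: the payload delivered to a failure destination describes the original invocation, but the exact field names vary by platform, so the parsing below is illustrative and the send_alert helper is hypothetical.

import json
import logging

logger = logging.getLogger()

def handler(event, context):
    # The failure notification arrives as a JSON payload describing the
    # original invocation; treat the structure as illustrative and check
    # the schema documented for your platform.
    detail = json.loads(event)
    logger.error("ETL task failed, raw notification: %s", json.dumps(detail))

    # Notify the on-call channel here, e.g. via email or an IM webhook.
    # send_alert(detail)  # hypothetical helper
    return {}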

Advanced ETL #

Through the scenario above, we have seen how Function Compute connects to other cloud services to solve log analysis problems. In this basic version of the ETL implementation, once a log file is uploaded, the OSS trigger invokes the function, which extracts and transforms the key log information and sends it to the cloud message queue service; on top of that, the asynchronous policy provides an additional retry guarantee.

Sometimes, however, we would rather not use object storage as a staging area and occupy extra resources just for log ingestion. Is there a way to take in logs directly without going through object storage? Yes: with Function Compute as Alibaba Cloud’s “connector”, we can also use the Log Service (SLS) trigger.

Image

The SLS trigger fetches incremental log data at regular intervals, instead of firing the ETL task every time a log file is uploaded.

Image

After you configure the trigger interval, the SLS trigger invokes the function on schedule and passes cursors pointing to the changed log data, so the function can fetch just the incremental data.
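
A minimal handler for this pattern might look like the sketch below. It assumes the documented SLS trigger payload (a source object with endpoint, projectName, logstoreName, shardId, beginCursor, and endCursor) and the aliyun-log-python-sdk packaged with the function; verify the field names and SDK calls against the current documentation before relying on them.

import json
import logging

from aliyun.log import LogClient  # assumed to be packaged with the function

logger = logging.getLogger()

def handler(event, context):
    # The SLS trigger passes the log store coordinates plus a cursor range
    # pointing at the incremental data; field names are assumed from the docs.
    source = json.loads(event)['source']
    creds = context.credentials

    client = LogClient(source['endpoint'],
                       creds.access_key_id,
                       creds.access_key_secret,
                       creds.security_token)

    # Pull only the log groups between the two cursors.
    resp = client.pull_logs(source['projectName'], source['logstoreName'],
                            source['shardId'], source['beginCursor'],
                            count=100, end_cursor=source['endCursor'])
    logger.info("pulled %d log groups", resp.get_loggroup_count())

    # Transform the records and forward them, e.g. produce them to Kafka
    # just like the OSS-based version above.
    return {}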

Compared with the OSS trigger, the SLS trigger supports a richer set of ingestion options. You can collect logs directly from Alibaba Cloud products, collect them through open-source software such as Logstash and Fluentd, or ingest data through standard protocols such as HTTP/HTTPS and Prometheus.

If your processing scenario is not latency-sensitive, then from a cost perspective, uploading small files and using the OSS trigger is also a good choice. Weigh the actual needs of your business and choose the solution that fits best.

Summary #

In conclusion, in this lesson I introduced to you how Function as a Service (FaaS) plays the role of a Serverless “connector”.

Building on its three major connection capabilities, FaaS forms a new “FaaS + cloud services” Serverless ecosystem through SDKs/APIs, event-driven mechanisms, and other methods. As you saw in the ETL practice, by combining an object storage trigger with a message queue, we handled the entire development and deployment of the log analysis flow in less than five minutes, a process that could otherwise take hours.

The trigger connection capability acts like a powerful magnet: by being integrated into cloud services, or integrating them, FaaS gathers all kinds of “guests” in the cloud, such as message services, file storage, API gateways, and object storage, as shown in the diagrams above. After computation, the data and other resources these “guests” bring can be delivered onward through the other two capabilities, cross-VPC access and public network access.

This fast, efficient realization of “input + computation + output” is exactly the “cost reduction and efficiency improvement” we are after. The power of this adhesive lies in letting a physical combination of cloud services produce a chemical reaction, and the “connector” role of FaaS once again proves its value as the “glue” of the cloud.

Homework #

Alright, this class has come to an end, and I have left you with a homework assignment.

In this lesson, I walked you through an ETL scenario to experience the essence of the Serverless “connector”. Now, how would you implement sending alert emails based on key metrics extracted from the logs?

Please feel free to write down your thoughts and answers in the comments section. Let’s exchange and discuss together.

Thank you for reading, and feel free to share this class with more friends to read together.