20 Advanced Practical II How to develop a BOT skill based on a smart speaker #

Hello, I am Jingyuan.

In the last class, we experienced the power of FaaS as a serverless “connector,” but it was limited to the experience on a single cloud platform.

Today, I will take you through cross-platform development, using the combination of smart speakers, IoT development platforms, and cloud function computing platforms to develop a BOT skill that recommends baby food recipes for different age groups.

You can also develop conversational applications such as smart customer service and intelligent Q&A using this approach. Through this practical training, I believe you will further experience the infinite possibilities of Serverless.

Implementation #

Our goal this time is to recommend baby recipes through voice conversations. This functionality fits the characteristics of Serverless workloads: event-driven, lightweight, invoked on demand, and subject to unpredictable traffic. It also requires speech-to-text capability and a physical device to run on.

Therefore, we will use the Baidu Intelligent Cloud function computing platform (CFC) and the Baidu DuerOS Open Platform for this case, with a Xiaodu X8 as the device. You can also use the simulation testing tool on the DuerOS platform to achieve the same effect.

As the saying goes, sharpening the axe does not delay the cutting of firewood. Before starting the hands-on work, let’s first go over the concepts involved today and the overall implementation approach.

What is a Skill (BOT)? #

We usually call the conversational services developed on the DuerOS platform “skills” or “BOTs”. There are generally two types of skills on the platform: built-in skills and skills submitted by developers. The former can be used directly, while the latter can be free or paid. Paid skills must be purchased before use, just like the ones you see on the Xiaodu speaker.

A skill usually consists of four important parts, all of which will be involved in our hands-on practice.

Intent: The user request or purpose that the skill needs to fulfill. For example, the intent in this case is to recommend recipes for babies of different ages. An intent is composed of “common expressions” and “slot information”.
Dictionary: A collection of domain-specific vocabulary and other important information in the interaction between users and skills. For example, the keyword “age” that needs to be recognized in this experiment is a dictionary entry, and it corresponds to the identifier “number”. For the convenience of developers, DuerOS provides many built-in dictionaries; here, the one for “number” is “sys.number”.
User Expressions: Specific examples of what users say when expressing an intent. They are the key components of an intent: the more examples there are, the stronger the intent recognition. Slot information is the key information extracted from a user expression, and each slot is matched against the entries of a dictionary. For example, “What does a 3-year-old baby eat?” is a user expression, the extracted slot identifier is “number”, the value is 3, and the corresponding dictionary is “sys.number”.
Configuration Service: After the skill is created, it needs to be deployed to a cloud service. Here we only need to configure the BRN, the identifier of the entry function on the function computing platform. The BRN is shown in the function's basic information on that platform. For example, “brn:bce:cfc:bj:*******:function:babaycook:$LATEST”.
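
To make the relationship between intents, user expressions, slots, and dictionaries concrete, here is a minimal sketch in plain JavaScript. It is only an illustration: the real recognition is done by the DuerOS platform, not by your code. The object layout, the `{number}` placeholder syntax, and the helper names (`compileExpression`, `matchIntent`) are all assumptions made up for this example, and the toy “sys.number” dictionary matches digits only.

```javascript
// Illustrative only: DuerOS performs intent and slot recognition on the platform.
// This toy matcher just makes the vocabulary concrete.

// Hypothetical local model of the intent used in this lesson.
const babyCookBooksIntent = {
    name: 'babyCookBooks',
    // Slot identifier -> dictionary it is bound to.
    slots: { number: 'sys.number' },
    // Sample user expressions; {number} marks where the slot value appears.
    expressions: [
        'What does a {number}-year-old baby eat?',
        'What to feed a {number}-year-old baby'
    ]
};

// Toy stand-in for the built-in "sys.number" dictionary: digits only.
const dictionaries = { 'sys.number': '\\d+' };

// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Turn one expression into a RegExp with a named capture group per slot.
function compileExpression(expression, intent) {
    // Splitting on a capturing group keeps the slot names:
    // even indexes are literal text, odd indexes are slot identifiers.
    const parts = expression.split(/\{(\w+)\}/);
    const pattern = parts
        .map((part, i) => (i % 2 === 0
            ? escapeRegExp(part)
            : `(?<${part}>${dictionaries[intent.slots[part]]})`))
        .join('');
    return new RegExp(`^${pattern}$`, 'i');
}

// Return the intent name and extracted slots, or null if nothing matches.
function matchIntent(utterance, intent) {
    for (const expression of intent.expressions) {
        const match = compileExpression(expression, intent).exec(utterance.trim());
        if (match) {
            return { intent: intent.name, slots: { ...match.groups } };
        }
    }
    return null;
}

console.log(matchIntent('What does a 3-year-old baby eat?', babyCookBooksIntent));
```

The sketch also shows why more user expressions help: an utterance that matches none of the listed expressions yields no intent at all.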

Implementation Ideas #

Next, let’s take a look at the diagram below to get a sense of the interaction process between the “Baby Recipe” skill and the device and platform.

As shown in the figure, when the Xiaodu speaker receives voice input, such as “Xiaodu Xiaodu, what can babies eat?”, it sends a request to the DuerOS platform through a predefined protocol. The DuerOS platform then invokes the CFC function through a DuerOS trigger. The processing logic you have written produces a response in the format the protocol requires and returns it to the DuerOS platform, and the Xiaodu speaker plays the response as speech.
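
The request flow described above can be sketched without any SDK. The following is a simplified illustration, not the actual DuerOS protocol: the request types and field names (`request.type`, `intents`, `slots`, `outputSpeech`, `shouldEndSession`) are assumptions modeled loosely on it and should be checked against the DuerOS documentation. In the real function, `bot-sdk` does this parsing and formatting for you.

```javascript
// Simplified, assumed request/response shapes; the real protocol carries more fields.

// Build a response in an assumed DuerOS-like envelope.
function buildResponse(text, shouldEndSession) {
    return {
        version: '2.0',
        response: {
            outputSpeech: { type: 'PlainText', text },
            shouldEndSession
        }
    };
}

// Route an incoming event to the matching piece of logic.
function route(event) {
    const request = (event && event.request) || {};

    if (request.type === 'LaunchRequest') {
        return buildResponse('Welcome to Baby Recipe Recommendation!', false);
    }
    if (request.type === 'SessionEndedRequest') {
        return buildResponse('Thank you for using Baby Recipe Recommendation!', true);
    }
    if (request.type === 'IntentRequest') {
        const intent = (request.intents || [])[0] || {};
        if (intent.name === 'babyCookBooks') {
            const slot = (intent.slots || {}).number;
            if (!slot || !slot.value) {
                // The age slot is missing, so ask the user for it.
                return buildResponse('How old is the baby?', false);
            }
            // Placeholder answer; the real recipe logic comes later in the lesson.
            return buildResponse(`Looking up recipes for age ${slot.value}.`, false);
        }
    }
    return buildResponse('Sorry, I did not understand that.', false);
}

// CFC-style entry point: (event, context, callback), as in the lesson's template.
function handler(event, context, callback) {
    try {
        callback(null, route(event));
    } catch (e) {
        callback(e);
    }
}

// Simulate what the DuerOS trigger would do with a recognized utterance.
handler(
    { request: { type: 'IntentRequest', intents: [{ name: 'babyCookBooks', slots: { number: { value: '3' } } }] } },
    {},
    (error, result) => console.log(error || result.response.outputSpeech.text)
);
```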

The DuerOS trigger used here is a trigger type unique to Baidu CFC platform, used for integration with DuerOS. Of course, you can also customize any trigger.

If there is any storage involved, you can choose to use cloud-based storage resources, such as Redis, RDS, etc. These have been covered in previous courses, and I believe you are already familiar with them.

Next, let’s review the implementation ideas together. I have divided the whole process into three major steps.

Step 1, Function Computing Creation and Customization: You need to select a suitable template in the CFC management console to quickly create an initial DuerOS function, complete the development and configuration of the function, and finally set the triggers and configure log storage.

Step 2, Skill Creation and Binding: Choose a skill template on the DuerOS platform, set the corresponding metadata, and set up intents, dictionaries, and dialogue scenarios according to your needs. When binding the CFC function, copy the BRN of the entry function to the DuerOS platform.

Step 3, Runtime Request Routing: In fact, you don’t need to do anything in this step, just experience it. But if you want to publish it on smart speakers, you also need to publish it in the console. After the publication, you can experience it through the smart speaker.

You can also review the entire process based on the architecture diagram below.

Hands-on Practice #

Now, let’s start developing the BABY_RECIPE recommendation BOT skill based on the ideas mentioned above.

Cloud Function Creation and Definition #

First, we create a function on the Function Compute platform (CFC). For this example, to familiarize yourself with DuerOS processing, you can choose a template to generate the function. For instance, I chose “dueros-bot-tax” for demonstration purposes.

Next, enter the relevant function metadata. In this case, the function name is “baby_recipe”.

Let’s not worry about the template code for now and continue creating the trigger. Here, you need to select the DuerOS trigger to connect to the Xiaodu speaker’s requests.

Great! With this, the function that handles requests from Xiaodu has been created. Now, let’s write our own business logic based on the template.

One thing to note: in the generated template code, there is a PUBLIC KEY at the beginning of the file that needs to be copied to the DuerOS platform. We will get to that later; for now, let’s stay focused on the code.

/**
 * @file   index.js This file is the entry file of the function, used to receive function requests for invocation. 
 * @author dueros
 */
const Bot = require('bot-sdk');
const privateKey = require('./rsaKeys.js').privateKey;

class InquiryBot extends Bot {
    constructor(postData) {
        super(postData);

        this.addLaunchHandler(() => {
            this.waitAnswer();
            return {
                outputSpeech: 'Welcome to Baby Recipe Recommendation!'
            };
        });

        this.addSessionEndedHandler(() => {
            this.endSession();
            return {
                outputSpeech: 'Thank you for using Baby Recipe Recommendation!'
            };
        });

        this.addIntentHandler('babyCookBooks', () => {
            // The slot identifier "number" carries the baby's age.
            const ageVar = this.getSlot('number');
            if (!ageVar) {
                // The slot is missing: ask the user to supply it.
                this.nlu.ask('number');
                const card = new Bot.Card.TextCard('How old is the baby?');
                // Can return an asynchronous Promise
                return Promise.resolve({
                    card: card,
                    outputSpeech: 'How old is the baby?'
                });
            }

            if (this.request.isDialogStateCompleted()) {
                const cook_1 = 'Breast milk is the main source of nutrition, and it can be supplemented with formula milk. It is recommended to add minced shrimp or vegetable puree, etc.';
                const cook_3 = 'You can eat protein-rich foods such as eggs and lean meat. You can also eat vitamin-rich foods, such as celery and spinach.';

                // The slot value may arrive as a string, so convert it before comparing.
                const speechOutput = Number(ageVar) > 1 ? cook_3 : cook_1;
                const card = new Bot.Card.TextCard(speechOutput);

                return {
                    card: card,
                    outputSpeech: speechOutput
                };
            }
        });

    }
}
 
exports.handler = function (event, context, callback) {
    try {
        let b = new InquiryBot(event);
        // 0: debug  1: online
        b.botMonitor.setEnvironmentInfo(privateKey, 0);
        b.botMonitor.setMonitorEnabled(true);
        b.run().then(function (result) {
            callback(null, result);
        }).catch(callback);
    }
    catch (e) {
        callback(e);
    }
};

In the code above, we primarily focus on the entry point for intent processing, this.addIntentHandler('babyCookBooks'). The string “babyCookBooks” is the intent name that needs to be filled in when creating the skill on the DuerOS Skill Platform. You can change it, as long as it matches the intent name you define later.

In the body of the handler, to keep things easy to follow, I used a simple age check to choose the recommended recipe. You can add more features on top of this. By moving the more complex logic into a separate module and calling it from the handler, you can build a more complete skill.
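
As one way to separate the recommendation logic from the handler, the sketch below moves it into a small lookup function. The names `RECIPES_BY_AGE` and `recommendRecipe` are hypothetical, and the two age brackets simply reproduce the `ageVar > 1` check and the two messages from the code above; you would extend the table with your own content.

```javascript
// Hypothetical lookup table: each entry covers ages up to and including maxAge.
// The two entries mirror the "ageVar > 1" check in the handler above.
const RECIPES_BY_AGE = [
    {
        maxAge: 1,
        advice: 'Breast milk is the main source of nutrition, and it can be supplemented with formula milk. It is recommended to add minced shrimp or vegetable puree, etc.'
    },
    {
        maxAge: Infinity,
        advice: 'You can eat protein-rich foods such as eggs and lean meat. You can also eat vitamin-rich foods, such as celery and spinach.'
    }
];

// Return the advice for the given age, or null if the age is not usable.
function recommendRecipe(rawAge) {
    if (rawAge === null || rawAge === undefined || String(rawAge).trim() === '') {
        return null;
    }
    const age = Number(rawAge);
    if (!Number.isFinite(age) || age < 0) {
        return null;
    }
    // The last bracket has maxAge Infinity, so a match always exists here.
    return RECIPES_BY_AGE.find(bracket => age <= bracket.maxAge).advice;
}

console.log(recommendRecipe('3'));
```

Inside the intent handler, the age check could then shrink to a single call such as `recommendRecipe(ageVar)`, with a follow-up question when it returns null.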

Skill Creation and Binding #

Next, we create the skill. Go to the DuerOS Skill Open Platform, click on “Authorization,” and then click on “Create New Skill.”

Select “Custom Skill,” then “Start from Scratch.”

Scroll down on the page, enter “Baby Recipe Recommendations” for the “Skill Name” and “Invocation Name” sections. Select both the screen-based and screenless scenarios, and click on “Confirm.”

Next, continue to create the intent:

Here, we need to pay attention to two points.

First, the intent recognition name needs to be consistent with the name set in addIntentHandler in the Function Compute (CFC) code.

Second, expressions and slots are crucial. Different users phrase the same request differently, so the common expressions should cover as many colloquial variants as possible to improve intent recognition accuracy. DuerOS extracts slot information from the expressions automatically, and you can check and correct the result manually.

Slots are the parameters of an intent and are a critical part of it. In the code, we read these parameters by their slot identifiers. If you are interested, you can learn more about how to use skills on the DuerOS development platform.

Next, we associate the skill with the backend service. We choose Baidu CFC as the deployment service, then copy the PUBLIC KEY and the function's unique identifier, the BRN, into the “Public Key” and “BRN” fields on the “Configure Service” page, respectively.
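
Since a mistyped BRN is an easy way to break the binding, a quick sanity check before pasting can help. The sketch below splits a BRN into its colon-separated parts. The field names (`partition`, `service`, `region`, `accountId`, `functionName`, `version`) are my own reading of the example BRN shown earlier in this lesson, not an official specification, and `parseBrn` is a hypothetical helper.

```javascript
// Split a BRN into labeled parts. The field names are assumptions inferred from
// the example "brn:bce:cfc:bj:*******:function:babaycook:$LATEST".
function parseBrn(brn) {
    const parts = String(brn).split(':');
    if (parts.length !== 8 || parts[0] !== 'brn' || parts[5] !== 'function') {
        throw new Error(`Unexpected BRN format: ${brn}`);
    }
    const [, partition, service, region, accountId, , functionName, version] = parts;
    return { partition, service, region, accountId, functionName, version };
}

// The account segment is masked in the lesson's example and is kept masked here.
const parsed = parseBrn('brn:bce:cfc:bj:*******:function:babaycook:$LATEST');
console.log(parsed.functionName, parsed.version);
```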

Testing and Publishing #

By now, we have completed the two key steps of creating and binding the cloud function and creating the skill. Finally, let’s test if the program can run correctly.

We can choose either simulated testing or real device testing. If you don’t have a Xiaodu speaker on hand, you can use simulated testing to verify the skill. Open “Simulated Testing” and select “DuSmart At Home.” Enter “open baby recipe recommendations” to invoke the skill.

You will see that the screen of the smart speaker displays the welcome message from the code: “Welcome to Baby Recipe Recommendation!”

Next, if we enter “What to feed a 1-year-old baby,” the skill service returns the response configured in the code: “Breast milk is the main source of nutrition, and it can be supplemented with formula milk. It is recommended to add minced shrimp or vegetable puree, etc.”

If you have a Xiaodu speaker nearby, you can also choose “Real Device Testing” and enable “Skill Debug Mode.” Note that the account you use to log in to the DuerOS platform must be the same as the account logged in on the speaker.
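
Besides the platform's testing tools, you can also smoke-test a handler on your own machine before deploying it. The sketch below wraps a callback-style handler of the form `(event, context, callback)` in a Promise. The helper `invokeLocally` and the `stubHandler` are hypothetical names for this example; the stub stands in for the real handler so that the sketch runs without `bot-sdk`, and the event it receives is far simpler than a real DuerOS request.

```javascript
// Wrap a callback-style handler (event, context, callback) in a Promise.
function invokeLocally(handler, event, context = {}) {
    return new Promise((resolve, reject) => {
        handler(event, context, (error, result) => {
            if (error) {
                reject(error);
            } else {
                resolve(result);
            }
        });
    });
}

// Hypothetical stand-in for the real handler, so the sketch runs without bot-sdk.
function stubHandler(event, context, callback) {
    try {
        if (!event || !event.request) {
            throw new Error('Missing request');
        }
        callback(null, { outputSpeech: `Received ${event.request.type}` });
    } catch (e) {
        callback(e);
    }
}

invokeLocally(stubHandler, { request: { type: 'LaunchRequest' } })
    .then(result => console.log(result.outputSpeech))
    .catch(error => console.error('Handler failed:', error.message));
```

To exercise the real function, you would replace `stubHandler` with the `handler` exported from index.js and pass an event captured from the platform's logs, assuming log storage was configured as described in Step 1.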

Let’s watch a video to get a feel for the skill implemented through Function Compute:

Finally, after refining and fine-tuning the program, you can publish it to share it with other users.

Conclusion #

In conclusion, in today’s class, we have completed the development experience of a baby recipe BOT skill by linking the smart speaker, DuerOS Skill Open Platform, and Cloud Function Compute (CFC).

Through this experiment, we can see that the development of a serverless and voice interaction-based conversational application mainly involves the following three steps:

Step 1: Building a serverless application that suits the business scenario. You can develop and deploy it either as a packaged function or as an image, such as the baby recipe function service in this case.

Step 2: Building an interactive skill intent and setting up voice interaction rules, such as the baby recipe intent in this case.

Step 3: Binding the skill to the function through a trigger, such as the DuerOS trigger created in this case.

These three steps can be extended further. Thanks to its openness, DuerOS can support a wide range of hardware devices such as smartphones, TVs, smart speakers, cars, and robots. We can also use it to develop various conversational applications, such as smart customer service, home appliance control, and intelligent assistants. Combining DuerOS with serverless function computing further lowers the barrier for industries such as health, finance, travel, and telecommunications to adopt AI conversation systems.

In fact, major cloud vendors have rich experience and mature products for integrating with the serverless ecosystem. If your business is already on the cloud, you can explore which scenarios could make further use of them to reduce your operations and management workload. If your business has not yet migrated to the cloud, I hope you can still apply the ideas of open integration and serverless flexibly in your current work.

Finally, although this class is an experimental one, what I really want to share with you are two thoughts behind the experiment. First, divergent thinking is important: we can consciously connect our needs with technology and put serverless to practical use. Second, amid today’s high-pressure work and life, we programmers can also use our skills to bring joy to our children’s growth, and perhaps even generate some income. In this era of “involution,” can you also feel a different kind of happiness?

Thought Question #

Alright, this lesson is coming to an end, and I have a thought question for you.

Can Serverless technology be used for the backend service in children’s programming? In which areas can function computing be used, and where is Elastic Beanstalk a better fit?

Feel free to write your thoughts and answers in the comments section. Let’s have a discussion and exchange ideas together.

Thank you for reading, and feel free to share this lesson with more friends to read together.