Give Cerb bots the power of speech with Amazon Polly
Introduction
In this guide you’ll learn how to generate audio streams of speech from arbitrary text using bots in Cerb.
First, we’ll create a new connected account to securely integrate with Amazon Polly – a fast, inexpensive, and lifelike text-to-speech service from Amazon Web Services.
Next, we’ll create a delegate named Polly Bot to secure your credentials and provide text-to-speech as a simple service for any other bot in Cerb. Your team’s bots will be able to give Polly Bot some text and a preferred voice (gender/accent), and receive back a secure time-limited URL that you can share and play anywhere.
Finally, we’ll demonstrate how to use Polly Bot as a delegate from a conversational bot to respond to workers with speech directly in their web browser.
- Configure the Amazon Web Services service in Cerb
- Log in to Amazon Web Services
- Create Polly Bot in Cerb
- Using Polly Bot from a conversational bot
- References
Configure the Amazon Web Services service in Cerb
- Log in to Cerb as an administrator.
- Navigate to Search » Connected Accounts.
- If you don’t have a connected account for Amazon Web Services yet, you can follow these instructions to create one.
Log in to Amazon Web Services
First, we need to create a new user in your Amazon Web Services (AWS) account and attach a policy that describes the services that the user is allowed to use. This is accomplished with the Identity and Access Management (IAM) service. This user will receive credentials that you can use to interact with AWS from Cerb.
Log in to the AWS Management Console and navigate to the IAM service.
If you don’t have an AWS account, you can sign up for free at: https://aws.amazon.com
Update your IAM policy
Navigate to IAM from the Services menu at the top of the page.
We need to update the policy in your Amazon Web Services (AWS) account to describe the services that your Cerb bot is allowed to use. This is accomplished with the Identity and Access Management (IAM) service.
We’re going to update the IAM policy to provide:
- Read-only access to Amazon Polly
Select Policies in the navigation on the left.
Find your bot’s policy in the list or create a new one. In the earlier instructions we created a policy named CerbBot.
Click the Edit Policy button.
Select the JSON tab.
Add the following block to the Statement list:
{
  "Effect": "Allow",
  "Action": [
    "polly:DescribeVoices",
    "polly:GetLexicon",
    "polly:ListLexicons",
    "polly:SynthesizeSpeech"
  ],
  "Resource": [
    "*"
  ]
}
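If you keep your policy document in a file rather than editing it in the console, the same change can be sketched in Python. This is only an illustration of what "add the block to the Statement list" means: the function name add_polly_statement and the existing S3 statement in the usage example are hypothetical, and the sketch assumes the policy is already loaded as a dictionary.

```python
import copy
import json

# The Polly statement from this guide, as a Python dictionary.
POLLY_STATEMENT = {
    "Effect": "Allow",
    "Action": [
        "polly:DescribeVoices",
        "polly:GetLexicon",
        "polly:ListLexicons",
        "polly:SynthesizeSpeech",
    ],
    "Resource": ["*"],
}


def add_polly_statement(policy):
    """Return a copy of an IAM policy document with the Polly statement appended."""
    updated = copy.deepcopy(policy)
    statements = updated.setdefault("Statement", [])
    # A policy may hold a single statement object instead of a list.
    if isinstance(statements, dict):
        statements = updated["Statement"] = [statements]
    # Avoid adding the same statement twice.
    if POLLY_STATEMENT not in statements:
        statements.append(copy.deepcopy(POLLY_STATEMENT))
    return updated


if __name__ == "__main__":
    # Hypothetical existing policy; yours will contain different statements.
    existing = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": ["*"]}
        ],
    }
    print(json.dumps(add_polly_statement(existing), indent=2))
```

The printed JSON can be pasted into the JSON tab of the policy editor.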
Click the blue Review policy button in the bottom right.
Click the blue Save changes button in the bottom right.
Create Polly Bot in Cerb
Now we’re ready to create the bot that interacts with AWS using our connected account.
Navigate to Setup » Packages » Import.
Copy and paste the following package into the large text box:
{
"package": {
"name": "Polly Bot",
"revision": 1,
"requires": {
"cerb_version": "9.1.0"
},
"configure": {
"placeholders": [
],
"prompts": [
{
"type": "chooser",
"label": "AWS Account:",
"key": "prompt_aws_account_id",
"params": {
"context": "cerberusweb.contexts.connected_account",
"query": "aws OR amazon",
"single": true
}
},
{
"type": "text",
"label": "AWS Region:",
"key": "prompt_aws_region",
"params": {
"default": "us-west-2"
}
}
]
}
},
"bots": [
{
"uid": "bot_polly",
"name": "Polly Bot",
"owner": {
"context": "cerberusweb.contexts.app",
"id": 0
},
"is_disabled": false,
"params": {
"events": {
"mode": "all",
"items": [
]
},
"actions": {
"mode": "all",
"items": [
]
},
"interactions": [
]
},
"image": "",
"behaviors": [
{
"uid": "behavior_66",
"title": "Get interactions",
"is_disabled": false,
"is_private": false,
"priority": 50,
"event": {
"key": "event.interactions.get.worker",
"label": "Conversation get interactions for worker",
"params": {
"listen_points": "global"
}
},
"nodes": [
{
"type": "action",
"title": "Return",
"status": "live",
"params": {
"actions": [
{
"action": "return_interaction",
"behavior_id": "{{{uid.behavior_68}}}",
"name": "Say something",
"interaction": "say",
"interaction_params_json": "{}"
}
]
}
}
]
},
{
"uid": "behavior_67",
"title": "Get presigned speech URL",
"is_disabled": false,
"is_private": false,
"priority": 1,
"event": {
"key": "event.macro.bot",
"label": "Custom behavior on bot"
},
"variables": {
"var_text": {
"key": "var_text",
"label": "Text",
"type": "S",
"is_private": "0",
"params": {
"widget": "single"
}
},
"var_voice": {
"key": "var_voice",
"label": "Voice",
"type": "D",
"is_private": "0",
"params": {
"options": "Brian\r\nGeraint\r\nJoanna"
}
}
},
"nodes": [
{
"type": "action",
"title": "Get presigned URL",
"status": "live",
"params": {
"actions": [
{
"action": "wgm.aws.bot.action.get_presigned_url",
"http_verb": "get",
"http_url": "{% set query = {\t\"OutputFormat\": \"mp3\",\t\"VoiceId\": var_voice,\t\"TextType\": \"text\",\t\"Text\": var_text} %}https://polly.{{{prompt_aws_region}}}.amazonaws.com/v1/speech?{{query|url_encode}}",
"http_headers": "",
"http_body": "",
"expires_secs": "60",
"auth_connected_account_id": "{{{prompt_aws_account_id}}}",
"response_placeholder": "polly_speech_url"
}
]
}
}
]
},
{
"uid": "behavior_68",
"title": "Handle interactions",
"is_disabled": false,
"is_private": false,
"priority": 50,
"event": {
"key": "event.interaction.chat.worker",
"label": "Conversation handle interaction with worker"
},
"nodes": [
{
"type": "action",
"title": "Set bot name",
"status": "live",
"params": {
"actions": [
{
"action": "set_bot_name",
"name": "Polly"
}
]
}
},
{
"type": "action",
"title": "Run behavior",
"status": "live",
"params": {
"actions": [
{
"action": "switch_behavior",
"return": "0",
"behavior_id": "{{{uid.behavior_69}}}",
"var": "_behavior"
}
]
}
}
]
},
{
"uid": "behavior_69",
"title": "Repeat after me",
"is_disabled": false,
"is_private": true,
"priority": 50,
"event": {
"key": "event.message.chat.worker",
"label": "Conversation with worker"
},
"nodes": [
{
"type": "action",
"title": "What would you like me to say?",
"status": "live",
"params": {
"actions": [
{
"action": "send_message",
"message": "What would you like me to say?",
"format": "",
"delay_ms": "1000"
},
{
"action": "prompt_text",
"placeholder": ""
}
]
}
},
{
"type": "action",
"title": "Get pre-signed URL",
"status": "live",
"params": {
"actions": [
{
"action": "_run_behavior",
"on": "_trigger_va_id",
"behavior_id": "{{{uid.behavior_67}}}",
"var_text": "{{message}}",
"var_voice": "Brian",
"run_in_simulator": "0",
"var": "_behavior"
}
]
}
},
{
"type": "action",
"title": "Speak",
"status": "live",
"params": {
"actions": [
{
"action": "send_script",
"script": "<script>\r\nDevblocks.playAudioUrl('{{_behavior.polly_speech_url}}');\r\n</script>"
}
]
}
},
{
"type": "action",
"title": "Again?",
"status": "live",
"params": {
"actions": [
{
"action": "send_message",
"message": "Say something else?",
"format": "",
"delay_ms": "2000"
},
{
"action": "prompt_buttons",
"options": "yes\r\nno",
"color_from": "#ffffff",
"color_mid": "#ffffff",
"color_to": "#ffffff",
"style": ""
}
]
}
},
{
"type": "switch",
"title": "Yes/No",
"status": "live",
"nodes": [
{
"type": "outcome",
"title": "Yes",
"status": "live",
"params": {
"groups": [
{
"any": 0,
"conditions": [
{
"condition": "message",
"oper": "is",
"value": "yes"
}
]
}
]
},
"nodes": [
{
"type": "action",
"title": "Repeat",
"status": "live",
"params": {
"actions": [
{
"action": "switch_behavior",
"return": "0",
"behavior_id": "{{{uid.behavior_69}}}",
"var": "_behavior"
}
]
}
}
]
},
{
"type": "outcome",
"title": "No",
"status": "live",
"params": {
"groups": [
{
"any": 0,
"conditions": [
]
}
]
},
"nodes": [
{
"type": "action",
"title": "Bye!",
"status": "live",
"params": {
"actions": [
{
"action": "send_message",
"message": "Bye!",
"format": "",
"delay_ms": "1000"
},
{
"action": "window_close"
}
]
}
}
]
}
]
}
]
}
]
}
]
}
Click the Import button.
Cerb will prompt you to link your AWS Account: before creating the behavior. Click the chooser button and select AWS (Cerb).
You will also be prompted to enter the AWS Region: that Polly Bot should use for Amazon Polly requests. The default is us-west-2.
Click the Import button again.
Using Polly Bot from a conversational bot
We’ve included an example of a speech-enabled conversational behavior.
- Start a chat by clicking on the bot icon in the lower right of your browser.
- Select Polly Bot in the menu.
- Click the Say something option in the menu.
- Type anything you would like the bot to say and press the <ENTER> key.
The bot will speak what you typed.
You can use this example behavior to add speech to your other conversations.
Learn how the behavior works
Even though the behavior was automatically created for you, it’s useful to understand how everything works so you can make changes and create your own behaviors later.
- Navigate to Search » Bots and open the card for Polly Bot.
- Click on the Behaviors button.
- Open the card for the Get presigned speech URL behavior.
On the behavior’s card you’ll see its decision tree at the bottom.
Click the Custom bot behavior node and select Edit Behavior from the menu:
You should see:
Here we’ve defined two public behavior variables named Text and Voice. These variables will be provided by other bots when they want Polly Bot to generate speech and return a pre-signed URL to the audio stream.
You can close the edit popup without saving it (click the (x) icon in the top right of the popup).
So that’s the behavior itself. Let’s look at the action node, where the magic actually happens.
Click on the Get presigned URL node in the decision tree. You should see:
This action uses the AWS connected account you selected during import to pre-sign a GET request to the Amazon Polly API. In the request URL we’re sending query parameters that ask for an MP3 audio stream for the text defined in var_text using the voice defined in var_voice. We’ve also set the pre-signed URL to expire in 60 seconds.
Most importantly, the URL for the audio stream will be saved in a new polly_speech_url placeholder.
That’s it! Polly Bot is rather straightforward, but it will save you a lot of work when you need to add speech to several other bots. All of your interaction with AWS Polly is in a single place.
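For readers curious about what pre-signing involves, here is a minimal Python sketch of AWS Signature Version 4 query-string signing applied to the same Polly request. It is not Cerb's implementation. The function name presign_polly_url and the credentials in the usage example are hypothetical, it assumes long-term credentials with no session token, and real integrations would normally rely on an AWS SDK instead.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def _encode_query(params: dict) -> str:
    # SigV4 expects parameters sorted by name and strictly percent-encoded.
    return "&".join(
        f"{quote(k, safe='-_.~')}={quote(str(v), safe='-_.~')}"
        for k, v in sorted(params.items())
    )


def presign_polly_url(text, voice, region, access_key, secret_key,
                      expires_secs=60, now=None):
    """Build a time-limited GET URL for Polly's /v1/speech endpoint."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"polly.{region}.amazonaws.com"
    scope = f"{date_stamp}/{region}/polly/aws4_request"

    # The first four parameters mirror the query in the behavior's http_url.
    params = {
        "OutputFormat": "mp3",
        "VoiceId": voice,
        "TextType": "text",
        "Text": text,
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_secs),
        "X-Amz-SignedHeaders": "host",
    }
    query = _encode_query(params)

    canonical_request = "\n".join([
        "GET",
        "/v1/speech",
        query,
        f"host:{host}\n",                 # canonical headers block
        "host",                           # signed header names
        hashlib.sha256(b"").hexdigest(),  # hash of the empty request body
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key by chaining HMACs over the scope components.
    key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date_stamp, region, "polly", "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(
        key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    return f"https://{host}/v1/speech?{query}&X-Amz-Signature={signature}"


if __name__ == "__main__":
    # Placeholder credentials; substitute the IAM user's real keys.
    print(presign_polly_url("Hello from Cerb", "Brian", "us-west-2",
                            "AKIDEXAMPLE", "example-secret-key"))
```

Because the secret key only ever feeds the HMAC chain, the resulting URL can be shared without exposing the credentials, which is why Polly Bot can hand it to other bots safely.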
Test the behavior with the bot simulator
We’ve clicked around a lot. By now you’re probably eager to actually hear a bot speak.
While you still have the action editor popup open, click the Simulator button in the bottom toolbar.
Enter some text that you want the bot to say. You can optionally choose the voice to use.
We’ve included a few of our favorite voices, but you can use any of the voices or languages supported by Polly [1] by editing the behavior and adding them to the public behavior variables. You can even use the SSML [2] format instead for fine-tuning pronunciation, pitch curves, etc.
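As a rough illustration of the SSML option, the Python sketch below wraps plain sentences in a speak element with a pause between them and builds the matching query parameters. The helper names to_ssml and polly_query are hypothetical. It assumes you would change TextType from "text" to "ssml" in the behavior's request, which this guide's package does not do by default.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, pause_ms=400):
    """Wrap plain sentences in a <speak> document with a pause between them."""
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so user-supplied text cannot break the markup.
    body = pause.join(escape(sentence) for sentence in sentences)
    return f"<speak>{body}</speak>"


def polly_query(text, voice, ssml=False):
    """Build the query parameters for a speech request."""
    return {
        "OutputFormat": "mp3",
        "VoiceId": voice,
        "TextType": "ssml" if ssml else "text",
        "Text": text,
    }


if __name__ == "__main__":
    markup = to_ssml(["Ticket 123 was updated.", "Please review it."])
    print(polly_query(markup, "Joanna", ssml=True))
```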
Click the Simulate button.
You won’t hear speech yet, but in the log at the bottom of the simulator you should see the audio stream URL under >>> Saving pre-signed URL to {{polly_speech_url}}:
Copy the (very long) URL to your clipboard and paste it into the location bar in a new browser window or tab. You should hear the text you typed being spoken in the selected voice.
For example, we received this audio stream for the example above:
You can do anything with the audio stream URL at this point. For instance:
- A bot could send the URL to a web browser and play it.
- A bot could download it as a new attachment record.
- You could use it in an interactive voice response system during a phone call (e.g. Twilio, Asterisk).
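Whichever option you choose, the consumer has to fetch the audio before the URL expires (60 seconds in this behavior). The Python sketch below estimates how much time is left. The function name seconds_remaining is hypothetical, and it assumes the URL carries the standard X-Amz-Date and X-Amz-Expires query parameters used by Signature Version 4 pre-signed URLs.

```python
import datetime
from urllib.parse import parse_qs, urlsplit


def seconds_remaining(presigned_url, now=None):
    """Return the seconds left before a pre-signed URL expires (negative if expired)."""
    query = parse_qs(urlsplit(presigned_url).query)
    # X-Amz-Date is the UTC time at which the URL was signed.
    signed_at = datetime.datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=datetime.timezone.utc)
    # X-Amz-Expires is the lifetime of the URL in seconds.
    lifetime = datetime.timedelta(seconds=int(query["X-Amz-Expires"][0]))
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return (signed_at + lifetime - now).total_seconds()
```

A bot that downloads the audio as an attachment could call this first and request a fresh URL from Polly Bot when the result is close to zero.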
In a conversational bot behavior, you can use the Respond with script action to speak the URL:
<script>
Devblocks.playAudioUrl('{{_behavior.polly_speech_url}}');
</script>
References
[1] AWS Developer Guide: Polly Voices - http://docs.aws.amazon.com/polly/latest/dg/API_Voice.html
[2] AWS: Using SSML - http://docs.aws.amazon.com/polly/latest/dg/ssml.html