Extract text from email messages using bots
- Introduction
- Building the behavior
- Understanding regular expressions
- Running the behavior
- Example
- References
Introduction
Let’s assume that you have a web-based form that emails you every time someone submits it. You want to extract the values from the message and use them in bot behaviors to automatically set custom fields, personalize auto-replies, fulfill new orders, etc.
The responses in your mailbox look something like this:
Org: Kataflow Neural Implants, Inc.
Email: team@kataflow.example
Color: Robotic Gray
A simple bot behavior can parse that message and copy each value into a new placeholder. From there, later steps in the behavior can use those placeholders for any of the tasks above.
Building the behavior
When building a bot behavior like this, it’s helpful to add an action node at the top of the behavior and set its status to Simulator only. That way you can override some placeholders during testing without having to find or create a record with the exact content you’re looking for.
In this case, we want to replace the value of the {{content}} placeholder in the simulator with a sample form submission, like the one shown in the Introduction.
We’ll now always see that message content in the simulator, no matter which target message we test against. When the behavior is running against live messages, it will use the actual message content.
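To make the override concrete, here’s a tiny Python model of it. The run_nodes function, its simulating flag, and the SAMPLE_SUBMISSION constant are hypothetical stand-ins for illustration; the bot engine’s real implementation isn’t shown in this article.

```python
# Hypothetical model of a "Simulator only" action node; not the bot engine's code.

# Sample form submission, the same one the simulator node injects.
SAMPLE_SUBMISSION = (
    "Org: Kataflow Neural Implants, Inc.\r\n"
    "Email: team@kataflow.example\r\n"
    "Color: Robotic Gray\r\n"
)


def run_nodes(actual_content: str, simulating: bool) -> dict[str, str]:
    """Return the placeholders a later node would see.

    `simulating` is a hypothetical flag meaning "running in the simulator".
    """
    placeholders = {"content": actual_content}
    # The simulator-only node runs first, but only in the simulator,
    # so live runs keep the real message content.
    if simulating:
        placeholders["content"] = SAMPLE_SUBMISSION
    return placeholders


print(run_nodes("Hi, where is my order?", simulating=False)["content"])
print(run_nodes("Hi, where is my order?", simulating=True)["content"])
```

The first call prints the real message; the second prints the sample submission, which is the same split between live runs and simulator runs described above.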
Now a second action node can extract the form values into placeholders using a regular expression [1] filter.
Understanding regular expressions
Let’s quickly break down this script to understand what it’s doing:
{{content|regexp('#^Org\:(.*?)$#m', 1)}}
We’re taking the value of the content placeholder and applying a |regexp filter to it. That filter expects its first argument to be a regular expression pattern, and the optional second argument is a specific capture group to return (as opposed to all matches as an array).
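If it helps to see that contract in a general-purpose language, here’s a rough Python stand-in for the filter. It’s a sketch, not the bot’s implementation: the regexp function name, the partial modifier table, and the return value when no group is given (every group of the first match, group 0 first) are all assumptions.

```python
import re
from typing import List, Optional, Union

# Partial map of PCRE modifier letters to Python re flags (assumption:
# only these four are handled).
_MODIFIERS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}


def regexp(value: str, delimited: str, group: Optional[int] = None) -> Union[str, List[str]]:
    """Rough stand-in for the |regexp filter; a hypothetical helper.

    `delimited` is a PCRE-style pattern such as "#^Org\\:(.*?)$#m".
    """
    delim = delimited[0]
    end = delimited.rindex(delim)  # last delimiter; modifiers follow it
    body, modifiers = delimited[1:end], delimited[end + 1:]
    flags = 0
    for letter in modifiers:
        flags |= _MODIFIERS[letter]
    match = re.search(body, value, flags)
    if group is not None:
        # One capture group, or "" when nothing matched.
        return (match.group(group) or "") if match else ""
    # No group requested: every group of the first match, group 0 first.
    return [match.group(0), *match.groups()] if match else []


content = "Org: Kataflow Neural Implants, Inc.\nEmail: team@kataflow.example\n"
print(repr(regexp(content, r"#^Org\:(.*?)$#m", 1)))
print(regexp(content, r"#^Org\:(.*?)$#m"))
```

Passing 1 returns a single string, while omitting it returns a list, which mirrors the difference the filter’s optional second argument makes.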
In the first argument, we’re giving the pattern #^Org\:(.*?)$#m:
- # is the pattern delimiter, in the format #<pattern>#<flags>.
- ^Org\: matches text that starts with Org:. We’re escaping the : even though it’s not technically necessary.
- (.*?) is a “non-greedy” capture group: it matches as few characters as possible after Org: while still letting the rest of the pattern match. Capture groups are defined with parentheses, and you can have many of them. They can also be nested, like ((\d)-(\w+)), in which case they’re numbered by their opening parenthesis ( from left to right. In that example, the outer group wrapping the entire pattern is group 1, (\d) is group 2, and (\w+) is group 3.
- $ ends the capture group at the end of the line (because of the m flag below; without it, $ would mean the end of the whole text).
- #m ends the pattern and specifies the m flag for multi-line matching. This changes the behavior of ^ and $ so that they match the beginning and end of each line, instead of the entire message. That way we don’t care what order the lines are in or what other text surrounds them. (The captured value does still include the space after the colon and any trailing whitespace on the line.)
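You can check the m flag and the group numbering with Python’s re module, where re.MULTILINE plays the role of the m flag. This is a standalone illustration with made-up sample text, not something the bot runs.

```python
import re

# Made-up text where the Org line is NOT the first line.
text = "Email: team@kataflow.example\nOrg: Kataflow Neural Implants, Inc.\n"

# Without MULTILINE, ^ only matches at the start of the whole text
# (which begins with "Email:"), so this search finds nothing.
without_m = re.search(r"^Org\:(.*?)$", text)

# With MULTILINE (Python's equivalent of the "m" flag), ^ and $ match at
# line boundaries, so the Org line is found wherever it appears.
with_m = re.search(r"^Org\:(.*?)$", text, re.MULTILINE)

# Nested groups are numbered by their opening parenthesis, left to right.
nested = re.search(r"((\d)-(\w+))", "order 7-gray")

print(without_m)              # None
print(repr(with_m.group(1)))  # ' Kataflow Neural Implants, Inc.'
print(nested.groups())        # ('7-gray', '7', 'gray')
```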
In the second argument, we’re asking for the value of capture group 1, which is (.*?), meaning everything after Org: up to the end of the line ($).
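Group 0 versus group 1 is easy to see in Python. The sketch also surfaces a detail worth knowing: in Python’s re, capture group 1 still carries the space after the colon and, with CRLF line endings like the simulator sample’s, a trailing \r. Whether the bot’s filter output does the same is an assumption worth checking in the simulator before relying on exact values.

```python
import re

# The simulator sample uses CRLF ("\r\n") line endings.
content = "Org: Kataflow Neural Implants, Inc.\r\nEmail: team@kataflow.example\r\n"

match = re.search(r"^Org\:(.*?)$", content, re.MULTILINE)

whole = match.group(0)      # group 0: the entire match, label included
captured = match.group(1)   # group 1: everything after "Org:" to the line end
cleaned = captured.strip()  # drop the leading space and the trailing "\r"

print(repr(whole))     # 'Org: Kataflow Neural Implants, Inc.\r'
print(repr(captured))  # ' Kataflow Neural Implants, Inc.\r'
print(repr(cleaned))   # 'Kataflow Neural Implants, Inc.'
```

The \r survives because $ in multi-line mode matches just before "\n", and . happily matches the "\r" in front of it.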
We save that output to a placeholder named form_org, then repeat the same pattern for each of the other form fields.
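Since every field uses the same pattern with a different label, the repetition can be expressed as a loop. This Python sketch reuses the placeholder names from this article; trimming each value and using an empty string for a missing field are choices of this sketch only, not something the bot is known to do.

```python
import re

content = (
    "Org: Kataflow Neural Implants, Inc.\r\n"
    "Email: team@kataflow.example\r\n"
    "Color: Robotic Gray\r\n"
)

# Form label -> placeholder name, as used in this article's example.
FIELDS = {"Org": "form_org", "Email": "form_email", "Color": "form_color"}


def extract_fields(text: str) -> dict[str, str]:
    """Apply the same line-anchored pattern once per form field."""
    placeholders = {}
    for label, var in FIELDS.items():
        # re.escape guards labels that contain regex metacharacters.
        pattern = rf"^{re.escape(label)}:(.*?)$"
        match = re.search(pattern, text, re.MULTILINE)
        # Sketch choice: trim values, use "" when a field is missing.
        placeholders[var] = match.group(1).strip() if match else ""
    return placeholders


print(extract_fields(content))
```

Adding a new form field then only means adding one entry to FIELDS, which is the same idea as adding one more action to the node.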
Running the behavior
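In the simulator, the sample content from the first node flows into the second node, which fills the form_org, form_email, and form_color placeholders.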
Example
Here’s a full example behavior that you can import into any bot and customize to meet your needs:
{
"behavior": {
"title": "Extract text from an email-based form",
"is_disabled": false,
"is_private": false,
"priority": 50,
"event": {
"key": "event.macro.message",
"label": "Custom message behavior"
},
"nodes": [
{
"type": "action",
"title": "Set simulator values",
"status": "simulator",
"params": {
"actions": [
{
"action": "_set_custom_var",
"value": "Org: Kataflow Neural Implants, Inc.\r\nEmail: team@kataflow.example\r\nColor: Robotic Gray\r\n",
"format": "",
"is_simulator_only": "0",
"var": "content"
}
]
}
},
{
"type": "action",
"title": "Extract values",
"status": "live",
"params": {
"actions": [
{
"action": "_set_custom_var",
"value": "{{content|regexp('#^Org\\:(.*?)$#m', 1)}}",
"format": "",
"is_simulator_only": "0",
"var": "form_org"
},
{
"action": "_set_custom_var",
"value": "{{content|regexp('#^Email\\:(.*?)$#m', 1)}}",
"format": "",
"is_simulator_only": "0",
"var": "form_email"
},
{
"action": "_set_custom_var",
"value": "{{content|regexp('#^Color\\:(.*?)$#m', 1)}}",
"format": "",
"is_simulator_only": "0",
"var": "form_color"
}
]
}
}
]
}
}
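To sanity-check an export like this before importing it, you could replay its actions in Python. This sketch works on a trimmed copy of the JSON above and understands only the single {{content|regexp('PATTERN', N)}} template shape used here; the simulate function and its template parser are hypothetical, and Python’s re stands in for the bot’s regex engine.

```python
import json
import re

# Trimmed copy of the exported behavior above. Raw string so that JSON,
# not Python, decodes the \r\n and \\: escapes.
BEHAVIOR_JSON = r"""
{"behavior": {"nodes": [
  {"status": "simulator", "params": {"actions": [
    {"var": "content",
     "value": "Org: Kataflow Neural Implants, Inc.\r\nEmail: team@kataflow.example\r\nColor: Robotic Gray\r\n"}]}},
  {"status": "live", "params": {"actions": [
    {"var": "form_org", "value": "{{content|regexp('#^Org\\:(.*?)$#m', 1)}}"},
    {"var": "form_email", "value": "{{content|regexp('#^Email\\:(.*?)$#m', 1)}}"},
    {"var": "form_color", "value": "{{content|regexp('#^Color\\:(.*?)$#m', 1)}}"}]}}
]}}
"""

# Hypothetical parser for the one template shape used in this article.
TEMPLATE = re.compile(r"^\{\{content\|regexp\('(.+)', (\d+)\)\}\}$")


def simulate(behavior_json: str) -> dict:
    """Run every node in order, as in the simulator (greatly simplified)."""
    placeholders: dict = {}
    for node in json.loads(behavior_json)["behavior"]["nodes"]:
        for action in node["params"]["actions"]:
            template = TEMPLATE.match(action["value"])
            if template is None:
                # Plain value: copy it into the placeholder as-is.
                placeholders[action["var"]] = action["value"]
                continue
            delimited, group = template.group(1), int(template.group(2))
            end = delimited.rindex(delimited[0])
            flags = re.MULTILINE if "m" in delimited[end + 1:] else 0
            match = re.search(delimited[1:end], placeholders["content"], flags)
            placeholders[action["var"]] = match.group(group) if match else ""
    return placeholders


result = simulate(BEHAVIOR_JSON)
for var in ("form_org", "form_email", "form_color"):
    print(var, "=", repr(result[var].strip()))
```

Running it prints the three trimmed values from the sample submission. Note that the doubled backslash in the JSON (\\:) decodes to a single \: in the pattern, which is what the filter actually receives.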
References
1. Wikipedia: Regular Expressions - https://en.wikipedia.org/wiki/Regular_expression