Extract text using regular expressions
Here are examples of using regular expressions to extract matching text in automation scripting.
Matching a single capture group
The pattern is a KATA key.
start:
  set:
    text: Your Amazon Order #Z-1234-5678-9 has shipped!
    pattern: /Amazon Order #([A-Z0-9\-]+)/
  return:
    order_id: {{text|regexp(pattern, 1)}}
We're taking the value of the text placeholder and applying a |regexp filter to it. That filter expects its first argument to be a regular expression pattern, and the optional second argument is a specific capture group to return (as opposed to all matches as an array).
In the first argument, we're giving the pattern /Amazon Order #([A-Z0-9\-]+)/:
- / is the pattern delimiter, in the format of /<pattern>/<flags>.
- Amazon Order # matches that literal phrase wherever it appears in the text.
- (...) is a "capture group" for text after the previous match. Capture groups are defined with parentheses and you can have many of them. They can also be nested like ((\d)-(\w+)), in which case they're numbered from the left-most opening parenthesis (.
- [A-Z0-9\-]+ matches one or more consecutive characters while they are capital letters, digits, or a dash (-). The [...] brackets list the characters to match. The + at the end means "one or more", as opposed to * which would mean "zero or more".
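To try the pieces of this pattern outside an automation, here is a minimal sketch in Python. It assumes Python's re module treats these particular constructs (literals, character classes, +, and capture groups) the same way the |regexp filter does. The variable names and the "7-abc" sample are illustrative only and are not part of Cerb.

```python
import re

text = "Your Amazon Order #Z-1234-5678-9 has shipped!"

# Same pattern as above, minus the / delimiters (Python takes the bare pattern).
match = re.search(r"Amazon Order #([A-Z0-9\-]+)", text)
order_id = match.group(1)  # capture group 1

# Nested groups are numbered by the position of their opening parenthesis.
# "7-abc" is a made-up sample string for illustration.
nested = re.search(r"((\d)-(\w+))", "7-abc")
outer, digit, word = nested.group(1), nested.group(2), nested.group(3)

print(order_id)
print(outer, digit, word)
```

The character class stops at the first character it does not list, which is why the space after the order number ends the capture.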
Setting the pattern as a variable
The pattern is a scripting variable.
start:
  set:
    mask@text:
      {% set text = "The ticket mask that I am looking for is: KRN-69622-357 something else" %}
      {% set pattern %}/[A-Z]{3}-\d{5}-\d{3}/{% endset %}
      {{text|regexp(pattern)}}
  outcome/hasMask:
    if@bool: {{mask}}
    then:
      return:
        output: The ticket mask is #: {{mask}}
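The ticket mask pattern uses counted repetition: {3} and {5} require exactly that many of the preceding item. As a rough check outside Cerb, the sketch below runs the same pattern in Python. The find_mask helper is hypothetical, and returning an empty string on no match is an assumption made for illustration, not documented |regexp behavior.

```python
import re

# Three capital letters, a dash, five digits, a dash, three digits.
MASK_PATTERN = r"[A-Z]{3}-\d{5}-\d{3}"

def find_mask(text):
    """Return the first ticket mask found in text, or "" if there is none."""
    match = re.search(MASK_PATTERN, text)
    return match.group(0) if match else ""

mask = find_mask("The ticket mask that I am looking for is: KRN-69622-357 something else")
if mask:
    print(f"The ticket mask is #: {mask}")
```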
Using multiple capture groups
The second argument to |regexp specifies the capture group to return.
start:
  set:
    text: (123,456)
    pattern: /^\((\d+),(\d+)\)$/
  return:
    x@int: {{text|regexp(pattern, 1)}}
    y@int: {{text|regexp(pattern, 2)}}
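In this pattern, ^ and $ anchor the match to the whole string, and \( and \) match literal parentheses rather than opening a capture group. A minimal Python sketch of the same extraction follows. The parse_point helper is hypothetical, and the int() conversion stands in for the @int annotation.

```python
import re

def parse_point(text):
    """Return (x, y) as integers, or None if text is not of the form (x,y)."""
    match = re.search(r"^\((\d+),(\d+)\)$", text)
    if match is None:
        return None
    # Group 1 is the first number, group 2 the second.
    return int(match.group(1)), int(match.group(2))

point = parse_point("(123,456)")
print(point)
```

Because of the anchors, input with anything before or after the parentheses does not match at all.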
Returning all matches for all capture groups
Use the regexp_match_all() function to return multiple capture groups for all matches.
start:
  set:
    headers@text:
      X-Mailer: Cerb
      From: customer@cerb.example
      To: support@cerb.example
  return:
    results@text:
      {% set results = regexp_match_all("#^(.*?): (.*?)$#m", headers) %}
      {{results|json_encode|json_pretty}}

The output is:

__return:
  results: |-
    [
        [
            "X-Mailer: Cerb",
            "From: customer@cerb.example",
            "To: support@cerb.example"
        ],
        [
            "X-Mailer",
            "From",
            "To"
        ],
        [
            "Cerb",
            "customer@cerb.example",
            "support@cerb.example"
        ]
    ]
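The result is grouped by capture group rather than by match: the first array holds every full match, the second holds every group 1, and the third holds every group 2. The Python sketch below reproduces that shape for the same headers. The match_all_by_group helper is hypothetical, and re.MULTILINE stands in for the m flag so that ^ and $ match at each line.

```python
import re

headers = (
    "X-Mailer: Cerb\n"
    "From: customer@cerb.example\n"
    "To: support@cerb.example"
)

def match_all_by_group(pattern, text, flags=0):
    """Return one list per group (index 0 is the full match), across all matches."""
    regex = re.compile(pattern, flags)
    matches = list(regex.finditer(text))
    return [[m.group(i) for m in matches] for i in range(regex.groups + 1)]

results = match_all_by_group(r"^(.*?): (.*?)$", headers, re.MULTILINE)
for group_values in results:
    print(group_values)
```

The lazy .*? in the first group stops at the first ": " on each line, so header values that themselves contain a colon stay intact in the second group.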