Sequence Group

'(' matcher_expr … ')'

A sequence group glues matcher expressions together: for a sequence group to match, all of its members must match.

Note

The matching results of sequence group members are not visible outside the group, i.e., other matcher expressions in a pattern see only the matching result of the group as a whole.

output type:

STRING

quantifier:

not allowed

configuration:

fs = string (enclosed in single or double quotes) or matcher expression (enclosed in curly brackets) representing the field separator

charset = character set name enclosed in single or double quotes (for example charset="ISO-8859-1")

locale = string specifying an IETF BCP 47 language tag, enclosed in single or double quotes (see the list here). The default locale is English.

A sequence group is used when you want to match a subpattern independently of the surrounding data elements, typically when performing conditional matching with LD, LDATA, or DATA, or with lookarounds.

Example: Parsing multiline records separated by an empty line (i.e., a sequence of two consecutive line-feed characters). Suppose we have two records: one spanning lines 1-3 and the other on line 5:

aaaaaa
bbbbbb
cccccc

dddddd

To extract the records as strings, we need DATA to match until it encounters two consecutive line-feeds. If the two EOL expressions are not enclosed in a sequence group, DATA consumes characters only until the first line-feed. The engine then looks for the next line-feed, but finds the character "b" that begins line 2 instead, so it considers the match failed and starts again from the beginning of the pattern. As a result, only line 3 and line 5 are extracted, which is not what we intended.

DATA:record EOL EOL;

record  _unmatched
NULL    {pos=0 len=13 data='aaaaaa\nbbbbbb'}
cccccc  NULL
dddddd  NULL

Simply enclosing the two EOL expressions (matching line-feeds) in a sequence group changes the behavior to the intended one: DATA now stops matching only when two consecutive line-feeds appear:

DATA:record (EOL EOL);

record                  _unmatched
aaaaaa\nbbbbbb\ncccccc  NULL
dddddd                  NULL
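
For readers who think in regular expressions, the difference can be loosely illustrated in Python. This is only an analogy, not the pattern engine itself: it assumes that DATA behaves like a lazy `.*?` and that the grouped `(EOL EOL)` behaves like a single `\n\n` terminator. The names `text` and `records` are made up for the illustration, and the sample input is assumed to end with an empty line.

```python
import re

# Sample input: two records separated by an empty line.
# Assumption: the last record is also followed by two line-feeds.
text = "aaaaaa\nbbbbbb\ncccccc\n\ndddddd\n\n"

# Lazy match (standing in for DATA) that stops only at two consecutive
# line-feeds (standing in for the grouped "(EOL EOL)").
# re.DOTALL lets "." cross single line-feeds inside a record.
records = re.findall(r"(.*?)\n\n", text, flags=re.DOTALL)

for record in records:
    print(repr(record))
```

Because the terminator is treated as one unit, a single line-feed inside a record never ends the match.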

The sequence group should also be used when data elements are expected to be present or absent collectively.

Example: consider a simplified DNS server request log, consisting of a timestamp, a question, and an optional DNS server IP address enclosed in parentheses. Suppose that the latter appears only when enabled in the server configuration (as happens with BIND9), hence some log lines may not have it:

2016-03-14 23:37:07;www.example.com (192.168.0.1)
2016-03-14 23:37:06;www.example.com

As the server is present or omitted together with the enclosing parentheses, we can use a sequence group to make them all optional:

TIMESTAMP:datetime ';'
LD:question
( '(' IPADDR:server ')' )?
EOL

Parsing results, with the server field evaluated to 192.168.0.1 for the data in line 1 and to NULL for the data in line 2:

datetime             question         server       _unmatched
2016-03-14 23:37:07  www.example.com  192.168.0.1  NULL
2016-03-14 23:37:06  www.example.com  NULL         NULL
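
The same "all or nothing" idea can be sketched with a Python regular expression, where the parentheses and the address sit together in one optional group. This is a rough analogy under stated assumptions: the regex `LINE`, the helper `parse`, and the simplified IPv4 expression are hypothetical and are not part of the pattern language; the space before the opening parenthesis is assumed to belong to the optional part.

```python
import re

# Hypothetical regex standing in for the pattern above.
# The " (ip)" part is one optional non-capturing group, so the
# parentheses and the address are either all present or all absent.
LINE = re.compile(
    r"^(?P<datetime>[^;]+);"                           # timestamp up to ';'
    r"(?P<question>\S+)"                               # queried name
    r"(?: \((?P<server>\d{1,3}(?:\.\d{1,3}){3})\))?$"  # optional " (ip)"
)

def parse(line):
    """Return the named fields as a dict, or None if the line does not match."""
    m = LINE.match(line)
    return m.groupdict() if m else None

with_server = parse("2016-03-14 23:37:07;www.example.com (192.168.0.1)")
without_server = parse("2016-03-14 23:37:06;www.example.com")
print(with_server["server"], without_server["server"])
```

When the optional group does not participate in the match, the `server` field comes back as `None`, which corresponds to the NULL value in the results table.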

When dealing with delimiter-separated fields (such as CSV), the sequence group allows writing patterns in a more readable way. The sequence group recognizes and matches field separators defined by the fs configuration parameter.

Example: Consider the following CSV fields: a sequence number, a username, and an IP address.

1,alice,192.168.1.1
2,bob,10.6.24.18
3,mallory,192.168.1.3

We can extract these using the following pattern:

(
 INT:sequence
 LD:username
 IPADDR:ip
)(fs=',')
EOL

sequence  username  ip           _unmatched
1         alice     192.168.1.1  NULL
2         bob       10.6.24.18   NULL
3         mallory   192.168.1.3  NULL
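
The effect of fs can be loosely compared to declaring the separator once and splitting each line on it before the fields are converted. A minimal Python sketch of that idea follows; `parse_row`, `data`, and the dictionary keys are illustrative names only, and no validation of the IP address is attempted.

```python
def parse_row(line, fs=","):
    """Split one line on the field separator and name the three fields."""
    sequence, username, ip = line.split(fs)
    return {"sequence": int(sequence), "username": username, "ip": ip}

# Sample input from the example above.
data = "1,alice,192.168.1.1\n2,bob,10.6.24.18\n3,mallory,192.168.1.3\n"

rows = [parse_row(line) for line in data.splitlines()]
for row in rows:
    print(row)
```

As with the fs parameter, the separator is stated in one place rather than repeated between every pair of fields.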