'(' matcher_expr … ')'
A sequence group glues matcher expressions together: for a sequence group to match, all of its members must match.
The matching results of individual group members are not visible outside the group. Other matcher expressions in the pattern see only the matching result of the group as a whole.
fs = string (enclosed in single or double quotes) or matcher expression (enclosed in curly brackets) representing the field separator
charset = character set name enclosed in single or double quotes (for example
locale = string specifying an IETF BCP 47 language tag, enclosed in single or double quotes. The default locale is English.
Example: Parsing multiline records separated by an empty line (i.e., a sequence of two consecutive line-feed characters). Suppose we have two records: one represented by the strings on lines 1-3 and the other by the string on line 5:
aaaaaaaa
bbbbbbbb
cccccccc

dddddddd
To extract the records as strings, we need DATA to match characters until it encounters two consecutive line-feeds. If we do not enclose the line-feed matchers in a sequence group, DATA consumes all characters only until it encounters the first line-feed. The engine then looks for a match for the next line-feed, but because it finds the character "b" at the beginning of the string on line 2 instead, it considers the parsing failed and starts again from the beginning of the pattern. As a result, only line 3 and line 5 are extracted, which is not what we expected.
DATA:record EOL EOL;
|NULL||pos=0 len=13 data=’aaaaaa bbbbbb’}|
Simply enclosing the two EOL expressions (which match line-feeds) in a sequence group changes the behavior to the intended one: now DATA stops matching only when two consecutive line-feeds appear:
DATA:record (EOL EOL);
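As a rough analogy only (this is Python's `re` module, not the pattern engine described here), the grouped terminator behaves like a lazy match followed by a group that requires both line-feeds together. The function name `split_records` and the assumption that the last record is also terminated by two line-feeds are illustrative, not taken from the original.

```python
import re

# Lazy "data" part followed by a group that must match as a whole:
# two consecutive line-feeds. DOTALL lets "." cross single line-feeds,
# so a record may span several lines.
RECORD = re.compile(r"(.+?)(?:\n\n)", flags=re.DOTALL)


def split_records(text):
    """Return every record that is terminated by two consecutive line-feeds."""
    return RECORD.findall(text)


# Assumed sample input: the final record also ends with two line-feeds.
sample = "aaaaaaaa\nbbbbbbbb\ncccccccc\n\ndddddddd\n\n"
records = split_records(sample)
```

Here `records` holds two strings: the three-line record and the single-line record. A single line-feed inside a record does not end it, because the terminating group only matches when both line-feeds are present.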
A sequence group should also be used when data elements are expected to be present or absent collectively.
Example: Consider a simplified DNS server request log consisting of a timestamp, a question, and an optional DNS server IP address enclosed in parentheses. Suppose that the latter appears only when enabled in the server configuration (as happens with BIND9), so some log lines may not have it:
2016-03-14 23:37:07;www.example.com (192.168.0.1)
2016-03-14 23:37:06;www.example.com
Because the server address is present or omitted together with its enclosing parentheses, we can use a sequence group to make them all optional:
TIMESTAMP:datetime ';' LD:question ( '(' IPADDR:server ')' )? EOL
In the parsing results, the server field evaluates to 192.168.0.1 for the data in line 1 and to NULL for the data in line 2.
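The "present or absent together" idea can be sketched by analogy in Python, where an optional non-capturing group plays the role of the optional sequence group. This is an illustration under assumptions, not the pattern engine itself: the regular expression, the name `parse_line`, and the optional space before the opening parenthesis are all hypothetical choices made for the sample data above.

```python
import re

# The parentheses and the address sit in one optional group, so they are
# matched or skipped as a unit, never individually.
LOG_LINE = re.compile(
    r"(?P<datetime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2});"
    r"(?P<question>[^\s(]+)"
    r"(?: ?\((?P<server>\d{1,3}(?:\.\d{1,3}){3})\))?$"
)


def parse_line(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


with_server = parse_line("2016-03-14 23:37:07;www.example.com (192.168.0.1)")
without_server = parse_line("2016-03-14 23:37:06;www.example.com")
```

In this sketch `with_server["server"]` is the address string and `without_server["server"]` is `None`, which corresponds to the NULL value described above.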
When dealing with delimiter-separated fields (such as CSV), a sequence group allows patterns to be written in a more readable way. The sequence group recognizes and matches the field separators defined by the fs configuration parameter.
Example: Consider the following CSV fields: a sequence number, a username, and an IP address.
1,alice,192.168.1.1
2,bob,10.6.24.18
3,mallory,192.168.1.3
We can extract these using the following pattern:
( INT:sequence LD:username IPADDR:ip )(fs=',') EOL
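For comparison, a minimal Python sketch of the same extraction is shown below. It only mirrors the idea of a separator that is declared once and applied between all fields. The function name `parse_row` and its `fs` keyword are hypothetical and simply echo the parameter name used above.

```python
import ipaddress


def parse_row(line, fs=","):
    """Split one row on the field separator and convert each field."""
    # Unpacking fails with ValueError unless the row has exactly three fields.
    sequence, username, ip = line.split(fs)
    return {
        "sequence": int(sequence),
        "username": username,
        # ip_address() raises ValueError when the text is not a valid address.
        "ip": str(ipaddress.ip_address(ip)),
    }


rows = [
    parse_row(line)
    for line in ("1,alice,192.168.1.1", "2,bob,10.6.24.18", "3,mallory,192.168.1.3")
]
```

As in the pattern, the separator is stated in one place rather than repeated between every pair of fields.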