SpectX URI Syntax

The location of source data is described using the URI syntax where:

  • the URI scheme contains a protocol specifier (see supported protocols here: Data Access Protocols).
  • the authority part contains either a DataStore configuration name or protocol-specific target information (host, container, etc.).
  • the path contains the path and filename. Note that you may use GLOB Patterns and Time Patterns in the path to specify multiple files.

Example 1:

sa://mylogstore/archive/apache/access.log
where:
    sa://  is the URI scheme for the Source Agent protocol
    mylogstore  is the URI authority part, referring to the DataStore named mylogstore
    /archive/apache/access.log  is the URI path

Example 2:

ssh://192.168.1.100/data/logs/apache/access.log
where:
    ssh://  is the URI scheme for the SSHv2 protocol
    192.168.1.100  is the URI authority part, specifying the target IP address
    /data/logs/apache/access.log  is the URI path

SpectX resolves URIs using a staged approach:

1. Resolving the URI scheme. SpectX tries to find the protocol provider that implements the scheme specified in the URI. If no provider is found, an error is thrown.

2. Resolving the authority part of the URI. First, SpectX tries to match the authority part to any of the defined DataStore names. If a match is found, it takes connection parameters from the DataStore and passes them to the protocol provider. If necessary parameters are not found, an error is thrown.
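
The two stages above can be sketched in Python roughly as follows. This is an illustration only, not SpectX's implementation: the PROVIDERS and DATASTORES dictionaries, the resolve function and the host value are hypothetical, and the fallback of treating an unmatched authority as protocol-specific target information is an assumption based on the description of the authority part.

```python
from urllib.parse import urlsplit

# Hypothetical registry of protocol providers, keyed by URI scheme.
PROVIDERS = {"sa": "Source Agent", "ssh": "SSHv2"}

# Hypothetical DataStore definitions: name -> connection parameters.
DATASTORES = {"mylogstore": {"host": "logs.example.com"}}


def resolve(uri):
    """Resolve a URI in two stages: scheme first, then authority."""
    parts = urlsplit(uri)

    # Stage 1: the scheme must map to a known protocol provider.
    if parts.scheme not in PROVIDERS:
        raise ValueError(f"no protocol provider for scheme {parts.scheme!r}")

    # Stage 2: try to match the authority to a DataStore name first.
    params = DATASTORES.get(parts.netloc)
    if params is None:
        # Assumed fallback: use the authority as protocol-specific target info.
        params = {"host": parts.netloc}

    return {"provider": PROVIDERS[parts.scheme], "params": params, "path": parts.path}


print(resolve("sa://mylogstore/archive/apache/access.log"))
print(resolve("ssh://192.168.1.100/data/logs/apache/access.log"))
```

Note that urlsplit treats ? as the start of a query string, so this sketch is only suitable for plain paths without GLOB characters.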

GLOB Patterns

SpectX supports the GLOB pattern syntax for matching URIs to individual files/blobs. Any character that appears in a pattern matches itself, except the special pattern characters described below. The NUL character must not occur in a pattern. To match a special pattern character literally, escape it with a preceding backslash.

Special pattern characters have the following meaning:

1. * (single star) - matches any string, including the null string.

Example 3: /user/examples/doc/user_manual/datastore/spectx_uri_syntax/example3.sx

/*  pattern: `sx:/user/examples/data/logs/production/auth/2015/*.log` matches and retrieves all files with the extension 'log'
    in the /user/examples/data/logs/production/auth/2015/ directory:
*/

LIST('sx:/user/examples/data/logs/production/auth/2015/*.log').select(uri);

2. ** ... (multiple stars) - when placed between ‘/’ characters, matches directories and subdirectories up to a depth equal to the number of stars (i.e. a recursive match). When placed anywhere else (i.e. in file name expansion, preceded, followed or enclosed by any other characters), it is treated as a single star pattern.

Example 4: /user/examples/doc/user_manual/datastore/spectx_uri_syntax/example4.sx

// pattern: 'sx:/user/examples/data/logs/***/*.srv04.v1.log' searches up to 3 levels deep under
// the /user/examples/data/logs/ directory and retrieves all files ending with '.srv04.v1.log'

LIST('sx:/user/examples/data/logs/***/*.srv04.v1.log').select(uri);
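
One way to picture the multiple star rule is to expand it into a set of single star patterns, one per directory depth. The Python sketch below does this under two assumptions that the text above does not confirm: that N stars cover depths 1 to N (not 0), and that a single star never crosses a ‘/’. The function names and the test paths are hypothetical, and only the first multiple star segment in a pattern is expanded.

```python
from fnmatch import fnmatchcase


def expand_multi_star(pattern):
    """Expand the first '/**.../' segment into single-star patterns of depth 1..N."""
    segments = pattern.split("/")
    for i, seg in enumerate(segments):
        # A segment consisting only of two or more stars is a multiple star.
        if len(seg) >= 2 and set(seg) == {"*"}:
            head, tail = segments[:i], segments[i + 1:]
            return ["/".join(head + ["*"] * depth + tail)
                    for depth in range(1, len(seg) + 1)]
    return [pattern]


def matches(pattern, path):
    """Match segment by segment so that '*' never crosses a '/'."""
    path_segs = path.split("/")
    for candidate in expand_multi_star(pattern):
        pat_segs = candidate.split("/")
        if len(pat_segs) == len(path_segs) and all(
                fnmatchcase(name, pat) for pat, name in zip(pat_segs, path_segs)):
            return True
    return False


print(expand_multi_star("/logs/***/*.log"))
print(matches("/logs/***/*.srv04.v1.log", "/logs/production/auth/2015/0101.srv04.v1.log"))
```

Under these assumptions the pattern with three stars accepts files one, two or three directories below /logs/ and rejects anything deeper.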

3. ? (question mark) - matches any single character.

Example 5: /user/examples/doc/user_manual/datastore/spectx_uri_syntax/example5.sx

// pattern: 'sx:/user/examples/data/logs/production/auth/2016/010?.srv01.v1.log' matches logs
// from srv01 for the first nine days of January 2016
    
LIST('sx:/user/examples/data/logs/production/auth/2016/010?.srv01.v1.log').select(uri);

4. [...] - matches any one of the enclosed characters. A pair of characters separated by a hyphen marks a range: any character that falls between those two characters (inclusive) is matched. The UTF-8 collating sequence and character set are used. If the first character following [ is ! or ^, then any character not enclosed in the set is matched. A hyphen (-) may be matched by including it as the first or last character in the set. A closing bracket (]) may be matched by including it as the first character in the set.

Example 6: /user/examples/doc/user_manual/datastore/spectx_uri_syntax/example6.sx

// pattern `sx:/user/examples/data/glob/l2/[a-c]*.log` will match and retrieve files whose name begins
// with characters 'a', 'b' or 'c' and ends with '.log':
    
LIST('sx:/user/examples/data/glob/l2/[a-c]*.log').select(uri);
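
Python's standard fnmatch module implements a similar bracket syntax, so it can be used to try out ranges and negation on a list of names. The file names below are made up for illustration, and fnmatch is only an approximation of SpectX behaviour: it accepts ! but not ^ for negation and does not understand [:class:] expressions.

```python
from fnmatch import fnmatchcase

# Hypothetical file names in a single directory.
names = ["a1.log", "b.log", "c_old.log", "d1.log", "a1.txt"]

# [a-c] matches exactly one character in the range a..c (inclusive).
in_range = [n for n in names if fnmatchcase(n, "[a-c]*.log")]

# [!a-c] matches exactly one character that is NOT in the range a..c.
out_of_range = [n for n in names if fnmatchcase(n, "[!a-c]*.log")]

print(in_range)      # ['a1.log', 'b.log', 'c_old.log']
print(out_of_range)  # ['d1.log']
```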

Within ‘[’ and ‘]’, character classes can be specified using the syntax [:class:]. Classes are defined in the POSIX standard:

POSIX Character Class   Description
[:alnum:]               Alphanumeric characters a-z; A-Z; 0-9
[:alpha:]               Alphabetic characters a-z; A-Z
[:ascii:]               All ASCII characters in the range 0x0 - 0x7F
[:blank:]               Space (0x20) and tab (0x9) characters
[:cntrl:]               Control characters in the ASCII range 0x1 - 0x1F; 0x7F
[:digit:]               Digits in the range 0-9
[:graph:]               Visible characters in the ASCII code range 0x21 - 0x7E
[:lower:]               Lowercase letters a-z
[:print:]               Printable characters in the ASCII code range 0x20 - 0x7E
[:punct:]               Punctuation and symbols !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
[:space:]               All whitespace characters. In ASCII codes: 0x20; 0x9; 0xA; 0xB; 0xC; 0xD
[:upper:]               Uppercase letters A-Z
[:xdigit:]              Hexadecimal digits 0-9; A-F; a-f
[:word:]                Word characters: letters a-z; A-Z; numbers 0-9 and underscore (_)

Example 7: /user/examples/doc/user_manual/datastore/spectx_uri_syntax/example7.sx

// pattern `sx:/user/examples/data/glob/l2/[a-c][[:digit:]].log` will match and retrieve files whose
// name begins with characters 'a', 'b' or 'c', followed by a single digit, and ends with '.log':
    
LIST('sx:/user/examples/data/glob/l2/[a-c][[:digit:]].log').select(uri);

Note: SpectX does not support matching equivalence classes or collating symbols, because we do not foresee a functional need for them in directory or file expansion.

5. In the following description, a pattern-list is a list of one or more patterns separated by a ‘|’. Composite patterns can be formed using one or more of the following sub-patterns:

  • ?(pattern-list) Matches zero or one occurrence of the given patterns.
  • *(pattern-list) Matches zero or more occurrences of the given patterns.
  • +(pattern-list) Matches one or more occurrences of the given patterns.
  • @(pattern-list) Matches one of the given patterns.
  • !(pattern-list) Matches anything except one of the given patterns.

Example 8: /user/examples/doc/user_manual/datastore/spectx_uri_syntax/example8.sx

// LIST('sx:/user/examples/data/glob/l1/a?(b|x)c');
// matches:
// abc	ac	axc

// LIST('sx:/user/examples/data/glob/l1/a*(b|x)c');
// matches:
// abbc	abc	abxc	ac	axc

// LIST('sx:/user/examples/data/glob/l1/a+(b|x)c');
// matches:
// abbc	abc	abxc	axc

// LIST('sx:/user/examples/data/glob/l1/a@(b|x)c');
// matches:
// abc	axc

// LIST('sx:/user/examples/data/glob/l1/!(a@(b|x)c)');
// matches:
// abbc	abxc	ac DCE

LIST('sx:/user/examples/data/glob/l1/*([[:lower:]])');
// matches:
//abbc	abc	abxc	ac	axc
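
The composite patterns map quite directly onto regular expression groups, which may help when reasoning about what a pattern will match. The Python sketch below translates the ?, *, + and @ forms into a regex and checks them against the file names that appear in the Example 8 results. It is an illustration only: composite_to_regex and matching are hypothetical helpers, nested sub-patterns and the !(pattern-list) form are not handled, and everything outside a sub-pattern is treated as a literal character.

```python
import re

# Regex quantifier for each composite prefix; '!' is left out of this sketch.
QUANTIFIERS = {"?": "?", "*": "*", "+": "+", "@": ""}


def composite_to_regex(pattern):
    """Translate a non-nested composite pattern with literal alternatives to a regex."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in QUANTIFIERS and pattern[i + 1:i + 2] == "(":
            close = pattern.index(")", i)
            alternatives = pattern[i + 2:close].split("|")
            group = "|".join(re.escape(alt) for alt in alternatives)
            out.append(f"(?:{group}){QUANTIFIERS[ch]}")
            i = close + 1
        else:
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out))


# File names taken from the matches listed in Example 8.
names = ["abbc", "abc", "abxc", "ac", "axc", "DCE"]


def matching(pattern):
    rx = composite_to_regex(pattern)
    return [n for n in names if rx.fullmatch(n)]


print(matching("a?(b|x)c"))  # ['abc', 'ac', 'axc']
print(matching("a*(b|x)c"))  # ['abbc', 'abc', 'abxc', 'ac', 'axc']
print(matching("a+(b|x)c"))  # ['abbc', 'abc', 'abxc', 'axc']
print(matching("a@(b|x)c"))  # ['abc', 'axc']
```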

Time Patterns

SpectX supports URI expansion using the following time patterns:

Time pattern   Description
$yy$           2-digit year
$yyyy$         4-digit year
$M$            variable length month in year (1 - 12)
$MM$           2-digit month in year (01 - 12)
$d$            variable length day in month (1 - 31)
$dd$           2-digit day in month (01 - 31)
$H$            variable length hour of zero based 24 hour clock (0 - 23)
$HH$           2-digit hour of zero based 24 hour clock (00 - 23)
$m$            variable length minute in hour (0 - 59)
$mm$           2-digit minute in hour (00 - 59)
$s$            variable length second in minute (0 - 59)
$ss$           2-digit second in minute (00 - 59)
$S$            variable length millisecond in second (0 - 999)
$SSS$          3-digit millisecond in second (000 - 999)

The abbreviated 2-digit year is interpreted relative to a century. If the year value is less than 32, the date is placed in the 21st century, otherwise in the 20th century. For example, the year 12 parses to 2012 and the year 72 parses to 1972.
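
The century rule can be written as a small function. This is a minimal Python sketch of the rule as stated above; the function name is hypothetical.

```python
def expand_two_digit_year(yy):
    """Map a 2-digit year to a 4-digit year using 32 as the pivot."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a value between 0 and 99")
    # Values below 32 fall in the 21st century, the rest in the 20th.
    return 2000 + yy if yy < 32 else 1900 + yy


print(expand_two_digit_year(12))  # 2012
print(expand_two_digit_year(72))  # 1972
```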

2, 3 and 4-digit tokens are treated as fixed length matchers, accepting only exactly that number of digits.

Example 9: /user/examples/doc/user_manual/datastore/spectx_uri_syntax/example9.sx

// the path_time field is evaluated when time patterns are used with the LIST command
LIST('sx:/user/examples/data/glob/l1/$yyyy$$MM$$dd$').select(path_time);

evaluates to 2015-06-25 00:00:00

Variable length patterns must accept a variable number of digits. This means the parser needs information on the length of each time unit. The only feasible way to provide it is to separate time units with distinct markers (non-numeric characters). Variable length time units placed consecutively, without non-numeric separators in between, are impossible to parse correctly.

Example 10: /user/examples/doc/user_manual/datastore/spectx_uri_syntax/example10.sx

// pattern 'sx:/user/examples/data/glob/l3/$yyyy$/$M$/access_$d$-$H$.log' will match
// variable length month, day and hour in '/user/examples/data/glob/l3/' subdirectories
// LIST output field path_time contains evaluated time pattern

LIST('sx:/user/examples/data/glob/l3/$yyyy$/$M$/access_$d$-$H$.log').select(uri, path_time);
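
To see why fixed length tokens can be concatenated while variable length tokens need separators, it can help to translate a time pattern into a regular expression. The Python sketch below does this for a subset of the tokens ($yy$, $S$ and $SSS$ are omitted). It is not how SpectX evaluates path_time: the TOKENS table, both function names, the sample paths and the default of 1 for a missing month or day are assumptions made for illustration.

```python
import re
from datetime import datetime

# Token -> (datetime field, regex for its digits). Fixed length tokens accept
# an exact digit count; variable length tokens accept one or two digits.
TOKENS = {
    "yyyy": ("year", r"\d{4}"),
    "MM": ("month", r"\d{2}"), "M": ("month", r"\d{1,2}"),
    "dd": ("day", r"\d{2}"), "d": ("day", r"\d{1,2}"),
    "HH": ("hour", r"\d{2}"), "H": ("hour", r"\d{1,2}"),
    "mm": ("minute", r"\d{2}"), "m": ("minute", r"\d{1,2}"),
    "ss": ("second", r"\d{2}"), "s": ("second", r"\d{1,2}"),
}


def compile_time_pattern(pattern):
    """Turn '$token$' markers into named regex groups; escape everything else."""
    pieces = re.split(r"\$(\w+)\$", pattern)  # odd-indexed pieces are tokens
    regex = ""
    for i, piece in enumerate(pieces):
        if i % 2:
            field, digits = TOKENS[piece]
            regex += f"(?P<{field}>{digits})"
        else:
            regex += re.escape(piece)
    return re.compile(regex)


def path_time(pattern, path):
    """Return the datetime encoded in path, or None if it does not match."""
    m = compile_time_pattern(pattern).fullmatch(path)
    if m is None:
        return None
    fields = {"month": 1, "day": 1}  # assumed defaults; a year token is required
    fields.update({name: int(value) for name, value in m.groupdict().items()})
    return datetime(**fields)


print(path_time("$yyyy$$MM$$dd$", "20150625"))
print(path_time("/l3/$yyyy$/$M$/access_$d$-$H$.log", "/l3/2016/3/access_7-15.log"))
print(path_time("$yyyy$$M$$d$", "2016111"))
```

The last call shows the ambiguity: with no separator between $M$ and $d$, the digits 111 could mean January 11 or November 1. The greedy regex in this sketch settles on November 1, but nothing in the input justifies that choice.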