Getting Started

Installation

SpectX requires Oracle's Java Runtime Environment (JRE) 1.8, available from the Oracle download site. Please note that using OpenJDK is not recommended, as it results in reduced performance. Check your Java version first (and install or upgrade accordingly if needed) by running:

$ java -version

Download SpectX and follow the installation steps below. SpectX can be run with standard user privileges.

  1. Using the terminal, unpack the tarball and change to the unpacked directory spectx/:
$ tar -zxf spectx-v{version}.tar.gz
$ cd spectx
  2. Run the server:
$ bin/spectx.sh
  3. Open your browser and navigate to http://localhost:8388. You should see the SpectX login screen.
  4. Log in with the initial username admin and password spectx. You must change the password before you can continue.

At this point, you’re done setting up SpectX with the default configuration parameters. For detailed installation instructions, please see the installation section of the admin manual.

Your First Query

Since SpectX doesn’t keep the source data in its “belly”, you must first think about where it is located. There are many options: cloud storage, the web, your on-premise servers or the local file system (see Data Access Protocols for all supported data access protocols).

The Data Browser helps you explore data in all supported types of data storage. You can always use it to create a new query script.

To make it easier to choose data for your first query, we’ve prepared a few sample datasets. You’ll need to extract the examples first: right-click on the User directory and choose Extract examples from the drop-down menu.

  1. Open Data Browser (press Data Browser below the resource tree).
  2. Navigate to sx:/user/examples/data/auth.log.
  3. Press Prepare query - this creates a query extracting the first 100 records from the chosen file. It also tries to autodetect the structure of records and to extract the fields found:

Example 1.

$pattern = PATTERN{
   $str=(CSVDQS{2,256000}:value| CSVSQS{2,256000}:value| DATA{0,256000}:value);
   $FS='\t';

   $str:field_0 $FS, $str:field_1 $FS, $str:field_2 $FS, $str:field_3
   EOL
};
@src = PARSE(pattern:$pattern, src:'sx:/user/examples/data/auth.log');

@src.limit(100);

Pressing “Run” will execute the script and display the results in the Results window. Congratulations, you have successfully created and executed your first query!

Your First Pattern

When you look closely at the fields produced by the autodetected pattern, you’ll see they’re all of the same type - STRING. While sufficient for generic search and filtering, strings don’t allow us to apply time- or network-related functions. Therefore we need to change the types of these extracted fields and perhaps assign them more meaningful names, too.
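To illustrate the difference (a minimal sketch, not part of the tutorial; it assumes the autodetected field names from Example 1 and the filter command), a STRING field forces us to compare the result code as text, whereas the typed pattern we build below lets us compare it as a number:

// With the autodetected STRING fields, 200 has to be matched as text:
@src.filter(field_3 = '200');

// With the typed fields from Example 2 below, the same filter compares integers:
@src.filter(result = 200);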

As we can see from our first query, we’re dealing with TAB-separated fields, where:

  • field_0 is a timestamp
  • field_1 is an IP address
  • field_2 is a username, and
  • field_3 is a numeric result code

Each line represents one record.
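To make the layout concrete, here is a sample line reconstructed from the parsed values shown further below (the gaps between the four fields are single TAB characters):

2016-01-03 00:13:28 +0200	110.188.4.216	forerequest	200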

Press Data Browser and then (after navigating to sx:/user/examples/data/auth.log, if needed) press Prepare Pattern. This will open the Pattern Editor window with an autodetected pattern. Although we could easily modify it, it is best to get rid of this pattern altogether, as building one from scratch is the best way to explain the working principles of pattern matching. (If you don’t feel like typing in the following, you can just open it here: /user/examples/doc/getting_started/first_pattern/example2.sxp)

The first field in the record is a timestamp. We can use the TIMESTAMP matcher to extract it. We also need to describe the format of the time units: TIMESTAMP('yyyy-MM-dd HH:mm:ss Z'). In order to make the parsed timestamp visible to the query, we also need to assign it an Export Name: TIMESTAMP('yyyy-MM-dd HH:mm:ss Z'):tstamp. As you type this into the upper window of the Pattern Editor, the lower window displays a Parse Preview, automatically coloring matched items in the source data.

The first field is followed by the field separator TAB; for this we use the Constant String matcher '\t'. As we’re not interested in exposing field separators to the query, let’s not assign an export name to it. You can immediately see that the parser now matches the tstamp field and the following TAB.
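At this point the pattern in the editor consists of just these two elements (the first line of Example 2 below):

TIMESTAMP('yyyy-MM-dd HH:mm:ss Z'):tstamp '\t'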

Let’s add the remaining elements. The IP address (version 4) can be described using IPV4:ipAddr, again followed by the field separator TAB '\t'. For the username we can use the LD:userName matcher (a wildcard for arbitrary character data). Add another field separator. The last field is a numeric result code; let’s use INT to extract it: INT:result. Finally, the record is terminated by EOL, capturing the line feed. And we’re done! Here’s how our pattern looks:

Example 2.

TIMESTAMP('yyyy-MM-dd HH:mm:ss Z'):tstamp '\t'
IPV4:ipAddr '\t'
LD:userName '\t'
INT:result
EOL

Pressing “Parse” executes data parsing in the Data Editor tab. The result set contains one additional column, _unmatched, which would contain any unmatched bytes, should they occur during parsing. In our case there are none:

_unmatched  tstamp                     ipAddr         userName     result
NULL        2016-01-03 00:13:28 +0200  110.188.4.216  forerequest  200
NULL        2016-01-06 06:35:24 +0200  48.242.116.66  unrioting    200
NULL        2016-01-05 11:49:01 +0200  223.11.158.94  ribassano    404
...

We can go back to our query tab and replace the pattern with the one we just created. Now we’re able to run more interesting computations on our extracted data. For instance, let’s find out how many successes and rejects we get on a weekly basis:

Example 3.

$pattern = PATTERN{
TIMESTAMP('yyyy-MM-dd HH:mm:ss Z'):tstamp '\t'
IPV4:ipAddr '\t'
LD:userName '\t'
INT:result
EOL
};

@src = PARSE(pattern:$pattern, src:'sx:/user/examples/data/auth.log');

@src
 .select(
    tstamp[1 week] as period,       // truncate tstamp to 1 week
    count(result=200) as success,
    count(result!=200) as rejects
    )
 .group(period);
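As a further sketch (not part of the original example; it assumes the sort command with DESC ordering is available, as in other SpectX queries), the typed ipAddr field also lets us break the rejects down by client address:

@src
 .select(
    ipAddr,
    count(result != 200) as rejects   // count rejected attempts per address
    )
 .group(ipAddr)
 .sort(rejects DESC);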