ES

Elasticsearch

SpectX supports reading data directly from an Elasticsearch database using the ES command:

ES(uri:uri_str)
ES(uri:uri_str, index:idx_str)
ES(uri:uri_str, index:idx_str, credentials:creds_tuple)
ES(uri:uri_str, index:idx_str, credentials:creds_tuple, _insecure_tls:tls_type_boolean)
ES(uri:uri_str, index:idx_str, credentials:creds_tuple,
   _insecure_tls:tls_type_boolean, _tasks_per_shard:tasks_count_int)

where:

  • uri_str - Elasticsearch endpoint root URI.

  • idx_str - index or indices to read from. Multiple indices are comma separated (no whitespace); wildcards are allowed.

  • tls_type_boolean - whether to skip validation of the server certificate chain and host name. Optional, default false.

  • tasks_count_int - the number of retrieval tasks to create per replica of each shard of an index. Optional, default 1.

  • creds_tuple - authentication attributes for different schemes (expressed as tuple):

    • credentials:{type:'basic', user::STRING, password::STRING}
    • credentials:{type:'xpack', user::STRING, password::STRING}
    • credentials:{type:'token', token::STRING} - OAuth2 Bearer token obtained via the Get token API
    • credentials:{type:'aws', accessKeyId::STRING, secretKey::STRING, region::STRING} - AWS IAM user credentials for the AWS Elasticsearch Service
    • credentials:{type:'ec2'} - use when accessing the AWS Elasticsearch Service with an AWS EC2 role (credentials are retrieved from instance metadata)

When Elasticsearch is configured to accept anonymous access, the credentials can be omitted.

Example Query an Elasticsearch instance running on localhost using basic authentication, reading from the index apache, which contains Apache web server access log entries (for an instance in anonymous mode, omit the credentials line):

@src = ES(
    uri: "http://127.0.0.1:9200"
    ,index: "apache"
    ,credentials:{type:'basic', user:'elastic', password:'...'}
);

@src | select(timestamp, ip, uri, verb, response, bytes, referrer);
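
Example A sketch of the optional parameters in use, assuming a TLS-enabled cluster that issues OAuth2 tokens. The endpoint and index names are hypothetical placeholders, the token is left elided, and the boolean literal is assumed to be written as true:

@src = ES(
    uri: "https://es.example.com:9200"  // hypothetical endpoint
    ,index: "apache-*,nginx"  // hypothetical index names: comma separated, no whitespace, wildcard allowed
    ,credentials:{type:'token', token:'...'}
    ,_insecure_tls:true  // assumed literal; skips certificate chain and host validation
    ,_tasks_per_shard:2  // two retrieval tasks per shard replica instead of the default 1
);

@src;

Skipping certificate validation removes the protection against impersonated servers, so _insecure_tls is best left at its default outside test environments.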

Search Predicate Delegation

When a query against Elasticsearch uses the filter command, the search predicates built with comparison operators are passed to the Elasticsearch engine for execution. In other words, Elasticsearch evaluates the predicates and returns only the matching results to SpectX.

Example Retrieve all records from the index apache where the response size exceeds 300 bytes:

@src = ES(
    uri: "http://127.0.0.1:9200"
    ,index: "apache"
);

@src | filter(bytes > 300);

Elasticsearch Query String Support

You can have the Elasticsearch engine apply a Query String when retrieving data. Using the ES_QUERY function in a filter command instructs the SpectX query optimizer to pass the specified query string to the Elasticsearch engine for execution.

Example Retrieve all records from the index apache that contain the words “login” or “logout” in the uri field:

@src = ES(
    uri: "http://127.0.0.1:9200"
    ,index: "apache"
);

@src | filter(ES_QUERY(uri, "login OR logout"));

Multivalue Fields

Elasticsearch has no dedicated array type: by default, any field can contain zero or more values (https://www.elastic.co/guide/en/elasticsearch/reference/7.0/array.html). Internally, Elasticsearch therefore treats every field as an array, and a single-value field is represented as a single-member array.

Always mapping fields to the SpectX ARRAY type would make queries awkward to write, since single values predominate. Hence SpectX exposes each Elasticsearch index field as two SpectX fields:

  • a single-value field of a concrete type (for instance LONG), named after the corresponding ES field. It always contains the first element of the Elasticsearch field array.
  • a hidden multivalue field, an ARRAY of the concrete type (e.g. ARRAY(LONG)), named after the corresponding ES field with _ prepended and s appended, e.g. _customers for customer.

By default, only the single-value fields appear in the result set. Multivalue fields must be queried explicitly.

Example Select the single-value field ip together with its hidden multivalue counterpart _ips:

@src = ES(
    uri: "http://127.0.0.1:9200"
    ,index: "apache"
);

@src | select(ip, _ips);

Handling Large Responses

SpectX retrieves results from Elasticsearch in batches of up to 512 MB, each containing up to 10 000 documents. These defaults can be adjusted to optimize queries over large documents using the following parameters:

  • _response_size_limit - maximum size of a response from Elasticsearch. Units: ‘G’ - gigabytes, ‘M’ - megabytes, ‘K’ - kilobytes. Default 512M.
  • _docs_per_batch - the number of documents to retrieve per request (batch). Default 10000.

Example Raise the response size limit to 1 gigabyte:

@src = ES(
    uri: "http://127.0.0.1:9200"
    ,index: "apache"
    ,_response_size_limit:1G  // set the response size limit to 1 GB
);

@src;
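
Example A sketch of lowering both limits, for instance when documents are small and steadier memory use is preferred. The values are illustrative only, and it is assumed here that the two parameters can be combined in a single ES call:

@src = ES(
    uri: "http://127.0.0.1:9200"
    ,index: "apache"
    ,_response_size_limit:256M  // cap each response at 256 megabytes
    ,_docs_per_batch:2000  // request 2000 documents per batch instead of the default 10000
);

@src;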

Protecting the Confidentiality of Credentials

To protect the confidentiality of Elasticsearch access credentials, query scripts containing them should be kept in the user's private directory (i.e. under /user/).

When such query scripts are meant to be shared between users, SpectX recommends that admins declare the stream in a separate script file that each analysis script can call. The stream declaration must be placed under the /system/ directory or protected by Data Access Control from viewing while still allowing execution.

Example:

The script querying the index employeeTable, saved as /system/employeeTable.sx:

ES(
    uri: "http://127.0.0.1:9200"
    ,index: "employeeTable"
    ,credentials:{type:'basic', user:'elastic', password:'test'}
);

An analysis script in the user directory calls it:

@[/system/employeeTable.sx] | select(name, id);