Elasticsearch

SpectX can read data directly from an Elasticsearch database using the ES command:

ES '('   uri::STRING
           [, index::STRING ]
           [, credentials '{' type::STRING, ... '}' ]
           [, _insecure_tls::BOOLEAN ]
           [, _tasks_per_shard::INTEGER ]
        ')'

where:

  • uri - Elasticsearch endpoint root URI.

  • index - index or indexes to read from, comma-separated (no whitespace); wildcards are allowed. When omitted, ES outputs the list of indexes.

  • _geopoint_equality_threshold - distance within which two geopoints are considered equal; default 1 km. Optional.

  • _insecure_tls - whether to skip server certificate chain and host name validation; default false. Optional.

  • _tasks_per_shard - number of retrieval tasks to create per replica of each index shard; default 1. Optional.

  • credentials - authentication attributes for different schemes:

    • credentials:{type:'basic', user::STRING, password::STRING}
    • credentials:{type:'xpack', user::STRING, password::STRING}
    • credentials:{type:'token', token::STRING} - OAuth2 bearer token obtained via the Elasticsearch Get token API
    • credentials:{type:'aws', accessKeyId::STRING, secretKey::STRING, region::STRING} - AWS IAM user credentials for accessing Amazon Elasticsearch Service
    • credentials:{type:'ec2'} - use when accessing Amazon Elasticsearch Service from an EC2 instance with an IAM role (credentials are retrieved from the instance metadata)

When Elasticsearch is configured to accept anonymous requests, credentials can be omitted.
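The basic and token credential types correspond to standard HTTP authentication schemes. The Python sketch below shows the Authorization headers those schemes use; the helper name auth_header is hypothetical, it is an assumption that SpectX sends exactly these headers, and the xpack, aws and ec2 types are not covered.

```python
import base64


def auth_header(credentials: dict) -> dict:
    """Hypothetical helper: HTTP Authorization header for 'basic' or 'token' credentials.

    Assumption: mirrors standard HTTP schemes, not SpectX's actual implementation.
    """
    kind = credentials.get("type")
    if kind == "basic":
        # HTTP Basic authentication: base64 of "user:password"
        raw = f"{credentials['user']}:{credentials['password']}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    if kind == "token":
        # OAuth2 bearer token, sent as-is
        return {"Authorization": "Bearer " + credentials["token"]}
    raise ValueError(f"credentials type not covered by this sketch: {kind!r}")


print(auth_header({"type": "basic", "user": "elastic", "password": "changeme"}))
```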

Example 1. Query an Elasticsearch instance running on localhost in anonymous mode, reading from the index apache, which contains Apache web server access log entries:

@src = ES(
    uri: "http://127.0.0.1:9200"
    ,index: "apache"
);

@src
 .select(timestamp, ip, uri, verb, response, bytes, referrer);

Search Predicate Delegation

When querying Elasticsearch with the filter command (or an SQL-style WHERE clause), search predicates built with comparison operators are passed to the Elasticsearch engine for execution. In other words, Elasticsearch executes the search using these predicates and passes the results back to SpectX.

Example 2. Retrieve all records from index apache where the response size exceeds 300 bytes.

@src = ES(
    uri: "http://127.0.0.1:9200"
    ,index: "apache"
);

@src
 .filter(bytes > 300);

/* or alternatively using SQL style:
SELECT * FROM @src WHERE bytes > 300;
*/
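To illustrate what delegation means, the predicate in Example 2 can be expressed as an Elasticsearch Query DSL range query. The Python sketch below builds such a request body; the helper comparison_to_es is hypothetical, and the exact query SpectX generates is not documented here, so treat this as an approximation rather than SpectX's actual translation.

```python
import json

# Comparison operators mapped to Elasticsearch range-query parameters.
RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}


def comparison_to_es(field: str, op: str, value) -> dict:
    """Build a Query DSL body for `field op value` (hypothetical helper)."""
    if op not in RANGE_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    clause = {"range": {field: {RANGE_OPS[op]: value}}}
    # A bool filter matches documents without computing relevance scores.
    return {"query": {"bool": {"filter": [clause]}}}


print(json.dumps(comparison_to_es("bytes", ">", 300)))
```

With the bytes > 300 predicate this prints {"query": {"bool": {"filter": [{"range": {"bytes": {"gt": 300}}}]}}}, a body Elasticsearch can evaluate on its own side before returning matching documents.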

Elasticsearch Query String Support

You can make the Elasticsearch engine apply a query string when retrieving data. Using the ES_QUERY function in a filter command (or an SQL-style WHERE clause) instructs the SpectX query optimizer to pass the specified query string to the Elasticsearch engine for execution.

Example 3. Retrieve all records from index apache that contain the words “login” or “logout” in the uri field.

@src = ES(
    uri: "http://127.0.0.1:9200"
    ,index: "apache"
);

@src
 .filter(ES_QUERY(uri, "login OR logout"));

/* or alternatively using SQL style:
SELECT * FROM @src WHERE ES_QUERY(uri, "login OR logout");
*/
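The ES_QUERY call in Example 3 corresponds roughly to an Elasticsearch query_string query. In the Python sketch below it is an assumption that the field argument acts as default_field; the helper names are hypothetical, and the request to the index's _search endpoint is only built, never sent.

```python
import json
import urllib.request


def es_query_to_dsl(field: str, query_string: str) -> dict:
    """Query DSL body roughly equivalent to ES_QUERY(field, query_string) (hypothetical)."""
    return {
        "query": {
            "query_string": {
                "default_field": field,  # field searched when the query names none
                "query": query_string,   # Lucene-style syntax, e.g. "login OR logout"
            }
        }
    }


def build_search_request(uri: str, index: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the index's _search endpoint."""
    return urllib.request.Request(
        f"{uri.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request(
    "http://127.0.0.1:9200", "apache", es_query_to_dsl("uri", "login OR logout")
)
print(req.get_method(), req.full_url)
```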

Multivalue Fields

In Elasticsearch, there is no dedicated array type: any field can contain zero or more values by default (https://www.elastic.co/guide/en/elasticsearch/reference/7.0/array.html). In effect, Elasticsearch treats every field as an array; a single-value field, for instance, is represented as a single-member array.

Always converting fields to SpectX ARRAY would make writing queries cumbersome, given that single values predominate. Hence SpectX exposes each Elasticsearch index field as two SpectX fields:

  • a single-value field of the concrete type (for instance LONG), named after the corresponding ES field. It always contains the first element of the Elasticsearch field array.
  • a hidden multivalue field of type ARRAY of the concrete type, e.g. ARRAY(LONG), named after the corresponding ES field with _ prepended and s appended, e.g. _customers for customer.

By default only single-value fields appear in the result set. Multivalue fields must be queried explicitly.
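The naming rule and the first-element rule can be sketched in Python as below. This only mimics the described behaviour on a plain Elasticsearch _source document; the function names are hypothetical, and returning None for an empty array is an assumption (the text does not say what the single-value field holds in that case).

```python
def multivalue_name(field: str) -> str:
    """Name of the hidden multivalue field: '_' prepended, 's' appended."""
    return "_" + field + "s"


def expose_fields(source: dict) -> dict:
    """Split each ES field into a single-value and a multivalue field.

    A scalar is treated as a one-element array. Assumption: an empty
    array yields None for the single-value field.
    """
    row = {}
    for name, value in source.items():
        values = value if isinstance(value, list) else [value]
        row[name] = values[0] if values else None  # first element only
        row[multivalue_name(name)] = values        # full array
    return row


print(expose_fields({"ip": ["10.0.0.1", "10.0.0.2"], "response": 200}))
```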

Example 4. Select the single-value field ip together with its multivalue counterpart _ips.

@src = ES(
    uri: "http://127.0.0.1:9200"
    ,index: "apache"
);

@src
 .select(ip, _ips);

Handling Large Responses

SpectX retrieves results from Elasticsearch in batches of at most 512MB and at most 10 000 documents each. These defaults can be adjusted to optimize querying large documents with the following parameters:

  • _response_size_limit - maximum size of a response from Elasticsearch. Units: ‘G’ - gigabytes, ‘M’ - megabytes, ‘K’ - kilobytes. Default 512M.
  • _docs_per_batch - the number of documents to retrieve per request (batch). Default 10000.
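A back-of-the-envelope calculation shows when the defaults become a problem: a full batch of 10 000 documents fits into 512M only if documents average roughly 53 000 bytes or less. The Python sketch below does this arithmetic; the helper names are hypothetical, binary multipliers (1K = 1024 bytes) are an assumption, and HTTP and JSON overhead is ignored.

```python
# Assumption: binary multipliers (1K = 1024 bytes); the source does not say.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(limit: str) -> int:
    """Convert a _response_size_limit value such as '512M' or '1G' to bytes."""
    number, unit = limit[:-1], limit[-1].upper()
    if unit not in UNITS:
        raise ValueError(f"unknown size unit in {limit!r}")
    return int(number) * UNITS[unit]


def max_avg_doc_size(limit: str = "512M", docs_per_batch: int = 10_000) -> int:
    """Largest average document size (bytes) for which a full batch fits the limit."""
    return parse_size(limit) // docs_per_batch


def docs_per_batch_for(avg_doc_bytes: int, limit: str = "512M") -> int:
    """A _docs_per_batch value that keeps a full batch within the limit (at least 1)."""
    return max(1, parse_size(limit) // avg_doc_bytes)


print(max_avg_doc_size())           # 53687
print(docs_per_batch_for(200_000))  # 2684
```

For documents averaging 200 000 bytes, lowering _docs_per_batch to at most 2684 keeps a full batch under the default limit; raising _response_size_limit to 1G roughly doubles that figure.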

Example 5. Raise the response size limit to 1 GB.

@src = ES(
    uri: "http://127.0.0.1:9200"
    ,index: "apache"
    ,_response_size_limit:1G  // raise the response size limit to 1 GB
);

@src
 .select(*);