SpectX Base

Installation

SpectX requires Oracle’s Java Runtime Environment (JRE) 1.8, which is available from the Oracle download site. Using OpenJDK is not recommended, as it results in reduced performance. Check your Java version first, and install or upgrade if needed:

$ java -version
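
As a rough aid, the version check can be scripted. The sketch below assumes the usual banner format of java -version (a first line such as java version "1.8.0_201", written to standard error); the helper name check_java_banner is made up for this illustration.

```shell
# Classify the first line of "java -version" output.
# Prints "ok", "openjdk" (runs, but reduced performance is expected)
# or "unsupported". check_java_banner is a hypothetical helper name.
check_java_banner() {
    case "$1" in
        openjdk*'"1.8.'*)  echo "openjdk" ;;
        *'version "1.8.'*) echo "ok" ;;
        *)                 echo "unsupported" ;;
    esac
}

# java -version writes its banner to standard error, hence 2>&1.
if command -v java >/dev/null 2>&1; then
    banner=$(java -version 2>&1 | head -n 1)
    echo "Java check: $(check_java_banner "$banner")"
else
    echo "Java check: java executable not found in PATH"
fi
```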

Download SpectX and follow the installation steps below. SpectX can be run with standard user privileges.

  1. In a terminal, unpack the tarball and change to the unpacked directory spectx/:
$ tar -zxf spectx-v{version}.tar.gz
$ cd spectx

During installation the following directories are created:

./
└── spectx/
    ├── bin/
    │   ├── spectx-init.d.sh        : script for launching SpectX as service
    │   ├── spectx.sh               : script for launching SpectX from commandline
    │   ├── spectx.common.sh
    │   └── spectx.env.sh.default   : default env variable definitions template
    ├── conf/
    │   └── sx.conf.default         : default configuration template
    ├── lib/                        : contains SpectX binaries
    └── tools/                      : contains sxgzip and source agent utilities install packages

First Run

$ bin/spectx.sh

During the first run a few additional directories and files are created:

./
└── spectx/
    ├── bin/
    │   ├── ...
    │   └── spectx.env.sh   : copied from spectx.env.sh.default template. Contains environment variable definitions.
    ├── conf/
    │   ├── ...
    │   └── sx.conf         : copied from sx.conf.default template. Contains actually used SpectX configuration values.
    ├── ...
    ├── data/               : contains SpectX user data (resource tree)
    ├── pudata/             : contains SpectX processing data
    └── sxwgui.db           : SQLite database containing SpectX user role definitions

All configuration customizations, including changes to how SpectX is launched, should be made in spectx.env.sh and sx.conf. These files are not overwritten or deleted during future upgrades.

NB! Do not rename the SpectX configuration file. SpectX looks for a configuration file only under the name sx.conf.

Upgrading

Download the SpectX installation package from https://www.spectx.com/#signup and follow the installation instructions to unpack the tarball. You can safely unpack into the existing installation directory: the active configuration file and launch scripts created during the first run, as well as all other custom files, are preserved.
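
A minimal upgrade sketch follows. It assumes that SpectX runs as a daemon through the init.d script and that stopping it before unpacking is desirable; neither is prescribed here. The function name and the example paths are hypothetical.

```shell
# Hypothetical helper: upgrade the installation found in "$2/spectx"
# from the tarball given as "$1".
upgrade_spectx() {
    tarball=$1
    parent=$2
    init="$parent/spectx/bin/spectx-init.d.sh"

    # Stop the daemon if the init.d script is present; a non-zero status
    # (for example, server not running) is not treated as fatal here.
    if [ -x "$init" ]; then "$init" stop || true; fi

    # The tarball unpacks into spectx/, so extract it in the parent directory.
    tar -zxf "$tarball" -C "$parent" || return 1

    # Start the daemon again with the upgraded binaries.
    if [ -x "$init" ]; then "$init" start; fi
}

# Example invocation (paths are placeholders):
# upgrade_spectx spectx-v{version}.tar.gz /opt
```

Files that exist only in the installation directory, such as conf/sx.conf, are left alone because the package ships only the *.default templates.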

Startup scripts

The SpectX server process can be started manually from the command line with the run script:

$ ./bin/spectx.sh [ARGS]

where optional arguments ARGS are:

-h,--help            displays this help and exits
-t,--testconf        validates configuration, prints current conf to stdout and exits
-v,--verbose         verbosity level of debug logging, use multiple times to increase the level
-q,--quiet           disables logging to standard output

Alternatively, you can start SpectX as a daemon by executing the init.d script:

$ bin/spectx-init.d.sh ARGS

where arguments ARGS are:

  • start - to start server process
  • stop - to stop running server process
  • restart - to stop and start server process
  • status - to check if server process is running
  • configtest - to check if current configuration file syntax is valid. Prints current config content to stdout.

The logging directory path for standard output and error can be set in the configuration section of the init.d script:

# config section

readonly SPECTX_STD_LOG_DIR="${SPECTX_HOME}"/logs
readonly SPECTX_PID_DIR="${SPECTX_HOME}"/bin

# end of config section

Both startup scripts use the following environment variables:

  • JAVA_HOME - specifies the path to the JRE/JDK 1.8 home directory. Use it for local Java installations (download and extract Java to a folder of your choice and point JAVA_HOME to it). If not set, the system default installation locations are used.
  • JAVA_OPTS - specifies a space-separated list of Java system property-value pairs (each in the form -Dproperty=value).
  • SPECTX_HOME - specifies the path to the SpectX home directory (a writable directory containing the /lib and /conf subdirectories with their respective contents). If not set explicitly, the directory containing the execution script is used as SPECTX_HOME.

These variables are set in the include script bin/spectx.env.sh.

The init.d script also uses the SPECTX_LAUNCHER_ARGS environment variable, which is expected to contain additional command-line arguments for SpectX. The variable is declared in bin/spectx.env.sh with an empty value; modify it, for example, to change the verbosity level of SpectX debug logs.
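
For illustration, customizations in bin/spectx.env.sh might look like the sketch below. All values are placeholders, and whether the shipped template uses export or plain assignments is not stated here, so follow the style of your own copy.

```shell
# Sketch of bin/spectx.env.sh customizations (placeholder values).

# Locally extracted JRE 1.8; leave unset to use the system default.
JAVA_HOME=/opt/java/jre1.8.0_201

# Space-separated -Dproperty=value pairs passed to the JVM.
JAVA_OPTS="-Dfile.encoding=UTF-8 -Djava.io.tmpdir=/var/tmp"

# Extra arguments for the daemon: each -v raises debug verbosity one level.
SPECTX_LAUNCHER_ARGS="-v -v"
```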

Return values:

0 successful
1 incorrect script arguments
2 pid file not found, assuming server process is not running
3 server process failed starting
4 server process is dead but pid file exists
5 java executable cannot be found
6 pid file cannot be written
7 server process stopped but pid file cannot be deleted
8 jar cannot be found or is not readable
9 conf file cannot be found or is not readable
10 log dir cannot be found or is not writable
127 error in processing file paths

Any other value indicates a Java server process failure.
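
When the startup scripts are called from monitoring or deployment tooling, the status codes can be translated back into the messages listed above. The following is a sketch only; the function name is hypothetical.

```shell
# Map an exit status of the startup scripts to its documented meaning.
describe_spectx_status() {
    case "$1" in
        0)   echo "successful" ;;
        1)   echo "incorrect script arguments" ;;
        2)   echo "pid file not found, assuming server process is not running" ;;
        3)   echo "server process failed starting" ;;
        4)   echo "server process is dead but pid file exists" ;;
        5)   echo "java executable cannot be found" ;;
        6)   echo "pid file cannot be written" ;;
        7)   echo "server process stopped but pid file cannot be deleted" ;;
        8)   echo "jar cannot be found or is not readable" ;;
        9)   echo "conf file cannot be found or is not readable" ;;
        10)  echo "log dir cannot be found or is not writable" ;;
        127) echo "error in processing file paths" ;;
        *)   echo "java server process failure (status $1)" ;;
    esac
}

# Usage sketch:
# bin/spectx-init.d.sh status; describe_spectx_status $?
```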

Configuration

Configuration items in sx.conf follow Java properties format. Values specified in config file override default values used by the server. Any change in the configuration file requires server restart for the change to take effect.
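
Because the file is plain key=value text, overrides are easy to inspect from a shell. The helper below is a simplified sketch: it handles only lines of the form key=value (not the other separators or line continuations that the Java properties format allows), and the name get_conf is made up for this illustration.

```shell
# Print the value of property "$1" from properties file "$2".
# Simplified: only "key=value" lines are handled; the last occurrence wins.
get_conf() {
    # Escape dots so that they match literally in the sed pattern.
    key=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

# Usage sketch (path is a placeholder):
# get_conf wgui.port "${SPECTX_HOME}/conf/sx.conf"
```

An empty result means the key is not set in the file, in which case the server default applies.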

Local File System access

  • engine.fs_access - enables or disables file system access using the file:// protocol. When enabled, local file system access is governed by the engine.fs_unmanaged_access setting. When disabled, file system access from SpectX is disabled completely. Default: disabled.
  • engine.fs_unmanaged_access - enables or disables unmanaged file system access using the file:// protocol. When enabled, all SpectX users can use the file:// protocol to access the local file system within the rights of the local machine user under which SpectX is executed. When disabled, the file system can be accessed only through datastores defined in /system/datastores. (Note that defining datastores in /system/datastores is allowed only for users with the admin role.) Default: disabled.

Limiting usage of CPU cores

  • engine.pu_count - integer value setting the maximum number of CPU cores SpectX can use for processing queries. The value cannot exceed the number of real CPU cores in the machine or the number of CPU cores allowed by the SpectX Base license. The default value 0 instructs SpectX to use the maximum number of CPU cores available (limited by the real core count and by the license).

Directories

  • sx.user_data.dir - specifies name and location of resource data head directory. Default: ${SPECTX_HOME}/data
  • sx.pu_data.dir - specifies name and location of processing data head directory. Default: ${SPECTX_HOME}/pudata
  • sx.pu_data.temp.dir - specifies name and location of directory for temporary data. Default: ${SPECTX_HOME}/pudata/temp
  • sx.pu_data.store.dir - specifies name and location of directory for persisted data. Default: ${SPECTX_HOME}/pudata/store
  • sx.pu_data.inetdb.dir - specifies name and location of directory for temporary geoip data. Default: ${SPECTX_HOME}/pudata/inetdb
  • sx.pu_data.cache.dir - specifies name and location of directory for caching source data. Default: ${SPECTX_HOME}/pudata/cache
  • sx.db.dir - specifies name and location where sxwgui.db is kept. Default: ${SPECTX_HOME}
  • sx.pu_data.cache.enabled - enables or disables source data caching. Default enabled.
  • sx.pu_data.cache.max_size - specifies max disk space allocated for source data caching. Units: ‘G’ - gigabytes, ‘M’ - megabytes, ‘K’ - kilobytes. Default value: 64G.

Web GUI server parameters

  • wgui.host - specifies hostname or ip-address of interface where web GUI server is listening. Default: 127.0.0.1
  • wgui.port - specifies listening port. Default: 8388
  • wgui.maxReqHeaderSize - maximum size of a request header in bytes. This is the size of the whole request header containing all header lines, not of an individual HTTP header line. Default value is 8192 bytes.

  • wgui.dataBrowser.preview_size - specifies the amount of bytes fetched for file preview in Data Browser. Default: 16Kb
  • wgui.dataBrowser.max_items_to_fetch - specifies the max number of items for listing in Data Browser. When there are more items, a warning is displayed. Default: 4000
  • wgui.dataBrowser.download.enabled - specifies if download of a blob is enabled in Data Browser. Default: true
  • wgui.remoteIPAddressHeader - specifies the name of an HTTP header containing the client’s remote IP address. The header name is case insensitive. Use this only when SpectX runs behind a trusted frontend server which is configured to forward the real client remote address to the backend.

  • wgui.log.dir - path to existing writable directory to write server logs to. If not specified then filesystem logging is disabled.

  • wgui.log.rotate - boolean parameter enabling automatic daily log rotation in the log directory. Default value is true.

  • wgui.log.tz - time zone ID (as defined in IANA Time Zone Database) to be used for creating log file names (when wgui.log.rotate = true) and timestamps in log file records. Default value is “UTC”.

Decompression

  • Parallel processing

    • query.parse.chunkSizeCompressed - size of compressed data split to be used by each parallel decompression task. Default is 16000000 (16 Mb).
    • query.parse.<parallel_compression_type>.ptSize - size of a rolling plaintext buffer used for decompression of chunks of blobs compressed with the corresponding utility (specified by <parallel_compression_type>, which is one of sxgz, bzip2, pbzip2, pigz, pizz). Default is 64000000 (64 Mb).
    • query.parse.sxgz.blockSize - value of the --blocksize (-b) parameter of the sxgzip compression utility. The value is in bytes (see sxgzip help for details). If you have used different block sizes then the largest one should be used.
    • query.parse.pigz.blockSize - value of the --blocksize (-b) parameter of the pigz compression utility. The value is in KiB (see pigz help for details). If you have used different block sizes then the largest one should be used.
    • query.parse.pizz.blockSize - value of the --blocksize (-b) parameter of the pigz compression utility when used with the -z parameter. The value is in KiB (see pigz help for details). If you have used different block sizes then the largest one should be used.
    • query.parse.bzip2.blockSize - value of block size parameter -1 ... -9 of bzip2 or lbzip2 utilities. Allowed values are 1 ... 9 corresponding to 100 Kb to 900 Kb block size (see respective utility help for details). If you have used different values then the largest value should be used.
    • query.parse.pbzip2.blockSize - value of -b (block size) parameter of pbzip2 compression utility. The value is positive integer representing multiplier of 100 Kb block size (see pbzip2 help for details). If you have used different values then the largest value should be used.
  • Single-thread (non-parallel) processing of compressed blobs with defined size. In below configuration key names, <compression_type> is one of gz, bz2, lz4, xz, zz

    • query.parse.<compression_type>.readLen - size of a rolling buffer used for fetching compressed content of blobs compressed with corresponding method. Default is 16000000 (16 Mb).
    • query.parse.<compression_type>.ptSize - size of a rolling plaintext buffer used for decompression of blobs compressed with corresponding method. Default is 64000000 (64 Mb).
  • Processing of compressed blobs with undefined size, which eventually is single-threaded/non-parallel. In below configuration key names, <ulen_compression_type> is one of gz, bz2, lz4, xz, zz

    • query.parse.ulen.<ulen_compression_type>.readLen - size of a rolling buffer used for fetching compressed content of blobs compressed with corresponding method. Default is 16000000 (16 Mb).
    • query.parse.ulen.<ulen_compression_type>.ptSize - size of a rolling plaintext buffer used for decompression of blobs compressed with corresponding method. Default is 64000000 (64 Mb).
  • Batch processing of compressed blobs. In below configuration key names, <batch_compression_type> is one of gz, bz2, lz4, xz, zz

    • query.parse.batch.<batch_compression_type>.ptSize - size of a rolling plaintext buffer used for decompression of blobs compressed with corresponding method. Default is 2000000 (2 Mb).
    • query.parse.batch.<batch_compression_type>.maxSize - max size of a compressed batch, which is calculated as sum of sizes of blobs. Default is 16000000 (16 Mb).
    • query.parse.batch.<batch_compression_type>.maxBlobCount - max count of blobs in compressed batch of given type. Default is 32.

Plaintext

  • Parallel processing

    • query.parse.chunkSize - size of data split to be used by each parallel processing task. Default is 64000000 (64 Mb).
  • Processing of blobs with undefined size, which eventually is single-threaded/non-parallel

    • query.parse.ulen.pt.readLen - size of a rolling buffer used for fetching blob content. Default is 64000000 (16 Mb).
  • Batch processing

    • query.parse.batch.pt.maxSize - max size of a batch, which is calculated as sum of sizes of plaintext blobs. Default is 64000000 (64 Mb).
    • query.parse.batch.pt.maxBlobCount - max count of plaintext blobs in batch. Default is 32.

Geoip, ASN, MAC databases

To perform geoip, ASN and MAC manufacturer lookups, SpectX needs the respective databases. The following configuration items control how they are downloaded and updated in different environments. For example, if the host has direct Internet access, the databases can be updated directly from the suppliers’ websites. In closed environments the update location can be set to the local filesystem, leaving control over the update process entirely to the customer.

  • inetdb.geoip.resourceUrl - a http/https url or local filesystem path specifying MaxMind geoip database update location. Default value is: http://geolite.maxmind.com/download/geoip/database/GeoLite2-City-CSV.zip
  • inetdb.asn.resourceUrl - a http/https url or local filesystem path specifying the MaxMind ASN IPv4 database update location. Default value is: http://download.maxmind.com/download/geoip/database/asnum/GeoIPASNum2.zip
  • inetdb.asnv6.resourceUrl - a http/https url or local filesystem path specifying the MaxMind ASN IPv6 database update location. Default value is: http://download.maxmind.com/download/geoip/database/asnum/GeoIPASNum2v6.zip
  • inetdb.macmanuf.resourceUrl - a http/https url or local filesystem path specifying the MAC manufacturers database update location. Default value is: http://update.spectx.com/mac_manuf/mac_manuf.tsv.gz
  • inetdb.*.updateInterval - specifies the interval at which SpectX checks for updates of the respective databases. The value is given in one of the following time period units: ms, sec, min, hour, day, week. Default value: 1 day.

User Authentication

SpectX implements separate HTTP endpoints for the end-user WebGUI and for the API, and they use different authentication methods. This section discusses user authentication methods; details of API authentication can be found in section SpectX API Authentication.

The default mode of user authentication is local password-based authentication, which requires no configuration. Four additional authentication methods are supported: Integrated Windows Authentication, SAML SSO, Google OAuth and pass-through authentication.

Integrated Windows Authentication

SpectX supports only the Kerberos provider in the SPNEGO negotiation scheme of Integrated Windows Authentication in an Active Directory domain.

When enabled, Integrated Windows Authentication is displayed as an alternative login method on the SpectX login screen. Choosing this method authenticates the current Windows user through a cryptographic exchange with the SpectX server. The SpectX machine does not have to be part of the Windows domain.

To enable Integrated Windows Authentication in SpectX, you have to specify values for the following three configuration parameters:

  • wgui.spnegoAuth.realm - the Kerberos realm (the domain name in the Active Directory). Although the value is case-insensitive, it is always converted to upper case internally.

  • wgui.spnegoAuth.keytab - path to Kerberos keytab file containing principal credentials of SpectX service account in Active Directory. Refer to service account creation instructions for acquiring the file.

  • wgui.spnegoAuth.redirectUri - the fully qualified URI of your SpectX WGUI instance or its frontend server. The hostname in the URI must match the one used in the service principal name set for the SpectX service account in Active Directory.

Other optional authentication-related parameters are:

  • wgui.spnegoAuth.name - the name of authentication scheme displayed on Login screen for this type of authentication (default is “with AD account”)

  • wgui.spnegoAuth.autoCreateAccount - boolean setting enabling automatic creation of user accounts in SpectX user database when they first log into SpectX with given authentication method. This feature is off by default.

  • wgui.spnegoAuth.autoCreateApiKey - boolean setting enabling automatic creation of SpectX API key for user accounts which get created automatically when they first log into SpectX with given authentication method. The setting is ignored if wgui.spnegoAuth.autoCreateAccount is not set to true. The default value for this setting is false.

When automatic user account creation is configured, each domain user can potentially log in to SpectX. To facilitate additional access control to SpectX you can set up an Active Directory group. Users entitled to access SpectX must be assigned membership to this group.

There are two approaches SpectX can use to verify authenticated user’s group membership.

The first is by matching a list of configured Active Directory group SIDs against the Active Directory group membership information transported in the Privilege Attribute Certificate (PAC) field of the Kerberos packet sent by the user’s browser during authentication.

The second is by connecting to a specified Active Directory LDAP service (using credentials from the keytab file) and executing queries with specified filters.

Both methods are optional and are disabled by default. If both are enabled, the second one is applied only if the first one fails to identify group membership for any reason.

To enable group membership verification using PAC field of a Kerberos packet, you have to specify the following configuration parameter:

  • wgui.spnegoAuth.groupSids - comma-separated list of security identifiers (SIDs) of the Active Directory groups the authenticated users must be members of to get access to SpectX. To identify a target group’s SID by its name, you may execute the following command on a Windows domain machine:

    dsquery group -name "<group_name>" | dsget group -sid
    

To enable group membership verification using LDAP queries the following configuration parameters must be appropriately specified:

  • wgui.spnegoAuth.ldap.url - the LDAP URL of the form ldap[s]://[host][:port][/dn]. The URL protocol may be ldap for plain connections or ldaps for TLS connections. The host must specify the fully qualified domain name (not the IP address) of a domain controller (default value is localhost), and the port must specify the LDAP/LDAPS listening port on the domain controller (defaults are 389 for ldap and 636 for ldaps). The dn specifies the root distinguished name of the base object of the LDAP search (default is empty); note that searches with an empty base usually result in errors in Active Directory, with the exception of searches in the global catalog (ports 3268/3269), which as a rule allows such queries. If any entry in dn contains spaces, these need to be %-encoded (replaced with %20).

    If the DNS service configured for the platform SpectX is running on is set up properly and provides information on LDAP servers in the network, the LDAP URL may contain only the protocol and dn parts; the ‘DC’ components of the latter are then used to obtain the DNS name of the LDAP server through service discovery. For example, given the URL ldaps:///OU=Users,DC=int,DC=company,DC=com, the DNS name ‘dc.int.company.net’ will be obtained by locating DNS SRV records for _ldap._tcp.int.company.com.

    Multiple LDAP servers may be specified by setting the wgui.spnegoAuth.ldap.url configuration parameter to a space-separated list of URLs. On initial connect, each of the servers is contacted in turn until one of them responds and a connection is established.

  • wgui.spnegoAuth.ldap.filter - the LDAP filter expression to use for the search. The expression may contain the variables {0} and {1}: {0} is substituted with the username part of the user principal name of the authenticated user, and {1} with the full user principal name. Thus, the first one can be used for searches on the sAMAccountName attribute, and the second one on the userPrincipalName attribute of the Active Directory schema. The user is considered successfully authorized when the LDAP result set returned for the query contains at least one result.
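
To make the two mechanisms above concrete, the sketch below derives the SRV record name from the DC components of a base DN and expands the {0} and {1} placeholders of a filter template. It only mimics the described behaviour for illustration and is not how SpectX implements it; both function names are hypothetical, and the sample user jdoe is invented.

```shell
# Derive the DNS SRV record name used for service discovery from the
# DC components of a base DN (hypothetical helper, illustration only).
dn_to_srv_name() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^[Dd][Cc]=//p' \
        | paste -s -d . - | sed 's/^/_ldap._tcp./'
}

# Substitute {0} (username part) and {1} (full user principal name) in a
# filter template. Assumes the name has no characters special to sed.
expand_filter() {
    upn=$2
    user=${upn%%@*}
    printf '%s\n' "$1" | sed -e "s/{0}/$user/g" -e "s/{1}/$upn/g"
}

dn_to_srv_name 'OU=Users,DC=int,DC=company,DC=com'
expand_filter '(&(objectClass=user)(sAMAccountName={0}))' 'jdoe@int.company.com'
```

The first call prints _ldap._tcp.int.company.com and the second prints (&(objectClass=user)(sAMAccountName=jdoe)). An expanded filter like this can be tried against the directory with any LDAP client before it is put into sx.conf.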

Other optional LDAP connection related parameters are:

  • wgui.spnegoAuth.krb5Conf - path to Kerberos configuration file krb5.conf. If this parameter is set its value becomes system-wide pointer to Kerberos configuration, and thus affects other components of SpectX requiring Kerberos for authentication (like HDFS data sources).

  • wgui.spnegoAuth.kdc - the name or IP address of a host running a KDC for the realm (Active Directory Domain Controller). An optional port number, separated from the hostname by a colon, may be included. If the name or address contains colons (for example, if it is an IPv6 address), enclose it in square brackets to distinguish the colon from a port separator. If a value for this parameter is specified, it overrides the KDC value specified in krb5.conf (if such a file is found) for the given Kerberos realm. If no value is specified, you will most likely need a correct krb5.conf for things to work properly. For instance, if you are going to use HDFS data sources, you need a Kerberos configuration file for them anyway, so leave wgui.spnegoAuth.kdc unset and specify the KDC for the Windows Kerberos realm in krb5.conf.

  • wgui.spnegoAuth.ldap.connect.timeout - the timeout for connecting to the server. Default value is 30s.

  • wgui.spnegoAuth.ldap.read.timeout - the timeout for reading response from server. Default value is 1min.

Refer to the Integrated Windows Authentication configuration guide for a description of the steps to be performed to obtain the required configuration option values.

Example SpectX configuration excerpt, assuming that the service principal HTTP/spectx.int.company.net@INT.COMPANY.COM has been created for the SpectX service account in Active Directory in the domain int.company.com, that the SpectX instance is addressable by the URL http://spectx.int.company.net:8388/, and that membership is verified in the group CN=SpectX Users,OU=Groups,DC=int,DC=company,DC=com with SID S-1-5-21-2456665656-1224920878-1826986580-1111:

...
    # users authentication
    wgui.spnegoAuth.realm=int.company.com
    wgui.spnegoAuth.keytab=${SPECTX_HOME}/conf/spectx.keytab
    wgui.spnegoAuth.redirectUri=http://spectx.int.company.net:8388/
    wgui.spnegoAuth.autoCreateAccount=true
    # users group membership check in Kerberos packet
    wgui.spnegoAuth.groupSids=S-1-5-21-2456665656-1224920878-1826986580-1111
    # users group membership check in AD, gets executed if the above fails
    wgui.spnegoAuth.kdc=dc.int.company.net
    wgui.spnegoAuth.ldap.url=ldap://dc.int.company.net/OU=Users,DC=int,DC=company,DC=com
    wgui.spnegoAuth.ldap.filter=(&(objectClass=user)(sAMAccountName={0})(memberOf=CN=SpectX Users,OU=Groups,DC=int,DC=company,DC=com))
...

Once the configuration is set, restart the SpectX server and you’re done.

For login attempts to be successful, user identities (the Active Directory user principal names in lower case) must be registered in the SpectX user database. This is needed to determine the user rights upon successful authentication. If SpectX does not find a matching user in its user database, the user interaction flow falls back to the default authentication scheme via the login screen. So if automatic user account creation is not enabled, remember to register Active Directory user principal names in the SpectX user database (see Managing Users and Groups).

SAML SSO

When enabled, SAML SSO is displayed as an alternative login method on the SpectX login screen. Choosing this method starts a service provider-initiated SAML 2.0 single sign-on flow: SpectX issues an explicit authentication request to the identity provider (e.g. ADFS) and then processes the authentication response. The user identity (the SAML NameID attribute from the authentication response) must be registered as a SpectX user to determine the user rights upon successful authentication. If SpectX does not find a matching user in its user database and automatic creation of user accounts is disabled, the user interaction flow falls back to the default authentication scheme via the login screen.

SpectX can be configured to represent multiple different SAML service providers to support environments with more than one SAML identity provider.

SpectX does not expose an HTTPS interface at the moment, so if the identity provider(s) in the target environment require HTTPS connectivity when calling the service provider, the SpectX instance should be put behind an SSL/TLS terminating frontend server (e.g. apache, nginx).

To enable SAML SSO authentication in SpectX, first specify a comma-separated list of string identifiers for the SAML service provider entities SpectX must represent in the configuration parameter wgui.samlAuth.ids. For environments requiring only one service provider instance, the list contains only one identifier. Syntactically, each identifier may contain only digits, latin letters, dot and dash characters. These identifiers are used for:

  • distinguishing between enabled SAML service providers, and composing their assertion and metadata URLs
  • composing SpectX user name. If the NameID value of the SAML authentication response processed by corresponding service provider does not contain ‘@’ symbol, the value of the identifier is appended to the NameID value.
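
The syntax rule for identifiers can be checked before the server is restarted. The sketch below validates a candidate value for wgui.samlAuth.ids; it assumes that no whitespace is allowed around the commas, which is not stated here, and the function name is hypothetical.

```shell
# Check a comma-separated wgui.samlAuth.ids value: every identifier must be
# non-empty and consist only of digits, latin letters, dots and dashes.
valid_saml_ids() {
    case "$1" in
        ''|,*|*,|*,,*)     return 1 ;;  # empty list or empty identifier
        *[!A-Za-z0-9.,-]*) return 1 ;;  # character outside the allowed set
        *)                 return 0 ;;
    esac
}

if valid_saml_ids 'Asia,Europe'; then
    echo "identifier list is well-formed"
fi
```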

For each identifier <id>, the following configuration parameters must be specified to enable corresponding SAML service provider interface:

  • wgui.samlAuth.<id>.spEntityId - the globally unique SAML SP EntityId (Relying party trust identifier in ADFS). The SAML specification recommends that it be an SP URL containing its own domain name to identify itself. You may specify a URL of the SpectX WGUI and append the value of the identifier to it.

  • wgui.samlAuth.<id>.idpSsoServiceUrl - URL of SAML identity provider single sign-on service. For ADFS, it is typically https://<adfs-host>/adfs/ls/.

  • wgui.samlAuth.<id>.idpTokenSigningCertFile - Path to file with identity provider’s token signing certificate. Both DER and BER formats are supported.

  • wgui.samlAuth.<id>.redirectUri - The fully qualified URI of your SpectX instance or its frontend server.

Other optional parameters are:

  • wgui.samlAuth.<id>.name - the label of authentication scheme displayed on Login screen for this type of authentication (default is “as <id> user”)

  • wgui.samlAuth.<id>.autoCreateAccount - boolean setting enabling automatic creation of user accounts in SpectX user database when they first log into SpectX with given authentication method. This feature is off by default. If it is on, then user’s full name is obtained from the following user metadata attributes (“claims”) of a SAML authentication response:

    • either Name attribute, if present
    • or by combining values of givenName and surName attributes, if present.
  • wgui.samlAuth.<id>.autoCreateApiKey - boolean setting enabling automatic creation of SpectX API key for user accounts which get created automatically when they first log into SpectX with given authentication method. The setting is ignored if wgui.samlAuth.<id>.autoCreateAccount is not set to true. The default value for this setting is false.

When automatic user account creation is configured, each user successfully authenticated by identity provider can potentially log in to SpectX. To facilitate additional access control to SpectX you can configure list of group names which will be matched against values of Group metadata attribute (“claim”) of a SAML authentication response. If at least one match is found, the user will be granted access to SpectX, and will be denied otherwise.

To enable claim group matching, you have to specify the following configuration parameter:

  • wgui.samlAuth.<id>.allowedGroups - comma-separated list of group names to be matched against the Group metadata attribute values of the SAML authentication response. Note that an individual name cannot contain commas, and any backslash (‘\’) symbol in it must be escaped by another backslash. Matching is case-insensitive.
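
Group names of the form domain\group need their backslash doubled in sx.conf. A small sketch of that escaping step (the function name is hypothetical):

```shell
# Escape a group name for wgui.samlAuth.<id>.allowedGroups by doubling
# every backslash. Names containing commas cannot be expressed at all.
escape_group_name() {
    printf '%s\n' "$1" | sed 's/\\/\\\\/g'
}

escape_group_name 'asia\SpectX Users'
```

The call prints asia\\SpectX Users, which is the form to write into the configuration file.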

Example SpectX configuration excerpt, assuming the company has two ADFS instances in separate regions, each authenticating its own users, who access the SpectX frontend server at the URL https://spectx.int.company.net/ and are granted access only if they are members of the “SpectX Users” group in their own domain:

...
    wgui.samlAuth.ids=Asia,Europe

    wgui.samlAuth.Asia.spEntityId=https://spectx.int.company.net/Asia
    wgui.samlAuth.Asia.redirectUri=https://spectx.int.company.net/
    wgui.samlAuth.Asia.idpSsoServiceUrl=https://dc.asia.int.company.net/adfs/ls/
    wgui.samlAuth.Asia.idpTokenSigningCertFile=${SPECTX_HOME}/conf/dc.asia.int.company.net.signing.cer
    wgui.samlAuth.Asia.allowedGroups=asia\\SpectX Users

    wgui.samlAuth.Europe.spEntityId=https://spectx.int.company.net/Europe
    wgui.samlAuth.Europe.redirectUri=https://spectx.int.company.net/
    wgui.samlAuth.Europe.idpSsoServiceUrl=https://dc.europe.int.company.net/adfs/ls/
    wgui.samlAuth.Europe.idpTokenSigningCertFile=${SPECTX_HOME}/conf/dc.europe.int.company.net.signing.cer
    wgui.samlAuth.Europe.allowedGroups=europe\\SpectX Users

Once the configuration is set, restart the SpectX server. Next, register the configured service providers on the identity provider side. For each provider you need the following two parameters:

  • SAML Assertion Consumer Endpoint URL. Technically, it is composed by SpectX for each provider as follows:

    <redirectUri>/AUTH/v1.0/saml/login/<id>.

    This can be obtained by finding corresponding “SAML Auth#<id> spAssertionUrl” in the SpectX StatusPage.

  • service provider EntityId, which is the value specified for wgui.samlAuth.<id>.spEntityId parameter in configuration.

Alternatively, these settings can be imported into identity provider configuration using metadata URL, which is composed as

<redirectUri>/AUTH/v1.0/saml/metadata/<id>.

Another method to get it is to find entry “SAML Auth#<id> spMetadataUrl” in a page under StatusPage menu in SpectX WGUI when logged in with administrator privileges.
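
For quick reference, both URLs can be composed from the redirectUri and the identifier as sketched below. The sketch assumes that a trailing slash of redirectUri is dropped before the path is appended, which is not confirmed here; treat the values shown on the StatusPage as authoritative. The function name is hypothetical.

```shell
# Compose a SAML endpoint URL.
#   $1 - redirectUri, $2 - service provider identifier,
#   $3 - endpoint kind: "login" (assertion consumer) or "metadata"
# Assumption: one trailing slash of redirectUri is stripped first.
saml_url() {
    base=${1%/}
    printf '%s/AUTH/v1.0/saml/%s/%s\n' "$base" "$3" "$2"
}

saml_url https://spectx.int.company.net/ Asia login
saml_url https://spectx.int.company.net/ Asia metadata
```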

Google OAuth

When enabled, Google OAuth is displayed as an alternative login method on the SpectX login screen. Choosing this method starts the OAuth 2.0 flow in the same login screen. The user identity (the email address) must be registered as a SpectX user to determine the user rights upon successful authentication. If SpectX does not find a matching user in its user database and automatic creation of user accounts is disabled, the user interaction flow falls back to the default authentication scheme via the login screen.

To enable Google OAuth authentication, you need to obtain OAuth 2.0 web application credentials from Google. For that, log in to Google API Console:

  1. create new project “SpectX” by selecting “Create a new project” from the project drop-down
  2. create new OAuth 2.0 credentials:
    • Select Credentials on the sidebar, then select the “OAuth consent screen” tab. Choose an Email Address, specify Product name, and press Save.
    • Navigate to “Create credentials” tab and select “OAuth client ID” from “Create credentials” dropdown list.
    • Under “Application type”, select “Web application”, then specify the name of your OAuth client ID.
    • Set the fully qualified URI of your SpectX instance or its frontend server in “Authorised redirect URIs” (e.g. http://spectx.domain.com:8388/). Specify the same URI also for wgui.googleOAuth.redirectUri configuration parameter.
    • Press “Create”.
  3. register the generated OAuth credentials in $SX_HOME/sxConf.json by setting the client ID and client secret from the “OAuth client” popup as the values of the wgui.googleOAuth.clientId and wgui.googleOAuth.clientSecret parameters, respectively. If your users are members of Google-hosted domains, you can add further access control to SpectX by specifying a comma-separated list of the corresponding domain names as the value of the wgui.googleOAuth.hostedDomains parameter.

Example SpectX configuration excerpt:

...
    wgui.googleOAuth.clientId=123456789023-xcafebebeabcdefghklmnopqrstuxwz.apps.googleusercontent.com
    wgui.googleOAuth.clientSecret=GoOGlEaUtHeNtIcAtIoNcLiEnTsEcReT
    wgui.googleOAuth.redirectUri=http://spectx.domain.com:8388/
    wgui.googleOAuth.hostedDomains=my-company.com
...

Restart the server to apply the configuration. Remember to register Google user identities in the SpectX user database (see Managing Users and Groups). Alternatively, you can enable automatic user account creation using the following configuration parameters:

  • wgui.googleOAuth.autoCreateAccount - boolean setting enabling automatic creation of user accounts in the SpectX user database when users first log into SpectX with the given authentication method. This feature is off by default.

  • wgui.googleOAuth.autoCreateApiKey - boolean setting enabling automatic creation of a SpectX API key for user accounts which are created automatically when users first log into SpectX with the given authentication method. The setting is ignored if wgui.googleOAuth.autoCreateAccount is not set to true. The default value for this setting is false.

Pass-through authentication

The pass-through authentication mechanism is used in setups where front-end servers perform end user authentication. It makes use of HTTP headers to pass the authenticated user identity to SpectX. The user identity must be registered with SpectX to determine the user rights.

If SpectX does not find a matching user in its user database and automatic creation of user accounts is disabled, the user interaction flow falls back to the default authentication scheme via the login screen.

To enable pass-through authentication on SpectX side, specify the name of the HTTP header of user identity as a value of wgui.passThroughAuth.usernameHeader parameter (the header name is case insensitive).

Also make sure that:

  • the SpectX WGUI is accessible at the network level only by the frontend server which performs authentication
  • the frontend server is configured to forward the remote client’s IP address to the SpectX WGUI in a custom header, whose name is set as the value of the wgui.remoteIPAddressHeader parameter in the SpectX configuration
  • the frontend server is configured to strip the authenticated user identity header from incoming requests, so that clients cannot supply it themselves.

Example: enabling pass-through of personal identification code extracted from Estonian ID card certificates. Nginx is used as authentication proxy server to SpectX instance running at http://127.0.0.1:8388/. The name of authenticated user identity HTTP header is X-Username. Note that API requests are configured to be passed through as they use separate authentication method.

Nginx configuration:

...
http {
    upstream SpectX {
        server 127.0.0.1:8388;
        # ...
    }
    # get serial
    map $ssl_client_s_dn $ssl_client_s_dn_serial {
        default "";
        ~/serialNumber=(?<serialNumber>[^/]+) $serialNumber;
    }
    # get CN
    map $ssl_client_s_dn $ssl_client_s_dn_cn {
        default "";
        ~/CN=(?<CN>[^/]+) $CN;
    }
    server {
        listen 443 ssl;
        # ...
        ssl_verify_client optional;
        ssl_verify_depth 2;

        location / {
            if ($ssl_client_verify != SUCCESS) {
                return 403;
            }
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Username $ssl_client_s_dn_serial;
            # below row can be enabled if wgui.passThroughAuth.autoCreateAccount is set to true
            # proxy_set_header X-Fullname $ssl_client_s_dn_cn;
            proxy_pass http://SpectX;
            # ...
        }
        location /API/ {
            proxy_set_header X-Real-IP $remote_addr;
            proxy_pass http://SpectX/API/;  # pass through API requests as they have separate authentication
            # ...
        }
    }
}
...

SpectX configuration:

...
    wgui.remoteIPAddressHeader=X-Real-IP
    wgui.passThroughAuth.usernameHeader=X-Username
    wgui.passThroughAuth.fullnameHeader=X-Fullname
...

Remember to register user identities in the SpectX user database (see Managing Users and Groups). Alternatively, you can enable automatic user account creation using the following configuration parameters:

  • wgui.passThroughAuth.autoCreateAccount - boolean setting enabling automatic creation of user accounts in the SpectX user database when users first log into SpectX with the given authentication method. This feature is off by default.

  • wgui.passThroughAuth.autoCreateApiKey - boolean setting enabling automatic creation of a SpectX API key for user accounts which are created automatically when users first log into SpectX with the given authentication method. The setting is ignored if wgui.passThroughAuth.autoCreateAccount is not set to true. The default value for this setting is false.

  • wgui.passThroughAuth.fullnameHeader - optional parameter specifying the name of an HTTP header containing the user’s full name. The value of the header, if it is present, will be used when creating an account in the SpectX database.

SpectX API Authentication

SpectX API implements a separate authentication mechanism based on shared secrets. Each SpectX user can be assigned an API Access Key, which must be passed from the API client to SpectX in the HTTP header api_access_key. See Using API for more about using the API.

The API Access keys can be generated and assigned in Admin - Users view by users with admin role.
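
As an illustration of passing the key, the sketch below builds (but does not send) an HTTP request carrying the api_access_key header, using Python's standard library. The endpoint URL and the key value are placeholders, and the function name is hypothetical; the actual endpoints are described in Using API.

```python
import urllib.request


def build_api_request(url: str, api_access_key: str) -> urllib.request.Request:
    """Create a request that carries the API Access Key in the api_access_key header."""
    return urllib.request.Request(url, headers={"api_access_key": api_access_key})


if __name__ == "__main__":
    # Placeholder URL and key, for illustration only.
    request = build_api_request("http://127.0.0.1:8388/API/", "EXAMPLE-KEY")
    print(request.header_items())
```

Note that urllib changes the capitalisation of the stored header name; HTTP header names are case-insensitive, so this does not alter the request's meaning.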

Logging

SpectX produces logs of the following types:

  • audit log - login, logout, password change, account modification events
  • query execution log - query execution details
  • query execution error log - contains failed/cancelled query execution events with stack traces
  • debug log - containing details of query processing for debugging purposes.

Record format

Audit log contains new line-separated records with the following tab-separated fields (field value length is restricted to not exceed 1000 chars):

  • timestamp in format YYYY-MM-dd HH:mm:ss.SSS Z
  • log_type - optional field containing the string value audit. It is present only when the log destination is set to stdout
  • user’s IP address
  • session ID
  • action name (login/logout/passwordChange etc)
  • username of a user performing the action
  • outcome (ok/failure)
  • authentication type
  • optional descriptive message.
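
The audit record layout above can be read with a few lines of code. The sketch below is an assumption-laden illustration rather than a supported tool: the dictionary key names are made up for the example, the sample values in the usage part are invented, and the trailing Z of the timestamp format is assumed to be a numeric UTC offset such as +0200.

```python
from datetime import datetime

# Hypothetical key names for the fields that follow the timestamp (and the optional log_type).
AUDIT_FIELDS = ["ip", "session_id", "action", "username", "outcome", "auth_type", "message"]


def parse_audit_record(line: str) -> dict:
    """Split one tab-separated audit record into named fields."""
    parts = line.rstrip("\n").split("\t")
    # Assumption: "Z" in the documented format is an offset like +0200.
    timestamp = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S.%f %z")
    rest = parts[1:]
    # The log_type field is present only when records are written to stdout.
    if rest and rest[0] == "audit":
        rest = rest[1:]
    record = dict(zip(AUDIT_FIELDS, rest))
    record.setdefault("message", "")  # the descriptive message is optional
    record["timestamp"] = timestamp
    return record


if __name__ == "__main__":
    sample = "2020-03-01 12:34:56.789 +0200\t10.0.0.5\tabc123\tlogin\tjohn\tok\tlocal"
    print(parse_audit_record(sample))
```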

Query execution log contains new line-separated records with the following tab-separated fields (field value length is restricted to not exceed 1000 chars):

  • timestamp in format YYYY-MM-dd HH:mm:ss.SSS Z
  • log_type - optional field containing the string value execution. It is present only when the log destination is set to stdout
  • user’s IP address
  • session ID
  • query ID
  • action name (submit, schedule, exec etc)
  • outcome (ok/cancelled/failure)
  • username of a user performing the action
  • JSON with payload depending on the action (executed script’s path and base64-encoded script content for submit, stats info for exec)
  • descriptive message, if any.

An execution error log record has the same format as a query execution log record, with one additional field containing the error’s stack trace, which spans multiple lines. This type of logging is performed only for unsuccessful query execution events.

Debug log contains new line-separated records with following tab-separated fields:

  • timestamp in format YYYY-MM-dd HH:mm:ss.SSS Z
  • log record’s log level indicator
  • thread name
  • logger name (java class name)
  • log message (can expand over multiple lines).

Destination

Unless the -q command-line switch is specified, the server prints all log messages to standard output. In this case the log records can be distinguished by an additional log_type field inserted after the timestamp. The field contains one of the values audit, execution or execution_error. Note that debug log messages do not have a log_type field.

In order to enable logging to files, you must specify valid directory path to logging directory in configuration file using wgui.log.dir option. The server then produces daily-rotated log files under that directory, each being put under monthly-rotated directory, which in turn is located in yearly-rotated parent directory:

logs/
└── YYYY/
    ├── MM/
    │   ├── YYYY.MM.DD.debug.log
    │   ├── YYYY.MM.DD.audit.log
    │   ├── YYYY.MM.DD.execution.log
    │   ├── YYYY.MM.DD.execution_error.log
    │   └── ...
    └── ...

If value for wgui.log.rotate parameter is set explicitly to false, the layout of the log directory will be flat, and names of produced log files will not contain timestamps:

logs/
├── debug.log
├── audit.log
├── execution.log
└── execution_error.log

The rotation of log files then can be accomplished by means of external tools (e.g. logrotate) supporting copy-and-truncate log rotation scenarios.
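
For scripts that collect or ship these files, the two directory layouts can be expressed as a small helper. This is a sketch with hypothetical names; it only reproduces the naming scheme shown above and assumes the given date is already expressed in the time zone used for log file names.

```python
from datetime import date
from pathlib import PurePosixPath


def log_file_path(log_dir: str, log_type: str, day: date, rotate: bool = True) -> PurePosixPath:
    """Return the expected path of a log file for the given log type and day."""
    base = PurePosixPath(log_dir)
    if not rotate:
        # Flat layout used when wgui.log.rotate is set to false.
        return base / f"{log_type}.log"
    # Rotated layout: <dir>/YYYY/MM/YYYY.MM.DD.<type>.log
    return base / f"{day:%Y}" / f"{day:%m}" / f"{day:%Y.%m.%d}.{log_type}.log"


if __name__ == "__main__":
    print(log_file_path("logs", "audit", date(2020, 3, 1)))
    print(log_file_path("logs", "execution_error", date(2020, 3, 1), rotate=False))
```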

Timestamps in log records printed to stdout are in the system default time zone, whereas timestamps in log file records and log file names are in the time zone specified by wgui.log.tz in the configuration.

Note that if the default log configuration is overridden by any external means, the -q command-line argument and the configuration options wgui.log.dir, wgui.log.rotate and wgui.log.tz are no longer supported.

Verbosity

The verbosity of generic logging is controlled by the -v command-line switch given to the server binary at startup. If it is not specified, the active log level is WARN. Verbosity can only be increased, by repeating the -v switch as follows:

  • -v sets log level to INFO
  • -vv sets log level to DEBUG
  • -vvv sets log level to TRACE

Note that -v has effect only on debug logging. Audit, query execution and error logging takes place with built-in verbosity.
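
The relation between the number of v's and the debug log level can be summarised as a lookup. The helper below is purely illustrative (the function name is hypothetical); it rejects counts above three because their effect is not described here.

```python
# Index = number of "v" characters in the switch; 0 means the switch is absent.
LOG_LEVELS = ("WARN", "INFO", "DEBUG", "TRACE")


def log_level_for(v_count: int) -> str:
    """Map the number of v's in the -v switch to the resulting debug log level."""
    if not 0 <= v_count < len(LOG_LEVELS):
        raise ValueError(f"undocumented verbosity: {v_count}")
    return LOG_LEVELS[v_count]


if __name__ == "__main__":
    for count in range(4):
        print(count, log_level_for(count))
```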

Relational Database Connectivity

All required JDBC driver executables (.jar) must be installed to SPECTX_HOME/lib directory. SQLite database connectivity driver is included in default installation.

The drivers are loaded according to the list in configuration entry engine.db_table.allowed_jdbc_drivers (driver names are separated by colon symbols):

engine.db_table.allowed_jdbc_drivers=oracle.jdbc.OracleDriver:org.postgresql.Driver

Drivers are loaded at SpectX startup, so a restart is needed after adding a new driver.

Managing Roles

SpectX Base implements two roles with regards to user rights: Admin and User. These differ in following aspects:

  1. Admin role has full access (read, write, delete, execute) in the System folder of SpectX resources, whereas User role has only execute access. This allows creating Admin-managed datastores and views whose content (for example credentials or data URIs) User-level accounts can neither see nor change.
  2. Admin role has full access to all User-level personal folders.
  3. Admin role has access to the User Admin view (and hence can change users’ properties).

In larger organizations the roles of managing source data (collection/storage) and writing analytics queries may be separated. In the case when analysts should not have direct access to data then they should be assigned User roles and respective datastores should be placed into the System folder.

Furthermore, when you want to isolate the roles of data structure definition and analytics then defined patterns should be placed into the System folder. Use the views to accomplish both data access and structure isolation.

Managing Users and Groups

Users can be added, deleted and their properties modified using “Admin - Users & Groups” view. This is enabled only for admin roles.

Following properties of user can be set:

Property        Description
Username        The username of a user. Read-only.
Full name       Full name of a user.
Timezone        Determines the timezone of displayed TIMESTAMP type fields in query results. Default is operating system set timezone.
Password        Login credentials of SpectX local authentication.
API Access key  Authentication token for using SpectX API

Users can also be added to a group. These can either be defined locally (in the Groups tab) or mapped to names defined by other identity management systems (see sx_user_auth_iwa or sx_user_auth_saml authentication schemes above).

Access Control

SpectX allows customising access rights both to files in the SpectX resource tree (query scripts, datastores, etc.) and to the data within the datastores.

Resource Tree

Access to files in the SpectX resource tree can be customised using an access control list (ACL). An ACL is defined only on folders - all files within a folder inherit the permissions from their parent folder [1]. The ACL and effective permissions can be examined and changed via folder Properties (left click on the file or directory in the resource tree and select Properties).

The list contains records, defining permitted operations for the actors (users, groups or folders):

{'user' | 'group' | 'execPath'} ':' name ':' permissions

where
    name - is the name of user, group or folder in resource tree. Single * can be used as wildcard for all combinations.
    permissions - is the list of permissions or decimal value

The actor user refers to username of an end user. Group refers to group name defined locally or group name mapped from externally supplied groups (via sx_user_auth_iwa or sx_user_auth_saml authentication schemes). Group named $admin refers to all users with Administrator role.

The execPath defines a folder (in the resource tree), from where the executing scripts can access the resources in the folder according to defined permissions.

Following permissions can be set:

Permission  Hex value  Description
l           0x01       List - allows access to metadata of the files contained in a directory (filename, size, last_modified) and allows displaying those files in the resource tree
x           0x02       Execute - allows execution of the files in a directory
r           0x04       Read - allows reading the content of files in a directory
w           0x08       Write - allows modification of the files in a directory
c           0x10       Create - allows creation of new files in a directory
d           0x20       Delete - allows removal of files in a directory

Example: give user john all permissions, group team-one permissions to list, read, and execute, and all the rest of users permission to list:

user:john:lrwxcd
group:team-one:lrx
user:*:l

Permissions in the ACL are cumulative - i.e when the actor matches multiple records in the ACL then all permissions defined in these records become effective. For example if we add one more user to the ACL in previous example:

user:john:lrwxcd
group:team-one:lrx
user:*:l
user:jane:rwx

then jane will have list, read, write and execute permissions.
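
The cumulative rule can be checked with a short sketch that ORs together the bit values from the permission table for every matching record. It is an illustration under stated assumptions, not SpectX's implementation: the function names are hypothetical, records are given without trailing comments, and execPath records are skipped because they apply to executing scripts rather than to users.

```python
PERMISSION_BITS = {"l": 0x01, "x": 0x02, "r": 0x04, "w": 0x08, "c": 0x10, "d": 0x20}


def parse_permissions(text: str) -> int:
    """Turn a letter list such as 'lrx' or a decimal value such as '0' into a bit mask."""
    if text.isdigit():
        return int(text)
    mask = 0
    for letter in text:
        mask |= PERMISSION_BITS[letter]
    return mask


def effective_permissions(acl, username, groups=()):
    """OR together the permissions of every ACL record matching the user or its groups."""
    mask = 0
    for record in acl:
        actor, rest = record.split(":", 1)
        name, permissions = rest.rsplit(":", 1)
        if actor == "user":
            matches = name in ("*", username)
        elif actor == "group":
            matches = name == "*" or name in groups
        else:
            matches = False  # execPath records are not evaluated in this sketch
        if matches:
            mask |= parse_permissions(permissions)
    return mask


def to_letters(mask: int) -> str:
    """Render a bit mask back into permission letters."""
    return "".join(letter for letter in "lrwxcd" if mask & PERMISSION_BITS[letter])


if __name__ == "__main__":
    acl = ["user:john:lrwxcd", "group:team-one:lrx", "user:*:l", "user:jane:rwx"]
    print(to_letters(effective_permissions(acl, "jane")))  # lrwx
```

Here jane matches both user:*:l and user:jane:rwx, so the two masks combine into list, read, write and execute.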

The execPath actor perhaps deserves a more detailed explanation. It allows locking down a folder completely, so that users cannot access its content (either directly in the resource tree or via Data Browser using the sx:// protocol).

Consider following example. Default permissions of /System/datastores allow listing and execution to all users. Although they can’t see the credentials of the datastores (since they don’t have read permission) they have access to all data. Suppose you need to provide a limited view (for instance excluding some sensitive data) to a group of users. We create a folder /shared/sensitive to keep the scripts implementing the view. We give list and execute permissions to our target group:

group:$admin:lrwxcd           //administrators should have full access for management
group:sensitive:lx            //members of sensitive group can list and execute scripts in /shared/sensitive folder

We also create a folder /shared/datastores/sensitivedata and lock it down:

group:$admin:lrwxcd           //administrators should have full access for management
user:*:0                      //no direct access to any user
execPath:/shared/sensitive:x  //allow execution from scripts from /shared/sensitive folder

These permissions allow only the scripts in the /shared/sensitive folder to execute datastores in /shared/datastores/sensitivedata. The users can not access datastores directly. Also they can not change or see how sensitive data is manipulated in the scripts in /shared/sensitive. Hence these permissions guarantee that target group can access only allowed data.

When scripts executing in such locked-down folders create and/or update files in the same place (for instance by saving results), or include code from other scripts, they also need the respective permissions granted via execPath. For instance, if the script handling sensitive data from the previous example needs to include some code from another script module in the same /shared/sensitive folder, then we need to give read permission to scripts executing within the /shared/sensitive folder too:

group:$admin:lrwxcd           //administrators should have full access for management
group:sensitive:lx            //members of sensitive group can list and execute scripts in /shared/sensitive folder
execPath:/shared/sensitive:r  //scripts executing within /shared/sensitive can read from files within it

Datastore

Normally a datastore allows access to all the data behind it. The datastore’s Read ACL (rACL) allows controlling access within the datastore. You can examine and change the rACL in the datastore definition panel (double-click an existing datastore, or right-click the datastores directory and select New/Data Store, to display the panel). A record in the rACL defines the prefix of the URI path which a user, group or folder is allowed to read:

{'user' | 'group' | 'execPath'} ':' name ':' uri_path_prefix

where the uri_path_prefix refers to prefix (i.e the beginning) of the path component of the uri (see section 3.3 of RFC3986).

Consider the following example. You are an infosec manager of an e-commerce shop. Your company develops its software in-house, and the developers need access to the logs of the test environment but not to the production ones. Unfortunately, the logs from both production and test are collected into the same Amazon S3 bucket. In this situation you can define the following rACL for the datastore:

group:$admin:/          //administrators have access to all data
group:devs:/logs/dev    //developers have access only to logs of dev environment, contained in /logs/dev directory
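
The effect of such an rACL can be sketched as a prefix test on the path component of the URI. The code below is an illustration only: the function name and the bucket name are hypothetical, plain string-prefix matching is an assumption about how the prefix is compared, and execPath records are not evaluated.

```python
from urllib.parse import urlsplit


def can_read(racl, uri, username, groups=()):
    """Return True if any rACL record matching the user or its groups covers the URI path."""
    path = urlsplit(uri).path  # path component as defined in RFC 3986, section 3.3
    for record in racl:
        actor, name, prefix = record.split(":", 2)
        if actor == "user":
            matches = name == username
        elif actor == "group":
            matches = name in groups
        else:
            matches = False  # execPath records are not evaluated in this sketch
        # Assumption: the prefix is compared as a plain string prefix of the path.
        if matches and path.startswith(prefix):
            return True
    return False


if __name__ == "__main__":
    racl = ["group:$admin:/", "group:devs:/logs/dev"]
    print(can_read(racl, "s3://example-bucket/logs/dev/app.log", "dana", ["devs"]))
    print(can_read(racl, "s3://example-bucket/logs/prod/app.log", "dana", ["devs"]))
```

Under plain string-prefix matching, the prefix /logs/dev would also cover a path such as /logs/development/. If that is not intended, a prefix ending with a slash (/logs/dev/) is the more restrictive choice.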

Notes:

[1] Note that the permissions defined in an ACL also apply to the folder itself (i.e. they are not inherited from its own parent).
[2] Admin role can change permissions. Always and everywhere.
[3] User role cannot change permissions. (Well, it can in the user’s own space, but that does not make much sense).
[4] Execution prevention may be circumvented when read is allowed - content can be copied, pasted and saved to a location where execution is allowed.
[5] Permissions of the top-level folders System, Shared, User and Users are enforced by the system and cannot be changed by anyone.