ssh://

The ssh scheme (ssh://) uses SSHv2 to access files stored on on-premises hosts (more precisely, on any host reachable over ssh). Access is relatively slow, but the scheme is very useful for ad-hoc data analysis. For instance, during incident response, a member of a security team might need logs from a live server that are not collected regularly. Live servers are often behind a firewall configured to allow only the minimum necessary protocols, blocking both public-access protocols (such as HTTP or HTTPS) and custom protocols (like SpectX Source Agent). Getting a firewall opened for any of these is usually a time-consuming process. However, SSHv2 is commonly allowed through such firewalls, as the operations staff use it for shell access to servers. Once an account has been created on the target host, SpectX can immediately analyze any logs that the account has access to.

There are a few requirements for the remote host configuration for SpectX to work over ssh:

  1. The sftp subsystem must be enabled in sshd (it usually is).
  2. The dd and md5 utilities (md5sum on Linux, md5 on Mac OS X) must be available for execution. SpectX uses dd to fetch file chunks and md5 to calculate ETags, which provide advanced caching support. Normally both are available on Linux and Mac OS X out of the box.
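
To make the roles of the two utilities concrete, the sketch below emulates locally, in pure Python, what they compute: a dd-style chunk read (block size, blocks to skip, blocks to read) and a 32-character hex MD5 digest of the kind md5sum or md5 -q prints. The function names and the sample data are hypothetical. How SpectX turns such a digest into an ETag is not described here, so the sketch stops at the digest.

```python
import hashlib
import io


def read_chunk(stream, bs, skip, count):
    """Emulate `dd bs=<bs> skip=<skip> count=<count>` on a seekable stream."""
    stream.seek(bs * skip)          # skip `skip` whole blocks from the start
    return stream.read(bs * count)  # read at most `count` blocks


def md5_hex(data):
    """Return the 32-character hex MD5 digest of `data`."""
    return hashlib.md5(data).hexdigest()


# Hypothetical sample data standing in for a remote log file.
log = io.BytesIO(b"0123456789abcdefghij")
chunk = read_chunk(log, bs=4, skip=1, count=2)
print(chunk)           # the 8 bytes that follow the first 4-byte block
print(md5_hex(chunk))  # hex digest of that chunk
```

Reading by block offset is what lets a client fetch only part of a large file, and a digest gives a compact value to compare when deciding whether cached data is still valid.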

Using ssh:// in SpectX

The ssh:// implementation does not support host-less URI notation: the host part of the URI must always be either a valid ssh server hostname/IP address or a datastore name. If the target port differs from the standard one (22), it must be specified explicitly in the URI, for example: ssh://secure-host:522.
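
The host and port rules above can be checked before a URI is handed to SpectX. The following is a minimal sketch using Python's standard urllib.parse; the helper name ssh_host_and_port is hypothetical, and this is not how SpectX itself parses URIs.

```python
from urllib.parse import urlsplit

DEFAULT_SSH_PORT = 22


def ssh_host_and_port(uri):
    """Return (host, port) of an ssh:// URI and reject host-less URIs."""
    parts = urlsplit(uri)
    if parts.scheme != "ssh":
        raise ValueError("not an ssh:// URI: " + uri)
    if not parts.hostname:
        raise ValueError("ssh:// URI must name a host or datastore: " + uri)
    # urlsplit lowercases the host; the port is None when omitted.
    return parts.hostname, parts.port or DEFAULT_SSH_PORT


print(ssh_host_and_port("ssh://secure-host:522"))
print(ssh_host_and_port("ssh://secure-host"))
```

A host-less URI such as ssh:///some/path raises ValueError in this sketch, mirroring the requirement that a host or datastore name is always present.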

In most cases, ad-hoc access to data on a server over ssh:// from SpectX does not require a configured datastore: the username and password of the account on the ssh server can be specified in the user info part of the URI, as follows: ssh://username:password@host. Note that you need to percent-encode the username and password if they contain ‘@’ or ‘:’ symbols, e.g. ssh://username%40domain:password@host. Alternatively, to access a host with public-key authentication, create a file named id_rsa in the UI under user/.ssh and put your PEM-encoded private key in it. You can then access the host using a URI like ssh://username@host.
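
Percent-encoding by hand is error-prone, so a small helper can build the URI. This is a sketch using urllib.parse.quote from the Python standard library; the function name ssh_uri is hypothetical. With safe="" the call encodes every reserved character, which is stricter than the ‘@’ and ‘:’ minimum mentioned above; it assumes SpectX decodes all percent-escapes in the user info part.

```python
from urllib.parse import quote


def ssh_uri(host, username, password=None, port=None):
    """Build an ssh:// URI with percent-encoded user info."""
    user_info = quote(username, safe="")
    if password is not None:
        user_info += ":" + quote(password, safe="")
    authority = user_info + "@" + host
    if port is not None:
        authority += ":" + str(port)
    return "ssh://" + authority


print(ssh_uri("host", "username@domain", "password"))
# ssh://username%40domain:password@host
```

Omitting the password yields the ssh://username@host form used with public-key authentication.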

Datastore configuration

UI

Configuration parameters for ssh:// datastore definition:

Name Description
Store Name unique name among all defined DataStores
Host target host hostname or IP address (IPv4 or IPv6)
Port ssh listening port (default: 22)
Username target host account username
Password target host account password. Mandatory if Public Key is not specified
Public Key public ssh key that can be copied and pasted to ssh authorized_keys file. Mandatory if Password is not specified
Root Directory top-level directory at the target host, where the data can be read from. Must be read accessible by the user. Optional. When empty, / is assumed. If it starts with ~/, the user’s home directory is assumed.
PathDd filename with the full path to the dd utility executable binary. If empty, command -v dd will be executed on the target host to determine the location of the dd utility. It gets executed with bs, skip and count parameters, with input provided to its stdin and output read from stdout.
PathMd5 filename with the full path to the md5 utility executable binary. If empty, command -v md5sum (Linux) or command -v md5 (OSX) is executed on the target host to determine the location of the md5 utility. In the case of md5sum (Linux), it is invoked as md5sum | cut -c 1-32, and in the case of md5 it is invoked as md5 -q. Both are given input at stdin. The output is expected at stdout.
Is cacheable enables caching data by Processing Units
Hot Cache Period limits time-related data caching to the period specified
Read ACL specifies blob read ACL

SSH key pairs are generated by SpectX. Private keys are not accessible by SpectX users.

Filesystem

The datastore definition file is a JSON document of the following format (optional parameters can be omitted):

{
  "type": "SSH",
  "sshStore": {
    "host": "<host>",
    "port": <port>,
    "username": "<username>",
    "passwd": "<password>",
    "privateKey": "<privateKey>",
    "rootDir": "<rootDir>",
    "pathDd": "<pathDd>",
    "pathMd5": "<pathMd5>",
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "acl": {<rACL>}
  }
}

where

  • <host> target ssh host hostname or IP address (IPv4 or IPv6). A string. Mandatory parameter.
  • <port> ssh listening port. Integer. Optional, the default is 22.
  • <username> the username on the target host. A string. Mandatory parameter.
  • <password> the password of the account on the target host. A string. Mandatory if <privateKey> is not specified.
  • <privateKey> Base64-encoded private key in PEM format. The corresponding public key must be in the authorized_keys file on the target host. A string. Mandatory if <password> is not specified.
  • <rootDir> top-level directory at the target host, where the data can be read from. Must be read accessible by the user. A string. Optional. When empty, / is assumed.
  • <pathDd> filename with the full path to dd utility. If empty, command -v dd will be executed on the target host to determine the location of the dd utility. A string. Optional.
  • <pathMd5> filename with the full path to md5 utility executable binary. If empty, command -v md5sum (Linux) or command -v md5 (OSX) is executed on the target host to determine the location of the md5 utility. A string. Optional.
  • <isCacheable> enables caching data by Processing Units. A boolean (true or false).
  • <hotCachePeriod> limits time-related data caching to the period specified. A time period.
  • <rACL> is a definition of a blob read ACL for the datastore. A map.
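
As an illustration of the format, the sketch below builds a definition in Python and enforces the rule that either a password or a private key must be present. The function name ssh_datastore and the sample host, username and password are hypothetical. The private key is passed through exactly as supplied, so the caller is assumed to provide it already encoded as described above. hotCachePeriod and acl are left out because their value formats are not specified in this section.

```python
import json


def ssh_datastore(host, username, password=None, private_key=None, **optional):
    """Build an SSH datastore definition, omitting unset optional keys."""
    if password is None and private_key is None:
        raise ValueError("either a password or a private key is required")
    store = {"host": host, "username": username}
    if password is not None:
        store["passwd"] = password
    if private_key is not None:
        store["privateKey"] = private_key  # assumed to be encoded already
    # Optional keys, e.g. port, rootDir, pathDd, pathMd5, isCacheable.
    store.update({k: v for k, v in optional.items() if v is not None})
    return {"type": "SSH", "sshStore": store}


# Hypothetical values for illustration only.
definition = ssh_datastore("logs.example.com", "analyst", password="s3cret",
                           port=522, rootDir="/var/log", isCacheable=True)
print(json.dumps(definition, indent=2))
```

Serializing with json.dumps writes port as a JSON number and isCacheable as a JSON boolean, matching the unquoted placeholders in the template.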