http://

Tip

Use the TOR exit node list to see what traffic is coming into your organization from the TOR network. See the example here.

SpectX supports http:// and https:// protocols.

Using http:// in SpectX

Host-less http:/// always refers to the root directory of a web server listening on port 80 of the loopback interface 127.0.0.1 of the machine SpectX is running on. The form https:/// does the same with port 443.

To access external web servers, specify the server name/IP-address in the URI (and a port number if needed):

http://www.spectx.com/ or https://www.spectx.com/ to navigate to the SpectX main page.

To refer to data in a defined data store, use the data store name in place of the host in the URI.
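The addressing rules above can be illustrated with a short Python sketch. It is not part of SpectX: the helper name `effective_endpoint` is hypothetical, and it only mirrors the defaults described in this section (loopback host, port 80 for http, port 443 for https). It knows nothing about data store names, which SpectX would resolve in place of the host.

```python
from urllib.parse import urlsplit

# Default ports per protocol, as described in the text above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_endpoint(uri):
    """Return the (host, port) pair a URI points at under the rules above."""
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported protocol: {parts.scheme!r}")
    # A host-less URI (http:///...) falls back to the loopback interface.
    host = parts.hostname or "127.0.0.1"
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return host, port


print(effective_endpoint("http:///data/access.log"))
print(effective_endpoint("https://www.spectx.com/"))
print(effective_endpoint("http://10.0.0.5:8080/logs/"))
```

The paths and the 10.0.0.5 address are made-up values used only to exercise the three URI forms.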

Datastore configuration

UI

The following are the configuration parameters for http:// and https://:

HTTP

Name              Description
Store name        unique name among all defined DataStores (mandatory)
Host              hostname or IP address (IPv4 or IPv6) of the target host (mandatory)
Port              HTTP listening port, default: 80
User-Agent        value of the HTTP header “User-Agent” to use instead of the globally configured one
Is cacheable      enables caching of data by Processing Units
Hot cache period  limits time-related data caching to the period specified
Read ACL          specifies the blob read ACL

HTTPS

Name                 Description
Store name           unique name among all defined DataStores (mandatory)
Host                 hostname or IP address (IPv4 or IPv6) of the target web host (mandatory)
Port                 HTTPS listening port, default: 443
User-Agent           value of the HTTP header “User-Agent” to use instead of the globally configured one
Basic Auth username  username for the HTTP Basic Authentication scheme
Basic Auth password  password for the HTTP Basic Authentication scheme
Is cacheable         enables caching of data by Processing Units
Hot cache period     limits time-related data caching to the period specified
Read ACL             specifies the blob read ACL

Filesystem

The http:// datastore definition file is a JSON document in the following format (optional parameters can be omitted):

{
  "type": "HTTP",
  "httpStore": {
    "host": "<host>",
    "port": <port>,
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "connectTimeout": <connectTimeout>,
    "readTimeout": <readTimeout>,
    "maxErrorRetries": <maxErrorRetries>,
    "maxRedirects": <maxRedirects>,
    "userAgent": "<userAgent>",
    "acl": {<rACL>}
  }
}

where

  • <host> is the hostname or IP address (IPv4 or IPv6) of the target web host. A string. Mandatory parameter.
  • <port> is the HTTP listening port. The default is 80. An integer.
  • <isCacheable> enables caching of data by Processing Units. A boolean (“true” or “false”).
  • <hotCachePeriod> limits time-related data caching to the period specified. A time period.
  • <connectTimeout> is the connection timeout in milliseconds. A timeout of zero is interpreted as an infinite timeout. The default is 10000. A non-negative long integer.
  • <readTimeout> is the read timeout in milliseconds. A timeout of zero is interpreted as an infinite timeout. The default is 60000. A non-negative long integer.
  • <maxErrorRetries> is the number of times SpectX tries to access a requested resource that is inaccessible due to network problems before giving up. The default is 3. An integer.
  • <maxRedirects> is the maximum number of redirects SpectX follows for one resource before giving up. The default is 10. A non-negative integer.
  • <userAgent> is the value of the HTTP “User-Agent” header to use when contacting the target web server. If set, it overrides the globally configured value. The default is composed of the string “SpectX” and the current software version designator. A string.
  • <rACL> is the definition of the blob read ACL for the datastore. A map.
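As an illustration of the format above, the following Python sketch assembles a filled-in http:// datastore definition and prints it as JSON. The helper `http_store_definition`, the host name and all parameter values are hypothetical. The optional `hotCachePeriod` and `acl` entries are left out because their exact value syntax is not described here.

```python
import json


def http_store_definition(host, port=80, **optional):
    """Build an http:// datastore definition as a plain dict."""
    if not host:
        raise ValueError("host is mandatory")
    store = {"host": host, "port": port}
    # Optional parameters are included only when the caller supplies them.
    store.update(optional)
    return {"type": "HTTP", "httpStore": store}


definition = http_store_definition(
    "logs.example.com",      # hypothetical target host
    port=8080,
    isCacheable=True,
    connectTimeout=5000,     # milliseconds
    readTimeout=30000,       # milliseconds
    maxErrorRetries=5,
    maxRedirects=10,
)
print(json.dumps(definition, indent=2))
```

Building the definition as a dict and serializing it with `json.dumps` avoids hand-written JSON mistakes such as missing commas or duplicate keys.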

The https:// datastore definition file is a JSON document in the following format (optional parameters can be omitted):

{
  "type": "HTTPS",
  "httpsStore": {
    "host": "<host>",
    "port": <port>,
    "basicAuth": "<basicAuth>",
    "connectTimeout": <connectTimeout>,
    "readTimeout": <readTimeout>,
    "maxErrorRetries": <maxErrorRetries>,
    "maxRedirects": <maxRedirects>,
    "userAgent": "<userAgent>",
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "acl": {<rACL>}
  }
}

where

  • <host> is the hostname or IP address (IPv4 or IPv6) of the target web host. A string. Mandatory parameter.
  • <port> is the HTTPS listening port. The default is 443. An integer.
  • <basicAuth> is the username and password for HTTP Basic Authentication, encoded as per RFC 1945. A string.
  • <isCacheable> enables caching of data by Processing Units. Optional. The default is “false”. A boolean (“true” or “false”).
  • <hotCachePeriod> limits time-related data caching to the period specified. A time period.
  • <connectTimeout> is the connection timeout in milliseconds. A timeout of zero is interpreted as an infinite timeout. The default is 10000. A non-negative long integer.
  • <readTimeout> is the read timeout in milliseconds. A timeout of zero is interpreted as an infinite timeout. The default is 60000. A non-negative long integer.
  • <maxErrorRetries> is the number of times SpectX tries to access a requested resource that is inaccessible due to network problems before giving up. The default is 3. An integer.
  • <maxRedirects> is the maximum number of redirects SpectX follows for one resource before giving up. The default is 10. A non-negative integer.
  • <userAgent> is the value of the HTTP “User-Agent” header to use when contacting the target web server. If set, it overrides the globally configured value. The default is composed of the string “SpectX” and the current software version designator. A string.
  • <rACL> is the definition of the blob read ACL for the datastore. A map.
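RFC 1945 defines the Basic credentials as the Base64 encoding of the user ID and password joined by a colon. The Python sketch below computes that token. The helper name `basic_credentials` is hypothetical, and whether SpectX expects exactly this bare token in `basicAuth` is an assumption to verify against your installation.

```python
import base64


def basic_credentials(username, password):
    """Return base64("username:password") as described in RFC 1945."""
    token = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(token).decode("ascii")


# The example user ID and password used in RFC 1945 itself.
print(basic_credentials("Aladdin", "open sesame"))
```

Note that Base64 is an encoding, not encryption, so the definition file should be protected like any other file containing a password.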

Advanced Usage

Listing nonexistent files

http:// and https:// are the only protocols for which the LIST() command does not fail when the target file does not exist.

LIST('https://www.spectx.com/why-splunk-is-better').SELECT(uri, length)

results in

uri                                          length
https://www.spectx.com/why-splunk-is-better  -1

whereas the same request for a non-existent blob or directory over any other protocol fails with a “no such file” error.

Redirects

SpectX always follows redirects (specified in the Location header of 301/302/303/307 responses). By default, it follows at most 10 redirects for one resource before failing with a cyclic error. This number can be changed only in the data store definition file.
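The effect of a redirect limit can be sketched in Python as follows. This is an illustration of the idea only, not SpectX's implementation: the `resolve` function and `TooManyRedirects` exception are hypothetical, and a dict stands in for the Location headers a server would return.

```python
class TooManyRedirects(Exception):
    """Raised when a redirect chain exceeds the allowed number of hops."""


def resolve(url, redirects, max_redirects=10):
    """Follow entries in `redirects` until a URL with no redirect is reached."""
    hops = 0
    while url in redirects:
        if hops == max_redirects:
            # The limit also catches cycles, which would otherwise never end.
            raise TooManyRedirects(f"gave up after {hops} redirects")
        url = redirects[url]
        hops += 1
    return url


chain = {"http://a/": "http://b/", "http://b/": "http://c/"}
print(resolve("http://a/", chain))

loop = {"http://x/": "http://y/", "http://y/": "http://x/"}
try:
    resolve("http://x/", loop)
except TooManyRedirects as err:
    print("failed:", err)
```

A hop counter is the simplest way to bound redirect chains; it does not need to remember visited URLs to stop a cycle.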

Timeouts

The connection timeout is 10 seconds and the read timeout is 60 seconds. These values can be changed only in the data store definition file.

Retries

When a requested http:// or https:// resource cannot be accessed due to network problems (e.g. timeouts or bad routing), SpectX will retry the request up to a configured number of times. The default value is 3 and can be changed only in the data store definition file.
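A retry policy of this kind can be sketched in Python as below. The sketch is illustrative rather than a description of SpectX internals. It assumes the configured number counts retries after the first attempt, which the text does not state explicitly, and it uses a hypothetical `fetch` callable in place of a real network request.

```python
def fetch_with_retries(fetch, max_error_retries=3):
    """Call fetch(); on a network error retry up to max_error_retries times."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return fetch()
        except OSError:
            # In Python, timeouts and connection failures derive from OSError.
            if attempts > max_error_retries:
                raise


calls = {"count": 0}


def flaky():
    """Simulated resource: fails twice with a timeout, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "payload"


result = fetch_with_retries(flaky)
print(result, "after", calls["count"], "attempts")
```

Only network-type errors are retried here; any other exception propagates immediately, since repeating the request would not help.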

Custom headers

Custom headers can be specified for the HTTP requests that Data Access sends over http/https to targets that have no datastore defined. Header names and their values are specified as key-value pairs in the script’s init block. The key has the following format:

_env.<protocol>.[<host-glob-pattern>].header.<header>

Here <protocol> is the protocol (http or https), <host-glob-pattern> is a glob pattern matching the host in the URI(s) used in the script, and <header> is the name of the header to add to requests. Protocol-specific headers such as “content-length” and “host” cannot be added this way. The “user-agent” header must instead be set using the key _env.<protocol>.[<host-glob-pattern>].userAgent.

Consider the following example script:

init(
    '_env.https.[www.whatismybrowser.com].header.x-my-fancy-header': 'dummy value',
    '_env.https.[www.whatismybrowser.com].header.host': 'nope.this.does.not.work',
    '_env.https.[www.+([a-z]).com].header.referer': 'https://www.spectx.com/',
    '_env.https.[?(www.)whatismybrowser.@(com|net)].header.cookie': 'myCookie',
    '_env.https.[*].userAgent': 'My SpectX instance'
);

$pattern = <<<PATTERN
    (BOS DATA{0,500000})?
    "<tr>" DATA "<th>" LD:key "</th>" EOL
    LD "<td>" LD:value "</td>" EOL
    LD "</tr>" EOL
    (EOL|EOF)
PATTERN;

LIST(src:'https://www.whatismybrowser.com/detect/what-http-headers-is-my-browser-sending')
| PARSE(pattern:$pattern);

It produces the following output:

key                value
ACCEPT             text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
CACHE_CONTROL      no-cache
CONNECTION         keep-alive
COOKIE             myCookie
HOST               www.whatismybrowser.com
PRAGMA             no-cache
REFERER            https://www.spectx.com/
USER_AGENT         My SpectX instance
X_MY_FANCY_HEADER  dummy value
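The key format used in this section can also be generated programmatically, for example when init blocks are produced by a script. The Python helper below is a hypothetical sketch. The set of rejected headers contains only the two named in the text, and the real list may be longer.

```python
# Headers the text names as not settable this way; possibly incomplete.
PROTOCOL_HEADERS = {"content-length", "host"}


def header_key(protocol, host_glob, header):
    """Build the init-block key for a custom header on matching hosts."""
    if protocol not in ("http", "https"):
        raise ValueError(f"unsupported protocol: {protocol!r}")
    name = header.lower()
    if name in PROTOCOL_HEADERS:
        raise ValueError(f"header {header!r} cannot be set as a custom header")
    if name == "user-agent":
        # User-Agent has its own dedicated key instead of a .header. entry.
        return f"_env.{protocol}.[{host_glob}].userAgent"
    return f"_env.{protocol}.[{host_glob}].header.{header}"


print(header_key("https", "www.whatismybrowser.com", "x-my-fancy-header"))
print(header_key("https", "*", "User-Agent"))
```

The helper does not validate the glob pattern itself; it is passed through unchanged.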

Security

Insecure https

When accessing https targets, SpectX intentionally ignores all TLS certificate errors.

Basic authentication

For ad-hoc queries, username and password for the account at http/https server can be specified in the URI in its user info part: http://username:password@host.

Note that you need to percent-encode both the username and the password if they contain ‘@’ or ‘:’ symbols, e.g. http://username%40domain:password@host.
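The percent-encoding can be done with Python's standard library, as in the sketch below. The helper name `uri_with_credentials` is hypothetical. Passing `safe=""` to `quote` ensures that ‘@’, ‘:’ and ‘/’ are all encoded.

```python
from urllib.parse import quote


def uri_with_credentials(scheme, username, password, host, path=""):
    """Build scheme://user:password@host with both parts percent-encoded."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"{scheme}://{user}:{secret}@{host}{path}"


print(uri_with_credentials("http", "username@domain", "password", "host"))
```

Keep in mind that credentials embedded in a URI are visible wherever the URI is stored or logged.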

SSRF prevention

To prevent Server-Side Request Forgery (SSRF) attacks in cloud environments, Data Access uses the system property com.spectx.da.http.deniedHosts to disable access to the specified http/https servers. Its default value is “169.254.169.254:80”, which prevents http/https requests to the cloud metadata service through SpectX when it is running in the cloud.