s3://

The s3:// and s3s:// schemes provide access to cloud services that support the Amazon Simple Storage Service (S3) protocol. The latter uses TLS to secure the underlying communication with S3.

Using s3:// in SpectX

The s3:// and s3s:// implementations do not support host-less URI notation: the host part of the URI must always be either a valid S3 bucket name or a datastore name.

The region of an Amazon S3 bucket can be specified directly in the user info part of the URI: s3://region@bucket.

Example:

s3s://eu-west-1@spectx-docs

If the URI points to a defined S3 datastore, a region given in the URI overrides the region defined for that datastore.
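The URI rules above can be illustrated with a short Python sketch: the host must name a bucket or datastore, the optional user info carries a region, and that region takes precedence over the datastore's own. The helper name `resolve_s3_uri` and the fallback to "us-west-2" (taken from the documented Region default) are assumptions for illustration; this is not SpectX's implementation.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlsplit

# Fallback region: the documented Region default for Amazon S3.
# Applying it here is an assumption of this sketch.
DEFAULT_REGION = "us-west-2"


def resolve_s3_uri(uri: str, datastore_region: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: split an s3:// or s3s:// URI and pick its region.

    Region precedence: URI user info, then the datastore's region, then the
    default.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("s3", "s3s"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # netloc has the form "[region@]host"; rpartition keeps the host's case.
    userinfo, _, host = parts.netloc.rpartition("@")
    if not host:
        # Host-less notation such as "s3:///logs" is not supported.
        raise ValueError("the URI must name a bucket or datastore as host")
    return {
        "tls": parts.scheme == "s3s",  # s3s:// secures traffic with TLS
        "host": host,  # bucket name or datastore name
        "region": userinfo or datastore_region or DEFAULT_REGION,
        "path": parts.path,
    }


print(resolve_s3_uri("s3s://eu-west-1@spectx-docs", datastore_region="us-east-1"))
```

With these inputs the region resolves to eu-west-1, because the region in the URI wins over the datastore's us-east-1.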

Datastore configuration

UI

Configuration parameters for both s3:// and s3s:// datastore definitions:

Name                  Description
Store name            Unique name among all defined datastores. Mandatory.
Bucket                Name of the target S3 bucket storing blobs. Mandatory.
Endpoint              Service endpoint, with or without the protocol (for
                      instance, https://s3.example.com or s3.example.com).
                      Optional. Use this to define a non-standard (non-Amazon
                      S3) service endpoint.
Region                Name of the bucket's region. Optional; needed only for
                      Amazon S3. Defaults to “us-west-2”.
Path style access     If enabled, path-style access is used instead of
                      virtual-hosted-style access for routing requests to the
                      bucket (see the AWS documentation). Optional.
Use IAM Role          Authenticate with an IAM role instead of an access key.
                      Optional.
Access Key Id         Access key ID for S3 API authentication. Optional.
Secret Access Key     Secret access key for S3 API authentication. Optional.
Directory delimiter   Directory separator in file names. Optional; defaults
                      to “/” when empty.
Is cacheable          Enables caching of data by Processing Units.
Hot cache period      Limits caching of time-related data to the specified
                      period.
Connect Timeout       Connection timeout, given as a time interval that
                      evaluates to a whole number of milliseconds. Zero means
                      no timeout (wait indefinitely). Default: 10s.
Read Timeout          Read timeout, given as a time interval that evaluates
                      to a whole number of milliseconds. Zero means no
                      timeout (wait indefinitely). Default: 60s.
Max Error Retries     Maximum number of times SpectX retries accessing a
                      requested resource that is unreachable because of
                      network problems before giving up. A non-negative
                      integer. Default: 3.
ACL                   Specifies the blob ACL.
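As an illustration of how the timeout and retry settings above could be interpreted, here is a minimal Python sketch. It assumes an interval syntax of a whole number with an optional unit (ms, s, m or h, bare numbers read as milliseconds) and assumes Max Error Retries counts retries after the first attempt. Both are assumptions, and the helper names are hypothetical.

```python
import re
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")

# Assumed unit suffixes; SpectX's real interval syntax may differ.
_UNITS_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}
_INTERVAL = re.compile(r"\s*(\d+)\s*(ms|s|m|h)?\s*")


def interval_to_ms(value: Any) -> int:
    """Convert an interval such as "10s" or 250 to whole milliseconds."""
    if isinstance(value, bool):
        raise TypeError("a boolean is not a time interval")
    if isinstance(value, int):
        ms = value  # bare integers are assumed to be milliseconds
    else:
        match = _INTERVAL.fullmatch(str(value))
        if match is None:
            raise ValueError(f"not a time interval: {value!r}")
        ms = int(match.group(1)) * _UNITS_MS[match.group(2) or "ms"]
    if ms < 0:
        raise ValueError("a timeout must not be negative")
    return ms


def socket_timeout(value: Any) -> Optional[float]:
    """Map a documented timeout to Python's socket convention (seconds).

    The docs treat 0 as "wait forever"; socket.settimeout() expresses that
    as None, whereas 0 would switch the socket to non-blocking mode.
    """
    ms = interval_to_ms(value)
    return None if ms == 0 else ms / 1000.0


def with_retries(op: Callable[[], T], max_error_retries: int = 3) -> T:
    """Call op(), retrying on network errors (OSError) up to max_error_retries times."""
    if max_error_retries < 0:
        raise ValueError("max_error_retries must be a non-negative integer")
    for attempt in range(max_error_retries + 1):
        try:
            return op()
        except OSError:
            if attempt == max_error_retries:
                raise  # give up after the last allowed retry
    raise AssertionError("unreachable")
```

For example, `socket_timeout("10s")` yields 10.0 seconds, while `socket_timeout(0)` yields None, so a zero timeout keeps its documented "infinite" meaning when handed to Python networking code.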

Filesystem

The s3:// and s3s:// datastore definition files are JSON documents in the following formats, respectively (optional parameters can be omitted):

{
  "type": "S3",
  "s3Store": {
    "bucket": "<bucket>",
    "region": "<region>",
    "endpoint": "<endpoint>",
    "useIamRole": <useIamRole>,
    "accessKey": "<accessKey>",
    "secretKey": "<secretKey>",
    "pathStyleAccess": <pathStyleAccess>,
    "directoryDelimiter": "<directoryDelimiter>",
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "connectTimeout": <connectTimeout>,
    "readTimeout": <readTimeout>,
    "maxErrorRetries": <maxErrorRetries>,
    "userAgent": "<userAgent>",
    "acl": {<ACL>}
  }
}

{
  "type": "S3S",
  "s3sStore": {
    "bucket": "<bucket>",
    "region": "<region>",
    "endpoint": "<endpoint>",
    "useIamRole": <useIamRole>,
    "accessKey": "<accessKey>",
    "secretKey": "<secretKey>",
    "pathStyleAccess": <pathStyleAccess>,
    "directoryDelimiter": "<directoryDelimiter>",
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "connectTimeout": <connectTimeout>,
    "readTimeout": <readTimeout>,
    "maxErrorRetries": <maxErrorRetries>,
    "userAgent": "<userAgent>",
    "acl": {<ACL>}
  }
}
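The templates above can also be filled in programmatically. This Python sketch builds a definition for either scheme and leaves out optional parameters that are not set. The function name and the example values are hypothetical, and the set of accepted keys simply mirrors the templates.

```python
import json
from typing import Any, Dict

# Parameter names copied from the templates above.
KNOWN_KEYS = {
    "bucket", "region", "endpoint", "useIamRole", "accessKey", "secretKey",
    "pathStyleAccess", "directoryDelimiter", "isCacheable", "hotCachePeriod",
    "connectTimeout", "readTimeout", "maxErrorRetries", "userAgent", "acl",
}


def make_store_definition(secure: bool, bucket: str, **options: Any) -> Dict[str, Any]:
    """Build an s3:// (secure=False) or s3s:// (secure=True) datastore definition.

    Options set to None are left out, since optional parameters may be omitted.
    """
    if not bucket:
        raise ValueError("bucket is mandatory")
    unknown = set(options) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    store: Dict[str, Any] = {"bucket": bucket}
    store.update({key: value for key, value in options.items() if value is not None})
    store_type, store_key = ("S3S", "s3sStore") if secure else ("S3", "s3Store")
    return {"type": store_type, store_key: store}


definition = make_store_definition(
    secure=True,
    bucket="spectx-docs",
    region="eu-west-1",
    useIamRole=True,
    pathStyleAccess=None,  # None means "omit from the file"
)
print(json.dumps(definition, indent=2))
```

The printed definition contains only bucket, region and useIamRole under "s3sStore"; json.dumps also takes care of writing Python's True as the JSON literal true.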

where

  • <bucket> is the name of the target S3 bucket storing blobs. A string. Mandatory parameter.
  • <region> is the name of the bucket's region. Optional; needed only for Amazon S3. Defaults to “us-west-2”. A string.
  • <endpoint> is the service endpoint, with or without the protocol (e.g. https://s3.example.com or s3.example.com). Optional. Use this to define a non-standard service endpoint. The default is empty.
  • <useIamRole> specifies whether to use an IAM role instead of access key authentication. A boolean (“true” or “false”). The default is “false”.
  • <accessKey> is the access key ID for S3 API authentication. Optional; ignored if <useIamRole> is “true”. A string.
  • <secretKey> is the secret access key for S3 API authentication. Optional; ignored if <useIamRole> is “true”. A string.
  • <pathStyleAccess>, if enabled, makes requests to the bucket use path-style access instead of virtual-hosted-style access (see the AWS documentation). A boolean (“true” or “false”). The default is “false”.
  • <directoryDelimiter> is the directory separator in file names. Optional; when empty, the default “/” is assumed. A string.
  • <isCacheable> enables caching of data by Processing Units. Optional. A boolean (“true” or “false”). The default is “false”.
  • <hotCachePeriod> limits caching of time-related data to the specified period. A time period.
  • <connectTimeout> is the connection timeout. Zero means no timeout (wait indefinitely). A time interval evaluating to a whole number of milliseconds. The default is 10s.
  • <readTimeout> is the read timeout. Zero means no timeout (wait indefinitely). A time interval evaluating to a whole number of milliseconds. The default is 60s.
  • <maxErrorRetries> is the maximum number of times SpectX retries accessing a requested resource that is unreachable because of network problems before giving up. A non-negative integer. The default is 3.
  • <userAgent> is the software agent name sent when communicating with the cloud service. A string. The default is composed of the string “SpectX” and the current version.
  • <anonymousTtl> is the TTL for an anonymous access token to be used by processing units for processing blob content during query execution, in milliseconds. The default is 30000. A non-negative long integer.
  • <ACL> is a definition of a blob ACL for the datastore. A map.
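Before deploying a definition file, a quick sanity check against the rules listed above can catch mistakes such as a mismatched store key or a negative retry count. The following is a hypothetical Python checker, not a SpectX tool. It assumes <anonymousTtl> sits inside the store object alongside the other parameters, which the templates do not show.

```python
import json
from typing import Any, Dict, List

_STORE_KEY = {"S3": "s3Store", "S3S": "s3sStore"}
_BOOLEAN_KEYS = ("useIamRole", "pathStyleAccess", "isCacheable")


def _is_non_negative_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def check_definition(definition: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed datastore definition."""
    store_type = definition.get("type")
    store_key = _STORE_KEY.get(store_type)
    if store_key is None:
        return [f"type must be 'S3' or 'S3S', got {store_type!r}"]
    store = definition.get(store_key)
    if not isinstance(store, dict):
        return [f"missing '{store_key}' object for type {store_type!r}"]

    problems: List[str] = []
    bucket = store.get("bucket")
    if not isinstance(bucket, str) or not bucket:
        problems.append("bucket is mandatory and must be a non-empty string")
    for key in _BOOLEAN_KEYS:
        if key in store and not isinstance(store[key], bool):
            problems.append(f"{key} must be true or false")
    # anonymousTtl placement inside the store object is an assumption.
    for key in ("maxErrorRetries", "anonymousTtl"):
        if key in store and not _is_non_negative_int(store[key]):
            problems.append(f"{key} must be a non-negative integer")
    if store.get("useIamRole") is True:
        for key in ("accessKey", "secretKey"):
            if key in store:
                problems.append(f"{key} is ignored because useIamRole is true")
    if "acl" in store and not isinstance(store["acl"], dict):
        problems.append("acl must be a map (JSON object)")
    return problems


text = ('{"type": "S3S", "s3sStore": {"bucket": "spectx-docs", "useIamRole": true,'
        ' "accessKey": "example-key", "maxErrorRetries": -1}}')
print(check_definition(json.loads(text)))
```

For this input the checker reports the negative maxErrorRetries value and notes that accessKey has no effect while useIamRole is true.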