s3://

The s3:// and s3s:// schemes provide access to cloud services that support the Amazon Simple Storage Service (S3) protocol. The latter uses TLS to secure the underlying communication with the S3 service.

Using s3:// in SpectX

The s3:// and s3s:// implementations do not support host-less URI notation: the host part of the URI must always be either a valid S3 bucket name or a datastore name.

The region of an Amazon S3 bucket can be specified directly in the user info part of the URI: s3://region@bucket.

Example:

s3s://eu-west-1@spectx-docs

If the URI points to a defined S3 datastore, the region given in the URI overrides the one defined for the datastore.
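
The host and region rules above can be sketched in Python as follows. This is only an illustration of the URI layout, not SpectX's implementation: the function name resolve_s3_uri and the datastore_regions mapping are hypothetical, and the fallback to “us-west-2” is taken from the Region parameter described under Datastore configuration.

```python
from urllib.parse import urlsplit

DEFAULT_REGION = "us-west-2"  # default named in the Region parameter description


def resolve_s3_uri(uri, datastore_regions=None):
    """Split an s3:// or s3s:// URI into (scheme, host, region, path).

    datastore_regions is a hypothetical mapping of datastore name -> region,
    standing in for the regions stored in datastore definitions.
    """
    datastore_regions = datastore_regions or {}
    parts = urlsplit(uri)
    if parts.scheme not in ("s3", "s3s"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The host is whatever follows the optional "region@" user info part.
    host = parts.netloc.rpartition("@")[2]
    if not host:
        # Host-less notation (e.g. s3:///path) is not supported.
        raise ValueError(f"bucket or datastore name is required: {uri!r}")
    # A region in the URI wins over the datastore's region, then the default.
    region = parts.username or datastore_regions.get(host) or DEFAULT_REGION
    return parts.scheme, host, region, parts.path


print(resolve_s3_uri("s3s://eu-west-1@spectx-docs"))
```

Calling the function with a URI that carries a region and a datastore that defines a different one returns the region from the URI, which mirrors the override rule.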

Datastore configuration

UI

Configuration parameters for both s3:// and s3s:// datastore definition:

Name                 Description
Store name           Unique name among all defined datastores. Mandatory.
Bucket               The name of the target S3 bucket storing blobs. Mandatory.
Endpoint             The service endpoint, with or without the protocol (e.g. https://s3.example.com or s3.example.com). Optional. Use this to define a non-standard service endpoint.
Region               The name of the region of the bucket. Optional. Needed only for Amazon S3, where it defaults to “us-west-2”.
Path style access    If enabled, path-style access is used instead of virtual-hosted-style access for routing requests to the bucket (see the AWS documentation for details). Optional.
Use IAM Role         Use an IAM role instead of access key authentication. Optional.
Access Key Id        The access key ID for S3 API authentication. Optional.
Secret Access Key    The secret access key for S3 API authentication. Optional.
Directory delimiter  Directory separator in a file name. Optional. When empty, the default “/” is assumed.
Is cacheable         Enables caching of data by Processing Units.
Hot cache period     Limits time-related data caching to the period specified.
Read ACL             Specifies the blob read ACL.
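
To illustrate what the Endpoint and Path style access parameters change, the sketch below builds an object URL in both addressing styles, following the general AWS convention (bucket in the path versus bucket as a host name prefix). The function object_url is hypothetical, and treating an endpoint without a protocol as HTTPS is an assumption made for this example; the documentation above does not say which protocol SpectX applies in that case.

```python
def object_url(endpoint, bucket, key, path_style=False):
    """Build an object URL in path-style or virtual-hosted-style form."""
    # Assumption: an endpoint given without a protocol is treated as HTTPS.
    if "://" not in endpoint:
        endpoint = "https://" + endpoint
    scheme, _, host = endpoint.partition("://")
    host = host.rstrip("/")
    if path_style:
        # Path-style: the bucket is the first path segment.
        return f"{scheme}://{host}/{bucket}/{key}"
    # Virtual-hosted-style: the bucket is a prefix of the host name.
    return f"{scheme}://{bucket}.{host}/{key}"


print(object_url("s3.example.com", "my-bucket", "logs/a.log", path_style=True))
print(object_url("s3.example.com", "my-bucket", "logs/a.log", path_style=False))
```

Path-style access is typically needed for S3-compatible services that cannot resolve a per-bucket host name.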

Filesystem

Datastore definition files for s3:// and s3s:// are JSON documents with the following structures, respectively (optional parameters can be omitted):

{
  "type": "S3",
  "s3Store": {
    "bucket": "<bucket>",
    "region": "<region>",
    "endpoint": "<endpoint>",
    "useIamRole": <useIamRole>,
    "accessKey": "<accessKey>",
    "secretKey": "<secretKey>",
    "pathStyleAccess": <pathStyleAccess>,
    "directoryDelimiter": "<directoryDelimiter>",
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "connectTimeout": <connectTimeout>,
    "readTimeout": <readTimeout>,
    "maxErrorRetries": <maxErrorRetries>,
    "userAgent": "<userAgent>",
    "acl": {<rACL>}
  }
}

{
  "type": "S3S",
  "s3sStore": {
    "bucket": "<bucket>",
    "region": "<region>",
    "endpoint": "<endpoint>",
    "useIamRole": <useIamRole>,
    "accessKey": "<accessKey>",
    "secretKey": "<secretKey>",
    "pathStyleAccess": <pathStyleAccess>,
    "directoryDelimiter": "<directoryDelimiter>",
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "connectTimeout": <connectTimeout>,
    "readTimeout": <readTimeout>,
    "maxErrorRetries": <maxErrorRetries>,
    "userAgent": "<userAgent>",
    "acl": {<rACL>}
  }
}

where

  • <bucket> is the name of the target S3 bucket storing blobs. A string. Mandatory.
  • <region> is the name of the region of the bucket. A string. Optional. Needed only for Amazon S3, where it defaults to “us-west-2”.
  • <endpoint> is the service endpoint, with or without the protocol (e.g. https://s3.example.com or s3.example.com). Use it to define a non-standard service endpoint. Optional. The default is empty.
  • <useIamRole> selects IAM role authentication instead of access key authentication. A boolean (“true” or “false”). The default is “false”.
  • <accessKey> is the access key ID for S3 API authentication. Optional. Ignored if <useIamRole> is “true”.
  • <secretKey> is the secret access key for S3 API authentication. Optional. Ignored if <useIamRole> is “true”.
  • <pathStyleAccess> enables path-style access instead of virtual-hosted-style access for routing requests to the bucket (see the AWS documentation for details). A boolean (“true” or “false”). The default is “false”.
  • <directoryDelimiter> is the directory separator in a file name. A string. Optional. When empty, the default “/” is assumed.
  • <isCacheable> enables caching of data by Processing Units. A boolean (“true” or “false”). Optional. The default is “false”.
  • <hotCachePeriod> limits time-related data caching to the period specified. A time period.
  • <connectTimeout> is the connection timeout in milliseconds. A timeout of zero is interpreted as an infinite timeout. A non-negative long integer. The default is 10000.
  • <readTimeout> is the read timeout in milliseconds. A timeout of zero is interpreted as an infinite timeout. A non-negative long integer. The default is 60000.
  • <maxErrorRetries> is the number of times SpectX retries access to a requested resource that is inaccessible due to network problems before giving up. An integer. The default is 3.
  • <userAgent> is the software agent name to use when communicating with the cloud. A string. The default value is composed of the string “SpectX” and the current version.
  • <anonymousTtl> is the TTL, in milliseconds, of an anonymous access token used by processing units to process blob content during query execution. A non-negative long integer. The default is 30000.
  • <rACL> is the definition of the blob read ACL for the datastore. A map.
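
As a sketch of how such a definition file could be produced, the Python snippet below assembles the JSON structure for either scheme and leaves out any optional parameter that is not supplied. The helper s3_datastore_definition is hypothetical and not part of SpectX; the bucket and region values reuse the earlier URI example, and hotCachePeriod and acl are left out because their exact value formats are not described here.

```python
import json


def s3_datastore_definition(bucket, secure=True, **options):
    """Return a datastore definition dict; options set to None are omitted."""
    if not bucket:
        raise ValueError("bucket is mandatory")
    for name in ("connectTimeout", "readTimeout"):
        value = options.get(name)
        if value is not None and value < 0:
            raise ValueError(f"{name} must be non-negative (0 means no timeout)")
    store = {"bucket": bucket}
    store.update({k: v for k, v in options.items() if v is not None})
    # s3s:// definitions use type "S3S" and the "s3sStore" key.
    type_name, store_key = ("S3S", "s3sStore") if secure else ("S3", "s3Store")
    return {"type": type_name, store_key: store}


definition = s3_datastore_definition(
    "spectx-docs",
    region="eu-west-1",
    useIamRole=True,      # accessKey/secretKey would be ignored, so omit them
    connectTimeout=5000,  # milliseconds
    maxErrorRetries=3,
)
print(json.dumps(definition, indent=2))
```

Because json.dumps writes Python True as the JSON literal true, boolean parameters such as useIamRole come out unquoted, matching the templates above.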