wasb://

The wasb:// and wasbs:// schemes provide access to Microsoft Azure Blob Storage. The latter uses TLS to secure the underlying communication with MS Azure.

Using wasb:// in SpectX

The implementation of the wasb:// and wasbs:// protocols does not support host-less URI notation: the URI must always specify either a valid Azure container name or a datastore name as the host.

An Azure container in a custom domain that allows anonymous access can be reached using a URI of the form wasb://container@endpoint.uri, where container is the name of the target container and endpoint.uri is a URI that points to the custom domain.

Example:

wasbs://armtemplates@hditutorialdata.blob.core.windows.net
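
The URI form described above can be composed and taken apart programmatically. The following is a minimal Python sketch, not part of SpectX: the helper name `wasb_uri` and its `secure` flag are assumptions made for illustration, and only the scheme://container@endpoint shape comes from the text above.

```python
from urllib.parse import urlsplit


def wasb_uri(container: str, endpoint: str, secure: bool = True) -> str:
    """Compose a URI of the form scheme://container@endpoint.

    Hypothetical helper for illustration; it is not a SpectX API.
    """
    if not container or not endpoint:
        # This URI form needs both parts; host-less URIs are not supported.
        raise ValueError("container and endpoint must both be non-empty")
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{endpoint}"


uri = wasb_uri("armtemplates", "hditutorialdata.blob.core.windows.net")
parts = urlsplit(uri)
# The container sits in the user-info position, the endpoint in the host position.
print(uri)
print(parts.scheme, parts.username, parts.hostname)
```

Splitting the result with `urlsplit` is a quick way to check that the container and the endpoint ended up in the intended positions before the URI is used in a query.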

Datastore configuration

UI

Configuration parameters for both wasb:// and wasbs:// datastore definitions:

  • Store name: unique name among all defined datastores. Mandatory parameter.
  • Container: name of the target Azure Blob Storage container storing blobs. Mandatory parameter.
  • Account: name of the Azure storage account to use for connecting to the container. If not specified, anonymous access is assumed.
  • Access Key: storage account access key. Either Access Key or Shared Access Token is required if Account is specified. Mutually exclusive with Shared Access Token.
  • Shared Access Token: Shared Access Token. Either Shared Access Token or Access Key is required if Account is specified. Mutually exclusive with Access Key.
  • Endpoint URI Suffix: endpoint URI suffix. Optional.
  • Is cacheable: enables caching of data by Processing Units.
  • Hot cache period: limits time-related data caching to the period specified.
  • Connect Timeout: connect timeout. A timeout of zero is interpreted as an infinite timeout. The default is 10s. A time interval evaluating to an integer number of milliseconds.
  • Read Timeout: read timeout. A timeout of zero is interpreted as an infinite timeout. The default is 60s. A time interval evaluating to an integer number of milliseconds.
  • Max Error Retries: maximum number of times SpectX tries to access a requested resource that is inaccessible due to network problems before giving up. The default is 3. A non-negative integer.
  • ACL: specifies the blob ACL.
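
The credential rules above (no Account means anonymous access; with an Account, exactly one of Access Key or Shared Access Token) can be expressed as a small check. This is a hedged Python sketch: `check_credentials` and its return labels are hypothetical names, and rejecting credentials supplied without an account is an assumption rather than documented SpectX behavior.

```python
from typing import Optional


def check_credentials(account: Optional[str] = None,
                      access_key: Optional[str] = None,
                      sas_token: Optional[str] = None) -> str:
    """Return the access mode implied by the settings, or raise ValueError.

    Hypothetical helper mirroring the documented rules; not a SpectX API.
    """
    if account is None:
        if access_key is not None or sas_token is not None:
            # Assumption: credentials without an account are treated as a mistake.
            raise ValueError("Access Key / Shared Access Token given without Account")
        return "anonymous"
    if access_key is not None and sas_token is not None:
        raise ValueError("Access Key and Shared Access Token are mutually exclusive")
    if access_key is None and sas_token is None:
        raise ValueError("Account requires an Access Key or a Shared Access Token")
    return "access-key" if access_key is not None else "sas-token"


print(check_credentials())
print(check_credentials("myaccount", sas_token="<sasToken>"))
```

Running a check like this before saving a datastore can catch an incomplete or contradictory credential combination early.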

Filesystem

The wasb:// and wasbs:// datastore definition files are JSON documents with the following structures, respectively (optional parameters can be omitted):

{
  "type": "WASB",
  "wasbStore": {
    "container": "<container>",
    "account": "<account>",
    "key": "<key>",
    "sasToken": "<sasToken>",
    "endPointUriSuffix": "<endPointUriSuffix>",
    "directoryDelimiter": "<directoryDelimiter>",
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "connectTimeout": <connectTimeout>,
    "readTimeout": <readTimeout>,
    "maxErrorRetries": <maxErrorRetries>,
    "userAgent": "<userAgent>",
    "acl": {<ACL>}
  }
}

{
  "type": "WASBS",
  "wasbsStore": {
    "container": "<container>",
    "account": "<account>",
    "key": "<key>",
    "sasToken": "<sasToken>",
    "endPointUriSuffix": "<endPointUriSuffix>",
    "directoryDelimiter": "<directoryDelimiter>",
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "connectTimeout": <connectTimeout>,
    "readTimeout": <readTimeout>,
    "maxErrorRetries": <maxErrorRetries>,
    "userAgent": "<userAgent>",
    "acl": {<ACL>}
  }
}

where

  • <container>: name of the target Blob Storage container storing blobs. A string. Mandatory parameter.
  • <account>: name of the MS Azure storage account to use for connecting to the container. Optional; if not specified, anonymous access is assumed. A string.
  • <key>: storage account access key. Either <key> or <sasToken> is required if the account is specified. Cannot be specified if <sasToken> is specified. A string.
  • <sasToken>: Shared Access Signature token. Either <sasToken> or <key> is required if the account is specified. Cannot be specified if <key> is specified. A string.
  • <endPointUriSuffix>: endpoint URI suffix. Optional. A string.
  • <directoryDelimiter>: directory separator in a file name. Optional; when empty, the default “/” is assumed. A string.
  • <isCacheable>: enables caching of data by Processing Units. Optional. The default is “false”. A boolean (“true” or “false”).
  • <hotCachePeriod>: limits time-related data caching to the period specified. A time period.
  • <connectTimeout>: connection timeout. A timeout of zero is interpreted as an infinite timeout. The default is 10s. A time interval evaluating to an integer number of milliseconds.
  • <readTimeout>: read timeout. A timeout of zero is interpreted as an infinite timeout. The default is 60s. A time interval evaluating to an integer number of milliseconds.
  • <maxErrorRetries>: maximum number of times SpectX tries to access a requested resource that is inaccessible due to network problems before giving up. The default is 3. A non-negative integer.
  • <userAgent>: software agent name to use when communicating with the cloud. The default value consists of the string “SpectX” and the software version. An optional string.
  • <anonymousTtl>: TTL, in milliseconds, of the anonymous access token used by processing units to process blob content during query execution. The default is 30000. A non-negative long integer.
  • <ACL>: definition of a blob ACL for the datastore. A map.
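
A definition file with the structure shown above can also be generated rather than written by hand. The sketch below is an illustration in Python under stated assumptions: `datastore_definition` is a hypothetical helper, the container name is borrowed from the URI example earlier on this page, and the parameter values are the documented defaults. Only the key names and nesting are taken from the templates.

```python
import json
from typing import Any, Dict


def datastore_definition(secure: bool, container: str, **optional: Any) -> Dict[str, Any]:
    """Build a definition with the shape shown in the templates above.

    Hypothetical helper; optional parameters left as None are omitted.
    """
    if not container:
        raise ValueError("container is mandatory")
    store_type = "WASBS" if secure else "WASB"
    store_key = "wasbsStore" if secure else "wasbStore"
    store: Dict[str, Any] = {"container": container}
    # Drop unset optional parameters so the file contains only what was chosen.
    store.update({k: v for k, v in optional.items() if v is not None})
    return {"type": store_type, store_key: store}


definition = datastore_definition(
    secure=True,
    container="armtemplates",  # container name from the URI example
    isCacheable=False,         # documented default
    maxErrorRetries=3,         # documented default
    account=None,              # None is omitted, so anonymous access is assumed
)
print(json.dumps(definition, indent=2))
```

Omitting unset keys, rather than writing empty values, relies on the statement above that optional parameters can be left out of the file.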