.. _ds_wasb:

wasb://

The wasb:// and wasbs:// schemes provide access to Microsoft Azure Blob Storage. wasbs:// additionally uses TLS to secure the underlying communication with Azure.

Using wasb:// in SpectX

The wasb:// and wasbs:// implementations do not support host-less URI notations. The host part of the URI must always be either a valid Azure container name or a datastore name.

An anonymous (publicly readable) Azure container can be accessed with a URI of the form wasb://container@endpoint.uri, where container is the name of the target container and endpoint.uri is the host name of the blob endpoint, for example a custom domain.

Example:

wasbs://armtemplates@hditutorialdata.blob.core.windows.net
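The two URI forms described above can be illustrated with a short Python sketch. parse_wasb_uri and WasbTarget are hypothetical names, not part of SpectX. The sketch assumes that everything after the host is a blob path, and that a host without a container@ prefix is reported as-is, since it may be a container or datastore name.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class WasbTarget:
    scheme: str               # "wasb" or "wasbs"
    tls: bool                 # True for wasbs://, which secures traffic with TLS
    container: Optional[str]  # set only for the container@endpoint.uri form
    host: str                 # endpoint host, or a container/datastore name
    path: str                 # remainder of the URI path, without the leading "/"


def parse_wasb_uri(uri: str) -> WasbTarget:
    """Split a wasb:// or wasbs:// URI into its parts (hypothetical helper)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a wasb:// or wasbs:// URI: {uri!r}")
    if not parts.netloc:
        # Host-less notations such as wasb:///path are rejected.
        raise ValueError(f"host-less URI is not supported: {uri!r}")

    # netloc is used instead of .hostname so that the container part is kept
    # and the host name is not lowercased.
    container, at, host = parts.netloc.rpartition("@")
    if not host or (at and not container):
        raise ValueError(f"malformed host part: {uri!r}")

    return WasbTarget(
        scheme=scheme,
        tls=scheme == "wasbs",
        container=container if at else None,
        host=host,
        path=parts.path.lstrip("/"),
    )


if __name__ == "__main__":
    # The anonymous-container example from the text above.
    print(parse_wasb_uri("wasbs://armtemplates@hditutorialdata.blob.core.windows.net"))
```

Rejecting an empty netloc up front mirrors the rule that host-less notations are not supported.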

Datastore configuration

UI

Configuration parameters for both wasb:// and wasbs:// datastore definitions:

===================  ==============================================================
Name                 Description
===================  ==============================================================
Store name           Unique name among all defined datastores. Mandatory.
Container            Name of the target Blob Storage container. Mandatory.
Account              Name of the Azure storage account used to connect to the
                     container. If not specified, anonymous access is assumed.
Access Key           Storage account access key. Required if Account is specified.
Endpoint URI Suffix  Endpoint URI suffix.
Is cacheable         Enables caching of data by Processing Units.
Hot cache period     Limits time-related data caching to the specified period.
Read ACL             Specifies the blob read ACL.
===================  ==============================================================

Filesystem

wasb:// and wasbs:// datastore definition files are JSON documents with the following structures, respectively (optional parameters can be omitted):

{
  "type": "WASB",
  "wasbStore": {
    "container": "<container>",
    "account": "<account>",
    "key": "<key>",
    "endPointUriSuffix": "<endPointUriSuffix>",
    "directoryDelimiter": "<directoryDelimiter>",
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "connectTimeout": <connectTimeout>,
    "readTimeout": <readTimeout>,
    "maxErrorRetries": <maxErrorRetries>,
    "maxRedirects": <maxRedirects>,
    "userAgent": "<userAgent>",
    "acl": {<rACL>}
  }
}

{
  "type": "WASBS",
  "wasbsStore": {
    "container": "<container>",
    "account": "<account>",
    "key": "<key>",
    "endPointUriSuffix": "<endPointUriSuffix>",
    "directoryDelimiter": "<directoryDelimiter>",
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "connectTimeout": <connectTimeout>,
    "readTimeout": <readTimeout>,
    "maxErrorRetries": <maxErrorRetries>,
    "userAgent": "<userAgent>",
    "acl": {<rACL>}
  }
}

where

  • <container> is the name of the target Blob Storage container. Mandatory. A string
  • <account> is the name of the Azure storage account used to connect to the container. Optional; if not specified, anonymous access is assumed. A string
  • <key> is the storage account access key. Required if <account> is specified. A string
  • <endPointUriSuffix> is the endpoint URI suffix. Optional. A string
  • <directoryDelimiter> is the directory separator in file names. Optional; if empty, the default “/” is used. A string
  • <isCacheable> enables caching of data by Processing Units. Optional. The default is “false”. A boolean (“true” or “false”)
  • <hotCachePeriod> limits time-related data caching to the specified period. A time period
  • <connectTimeout> is the connection timeout in milliseconds. A timeout of zero is interpreted as an infinite timeout. The default is 10000. A non-negative long integer
  • <readTimeout> is the read timeout in milliseconds. A timeout of zero is interpreted as an infinite timeout. The default is 60000. A non-negative long integer
  • <maxErrorRetries> is the number of times SpectX retries accessing a requested resource that is inaccessible due to network problems before giving up. The default is 3. An integer
  • <userAgent> is the user agent name sent when communicating with the cloud. The default consists of the string “SpectX” and the software version. A string
  • <anonymousTtl> is the TTL, in milliseconds, of the anonymous access token that Processing Units use to process blob content during query execution. The default is 30000. A non-negative long integer
  • <rACL> is the blob read ACL definition for the datastore. A map
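As an illustration, a minimal Python sketch can assemble a definition file from the parameters above and enforce the stated rules: container is mandatory, key is required with account, and timeouts are non-negative. make_wasb_store_definition and its argument names are hypothetical. The type and store key pairs (WASB/wasbStore, WASBS/wasbsStore) come from the templates. hotCachePeriod, acl and maxRedirects are left out because their value formats are not described here, and anonymousTtl is left out because the templates do not show where it belongs.

```python
import json
from typing import Any, Dict, Optional


def make_wasb_store_definition(
    container: str,
    *,
    tls: bool = True,
    account: Optional[str] = None,
    key: Optional[str] = None,
    end_point_uri_suffix: Optional[str] = None,
    directory_delimiter: Optional[str] = None,
    is_cacheable: Optional[bool] = None,
    connect_timeout_ms: Optional[int] = None,
    read_timeout_ms: Optional[int] = None,
    max_error_retries: Optional[int] = None,
    user_agent: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a wasb/wasbs datastore definition as a dict (hypothetical helper)."""
    if not container:
        raise ValueError("container is mandatory")
    if account and not key:
        raise ValueError("key is required when account is specified")
    for name, value in (("connectTimeout", connect_timeout_ms),
                        ("readTimeout", read_timeout_ms)):
        if value is not None and value < 0:
            raise ValueError(f"{name} must be non-negative (0 means infinite)")

    store: Dict[str, Any] = {"container": container}
    optional = {
        "account": account,
        "key": key,
        "endPointUriSuffix": end_point_uri_suffix,
        "directoryDelimiter": directory_delimiter,
        "isCacheable": is_cacheable,
        "connectTimeout": connect_timeout_ms,
        "readTimeout": read_timeout_ms,
        "maxErrorRetries": max_error_retries,
        "userAgent": user_agent,
    }
    # Unset (None) parameters are omitted so that the defaults apply.
    store.update({k: v for k, v in optional.items() if v is not None})

    kind = "WASBS" if tls else "WASB"
    return {"type": kind, kind.lower() + "Store": store}


if __name__ == "__main__":
    # Placeholder account and key values, for illustration only.
    definition = make_wasb_store_definition(
        "logs", account="mystorageaccount", key="<access key>", read_timeout_ms=120000
    )
    print(json.dumps(definition, indent=2))
```

Leaving unset parameters out, rather than writing the documented defaults, keeps the file minimal and relies on the rule that optional parameters can be omitted.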