wasb://
The wasb:// and wasbs:// schemes provide access to Microsoft Azure Blob Storage. The latter, wasbs://, uses TLS to secure the underlying communication with Azure.
Using wasb:// in SpectX
The wasb:// and wasbs:// implementations do not support host-less URI notation. The host part of the URI must always be either a valid Azure container name or a datastore name.
An anonymous Azure container in a custom domain can be accessed using a URI of the form wasb://container@endpoint.uri, where container is the name of the target container and endpoint.uri is a URI that points to the custom domain.
Example:
wasbs://armtemplates@hditutorialdata.blob.core.windows.net
Datastore configuration
UI
Configuration parameters for both wasb:// and wasbs:// datastore definitions:
Name | Description |
---|---|
Store name | Unique name among all defined datastores. Mandatory. |
Container | Name of the target Azure Blob Storage container storing the blobs. Mandatory. |
Account | Name of the Azure storage account used to connect to the container. If not specified, anonymous access is assumed. |
Access Key | Storage account access key. If Account is specified, either Access Key or Shared Access Token is required; the two are mutually exclusive. |
Shared Access Token | Shared Access Signature (SAS) token. If Account is specified, either Shared Access Token or Access Key is required; the two are mutually exclusive. |
Endpoint URI Suffix | Endpoint URI suffix. Optional. |
Is cacheable | Enables caching of data by Processing Units. |
Hot cache period | Limits time-related data caching to the specified period. |
Connect Timeout | Connection timeout. A time interval evaluating to an integer number of milliseconds; zero means an infinite timeout. Default is 10s. |
Read Timeout | Read timeout. A time interval evaluating to an integer number of milliseconds; zero means an infinite timeout. Default is 60s. |
Max Error Retries | Maximum number of times SpectX retries accessing a requested resource that is inaccessible due to network problems before giving up. A non-negative integer. Default is 3. |
ACL | Specifies the blob ACL. |
Filesystem
wasb:// and wasbs:// datastore definition files are JSON documents with the following structures, respectively (optional parameters can be omitted):
{
  "type": "WASB",
  "wasbStore": {
    "container": "<container>",
    "account": "<account>",
    "key": "<key>",
    "sasToken": "<sasToken>",
    "endPointUriSuffix": "<endPointUriSuffix>",
    "directoryDelimiter": "<directoryDelimiter>",
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "connectTimeout": <connectTimeout>,
    "readTimeout": <readTimeout>,
    "maxErrorRetries": <maxErrorRetries>,
    "userAgent": "<userAgent>",
    "acl": {<ACL>}
  }
}
{
  "type": "WASBS",
  "wasbsStore": {
    "container": "<container>",
    "account": "<account>",
    "key": "<key>",
    "sasToken": "<sasToken>",
    "endPointUriSuffix": "<endPointUriSuffix>",
    "directoryDelimiter": "<directoryDelimiter>",
    "isCacheable": <isCacheable>,
    "hotCachePeriod": "<hotCachePeriod>",
    "connectTimeout": <connectTimeout>,
    "readTimeout": <readTimeout>,
    "maxErrorRetries": <maxErrorRetries>,
    "userAgent": "<userAgent>",
    "acl": {<ACL>}
  }
}
where

<container>
    Name of the target Blob Storage container storing the blobs. Mandatory. A string.
<account>
    Name of the Azure storage account used to connect to the container. Optional; if not specified, anonymous access is assumed. A string.
<key>
    Storage account access key. If <account> is specified, either <key> or <sasToken> is required; <key> cannot be specified together with <sasToken>. A string.
<sasToken>
    Shared Access Signature (SAS) token. If <account> is specified, either <sasToken> or <key> is required; <sasToken> cannot be specified together with <key>. A string.
<endPointUriSuffix>
    Endpoint URI suffix. Optional. A string.
<directoryDelimiter>
    Directory separator in a file name. Optional; when empty, the default "/" is assumed. A string.
<isCacheable>
    Enables caching of data by Processing Units. Optional. Default is false. A boolean (true or false).
<hotCachePeriod>
    Limits time-related data caching to the specified period. A time period.
<connectTimeout>
    Connection timeout. A timeout of zero is interpreted as an infinite timeout. Default is 10s. A time interval evaluating to an integer number of milliseconds.
<readTimeout>
    Read timeout. A timeout of zero is interpreted as an infinite timeout. Default is 60s. A time interval evaluating to an integer number of milliseconds.
<maxErrorRetries>
    Maximum number of times SpectX retries accessing a requested resource that is inaccessible due to network problems before giving up. Default is 3. A non-negative integer.
<userAgent>
    Software agent name used when communicating with the cloud. Optional. The default consists of the string "SpectX" and the software version. A string.
<anonymousTtl>
    TTL, in milliseconds, of the anonymous access token used by Processing Units to process blob content during query execution. Default is 30000. A non-negative long integer.
<ACL>
    Definition of a blob ACL for the datastore. A map.
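For illustration, here is a minimal sketch of a filled-in wasbs:// datastore definition file that authenticates with a storage account access key. The container name "weblogs" and the account name "examplestorage" are hypothetical, and the access key is left as a placeholder. JSON does not allow comments, so these assumptions are stated only here. Only parameters whose value types are given above are set; everything else is omitted and falls back to its default.

```json
{
  "type": "WASBS",
  "wasbsStore": {
    "container": "weblogs",
    "account": "examplestorage",
    "key": "<storage-account-access-key>",
    "isCacheable": true,
    "maxErrorRetries": 5
  }
}
```

To authenticate with a Shared Access Signature token instead, replace "key" with "sasToken", because the two cannot appear together. For anonymous access, omit both "account" and "key".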