Data Browser

The Data Browser is intended as a starting point for analyzing any kind of ad-hoc data. Explore the source data in your defined datastores, or navigate to a URI [1] of your choice, until you find the files you need. Then proceed with querying the data or refining the auto-generated pattern.

Start by opening the Data Browser, then either browse to and click one of the datastores or type the target URI directly into the address bar.

Click Refresh to check whether the content has changed while you were browsing. When you select a file, its properties are displayed below the navigation bar.

The action bar under the properties includes buttons for previewing the file and a drop-down box for selecting the appropriate character set. NB! The content is displayed according to the chosen character set. You can use the slider to shift the preview within the file.
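To illustrate why the character set selection matters, here is a minimal Python sketch (not SpectX code) that decodes the same bytes with two different character sets. The sample bytes and the helper name `preview` are made up for this illustration.

```python
# The same raw bytes, as they might be stored in a file (UTF-8 encoded text).
raw = b"T\xc3\xb5nu M\xc3\xbcller"


def preview(data: bytes, charset: str) -> str:
    """Decode bytes for display; undecodable sequences become U+FFFD."""
    return data.decode(charset, errors="replace")


utf8_view = preview(raw, "utf-8")         # decoded with the correct charset
latin1_view = preview(raw, "iso-8859-1")  # each byte becomes one character

print(utf8_view)
print(latin1_view)
```

With the wrong character set the text is still shown, but each multi-byte UTF-8 sequence appears as several unrelated characters, so a garbled preview is usually a hint to try another character set.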

When previewing compressed files, the first 16 kB of content is automatically uncompressed for viewing. When the preview is shifted with the slider, the compressed content is displayed instead.
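The idea of uncompressing only the beginning of a file can be sketched in Python as follows. This is an illustration under assumptions, not a description of how SpectX implements the preview: it assumes gzip compression, and the names `PREVIEW_SIZE` and `preview_gzip` are hypothetical.

```python
import gzip
import io

PREVIEW_SIZE = 16 * 1024  # 16 kB, the preview amount mentioned above


def preview_gzip(compressed: bytes, size: int = PREVIEW_SIZE) -> bytes:
    """Return at most `size` bytes of uncompressed content from the start."""
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
        return stream.read(size)


# Build a small gzip sample in memory (72,000 bytes before compression).
sample = gzip.compress(b"2016-09-15 10:00:00 GET /index.html\n" * 2000)
head = preview_gzip(sample)

print(len(head))
print(head[:36])
```

A compressed stream of this kind has to be decoded from its start, which may be why a preview at an arbitrary offset shows the compressed bytes rather than readable text.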

Prepare Query opens a new query tab with an automatically detected pattern for the selected file.

Prepare Pattern opens a new Pattern Developer tab and sends 16 kB of the file content, starting from the current offset, to the Data Editor. (Note that the amount can be configured with the wgui.dataBrowser.preview_size property; see Configuration for details.)
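Taking a fixed-size sample from the current offset can be pictured with the short Python sketch below. It only illustrates the offset-plus-size idea; the function name `read_window` and the in-memory sample data are assumptions made for this example.

```python
import io

PREVIEW_SIZE = 16 * 1024  # default amount; configurable in SpectX per the note above


def read_window(stream, offset: int, size: int = PREVIEW_SIZE) -> bytes:
    """Read up to `size` bytes starting at byte position `offset`."""
    stream.seek(offset)
    return stream.read(size)


data = io.BytesIO(bytes(range(256)) * 200)  # 51,200 bytes of sample content

window = read_window(data, offset=1000)   # a full 16 kB window
tail = read_window(data, offset=50000)    # shorter: the file ends first

print(len(window), len(tail))
```

Near the end of a file the sample is simply shorter than the configured size.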

Refer to Adding New Datastore if you need to add a datastore.

To start writing patterns and queries for the new dataset, click on Data Browser at the bottom of the resource tree. Navigate to the file you are looking for and click on “Send to pattern developer” or “Prepare query”.

Selecting multiple data sources

Working with a single data file is often not enough. You can specify one or more files, from the same or from different locations, directly in the PARSE or LIST commands.

[1]

Note that you may also include connection-specific parameters in URIs, for example:

Microsoft Azure Blob storage public (anonymous) access:
wasb://flightdelay@hditutorialdata.blob.core.windows.net/flightdelays.hql
    where "flightdelay" is the name of the container and "hditutorialdata.blob.core.windows.net" is the endPointUriSuffix

Google Cloud Storage public (anonymous) access:
gs://gcp-public-data-landsat/LC08/PRE/044/034/LC80440342016259LGN00/LC80440342016259LGN00_MTL.txt
    where "gcp-public-data-landsat" is the name of the bucket

Amazon S3 Store public (anonymous) access:
s3://us-east-1@big-data-benchmark/pavlo/text/tiny/crawl/part-00000
    where "us-east-1" is the region, and "big-data-benchmark" is the name of the bucket
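The structure of the three example URIs above can be made explicit with Python's standard `urllib.parse` module. This is only an illustration of where the container, region and bucket names sit in each URI; it is not SpectX's own URI parser, and the helper name `describe` is hypothetical.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict:
    """Split a storage URI into scheme, part before '@', host and path."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "prefix": parts.username,  # text before "@", or None if absent
        "host": parts.hostname,
        "path": parts.path,
    }


azure = describe(
    "wasb://flightdelay@hditutorialdata.blob.core.windows.net/flightdelays.hql"
)
google = describe(
    "gs://gcp-public-data-landsat/LC08/PRE/044/034/"
    "LC80440342016259LGN00/LC80440342016259LGN00_MTL.txt"
)
amazon = describe("s3://us-east-1@big-data-benchmark/pavlo/text/tiny/crawl/part-00000")

for entry in (azure, google, amazon):
    print(entry)
```

In the Azure URI the part before "@" is the container, in the S3 URI it is the region, and the Google URI has no such part: the bucket name is the host.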

Warning! Do not specify credentials directly in the URI! Doing so causes the credentials to be written, unprotected, to the SpectX audit logs.
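As a precaution, a URI can be checked for embedded credentials before it is used or shared. The Python sketch below assumes credentials would appear in the generic user:password@host form; the function name and the sample key and secret are hypothetical, and this check is not part of SpectX.

```python
from urllib.parse import urlsplit


def has_embedded_password(uri: str) -> bool:
    """Return True if the URI contains a 'user:password@' style secret."""
    return urlsplit(uri).password is not None


# Region-qualified public URI from the example above: no secret present.
safe = has_embedded_password(
    "s3://us-east-1@big-data-benchmark/pavlo/text/tiny/crawl/part-00000"
)

# Hypothetical URI with a made-up key and secret embedded in it.
unsafe = has_embedded_password("s3://EXAMPLEKEY:examplesecret@some-bucket/data.log")

print(safe, unsafe)
```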