Input Data Browser¶
To query local data quickly and efficiently, enter the full file path into the search bar, then press Enter.
When a user clicks Input data, a new window appears. Here users can browse datastores or query data directly by entering the file path or URL into the search bar.
A slider is also available for navigating to any other position within the raw content of the file. The box to the right of the slider allows the position to be entered manually. Once the slider position has been changed, all the buttons described above operate on a virtual slice of the file defined by the new starting position and length. These values are also reflected in the URI of the file in the search bar.
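The idea of a virtual slice (a starting position plus a length) can be illustrated with a minimal Python sketch. The helper name `read_slice` is hypothetical and is not part of SpectX; it only shows how a start offset and a length select a window of raw bytes.

```python
import io

def read_slice(stream, start, length):
    """Return `length` bytes of `stream` beginning at byte offset `start`."""
    stream.seek(start)          # move to the new starting position
    return stream.read(length)  # read only the requested window

# An in-memory stand-in for a file's raw content.
data = io.BytesIO(b"0123456789abcdef")
window = read_slice(data, start=4, length=6)
print(window)  # b'456789'
```

Everything outside the window is left untouched, which is why operations on the slice behave as if the file began at the new position.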
The “Hex” checkbox switches the preview between raw and hex modes. The encoding dropdown menu offers several encodings in which the viewed content can be represented.
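To see why the same bytes can look different depending on the chosen mode and encoding, consider the following sketch. It is an illustration only: the function `preview` and its parameters are assumptions made for this example and do not describe how SpectX renders the preview internally.

```python
def preview(raw: bytes, hex_mode: bool = False, encoding: str = "utf-8") -> str:
    """Render bytes either as space-separated hex pairs or as decoded text."""
    if hex_mode:
        return raw.hex(" ")  # separator argument requires Python 3.8+
    # Undecodable bytes are replaced rather than raising an error.
    return raw.decode(encoding, errors="replace")

sample = "café".encode("utf-8")
print(preview(sample, hex_mode=True))       # 63 61 66 c3 a9
print(preview(sample))                      # café
print(preview(sample, encoding="latin-1"))  # cafÃ©
```

The last line shows the typical symptom of choosing the wrong encoding: the two UTF-8 bytes of "é" are displayed as two separate Latin-1 characters.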
For compressed data and compressed archive member entries, the Download Plaintext button is displayed. It downloads the decompressed content of the file or archive member to the local hard disk. In addition, the “Raw” checkbox controls whether the preview shows the content decompressed or as stored (“raw”).
For more on supported compression formats, see Compressed data.
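The difference between the raw and the decompressed view can be sketched with Python's standard `gzip` module. Gzip is used here purely as an example format, and the sample log line is made up for the illustration.

```python
import gzip

# Made-up sample content, compressed in memory.
plaintext = b"2024-01-01 GET /index.html 200\n" * 3
stored = gzip.compress(plaintext)

# "Raw" view: the bytes exactly as stored; gzip data starts with 1f 8b.
raw_view = stored[:2]
print(raw_view.hex(" "))  # 1f 8b

# Decompressed view: what a plaintext download would contain.
decompressed_view = gzip.decompress(stored)
print(decompressed_view == plaintext)  # True
```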
Search Bar Modes¶
The input data browser has four modes of displaying the file and directory hierarchies. The mode is automatically chosen depending on the content of the search bar:
- if it is empty, a default view appears, which displays a list of defined datastores in two columns, Name and Type. The list can be filtered by entering character sequences into the small filtering text boxes in the corresponding column; they are matched against the name or the type of the datastore.
- if it contains a URI of a directory, then the directory content view is displayed. It shows all the directory items, their sizes (in bytes) and times of last modification, if known. Note that the size of a directory is datastore type-specific and is not necessarily the sum of the sizes of all its entries; in most cases, it is the size of the meta information for the directory saved on disk in the datastore. The default sort order shows an alphabetically sorted list of directory names (preceded by a slash) followed by an alphabetically sorted list of files. The displayed list can be re-sorted by clicking on one of the three columns and then choosing the sort direction.
- if it contains a URI with glob patterns in it, the search result view is displayed. Compared to the previous view, it provides one more column for the type of the found directory entry (“file” or “dir”), and contains the entry’s path in the datastore instead of its name in the directory. Note that glob patterns can be specified only for the entries in the current directory.
- finally, if it contains a URI of a file, the file details view is displayed. In addition to the file name, its size and last modification date, it provides a column Content Available, which contains true for files whose content can be read by SpectX, or any other string specifying a reason for content unavailability reported by the datastore.
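The mode selection described in the list above can be summarised as a small decision function. This is a simplified sketch, not SpectX code: the names `Mode` and `choose_mode` are hypothetical, the set of glob metacharacters is an assumption, and the directory check is delegated to a caller-supplied function because only the datastore knows whether a URI names a directory.

```python
from enum import Enum

class Mode(Enum):
    DATASTORES = "default view (datastore list)"
    DIRECTORY = "directory content view"
    SEARCH = "search result view"
    FILE = "file details view"

# Assumed glob metacharacters; the real pattern syntax may differ.
GLOB_CHARS = set("*?[")

def choose_mode(search_bar: str, is_directory) -> Mode:
    """Pick a view from the search bar text.

    `is_directory` is a callable that asks the datastore whether the
    given URI points to a directory.
    """
    text = search_bar.strip()
    if not text:
        return Mode.DATASTORES
    if GLOB_CHARS & set(text):
        return Mode.SEARCH
    if is_directory(text):
        return Mode.DIRECTORY
    return Mode.FILE

# Toy directory check for the example: URIs ending in "/" are directories.
ends_with_slash = lambda uri: uri.endswith("/")
print(choose_mode("", ends_with_slash).name)                  # DATASTORES
print(choose_mode("file:/logs/", ends_with_slash).name)       # DIRECTORY
print(choose_mode("file:/logs/*.gz", ends_with_slash).name)   # SEARCH
print(choose_mode("file:/logs/a.log", ends_with_slash).name)  # FILE
```

The glob check comes before the directory check in this sketch because a pattern does not name a single existing entry that the datastore could classify.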
Selecting multiple data sources¶
You may also include connection-specific parameters in URIs, for example:
- Microsoft Azure blob store, public (anonymous) access: wasb://flightdelay@hditutorialdata.blob.core.windows.net/flightdelays.hql, where "flightdelay" is the name of the container and "hditutorialdata.blob.core.windows.net" is the endPointUriSuffix.
- Amazon S3 store, public (anonymous) access: s3://us-east-1@big-data-benchmark/pavlo/text/tiny/crawl/part-00000, where "us-east-1" is the region and "big-data-benchmark" is the name of the bucket.
Do not specify credentials directly in the URI! Doing so results in the credentials being included, unprotected, in the SpectX audit logs.
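The structure of such a URI, with the connection-specific parameter placed before the "@" sign, can be taken apart with Python's standard `urllib.parse`. The sketch below uses the S3 example from above; the function name `split_store_uri` and the dictionary keys are chosen for this illustration and are not SpectX terminology.

```python
from urllib.parse import urlsplit

def split_store_uri(uri: str) -> dict:
    """Split a datastore URI into scheme, '@' parameter, target and path."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "parameter": parts.username,  # text before '@' (the region for S3)
        "target": parts.hostname,     # text after '@'; urlsplit lowercases it
        "path": parts.path,
    }

info = split_store_uri(
    "s3://us-east-1@big-data-benchmark/pavlo/text/tiny/crawl/part-00000"
)
print(info["scheme"])     # s3
print(info["parameter"])  # us-east-1
print(info["target"])     # big-data-benchmark
print(info["path"])       # /pavlo/text/tiny/crawl/part-00000
```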