SpectX executes structured queries directly on raw data files. To do so, it needs to know which files to read. To provide the query engine with structured data, the retrieved raw bytes must be interpreted according to their expected structure (the pattern) and transformed. In general, then, a query passes through these stages: locating the input files, retrieving the raw bytes, parsing them according to the pattern, and running the query on the resulting structured data.
There are exceptions, of course. For instance, querying stored results, which are already structured, does not require parsing. The same holds for commands that generate a structured tuple stream themselves.
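The general flow described above (locate files, retrieve raw bytes, parse according to a pattern, then query the structured result) can be illustrated with a small Python sketch. This is not SpectX code or its pattern syntax. The function names, the regular expression standing in for a pattern, and the sample log format are all assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical "pattern": one record per line, "<ip> <status>".
# A regular expression stands in for a real pattern definition.
LINE_PATTERN = re.compile(rb"(?P<ip>\S+) (?P<status>\d{3})")


def list_files(root, glob="*.log"):
    """Stage 1: decide which files to read."""
    return sorted(Path(root).glob(glob))


def read_raw(paths):
    """Stage 2: retrieve raw bytes, one line at a time."""
    for path in paths:
        with open(path, "rb") as fh:
            for line in fh:
                yield line.rstrip(b"\r\n")


def parse(lines, pattern=LINE_PATTERN):
    """Stage 3: interpret the bytes according to the pattern and
    transform matches into typed records; non-matching lines are skipped."""
    for line in lines:
        match = pattern.fullmatch(line)
        if match:
            yield {"ip": match["ip"].decode("ascii"),
                   "status": int(match["status"])}


def query(records, min_status):
    """Stage 4: run a structured query over the parsed records."""
    return [rec for rec in records if rec["status"] >= min_status]


def run_demo():
    """Write a small sample file and run all four stages over it."""
    with tempfile.TemporaryDirectory() as root:
        Path(root, "a.log").write_bytes(
            b"10.0.0.1 200\n10.0.0.2 500\nnot a record\n"
        )
        return query(parse(read_raw(list_files(root))), min_status=500)
```

Calling `run_demo()` reads the file anew, so the result always reflects the file contents at the time of the call. Data that is already structured would enter the chain at `query` and skip `parse`, which mirrors the exception noted above.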
SpectX queries can be written as scripts. This makes it possible to compose complex analysis tasks that consist of many queries and manipulate the retrieved data. Being able to write scripts that are easy to read and understand is no less important.
SpectX processes data as a snapshot: with each query execution, data is read from the specified input resources. Query processing is distributed. The query is broken down into a set of smaller tasks that execute simultaneously. The tasks are grouped into a sequence of stages, each with a specific purpose. The exact number of stages varies with the nature of the query.
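The idea of splitting a query into simultaneously executed tasks grouped into stages can be sketched as follows. This is a simplified, single-machine illustration using a thread pool, not a description of how SpectX schedules work internally. The two-stage split (partial counts, then a merge) and all names are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_task(chunk):
    """Stage 1 task: count status values within one chunk of records."""
    return Counter(rec["status"] for rec in chunk)


def merge_task(partials):
    """Stage 2 task: combine the partial counts into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_query(chunks, workers=4):
    """Run stage 1 tasks concurrently, then feed their results to stage 2."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_task, chunks))
    return merge_task(partials)
```

Each chunk is handled by an independent task, and the merge stage starts only after every task in the first stage has finished. A query with more processing steps would simply add further stages to the sequence.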
Additionally, SpectX allows defining Views, which capture input resources and format definitions and also give access to optimizations. Views form the basis for separating the roles of data resource management and analytics.
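The separation of roles can be pictured with a small Python sketch in which one party defines where the data lives and how it is parsed, and another party only queries the named result. The `View` class below is hypothetical and is not the SpectX View mechanism. A tuple of text lines stands in for an input resource, and a regular expression stands in for a format definition.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical view: bundles an input resource with its format."""
    name: str
    lines: tuple          # stand-in for the input resource
    pattern: re.Pattern   # stand-in for the format definition

    def rows(self):
        """Yield structured rows; lines that do not match are skipped."""
        for line in self.lines:
            match = self.pattern.fullmatch(line)
            if match:
                yield match.groupdict()


# Data resource management role: define the view once.
access_view = View(
    name="access",
    lines=("alice 200", "bob 500", "bad"),
    pattern=re.compile(r"(?P<user>\w+) (?P<status>\d+)"),
)

# Analytics role: query the view without knowing its source or format.
failures = [row for row in access_view.rows() if row["status"] == "500"]
```

The analyst's code touches only `access_view.rows()`, so the input location or the pattern can change without changing the query.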