Architecture

Features

Authentication

At the moment, the only supported authentication method is the “x-auth-key” request header, which is used to authorise API access with the master key and/or container-specific keys. Separate authorisation of users for access to container contents is not currently supported. There are two groups of API calls (system info and the sourceAgent protocol), and each requires its own key to be present in requests. API calls that target locked files do not require specific authentication, because they use randomly generated, short-lived lock ids.
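As a minimal client-side sketch of the header-based scheme described above, the following builds (but does not send) a request that carries the key. The base URL, the path and the key value are placeholders chosen for illustration and are not taken from the product; only the header name comes from the text.

```python
from urllib.request import Request

# Header name taken from the text above; everything else is illustrative.
AUTH_HEADER = "x-auth-key"


def build_request(base_url: str, path: str, key: str) -> Request:
    """Build a request that carries the API key in the auth header.

    base_url, path and key are hypothetical values supplied by the caller.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return Request(url, headers={AUTH_HEADER: key})
```

Note that urllib stores the header name as "X-auth-key". HTTP header names are case-insensitive, so this does not change the meaning of the request.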

File locking

The API provides access to a mechanism for locking files for a limited period of time. It is implemented by holding the target file channel open until the lock expires or is explicitly destroyed on request. Once a lock is created, it is assigned a random id, which may then be used to query the file content without additional authentication. All subsequent lock creation requests for the same file reuse the previously opened file channel. The content is guaranteed to remain available if the file is deleted during the lock lifetime (this does not help, however, if the file is truncated). Upon lock destruction, the server returns statistics on the real, cpu and usr times spent handling the IO operations for the given lock.
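The difference between deletion and truncation follows from how open file handles behave on POSIX filesystems. The sketch below illustrates that operating-system behaviour only; it is not the server's implementation, and the function name is made up for this example. It assumes a POSIX system, since deleting an open file typically fails on Windows.

```python
import os
import tempfile


def read_via_open_handle(action: str) -> bytes:
    """Open a file, delete or truncate it by path, then read via the handle."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"payload")
    os.close(fd)
    handle = open(path, "rb")  # plays the role of the held file channel
    try:
        if action == "delete":
            os.remove(path)       # the name is gone, the data stays reachable
        elif action == "truncate":
            os.truncate(path, 0)  # the data itself is discarded
        else:
            raise ValueError(f"unknown action: {action}")
        return handle.read()
    finally:
        handle.close()
        if os.path.exists(path):
            os.remove(path)
```

With "delete" the open handle still yields the original bytes, while with "truncate" it yields nothing, which matches the caveat stated above.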

GzipScan

Support for parallel processing of ordinary gzip files is provided by the GzipScan module. When enabled (it is disabled by default), the module scans gzip files in the source data directories of each configured container in the background, and creates specific indices for zlib blocks of a certain length in a configured directory. The module enhances the sourceAgent’s API by exposing endpoints that provide access to the indexed gzip chunks for parallel processing.
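To show why an index of chunk offsets makes parallel processing possible, here is a simplified sketch. It is not how GzipScan builds its indices, whose format is not described here. It uses a raw deflate stream that we produce ourselves with full-flush points, so that each chunk can be inflated independently. An ordinary gzip file has no such points, so a real index would typically also have to store decompressor state for each entry. All function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_with_index(chunks):
    """Compress chunks into one raw deflate stream and record chunk offsets."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15 selects raw deflate
    out = bytearray()
    offsets = []
    for chunk in chunks:
        offsets.append(len(out))
        out += comp.compress(chunk)
        # A full flush byte-aligns the output and resets the dictionary,
        # so decompression can restart at the next recorded offset.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush(zlib.Z_FINISH)
    return bytes(out), offsets


def inflate_segment(stream, start, end):
    """Inflate one indexed segment independently of the others."""
    return zlib.decompressobj(-15).decompress(stream[start:end])


def parallel_inflate(stream, offsets):
    """Inflate all indexed segments using a thread pool."""
    bounds = list(zip(offsets, offsets[1:] + [len(stream)]))
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda b: inflate_segment(stream, *b), bounds))
```

Each worker needs only the stream bytes between two neighbouring offsets, which is what allows the chunks to be handed out to separate workers or clients.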

Advanced caching support

The sourceAgent calculates ETags not for whole files but for the requested file chunks. A conditional request for a file chunk that carries an ETag previously calculated for a smaller chunk may result in a response containing only the identified delta, provided the previously served data is found unchanged. This approach allows API clients to implement advanced caching mechanisms based on chunk ETags.
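The idea can be sketched as follows, under loudly stated assumptions: the ETag layout ("offset-length-hash"), the choice of SHA-256 and the status codes are all invented for this illustration and are not the sourceAgent's actual format or protocol.

```python
import hashlib


def chunk_etag(data: bytes, offset: int, length: int) -> str:
    """Hypothetical chunk ETag: '<offset>-<length>-<sha256 of the chunk>'."""
    chunk = data[offset:offset + length]
    return f"{offset}-{len(chunk)}-{hashlib.sha256(chunk).hexdigest()}"


def respond(data: bytes, offset: int, length: int, if_none_match=None):
    """Return (status, body, etag) for a chunk request.

    Status codes are illustrative: 200 full chunk, 304 unchanged,
    206 only the delta beyond the chunk the client already holds.
    """
    etag = chunk_etag(data, offset, length)
    full = data[offset:offset + length]
    if if_none_match is None:
        return 200, full, etag
    if if_none_match == etag:
        return 304, b"", etag
    try:
        old_offset, old_length, _ = if_none_match.split("-", 2)
        old_offset, old_length = int(old_offset), int(old_length)
    except ValueError:
        return 200, full, etag  # unparseable ETag: send everything
    same_start = old_offset == offset and old_length < length
    if same_start and chunk_etag(data, offset, old_length) == if_none_match:
        # The smaller chunk is unchanged, so only the new tail is sent.
        return 206, data[offset + old_length:offset + length], etag
    return 200, full, etag
```

A client that cached the first 5 bytes and later asks for the first 11 bytes with the old ETag would receive only the remaining 6 bytes, as long as the first 5 are unchanged on the server.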

Statistics

The sourceAgent gathers execution and disk IO stats for every request and for its own background activities. The stats related to request processing are logged along with the request and response data, and are also available to clients via the API in aggregated form.

Capabilities

TLS

Given the nature of the product, which is meant to be used on-premises in secured networking environments, TLS support is not something we plan to provide in future releases as we move towards payload encryption. At the moment, both the JDK provider (slow) and the Tomcat Native SSL/TLS provider (OpenSSL/BoringSSL) are technically supported. If the latter is available, it is used instead of the one that comes with the JDK/JRE.

No certificate-based client authentication is supported at the moment.

It is possible to set up the server to use either a self-signed certificate generated at each startup, or any other certificate provided in the configuration.
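On the client side, the two certificate options call for different trust settings. The sketch below is an assumption about how a Python client might be configured, not part of the product. The function name and the ca_file parameter are hypothetical. A certificate regenerated at each startup cannot be pinned in advance, so in that case verification is switched off, which is acceptable only inside a trusted network.

```python
import ssl
from typing import Optional


def client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client TLS context for one of the two server setups.

    ca_file: path to the certificate configured on the server, if any.
    """
    if ca_file is not None:
        # A certificate provided in configuration can be trusted explicitly.
        return ssl.create_default_context(cafile=ca_file)
    # Self-signed certificate regenerated at each startup: nothing to trust
    # in advance, so verification is disabled (trusted networks only).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```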

Database

Where slow mounted network drives are used, it is possible to configure the server to scan the filesystems on such drives periodically and store the results in an internal database. This approach speeds up the processing of directory listing and file search requests, because the database then becomes the source of metadata.
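The scan-then-query pattern can be sketched as follows. This is an illustration only: the table layout, the function names and the use of SQLite are assumptions, and the server's internal database and schema are not described here.

```python
import os
import sqlite3


def scan_into_db(root: str, conn: sqlite3.Connection) -> int:
    """Walk the tree under root and store file metadata; return file count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, dir TEXT, size INTEGER, mtime REAL)"
    )
    conn.execute("DELETE FROM files")  # each scan replaces the old snapshot
    count = 0
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            conn.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?)",
                (full, dirpath, st.st_size, st.st_mtime),
            )
            count += 1
    conn.commit()
    return count


def list_dir(conn: sqlite3.Connection, directory: str):
    """Answer a directory listing from the database, not the filesystem."""
    rows = conn.execute(
        "SELECT path, size FROM files WHERE dir = ? ORDER BY path",
        (directory,),
    )
    return rows.fetchall()
```

The slow drive is touched only during the scan. Listings answered from the database reflect the state at the last scan, so changes made since then stay invisible until the next scan.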