How can I count events over time intervals in Apache logs?ΒΆ

Getting an overview of a frequency - i.e the count of a parameter during a period of time - is essential for many. System administrators want to know, what the request rate is applied on the web servers; the rate of successful-failed login attempts might tell a lot to the infosec team, etc. Computing this is very simple with SpectX - it is just another aggregation query. All you need is a time series and desired parameter fields in your source data. The time series field should be truncated to a desired value of the period (i.e second, minute, day, etc.), aggregation function applied to the field containing a desired parameter and grouping applied to the period.

Truncating the time field is convenient using TIMESTAMP operator for truncation. Of course, you can also use the TIME_SUBMOD() function instead of an operator.

Let’s take an Apache Access log for example and compute the hourly request count. We’ll use the data from SpectX built-in examples (see extracting examples), more precisely the /user/examples/views/ view which embeds pattern and location for turning /user/examples/data/ into a selectable tuple stream. Record timestamp is exposed under the name timestamp and response code as response:

//using dotted cmdline syntax:
 .select(timestamp[%1 h] as truncTime, count(*) as reqCnt)

//same using SQL style:
    TIME_SUBMOD(timestamp, 60*60*1000) as truncTime,
    count(*) as reqCnt
FROM @[/user/examples/views/]
GROUP BY truncTime;

We could also make use of conditional count function (COUNT(expr), see Aggregate functions) to count successful and rejected responses:

 .select(timestamp[1 h] as truncTime,   //note that '%' truncation operator is default and can be omitted
    count(response=200) as success,
    count(response!=200) as rejected)