How can I count events over time intervals in Apache logs?¶
Getting an overview of a frequency - i.e the count of a parameter during a period of time - is essential for many. System administrators want to know, what the request rate is applied on the web servers; the rate of successful-failed login attempts might tell a lot to the infosec team, etc. Computing this is very simple with SpectX - it is just another aggregation query. All you need is a time series and desired parameter fields in your source data. The time series field should be truncated to a desired value of the period (i.e second, minute, day, etc.), aggregation function applied to the field containing a desired parameter and grouping applied to the period.
Let’s take an Apache Access log for example and compute the hourly request count. We’ll use the data from SpectX built-in examples (see extracting examples), more precisely the /user/examples/views/my_webserver_access_logs.sx view which embeds pattern and location for turning /user/examples/data/apache_access_log.sx.gz into a selectable record stream. Record timestamp is exposed under the name timestamp and response code as response:
//using dotted cmdline syntax: @[/user/examples/views/my_webserver_access_logs.sx] .select(timestamp[%1 h] as truncTime, count(*) as reqCnt) .group(truncTime); //same using SQL style: SELECT TIME_SUBMOD(timestamp, 60*60*1000) as truncTime, count(*) as reqCnt FROM @[/user/examples/views/my_webserver_access_logs.sx] GROUP BY truncTime;
We could also make use of conditional count function (COUNT(expr), see Aggregate functions) to count successful and rejected responses:
@[/user/examples/views/my_webserver_access_logs.sx] .select(timestamp[1 h] as truncTime, //note that '%' truncation operator is default and can be omitted count(response=200) as success, count(response!=200) as rejected) .group(truncTime)