Nginx Custom Access Log

The X-Forwarded-For header, when added to an Nginx access log, is an example of a field that holds a sequence with a variable number of values, i.e. each record may have a different number of values in that field.

Example: a line from the access log of an Nginx instance configured to log via Syslog:

Mar 14 21:34:30 webfrontend1 nginx[8491]: 192.168.0.24 - - [14/Mar/2016:23:34:25 +0200] "GET //db/scripts/setup.php HTTP/1.1" 404 474 "-" "-" 117.169.75.66, 10.24.5.39

Hint

You can find a sample log file by navigating with the Input Data Browser to s3s://spectx-docs/formats/log/nginx/nginx-access-xff-syslog.log

Parse

Since Nginx uses the Apache combined format for logging, we can re-use its pattern. We can also borrow the Syslog header pattern. We’ll use subpatterns to define both. The only new element here is the X-Forwarded-For field, for which we’re going to use the ARRAY matcher.

$syslogHdr = TIMESTAMP('MMM d HH:mm:ss', tz='UTC'):syslog_time ' ' LD:server ' ' LD:proc ('[' INT:pid ']')? ':' ' ';

$apache_combined =
(IPADDR:client_ip | [! \n]+):host
' ' ('-' | NSPACE:ident)                          // Apache access log format is vulnerable to
' ' ('-' | (DATA{1,8096}:auth >>(' [' HTTPDATE))) // log poisoning attack via remote user field
' ' '[' HTTPDATE:timestamp ']'
' ' (('\"' [A-Z-_]+:verb ' ' LD{0,8096}:uri ' HTTP/' FLOAT:httpversion '\"') | DQS:invalid_request)
' ' INTEGER:response
' ' (LONG:bytes | '-')
(' ' DQS:referrer (' ' DQS:agent)?)?;

$x_forwarded_for = ' ' ARRAY{IPADDR:ip ','? ' '?}*:xff;

$syslogHdr $apache_combined $x_forwarded_for EOL;

Here on line 13 (the $x_forwarded_for definition) we declare an ARRAY, which matches a sequence of IP addresses, each followed by a comma and a space. Both separators are marked as optional, since the last IP address is followed by neither of them.
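To make the optional-separator logic concrete, here is a minimal sketch of the same idea outside SpectX, in plain Python. It is only an analogue of the ARRAY matcher, not how SpectX implements it; the function name parse_xff and the character class used to find address candidates are assumptions made for illustration.

```python
import ipaddress
import re

# Rough analogue of ARRAY{IPADDR:ip ','? ' '?}*: an address candidate,
# then an optional comma and an optional space, repeated to the end.
_ELEMENT = re.compile(r"([0-9A-Fa-f:.]+),? ?")

def parse_xff(field: str) -> list[str]:
    """Return the addresses of an X-Forwarded-For field, in order."""
    addresses = []
    pos = 0
    while pos < len(field):
        match = _ELEMENT.match(field, pos)
        if match is None:
            raise ValueError(f"unexpected text at offset {pos}: {field[pos:]!r}")
        # ipaddress validates the candidate and raises ValueError on junk.
        addresses.append(str(ipaddress.ip_address(match.group(1))))
        pos = match.end()
    return addresses

print(parse_xff("117.169.75.66, 10.24.5.39"))
```

Because the comma and the space are both optional, the same loop handles a field with one address, with several addresses, or with none at all.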

On the last line, we declare our main pattern: the record consists of the Syslog header, followed by the Apache combined format fields and the X-Forwarded-For field. The record is terminated by a line break.
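The composition of a record from named building blocks can also be sketched in plain Python. The three regular expressions below are simplified stand-ins for the subpatterns, much looser than the SpectX definitions, and the names SYSLOG_HDR, COMBINED, XFF and RECORD are assumptions for this illustration only.

```python
import re

# Simplified stand-ins for the three subpatterns; they only show how the
# pieces concatenate into one record pattern.
SYSLOG_HDR = (
    r"(?P<syslog_time>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<server>\S+) "
    r"(?P<proc>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
)
COMBINED = (
    r"(?P<host>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\]"
    r' "(?P<request>[^"]*)" (?P<response>\d+) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)"(?: "(?P<agent>[^"]*)")?)?'
)
XFF = r" (?P<xff>.*)"

# Main pattern: header, then combined-format fields, then the XFF field.
RECORD = re.compile(SYSLOG_HDR + COMBINED + XFF + r"$")

line = (
    'Mar 14 21:34:30 webfrontend1 nginx[8491]: 192.168.0.24 - - '
    '[14/Mar/2016:23:34:25 +0200] "GET //db/scripts/setup.php HTTP/1.1" '
    '404 474 "-" "-" 117.169.75.66, 10.24.5.39'
)
fields = RECORD.match(line).groupdict()
print(fields["server"], fields["pid"], fields["xff"])
```

Keeping each part as a separate named piece mirrors the subpattern approach: the header and the combined-format part can be re-used unchanged for logs that lack the trailing field.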

Parsing the example record above, we get:

field            value
syslog_time      2020-03-14 21:34:30.000 +0000
server           webfrontend1
proc             nginx
pid              8491
client_ip        192.168.0.24
host             192.168.0.24
ident            NULL
auth             NULL
timestamp        2016-03-14 21:34:25.000 +0000
verb             GET
uri              //db/scripts/setup.php
httpversion      1.1
invalid_request  NULL
response         404
bytes            474
referrer         -
agent            -
xff              [117.169.75.66, 10.24.5.39]

Query

Let’s find the top 5 countries that requests originate from:

LIST(src:'s3s://spectx-docs/formats/log/nginx/nginx-access-xff-syslog.log')
| parse(pattern:FETCH('https://raw.githubusercontent.com/spectx/resources/master/examples/patterns/nginx/nginx-access-xff-syslog.sxp'))
| select(CC(xff[0]), cnt:count(*))
| group(@1)
| sort(cnt DESC)
| limit(5)
;

Note that in line 3 (the select step) we use the first element of the parsed xff array to compute the client’s country code, rather than the client_ip field, which contains the IP address of our web frontend server.
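For readers who want to see the grouping logic spelled out, the following is a rough Python analogue of the select, group, sort and limit steps. The record shape (dicts with an 'xff' list), the function name top_origins, and the caller-supplied country_of lookup are all hypothetical; country_of merely stands in for a GeoIP lookup such as the one CC() performs, and the addresses and country codes in the usage lines are made-up placeholders.

```python
from collections import Counter

def top_origins(records, country_of, n=5):
    """Count records by the country of their first X-Forwarded-For address.

    records:    iterable of dicts with an 'xff' list (hypothetical shape).
    country_of: caller-supplied function mapping an IP string to a country
                code; a stand-in for a GeoIP lookup, not a real API.
    """
    counts = Counter(
        country_of(rec["xff"][0])
        for rec in records
        if rec.get("xff")  # skip records without forwarded addresses
    )
    # most_common sorts by count, descending, and keeps the first n.
    return counts.most_common(n)

# Placeholder data: documentation-range addresses, invented country codes.
records = [
    {"xff": ["203.0.113.7", "10.24.5.39"]},
    {"xff": ["203.0.113.9"]},
    {"xff": ["198.51.100.1"]},
    {"xff": []},
]
lookup = {"203.0.113.7": "AA", "203.0.113.9": "AA", "198.51.100.1": "BB"}
print(top_origins(records, lookup.get))
```

Skipping records with an empty xff list is a choice made in this sketch; the SpectX query itself may treat such records differently.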

Hint

You can download the full code of the patterns and queries at https://github.com/spectx/resources/tree/master/examples/patterns/nginx