Parsing Multiline Records

Multiline records are fairly common in logs. A typical example is an application error log that includes an exception with a stack trace. For instance, a log record could consist of a timestamp, a severity level and a message that may span multiple lines:

2015.10.03 16:32:50     INFO    connecting to db ...
2015.10.03 16:32:51     ERROR   com.spectx.webconsole.jsp.data.SQLTimeSeriesCache -- SQLTimeSeries remote fetch failed
org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66)
        at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:125)
        at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
        at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:22)
        at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:30)
        at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:24)
        at org.postgresql.Driver.makeConnection(Driver.java:393)
        at org.postgresql.Driver.connect(Driver.java:267)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:154)
        at org.logicalcobwebs.proxool.Prototyper.buildConnection(Prototyper.java:159)
        at org.logicalcobwebs.proxool.ConnectionPool.getConnection(ConnectionPool.java:211)
        at org.logicalcobwebs.proxool.ProxoolDriver.connect(ProxoolDriver.java:89)
        at java.sql.DriverManager.getConnection(DriverManager.java:582)
        at java.sql.DriverManager.getConnection(DriverManager.java:207)
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
        at java.net.Socket.connect(Socket.java:529)
        at java.net.Socket.connect(Socket.java:478)
        at java.net.Socket.<init>(Socket.java:375)
        at java.net.Socket.<init>(Socket.java:189)
        at org.postgresql.core.PGStream.<init>(PGStream.java:62)
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:76)
        ... 23 more

It turns out that parsing this seemingly complex message is very simple. Just think about how the end of a record is defined: instead of a single EOL, a record is terminated either by the beginning of the next record or by the end of the file. In our example a record begins with a timestamp and ends with an EOL followed by a timestamp (the beginning of the next record). So, let's define a pattern which does that:

$hdr = TIMESTAMP('yyyy.MM.dd HH:mm:ss'):time;   // each record begins with a timestamp; we refer to it in
                                                // more than one place, so define it once as a header

$hdr                                            // every record begins with a timestamp
DATA:message                                    // followed by data up to the end of the record
>>((EOL $hdr) | EOS)                            // look forward matches the record end sequence without
                                                // consuming it (the timestamp must be left for the next record)
(EOL|EOS);                                      // finally, consume the trailing EOL, or EOS for the last record
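
For readers who think in regular expressions, the same record-boundary idea can be sketched in Python. This is only an illustration of the technique and not how the pattern above is evaluated. The names TS, RECORD, parse_records and SAMPLE are made up for this example, and the timestamp layout is assumed from the sample log.

```python
import re

# Timestamp layout assumed from the sample log: yyyy.MM.dd HH:mm:ss
TS = r"\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}"

# A record starts with a timestamp at the beginning of a line. The message
# then runs lazily, across newlines, until the lookahead sees either a line
# break followed by another timestamp or the end of the input. The lookahead
# consumes nothing, so the next timestamp is left for the next record.
RECORD = re.compile(
    rf"^(?P<time>{TS})(?P<message>.*?)(?=\r?\n{TS}|\Z)",
    re.DOTALL | re.MULTILINE,
)


def parse_records(text):
    """Return a list of (timestamp, message) tuples, one per log record."""
    return [
        (m.group("time"), m.group("message").strip())
        for m in RECORD.finditer(text)
    ]


# Shortened version of the sample log, for illustration only.
SAMPLE = (
    "2015.10.03 16:32:50     INFO    connecting to db ...\n"
    "2015.10.03 16:32:51     ERROR   remote fetch failed\n"
    "org.postgresql.util.PSQLException: Connection refused.\n"
    "        at org.postgresql.Driver.connect(Driver.java:267)\n"
)

if __name__ == "__main__":
    for time, message in parse_records(SAMPLE):
        print(time, "|", message.splitlines()[0])
```

As in the pattern above, the lookahead only checks for the record end and leaves the following timestamp in place, so the exception line and the stack trace stay attached to the ERROR record.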