Parsing TOR Exit Nodes Online

The up-to-date list of TOR exit nodes is published at https://check.torproject.org/exit-addresses. Use DataBrowser to navigate to this address and press Prepare Pattern. The data loaded into the Data Editor window looks something like this:

ExitNode 0011BD2485AD45D984EC4159C88FC066E5E3300E
Published 2016-12-15 19:19:03
LastStatus 2016-12-15 20:02:21
ExitAddress 162.247.72.201 2016-12-15 20:05:24
ExitNode 006CC1DD17754582618DE2539DAAFE0A96962583
Published 2016-12-15 21:17:01
LastStatus 2016-12-15 22:02:23
ExitAddress 198.50.159.155 2016-12-15 22:02:52

We can see that each record consists of four fields: ExitNode, a Published timestamp, a LastStatus timestamp, and ExitAddress (an IPv4 address followed by a timestamp). Fields are separated by line feeds.

The pattern below (just replace the autodetected pattern with this one) arranges the fields into one record:

(
 'ExitNode ' LD:exitNode EOL
 'Published ' TIMESTAMP('yyyy-MM-dd HH:mm:ss'):published EOL
 'LastStatus ' TIMESTAMP('yyyy-MM-dd HH:mm:ss'):lastStatus EOL
)?  // the fields above can be missing, so the sequence group is made optional
'ExitAddress ' IPV4:exitAddress ' ' TIMESTAMP('yyyy-MM-dd HH:mm:ss'):addrTime EOL

The resulting records:

exitNode                                  published                      lastStatus                     exitAddress     addrTime
0011BD2485AD45D984EC4159C88FC066E5E3300E  2016-12-15 21:19:03.000 +0200  2016-12-15 22:02:21.000 +0200  162.247.72.201  2016-12-15 22:05:24.000 +0200
006CC1DD17754582618DE2539DAAFE0A96962583  2016-12-15 23:17:01.000 +0200  2016-12-16 00:02:23.000 +0200  198.50.159.155  2016-12-16 00:02:52.000 +0200
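
To check the grouping logic outside the tool, the same idea can be sketched in plain Python. This is only an illustration, not part of the product: the names RECORD, parse_ts and parse_exit_addresses are made up for this example, the regular expression is a loose stand-in for the LD, TIMESTAMP and IPV4 matchers, and the timestamps are assumed to be UTC. As in the pattern, the ExitNode/Published/LastStatus group is optional, so an ExitAddress line that appears on its own still produces a record.

```python
import re
from datetime import datetime, timezone

# Loose regex equivalent of TIMESTAMP('yyyy-MM-dd HH:mm:ss')
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"

# One record = optional (ExitNode, Published, LastStatus) group + ExitAddress line.
RECORD = re.compile(
    rf"^(?:ExitNode (?P<exitNode>[^\n]*)\n"
    rf"Published (?P<published>{TS})\n"
    rf"LastStatus (?P<lastStatus>{TS})\n)?"
    rf"ExitAddress (?P<exitAddress>\d{{1,3}}(?:\.\d{{1,3}}){{3}}) (?P<addrTime>{TS})\n",
    re.MULTILINE,
)


def parse_ts(text):
    # Assumption: the published list uses UTC timestamps.
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def parse_exit_addresses(text):
    """Return one dict per ExitAddress line; missing fields are None."""
    records = []
    for match in RECORD.finditer(text):
        rec = match.groupdict()
        for key in ("published", "lastStatus", "addrTime"):
            if rec[key] is not None:
                rec[key] = parse_ts(rec[key])
        records.append(rec)
    return records


# The two sample records shown earlier on this page.
sample = (
    "ExitNode 0011BD2485AD45D984EC4159C88FC066E5E3300E\n"
    "Published 2016-12-15 19:19:03\n"
    "LastStatus 2016-12-15 20:02:21\n"
    "ExitAddress 162.247.72.201 2016-12-15 20:05:24\n"
    "ExitNode 006CC1DD17754582618DE2539DAAFE0A96962583\n"
    "Published 2016-12-15 21:17:01\n"
    "LastStatus 2016-12-15 22:02:23\n"
    "ExitAddress 198.50.159.155 2016-12-15 22:02:52\n"
)

for rec in parse_exit_addresses(sample):
    print(rec["exitNode"], rec["exitAddress"], rec["addrTime"].isoformat())
```

To run this against the live list, the text would first have to be downloaded from the address above; that step is left out here so the sketch stays self-contained.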