Analyzing Activity From Blacklisted IP Addresses and TOR Exit Nodes
From time to time, national CERTs publish lists of domain names and IP addresses associated with unlawful activity. Suppose you want to verify what sort of traffic such IP addresses generate on your website.
EXAMPLE 1. Our task is:
- STEP 1. Verify whether there has been any activity on our website from these IP addresses.
- STEP 2. If so, determine which services have been affected and whether the activity is malicious.
- STEP 3. If so, determine which of our user accounts may have been compromised.
Our hypothetical web service consists of an Apache web server serving a static website (which produces an Apache access log) and a login service (which produces an application log).
For the blacklisted IPs, let’s take a real-life example. In 2016/2017, US-CERT published the GRIZZLY STEPPE advisory, which includes a list of indicators with IP addresses and domain names. The list is not easy to parse, which illustrates the complexity of real-life tasks.
STEP 1. Our task is to cross-reference the IP addresses from the CSV-formatted list against our website’s access logs to find all requests made from these IP addresses.
Essentially, we need to perform a join of our logs and the suspect list from US-CERT. The first step is turning the latter into a selectable record stream:
// a CSV format is pretty straightforward to parse:
$srcp = <<<PATTERN_END
LD*:indicatorValue   // parse as string because of weird formatting, see Note1 below
','
LD*:type
','
LD*:comment
','
LD*:role
','
LD*:attackPhase
','
LD*:observedDate
','
LD*:handling
','
LD*:description
EOL
PATTERN_END;

/* Note1: for some reason US-CERT has chosen to publish ip-addresses and domain names
   in a somewhat nontraditional form like: 192[.]168[.]0[.]1
   We need to replace each [.] with a dot before translating the values into the
   respective types IPV4, IPV6 or STRING. This is what our user-defined function
   $getHost(iVal) does:
*/
$hostPattern = <<<PATTERN_END
(                       // use an alternative group to choose between types:
  IPV4:clientIpv4 |     // note that each of these alternatives will be
  IPV6:clientIpv6 |     // output as a tuple structure
  [! \n]+:host
)
EOS
PATTERN_END;

// function to handle oddly formatted ip-addresses, uses $hostPattern above
$getHost(iVal) = PARSE($hostPattern, REPLACE($iVal, '[.]', '.') );

// execute main query:
PARSE(pattern:$srcp, src:'https://www.us-cert.gov/sites/default/files/publications/JAR-16-20296A.csv')
  .select($getHost(indicatorValue) as hostVal, *)  // use $getHost to parse ip-addresses
                                                   // from the indicatorValue field as a tuple structure
  .filter(indicatorValue != 'INDICATOR_VALUE')     // get rid of the header row
  .select(hostVal[clientIpv4] as ipv4,             // put ipv4, ipv6 and domain names into
          hostVal[clientIpv6] as ipv6,             // separate resultset columns
          hostVal[host] as host,
          type, comment, role, attackPhase, observedDate, handling, description
  );
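To check the parsing logic outside SpectX, here is a minimal Python sketch of the same idea. It refangs each indicator by replacing `[.]` with `.`, then classifies the value as IPv4, IPv6 or a host name using the standard `ipaddress` module. The sketch assumes the column order used in `$srcp` above. The header names other than `INDICATOR_VALUE` and the sample rows are made up for illustration. Python's `csv` module also copes with quoted fields that contain commas, which a plain comma-separated pattern may not.

```python
import csv
import io
import ipaddress

# Column order assumed to match $srcp in the SpectX pattern above.
FIELDS = ["indicatorValue", "type", "comment", "role",
          "attackPhase", "observedDate", "handling", "description"]


def refang(value: str) -> str:
    """Turn '192[.]168[.]0[.]1' back into '192.168.0.1'."""
    return value.replace("[.]", ".")


def classify_indicator(value: str) -> dict:
    """Set exactly one of ipv4 / ipv6 / host, like the alternatives in $hostPattern."""
    cleaned = refang(value).strip()
    result = {"ipv4": None, "ipv6": None, "host": None}
    try:
        addr = ipaddress.ip_address(cleaned)
    except ValueError:
        result["host"] = cleaned  # not an IP address: treat it as a domain name
        return result
    result["ipv4" if addr.version == 4 else "ipv6"] = str(addr)
    return result


def parse_indicators(csv_text: str) -> list:
    """Parse the indicator CSV, skip the header row and add ipv4/ipv6/host columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text), fieldnames=FIELDS):
        if row["indicatorValue"] == "INDICATOR_VALUE":  # header row
            continue
        rows.append({**classify_indicator(row["indicatorValue"]), **row})
    return rows


if __name__ == "__main__":
    # Hypothetical sample rows using documentation address ranges, not real indicators.
    sample = (
        "INDICATOR_VALUE,TYPE,COMMENT,ROLE,ATTACK_PHASE,OBSERVED_DATE,HANDLING,DESCRIPTION\n"
        "192[.]0[.]2[.]1,IPV4ADDR,,,,,,sample\n"
        "example[.]com,FQDN,,,,,,sample\n"
    )
    for r in parse_indicators(sample):
        print(r["ipv4"], r["ipv6"], r["host"], r["type"])
```

Keeping the three address kinds in separate keys mirrors the three resultset columns of the SpectX query. A later filter can then pick only `ipv4`, as the next step does.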
Let’s save this as /user/uscert-grizzly-steppe.sx so that we can use it as a view. This keeps our main analysis code simpler, and we can reuse it later in other queries.
// declare the stream of our webserver log
@access_logs = PARSE(
  pattern:$[/user/examples/patterns/apache_access.sxp],
  src:'https://docs.spectx.com/_downloads/access.log'
);

// declare the stream from suspected listings
@suspected_list = @[/user/uscert-grizzly-steppe.sx]
  .filter(type = 'IPV4ADDR'   // use only ipv4 addresses for now
          AND description LIKE 'It is recommended that network administrators review traffic%')
  .select(ipv4);

// execute main query:
@access_logs
  .filter(clientIpv4 IN (@suspected_list))
  .select(timestamp, clientIpv4, cc(clientIpv4), asn_name(clientIpv4), uri, response, referrer, agent)
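The `IN (@suspected_list)` filter is effectively a semi-join: an access log record is kept only if its client address appears in the suspect list. Below is a minimal Python sketch of the same logic. It assumes the records are already parsed into dictionaries with the field names used above (`type`, `description`, `ipv4`, `clientIpv4`). The addresses in the example are hypothetical documentation addresses. `LIKE 'prefix%'` is approximated with a case-sensitive `str.startswith`, which may not match SpectX's exact matching rules.

```python
from typing import Iterable, List, Set

# Prefix used in the LIKE condition of @suspected_list.
DESCRIPTION_PREFIX = "It is recommended that network administrators review traffic"


def suspected_ipv4s(indicators: Iterable[dict]) -> Set[str]:
    """Counterpart of @suspected_list: IPv4 indicators with the matching description."""
    return {
        row["ipv4"]
        for row in indicators
        if row.get("type") == "IPV4ADDR"
        and (row.get("description") or "").startswith(DESCRIPTION_PREFIX)
        and row.get("ipv4")
    }


def requests_from(access_log: Iterable[dict], suspects: Set[str]) -> List[dict]:
    """Counterpart of .filter(clientIpv4 IN (@suspected_list)).

    A set gives constant-time membership checks, so the log is scanned once."""
    return [rec for rec in access_log if rec.get("clientIpv4") in suspects]


if __name__ == "__main__":
    indicators = [
        {"ipv4": "192.0.2.1", "type": "IPV4ADDR",
         "description": DESCRIPTION_PREFIX + " to and from this address."},
        {"ipv4": "192.0.2.2", "type": "IPV4ADDR", "description": "Other note"},
    ]
    log = [
        {"clientIpv4": "192.0.2.1", "uri": "/_static/jquery.js"},
        {"clientIpv4": "198.51.100.7", "uri": "/index.html"},
    ]
    print(requests_from(log, suspected_ipv4s(indicators)))
```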
Note that you don’t have to download the published list yourself: SpectX fetches it every time you execute the query. You therefore don’t have to worry about checking for updates or downloading new versions.
STEP 2. Examine the access logs to identify the services affected. Let’s see what kind of requests have been made:
| timestamp | cc | uri | response | agent |
|---|---|---|---|---|
| 2017-06-24 22:47:03 | SE | /googleanalytics.js | 200 | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0 |
| 2017-06-24 22:47:03 | SE | /_static/jquery.js | 200 | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0 |
| 2017-06-24 22:47:04 | SE | /_static/websupport.js | 200 | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0 |
| 2017-06-24 22:47:09 | SE | /favicon.ico | 404 | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0 |
| 2017-06-24 22:47:18 | SE | /_static/fonts/RobotoSlab-Regular.ttf | 200 | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0 |
| 2017-10-19 12:14:27 | US | /wp-login.php | 404 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A |
It looks like the IP from SE has just fetched some of the pages from the website. The one from the US, however, has tried to access a login page (/wp-login.php)! Although the request failed with a 404, it is still definitely worth digging further. Let’s save the result by appending .save() to the main query. This captures the intermediate state of the investigation and lets us run further queries without re-executing this one:
// execute main query:
@access_logs
  .filter(clientIpv4 IN (@suspected_list))
  .select(timestamp, clientIpv4, cc(clientIpv4), asn_name(clientIpv4), uri, response, referrer, agent)
  .save('/user/suspected_ips.sxt');
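To answer the STEP 2 question of which services were touched, the saved result can be summarised per IP address. Below is a minimal Python sketch. It assumes that requests whose URI starts with one of a few hypothetical login paths (`LOGIN_PREFIXES`, which you would adjust to your own site) belong to the login service, and that everything else belongs to the static website. The IP addresses in the example are placeholders, since the table above does not show them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Hypothetical login paths; /wp-login.php is included because it appears in the STEP 2 output.
LOGIN_PREFIXES = ("/login", "/wp-login.php")


def service_for(uri: str) -> str:
    """Map a request URI to the (assumed) service it targets."""
    return "login" if uri.startswith(LOGIN_PREFIXES) else "static website"


def services_by_ip(requests: Iterable[dict]) -> Dict[str, Set[str]]:
    """Collect the set of services each client IP address has touched."""
    touched = defaultdict(set)
    for rec in requests:
        touched[rec["clientIpv4"]].add(service_for(rec["uri"]))
    return dict(touched)


if __name__ == "__main__":
    # URIs and responses from the STEP 2 output; the IP addresses are placeholders.
    saved = [
        {"clientIpv4": "192.0.2.10", "uri": "/_static/jquery.js", "response": 200},
        {"clientIpv4": "192.0.2.10", "uri": "/favicon.ico", "response": 404},
        {"clientIpv4": "198.51.100.20", "uri": "/wp-login.php", "response": 404},
    ]
    for ip, services in services_by_ip(saved).items():
        print(ip, sorted(services))
```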
STEP 3. We now need to look into our login service logs, reusing the work done in another analysis of login activities. Filtering on the IP addresses from the result saved in STEP 2 selects the login events from these IPs:
$srcp = <<<PATTERN_END
(
  TIMESTAMP('yyyy-MM-dd HH:mm:ss.SSS Z'):dateTime
  LD:userName
  IPV4:ipAddr
  DQS:userAgent
  UPPER:result
)(fs='\t')
EOL
PATTERN_END;

@logins = PARSE(pattern:$srcp, src:'http://docs.spectx.com/_downloads/psw_scan.log.sx.gz');

/* prepare suspect ip list */
@suspectIps = @[/user/suspected_ips.sxt]          // stream from the resultset saved in STEP 2
  .select(clientIpv4, count(*)).group(@1)         // aggregation gives us unique ip-addresses
  .select(clientIpv4);

@logins
  .filter(ipAddr in (@suspectIps))
  .select(dateTime, userName, cc(ipAddr), result, userAgent);
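The minimal Python sketch below performs the same two steps: it deduplicates the saved IP addresses, then keeps only the login events from those addresses. It assumes the login log is tab-separated in the field order of `$srcp` above, with the user agent in double quotes and a timestamp such as `2017-05-03 15:53:41.000 +0000`. Escaped quotes inside the user agent are not handled. The optional `users` argument mirrors the `OR userName = ...` condition used later in this example. The user name and addresses in the usage example are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List, Set


def parse_login_line(line: str) -> dict:
    """Split one tab-separated login record into named fields (see $srcp)."""
    ts, user, ip, agent, result = line.rstrip("\n").split("\t")
    return {
        # 'yyyy-MM-dd HH:mm:ss.SSS Z' expressed as a strptime format
        "dateTime": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f %z"),
        "userName": user,
        "ipAddr": ip,
        "userAgent": agent.strip('"'),  # DQS: drop the surrounding double quotes
        "result": result,
    }


def unique_ips(saved_rows: Iterable[dict]) -> Set[str]:
    """Like .select(clientIpv4, count(*)).group(@1).select(clientIpv4): distinct IPs."""
    return {row["clientIpv4"] for row in saved_rows}


def logins_from(logins: Iterable[dict], suspects: Set[str],
                users: Iterable[str] = ()) -> List[dict]:
    """Keep login events from suspect IPs or, optionally, for the given user names."""
    wanted_users = set(users)
    return [rec for rec in logins
            if rec["ipAddr"] in suspects or rec["userName"] in wanted_users]


if __name__ == "__main__":
    line = '2017-05-03 15:53:41.000 +0000\tuser1\t198.51.100.20\t"Mozilla/5.0 (X11)"\tFAILED\n'
    suspects = unique_ips([{"clientIpv4": "198.51.100.20"}, {"clientIpv4": "198.51.100.20"}])
    print(logins_from([parse_login_line(line)], suspects))
```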
| dateTime | userName | cc | asn | result | userAgent |
|---|---|---|---|---|---|
| 2017-05-03 15:53:41 | palco | US | 3215 | FAILED | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A |
Curiously, this user did not show up as compromised in our previous analysis. Let’s look at this user’s activity by adding an OR condition to our query:
@logins
  .filter(ipAddr in (@suspectIps) OR userName = 'palco')
  .select(dateTime, userName, cc(ipAddr), result, userAgent);
| dateTime | userName | cc | asn | result | userAgent |
|---|---|---|---|---|---|
| 2017-05-03 15:35:47 | palco | FR | 3215 | SUCCESS | Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 7.0; InfoPath.3; .NET CLR 3.1.40767; Trident/6.0; en-IN) |
| 2017-05-03 15:53:41 | palco | US | 6939 | FAILED | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A |
We can observe that i) there is one more request from the same ASN as the blacklisted IP, and ii) apart from the first record, the requests share the same userAgent. Most likely the first record belongs to the genuine user (on Windows), and the rest are attempts to take over the account.
EXAMPLE 2. In a very similar way, you can check whether your logs contain IP addresses of TOR exit nodes. A view that retrieves the list is already available at /user/examples/views/tor_nodes_online.sx (see how to extract examples).
Now let’s create the query, but this time we will use the join command to cross-reference the IP addresses in our access log with the TOR exit nodes:
PARSE(
  pattern:$[/user/examples/patterns/apache_access.sxp],
  src:'https://docs.spectx.com/_downloads/access.log'
)
.join(@[/user/examples/views/tor_nodes_online.sx] on left.clientIpv4 = right.exitAddress)
.select(timestamp, clientIpv4, cc(clientIpv4), asn_name(clientIpv4), uri, response, referrer, agent, exitNode);
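Unlike the `IN` filter, the join also brings columns from the right-hand side into the result, which is how `exitNode` ends up in the output. Below is a minimal Python sketch of an inner hash join on `clientIpv4 = exitAddress`. It assumes inner-join semantics and that both inputs are already parsed into dictionaries. The node identifiers and addresses in the example are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def join_tor(access_log: Iterable[dict], tor_nodes: Iterable[dict]) -> List[dict]:
    """Inner join access log records with TOR exit nodes on clientIpv4 = exitAddress."""
    # Build phase: index the exit node list by address.
    by_address: Dict[str, List[dict]] = defaultdict(list)
    for node in tor_nodes:
        by_address[node["exitAddress"]].append(node)
    # Probe phase: look up each request's client address in the index.
    joined = []
    for rec in access_log:
        for node in by_address.get(rec["clientIpv4"], []):
            joined.append({**rec, "exitNode": node["exitNode"]})
    return joined


if __name__ == "__main__":
    nodes = [{"exitNode": "NODE_A", "exitAddress": "203.0.113.9"}]
    log = [
        {"clientIpv4": "203.0.113.9", "uri": "/index.html"},
        {"clientIpv4": "198.51.100.7", "uri": "/about.html"},
    ]
    print(join_tor(log, nodes))
```

If one address is listed for several exit nodes, the request appears once per node, which is what an inner join produces.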
As above, you can continue looking into logins if something comes up.