Analyzing Activity From Blacklisted IP Addresses and TOR Exit Nodes
From time to time, national CERTs publish lists of domains and ip-addresses associated with unlawful activity. Suppose you want to verify what sort of traffic such ip-addresses generate on your website.
EXAMPLE 1. Our task is:
- STEP 1. Verify whether there has been any activity on our website from these ip-addresses.
- STEP 2. If yes, determine which services have been affected and whether the activity is malicious.
- STEP 3. If yes, determine which of our user accounts may have been compromised.

Our hypothetical web service consists of an Apache web server exposing a static website (producing an Apache access log) and a login service (producing an application log).
For blacklisted ip's, let's take an example from real life. In 2016/2017, the US-CERT published the GRIZZLY STEPPE advisory, which includes a list of indicators containing ip-addresses and domain names. Parsing it is not the simplest of examples and illustrates the complexity of real-life tasks.
STEP 1. Our task is to cross-reference the ip-addresses from the CSV-formatted list against the access logs of our website to find all requests made from these ip-addresses.

Essentially, we need to perform a join between our logs and the suspect list from US-CERT. The first step is to turn the latter into a selectable tuple stream:
```
// a CSV format is pretty straightforward to parse:
$srcp = <<<PATTERN_END
  LD*:indicatorValue        // parse as string because of weird formatting, see Note1 below
  ',' LD*:type
  ',' LD*:comment
  ',' LD*:role
  ',' LD*:attackPhase
  ',' LD*:observedDate
  ',' LD*:handling
  ',' LD*:description
  EOL
PATTERN_END;

/* Note1: for some reason the US-CERT guys have made the decision to publish
   ip-addresses and domain names in a somewhat nontraditional form like:
       192[.]168[.]0[.]1
   We need to replace the [.] with dots before translating them into the
   respective types of IPV4, IPV6 or STRING. This is what our user-defined
   function $getHost(iVal) does: */
$hostPattern = <<<PATTERN_END
(                     // use an alternatives group to choose between types:
  IPV4:clientIpv4 |   // note that each of these alternatives will be
  IPV6:clientIpv6 |   // output as a tuple structure
  [! \n]+:host
) EOS
PATTERN_END;

// function to handle the funny-formatted ip-addresses, uses $hostPattern above
$getHost(iVal) = PARSE($hostPattern, REPLACE($iVal, '[.]', '.'));

// execute main query:
PARSE(pattern:$srcp,
      src:'https://www.us-cert.gov/sites/default/files/publications/JAR-16-20296A.csv')
 .select($getHost(indicatorValue) as hostVal, *)  // use the $getHost function to parse
                                                  // ip-addresses from the indicatorValue
                                                  // field as a tuple structure
 .filter(indicatorValue != 'INDICATOR_VALUE')     // get rid of the header row
 .select(hostVal[clientIpv4] as ipv4,             // populate ipv4, ipv6 and domain names
         hostVal[clientIpv6] as ipv6,             // to different resultset columns
         hostVal[host] as host,
         type, comment, role, attackPhase, observedDate, handling, description);
```
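The refang-then-classify step performed by `$getHost` can also be illustrated outside SpectX. Here is a minimal Python sketch of the same idea (the helper names `refang` and `classify` are invented for illustration and are not part of the SpectX query above):

```python
import ipaddress

def refang(indicator: str) -> str:
    """Replace the defanged '[.]' notation with real dots."""
    return indicator.replace("[.]", ".")

def classify(indicator: str) -> str:
    """Classify a refanged indicator as ipv4, ipv6 or host (domain name)."""
    value = refang(indicator)
    try:
        addr = ipaddress.ip_address(value)
        return "ipv4" if addr.version == 4 else "ipv6"
    except ValueError:
        # not a valid ip-address, treat it as a domain name
        return "host"

# defanged indicators, in the same style as the JAR-16-20296A feed:
print(classify("192[.]168[.]0[.]1"))  # -> ipv4
print(classify("example[.]com"))      # -> host
```

The `try/except` mirrors the alternatives group in `$hostPattern`: each indicator falls into exactly one of the three categories.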
Let's save this as /user/uscert-grizzly-steppe.sx so that we can use it as a view: it keeps our main analysis code simpler, and we can also re-use it in other queries.
```
// declare the stream of our webserver log
@access_logs = PARSE(
  pattern:$[/user/examples/patterns/apache_access.sxp],
  src:'https://docs.spectx.com/_downloads/access.log'
);

// declare the stream from the suspect listing
@suspected_list = @[/user/uscert-grizzly-steppe.sx]
 .filter(type = 'IPV4ADDR'   // use only ipv4 addresses for now
     AND description LIKE 'It is recommended that network administrators review traffic%')
 .select(ipv4);

// execute main query:
@access_logs
 .filter(clientIpv4 IN (@suspected_list))
 .select(timestamp, clientIpv4, cc(clientIpv4), asn_name(clientIpv4),
         uri, response, referrer, agent);
```
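Conceptually, the `clientIpv4 IN (@suspected_list)` filter is a set-membership test against the suspect list. A rough Python sketch of the same mechanics (the ip-addresses and rows below are made up for illustration; the field names mirror the resultset columns):

```python
# hypothetical parsed access log rows
access_log = [
    {"timestamp": "2017-06-24 22:47:03", "clientIpv4": "198.51.100.7", "uri": "/index.html"},
    {"timestamp": "2017-06-24 22:48:11", "clientIpv4": "203.0.113.9", "uri": "/wp-login.php"},
]

# a set gives O(1) membership tests, like the IN (...) subquery
suspected_list = {"203.0.113.9", "192.0.2.14"}

hits = [row for row in access_log if row["clientIpv4"] in suspected_list]
for row in hits:
    print(row["timestamp"], row["clientIpv4"], row["uri"])
```

SpectX evaluates the subquery and performs the membership test for you; this sketch only shows the shape of the operation.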
It is worth mentioning that you don't have to download the published list yourself: SpectX does that for you every time you execute the query, so there is no need to worry about checking for updates or managing downloads.
STEP 2. Examine the access logs to identify the services affected. Let's see what requests have been made:
| timestamp | cc | uri | response | agent |
|---|---|---|---|---|
| 2017-06-24 22:47:03 | SE | /googleanalytics.js | 200 | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0 |
| 2017-06-24 22:47:03 | SE | /_static/jquery.js | 200 | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0 |
| 2017-06-24 22:47:04 | SE | /_static/websupport.js | 200 | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0 |
| 2017-06-24 22:47:09 | SE | /favicon.ico | 404 | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0 |
| 2017-06-24 22:47:18 | SE | /_static/fonts/RobotoSlab-Regular.ttf | 200 | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0 |
| 2017-10-19 12:14:27 | US | /wp-login.php | 404 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A |
Looks like the ip from SE has just grabbed some of the pages from the website. But the one from the US has accessed our login service! Although the request was denied, it is still definitely worthwhile to dig further. Let's save the result: this captures the intermediate state of the investigation and also lets us proceed with further queries without re-executing this one:
```
// execute main query:
@access_logs
 .filter(clientIpv4 IN (@suspected_list))
 .select(timestamp, clientIpv4, cc(clientIpv4), asn_name(clientIpv4),
         uri, response, referrer, agent)
 .save('/user/suspected_ips.sxt');
```
STEP 3. We now need to look into our login service logs (re-using the work done in another login-activity analysis). Applying a filter with the ip's from the result saved in STEP 2 will select the login events made from these ip's:
```
$srcp = <<<PATTERN_END
(
  TIMESTAMP('yyyy-MM-dd HH:mm:ss.SSS Z'):dateTime
  LD:userName
  IPV4:ipAddr
  DQS:userAgent
  UPPER:result
)(fs='\t') EOL
PATTERN_END;

@logins = PARSE(pattern:$srcp, src:'http://docs.spectx.com/_downloads/psw_scan.log.sx.gz');

/* prepare the suspect ip list */
@suspectIps = @[/user/suspected_ips.sxt]   // stream from the saved resultset
 .select(clientIpv4, count(*)).group(@1)   // aggregation gives us unique ip-addresses
 .select(clientIpv4);

@logins
 .filter(ipAddr IN (@suspectIps))
 .select(dateTime, userName, cc(ipAddr), result, userAgent);
```
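The `.select(clientIpv4, count(*)).group(@1)` step is simply a way to collapse the saved resultset down to unique ip-addresses before the membership test. In Python terms, grouping-for-uniqueness is the same as building a set (the data below is hypothetical, illustrating the technique rather than SpectX internals):

```python
# the clientIpv4 column of the saved resultset may repeat the same address many times
saved_rows = ["203.0.113.9", "203.0.113.9", "192.0.2.14"]

# grouping by the address collapses the duplicates; a set does the same job here
suspect_ips = set(saved_rows)

# hypothetical login events, fields mirroring the @logins stream
logins = [
    {"userName": "palco", "ipAddr": "203.0.113.9", "result": "FAILED"},
    {"userName": "alice", "ipAddr": "198.51.100.7", "result": "SUCCESS"},
]

suspicious = [event for event in logins if event["ipAddr"] in suspect_ips]
```

Deduplicating first keeps the membership test small, regardless of how many hits were saved in STEP 2.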
| dateTime | userName | cc | asn | result | userAgent |
|---|---|---|---|---|---|
| 2017-05-03 15:53:41 | palco | US | 3215 | FAILED | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A |
Curiously, this user didn't come up as compromised in our previous analysis. Let's see what his activities are; just add an OR condition to our query:
```
@logins
 .filter(ipAddr IN (@suspectIps) OR userName = 'palco')
 .select(dateTime, userName, cc(ipAddr), result, userAgent);
```
| dateTime | userName | cc | asn | result | userAgent |
|---|---|---|---|---|---|
| 2017-05-03 15:35:47 | palco | FR | 3215 | SUCCESS | Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 7.0; InfoPath.3; .NET CLR 3.1.40767; Trident/6.0; en-IN) |
| 2017-05-03 15:53:41 | palco | US | 6939 | FAILED | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A |
What we can observe is that: i) there is one more request from the same ASN as the blacklisted ip, and ii) the failed login attempt uses the same user agent string as the wp-login.php request we saw earlier in the access log. Most likely the first record belongs to the genuine user (on Windows) and the rest are attempts to take over the account.
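The reasoning above, comparing countries and user agents across an account's logins, can be mechanized. Here is a deliberately simplistic Python sketch of such a heuristic (hypothetical events; a real detector would also weigh time windows, ASNs and login results):

```python
from collections import defaultdict

# hypothetical login events: (user, country code, user agent)
events = [
    ("palco", "FR", "MSIE 10.0 on Windows"),
    ("palco", "US", "Safari on Mac OS X"),
    ("alice", "EE", "Firefox on Linux"),
]

# build a per-account profile of observed countries and user agents
profile = defaultdict(lambda: {"countries": set(), "agents": set()})
for user, cc, agent in events:
    profile[user]["countries"].add(cc)
    profile[user]["agents"].add(agent)

# flag accounts whose logins span more than one country (crude takeover signal)
flagged = sorted(u for u, p in profile.items() if len(p["countries"]) > 1)
```

Running this over the sample events flags only `palco`, matching the manual observation above.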
EXAMPLE 2. In a very similar way you can check whether your logs contain ip-addresses of TOR exit nodes. You already have a view that retrieves the list in /user/examples/views/tor_nodes_online.sx (see how to extract the examples).

Now let's create the query, this time using the join command to cross-reference the ip-addresses in our access log with the TOR exit nodes:
```
PARSE(
  pattern:$[/user/examples/patterns/apache_access.sxp],
  src:'https://docs.spectx.com/_downloads/access.log'
)
 .join(@[/user/examples/views/tor_nodes_online.sx] on left.clientIpv4 = right.exitAddress)
 .select(timestamp, clientIpv4, cc(clientIpv4), asn_name(clientIpv4),
         uri, response, referrer, agent, exitNode);
```
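The join here is an equality join on the ip-address. As a sketch of the underlying mechanics, this is essentially a hash join: index one side by the join key, then probe the index with the other side. A minimal Python illustration with invented addresses and a hypothetical exit-node fingerprint (SpectX handles this internally):

```python
# hypothetical access log rows (left side of the join)
access_log = [
    {"timestamp": "2017-06-24 22:47:03", "clientIpv4": "198.51.100.7", "uri": "/index.html"},
    {"timestamp": "2017-06-24 22:50:12", "clientIpv4": "203.0.113.9", "uri": "/login"},
]
# hypothetical TOR exit node list (right side of the join)
tor_exit_nodes = [
    {"exitAddress": "203.0.113.9", "exitNode": "hypothetical-fingerprint"},
]

# build a hash index on the join key of the smaller side...
index = {node["exitAddress"]: node for node in tor_exit_nodes}

# ...then probe it with each access log row (inner join semantics:
# rows without a match are dropped)
joined = [
    {**row, "exitNode": index[row["clientIpv4"]]["exitNode"]}
    for row in access_log
    if row["clientIpv4"] in index
]
```

Only requests whose client ip matches an exit node address survive the join, which is exactly the behaviour we want for this check.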
If something comes up, you can continue by looking into the logins as shown above.