Analyzing Activity From Blacklisted IP Addresses and TOR Exit Nodes

Sample scripts: scripts.tar.gz
Sample data: sample_data.tar.gz

From time to time, national CERTs publish lists of domains and IP addresses associated with unlawful activity. Suppose you want to find out what sort of traffic such IP addresses generate on your website.

EXAMPLE 1. Our task is:

  • STEP 1. Verify whether there has been any activity on our website from these IP addresses
  • STEP 2. If so, determine which services have been affected and whether the activity is malicious
  • STEP 3. If so, determine which of our user accounts may have been compromised

Our hypothetical web service includes an Apache web server exposing a static website (producing an Apache access log) and a login service (producing an application log).

For the blacklisted IPs, let’s take a real-life example: the GRIZZLY STEPPE advisory published by US-CERT in 2016/2017. It includes a list of indicators with IP addresses and domain names. The list is not easy to parse, which shows the complexity of real-life tasks.

STEP 1. Our task is to cross-reference the IP addresses from the CSV-formatted list against the access logs of our website to find all requests made from these IP addresses.

Essentially, we need to perform a join of our logs and the suspect list from US-CERT. The first step is turning the latter into a selectable record stream:

// a CSV format is pretty straightforward to parse:
$srcp = <<<PATTERN_END
  LD*:indicatorValue    //parse as string because of weird formatting, see Note1 below
  ',' LD*:type
  ',' LD*:comment
  ',' LD*:role
  ',' LD*:attackPhase
  ',' LD*:observedDate
  ',' LD*:handling
  ',' LD*:description
EOL
PATTERN_END;

/* Note1: US-CERT publishes ip-addresses and domain names in a nonstandard
   ("defanged") form such as 192[.]168[.]0[.]1.
   We need to replace each [.] with a dot before translating the values into the
   respective IPV4, IPV6 or STRING types. This is what our user-defined function
   $getHost(iVal) does:
*/

$hostPattern = <<<PATTERN_END
    (                      // an alternative group chooses between the types;
      IPV4:clientIpv4 |    // whichever alternative matches is
      IPV6:clientIpv6 |    // output as a field of a tuple structure
      [! \n]+:host
    )
    EOS
PATTERN_END;

// handles the oddly formatted ip-addresses, using $hostPattern above
$getHost(iVal) = PARSE($hostPattern, REPLACE($iVal, '[.]', '.') );

// execute main query:
PARSE(pattern:$srcp, src:'https://www.us-cert.gov/sites/default/files/publications/JAR-16-20296A.csv')
 .select($getHost(indicatorValue) as hostVal, *) //parse indicatorValue with $getHost
                                                 //into a tuple structure
 .filter(indicatorValue != 'INDICATOR_VALUE')    //drop the header row
 .select(hostVal[clientIpv4] as ipv4,            //put ipv4, ipv6 and domain names into
    hostVal[clientIpv6] as ipv6,                 //separate resultset columns
    hostVal[host] as host,
    type,
    comment,
    role,
    attackPhase,
    observedDate,
    handling,
    description
 );

Let’s save this as /user/uscert-grizzly-steppe.sx so that we can use it as a view. This keeps our main analysis code simpler and lets us reuse it in other queries later.
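If it helps to see what the view does outside SpectX, here is a minimal Python sketch of the same two steps: restoring the "[.]" separators to dots and classifying each indicator as an IPv4 address, an IPv6 address or a host name. It is an illustration under stated assumptions, not SpectX's implementation. It relies on Python's csv and ipaddress modules (unlike the pattern above, csv.reader also copes with quoted fields that contain commas), and the sample rows are made up, using documentation address ranges rather than real indicators.

```python
import csv
import io
import ipaddress

# Column names taken from the SpectX pattern $srcp above.
FIELDS = ["indicatorValue", "type", "comment", "role",
          "attackPhase", "observedDate", "handling", "description"]

def get_host(indicator_value):
    """Refang '192[.]168[.]0[.]1'-style values and classify them.

    Returns ('ipv4' | 'ipv6' | 'host', normalized value), mirroring the
    three alternatives of $hostPattern.
    """
    value = indicator_value.replace("[.]", ".").strip()
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return ("host", value)  # not an IP literal: treat it as a domain name
    return ("ipv4" if addr.version == 4 else "ipv6", str(addr))

def read_indicators(csv_text):
    """Yield one dict per indicator, skipping blank lines and the header row."""
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or row[0] == "INDICATOR_VALUE":
            continue
        record = dict(zip(FIELDS, row))
        record["hostVal"] = get_host(record["indicatorValue"])
        yield record

# Made-up sample rows (hypothetical header and values, not real indicators).
sample = (
    "INDICATOR_VALUE,TYPE,COMMENT,ROLE,ATTACK_PHASE,OBSERVED_DATE,HANDLING,DESCRIPTION\n"
    "192[.]0[.]2[.]10,IPV4ADDR,,,,,,sample\n"
    "2001:db8::1,IPV6ADDR,,,,,,sample\n"
    "bad[.]example[.]com,FQDN,,,,,,sample\n"
)
for rec in read_indicators(sample):
    print(rec["type"], rec["hostVal"])
```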

Now, back to our main query. The Apache access log pattern from the examples comes in handy here. We use the IN operator to select only the requests from IPs on the list:

// declare the stream of our webserver log
@access_logs = PARSE(
                 pattern:$[/user/examples/patterns/apache_access.sxp],
                 src:'https://docs.spectx.com/_downloads/access.log'
               );
// declare the stream from suspected listings
@suspected_list = @[/user/uscert-grizzly-steppe.sx]
 .filter(type = 'IPV4ADDR'  // use only ipv4 addresses for now
     AND description LIKE 'It is recommended that network administrators review traffic%')
 .select(ipv4);

// execute main query:
@access_logs
 .filter(clientIpv4 IN (@suspected_list))
 .select(timestamp, clientIpv4, cc(clientIpv4), asn_name(clientIpv4), uri, response, referrer, agent);

Note that you don’t have to download the published list yourself: SpectX fetches it every time you execute the query, so you don’t have to worry about checking for updates or downloading new versions.
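The IN filter is essentially a set-membership test applied to every log record. The minimal Python sketch below shows that mechanism under two assumptions: the access log uses Apache's "combined" format (the layout matched by apache_access.sxp may differ), and the suspect IPs are already available as a list. The log lines are made up.

```python
import re

# Assumed Apache "combined" layout:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<clientIpv4>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<response>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_access_log(lines):
    """Yield one dict per log line that matches the assumed format."""
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # skip lines in any other format
        rec = m.groupdict()
        parts = rec["request"].split()
        rec["uri"] = parts[1] if len(parts) >= 2 else ""
        yield rec

def filter_by_ips(records, suspect_ips):
    """Keep records whose client address is in suspect_ips (the IN (...) step)."""
    suspects = set(suspect_ips)  # constant-time membership test per log line
    return [r for r in records if r["clientIpv4"] in suspects]

# Made-up log lines using documentation address ranges.
log_lines = [
    '203.0.113.5 - - [24/Jun/2017:22:47:03 +0000] "GET /_static/jquery.js HTTP/1.1" 200 1234 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [19/Oct/2017:12:14:27 +0000] "GET /wp-login.php HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
]
hits = filter_by_ips(parse_access_log(log_lines), ["198.51.100.7"])
for h in hits:
    print(h["timestamp"], h["clientIpv4"], h["uri"], h["response"])
```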

STEP 2. Examine the access logs to identify the services affected. Let’s see what kind of requests have been made:

timestamp           | cc | uri                                   | response | agent
2017-06-24 22:47:03 | SE | /googleanalytics.js                   | 200      | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0
2017-06-24 22:47:03 | SE | /_static/jquery.js                    | 200      | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0
2017-06-24 22:47:04 | SE | /_static/websupport.js                | 200      | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0
2017-06-24 22:47:09 | SE | /favicon.ico                          | 404      | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0
2017-06-24 22:47:18 | SE | /_static/fonts/RobotoSlab-Regular.ttf | 200      | Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0
2017-10-19 12:14:27 | US | /wp-login.php                         | 404      | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A

It looks like the IP from SE has just fetched some pages and static files from the website. The one from the US, however, tried to reach a login page (/wp-login.php)! Although the request failed with a 404, it is still worth digging further. Let’s save the result: this captures the intermediate state of the investigation and lets us run further queries without re-executing this one:

// execute main query:
@access_logs
 .filter(clientIpv4 IN (@suspected_list))
 .select(timestamp, clientIpv4, cc(clientIpv4), asn_name(clientIpv4), uri, response, referrer, agent)
 .save('/user/suspected_ips.sxt');
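The triage done by eye in STEP 2 (which IPs only fetched static content, and which went for a login page) can be approximated in code. This Python sketch groups rows per IP and lists the requests whose path looks like a login endpoint. LOGIN_HINTS is a hypothetical list chosen for illustration, and the rows are placeholders shaped like the STEP 2 result.

```python
from collections import defaultdict

# Hypothetical path fragments that suggest a login endpoint; adjust for your site.
LOGIN_HINTS = ("login", "signin", "wp-admin")

def triage(rows):
    """Summarize (ip, uri, response) rows per IP and list login-looking requests."""
    per_ip = defaultdict(list)
    for ip, uri, response in rows:
        per_ip[ip].append((uri, response))
    report = {}
    for ip, requests in per_ip.items():
        probes = [(uri, resp) for uri, resp in requests
                  if any(hint in uri.lower() for hint in LOGIN_HINTS)]
        report[ip] = {"requests": len(requests), "login_probes": probes}
    return report

# Made-up rows shaped like the STEP 2 result (addresses are placeholders).
rows = [
    ("203.0.113.5", "/_static/jquery.js", 200),
    ("203.0.113.5", "/favicon.ico", 404),
    ("198.51.100.7", "/wp-login.php", 404),
]
for ip, summary in triage(rows).items():
    print(ip, summary)
```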

STEP 3. Now we need to look into the logs of our login service, reusing the work done in an earlier analysis of login activity. Filtering on the IPs from the result saved in STEP 2 selects the login events from these IPs.

$srcp = <<<PATTERN_END
(
  TIMESTAMP('yyyy-MM-dd HH:mm:ss.SSS Z'):dateTime
  LD:userName
  IPV4:ipAddr
  DQS:userAgent
  UPPER:result
)(fs='\t')
EOL
PATTERN_END;

@logins = PARSE(pattern:$srcp, src:'http://docs.spectx.com/_downloads/psw_scan.log.sx.gz');

/* prepare suspect ip list */
@suspectIps =
 @[/user/suspected_ips.sxt]               //stream from the resultset saved in STEP 2
 .select(clientIpv4, count(*)).group(@1)  //grouping yields unique ip-addresses
 .select(clientIpv4);

@logins
 .filter(ipAddr in (@suspectIps))
 .select(dateTime, userName, cc(ipAddr), result, userAgent);

dateTime            | userName | cc | ASN  | result | userAgent
2017-05-03 15:53:41 | palco    | US | 6939 | FAILED | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A

Curiously, this user did not come up as compromised in our previous analysis. Let’s look at all of his activity by adding an OR condition to our query:

@logins
 .filter(ipAddr in (@suspectIps) OR userName = 'palco')
 .select(dateTime, userName, cc(ipAddr), result, userAgent);

dateTime            | userName | cc | ASN  | result  | userAgent
2017-05-03 15:35:47 | palco    | FR | 3215 | SUCCESS | Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 7.0; InfoPath.3; .NET CLR 3.1.40767; Trident/6.0; en-IN)
2017-05-03 15:53:41 | palco    | US | 6939 | FAILED  | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A
2017-05-03 15:54:05 | palco    | US | 2686 | FAILED  | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A
2017-05-03 16:14:05 | palco    | US | 7922 | FAILED  | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A
2017-05-03 16:16:32 | palco    | US | 6939 | SUCCESS | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A

We can observe that: i) there is one more request from the same ASN as the blacklisted IP, and ii) all requests from the US share the same userAgent. Most likely the first record belongs to the genuine user (using Windows) and the rest are attempts to take over the account, the last of which succeeded.
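For comparison, here is a minimal Python sketch of STEP 3. It parses tab-separated login events with the field order of the $srcp pattern above, derives the unique suspect IPs from the saved STEP 2 rows, and applies the same filter with and without the OR condition on the user name. The parsing with csv.reader and the sample events (made-up addresses and shortened user agents) are assumptions for illustration. Counting the user agents of the matching events gives the kind of observation made above.

```python
import csv
import io
from collections import Counter

def parse_logins(text):
    """Parse tab-separated events: dateTime, userName, ipAddr, "userAgent", result."""
    for row in csv.reader(io.StringIO(text), delimiter="\t", quotechar='"'):
        if len(row) != 5:
            continue  # skip malformed lines
        date_time, user, ip, agent, result = row
        yield {"dateTime": date_time, "userName": user, "ipAddr": ip,
               "userAgent": agent, "result": result}

def suspect_logins(logins, saved_rows, user=None):
    """Keep events from a suspect IP, OR (optionally) for the given user name."""
    suspect_ips = {row["clientIpv4"] for row in saved_rows}  # unique IPs, like .group(@1)
    return [e for e in logins
            if e["ipAddr"] in suspect_ips or (user is not None and e["userName"] == user)]

# Made-up sample data; addresses are placeholders.
text = (
    '2017-05-03 15:35:47.000 +0000\tpalco\t192.0.2.20\t"Mozilla/5.0 (Windows)"\tSUCCESS\n'
    '2017-05-03 15:53:41.000 +0000\tpalco\t198.51.100.7\t"Mozilla/5.0 (Macintosh)"\tFAILED\n'
    '2017-05-03 16:16:32.000 +0000\tpalco\t198.51.100.9\t"Mozilla/5.0 (Macintosh)"\tSUCCESS\n'
    '2017-05-03 16:20:00.000 +0000\talice\t203.0.113.9\t"Mozilla/5.0 (X11)"\tSUCCESS\n'
)
saved = [{"clientIpv4": "198.51.100.7"}, {"clientIpv4": "198.51.100.7"}]
events = list(parse_logins(text))

by_ip = suspect_logins(events, saved)                   # first query
by_ip_or_user = suspect_logins(events, saved, "palco")  # query with the OR condition
print(Counter(e["userAgent"] for e in by_ip_or_user))
```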

EXAMPLE 2. In a very similar way, you can check whether your logs contain IP addresses of TOR exit nodes. A view that retrieves the list is already available at /user/examples/views/tor_nodes_online.sx (see how to extract examples).

Now let’s create the query, this time using the join command to cross-reference the IP addresses in our access log with the TOR exit nodes:

PARSE(
     pattern:$[/user/examples/patterns/apache_access.sxp],
     src:'https://docs.spectx.com/_downloads/access.log'
)
.join(@[/user/examples/views/tor_nodes_online.sx] on left.clientIpv4 = right.exitAddress)
.select(timestamp, clientIpv4, cc(clientIpv4), asn_name(clientIpv4), uri, response, referrer, agent, exitNode);

As above, you can continue looking into logins if something comes up.
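The join can be sketched in Python as a hash join: build a dictionary from exit address to exit node, then keep only the access-log rows whose client address is a key. The sketch assumes the exit list follows the line-oriented layout of the Tor Project's published exit-address list (ExitNode, Published, LastStatus and ExitAddress lines); the view's actual source and format may differ, and the fingerprints and addresses are placeholders.

```python
def parse_exit_list(text):
    """Map each ExitAddress to the fingerprint of the ExitNode record it belongs to."""
    exits = {}
    node = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] == "ExitNode":
            node = parts[1]
        elif parts[0] == "ExitAddress" and node is not None:
            exits[parts[1]] = node
    return exits

def join_with_exits(access_rows, exits):
    """Inner join on clientIpv4 = exitAddress, adding the exitNode column."""
    return [dict(row, exitNode=exits[row["clientIpv4"]])
            for row in access_rows if row["clientIpv4"] in exits]

# Made-up exit list and access rows; fingerprints and addresses are placeholders.
exit_list = """ExitNode NODE1
Published 2017-10-19 10:00:00
LastStatus 2017-10-19 11:00:00
ExitAddress 198.51.100.7 2017-10-19 11:05:00
ExitNode NODE2
Published 2017-10-19 10:00:00
LastStatus 2017-10-19 11:00:00
ExitAddress 203.0.113.50 2017-10-19 11:06:00
"""
access_rows = [
    {"clientIpv4": "198.51.100.7", "uri": "/wp-login.php"},
    {"clientIpv4": "192.0.2.10", "uri": "/"},
]
for row in join_with_exits(access_rows, parse_exit_list(exit_list)):
    print(row)
```

Because one exit node can have several ExitAddress lines, each address is mapped to its node separately, so a lookup stays a single dictionary access per log row.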