Analyzing Activity From Blacklisted IP Addresses and TOR Exit Nodes

Sample scripts: scripts.tar.gz Sample data: sample_data.tar.gz

From time to time, national CERTs publish lists of domains and IP addresses associated with unlawful activity. Suppose you want to verify what sort of traffic such IP addresses generate on your website.

EXAMPLE 1. Our task is:

  • STEP 1. Verify whether there has been any activity on our website from these IP addresses
  • STEP 2. If yes, determine which services have been affected and whether the activity is malicious
  • STEP 3. If yes, determine which of our user accounts may have been compromised

Our hypothetical web service includes an Apache web server exposing a static website (producing an Apache access log)
and a login service (producing an application log).

For the blacklisted IPs, let’s take a real-life example. In 2016/2017, US-CERT published the GRIZZLY STEPPE advisory, which includes a list of indicators with IP addresses and domain names. Parsing it is not the simplest exercise, but it shows the complexity of real-life tasks.

STEP 1. Our task is to cross-reference the IP addresses from the CSV-formatted list against the access logs of our website to find all requests made from these IP addresses.

Essentially, we need to perform a join between our logs and the suspect list from US-CERT. The first step is to turn the latter into a selectable tuple stream:

// a CSV format is pretty straightforward to parse:
$srcp = <<<PATTERN_END
  LD*:indicatorValue    //parse as string because of weird formatting, see Note1 below
  ',' LD*:type
  ',' LD*:comment
  ',' LD*:role
  ',' LD*:attackPhase
  ',' LD*:observedDate
  ',' LD*:handling
  ',' LD*:description
EOL
PATTERN_END;

/* Note1: US-CERT publishes IP addresses and domain names in a "defanged" form such as
   192[.]168[.]0[.]1, so that they cannot be followed accidentally. We need to replace
   the [.] with dots before translating them into the respective types IPV4, IPV6 or
   STRING. This is what our user-defined function $getHost(iVal) does:
*/

$hostPattern = <<<PATTERN_END
    (                      // use alternative group to choose between types:
      IPV4:clientIpv4 |    // note that each of these alternatives will be
      IPV6:clientIpv6 |    // outputted as tuple structure
      [! \n]+:host
    )
    EOS
PATTERN_END;

// function to handle funny formatted ip-addresses, uses $hostPattern above
$getHost(iVal) = PARSE($hostPattern, REPLACE($iVal, '[.]', '.') );

// execute main query:
PARSE(pattern:$srcp, src:'https://www.us-cert.gov/sites/default/files/publications/JAR-16-20296A.csv')
 .select($getHost(indicatorValue) as hostVal, *) //use $getHost function to parse ip-addresses
                                                 //from indicatorValue field as tuple structure
 .filter(indicatorValue != 'INDICATOR_VALUE')    //get rid of header row
 .select(hostVal[clientIpv4] as ipv4,            //populate ipv4, ipv6 and domain names to different
    hostVal[clientIpv6] as ipv6,                 //resultset columns
    hostVal[host] as host,
    type,
    comment,
    role,
    attackPhase,
    observedDate,
    handling,
    description
 );

Let’s save this as /user/uscert-grizzly-steppe.sx so that we can use it as a view: it keeps our main analysis code simpler, and we can reuse it in other queries.
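
As a language-neutral illustration of what the $getHost function in the view does, the following minimal Python sketch restores the dots in a defanged indicator and sorts the value into an IPv4, IPv6 or host column. It is not part of SpectX: the function names refang and classify_indicator and the sample values are assumptions made for this example only.

```python
import ipaddress


def refang(value: str) -> str:
    """Turn a defanged indicator such as 192[.]168[.]0[.]1 back into 192.168.0.1."""
    return value.replace("[.]", ".")


def classify_indicator(value: str) -> dict:
    """Return a dict in which exactly one of ipv4, ipv6 or host is filled in."""
    cleaned = refang(value.strip())
    result = {"ipv4": None, "ipv6": None, "host": None}
    try:
        addr = ipaddress.ip_address(cleaned)
    except ValueError:
        # Not an IP literal, so treat the value as a host name.
        result["host"] = cleaned
        return result
    key = "ipv4" if addr.version == 4 else "ipv6"
    result[key] = str(addr)
    return result


# Hypothetical sample indicators (documentation addresses, not real ones).
samples = ["192[.]168[.]0[.]1", "2001:db8::1", "example[.]com"]
classified = [classify_indicator(s) for s in samples]
for row in classified:
    print(row)
```

Keeping the three types in separate fields mirrors the ipv4, ipv6 and host columns of the view, so later filters can compare values of the same type.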

Now to our main query. The Apache access log pattern from the examples comes in handy here. We use the IN operator to select only the IPs that appear in the list:

// declare the stream of our webserver log
@access_logs = PARSE(
                 pattern:$[/user/examples/patterns/apache_access.sxp],
                 src:'https://docs.spectx.com/_downloads/access.log'
               );
// declare the stream from suspected listings
@suspected_list = @[/user/uscert-grizzly-steppe.sx]
 .filter(type = 'IPV4ADDR'  // use only ipv4 addresses for now
     AND description LIKE 'It is recommended that network administrators review traffic%')
 .select(ipv4);

// execute main query:
@access_logs
 .filter(clientIpv4 IN (@suspected_list))
 .select(timestamp, clientIpv4, cc(clientIpv4), asn_name(clientIpv4), uri, response, referrer, agent)

It is worth mentioning that you don’t have to download the published list yourself: SpectX fetches it every time you execute the query, so you don’t have to worry about checking for updates or downloading new versions.
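
Conceptually, the IN filter keeps only those log records whose client address is a member of the suspect list. A rough Python sketch of the same idea follows, assuming the records have already been parsed into dictionaries. The sample addresses come from the reserved documentation ranges and are not real indicators, and the helper name requests_from is hypothetical.

```python
import ipaddress

# Hypothetical, already-parsed access log records.
access_log = [
    {"clientIpv4": "192.0.2.10", "uri": "/index.html", "response": 200},
    {"clientIpv4": "198.51.100.7", "uri": "/wp-login.php", "response": 404},
    {"clientIpv4": "203.0.113.25", "uri": "/favicon.ico", "response": 404},
]

# Hypothetical suspect list, stored as a set for constant-time membership tests.
suspected_list = {
    ipaddress.IPv4Address("198.51.100.7"),
    ipaddress.IPv4Address("203.0.113.99"),
}


def requests_from(records, suspects):
    """Keep only the records whose client address appears in the suspect set."""
    return [r for r in records if ipaddress.IPv4Address(r["clientIpv4"]) in suspects]


hits = requests_from(access_log, suspected_list)
for hit in hits:
    print(hit["clientIpv4"], hit["uri"], hit["response"])
```

Comparing parsed address objects rather than raw strings avoids false mismatches caused by formatting differences between the two sources.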

STEP 2. Examine the access logs to identify the affected services. OK, let’s see what requests have been made:

timestamp            cc  uri                                    response  agent
2017-06-24 22:47:03  SE  /googleanalytics.js                    200       Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0
2017-06-24 22:47:03  SE  /_static/jquery.js                     200       Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0
2017-06-24 22:47:04  SE  /_static/websupport.js                 200       Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0
2017-06-24 22:47:09  SE  /favicon.ico                           404       Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0
2017-06-24 22:47:18  SE  /_static/fonts/RobotoSlab-Regular.ttf  200       Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0
2017-10-19 12:14:27  US  /wp-login.php                          404       Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A

It looks like the IP from SE has just fetched some pages from the website. But the one from the US has requested a login page (/wp-login.php)! Although the request failed with a 404 response, it is still definitely worthwhile to dig further. Let’s save the result: this captures the intermediate state of the investigation and also lets us proceed with further queries without re-executing this one:

// execute main query:
@access_logs
 .filter(clientIpv4 IN (@suspected_list))
 .select(timestamp, clientIpv4, cc(clientIpv4), asn_name(clientIpv4), uri, response, referrer, agent)
 .save('/user/suspected_ips.sxt');

STEP 3. We now need to look into our login service logs (reusing the work done in another login activity analysis). Filtering on the IPs from the result saved in STEP 2 selects the login events that involve these IPs:

$srcp = <<<PATTERN_END
(
  TIMESTAMP('yyyy-MM-dd HH:mm:ss.SSS Z'):dateTime
  LD:userName
  IPV4:ipAddr
  DQS:userAgent
  UPPER:result
)(fs='\t')
EOL
PATTERN_END;

@logins = PARSE(pattern:$srcp, src:'http://docs.spectx.com/_downloads/psw_scan.log.sx.gz');

/* prepare suspect ip list */
@suspectIps =
 @[/user/suspected_ips.sxt]               //stream from the resultset saved in STEP 2
 .select(clientIpv4, count(*)).group(@1)  //aggregation gives us unique ip-addresses
 .select(clientIpv4);

@logins
 .filter(ipAddr in (@suspectIps))
 .select(dateTime, userName, cc(ipAddr), result, userAgent);

dateTime             userName  cc  ASN   result   userAgent
2017-05-03 15:53:41  palco     US  6939  FAILED   Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A

Curiously, this user did not come up as compromised in our previous analysis. Let’s see what his activities are by adding an OR condition to our query:

@logins
 .filter(ipAddr in (@suspectIps) OR userName = 'palco')
 .select(dateTime, userName, cc(ipAddr), result, userAgent);

dateTime             userName  cc  ASN   result   userAgent
2017-05-03 15:35:47  palco     FR  3215  SUCCESS  Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 7.0; InfoPath.3; .NET CLR 3.1.40767; Trident/6.0; en-IN)
2017-05-03 15:53:41  palco     US  6939  FAILED   Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A
2017-05-03 15:54:05  palco     US  2686  FAILED   Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A
2017-05-03 16:14:05  palco     US  7922  FAILED   Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A
2017-05-03 16:16:32  palco     US  6939  SUCCESS  Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A

We can observe that: i) there is one more request from the same ASN as the blacklisted IP, and ii) all requests from the US share the same userAgent. Most likely the first record belongs to the genuine user (on Windows), and the rest are attempts to take over the account, the last of which succeeded.
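
The pivot used here (from suspect IPs to the affected user names, then to everything those users did) can be sketched in Python as follows. This is only an illustration under stated assumptions: the login events, addresses, agent labels and the helper names pivot_on_users and suspicious_agents are all made up, and the sketch generalizes the hard-coded userName condition of the query to every user touched by a suspect IP.

```python
# Hypothetical, already-parsed login events.
logins = [
    {"userName": "alice", "ipAddr": "192.0.2.10", "result": "SUCCESS", "userAgent": "agent-A"},
    {"userName": "alice", "ipAddr": "203.0.113.5", "result": "FAILED", "userAgent": "agent-B"},
    {"userName": "alice", "ipAddr": "198.51.100.9", "result": "SUCCESS", "userAgent": "agent-B"},
    {"userName": "bob", "ipAddr": "192.0.2.77", "result": "SUCCESS", "userAgent": "agent-C"},
]
suspect_ips = {"203.0.113.5"}  # hypothetical suspect list


def pivot_on_users(events, suspects):
    """Return every event of each user that was contacted from a suspect IP."""
    users = {e["userName"] for e in events if e["ipAddr"] in suspects}
    return [e for e in events if e["userName"] in users]


def suspicious_agents(events, suspects):
    """Collect the user agents seen on events coming from suspect IPs."""
    return {e["userAgent"] for e in events if e["ipAddr"] in suspects}


related = pivot_on_users(logins, suspect_ips)
agents = suspicious_agents(logins, suspect_ips)

# Events that reuse a suspect user agent but come from an IP not on the list.
flagged = [
    e for e in related
    if e["userAgent"] in agents and e["ipAddr"] not in suspect_ips
]
for e in flagged:
    print(e["userName"], e["ipAddr"], e["result"])
```

A shared user agent is weak evidence on its own, so the flagged events are candidates for manual review rather than confirmed compromises.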

EXAMPLE 2. In a very similar way, you can check whether your logs contain IP addresses of TOR exit nodes. You already have a view that retrieves the list in /user/examples/views/tor_nodes_online.sx (see how to extract examples).

Now let’s create the query, this time using the join command to cross-reference the IP addresses in our access log with the TOR exit nodes:

PARSE(
     pattern:$[/user/examples/patterns/apache_access.sxp],
     src:'https://docs.spectx.com/_downloads/access.log'
)
.join(@[/user/examples/views/tor_nodes_online.sx] on left.clientIpv4 = right.exitAddress)
.select(timestamp, clientIpv4, cc(clientIpv4), asn_name(clientIpv4), uri, response, referrer, agent, exitNode);

You can continue by looking into the logins as above if something comes up.
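
Unlike the IN filter of EXAMPLE 1, the join in EXAMPLE 2 also carries columns from the right-hand side (such as exitNode) into the result. A minimal Python sketch of an inner join on the IP address, assuming both inputs are lists of dictionaries: the sample rows and the helper name inner_join are hypothetical, and the sketch says nothing about how SpectX executes its join internally.

```python
# Hypothetical, already-parsed access log records.
access_log = [
    {"clientIpv4": "192.0.2.10", "uri": "/index.html"},
    {"clientIpv4": "203.0.113.9", "uri": "/login"},
]

# Hypothetical TOR exit node list.
tor_exit_nodes = [
    {"exitAddress": "203.0.113.9", "exitNode": "node-a"},
    {"exitAddress": "198.51.100.4", "exitNode": "node-b"},
]


def inner_join(left_rows, right_rows, left_key, right_key):
    """Join two lists of dicts on left_key = right_key using a lookup index."""
    index = {}
    for right in right_rows:
        index.setdefault(right[right_key], []).append(right)
    joined = []
    for left in left_rows:
        for right in index.get(left[left_key], []):
            # Merge the columns of both sides into one result row.
            joined.append({**left, **right})
    return joined


joined_rows = inner_join(access_log, tor_exit_nodes, "clientIpv4", "exitAddress")
for row in joined_rows:
    print(row["clientIpv4"], row["uri"], row["exitNode"])
```

Building the index on the smaller input (here the exit node list) keeps the lookup cost per log record constant.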