Continuing my series on the talks I attended at 2013 Security B-Sides: this one, “Building a Security Graph” by Dan Hubbard (CTO, OpenDNS) and Frank Denis (@thinkumbrella), demonstrated some clever analysis and insights. The OpenDNS team used the massive amount of data that comes to them for free, from machines all over the internet issuing DNS requests to OpenDNS, to analyze the security posture of the internet.
For the benefit of any non-nerds who may have drifted in, DNS is the service on the internet that translates names (e.g. www.yahoo.com) into the IP addresses that computers actually use. In their own words, “At OpenDNS, terabytes of data flow in and out everyday.” They have applied creativity and solid data science skills to transform that data into security discoveries, predictive intelligence and tools.
They built various visualizations of the data and ran statistical analyses on it to get a feel for the prevalence of vulnerabilities out there in the wild. The answer, not surprisingly, is that there is rather a lot of questionable activity going on. On their website, they note that about 0.1% of all queries are infected. When you visit OpenDNS’ labs.umbrella.com website, you will see two meters at the bottom right-hand side of the home page: one for the number of requests they have received and another for the number of infected requests.
How prevalent is Cross-Site Request Forgery (CSRF)?
Because the data they have access to consists of name requests, that shapes the kind of security analysis they can do. Any attack that involves some other domain (e.g. the attacker’s) will show up in the data as domain correlations. CSRF is an obvious example. Any attack that can only be recognized by looking at the guts of the request/response traffic will presumably not be amenable to their analysis.
They experimented with mathematical correlations to tease out such information as CSRF vulnerability, and did topological and statistical analysis of the internet as it was presented to them by this huge body of DNS requests. CSRF involves tricking the user’s browser into issuing requests to a domain other than the one the user thinks they are dealing with, so an attacker-controlled domain and the targeted site end up being looked up close together. By analysing the pattern of DNS requests, one can presumably spot sequences that strongly suggest CSRF is going on, e.g. requests to one domain followed immediately by requests to another. OpenDNS does not see the actual guts of the CSRF attack; they just see name requests that strongly imply its existence.
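To make the idea of "one domain followed immediately by another" a bit more concrete, here is a rough sketch of my own. It is not what OpenDNS actually runs. It assumes a hypothetical log of (timestamp, client, domain) tuples and a made-up two-second window, and simply counts how often the same client looks up one domain shortly after another.

```python
from collections import Counter, defaultdict


def follow_on_pairs(log, window=2.0):
    """Count (first_domain, next_domain) pairs where the same client
    looks up next_domain within `window` seconds of first_domain.

    `log` is an iterable of (timestamp_seconds, client_id, domain)
    tuples. Both the log format and the window size are assumptions
    made for this illustration.
    """
    by_client = defaultdict(list)
    for timestamp, client, domain in log:
        by_client[client].append((timestamp, domain))

    pairs = Counter()
    for events in by_client.values():
        events.sort()  # order each client's lookups by time
        for i, (t1, d1) in enumerate(events):
            for t2, d2 in events[i + 1:]:
                if t2 - t1 > window:
                    break  # later lookups are even further away
                if d2 != d1:
                    pairs[(d1, d2)] += 1
    return pairs


# Invented sample data; the .example names are placeholders.
sample_log = [
    (0.0, "client-a", "bank.example"),
    (0.4, "client-a", "evil.example"),
    (10.0, "client-a", "news.example"),
    (3.0, "client-b", "bank.example"),
    (3.2, "client-b", "evil.example"),
    (50.0, "client-c", "bank.example"),
    (90.0, "client-c", "news.example"),
]

pairs = follow_on_pairs(sample_log)
for (first, second), count in pairs.most_common():
    print(first, "->", second, count)
```

A pair that keeps turning up across many unrelated clients is only a hint, of course. Plenty of legitimate pages pull in content from other domains, so a real analysis would need a baseline of normal co-occurrence to compare against.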
If you are looking for some information on how to find CSRF in your applications, there is a section on that in this whitepaper.
I have to confess, the coffee wasn’t kicking in just yet when I was attending this one and so I cannot offer any very extensive mathematical or other analysis of it. I can say simply that it was interesting to see the graphs they did of internet topology and number of requests. You can learn more on their website and blogs.
One of the points that leapt out at me was the issue of domain generation algorithms, where domain names are churned out automatically by a program rather than chosen one at a time. I hadn’t really thought of that. When speaking of names, one thinks of such things as load balancing, squatting, running out of IPv4 addresses, stuff like that. I should have thought of it simply by looking at the various auto-generated caller IDs I see in the 6 or 7 spam phone calls I get every day.
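For anyone who, like me, had not thought much about domain generation algorithms, here is a toy sketch of the general idea. It is purely illustrative and not modelled on any real malware family or on anything shown in the talk. The function name, the seed string, the date and the .example suffix are all my own inventions. The point is only that two parties sharing a seed can independently derive the same list of throwaway names for a given day.

```python
import hashlib
from datetime import date


def toy_dga(seed, day, count=5, length=12):
    """Derive `count` pseudo-random domain names from a seed and a date.

    Hypothetical scheme for illustration: hash "seed:date:index" with
    SHA-256 and map the first `length` hex digits to the letters a-p.
    """
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Each hex digit (0-15) becomes one letter in the range a-p.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(label + ".example")
    return domains


# Arbitrary seed and date, chosen only for the demonstration.
todays_names = toy_dga("shared-secret", date(2013, 3, 1))
for name in todays_names:
    print(name)
```

From the DNS side, names like these tend to stand out: they look random, nobody has ever asked for them before, and many of them do not resolve at all.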