Yesterday, I attended the North Texas ISSA Meeting with my friends Michelle and Ryker. The talk carried the fascinating title Threatscape 2012: Finding Advanced Persistent Threats with ‘Big Data’ Analysis and Correlation, presented by A.N. Ananth, CEO of Prism Microsystems, and one of the leaders of Global DataGuard. (I think the actual presenter was not the person listed, though I might have missed that.) Among the lessons I “learned”:
- Doing recon on a target is what makes the APT “advanced”, especially if you use LinkedIn to figure out more about a company.
- “0 day” attacks are the Ebola and bird flu of information security.
- 100 Gb (sic) of data is Big Data.
- All the traffic to your web site is Big Data. But we’ve been dealing with Big Data like that since the 1970s, so we know how to handle it.
- Log data from 25 servers is Big Data, because Big Data doesn’t really have anything to do with how much data you have. (This theme came up a lot.)
- Attackers only care about customer records, so clearly the talk focused heavily on real-world experience with the APT.
- The Verizon DBIR[1] mostly only covers North America.
- Log analysis is advanced behavior correlation analytics.
- You must do regression testing on your network behavior adaptive learning modeling.
- Comparing one million data points is Big Data, but you can use metadata for higher-level indicators.
- Detecting Stuxnet would have been obvious because it’s a new process.
- Databases suck for analyzing security data.
I swear I’m not winking at you. That’s an eye twitch from TPSD (Terrible Presentation Stress Disorder). Because lunchtime has arrived and I feel a little frisky now that I’ve had my Dr Pepper, let’s take just a few examples to demonstrate the already-obvious cluelessness of this presentation.
First, do a bit of basic research before you cite anything. The Verizon DBIR states in the Executive Summary on page 2:
We also welcome the Australian Federal Police (AFP), the Irish Reporting & Information Security Service (IRISS), and the Police Central eCrimes Unit (PCeU) of the London Metropolitan Police. These organizations have broadened the scope of the DBIR tremendously with regard to data breaches around the globe.
In addition, Verizon performs data breach investigations around the world, not just domestically. Page 12 shows the countries in which confirmed breaches occurred as part of the analyzed caseload. I don’t believe we published data showing the specific geographic distribution of cases, but the statement that the DBIR mostly covers only North America has no support.
Second, I’ve written a good bit here about the APT, and others have done so far more extensively. The APT – nation-state threat actors with significant cyber capabilities, usually meaning “China” or similar when used cluefully – doesn’t care so much about customer records. After all, if you own a significant chunk of the national debt of the United States, credit card numbers are small fry. When you start talking about research plans, sensitive business documents, source code, and the like, now you’ve started to address the target assets. That’s not to say that nobody cares about customer data, of course. Look at the tremendous amount of fraud coming from all over the world, largely but not exclusively centered in Russia and Eastern Europe, not to mention the “hacktivism” related breaches in 2011.
Third, while basic security measures would certainly prevent most common breaches, this doesn’t hold true for truly advanced attacks (by definition). If you think that any system that simply monitors for new processes would detect Stuxnet in an obvious manner, either you don’t really know much about enterprise monitoring or malware, or you are lying. I will charitably assume the former and recommend that you sit down and buy a beer for an actual incident responder or malware analyst to get the real story.
Finally, Big Data absolutely does have something to do with the volume of your data, though that’s not the only factor involved. The presenters correctly stated in the midst of their chaotic confusion that Big Data means data for which traditional RDBMS and similar systems just don’t work. That doesn’t mean 100 data points or even 100 gigabytes (again, charitably assuming a typo here). It means that you have so much data arriving so quickly and in such different forms (schemas) that you can’t simply stream it into a traditional database. This differs significantly from data science and analytics, in which we try to find patterns and anomalies in the data, sometimes with advanced methods like machine learning and distributed computation. These two concepts aren’t identical; they’re orthogonal: you may perform analytics on smaller data sets, or you may have a very large data set that maps to well-understood models. The phrase “regression testing on your network behavior adaptive learning modeling” is gobbledygook.
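To underline the point that analytics and data volume are orthogonal, here is a minimal sketch using only the Python standard library and made-up numbers (nothing to do with any vendor’s product): a simple z-score test that flags an anomalous daily log volume. It is an analytic, and it runs happily on a data set of eight points, no Big Data infrastructure required.

```python
from statistics import mean, stdev

# Hypothetical daily log-event counts from a handful of servers: tiny data.
daily_counts = [1020, 980, 1010, 995, 1005, 990, 4800, 1000]

mu = mean(daily_counts)
sigma = stdev(daily_counts)

# Flag any day whose count sits more than 2 standard deviations from the mean.
anomalies = [(day, count) for day, count in enumerate(daily_counts)
             if abs(count - mu) > 2 * sigma]

for day, count in anomalies:
    print(f"day {day}: {count} events (mean {mu:.0f}, stdev {sigma:.0f})")
```

Day 6 gets flagged, and the whole thing fits in a dozen lines. Whether that counts as “advanced behavior correlation analytics” I leave to the reader.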
[1] Disclosure: I work for the Verizon RISK team that produces the DBIR, though I joined after the publication of the 2012 edition and had no hand in it.