Tag Archives: Hadoop

Kent doctrine for security intelligence analysis

I’ve said before that log management matters, but log analysis matters more. Extracting and communicating useful information (analysis) requires collecting and storing your security data as well as processing the data quickly. But having all the data available won’t matter to anybody except auditors if you don’t use it in ways that inform good decisions. Mike Rothman of Securosis expressed this exceptionally well in his preview of the upcoming RSA Conference:

You will see a bunch of vendors talking about their new alerting engines taking advantage of these cool new data management tactics, but at the end of the day, it’s not how something gets done – it’s still what gets done.
So a Hadoop-based backend is no more inherently helpful than that 10-year-old RDBMS-based SIEM you never got to work. You still have to know what to ask the data engine to get meaningful answers. Rather than being blinded by the shininess of the BigData backend focus on how to use the tool in practice. On how to set up the queries to alert on stuff that maybe you don’t know about.

To paraphrase Socrates, unexamined data are not worth collecting. So analysis methodology and critical thinking skills matter. Rothman is spot on with this: the value of big data tech comes when you need to grow past the capabilities that traditional SIEM and RDBMS provide. By way of analogy: if you don’t understand algebra, then don’t take a course in calculus until you have the basic prerequisites down. You’ll just frustrate yourself and waste your tuition dollars.

Sherman Kent (photo provided by CIA)

In this vein, then, I appreciated the pointer from the OSINT and analysis training firm Treadstone 71 to a CIA paper on the background and work of Sherman Kent, the “father of intelligence analysis”.

He promoted an analytic doctrine that boils down to nine key points, listed in the CIA paper above. That doctrine applies across domains, not just to the sorts of military and geopolitical analysis we expect from government intelligence agencies. I highly recommend that everyone read at least that section of the paper, but here are some applications for those of us involved in security intelligence analysis, especially in the private sector.

  1. Focus on Policymaker Concerns: What keeps your management up at night? Hopefully security isn’t the only thing, of course. So assuming that your CxOs understand the general threat landscape, analysts need to ensure that they track relevant areas that can lead to useful changes and decisions at strategic and tactical levels.
  2. Avoidance of a Personal Policy Agenda: Many analysts focus on threats that concern them for reasons outside of their organization. Maybe they disagree with the politics of the Occupy movement and overemphasize threats to entirely unrelated organizations, or worry about APT China because of Sinophobia rather than a reasoned assessment of the situation. Or maybe they want to drive decision makers to a particular tech solution. Even worse, they may use their analyses as weapons for corporate political plays. Doing that represents a disservice to the organization and an unprofessional approach.
  3. Intellectual Rigor: This area stands as-is: “Estimative judgments are based on evaluated and organized data, substantive expertise, and sound, open-minded postulation of assumptions. Uncertainties and gaps in information are made explicit and accounted for in making predictions.”
  4. Conscious Effort to Avoid Analytic Biases: None of us can completely avoid cognitive bias, but we can make sure we understand it and try to correct for it where possible. That principally means application of the scientific method. As previously noted, whether or not faith and dogma have a place in one’s personal life, they certainly do not in one’s professional analyses.
  5. Willingness to Consider Other Judgments: Fight for your ideas, but “playing devil’s advocate” should rest on a better intellectual basis than simply spreading FUD. Recognize that others may in fact know more than you do or have insights that can help you.
  6. Systematic Use of Outside Experts: In addition to seeking out and understanding the work of other analysts, don’t restrict yourself solely to your field or even industry. Work with a community and keep bringing in fresh concepts from other disciplines.
  7. Collective Responsibility for Judgment: Eventually, your team will produce a report. You may not have agreed with everything that went into it, but that’s the way the sausage gets made. Once that report goes to its audience, support it. Throwing the rest of your analysis team under the bus by telling the audience “I told them so” doesn’t actually make you look smarter. It makes you look unprofessional. That doesn’t mean that you should ignore all criticism; rather, it means that you should be willing to take lumps with the rest of the group. If someone asks you for your opinion, give it – but clarify that it doesn’t represent the considered opinion of the rest of the team.
  8. Effective Communication of Policy-Support Information and Judgments: Analysts need three core skills: domain expertise, critical thinking skills, and communication ability. Communicating well includes targeting your analysis to the level appropriate to your audience. You must be able to summarize your findings in understandable and accurate ways. And you must be able to handle points of uncertainty properly.
  9. Candid Admission of Mistakes: You won’t always be right. Admit it, and review past work to see what you can learn for improvement the next time. “Try again. Fail again. Fail better.”

Security intelligence analysts should learn from previous work, instead of simply trusting in their own domain expertise and innate intelligence. Dr. Kent led the way, and even we non-spooks can still learn from his work.

3 reasons why big data matters for SIEM

"Nesting Dolls" by Andy Ihnatko

“Big data” isn’t just a buzzword, and it doesn’t just mean “big piles o’ bits”. It’s jargon, but it has a particular meaning:

Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.

Alternatively, “big data” refers to data of such volume that storage, management, processing, and analysis present engineering challenges beyond traditional IT solutions. If it fits in, say, a traditional RDBMS setup like MySQL or Oracle, then it may be a lot of data, but it’s not “big data”.

This new tech has lots of useful applications in social policy, business intelligence, science, and IT, among others. In the SIEM world, we’ve got to start looking at applying some of this tech where it makes sense, for at least a few specific reasons:

  1. Traditional SQL databases don’t fit the data model. Most SIEM implementations don’t necessarily need full ACID (atomicity, consistency, isolation, durability) guarantees, and shoehorning our needs into what exists holds us back.
  2. Big data tech (specifically, NoSQL database design) allows us to focus on the part of the CAP theorem (Consistency, Availability, Partition tolerance) that really matters to us: Partition Tolerance. Of the remaining two, we can usually settle for Availability and eventual Consistency.
  3. IT departments face constant budget pressure as their organizations focus on reducing expenses. This applies even more to security, where we provide loss avoidance rather than top-line revenue growth. We need architecture that lets us use cheaper commodity hardware while still maintaining appropriate performance.
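
To make the first point a little more concrete, here is a minimal sketch in plain Python. It assumes made-up event fields (`source`, `src_ip`, `user`, `action`, and so on) and is not tied to any particular NoSQL product. Security events from different sources rarely share one schema, so the sketch keeps each event as a self-describing document and filters on whatever fields happen to be present, rather than forcing everything into a single table definition.

```python
import json

# Hypothetical raw events from three different sources. The field names are
# illustrative assumptions, not any product's schema.
RAW_EVENTS = [
    '{"source": "firewall", "src_ip": "10.0.0.5", "dst_port": 22, "action": "deny"}',
    '{"source": "ids", "src_ip": "10.0.0.5", "signature": "ssh-bruteforce"}',
    '{"source": "iam", "user": "alice", "action": "login_failed"}',
]


def load_events(lines):
    """Parse each line as a self-describing document (no shared schema)."""
    return [json.loads(line) for line in lines]


def find(events, **criteria):
    """Return events matching every criterion; an event that lacks a field
    simply does not match on it."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


events = load_events(RAW_EVENTS)

# Correlate across sources on a field that only some events carry.
print([e["source"] for e in find(events, src_ip="10.0.0.5")])
```

The same query in a fixed relational schema would need either one wide table full of NULL columns or a join across per-source tables. That works, but it gets harder to maintain as new data sources arrive.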

We haven’t reached the point yet where we need to focus too strongly on particular aspects of “big data”. Do you need Hadoop? What analysis tasks fit map-reduce algorithms? Should you try to leverage Amazon EC2 or another cloud provider? As Jon Oltsik writes:

While “big data” will intersect with security intelligence, the actual “big data” technology aspects are irrelevant. CISOs need the analytics capabilities but really don’t care what’s under the hood. Let’s focus on data analysis and situational awareness and avoid a debate about OLAP, Massively-Parallel Processing (MPP), and Hadoop.

Those details matter to whoever builds the implementation (e.g., a vendor). SIEM users, though, should generally focus on the capabilities they actually want, such as data sources and analysis methods.
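
For a sense of which analysis tasks fit the map-reduce model, here is a minimal single-process sketch in plain Python, with no Hadoop involved. It counts denied connections per source address. The log format and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are assumptions invented for illustration. What matters is that the map step handles each record independently and the reduce step only combines values that share a key, which is what lets a real framework spread the work across commodity machines.

```python
from collections import defaultdict

# Hypothetical log lines in the form "<timestamp> <src_ip> <action>".
LOG_LINES = [
    "2012-02-01T10:00:01 10.0.0.5 deny",
    "2012-02-01T10:00:02 10.0.0.7 allow",
    "2012-02-01T10:00:03 10.0.0.5 deny",
    "2012-02-01T10:00:04 10.0.0.9 deny",
]


def map_phase(line):
    """Emit (src_ip, 1) for each denied connection; emit nothing otherwise."""
    _timestamp, src_ip, action = line.split()
    if action == "deny":
        yield src_ip, 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


deny_counts = run_job(LOG_LINES)
print(deny_counts)
```

Counting, summing, and grouping by key all fit this shape well. Analyses that need to see many records together, such as ordered sequences of events within a session, usually take more design work to express this way.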

Oltsik’s piece makes another cogent point about security intelligence:

Security intelligence demands more data. Early SIEMs collected event and log data then steadily added other data sources like NetFlow, packet capture, Database Activity Monitoring (DAM), Identity and Access Management (IAM), etc. Large enterprises now regularly collect gigabytes or even terabytes of data for security intelligence, investigations, and forensics. Many existing tools can’t meet these scalability needs.

Users will see this as the real driving force: to do the job effectively, the SIEM has to do more than just bring in firewall, IDS, and operating system logs. And it needs to support better exploratory data analysis, rather than just reporting and notifications.

I don’t know of many vendors that currently have products built on this approach, though I don’t doubt we’ll see a lot of them hurriedly slapping the label on their material even when it doesn’t fit: witness the APT debacle.

Hadoop and PCAP analysis

'Traffic lights' by Vit Brunner

“Large-scale PCAP Data Analysis Using Apache Hadoop” looks fascinating:

Traffic to the DNS root servers has increased and K-root produces terabytes of raw packet capture (PCAP) files every month. We were looking for a scalable and fast approach to analyse this data. In this article I will explain how we use Apache Hadoop and why we open-sourced our PCAP implementation for it.

Nice technique, but I’d like to understand a little better what sort of analysis they performed once they had the platform up and running.