“Big data” isn’t just a buzzword, and it doesn’t just mean “big piles o’ bits”. It’s jargon, but it has a particular meaning:
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.
Alternately, “big data” refers to data of such volume that storage, management, processing, and analysis present engineering challenges beyond traditional IT solutions. If it fits in, say, a traditional RDBMS setup like MySQL or Oracle, then it may be a lot of data, but it’s not “big data”.
This new tech has lots of useful applications in social policy, business intelligence, science, and IT, among others. In the SIEM world, we’ve got to start looking at applying some of this tech where it makes sense, for at least a few specific reasons:
- Traditional SQL databases don’t fit the data model. We don’t necessarily care in most SIEM implementations about meeting the ACID standard. Shoehorning our needs into what exists holds us back.
- Big data tech (specifically, NoSQL database design) allows us to focus on the area of CAP that really matters to us: Partition Tolerance. Of the remaining two, we can usually settle for Availability and eventual Consistency.
- IT organizations consistently experience significant budget pressure as organizations focus on reducing expenses. This applies even more to security, where we provide loss avoidance rather than growing top line revenues. We need architecture that allows us to use cheaper, commodity hardware while still enabling us to maintain appropriate performance.
We haven’t reached the point yet where we need to focus too strongly on particular aspects of “big data”. Do you need Hadoop? What analysis tasks fit map-reduce algorithms? Should you try to leverage Amazon EC2 or another cloud provider? As Jon Oltsik writes:
While “big data” will intersect with security intelligence, the actual “big data” technology aspects are irrelevant. CISOs need the analytics capabilities but really don’t care what’s under the hood. Let’s focus on data analysis and situational awareness and avoid a debate about OLAP, Massively-Parallel Processing (MPP), and Hadoop.
Those things will matter when building an implementation (e.g. to a vendor). SIEM users, though, should generally focus on what capabilities they actually want, such as data sources and analysis methods.
Oltsik’s piece makes another cogent point about security intelligence:
Security intelligence demands more data. Early SIEMs collected event and log data then steadily added other data sources like NetFlow, packet capture, Database Activity Monitoring (DAM), Identity and Access Management (IAM), etc. Large enterprises now regularly collect gigabytes or even terabytes of data for security intelligence, investigations, and forensics. Many existing tools can’t meet these scalability needs.
Users will see this as the real driving force: to do the job effectively, the SIEM has to do more than just bring in firewall, IDS, and operating system logs. And it needs to support better exploratory data analysis, rather than just reporting and notifications.
I don’t know of many vendors that currently have products built on this approach, though I don’t doubt we’ll see a lot of them hurriedly slapping the label on their material even when it doesn’t fit: witness the APT debacle.


