Tag Archives: Andrew Hay

Data analysis will change the world

One of my favorite infosec thinkers, Andrew Hay, had a pair of recent posts that have given me lots to chew on.

First, he asked:

This provoked a wide-ranging conversation about what that means. We’ll find tremendous value in applying big data techniques to security data. (Actually, I think data analysis will change the world, but that’s a bit larger in scope than this post can comfortably handle.) We can then start to bring in additional data feeds past what traditional SIEMs handle. Think along the lines of more OSINT, network flows, and possibly even business data. At that point, you can really start to grasp the qualitative and quantitative improvements to data protection.

The next day, he wrote an article in which he asked an oft-heard data analysis question: Where’s my ‘Minority Report’ dashboard? We have to unpack that a little, though, because the data analysis scenes involved a few different useful things.

First, and perhaps most memorably, Cruise’s character used a gesture-based interface to work with the data he had available. As Hay notes, this tech has started to push down into consumer electronics like game consoles, but not generally into business applications like SIEM. While this might seem natural, we will have to move beyond the standard desktop metaphor and start to think of data as objects. It certainly won’t happen completely intuitively, but the long existence of similar ideas in various cultures (think mudras and sign language) and scientific research into the connection between words and gestures seem to indicate that we still have a lot of potential here.

Second, note how many disparate data feeds he had available. Apart from the fictional visualizations from the “precogs” (for which we can use surveillance video as a stand-in), he had social profiles, financial records, and more. While most of the entities we need to visualize aren’t always so human, we can assume some of the analogues I mentioned above for deploying “big data” tech. Data mining and machine learning will help here, particularly in knowledge discovery to hypothesize and test for correlations among the various data.

Third, the system latency seemed absurdly low. Try running a DB query on unstructured, near-realtime data, and tell me if it happens that immediately. While we’ve seen significant leaps in these areas, we need lots more advancement. Much of the tech today has started to move back towards a batch processing model rather than direct interaction and exploration, for example. Don’t think of this as just an engineering problem, because latency greatly matters when talking about trying to analyze data at anything remotely resembling the speed of thought.

Finally, the analyst clearly had excellent spatial reasoning skills. As younger generations continue to move into adulthood, we’ll likely see more applications of spatial reasoning. This means more research into data dimensionality: human brains don’t really visualize high-dimensional spaces very well, so we need to improve our models and analysts. It might turn out, for example, that we need to conceive of data as a hypercube as we drill down into specific nodes. Analysts already need to understand the foundations of graph theory when working in a lot of knowledge domains.

The future of data analysis excites me, and I really geek out over the possibilities. This has fractal-type potential: no matter whether we’re looking at data science from the MBA-typical “thirty-thousand foot view” or ångström altitude, we can find ways to change the world. (And if you’re working on this stuff and want some cross-domain thinking, let’s talk.)

Chroming up the facts: SIEM and IR presentation

Chroming it up doesn't actually make it go faster

I recently had the opportunity to watch the Trends in SIEM and Incident Response presentation from Narayan Makaram with HP (ArcSight), Anthony Di Bello with Guidance (EnCase), and Andrew Hay with The 451 Group. The topic addressed the specific nexus of my professional interests: log analysis and correlation for detecting and responding to incidents. While I’ve followed Hay on Twitter for a long time, I also have worked with both of the sponsoring products for years.

Trends

The presentation identifies several primary organizational trends:

  • trying to close the gap between compromise, detection, and response
  • taking a proactive approach
  • emphasis on lessons learned through increased visibility
  • response automation key to address relentless threats

(I suppose “relentless” is the new “persistent”.)

Hay did a great job addressing issues, largely based on the 2011 Verizon DBIR. Fewer than 1% of organizations detect data breaches through log analysis, a number which frankly frightens me. We spend millions of dollars on log management for compliance, and then we don’t use the logs properly. Given how often logs shed light on an incident in hindsight (69%, according to the same study), we know that they contain the proper data and indications. At best, we just don’t know how to make sense of them, and at worst, we don’t even look. (Guess which I believe happens more often.)

On a similar note, around 28% of surveyed organizations use threat intelligence right now. This looks like a massive opportunity to me: sharing data, understanding indicators and how to use them appropriately, and generally climbing the incident response learning curve faster. Threat intel providers and analysts have a huge field of untapped potential awaiting – so, as Hay says, we need to be less Paul Blart – Mall Cop and more Tom Cruise – Minority Report.

Di Bello (with Guidance Software) made some important points related to speed of response. He uses a traditional IR timeline, where a call to a help desk leads over several days to a low-level analyst going onsite for data gathering before eventually a senior analyst looks at the data and performs manual forensic analysis. We can’t stick with this model: automated data gathering based on solid alerting and event analysis can speed this up. It’s a great model for the future, and many organizations have started trying to lead the way in this trend. He discusses several example use cases, like suspicious network traffic or DLP alerts.
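To make the idea of alert-driven collection concrete, here is a minimal sketch of how an alert type might map straight to a list of artifacts to gather, with no analyst dispatched first. Everything in it is my own illustration: the `Alert` class, the `plan_collection` function, and the artifact names are hypothetical, and none of it reflects how ArcSight or EnCase actually work. The two alert types come from the use cases Di Bello mentioned.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical playbook: which artifacts to collect first for each alert type.
# The artifact names are illustrative assumptions, not any product's vocabulary.
COLLECTION_PLAYBOOK: Dict[str, List[str]] = {
    "suspicious_network_traffic": [
        "active_connections",
        "running_processes",
        "dns_cache",
    ],
    "dlp_alert": [
        "recent_file_access",
        "removable_media_history",
        "logged_in_users",
    ],
}

# Fallback for alert types the playbook does not know about.
DEFAULT_ARTIFACTS: List[str] = ["running_processes"]


@dataclass(frozen=True)
class Alert:
    """A simplified alert as a SIEM might hand it off (hypothetical shape)."""
    host: str
    alert_type: str


def plan_collection(alert: Alert) -> List[str]:
    """Return the artifacts to gather from the alerting host, in order."""
    # Copy the list so callers cannot mutate the shared playbook.
    return list(COLLECTION_PLAYBOOK.get(alert.alert_type, DEFAULT_ARTIFACTS))


if __name__ == "__main__":
    alert = Alert(host="ws-042", alert_type="dlp_alert")
    for artifact in plan_collection(alert):
        print(f"{alert.host}: collect {artifact}")
```

The point of the sketch is only the shape of the workflow: the alert itself decides what gets collected, so the senior analyst starts with data in hand rather than waiting days for it.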

Inconsistent data

Unfortunately, I found the quality of the rest of the presentation highly variable. Given their audience, they should take care to confirm the consistency of their data and ensure that their conclusions follow appropriately from the evidence presented. I understand the need for marketing in order for the sponsors to get value from the event, but puffery shouldn’t override the value for the listeners. That disappointed me, as I also use ArcSight heavily in my day-to-day operational analyses and like the product. I also use EnCase Enterprise, though less frequently and with much less satisfaction.

I just present two examples here, but they illustrate the issue that persisted through the entire presentation. This really detracted from the overall value, and I hope that future iterations will focus on the great value of this approach. The message matters and I would like to see it handled well.

For example, the HP speaker had a slide titled “Cybercrime Keeps Growing”. Among other well-publicized security breaches, he listed Google: “Accounts affected: Unknown” and “12.5 billion market cap lost”. This statistic makes me cry, and not for the intended reason. First, which data breach? The most public one that occurs to me would be the Aurora incident, and while that got a lot of press due to the details and geopolitical implications, I don’t believe they lost substantial investor confidence due to that. Second, given the economy of the last few years, attributing any market capitalization loss to this one incident ignores lots of other factors. And third, over what time period did this loss supposedly occur?

All the other incidents on the slide list specific costs, either financial or relating to a “processing license” revocation. With a bit of time spent on Google (ironically), I can’t find any support for that statement other than ArcSight presentations. And their mention of RBS WorldPay doesn’t seem to note that the PCI Council recertified them not long after. Also: I can’t imagine anybody who would take time out of their day for a presentation on this topic who doesn’t understand the overall risk. These sorts of slides have no value in presentations to this type of audience.

Time to respond also got some discussion, and here the Guidance representative exaggerated wildly. He claimed that EnCase Enterprise can get data from a system to confirm a compromise in seconds. In response to an audience question on this, he repeated the point. I don’t believe that this is the case except for large values of “seconds” (e.g. an hour is 3600 seconds, but that doesn’t seem to have been his intent). Even gathering metadata from memory, not to mention data on persistence mechanisms and core OS files, causes enough of a performance hit that it takes time. By itself, that’s not a knock on EnCase, but on the presentation here. That doesn’t even take into account the licensing limitations with EnCase Enterprise that greatly reduce the number of hosts from which the system can gather data simultaneously, typically in multiples of five.
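A bit of back-of-the-envelope arithmetic shows why a concurrency cap matters for the “seconds” claim. The sketch below assumes hosts are processed in fixed batches, that every host takes the same time, and that nothing overlaps between batches. The function name and the figures in the usage example are hypothetical, picked only to illustrate the shape of the problem, not measured from EnCase.

```python
import math


def collection_time_seconds(hosts: int, concurrent: int, per_host_seconds: float) -> float:
    """Estimate wall-clock time to gather data from `hosts` machines.

    Assumes at most `concurrent` hosts are processed at once, in batches,
    and that each host takes `per_host_seconds` (a simplifying assumption).
    """
    if hosts < 0 or concurrent <= 0 or per_host_seconds < 0:
        raise ValueError("hosts and per_host_seconds must be >= 0, concurrent must be > 0")
    batches = math.ceil(hosts / concurrent)
    return batches * per_host_seconds


if __name__ == "__main__":
    # Hypothetical figures: 100 hosts, 5 at a time, one minute per host.
    total = collection_time_seconds(hosts=100, concurrent=5, per_host_seconds=60)
    print(f"{total:.0f} seconds (~{total / 60:.0f} minutes)")
```

Under those made-up numbers, the sweep takes 20 batches of a minute each, so 1200 seconds. Even a generous per-host figure gets multiplied by the number of batches once the host count exceeds the concurrency limit.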

These examples illustrate the feeling I had throughout, at least after Hay’s segment: not only did the presentation consist almost entirely of sales pitches, but the speakers didn’t even really consider the type of audience who would attend. That said, I’d welcome any corrections to my statements above. Nothing convinces an analyst like data, after all.

(Disclosure: I work for Heartland Payment Systems, also mentioned in the presentation. As always, my opinions here are my own and don’t necessarily reflect those of my employer. And I will re-emphasize that I have received no compensation or other inducements for my opinions on the products mentioned in this post.)