Tag Archives: Data Science

Announcement: Career move

For some time, I’ve considered moving away from my current role as a CIRT team leader. As much as I enjoy exercising the analysis bits of my skillset, the operational demands of monitoring and response take a toll on my personal and family life. Despite how much I’ve enjoyed building up the incident response team at my previous employer, the time for a change has arrived. When I looked at my skills with as objective an eye as I could muster, data science seemed to make the most sense. I have an academic background in statistics, applied mathematics, and computer science. My professional background includes UNIX system administration and usage as well as a long history with programming (hacking rather than software engineering). Just as importantly, I’ve developed domain expertise in privacy, digital forensics, and network security. I also speak fluent, professional-quality Spanish and have a burning desire to work on stuff that matters in the long view.

As I started to explore the possibility space here, an opportunity arose that I couldn’t ignore. So I am incredibly pumped to announce that, later this month, I will be joining the Verizon Business RISK team, working on security research and intelligence with Chris Porter and Wade Baker. In a very real sense, this also means I will return to one of my very first employers, as I was a GTE employee who stuck around for several years after we merged with Bell Atlantic to form Verizon.

Because of that, a few specific things I’d planned here won’t happen, or at least not anytime soon. I will continue to write this blog actively, but obviously some sorts of writing will fit better into my work at VZ. This blog will temporarily slow down when I officially enter my new role: while I have assurances that it doesn’t conflict with any of my new employer’s policies, I’d like to review the social media policy in detail to avoid stepping into any open manholes.

Thanks for reading and for everyone’s support. This change just means we’re stepping on the gas!

Scope expansion for data science

I’ve discussed my interest in data science and big data quite a bit on Twitter. This partly has to do with my contention that good SIEM and log analysis work should overlap significantly with data science, among other fields. It also has to do with my ongoing search for fulfillment in finding ways to work on stuff that matters (i.e., not purely infosec).

So then today I just asked the question straight out:

I got a bit of feedback from some of my usual Twitter crowd, encouraging me to simply grow the scope of this site. I have two concerns: one, will the (relatively small) existing reader base get frustrated with posts that have, at best, a tangential relationship to security? Two, will any new readers pigeonhole the blog – or me – as an information security blog, passing over the data content?

The sorts of things I intend to start writing about, whether here or elsewhere, include technical discussion of data analysis, walkthroughs of techniques as I explore them myself, and applications in other fields. As an example, right now I have some processes running to analyze refugee trends based on data provided by the United Nations High Commissioner for Refugees.

Any thoughts, suggestions, or other pointers?

Data analysis will change the world

One of my favorite infosec thinkers, Andrew Hay, had a pair of recent posts that have given me lots to chew on.

First, he asked:

This provoked a wide-ranging conversation about what that means. We’ll find tremendous value in applying big data techniques to security data. (Actually, I think data analysis will change the world, but that’s a bit larger in scope than this post can comfortably handle.) We can then start to bring in additional data feeds beyond what traditional SIEMs handle: think along the lines of more OSINT, network flows, and possibly even business data. At that point, you can really start to grasp the qualitative and quantitative improvements to data protection.
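To make “bringing in additional data feeds” a little more concrete, here’s a minimal Python sketch that enriches SIEM-style alerts with network flow volume and an OSINT indicator list. The `Event` and `Flow` record layouts, the `enrich` function, and the sample addresses (private and RFC 5737 documentation ranges) are all hypothetical; a real setup would pull these from whatever SIEM, flow collector, and threat-intel feed you actually run.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A SIEM-style alert (hypothetical schema, for illustration only)."""
    src_ip: str
    dst_ip: str
    signature: str


@dataclass(frozen=True)
class Flow:
    """A network flow record (hypothetical schema, for illustration only)."""
    src_ip: str
    dst_ip: str
    bytes_out: int


def enrich(events, flows, osint_bad_ips):
    """Attach total flow volume and an OSINT match flag to each event."""
    # Sum bytes per (src, dst) pair once, so each event lookup is a dict hit.
    volume = defaultdict(int)
    for f in flows:
        volume[(f.src_ip, f.dst_ip)] += f.bytes_out

    enriched = []
    for e in events:
        enriched.append({
            "signature": e.signature,
            "src_ip": e.src_ip,
            "dst_ip": e.dst_ip,
            # .get() avoids inserting missing pairs into the defaultdict.
            "bytes_out": volume.get((e.src_ip, e.dst_ip), 0),
            "osint_hit": e.dst_ip in osint_bad_ips,
        })
    # Surface OSINT matches first, then the heaviest talkers.
    enriched.sort(key=lambda r: (r["osint_hit"], r["bytes_out"]), reverse=True)
    return enriched


if __name__ == "__main__":
    alerts = [Event("10.0.0.5", "203.0.113.7", "beacon"),
              Event("10.0.0.6", "198.51.100.2", "scan")]
    flows = [Flow("10.0.0.5", "203.0.113.7", 1200),
             Flow("10.0.0.5", "203.0.113.7", 800)]
    for row in enrich(alerts, flows, {"203.0.113.7"}):
        print(row)
```

The join key here is deliberately naive (an exact source/destination pair); in practice you’d also want time windows, NAT awareness, and some notion of indicator freshness.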

The next day, he wrote an article in which he asked an oft-heard data analysis question: Where’s my ‘Minority Report’ dashboard? We have to unpack that a little, though, because the film’s data analysis scenes involved several distinct useful things.

First, and perhaps most memorably, Cruise’s character used a gesture-based interface to work with the data he had available. As Hay notes, this tech has started to push down into consumer electronics like game consoles, but not generally into business applications like SIEM. While this might seem like a natural next step, getting there will require moving beyond the standard desktop metaphor and starting to think of data as objects. It certainly won’t happen completely intuitively, but the long existence of similar ideas in various cultures (think mudras and sign language) and scientific research into the connection between words and gestures seem to indicate that we still have a lot of potential here.

Second, note how many disparate data feeds he had available. Apart from the fictional visualizations from the “precogs” (for which we can use surveillance video as a stand-in), he had social profiles, financial records, and more. While the entities we need to visualize aren’t always human, we can substitute analogues like the ones I mentioned above for deploying “big data” tech. Data mining and machine learning will help here, particularly in knowledge discovery: generating hypotheses and testing for correlations among the various data sources.
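As a sketch of what “hypothesize and test for correlations” can look like at its simplest, the Python below computes a Pearson correlation between two aligned series and uses a permutation test to estimate how often shuffled data would correlate at least that strongly by chance. The series you’d feed it (say, hourly failed logins and hourly outbound DNS queries) are hypothetical, and Pearson plus a permutation test is just one simple choice among many techniques.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data correlates at least as strongly as observed."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real pairing between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (trials + 1)
```

A low p-value only says the pairing is unlikely to be random; it’s a lead for an analyst to chase, not evidence of a causal link. Testing many feed pairs at once also calls for a multiple-comparisons correction.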

Third, the system latency seemed absurdly low. Try running a database query against unstructured, near-real-time data, and tell me whether it returns that quickly. While we’ve seen significant leaps in these areas, we need lots more advancement. Much of the tech today has actually moved back towards a batch processing model rather than direct interaction and exploration, for example. Don’t think of this as just an engineering problem: latency matters a great deal when you’re trying to analyze data at anything remotely resembling the speed of thought.
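One way to claw back interactivity is to maintain answers incrementally as data arrives, rather than recomputing them in a batch job. The Python sketch below keeps per-key counts over a sliding time window so a “what’s noisiest right now?” question is answered from memory immediately. The `SlidingWindowCounter` class is hypothetical, and it assumes events arrive with non-decreasing timestamps; real stream processors also have to handle late and out-of-order data.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Per-key event counts over the last `window` seconds (hypothetical helper).

    Assumes timestamps are passed to add() in non-decreasing order.
    """

    def __init__(self, window: float):
        self.window = window
        self.events = deque()   # (timestamp, key) in arrival order
        self.counts = Counter()

    def add(self, ts: float, key: str) -> None:
        self.events.append((ts, key))
        self.counts[key] += 1
        self._expire(ts)

    def _expire(self, now: float) -> None:
        # Drop events at or before the window's trailing edge.
        while self.events and self.events[0][0] <= now - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n: int = 3):
        """Most frequent keys in the current window, answered without a rescan."""
        return self.counts.most_common(n)
```

Each event costs amortized constant work on arrival, so the query side never waits on a full scan; the trade-off is that you must decide in advance which aggregates are worth keeping hot.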

Finally, the analyst clearly had excellent spatial reasoning skills. As younger generations continue to move into adulthood, we’ll likely see more applications of spatial reasoning. This means more research into data dimensionality: human brains don’t visualize high-dimensional spaces very well, so we need both better models and better-prepared analysts. It might turn out, for example, that we need to conceive of data as a hypercube as we drill down into specific nodes. Analysts already need to understand the foundations of graph theory when working in a lot of knowledge domains.
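The hypercube idea maps onto the familiar OLAP notions of rolling up (aggregating over fewer dimensions) and drilling down (fixing some dimensions, then grouping by finer ones). Here’s a minimal Python sketch over alert records with hypothetical `site`, `host`, and `sig` dimensions; the function names are illustrative and not taken from any particular OLAP product.

```python
from collections import Counter


def roll_up(records, dims):
    """Count records grouped by the given dimensions (one face of the data cube)."""
    return Counter(tuple(r[d] for d in dims) for r in records)


def drill_down(records, dims, fixed):
    """Keep records matching every fixed dimension value, then group by `dims`."""
    subset = [r for r in records if all(r[k] == v for k, v in fixed.items())]
    return roll_up(subset, dims)


if __name__ == "__main__":
    # Hypothetical alert records with three dimensions.
    alerts = [
        {"site": "east", "host": "web1", "sig": "scan"},
        {"site": "east", "host": "web1", "sig": "beacon"},
        {"site": "east", "host": "db1", "sig": "scan"},
        {"site": "west", "host": "web2", "sig": "scan"},
    ]
    print(roll_up(alerts, ["site"]))                                # coarse view
    print(drill_down(alerts, ["host", "sig"], {"site": "east"}))    # finer slice
```

Each call collapses the cube back into something a person can actually look at, which is the practical point: analysts move between low-dimensional views rather than trying to see all dimensions at once.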

The future of data analysis excites me, and I really geek out over the possibilities. This has fractal-type potential: no matter whether we’re looking at data science from the MBA-typical “thirty-thousand foot view” or ångström altitude, we can find ways to change the world. (And if you’re working on this stuff and want some cross-domain thinking, let’s talk.)