Tag Archives: Forensics

Thoughts on digital forensics research

I always enjoy seeing crossover between statistics and computer science. In fact, one of my very first jobs involved writing code in S+ (the closed-source precursor to R) to support a textbook my professor was writing at the time. These days, machine learning usually comes to mind for that mix, but occasionally digital forensics can make use of these techniques as well.

Stochastic forensics

At Black Hat a few weeks ago, I attended a presentation by Jonathan Grier on stochastic forensics. I had visions of Markov models of user activity and malware Monte Carlo simulations.

As it turns out, this wasn’t too far off. Essentially, the idea is that we can infer certain data from a system by looking at its collective characteristics. In other words, we can measure across a large number of individual members and observe the behavior of the body as a whole to draw conclusions. The initial case related to data exfiltration. A client organization wanted to prove that a user had copied a large number of files containing proprietary data to an external drive. Windows doesn’t normally track this information except to a very limited extent in file access times, but even that only records the last time a file was accessed. So subsequent access to a file will overwrite that time stamp and destroy any previous record. For an individual file, then, we might have significant difficulty proving that someone had copied it. Worse, the user in this case had legitimate access to the data, so any single data point would prove nothing.

By taking a statistical view of the system, however, particularly looking at entire directory trees, we can plot a histogram of last access times and compare to control data (from other directory trees not under suspicion). The observed pattern for normal usage might look one way, with most files not touched and recent accesses limited to a small set of files. But if a tree has been copied wholesale, such as via a drag-and-drop operation or zipping it up or some other recursive copy, then the access times would look different. You would (hopefully) have a clear delineation of all the files accessed at some particular time and then a sort of power law distribution following from that showing normal access patterns.
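As a rough illustration of the idea (my own sketch, not the tooling from the talk), the following Python walks a directory tree, buckets last-access times by hour, and reports how much of the tree falls into the single busiest bucket. The function names `access_times`, `hourly_histogram`, and `peak_share` are hypothetical, and the sketch assumes the file system actually records access times, which many systems disable or coarsen.

```python
import os
from collections import Counter
from datetime import datetime, timezone


def access_times(root):
    """Yield the last-access time (seconds since the epoch) of each file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                yield os.stat(path, follow_symlinks=False).st_atime
            except OSError:
                continue  # unreadable or vanished file: skip it


def hourly_histogram(timestamps):
    """Count timestamps per one-hour bucket (UTC)."""
    counts = Counter()
    for ts in timestamps:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[moment.replace(minute=0, second=0, microsecond=0)] += 1
    return counts


def peak_share(histogram):
    """Return (bucket, fraction of all files) for the fullest bucket."""
    total = sum(histogram.values())
    if total == 0:
        return None, 0.0
    bucket, count = histogram.most_common(1)[0]
    return bucket, count / total
```

Running `peak_share(hourly_histogram(access_times(path)))` on both a suspect tree and a control tree gives two numbers to compare. A much larger share in the suspect tree is only a hint that something touched most of the files at once, not proof of copying.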

As I listened to the presentation, I noted a few weaknesses in his approach: manipulation of time stamps, for example, or perhaps other feasible explanations for this type of pattern. In particular, the file system simulator he wrote as an initial test did not strengthen his argument at all, because it essentially only verified the model he coded into the simulation rather than telling us something useful about “real” systems. He explained the improved testing methods he used later, and his other responses mostly mollified me: this won’t always work, so you need to test carefully on a system (e.g. test to see whether the AV software overwrites time stamps). And the most it will give you is circumstantial evidence suggesting that something happened on the system at that time. Perhaps the user took a legitimate backup, for example. But now you have something to investigate further.

Forensic research

This led me to muse on the nature of research in information security. Sometimes we have a tendency toward the perfectionist fallacy: if it’s not perfect, then it’s worthless. In forensics in particular, this occurs for understandable reasons, because we have definitive standards of proof to meet (e.g. “preponderance of the evidence” in civil trials or “beyond a reasonable doubt” in criminal trials). So of course we really do need to look at the weaknesses of a system or an approach.

But if we find weaknesses, that shouldn’t be the end of the story. Instead, perhaps it can point the way for future research: if you think antivirus scanning will overwrite the time stamps, then test and report it. If you think that comparing access timestamp patterns only identifies anomalies, then say so and identify what sorts of anomalies might generate this pattern. Partial results can still provide value, even if not as much as we’d like. And of course further testing to invalidate a hypothesis or show problems with an approach provides great research value.

The research on stochastic forensics I discussed above will not revolutionize digital forensics. No long-standing large-scale theories will topple. On the other hand, we have an incremental result for other researchers to consider and try to validate or invalidate. We also have an idea that we can try to apply in other areas like network forensics.

Most scientific research advances not in great leaps of intuition or revolutions that wipe the slate clean with an entirely new look at things, but in small evolutionary steps that move us closer to our goals of knowledge and information. To emulate the progress other fields of science have enjoyed, we must treat our discipline as a science and not just an art.

Two Things: SIEM and DFIR edition

"Two Stick" by lucianvenutianThanks to Hacker News, I ran across the charming and thought-provoking concept of Two Things:

“You know, the Two Things. For every subject, there are really only two things you really need to know. Everything else is the application of those two things, or just not important.”

You also might think of these things as first principles, though these might represent something even more basic. After spending some time thinking about it, I came up with the following. Feel free to add your own or point out what I’ve missed.

Two things for DFIR:

  1. The bad guys always leave evidence behind.
  2. You aren’t looking for it in time.

Two things for SIEM:

  1. Log analysis matters more than log management.
  2. SIEM analysts eventually become DBAs. (Bejtlich’s Principle)

I don’t know whether anybody else has called it that before, but I sure wish I could find the canonical reference for Bejtlich’s Principle.

Aside

Here are some articles worth reading, but which I didn’t get to discuss in more detail due to time constraints. Hopefully I’ll get around to some of the themes later. Reflections on the Oral Argument in United States v. Jones, …

Theory versus practice: threat-centrism

I currently work in a threat-centric role, in the sense that we detect and respond to threats as they occur. We handle malware, log analysis, and network & system forensics. So I use “threat” in a concrete sense: bits that represent the actions of outside parties who may do harm to our enterprise.

At the same time, many security roles (including an opening I’m considering at my company) focus on an “information security architecture” team. These roles often handle vulnerability assessment, data leakage prevention, and general issues of design, planning, and policy. Note that the incident response team usually exists separate from architecture, which is where I have to make some private assessments.

I’ve started taking the advice of Greg Pendergast by “assessing, to the extent possible, whether you could make this new position your own by working in the threat-centric aspects.”

This concept strikes me as really interesting: how do we work real threat data into architecture? This differs in important ways from threat modelling, in which we design systems to counter different possible threats. In theory, theory and practice are the same, but in practice, they’re completely different.

I’ve got some ideas of how that could work specifically in our enterprise, but generalized answers might be worth considering as well. For example, how do organizations handle the sharing, both inbound and outbound, of threat data? Who handles the overall architecture of security monitoring systems? What log data can you get that analysts may not even realize exists (or could exist)?

The ideas have started to flow and I look forward to seeing what happens next.

BSidesDFW 2011

This past weekend, we had the local BSides DFW conference. Overall, I’d classify it as a great success, but I also want to analyze a few bits here.

The Good

Microsoft provided a really nice facility at their Dallas Technology Center. We had lots of room, good wireless signal, friendly staff (even including the security guards). I’ve criticized Microsoft heavily for years due to their technology and business practices, so I have to note that they did this very well.

Some of the talks had some first-rate stuff. Andrew Case had a particularly outstanding talk on data exfiltration. I can’t wait to see the slides and maybe mess around with Registry Decoder as well. I certainly intend to submit a talk next year, now that I have a feel for what the conference covers and the sort of audience that shows up. We also had a lock pick village and lots of presence from the EFF as well as a table from Hackers For Charity.

I should note that any security conference with kegs and kegs of beer, drink tickets, and homemade barbecue knows its audience. Being sort of a wimp, I didn’t stay for the after party but I heard it was great. And of course I loved seeing some of my friends, or in some cases meeting them in person for the first time. The volunteers and coordinators did a first-rate job, without question.

The Bad

Really, there wasn’t much. Some of the speakers lacked presentation skills, but I think that many of them simply had never done this before. And as much as I loved the facility, shuttling between the first and fourth floors lacked a bit of convenience.

But those are the largest things I could mention about the conference itself, which I think speaks volumes for how well it actually went.

The Ugly

First, I’ll note that what I say below should not reflect in any way on BSides or the hard-working coordinators who did a great job organizing this conference for no compensation other than grinning faces and a few awkward hugs.

In 2011, as for a very long time before now, overtly sexist presentations have no place whatsoever at a technical conference. One of the speakers gave a presentation in an informal style, which fits BSides perfectly. This isn’t a government-sponsored academic conference on national defense in the cyber domain or something. It’s a community-organized thing that sprouts from the grass roots.

So throwing out a bunch of slides that demean women and treat them as sexualized objects doesn’t work. I’m not a prude, and there’s a place for unsophisticated locker-room humor. This wasn’t it. As one example out of many from the same talk, a deck that includes images like one of panties on a woman’s crotch with the words “ALL YOU CAN EAT” printed on them would get most of us fired from our day jobs, and rightfully so. Showing same-sex affection for titillation and digitally altered images of (clothed) breasts does nothing but demean women and the speaker, though in different ways.

All of this detracted from what would otherwise have been a really good presentation with some interesting things to say. I hope the speaker reconsiders his actions, and I don’t plan to attend his talks in the future. This is not the sort of thing that we want to encourage in any way.

Forensic Challenge 10: Attack Visualization

I noticed with happiness yesterday that the Honeynet Project released Forensic Challenge 10. But unlike other challenges that focused on finding the right answers (hopefully including building some new tools), this one uses the data from FC5 but asks participants to create new visualizations of the attack.

This will present some interesting challenges, I think, since the data consist of system and server logs rather than network data per se. But I also think that these projects work best as a team effort, so I poked at Twitter and pulled together a few folks who’d like to get involved in a collaboration. (Anyone else who might have an interest in working with us, please let me know.) And maybe I’ll finally get some use out of that Visualizing Data book on my desk or even my old GraphViz scripts.