The CIRT gets a call from a concerned sysadmin who sees some SSH connections from an Eastern European country to a DMZ web server. As the investigation kicks off and the CIRT staff start asking questions, they want to get up to speed as quickly as possible on any background they don’t already have. What does the server do? Does the sysadmin or someone else have historical logs? What network controls already exist? From there, they’ll start piecing together a timeline, finding anomalies, and generally trying to get as complete an accounting of the incident as possible.
At its core, incident response is a learning process: the responders need to learn as much as possible, starting with the “known unknowns” and moving right into the “unknown unknowns”. And in successive incidents, the team will want to speed up that process. I put together a (naïve) diagram showing what we should attempt to achieve over time:
“Steep learning curves” really are ideal in many situations, including this one. We want to climb that line as quickly as possible. Once we pass a threshold, we can begin to contain and eradicate the intrusion. This also helps us provide the appropriate information to the organization’s leadership for the larger questions of response and future changes. But when we take longer than anticipated, or even fall behind an evolving incident – remember, the enemy has a vote – then the gap between the curves starts to incur additional costs to the organization.
Curve 2 deliberately shows a slower start than Curve 1. As we start the process of improving our tool set, workflow, and controls, a few initial stumbles will occur. Maybe you didn’t fully account for some of the deployment complexity, or perhaps the incident occurs in an area of the organization that has minimal instrumentation and management. (In fact, this latter scenario occurs with great frequency for obvious reasons.) But over time, we keep pushing that curve left, getting faster with each iteration. As we do that, we can reduce the impact to the organization, perhaps even moving further back in the kill chain.
I really like this model, but it needs evolution. What’s missing from it?
Thanks to Hacker News, I ran across the charming and thought-provoking concept of Two Things:
“You know, the Two Things. For every subject, there are really only two things you really need to know. Everything else is the application of those two things, or just not important.”
You also might think of these things as first principles, though these might represent something even more basic. After spending some time thinking about it, I came up with the following. Feel free to add your own or point out what I’ve missed.
Two things for DFIR:
- The bad guys always leave evidence behind.
- You aren’t looking for it in time.
Two things for SIEM:
- Log analysis matters more than log management.
- SIEM analysts eventually become DBAs. (Bejtlich’s Principle)
I don’t know whether anybody else has called it that before, but I sure wish I could find the canonical reference for Bejtlich’s Principle.
We can define an analyst as a function taking data and caffeine as inputs that outputs (hopefully useful) knowledge:
But analysts need more than just good data and properly brewed coffee (or tea, if that’s your thing). We need well-written “internal code”: our thought processes, if you will. As I’ve previously mentioned, too much material focuses on the data and not enough on the processing. If you look for information on log management, you can find endless advice on how to collect your logs, and how to store them. If you look for information on SIEM systems, you can find lots of vendor “marketecture”, compliance guidance, and so forth – but not enough guidance on what to do with the information you find there.
To find what we really need, two things have to happen. First, we need to look outside the IT security echo chamber. Simply repeating the same endless mantras won’t advance the state of the art at all, but looking at other fields with related problems and finding ways to cross-pollinate certainly can bear fruit. In my view, the intelligence community has spent decades working through similar issues. One really useful reference I’ve found lately is Psychology of Intelligence Analysis (which largely discusses “Tools for Thinking” and “Cognitive Biases”). Another document, Basic Counterintelligence Analysis in a Nutshell, has much better applicability to DFIR. Some things work directly, like the section on “Analytic Traps and Mindsets”; others have simply gone out of date; and still other concepts have useful analogues. For example, map analysis usually doesn’t reveal very much in a geographic context (since network links and physical proximity don’t correlate very well), but overlaying your data on a network map certainly can.
So in February, I intend to take the “Basic Counterintelligence Analysis in a Nutshell” document and adapt the ideas in it to network security investigations in particular. But to do this justice takes more than a simple post, so instead of posting that here as originally intended, I’ll spend some time on it and get feedback when it’s ready. This post mostly serves the purpose of getting it out there so that my colleagues, friends, and readers can hold me accountable next month.
Chewbacha revisits the classics
Today, I had the opportunity to listen to the latest installment of Mandiant’s web series “Fresh Prints of Mal-ware”: The Nutts and Boltz of APT Persistence Mechanisms, hosted by Chris Nutt and Jason Rebholz. (The puns are strong with this one!)
The first part of this discussion consisted of some DFIR fundamentals, like looking at the file system timeline. This should include all eight time stamps that NTFS keeps for a file on Windows: four in the $STANDARD_INFORMATION attribute and four in the $FILE_NAME attribute. Rather than just start “looking for evil,” the investigator needs to start with a question. My favorite, where applicable, is to look at all system activity around the time of whatever other suspicious activity caused me to look at the system in the first place (e.g. network traffic). Another colleague mentioned using Splunk for forensic timeline research. I’ve not used this technique myself, but the concept is solid.
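To make the "start with a question" idea concrete, here is a minimal sketch of pivoting a timeline around a suspicious moment. Everything in it is an assumption for illustration: the `events_near` helper, the tuple layout (timestamp, source, description), the five-minute window, and the sample entries are all made up, and a real timeline would come from whatever tool you use to extract file system and log data.

```python
from datetime import datetime, timedelta


def events_near(events, pivot, window=timedelta(minutes=5)):
    """Return events within +/- window of the pivot time, oldest first.

    Each event is a hypothetical (timestamp, source, description) tuple.
    """
    earliest, latest = pivot - window, pivot + window
    hits = [e for e in events if earliest <= e[0] <= latest]
    return sorted(hits, key=lambda e: e[0])


# Made-up sample timeline mixing file system and log entries.
timeline = [
    (datetime(2013, 1, 15, 10, 3, 40), "filesystem", "new DLL created in application directory"),
    (datetime(2013, 1, 15, 11, 30, 0), "filesystem", "log file rotated"),
    (datetime(2013, 1, 15, 9, 58, 12), "eventlog", "remote logon"),
    (datetime(2013, 1, 15, 9, 40, 0), "eventlog", "service started"),
]

# Pivot: the time of the suspicious network traffic that prompted the review.
pivot = datetime(2013, 1, 15, 10, 0, 0)

for when, source, description in events_near(timeline, pivot):
    print(when.isoformat(), source, description)
```

The window size is a judgment call: too narrow and you miss the staging activity, too wide and you are back to "looking for evil" in everything.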
The second part discussed persistence mechanisms in more detail, like autoruns and the various locations they use. On Twitter, the #m_fp discussion pointed me to two resources, one from Silent Runners and another from Trusted Signal. The presenters also spent a good amount of time on DLL search order hijacking: it doesn’t get a lot of attention, but they’ve seen it in use by targeted (as opposed to opportunistic) malware.
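DLL search order hijacking works because Windows generally checks the directory the application was loaded from before the system directory, unless the DLL is on the KnownDLLs list. A rough way to hunt for it is to flag DLLs in an application's directory that share a name with a system DLL. The sketch below is not the presenters' method; the `find_shadowing_dlls` helper and the file names are hypothetical, and it works on plain lists of names so it can be run anywhere.

```python
def find_shadowing_dlls(app_dir_files, system_dll_names, known_dlls=()):
    """Return DLL names in an application directory that shadow a system DLL.

    app_dir_files    -- file names found next to the executable
    system_dll_names -- DLL names present in the system directory
    known_dlls       -- names on the KnownDLLs list, which are always loaded
                        from the system directory and so cannot be shadowed
    """
    system = {name.lower() for name in system_dll_names}
    known = {name.lower() for name in known_dlls}
    return sorted(
        name
        for name in app_dir_files
        if name.lower().endswith(".dll")
        and name.lower() in system
        and name.lower() not in known
    )


# Illustrative inputs only; real lists would come from directory listings
# and from the KnownDLLs registry key on the system under review.
app_dir_files = ["explorer.exe", "ntshrui.dll", "kernel32.dll", "notes.txt"]
system_dll_names = ["ntshrui.dll", "kernel32.dll", "user32.dll"]
known_dlls = ["kernel32.dll", "user32.dll"]

print(find_shadowing_dlls(app_dir_files, system_dll_names, known_dlls))
```

A hit is only a lead, not a verdict: plenty of legitimate software ships its own copies of common DLLs, so each match still needs a look at its time stamps, signature, and contents.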
I think this approach of revisiting fundamentals with a few new twists to keep things fresh works really well, and I hope to see more of this sort of thing from Mandiant (and whoever else!) in the future.