Tag Archives: Log Analysis

Thoughts on network security dashboards

Given the modern security environment in which our networks and systems exist, organizations like incident response teams and security operations centers focus on the state of our own systems and networks. Outside of unusually large spikes, such as those from a Slammer-scale worm, global threat level is relatively uninteresting in this context because it isn’t actionable. We might be interested in regular summaries of those global data (e.g. daily/weekly briefings), but not for minute-to-minute dashboards. So the sorts of cruft that many organizations throw into their dashboards really provide zero or even negative value: high-level threat intelligence or generic indicator feeds like those from ThreatExpert or DShield may have their uses, but not here. And the color-coded DHS-style “threat levels” provide even less value, because they tell us nothing about the risks in our own environments.

So what would we want to see on network security dashboards? Focus on the realities of modern network security, which means monitoring high-probability threats rather than what executives find easiest to understand or, worse, what a decade-old threat model suggests. We don’t care about port scans or negatively correlated IDS alerts (i.e. those for which we know the asset is not vulnerable). Firewall logs can provide some value, primarily as a proxy for netflow-type data. In general, if a perimeter security device blocks the attack, we don’t need to display it on a dashboard. The firewall or similar device did its job, and we care about detecting incidents rather than all attacks.

Dashboards need to present data that have two principal qualities: low false positive rates and an immediately available action. The data should also allow the analyst to drill down to more detail, because data without context can create confusion and inefficiencies. In a general sense, we can divide our dashboards into two types based on their appropriate level of visibility.

Primary

Primary dashboards need to be constantly visible and updated relatively frequently. These should include the types of data that would be on the large monitors visible to everyone in a SOC.

  • Anomalous outbound connections or extrusion detection. Outbound flows can help identify compromised systems, once you whitelist normal connections like those from your proxy servers, mail relays, etc.
  • Anomalous VPN logins, meaning those not whitelisted as coming from known-good addresses at normal times. As a first pass, you might show logins from foreign countries depending on your organization’s needs. This might be a candidate for machine learning to identify normal addresses and login times.
  • Host sweep results, such as from a tool like Mandiant Intelligent Response looking for indicators of compromise. Other host-based IDS tools may also provide value here, but they often have very high false-positive rates that limit their usefulness on dashboards.
  • Network-based malware detection, such as FireEye or similar products. Securosis has a particularly useful introduction to this tech.
  • Bandwidth utilization, possibly plotted against normal or past data. Most network management teams already have a tool like MRTG. This helps you see DDoS attacks as they occur.
  • Correlated IDS alerts, hopefully filtered through a SIEM so that you properly integrate asset and vulnerability data. I make a point of only listing high-confidence, high-severity alerts in my dashboards.
  • WAF notifications help identify attacks using port 80 that firewalls will ignore and most regular network IDS do not handle very well. SQL injection is still a significant vector, and you ignore it at your own peril.
  • Social media monitoring may deserve its own separate process, but it can play a role here as part of OSINT monitoring. Think in terms of watching pastebin and Twitter for interesting and relevant hits. For now, I’d consider this highly experimental in most organizations.
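As a starting point for the VPN item above, the whitelist logic fits in a few lines of Python. This is a minimal sketch under stated assumptions, not a product feature: the known-good network (203.0.113.0/24, a documentation range) and the 07:00 to 20:00 "normal" window are placeholders, and the (user, source address, timestamp) record shape is hypothetical. A login counts as anomalous unless it comes from a whitelisted address and also falls inside normal hours.

```python
import ipaddress
from datetime import datetime

# Placeholder whitelist of known-good source networks (replace with your own).
KNOWN_GOOD_NETS = [ipaddress.ip_network("203.0.113.0/24")]
# Placeholder "normal" login window: hours 7 through 19 local time.
NORMAL_HOURS = range(7, 20)


def is_anomalous(src_ip: str, when: datetime) -> bool:
    """True unless the login comes from a known-good network during normal hours."""
    addr = ipaddress.ip_address(src_ip)
    known_good = any(addr in net for net in KNOWN_GOOD_NETS)
    return not (known_good and when.hour in NORMAL_HOURS)


def anomalous_logins(logins):
    """Filter hypothetical (user, src_ip, timestamp) tuples down to dashboard-worthy ones."""
    return [rec for rec in logins if is_anomalous(rec[1], rec[2])]
```

Replacing the fixed window and address list with per-user values learned from past logins would be a natural next step toward the machine-learning approach mentioned above.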

Secondary

Secondary dashboards should be available for rapid review so that they can present useful information to an analyst on demand. These include summaries and visualizations that an analyst will want close at hand but does not need visible at all times. As an example, my secondary dashboards include:

  • Session lists to show logged-in users, especially on VPN. Windows domain logins can also be useful, though take care with scaling issues.
  • Recent traffic, though think carefully about what to exclude here. I don’t find it useful to have a dashboard showing TCP 80 traffic to web servers, TCP 25 on SMTP relays, or UDP 53 to DNS servers. I keep those logs available, but not displayed like this. Your environment will dictate the sorts of things you want to display here, but you can start by looking at the logic for anomalous outbound connections and removing some of the filters.
  • IDS data also has value on a secondary dashboard, including uncorrelated or lower-priority alerts. You may identify attackers that use ineffective vectors before they find the ones that work.
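One way to start on the recent-traffic view is to keep an explicit table of expected service traffic and suppress only exact matches, so everything else stays visible. The sketch below assumes flow records shaped as (protocol, destination address, destination port) tuples; the server addresses are hypothetical placeholders for your own inventory. Adding or removing entries in the table is one way to derive this view from the anomalous-outbound logic by loosening its filters.

```python
# Hypothetical inventory of expected service traffic:
# (protocol, destination port) -> servers that legitimately receive it.
EXPECTED_SERVICE_TRAFFIC = {
    ("tcp", 80): {"10.0.0.10"},   # web servers
    ("tcp", 25): {"10.0.0.25"},   # SMTP relays
    ("udp", 53): {"10.0.0.53"},   # DNS servers
}


def recent_traffic_view(flows, expected=EXPECTED_SERVICE_TRAFFIC):
    """Yield (protocol, dst_ip, dst_port) flows that are not routine service traffic."""
    for proto, dst, port in flows:
        if dst in expected.get((proto, port), set()):
            continue  # routine traffic: keep it in the logs, not on the dashboard
        yield proto, dst, port
```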

Conclusion

This list should serve only as a starting point for thinking about your own near real-time displays and dashboards. Look for what makes sense for your organization and will enable you to detect incidents, rather than possible “security-relevant” data that seems easier to understand at first glance.

Analytical tool usage: getting faster

I don’t know how people get anything done in text editors that lack regex support. But then, I also don’t understand analysts doing full-text log searches instead of using fields (assuming your tool actually parses data into a structured format). And why analyze really large logs without parsing them first? Structured data makes analysis so much faster. Occasionally something happens that reminds me how much tools matter, in terms of both speed and scope. Today’s post will talk a little about getting faster by using the right tools in the right way, using a simple and straightforward log analysis example. Tomorrow’s post will talk a little more about getting better, using an equally simple and straightforward threat intelligence example.

Structured log queries

Let’s take an example here. Perhaps someone has given us a “known bad” IP address (we’ll use 192.168.13.37) and we want to check whether that address appears in our logs over, say, the previous 30 days. We work in an ArcSight shop, so we fire up the Logger interface. An uninformed analyst might simply paste the address into the query box. Such an analyst will almost certainly not get results as quickly as he’d like, and in fact might not get them. This happens because he’s doing a full text search over every row in the database, restricting only by date. (If he hasn’t even done that, then he really needs some coaching.) So the database literally has to search every field, at any start position, trying to match that literal pattern.

But then a hero appears out of the darkness, wielding knowledge for great justice! He looks at the indicator, decides that he will err on the side of caution and look for inbound or outbound traffic related to that address, and quickly writes:

sourceAddress="192.168.13.37" OR destinationAddress="192.168.13.37"

In some circumstances, our hero may encapsulate those two conditions in parentheses and restrict the search to firewall records:

categoryDeviceGroup="/Firewall" AND (sourceAddress="192.168.13.37" OR destinationAddress="192.168.13.37")

Whether that restriction makes sense depends on the environment, because it would eliminate (say) any web server logs that show activity from that address. I rarely restrict these types of searches like that, but I’ve run into situations where it made sense to do so as part of a larger set of searches.
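To make the difference between the two approaches concrete, here is a toy Python model of both. It assumes events have already been parsed into dictionaries keyed by the ArcSight field names used above; the dictionary representation is an illustration only, not how Logger stores or queries data. The field-based search compares the indicator against two values per event, while the full-text search scans every character of every line and can even return false hits: a substring match for 10.5.13.37 also matches 110.5.13.37.

```python
def matches_ioc(event, ioc, category=None):
    """Field-based match: compare the indicator against two fields only."""
    if category is not None and event.get("categoryDeviceGroup") != category:
        return False
    return ioc in (event.get("sourceAddress"), event.get("destinationAddress"))


def field_search(events, ioc, category=None):
    """Rough analogue of the structured Logger query, optionally restricted by category."""
    return [e for e in events if matches_ioc(e, ioc, category)]


def full_text_search(raw_lines, needle):
    """Unfielded search: a substring test across every character of every line."""
    return [line for line in raw_lines if needle in line]
```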

By the way, assuming we have a fairly large volume of firewall logs, we definitely do not want to dump them into a directory and just grep for the address:

grep -r '192\.168\.13\.37' .

As much as I love grep (and its friends sed, awk, et al.), this method will simply not perform as well in this sort of situation, for precisely the reasons stated above. However, with small enough data volume, you can get away with this.

Regex for indicator sets

Now our intrepid analyst team receives a new set of intel, only this time we have multiple addresses, as in:


10.31.80.08
10.5.13.37
...
10.0.3.14

Should we run several searches, one after the other? Of course not, because then we’d miss the opportunity to do it all at once. So we load the list into our trusty text editor. If you think this means “Notepad”, or even “WordPad”, then you need to call upon our hero from the previous section. We should instead regard any “text editor” that does not support regular expressions as a toy unworthy of the name.

Personally, I use gvim (a vi derivative with a GUI, available for most platforms). But I understand that others prefer other power tools like Notepad++ or TextMate. While I look at such tools with suspicion due to their newfangledness, we live in a multilateral society and thus all may join us in free association.

In gvim, I would issue a couple of commands:

:%s/^/sourceAddress="/
:%s/$/" OR /

This leaves us with the following:

sourceAddress="10.31.80.08" OR
sourceAddress="10.5.13.37" OR
sourceAddress="..." OR
sourceAddress="10.0.3.14" OR

Now we look at the number of lines in the buffer. Assuming we have, say, 13, we can join them all from the top line using the simple command “13J” (or :%j, which joins every line in the buffer without counting). Strip off the trailing “ OR ” and you can paste that line right into the Logger query box.
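For indicator lists too long to handle comfortably by hand, the same transformation can be scripted. This is a minimal Python sketch: the query syntax simply mirrors the Logger examples above and has not been checked against Logger's documentation, and no address validation is performed. By default it also searches destinationAddress, matching our hero's earlier inbound-or-outbound query, and joining with " OR " means there is no trailing operator to strip.

```python
def build_query(indicators, fields=("sourceAddress", "destinationAddress")):
    """Turn one indicator per line into a single OR'd query string.

    Blank lines are skipped; addresses are passed through unvalidated.
    """
    terms = []
    for line in indicators.splitlines():
        ioc = line.strip()
        if not ioc:
            continue
        for field in fields:
            terms.append(f'{field}="{ioc}"')
    return " OR ".join(terms)
```

Passing fields=("sourceAddress",) reproduces the gvim result exactly.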

We’ve seen that knowing how to make full use of our tools can greatly improve our speed at relatively simple tasks. Tomorrow, I’ll have an example of using Maltego to validate and extend threat intelligence, improving our scope. In the meantime, if you see ways to improve any of the above, or have other thoughts on this sort of thing, please comment below or ping me on Twitter.

Scope expansion for data science

I’ve discussed my interest in data science and big data quite a bit on Twitter. This partly has to do with my contention that good SIEM and log analysis work should overlap significantly with data science, among other fields. It also has to do with my ongoing search for fulfillment in finding ways to work on stuff that matters (i.e. not pure infosec).

So today I asked the question straight out on Twitter.

I got a bit of feedback from some of my usual Twitter crowd, encouraging me to simply grow the scope of this site. I have two concerns. First, will the (relatively small) existing reader base get frustrated with posts that have, at best, a tangential relationship to security? Second, will any new readers pigeonhole the blog – or me – as an information security blog, passing over the data content?

The topics I intend to start covering, whether here or elsewhere, include technical discussion of data analysis, walkthroughs of techniques as I explore them myself, and applications in other fields. As an example, right now I have some processes running to analyze refugee trends based on data provided by the United Nations High Commissioner for Refugees.

Any thoughts, suggestions, or other pointers?

Adapting intelligence analysis for DFIR

We can define an analyst as a function taking data and caffeine as inputs that outputs (hopefully useful) knowledge:

$$\text{analyst}(\text{data}, \text{caffeine}) \to \text{knowledge}$$

But analysts need more than just good data and properly brewed coffee (or tea, if that’s your thing). We need well-written “internal code”: our thought processes, if you will. As I’ve previously mentioned, too much material focuses on the data and not enough on the processing. If you look for information on log management, you can find endless advice on how to collect your logs, and how to store them. If you look for information on SIEM systems, you can find lots of vendor “marketecture”, compliance guidance, and so forth – but not enough guidance on what to do with the information you find there.

To find what we really need, two things have to happen. First, we need to look outside the IT security echo chamber. Simply repeating the same endless mantras won’t advance the state of the art at all, but looking at other fields with related problems and finding ways to cross-pollinate certainly can bear fruit. In my view, the intelligence community has spent decades working through similar issues. One really useful reference I’ve found lately is Psychology of Intelligence Analysis (which largely discusses “Tools for Thinking” and “Cognitive Biases”). But another document, Basic Counterintelligence Analysis in a Nutshell, has much better applicability to DFIR. Some of its ideas apply directly, like the section on “Analytic Traps and Mindsets”; others have simply gone out of date; and still others have useful analogues. For example, map analysis usually doesn’t reveal very much when applied in a geographic context (since network links and physical proximity don’t correlate very well), but when you overlay your data on a network map, it certainly can.

So in February, I intend to take the “Basic Counterintelligence Analysis in a Nutshell” document and adapt the ideas in it to network security investigations in particular. But to do this justice takes more than a simple post, so instead of posting that here as originally intended, I’ll spend some time on it and get feedback when it’s ready. This post mostly serves the purpose of getting it out there so that my colleagues, friends, and readers can hold me accountable next month.

Hunting trips: network traffic log analysis

Log analysis has always struck me as one of those things that gets too much superficial attention without enough attention to detail. That is, we know that we need to do it, but we don’t talk about how we need to do it. At best, we talk about making sure we collect and archive logs. Analysis plays second fiddle, even though in reality logs without analysis provide almost no value to an organization. And you’ll find the greatest value in discovering the earliest stages of an incident rather than in using logs in hindsight to understand what went wrong. Unfortunately, less than 1% of data breach investigations in the 2011 Verizon DBIR started with log analysis and review!

The analysis ideas I present below don’t even begin to represent a comprehensive view. And of course every network is different, so you will need to think about your specific needs. But this may get you thinking in directions you hadn’t previously considered. Side benefits include analysts becoming more proficient with their tools, finding the limits and gaps in their toolset, creating baselines of their environment, and even mentoring via shared hunting trips. These could serve as foundations for SIEM use cases, but here we’re talking about active exploratory usage by an analyst.

Hunting trips in DFIR involve actively looking for possible anomalies or indications of compromise on your network. Even if you don’t find anomalies, you’ll get a better understanding of your baselines. In this post, I’ll talk about hunting through your network traffic logs. Richard Bejtlich talks about hunting through systems as well, but I’ll save that particular discussion for another day. Further, if you do this by having a junior analyst “tag along” with a more experienced analyst (e.g. via screen sharing and chatting), you get the regular benefits of good analysis plus team-building and training.

Egress traffic

First, and most importantly, always keep in mind that we’re only identifying anomalies, not automatically classifying “bad” traffic. Nothing here can positively and without question find evil with no false positives or false negatives. It should, however, increase your efficiency in finding things that violate your policies or possibly indicate a compromise.

Compromised systems may start sending out traffic that doesn’t look like the rest of your traffic. Perhaps an attacker is trying to exfiltrate data, or a bot may simply try to contact its C&C infrastructure. So look carefully at outbound traffic logs from your perimeter firewalls. Good protocol candidates include SSH, SMTP, and IRC (yes, even now in 2011). In fact, examine all non-HTTP traffic from user subnets with suspicion.
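A first cut at "all non-HTTP traffic from user subnets" can be expressed as a simple filter over firewall flow records. This Python sketch assumes records shaped as (source address, destination address, destination port); the user subnet is a placeholder, "internal" is approximated as RFC 1918 space, and "non-HTTP" is approximated by port, which the protocol-port caveat in the next paragraph shows is imperfect.

```python
import ipaddress

# Assumed inventory: user (desktop) subnets and ports treated as ordinary web traffic.
USER_SUBNETS = [ipaddress.ip_network("10.10.0.0/16")]
WEB_PORTS = {80, 443}
# RFC 1918 space, used here as a simple definition of "internal".
INTERNAL_NETS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def in_any(addr, nets):
    return any(addr in net for net in nets)


def suspicious_egress(flows):
    """Return outbound (src_ip, dst_ip, dst_port) flows from user subnets on non-web ports."""
    hits = []
    for src, dst, port in flows:
        s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
        if in_any(s, USER_SUBNETS) and not in_any(d, INTERNAL_NETS) and port not in WEB_PORTS:
            hits.append((src, dst, port))
    return hits
```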

Also look for protocol-port mismatches. Do you have HTTP traffic on high ports, or maybe even something like SSH on TCP 80? Attackers often like to overload TCP 80 to slip through loosely secured perimeter networks.
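Spotting protocol-port mismatches requires logs that label traffic by detected application protocol rather than by port number (for example, from a protocol-aware firewall or network sensor). Assuming such a label exists, a minimal sketch compares it against a table of expected ports. The protocol names and port sets below are illustrative assumptions; a protocol missing from the table is only flagged when it shows up on a port the table assigns to something else.

```python
# Illustrative expectations: application protocol -> ports where it normally appears.
EXPECTED_PORTS = {
    "http": {80, 8080},
    "ssl": {443},
    "ssh": {22},
    "smtp": {25, 587},
    "dns": {53},
}
# Reverse map: well-known port -> protocol we expect to find there.
PORT_OWNER = {port: proto for proto, ports in EXPECTED_PORTS.items() for port in ports}


def mismatches(records):
    """Flag (app_protocol, dst_port) pairs that break the expectations above."""
    flagged = []
    for proto, port in records:
        expected = EXPECTED_PORTS.get(proto)
        if expected is not None and port not in expected:
            flagged.append((proto, port))  # known protocol on an unusual port
        elif port in PORT_OWNER and PORT_OWNER[port] != proto:
            flagged.append((proto, port))  # well-known port carrying something else
    return flagged
```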

Web traffic has some unique problems. Not only does it involve a constantly changing set of endpoints, but protocol evolution also means that HTTP isn’t really the top-level protocol in the stack anymore. Development has rapidly left behind simple GETs and PUTs, and things like WebSockets overload ports beyond what you may realize. Still, try to analyze this traffic, because so much malicious activity uses this channel.

For outbound surfing, look at your User-Agent strings: lots of spyware browser extensions will show up here. Some malware tries (poorly) to look like regular browsers, and you can sometimes find it through misspellings or anomalies like default languages. A good proxy may do this for you, but mining the data yourself can find new threats. Look at the domains that users hit as well. Check URLs against external APIs, but treat the results with caution. If you get the chance and it fits your network or organization, look at destination geolocation. You may identify suspicious traffic by its destination country – if you sell widgets to farmers in Iowa, then outbound traffic to Eastern Europe or the Asia-Pac region is worth a second look. For both of these areas, applying the principle of Least Frequency of Occurrence can greatly reduce the dataset you actually need to review.
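Least Frequency of Occurrence is easy to apply to either User-Agent strings or destination domains: count each distinct value and review the rarest first. A minimal Python sketch, assuming you have already extracted the values from proxy or firewall logs into a list:

```python
from collections import Counter


def least_frequent(values, n=10):
    """Least Frequency of Occurrence: the n rarest distinct values, with counts.

    Ties are broken alphabetically so the output is stable between runs.
    """
    counts = Counter(values)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:n]
```

The rare entries are candidates for review, not verdicts; a one-off User-Agent may just be an unusual but legitimate application.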

Ingress traffic

Inbound traffic to your web servers should get a close look too, using analysis methods similar to those we discussed for outbound web traffic. In addition, take a close look at your URI query strings to find people attempting SQL injection or other forms of attack (hint: look for really long payloads). You may wish to review user agents here as well, though your mileage may vary if you run a popular web site or one with lots of global exposure. This review can be particularly effective for traffic to API servers.
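The "really long payloads" hint translates directly into a length check on the query portion of each request URI. This Python sketch uses the standard library's urlsplit to isolate the query string; the 200-character threshold is an arbitrary assumption that you would tune against your own baseline, since some applications legitimately send long query strings.

```python
from urllib.parse import urlsplit


def long_queries(uris, threshold=200):
    """Return (query_length, uri) for URIs whose query string exceeds threshold, longest first."""
    hits = []
    for uri in uris:
        query = urlsplit(uri).query
        if len(query) > threshold:
            hits.append((len(query), uri))
    return sorted(hits, reverse=True)
```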

Consider looking at source geolocation as well, though as before, don’t fall into traps. In some organizations, working with your marketing or web analytics team can help you understand things and clarify your assumptions here.

The effectiveness of this part of the review may vary according to your threat model and overall security posture. For example, if you don’t have a good application security program, or if you have few users on your network, this area will matter more than egress traffic. Conversely, if you have very few exposed services, this may not deserve as much effort.

Baselines

Create some network flow baselines. You can’t know what’s anomalous until you know what’s normal. A word of caution here, though: don’t assume your baselines are already secure. You might have an existing but previously-unknown compromise. So spend time with your system administrators to identify traffic flows that don’t have an immediately obvious purpose.

What does traffic in and out of your desktop networks look like? These will necessarily differ significantly from your server networks, which need the same sort of attention. What systems usually talk to each other? Do they contact a particular set of authorized external hosts (e.g. for updates and such), especially with a defined frequency? What’s the traffic distribution across various ports? Does this vary with time of day, or day of the week?
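One way to start answering the port-distribution and time-of-day questions is to bucket flows by (weekday, hour, destination port) and compare a recent week against a multi-week baseline. This Python sketch assumes flow records reduced to (timestamp, destination port) pairs; the 3x factor and the floor of one flow per bucket are arbitrary assumptions, and as cautioned above, the baseline itself may already contain a compromise.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def port_distribution(flows: Iterable[Tuple[datetime, int]]) -> Counter:
    """Count flows per (weekday, hour, dst_port)."""
    return Counter((ts.weekday(), ts.hour, port) for ts, port in flows)


def unusual_buckets(baseline: Counter, baseline_weeks: int, current: Counter, factor: float = 3.0):
    """Compare one week of counts against a multi-week baseline.

    A bucket is flagged when its count exceeds `factor` times the baseline's
    weekly average (treated as at least 1, so brand-new buckets need some volume).
    """
    flagged = {}
    for bucket, count in current.items():
        expected = baseline.get(bucket, 0) / baseline_weeks
        if count > factor * max(expected, 1.0):
            flagged[bucket] = (count, expected)
    return flagged
```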

You’ll start to build a framework of known good traffic to exclude from future analyses. As the US military teaches, the more you sweat in preparation, the less you may bleed in battle.

Conclusion

Log management matters, but log analysis matters more. Even if you have a relatively limited dataset available, start with what you have. Like tugging on the proverbial sweater thread, you will find that a little effort at the beginning can quickly unravel more than you initially might have guessed.

In the future, I’ll talk about hunting trips through your systems and other types of security data. But at any time, I welcome your thoughts and suggestions!

Hadoop and PCAP analysis

“Large-scale PCAP Data Analysis Using Apache Hadoop” looks fascinating:

Traffic to the DNS root servers has increased and K-root produces terabytes of raw packet capture (PCAP) files every month. We were looking for a scalable and fast approach to analyse this data. In this article I will explain how we use Apache Hadoop and why we open-sourced our PCAP implementation for it.

Nice technique, but I’d like to understand a little better what sort of analysis they performed once they had the platform up and running.