I don’t know how people get anything done in text editors that lack regex support. But then, I also don’t understand analysts who run full-text log searches instead of querying fields (assuming your tool actually parses data into a structured format). And why analyze really large logs without parsing them first? Structured data makes the work so much faster. Every so often, something happens that reminds me how much tools matter, both in terms of speed and scope. Today’s post will talk a little about getting faster by using the right tools in the right way, using a simple and straightforward log analysis example. Tomorrow’s post will talk a little more about getting better, using an equally simple and straightforward threat intelligence example.
Structured log queries
Let’s take an example here. Perhaps someone has given us a “known bad” IP address (we’ll use 192.168.13.37) and we want to check whether that address appears in our logs over, say, the previous 30 days. We work in an ArcSight shop, so we fire up the Logger interface. An uninformed analyst might simply paste the address into the query box. Such an analyst will almost certainly not get results as quickly as he’d like, and in fact might not get them at all. This happens because he’s doing a full-text search over every row in the database, restricting only by date. (If he hasn’t even done that, then he really needs some coaching.) So the database has to search every field, at any start position, trying to match that literal pattern.
But then a hero appears out of the darkness, wielding knowledge for great justice! He looks at the indicator, decides that he will err on the side of caution and look for inbound or outbound traffic related to that address, and quickly writes:
sourceAddress="192.168.13.37" OR destinationAddress="192.168.13.37"
In some circumstances, our hero may encapsulate those two conditions in parentheses and restrict the search to firewall records:
categoryDeviceGroup="/Firewall" AND (sourceAddress="192.168.13.37" OR destinationAddress="192.168.13.37")
This depends on the environment in question, because this would eliminate (say) any web server logs that show activity from that address. I rarely restrict these types of searches like that, but I’ve run into situations where it made sense to do so as part of a larger set of searches.
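To make the difference concrete, here is a toy sketch in Python. It does not describe how Logger works internally; it only contrasts scanning every field of every record with looking a value up in a per-field index built ahead of time. The field names come from the queries above, while the sample events and the function names are made up for illustration.

```python
from collections import defaultdict

# Toy records standing in for parsed log events. Field names follow the post;
# the values are invented for illustration.
EVENTS = [
    {"sourceAddress": "192.168.13.37", "destinationAddress": "10.0.0.5",
     "message": "accept tcp/443"},
    {"sourceAddress": "10.0.0.7", "destinationAddress": "192.168.13.37",
     "message": "deny udp/53"},
    {"sourceAddress": "10.0.0.9", "destinationAddress": "10.0.0.5",
     "message": "note: saw 192.168.13.37 in payload"},
]


def full_text_search(events, needle):
    """Scan every field of every event for the substring (the slow path)."""
    return [i for i, ev in enumerate(events)
            if any(needle in str(value) for value in ev.values())]


def build_index(events, fields):
    """Map (field, value) to event positions; built once, at ingest time."""
    index = defaultdict(list)
    for i, ev in enumerate(events):
        for field in fields:
            index[(field, ev[field])].append(i)
    return index


def field_search(index, fields, value):
    """Look the value up per field instead of scanning every row."""
    hits = set()
    for field in fields:
        hits.update(index.get((field, value), []))
    return sorted(hits)


ADDRESS_FIELDS = ("sourceAddress", "destinationAddress")
INDEX = build_index(EVENTS, ADDRESS_FIELDS)
```

Note that the two approaches answer slightly different questions: the full-text scan also returns the third event, where the address only appears inside a message, while the fielded lookup returns just the two events with the address as source or destination. That trade-off is the same one described above for restricting a search to firewall records.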
By the way, assuming we have fairly large firewall records, we definitely do not want to dump them into a directory and just grep for the address:
grep -r '192\.168\.13\.37' .
As much as I love grep (and its friends sed, awk, et al.), this method will simply not perform as well in this sort of situation, for precisely the reasons stated above. However, with small enough data volume, you can get away with this.
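If you do fall back to pattern matching on raw text, watch the dots. In a regular expression an unescaped dot matches any single character, so an address pasted in as-is can match strings that are not that address at all. The sketch below shows the effect with Python’s re module; the decoy string is invented for illustration.

```python
import re

ADDRESS = "192.168.13.37"

# Unescaped, each "." is a wildcard that matches any single character.
loose = re.compile(ADDRESS)
# re.escape turns the dots into literal dots.
strict = re.compile(re.escape(ADDRESS))

# A made-up log fragment that contains no IP address at all.
decoy = "bytes=1920168013037"

loose_hit = loose.search(decoy) is not None    # the wildcard dots match the zeros
strict_hit = strict.search(decoy) is not None  # literal dots find nothing here
real_hit = strict.search("src=192.168.13.37 dst=10.0.0.5") is not None
```

The same idea applies on the command line: quote the pattern so the shell leaves the backslashes alone, or use grep’s -F option to treat the pattern as a fixed string.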
Regex for indicator sets
Now our intrepid analyst team receives a new set of intel, only this time we have multiple addresses, as in:
10.31.80.08
10.5.13.37
...
10.0.3.14
Should we run several searches, one after the other? Of course not, because then we’d miss the opportunity to do it all at once. So we load the list into our trusty text editor. If you think this means “Notepad”, or even “Wordpad”, then you need to call upon our hero from the previous section. We should instead regard any “text editor” that does not support regular expressions as a toy unworthy of the name.
Personally, I use gvim (a vi derivative with a GUI, available for most platforms). But I understand that others prefer other power tools like Notepad++ or TextMate. While I look at such tools with suspicion due to their newfangledness, we live in a multilateral society and thus all may join us in free association.
In gvim, I would issue a couple of commands:
:%s/^/sourceAddress="/
:%s/$/" OR /
This leaves us with the following:
sourceAddress="10.31.80.08" OR
sourceAddress="10.5.13.37" OR
sourceAddress="..." OR
sourceAddress="10.0.3.14" OR
Now we look at the number of lines in the buffer. Assuming we have, say, 13, we can join them all from the top line using the simple command “13J”. Strip off the trailing “ OR ” and you can paste that line right into the Logger query box.
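For anyone who would rather script this than do it in an editor, the same transformation can be sketched in a few lines of Python. This is one possible approach, not a replacement for knowing your editor; build_query is a hypothetical helper name, and the field names are the ones used in the queries above.

```python
def build_query(addresses, fields=("sourceAddress",)):
    """Join one field="address" clause per address and field with OR."""
    clauses = []
    for line in addresses:
        address = line.strip()
        if not address:
            continue  # skip blank lines in the pasted list
        for field in fields:
            clauses.append(f'{field}="{address}"')
    # str.join puts " OR " only between clauses, so nothing needs stripping.
    return " OR ".join(clauses)


indicators = ["10.31.80.08", "10.5.13.37", "10.0.3.14"]
print(build_query(indicators))
```

Passing fields=("sourceAddress", "destinationAddress") produces clauses for both directions, which matches the more cautious inbound-or-outbound search from the first section.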
We’ve seen that knowing how to make full use of our tools can greatly improve our speed at relatively simple tasks. Tomorrow, I’ll have an example of using Maltego to validate and extend threat intelligence, improving our scope. In the meantime, if you see ways to improve any of the above, or have other thoughts on this sort of thing, please comment below or ping me on Twitter.






