Tag Archives: OSINT

Brain dump of DFIR and network security research ideas

Maybe I could get more of these done with this.

I’ve seen several people talk about lacking ideas for research projects, often around DFIR or network security. Personally, I have the opposite problem: endless ideas for projects, often with the barest hint of a start, but not enough time to pursue them all. So I thought I’d publish a bit of a brain dump. I actually have made good progress on a few of these, and I have concrete plans around others (beyond just “wouldn’t it be cool if…”), but in any case I’d love to see other people pick them up and run with them.

If you do happen to get interested in any of the following, I wouldn’t mind a quick note to touch base to see about possibilities for collaboration or at least an acknowledgement in whatever you publish. Don’t interpret that as any sort of requirement, though; ideas have no value without execution, so all the hard work hasn’t even begun.

  • Malware
    • Classification across a large corpus
    • Automated IOC extraction and publication
  • Threat Actors
    • Profiling systems, particularly based on OSINT
    • Underanalyzed crime groups (e.g. drug cartels’ involvement in malware, spam, and fraud)
    • Hacktivism motivations and methods
  • Passwords
    • Cracking lab setups
    • Useful entropy calculations
  • Quantitative analysis of incidents
    • DDoS attacks (hard to get numbers on these)
    • Defacements and low-level leaks
  • Active Defense
    • Honeypots and honeyclients
    • Vocabulary or taxonomy on various methods
    • Callback Trojans in documents
    • C2 / RAT vulnerability research
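
As one illustration of the “automated IOC extraction” idea above, here is a minimal sketch in Python. It is only a starting point under loud assumptions: it handles just IPv4 addresses and MD5 hashes, the function names `refang` and `extract_iocs` are made up for this example, and a real extractor would need far more indicator types and context checks.

```python
import ipaddress
import re

# Deliberately loose candidate patterns; IPv4 matches are validated afterwards.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")


def refang(text):
    """Undo two common 'defanging' conventions: 1.2.3[.]4 and hxxp://."""
    return text.replace("[.]", ".").replace("hxxp", "http")


def extract_iocs(text):
    """Return sorted, de-duplicated IPv4 addresses and MD5 hashes found in text."""
    text = refang(text)
    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            ips.add(str(ipaddress.IPv4Address(candidate)))
        except ValueError:
            pass  # e.g. 999.1.1.1 matches the regex but is not a valid address
    hashes = {h.lower() for h in MD5_RE.findall(text)}
    return {"ipv4": sorted(ips), "md5": sorted(hashes)}
```

Validating with `ipaddress` after the regex keeps the pattern simple while still rejecting obvious junk. The publication half of the idea (feeds, formats, vetting) is left open here.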

Structured data frustrations

I do a lot of work with data from the web, frequently (though not exclusively) as part of my work on OSINT gathering. Many of these data come from unstructured sources, requiring screen scraping techniques and, sometimes, a bit of head-banging. Not the fun kind, either, but the kind that ends up with me needing a new keyboard.

If you’re publishing data for other people to use freely, please make their lives a little easier. I understand that supporting an API may be serious overkill for many situations. But if you’re going to publish, say, a blog with IP addresses and domain names used by bad guys, at least set it off in a table or with a specific CSS element or something that allows people to grab it in an automated fashion. After all, you’ve already started the process just by publishing your blog in an RSS feed.
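
To show how little it takes on the consuming side once a publisher marks up indicators consistently, here is a rough sketch using only Python’s standard library. The CSS class name `ioc` and the class `IndicatorParser` are hypothetical, chosen just for this example; any stable, documented marker would do.

```python
from html.parser import HTMLParser


class IndicatorParser(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class="ioc"):  # "ioc" is a made-up class name
        super().__init__()
        self.css_class = css_class
        self.indicators = []
        self._depth = 0    # greater than zero while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            # Nested tag inside a match. Simplification: assumes every nested
            # tag is closed, so void tags like <br> would confuse the count.
            self._depth += 1
        elif self.css_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                value = "".join(self._buffer).strip()
                if value:
                    self.indicators.append(value)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)
```

Feeding a page to `IndicatorParser().feed(html)` and reading `.indicators` afterwards yields the marked values in document order, with no guessing about which table cell or paragraph holds the data.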

Alternately, if you have an API, please make it actually useful. As an example, the Pastebin API offers far less for reading than for writing, which they support quite well because they value inbound data far more than outbound. I like the support for listing trending pastes; a nice follow-up might be an API for listing pastes that match a search (preferably with regex support, but that might be asking too much). If Pastebin provided API support for this, then they could throttle as needed (e.g. only allowing N searches in X time), while hopefully reducing the load from people trying to grab every single paste. Most of the stuff I run across that way turns out to violate their AUP and have no relevance to my work in any case.
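
The “N searches in X time” throttle could look something like the sliding-window sketch below. This is only an illustration of the idea, not anything Pastebin actually runs; the class name and the injectable `clock` parameter are my own choices, the latter so the behavior can be checked without sleeping.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any period_seconds window."""

    def __init__(self, max_calls, period_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period_seconds
        self.clock = clock
        self._calls = deque()  # timestamps of the calls still inside the window

    def allow(self):
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False
```

A service would presumably keep one limiter per API key (for instance in a dictionary keyed by client) and answer with an error when `allow()` returns `False`.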

I should have a lot more to say about Pastebin OSINT soon, for what it’s worth, hopefully before the end of December.

Far pointers: threat intel concepts and CIF-Maltego edition

Not Grover, although Andy Grove ran Intel whose segmented architecture made them necessary… wow, was Jim Henson trying to tell us something?

I wrote a post on the Verizon Business Security Blog titled Concepts in Sharing Threat Intelligence. You should read it; I hope you like it. Comments over there, please! It makes my bosses happy when you read and comment on my stuff there. And when they’re happy, I’m happy. And when I’m happy, everybody[1] is happy.

Maltego and CIF

So as part of my recent work on all things CIF, I wrote a Maltego transform with a little help from the fantastic Andrew MacPherson. Assuming you already know how to use both, then you’ll have no trouble with this.

In Maltego, in the menu bar near the top, select Manage > Local Transforms and add a new local transform. You can call it whatever you like, such as something imaginative like “CIF lookup”, but be sure to specify the “Input entity type” as an IPv4 address. The transform set doesn’t really matter, I don’t believe, but I put it under “IP owner detail” because that seemed to make the most sense to me. Then point Maltego at the script and it should work. You’ll need to have the CIF client in /usr/local/bin or otherwise change the Popen() call in the script.
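
For readers who have never seen a local transform, the general shape is roughly as follows. To be clear, this is not the script from this post, just a minimal sketch of the pattern: run a command with the entity value, then print Maltego’s XML response envelope on stdout. The `-q` flag for the CIF client, the choice to return each output line as a `maltego.Phrase` entity, and the function names are all assumptions to verify against your own setup.

```python
#!/usr/bin/env python
import sys
from subprocess import PIPE, Popen
from xml.sax.saxutils import escape

CIF_CLIENT = "/usr/local/bin/cif"  # location mentioned in the post; adjust as needed


def query_cif(ip):
    """Run the CIF client for one IP and return its non-empty output lines."""
    # ASSUMPTION: the client takes '-q <query>'; check your client's help text.
    proc = Popen([CIF_CLIENT, "-q", ip], stdout=PIPE, stderr=PIPE)
    out, _ = proc.communicate()
    text = out.decode("utf-8", "replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


def to_maltego_xml(values, entity_type="maltego.Phrase"):
    """Wrap each value in the XML envelope a local transform prints to stdout."""
    entities = "".join(
        '<Entity Type="%s"><Value>%s</Value></Entity>' % (entity_type, escape(v))
        for v in values
    )
    return (
        "<MaltegoMessage><MaltegoTransformResponseMessage>"
        "<Entities>%s</Entities><UIMessages/>"
        "</MaltegoTransformResponseMessage></MaltegoMessage>" % entities
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Maltego passes the entity value as the first command-line argument.
    print(to_maltego_xml(query_cif(sys.argv[1])))
```

Returning raw output lines as phrases is the crudest possible mapping; a more useful transform would parse the CIF results into proper entity types.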

I have plans for more Maltego transforms (e.g. VirusTotal), but if you run into any issues with this one, or want something changed, please let me know. This will work just fine with Maltego Community Edition, by the way, but I highly recommend buying a Maltego commercial license if you’re doing anything serious with it. The folks there are incredibly responsive and helpful and they deserve something for all their hard work if you’re using it.

[1]: For small values of “everybody”.

OSINT monitoring with scripts

"Moleskine Concept Diagram 1" by Josh DiMauro

My last post mentioned briefly the difference between “high level” and “low level” threat intelligence.

High level intelligence includes human-understandable information that we can’t immediately parse into specific data, like a warning that “hacktivists” have targeted an organization. In contrast, low level intelligence usually consists of atomic data (network addresses, malware indicators, payment card information, etc.).

However, we should see this as a spectrum rather than a dichotomy: continuous, not discrete. As an example, what about monitoring social media from within your SIEM? Many analysts have noted the value of Pastebin as an OSINT source. So Xavier Garcia wrote a post on monitoring Pastebin leaks. This served as a basis for Xavier Mertens to post on monitoring Pastebin.com within your SIEM. Maybe you can use this to look for compromised logins on your domain, then correlate against login attempts for those accounts?
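
The correlation idea in that last question might be sketched like this. It assumes you already have the paste text in hand, that usernames in your login events match the local part of the e-mail address, and that events arrive as dictionaries with a `user` key; none of that comes from the posts linked above, and the function names are invented for illustration.

```python
import re


def leaked_accounts(paste_text, domain):
    """Return lower-cased local parts of e-mail addresses at `domain` in the paste."""
    pattern = re.compile(
        r"([A-Za-z0-9._%+-]+)@" + re.escape(domain)
        # Reject longer hostnames such as example.com.evil.org or example.community.
        + r"(?![A-Za-z0-9-]|\.[A-Za-z0-9])",
        re.IGNORECASE,
    )
    return {m.group(1).lower() for m in pattern.finditer(paste_text)}


def correlate(leaked, login_events):
    """Return the login events whose user appears in the leaked set."""
    return [e for e in login_events if e["user"].lower() in leaked]
```

In a SIEM, the first function would feed a watch list and the second would be a correlation rule; the Python here just makes the logic concrete.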

This has grown, of course, and so now we have examples of monitoring RSS feeds and tracking tweets from within a SIEM environment. If you tie this to case management (which many of us do within the SIEM, e.g. using ArcSight), then you’ve got a head start on OSINT monitoring. I suspect you could combine this with Yahoo! Pipes to monitor all sorts of loosely-structured data, whether for correlation or integration into your workflow.
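
Outside of a SIEM, the RSS half of this can be approximated in a few lines. The sketch below assumes a plain RSS 2.0 feed already fetched as a string and does simple keyword matching; the function name is hypothetical. Note that `xml.etree.ElementTree` is not hardened against maliciously constructed XML, so take care with feeds you don’t trust.

```python
import xml.etree.ElementTree as ET


def matching_items(rss_xml, keywords):
    """Return (title, link) for each RSS 2.0 <item> that mentions any keyword."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for item in ET.fromstring(rss_xml).iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        description = item.findtext("description", default="")
        haystack = (title + " " + description).lower()
        if any(k in haystack for k in wanted):
            hits.append((title, link))
    return hits
```

Each hit could then become an event sent to the SIEM (via syslog, for example) or a new entry in case management.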

Kent doctrine for security intelligence analysis

I’ve said before that log management matters, but log analysis matters more. Extracting and communicating useful information (analysis) requires collecting and storing your security data as well as processing the data quickly. But having all the data available won’t matter to anybody except auditors if you don’t use it in ways that inform good decisions. Mike Rothman of Securosis expressed this exceptionally well in his preview of the upcoming RSA Conference:

You will see a bunch of vendors talking about their new alerting engines taking advantage of these cool new data management tactics, but at the end of the day, it’s not how something gets done – it’s still what gets done.
So a Hadoop-based backend is no more inherently helpful than that 10-year-old RDBMS-based SIEM you never got to work. You still have to know what to ask the data engine to get meaningful answers. Rather than being blinded by the shininess of the BigData backend focus on how to use the tool in practice. On how to set up the queries to alert on stuff that maybe you don’t know about.

To paraphrase Socrates, unexamined data are not worth collecting. So analysis methodology and critical thinking skills matter. Rothman is spot on with this: the value of big data tech comes when you need to grow past the capabilities that traditional SIEM and RDBMS provide. By way of analogy: if you don’t understand algebra, then don’t take a course in calculus until you have the basic prerequisites down. You’ll just frustrate yourself and waste your tuition dollars.

Sherman Kent

Provided by CIA

In this vein, then, I appreciated the pointer from the OSINT and analysis training firm Treadstone 71 to a CIA paper on the background and work of Sherman Kent, the “father of intelligence analysis”.

He promoted an analytic doctrine that boils down to nine key points, listed in the CIA paper above. That doctrine applies across domains, not just for the sorts of military and geopolitical analysis we expect from government intelligence agencies. I highly recommend that everyone read at least that section of the paper, but here are some applications for those of us involved in security intelligence analysis, especially in the private sector.

  1. Focus on Policymaker Concerns: What keeps your management up at night? Hopefully security isn’t the only thing, of course. So assuming that your CxOs understand the general threat landscape, analysts need to ensure that they track relevant areas that can lead to useful changes and decisions at strategic and tactical levels.
  2. Avoidance of a Personal Policy Agenda: Many analysts focus on threats that concern them for reasons outside of their organization. Maybe they disagree with the politics of the Occupy movement and overemphasize threats to entirely unrelated organizations, or worry about APT China because of Sinophobia rather than a reasoned assessment of the situation. Or maybe they want to drive decision makers to a particular tech solution. Even worse, they may use their analyses as weapons for corporate political plays. Doing that represents a disservice to the organization and an unprofessional approach.
  3. Intellectual Rigor: This area stands as-is: “Estimative judgments are based on evaluated and organized data, substantive expertise, and sound, open-minded postulation of assumptions. Uncertainties and gaps in information are made explicit and accounted for in making predictions.”
  4. Conscious Effort to Avoid Analytic Biases: None of us can completely avoid cognitive bias, but we can make sure we understand it and try to correct for it where possible. That principally means application of the scientific method. As previously noted, whether or not faith and dogma have a place in one’s personal life, they certainly do not in one’s professional analyses.
  5. Willingness to Consider Other Judgments: Fight for your ideas, but “playing devil’s advocate” should rest on a better intellectual basis than simply spreading FUD. Recognize that others may in fact know more than you do or have insights that can help you.
  6. Systematic Use of Outside Experts: In addition to seeking out and understanding the work of other analysts, don’t restrict yourself solely to your field or even industry. Work with a community and keep bringing in fresh concepts from other disciplines.
  7. Collective Responsibility for Judgment: Eventually, your team will produce a report. You may not have agreed with everything that went into it, but that’s the way the sausage gets made. Once that report goes to its audience, support it. Throwing the rest of your analysis team under the bus by telling the audience “I told them so” doesn’t actually make you look smarter. It makes you look unprofessional. That doesn’t mean that you should ignore all criticism; rather, it means that you should be willing to take lumps with the rest of the group. If someone asks you for your opinion, give it – but clarify that it doesn’t represent the considered opinion of the rest of the team.
  8. Effective Communication of Policy-Support Information and Judgments: Analysts need three core skills: domain expertise, critical thinking skills, and communication ability. This includes targeting your analysis to the level appropriate to your audience. You must be able to summarize your findings in understandable and accurate ways. And you must be able to handle points of uncertainty properly.
  9. Candid Admission of Mistakes: You won’t always be right. Admit it, and review past work to see what you can learn for improvement the next time. “Try again. Fail again. Fail better.”

Security intelligence analysts should learn from previous work, instead of simply trusting in their own domain expertise and innate intelligence. Dr. Kent led the way, and even we non-spooks can still learn from his work.

Data analysis will change the world

One of my favorite infosec thinkers, Andrew Hay, had a pair of recent posts that have given me lots to chew on.

First, he asked:

This provoked a wide-ranging conversation about what that means. We’ll find tremendous value in applying big data techniques to security data. (Actually, I think data analysis will change the world, but that’s a bit larger scope than this post can comfortably handle.) We can then start to bring in additional data feeds past what traditional SIEMs handle. Think along the lines of more OSINT, network flows, and possibly even business data. At that point, you can really start to grasp the qualitative and quantitative improvements to data protection.

The next day, he wrote an article in which he asked an oft-heard data analysis question: Where’s my ‘Minority Report’ dashboard? We have to unpack that a little, though, because the data analysis scenes involved a few different useful things.

First, and perhaps most memorably, Cruise’s character used a gesture-based interface to work with the data he had available. As Hay notes, this tech has started to push down into consumer electronics like game consoles, but not generally into business applications like SIEM. While this might seem natural, we will have to move beyond the standard desktop metaphor and start to think of data as objects. It certainly won’t happen completely intuitively, but the long existence of similar ideas in various cultures (think mudras and sign language) and scientific research into the connection between words and gestures seem to indicate that we still have a lot of potential here.

Second, note how many disparate data feeds he had available. Apart from the fictional visualizations from the “precogs” (for which we can use surveillance video as a stand-in), he had social profiles, financial records, and more. While the entities we need to visualize aren’t always so human, we can assume some of the analogues I mentioned above for deploying “big data” tech. Data mining and machine learning will help here, particularly in knowledge discovery to hypothesize and test for correlations among the various data.

Third, the system latency seemed absurdly low. Try running a DB query on unstructured, near-realtime data, and tell me if it happens that immediately. While we’ve seen significant leaps in these areas, we need lots more advancement. Much of the tech today has started to move back towards a batch processing model rather than direct interaction and exploration, for example. Don’t think of this as just an engineering problem, because latency greatly matters when talking about trying to analyze data at anything remotely resembling the speed of thought.

Finally, the analyst clearly had excellent spatial reasoning skills. As younger generations continue to move into adulthood, we’ll likely see more applications of spatial reasoning. This means more research into data dimensionality: human brains don’t really visualize high-dimensional spaces very well, so we need to improve our models and analysts. It might turn out, for example, that we need to conceive of data as a hypercube as we drill down into specific nodes. Analysts already need to understand the foundations of graph theory when working in a lot of knowledge domains.

The future of data analysis excites me, and I really geek out over the possibilities. This has fractal-type potential: no matter whether we’re looking at data science from the MBA-typical “thirty-thousand foot view” or ångström altitude, we can find ways to change the world. (And if you’re working on this stuff and want some cross-domain thinking, let’s talk.)

Turning the tables with OSINT

Baby platypuses with fedoras

You never know who's on the case

The SANS ISC originally started as a place to share threat intelligence and analysis, partly based on DShield data and partly based on near-real-time input from the wider network security community. These days, it principally acts as a network security blog with little connection to active threat intel, though it does highlight patch releases.

Earlier in the week, an ISC post on OSINT tactics grabbed my attention. While we await the imminent release of the new version of Maltego (and CaseFile), other tools can help as well. FOCA (Spanish-language site) handles metadata parsing from local documents and simplifies using Google for finding interesting documents on a site, aka “Google hacking”. Apparently, it can also try some direct connections (like HTTP brute-forcing and DNS enumeration).

On a related note, as seen in the comments on that post, Cryptome released a DHS document this year entitled “Publicly Available Social Media Monitoring and Situational Awareness Initiative Update”. This really just lists a lot of publicly available social media sites, tools, and aggregators. What you need might not exist, though, and it’s worth understanding APIs like the one Twitter provides. Unfortunately, the Google Social Graph API will go away this spring. I don’t know of any good replacements, but I’d love to find one.

While I have a few concerns related to civil liberties about DHS trolling through all of these, that doesn’t change the fact that your adversaries, regardless of affiliation or organization, will go about this. So while you should think about monitoring your own organization proactively, also consider the possibility and appropriateness of engaging in OSINT against them. Krypt3ia has explained this use of OSINT, though he’s not teaching you to find jihadists. That doesn’t mean, of course, that an intelligent, motivated analyst can’t research techniques and data on his own.

But this can include areas other than getting involved in geopolitical controversy. Perhaps you’re working on an investigation where you have at least some information on the attacker. In some cases, you may choose to take the additional step of gathering further data. (You also might want to consult with your legal counsel, depending on what you choose to do.) I have worked in the past on situations where we identified the attacker in great detail before notifying law enforcement. And because his OPSEC frankly sucked, we could do this through entirely legal and open methods. This distinguishes itself from “hacking back” by restraining itself to gathering information through public data sources only, rather than engaging in any vulnerability exploitation or accessing unauthorized systems and data.

Alternately, consider the uses for other types of investigations. Perhaps you’ve taken an interest in suspicions of local corruption, or you have some reason to believe that somebody on the Internet (e.g. a site, company, or charlatan^W evangelist / expert) has some dirty laundry that needs washing in the open air.

The world has changed, and we can either change with it or hide from it. If you’re already working in network security, only one choice makes any sense.