Tag Archives: Collective Intelligence Framework

Far pointers: threat intel concepts and CIF-Maltego edition

Not Grover, although Andy Grove ran Intel whose segmented architecture made them necessary… wow, was Jim Henson trying to tell us something?

I wrote a post on the Verizon Business Security Blog titled Concepts in Sharing Threat Intelligence. You should read it; I hope you like it. Comments over there, please! It makes my bosses happy when you read and comment on my stuff there. And when they’re happy, I’m happy. And when I’m happy, everybody[1] is happy.

Maltego and CIF

So as part of my recent work on all things CIF, I wrote a Maltego transform with a little help from the fantastic Andrew MacPherson. If you already know how to use both, you’ll have no trouble with this.

In Maltego, in the menu bar near the top, select Manage > Local Transforms to add a new local transform. You can call the transform whatever you like, such as something imaginative like “CIF lookup”, but be sure to specify the “Input entity type” as an IPv4 address. I don’t believe the transform set really matters, but I put it under “IP owner detail” because that made the most sense to me. Then point Maltego at the script and it should work. You’ll need to have the CIF client in /usr/local/bin or otherwise change the Popen() call in the script.
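For readers who haven’t seen one, a local transform is just a program that Maltego runs with the input entity’s value as its first argument and that prints an XML message on standard output. The following is a minimal sketch of that shape, not the actual script from this post: the assumption that the cif client prints one result per line, the choice of maltego.Phrase as the output entity type, and the helper names are all illustrative.

```python
#!/usr/bin/env python3
"""Sketch of a Maltego local transform that shells out to the CIF client."""
import sys
from subprocess import PIPE, Popen
from xml.sax.saxutils import escape

CIF_CLIENT = "/usr/local/bin/cif"  # path from the post; change it if yours differs


def run_cif(ip_address):
    """Run `cif -q <ip>` and return its non-empty output lines (format assumed)."""
    proc = Popen([CIF_CLIENT, "-q", ip_address], stdout=PIPE, universal_newlines=True)
    output, _ = proc.communicate()
    return [line.strip() for line in output.splitlines() if line.strip()]


def to_maltego_xml(values, entity_type="maltego.Phrase"):
    """Wrap each value in the XML envelope Maltego reads from a local transform."""
    entities = "".join(
        '<Entity Type="%s"><Value>%s</Value></Entity>' % (entity_type, escape(value))
        for value in values
    )
    return (
        "<MaltegoMessage><MaltegoTransformResponseMessage>"
        "<Entities>%s</Entities><UIMessages/>"
        "</MaltegoTransformResponseMessage></MaltegoMessage>" % entities
    )


# Maltego passes the input entity's value (here, the IPv4 address) as argv[1].
if __name__ == "__main__" and len(sys.argv) == 2:
    print(to_maltego_xml(run_cif(sys.argv[1])))
```

Escaping the values matters because anything the client prints ends up inside XML; an unescaped ampersand in a URL is enough to make Maltego reject the whole response.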

I have plans for more Maltego transforms (e.g. VirusTotal), but if you run into any issues with this one, or want something changed, please let me know. This will work just fine with Maltego Community Edition, by the way, but I highly recommend buying a Maltego commercial license if you’re doing anything serious with it. The folks there are incredibly responsive and helpful and they deserve something for all their hard work if you’re using it.

[1]: For small values of “everybody”.

Developing with open-source methodologies

Over the last several months, I’ve become more heavily involved in the Collective Intelligence Framework. Wes Young and the other authors have released different parts of the code under the GNU Lesser General Public License (LGPL), a weak-copyleft license that is more permissive than the GPL though not as permissive as BSD, and under the actual BSD license.

But developing free (open source) software means much more than just “having the code”. It usually implies a particular culture and way of developing software. Unlike years ago when I wrote a bunch of proprietary stuff, open source development happens in a far more collaborative fashion. This project has reminded me of the greater efficiency of a do-ocracy, where things get done by the people who do them. (Somebody smarter than me has probably put that in koan form.)

I read a parable years ago illustrating the difference between an effective and an ineffective open source programmer[1]. Imagine two programmers, Alex and Jamie, each deciding to get involved with a particular project. Alex got excellent grades as a programmer in school and has a lot of self-confidence. Jamie can get through the code but has to work a lot harder to do it. Alex can work on a feature, working in isolation for hours or days, and send some particularly clever code off to the maintainer, only to have it rejected because it doesn’t match the existing coding style and because other programmers have concerns about possible side effects. Meanwhile, Jamie has a clue where to start and immediately posts some terrible prototype code to the mailing list. Conversations with other developers in IRC clarify things a little further, and after a few rapid iterations, the team accepts the patch for the new feature because they’ve already seen it developed and shaped it early in the process.

In my case, most of CIF exists as Perl code, but I haven’t written Perl “in anger” for quite a few years. So while I muddled my way through the existing code base at first, it took time for me to disentangle the various bits in my head. Of course, the code exhibited a lot of typical characteristics of prototype projects that grew up faster than anyone expected, and those slowed me down a little more. But I relentlessly asked questions on the mailing lists, poked people in IRC, and did my best to ask smart questions.

The process has helped me do more in the rest of my work (since we use CIF) as well as develop and work more effectively with my distributed team. None of us share an office or much geographical proximity. Instead, we collaborate on Skype, IM, and email, with infrequent travel to see each other in meatspace. I’ve even gotten involved with a few other projects, though on a far more limited basis, contributing code or documentation.

Maybe this process doesn’t work for all software projects; I wouldn’t develop code to run a nuclear reactor this way, certainly. But for most of the behind-the-scenes software that drives our culture today, this works out beautifully and allows me to get far more done, far more effectively, and contribute back to the information ecosystem.

[1]: After hours of searching, I can’t find the source. If anyone ever points it out to me, I will happily update this post to give credit.

Introduction to the Collective Intelligence Framework

CIRTs and related organizations often handle incident detection as well as response. Both of these roles produce and consume threat intelligence in different ways. For example, we often want to correlate our network traffic with OSINT indicators (known bad IP addresses and URLs, MD5 hashes of suspicious files, etc.). I’ve started looking at the Collective Intelligence Framework as a way to fulfill these needs. CIF development is sponsored by the REN-ISAC and National Science Foundation, with most of the coding (and everything else!) handled by Wes Young. Everything is open source for those of us who like – or need – to hack directly on the code.

In this article, I’ll explain CIF, give some usage examples, and discuss test deployment scenarios.

Understanding CIF

From the perspective of a user, CIF allows you to run queries against many data sources at once. If you have other private data sources available, particularly via XML (RSS), JSON, or in a file (e.g. CSV), you can incorporate those, as well as additional OSINT sources. CIF comes preconfigured for:

Use cases include manually querying the database for specific indicators (e.g. “do we have any records for this IP address?”) as well as pulling feeds of various sorts for use by security systems (e.g. “what URLs should we block at the proxy?”). CIF includes concepts of severity and confidence as well as privilege. This allows you to provide feeds of high-confidence public data to some systems while still allowing investigators to query private, unconfirmed data.
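The idea of serving one data set to different audiences can be sketched in a few lines of Python. The record fields, the sample values, and the function name below are assumptions made up for illustration; they are not CIF’s schema or API.

```python
# Hypothetical records; the field names are illustrative, not CIF's schema.
RECORDS = [
    {"indicator": "192.0.2.10", "confidence": 85, "restriction": "public"},
    {"indicator": "198.51.100.7", "confidence": 40, "restriction": "private"},
    {"indicator": "203.0.113.5", "confidence": 90, "restriction": "private"},
]


def select_indicators(records, min_confidence, allowed_restrictions):
    """Keep indicators that meet the confidence floor and the caller's privilege."""
    return [
        record["indicator"]
        for record in records
        if record["confidence"] >= min_confidence
        and record["restriction"] in allowed_restrictions
    ]


# A proxy blocklist only gets high-confidence public data...
blocklist_feed = select_indicators(RECORDS, 75, {"public"})
# ...while an investigator can see everything, confirmed or not.
investigator_view = select_indicators(RECORDS, 0, {"public", "private"})

print(blocklist_feed)
print(investigator_view)
```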

Essentially, CIF ingests data – typically on an hourly or daily basis, depending on the source – indexes it on the fly for performance reasons, performs correlation analytics (e.g. so that a URL also turns into domain and IP address information), and then makes it available in feeds via various output plugins. These plugins include tables and HTML for viewing by a user, but also IPtables rules, Snort rules, JSON, and CSV for processing by other security systems.
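To make the correlation step concrete: given a URL indicator, the hostname can be pulled out as a second indicator, and if that hostname is already a literal IP address it counts as an address rather than a domain. This is only an illustration using the Python standard library. It is not CIF’s code (most of CIF is Perl), the function name is made up, and real correlation would also resolve domains through DNS, which this sketch skips.

```python
import ipaddress
from urllib.parse import urlsplit


def derive_indicators(url):
    """Return the extra indicator implied by a URL: its host as a domain or address."""
    host = urlsplit(url).hostname  # lower-cased host, or None if the URL has none
    if host is None:
        return {}
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return {"domain": host}  # not an IP literal, so treat it as a domain
    return {"address": host}


for indicator in ("http://Example.com/bad.exe", "http://192.0.2.10:8080/x"):
    print(indicator, "->", derive_indicators(indicator))
```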

Usage examples

Everything below comes from the Perl client. I haven’t yet dealt with the Python client, much less hacked on it, but that’s coming Soon™.

cif -q infrastructure/malware -c 50 -s medium

gives a fairly large list of IP addresses associated with malware. (I used medium severity and 50% confidence in these examples.)

Even if you don’t use a proxy server, you might find CIF useful for checking suspicious URLs:

cif -q url -c 50 -s medium -p snort

You now have a list of Snort rules to pull into your IDS.
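If you would rather drive the client from a script, a thin wrapper can assemble the same command lines. This sketch uses only the flags shown in the examples above (-q, -c, -s, -p); the function names and defaults are hypothetical, and it assumes the cif client is on your PATH.

```python
import subprocess


def build_cif_command(query, confidence=None, severity=None, plugin=None):
    """Assemble the argument list for the Perl `cif` client."""
    command = ["cif", "-q", query]
    if confidence is not None:
        command += ["-c", str(confidence)]
    if severity is not None:
        command += ["-s", severity]
    if plugin is not None:
        command += ["-p", plugin]
    return command


def run_cif(query, **options):
    """Run the client and return whatever it prints; no output format is assumed."""
    return subprocess.check_output(
        build_cif_command(query, **options), universal_newlines=True
    )


# Rebuild the Snort example from above without actually running the client.
print(" ".join(build_cif_command("url", confidence=50, severity="medium", plugin="snort")))
```

Passing the arguments as a list rather than a single shell string means an indicator with odd characters in it can’t be misread by the shell.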

Or if you have your own list of IP addresses to check, such as when an ongoing case has new indicators, you can put them in a file and query each of them:

for f in $(cat hostlist.txt); do cif -q "$f" >> specific-ip.txt; done

This yields another list. You might see a few lines in that example with a “private” restriction and an impact of “search”. This happens because, by default, CIF logs every query for a specific indicator. The number of searches for an indicator, such as by other investigators, may be significant in its own right, apart from any other data. However, if you don’t want CIF to log a query, just use the “-n” parameter.
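Because every lookup is logged by default, it can help to clean a case file like hostlist.txt before looping over it, so that blank lines, duplicates, and stray text don’t turn into pointless queries. A minimal Python sketch of that clean-up follows. The function names are invented for illustration, and “-n” is appended only when you ask for unlogged queries.

```python
import ipaddress


def clean_host_list(lines):
    """Return unique, valid IPv4 addresses in their original order."""
    seen = set()
    hosts = []
    for line in lines:
        candidate = line.strip()
        try:
            address = str(ipaddress.IPv4Address(candidate))
        except ValueError:
            continue  # blank line or not an IPv4 address; skip it
        if address not in seen:
            seen.add(address)
            hosts.append(address)
    return hosts


def commands_for(hosts, log_queries=True):
    """Build one `cif -q <ip>` argument list per host, adding -n if unlogged."""
    extra = [] if log_queries else ["-n"]
    return [["cif", "-q", host] + extra for host in hosts]


# In practice the lines would come from open("hostlist.txt").
sample = ["192.0.2.1\n", "\n", "not-an-ip\n", "192.0.2.1\n", "198.51.100.7\n"]
for command in commands_for(clean_host_list(sample), log_queries=False):
    print(" ".join(command))
```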

If you’d like to play with it some more, contact me for an API key and the address of my semi-public CIF server. Twitter or email both work fine.

Appendix: CIF on the Amazon cloud

Amazon Web Services provide a decent platform for testing CIF or running a public instance like mine. The following assumes some familiarity with Linux administration and at least a basic understanding of the Elastic Compute Cloud (EC2).

You can start with a small instance for the installation, but you’ll quickly want to move to a medium instance at least. I run a large instance using the Ubuntu Cloud Guest server image. In general, follow the server install instructions for CIF. You’ll also want to note the specifics for Ubuntu as they contain a few workarounds you will need. Allocate an Elastic IP and register it in DNS someplace, such as with Amazon Route 53. For the Security Group, only add HTTPS and SSH. You won’t need anything else, and I recommend leaving it at this minimal state for security purposes. You’ll also need an Elastic Block Store. While you can start with 10GB, expect that to grow a few GB per week, so you’ll need to resize from time to time or create a larger volume at the beginning. While not required for CIF installation, I can’t recommend enough that you use git to manage config files. Srsly.

When installing Postgres, note that “peer” may appear in the original pg_hba.conf instead of “ident sameuser”. Also, I did not use the values in the CIF documentation, as Postgres didn’t like them. I left everything at the defaults except:

work_mem = 512MB
checkpoint_segments = 32

When setting up BIND9, first check /etc/resolv.conf for the IP addresses you should use as forwarders.
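As a small illustration of that step, the nameserver lines in /etc/resolv.conf can be turned into a forwarders clause for BIND’s options block. This is a sketch rather than part of the CIF install instructions: the function names are hypothetical, the sample addresses are documentation placeholders, and you would still paste the result into your own BIND configuration by hand.

```python
def parse_nameservers(resolv_conf_text):
    """Extract the addresses from `nameserver` lines, ignoring comments."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def forwarders_clause(servers):
    """Format the addresses as a BIND `forwarders` clause."""
    body = "".join("    %s;\n" % server for server in servers)
    return "forwarders {\n%s};" % body


# In practice the text would come from open("/etc/resolv.conf").read().
example = (
    "# example file\n"
    "search example.internal\n"
    "nameserver 192.0.2.53\n"
    "nameserver 198.51.100.53\n"
)
print(forwarders_clause(parse_nameservers(example)))
```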