Maltrieve: retrieving malware for research

Threads

As I continued to hack on mwcrawler over the last month, I found that it didn’t really meet my needs for various reasons: slowness, difficulty of maintaining and adding sources, repeated grabbing of the same URL, and lack of response from the original author. So I’ve rewritten it and released Maltrieve, which (as the name indicates) retrieves malware directly from the sources listed at a number of sites. Improvements listed in the README include:

  • Proxy support
  • Multithreading for improved performance
  • Logging of source URLs
  • Multiple user agent support
  • Better error handling
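
A minimal sketch of how several of those pieces could fit together, assuming a thread pool, a rotating User-Agent header, and one log line per source URL. This is not Maltrieve's actual code: `retrieve_all`, the `fetch` callable, and the user agent strings are all hypothetical names chosen for illustration, and the network call is passed in so the sketch can be run without touching a live malware source.

```python
import hashlib
import logging
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical user agent strings, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/20100101 Firefox/18.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:18.0) Gecko/20100101 Firefox/18.0",
]


def retrieve_all(urls, fetch, workers=4):
    """Fetch each distinct URL once and return {md5_of_body: source_url}.

    `fetch(url, headers)` is an assumed callable that returns the
    response body as bytes and raises an exception on failure.
    """
    # Drop repeated URLs while keeping their original order.
    unique = list(dict.fromkeys(urls))

    def worker(url):
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            body = fetch(url, headers)
        except Exception as exc:
            # One unreachable source should not stop the whole run.
            logging.warning("failed to fetch %s: %s", url, exc)
            return None
        digest = hashlib.md5(body).hexdigest()
        logging.info("retrieved %s from %s", digest, url)
        return digest, url

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(worker, unique))
    return dict(item for item in results if item is not None)
```

Keying the result on the hash of the body means two URLs serving the same file collapse into one entry, which may or may not be what you want when the point is to log every source.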

Right now, Maltrieve only looks at four meta-sources because two of the six in mwcrawler appear offline. But I have at least four more on deck, and mwcrawler didn’t parse all of its meta-sources correctly in any case. I also know of a few bugs that I haven’t figured out how to squash yet, but the core functionality works and it needs a broader audience to bang on it. Thus, I’ve tagged this version “beta-1”. Don’t rely on this for serious production, please.

If you use it, please let me know just so I can bask in the warm glow of productivity. The project itself remains under the GPL, of course. Suggestions, bug reports, etc. would also make me happy, whether via issues and pull requests on GitHub, contacting me on Twitter, or comments here.

Chinese government attacking American journalism?

What a week: disclosure of compromises at the New York Times, Wall Street Journal, and Washington Post. A Java update released on a Friday evening 18 days early due to active exploitation. Twitter compromised affecting 250k users, including me. I may have more to say about the Twitter compromise later.

Journalists in China

If they don’t respect them there, they won’t respect them here.

I’ve assumed for some time that state-sponsored attackers have long targeted major media outlets, especially those who regularly report on national security issues. While we don’t need to start putting on tinfoil hats, the ill-fated Wikileaks partnership with the NYT should have provided a pretty obvious starting point for people to think about these issues. Even more obviously, at least to me, journalists have had to take OPSEC seriously for a very long time, whether due to drug cartels or US presidents unhappy with political and legal revelations. I wouldn’t characterize these incidents as an assault on our way of life, exactly, because the Fourth Estate has always had conflicts with power. We should become far more suspicious when governments don’t concern themselves with the press, because that says something about their relationships with it or, perhaps, their views of popular opinion.

An extraordinary claim requires extraordinary proof.

Others have criticized the reporting and the completeness of the stories. For what it’s worth, as noted above, I certainly don’t think claiming that governments have tried to attack journalists really presents an extraordinary claim. And I have seen enough evidence first-hand to believe that China-based actors actively exploit networks around the world. Add to those two points what we already know about how the Chinese government regards free speech and a free press.

But if you want us to believe that this represents the greatest transfer of wealth in history and all the other hyperbole that surrounds discussion of “the APT” and “China” and “cyberwar”, you need to present evidence. Declassify it, make it public, show it to the American people. If you’re a news outlet dedicated to informing the public, give us the facts. When the government wants to make a case for war, it discusses specific incidents and presents intelligence. If we face such a great threat, don’t just assert the threat, prove it. (Note: I don’t actually expect any of this to happen.)

Whether the intelligence will amount to proof, however, remains to be seen.

Konig: malware, graph theory, and fuzzy hashes

As a small personal research and learning project, I spent a few hours this weekend writing Konig. This is intended to evolve into a framework for investigating relationships between fuzzy hashes (e.g. a corpus of malware gathered with mwcrawler) using graph-theoretical methods. Underneath, it basically just marries NetworkX and ssdeep.

At the moment, the code is fairly barebones: create the hash library based on files in a particular directory, then construct a graph of the relationships between those files where the similarity exceeds a user-specified threshold. Also, please keep in mind that my Twitter bio for a while just said “I write bad code”, and for good reason: I do. The GUI consists purely of a matplotlib window and needs a lot of work. (I have less experience with interfaces than almost anything, so keep your expectations even lower.) I’ve added some very basic information on the properties of the graph (order, density, etc.), as well as the ability to select the connected component that includes a node (file) of interest.
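
The threshold-and-component idea can be sketched with nothing but the standard library. To be loud about the assumptions: this is not Konig's code, `difflib.SequenceMatcher` stands in for ssdeep (it is not a fuzzy hash), a plain dict of sets stands in for a NetworkX graph, and the function names are made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Stand-in similarity score from 0 to 100 (NOT ssdeep)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def build_graph(samples, threshold):
    """Build an undirected graph from {name: content}.

    Two names are joined by an edge when their similarity score
    is at least `threshold`. Returns {name: set_of_neighbours}.
    """
    graph = {name: set() for name in samples}
    for a, b in combinations(samples, 2):
        if similarity(samples[a], samples[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def component(graph, start):
    """Return the connected component containing `start`."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return seen
```

Comparing every pair is quadratic in the number of files, which is tolerable for a few thousand samples but would need rethinking for a much larger corpus.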

Example output:

kmaxwell@gauss:~/src/konig$ python konig.py -d ~/data/mwcrawler/unsorted/PE32 -t 90 -i PE32.json
Loading saved hash database
Calculating fuzzy hashes for all files in /home/kmaxwell/data/mwcrawler/unsorted/PE32...
Creating graph structure for files with similarity >= 90...
Name:
Type: Graph
Number of nodes: 2932
Number of edges: 265625
Average degree: 181.1903
Graph density: 0.0618185990375
Preparing plot of graph structure...
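
As a sanity check, the summary figures in that output are consistent with the usual definitions for an undirected simple graph with n nodes and m edges: average degree 2m/n and density 2m/(n(n-1)). The short calculation below just redoes that arithmetic from the reported node and edge counts; it assumes those are the formulas behind the printed values.

```python
nodes = 2932
edges = 265625

# Each edge contributes to the degree of two nodes.
average_degree = 2 * edges / nodes

# Fraction of the n(n-1)/2 possible edges that are actually present.
density = 2 * edges / (nodes * (nodes - 1))

print(f"Average degree: {average_degree:.4f}")
print(f"Graph density: {density:.10f}")
```

In other words, about 6% of all possible pairs of these PE32 files score 90 or higher against each other.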

Konig screenshot

The goals here include refreshing my knowledge of graph theory, as the last time I seriously studied this stuff, I think the OJ Simpson verdict hadn’t come back. Also, this code will help pave the way for some related work I have slated to use mwcrawler and vxcage together. In fact, I really think of Konig as a proof-of-concept implementation to throw away before doing something more useful and robust.

ERMAGERD GUISE WE R NUCULAR WEPINZ NAO

My cluebat - let me show you it

Senator John Kerry, now the nominee for Secretary of State, made some attention-getting statements about ‘hackers’. According to the report, he compared the threat of ‘foreign hackers’ to “modern-day, 21st century nuclear weapons”. He also said, more or less correctly, “Every day while we sit here right now certain countries are attacking our systems. They are trying to hack into classified information to various agencies of our government.”

I don’t really have a beef with the second piece there. But let’s be realistic here: in the 20th century, the United States actually did face an existential threat, and it wasn’t terrorists or hackers or child pornographers or some other country buying all our land and debt. The aptly named doctrine of Mutually Assured Destruction ensured that the whole world, and in particular the USA and USSR, knew that the other side threatened them with actual extinction. Nuclear weapons still exist in large numbers and have proliferated to at least eight (well, nine) different countries, so we can’t really suggest that they no longer pose a threat in 2013, though not in the same way as they did in 1962.

Certainly, the threat of cyberespionage that Senator Kerry describes exists. We can easily name a number of states at this moment (not just the typical three or four, either) that engage in it at varying levels of activity, and we shouldn’t ignore it. That threat, however, doesn’t begin to compare to the destruction of the human race and possibly Earth’s viability as an environment for living things.

Apparently he also thinks diplomacy will work against this threat. Maybe that will work against some specific threat actors, in concert with other efforts as always required for diplomatic success. I also prefer a policy of talking about issues rather than threatening “kinetic response”, at least in general terms. (How many people have actually died from “cyberattack” thus far?) But espionage, whether online or offline, frequently accompanies and supports diplomacy rather than the other way around. That’s not likely to change anytime soon, and in fact the US would be duplicitous to suggest that the threat only occurs against it, or that Westphalian notions of sovereign nation-states hold the same relevance for this conflict as they did in the Cold War.

(For a similar take, see Bill Brenner’s article.)

Go home HackMiami you are drunk

IMPORTANT UPDATE 2

The official response:

Track Updates

Recently the HackMiami 2013 Hackers Conference received several complaints from individuals within the information security community regarding the chosen titles of the speaking tracks, NewF's and OldF's.

These complaints indicated that HackMiami may risk alienating the support of a key demographic within the information security community.

We have discussed the issue at length, and decided that we did indeed plan the track titles in haste, without considering the inclusion or opinions of a very vocal minority. As such, have decided to make some changes.

In addition to the NewF's and OldF's tracks, we will be creating a new third track that tailors specifically to the audience that was offended by the original oversight, and the track will be called MoralF's

The MoralF's track will feature talks about hacktivism, digital civil liberties, ethics, legal issues, and free speech.

We hope that this correction satisfies our critics, and we invite them to submit CFPs for this track at:

http://www.hackmiami.com/cfp

Regards,

The HackMiami Conference Team


IMPORTANT UPDATE

Thank you for doing the right thing.


Recently, a colleague from a side project contacted me to ask me to submit a talk on the project to HackMiami. Like everybody else in the Western Hemisphere, I immediately thought “sweet, boondoggle!” Even if my employer (who has nothing to do with this post) wouldn’t pay for the trip, I figured I would pay my own way, because, hey! Miami!

Then I read the CFP and… well.

Just in case they change things later, here are the names and descriptions of their tracks:

Track 1 – NewF#gs – A novice track will be available for new hackers who are learning the ropes. If you have a presentation that you believe would be beneficial to the community and will give n00bs a starting point to advance their skillsets, then this is the track for you. Total presentation time is 50 minutes.

Track 2 – OldF#gs – An advanced track for the old school greybeards looking to show off their latest projects and research. If you have any hot research, code drops, vulnerability disclosures, or advanced attack methodologies that you want to present on, then this is the track for you. Total presentation time is 50 minutes.

Now, I recognize the 4chan meme. There is a place in the world for 4chan memes, and that place is 4chan, not a hacker conference with people of all backgrounds. Without really touching the LGBT issues here (which I acknowledge but which lie well outside the scope of this blog), the level of unprofessionalism here would stun a rhinoceros. As my buddy and co-worker Kevin asked, what ideas did this beat out? What was worse than this? Did your first draft have “fresh hos” and “used up hos” for the track names and you rejected that for being disrespectful to women? “We need to be more inclusive, guys.”

And hey, if you think any of us want our names and professional reputations hooked up with those terms, you have lost your ever-lovin’ mind.

In sections of the infosec community, we’re having all these discussions about misogyny, privilege, and anti-harassment policies. And then HackMiami decides to name their tracks after childish homophobic little memes from the seedy underbelly of the Internet. Not cool, dudes. Welcome to my list of “conferences I won’t attend because the organizers are scumbags who annoy the crap out of me”.

On Aaron Swartz and hacktivism

With enough coffee, anything is possible

By now, nearly everybody who would read this blog has probably heard about Aaron Swartz’s suicide. I didn’t know Aaron, though I wish I could have. Many people whom I respect and admire have written eloquently about his life and legacy: Philip Greenspun, Lawrence Lessig, and Tim Berners-Lee. This has left me a lot to think about, from depression (a subject with which I have more personal and intimate familiarity than almost anyone knows) to programming to prosecutorial discretion.

I’ve been thinking for some time on “hacktivism 3.0”, which is a somewhat-misleading term because none of this has truly developed linearly. But if hacktivism has (d)evolved from cDc’s original declaration to Anonymous-style DDoS, it has also grown into full-blown activism “using our powers for good”, changing the world through code and a deep understanding of the technologies that now connect us and define so much of our lives (and not just in the First World). That might mean anything from volunteering at the computer lab at your local library or school to moderating online support communities to running a Tor relay to working with organizations like Citizen Lab.

The need for us – and by us, I mean all hackers – to get involved in making the world a better place is neither directly political nor religious, and certainly not partisan. I have a deeply ingrained belief that everyone should use their talents, skills, and abilities to try to help people around them. For some, that could mean getting involved in politics or religion, certainly, but for others, it could mean something else.

So don’t wait. Brew a pot of coffee and get to work. If you’ve been considering getting involved with a project, do it. If you already have a cause that matters to you, start doing something you can do. The world needs us right now.

Getting into the guts of mwcrawler

Earlier this week, my buddy Ken Pryor mentioned a project with which I had no prior familiarity:

So I went over and dug into mwcrawler. From the project README:

mwcrawler is a simple python script that parses malicious url lists from well-known websites (i.e. MDL, Malc0de) in order to automatically download the malicious code. It can be used to populate malware repositories or zoos.

It turns out that it really is pretty simple and hackish, which fits my needs perfectly. This is all a very experimental side project just to keep me amused during the (relatively) cold weather here in Texas.

Given how much I already love GitHub, I forked the project, then made a few improvements to allow for the use of a proxy (for OPSEC reasons) and to specify a dump directory from the command line. Requiring the user to modify source just to change config options works fine for alpha, but a little bit of polish goes a long way. I’ve also started implementing some logging to keep the metadata (like source URLs for each file). And yes, I’ve submitted pull requests, but neither mine nor the user agent randomization patch from Ben Jackson has gotten any response from the project owner. Hopefully that will change now that the holidays have finally run their course.
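
For illustration, here is roughly what moving those two settings onto the command line could look like with `argparse`. The flag names, the default dump directory, and the helper names are assumptions for this sketch, not necessarily what the fork uses.

```python
import argparse


def parse_args(argv=None):
    """Parse the two options discussed above from the command line."""
    parser = argparse.ArgumentParser(
        description="Download samples listed by malware URL trackers")
    # Flag names and the default path are illustrative only.
    parser.add_argument("-p", "--proxy",
                        help="proxy URL, e.g. http://127.0.0.1:8118")
    parser.add_argument("-d", "--dumpdir", default="/tmp/malware",
                        help="directory where samples are written")
    return parser.parse_args(argv)


def proxy_mapping(proxy):
    """Return a scheme-to-proxy dict, the shape that
    urllib.request.ProxyHandler accepts; empty when no proxy is set."""
    return {"http": proxy, "https": proxy} if proxy else {}
```

Calling `parse_args(["-p", "http://127.0.0.1:8118", "-d", "/data/zoo"])` yields an object with `proxy` and `dumpdir` attributes, so nobody has to edit the source to change either one.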

Now that I have all this data, I want to do something with it. Just for messing around, I went with the old standby of ssdeep to find relationships. That doesn’t mean it’s a final step at all; this weekend, I’ll run the samples through the VirusTotal API, for example, to classify known samples by hash, and perhaps also incorporate something like pyew for clustered analysis to pull out interesting features. mwcrawler also features integration with thug, which I’ve not started running yet. Some bugs still exist, like unhandled exceptions when the script can’t reach the page or dependence on the semi-deprecated Beautiful Soup 3.
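
Classifying known samples by hash starts with computing the hashes. A minimal sketch of that first step, assuming the samples sit as plain files in one dump directory; the function name `hash_samples` is hypothetical, and the sketch deliberately stops short of any actual VirusTotal request.

```python
import hashlib
import os


def hash_samples(directory):
    """Return {filename: {"md5": ..., "sha256": ...}} for regular files."""
    results = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        md5 = hashlib.md5()
        sha256 = hashlib.sha256()
        with open(path, "rb") as handle:
            # Read in chunks so large samples are not loaded whole.
            for chunk in iter(lambda: handle.read(65536), b""):
                md5.update(chunk)
                sha256.update(chunk)
        results[name] = {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
    return results
```

The resulting digests are the kind of identifier a hash-based lookup service expects, and they also make handy filenames for deduplicating the repository.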

But my current tiny little repository includes 227 MB in 344 PE32 executables (not counting other file types like archives and such). As an extremely simple preview, even basic fuzzy hashing as mentioned above creates some interesting clusters (graph generated with awk and Maltego):

mwcrawler-ssdeep