Category Archives: Manifesto

Ethics versus economics for security research

Independent security researchers often have a reputation as narcissistic vulnerability pimps (true or not), but the environment which has evolved around information security largely drives this. This came to a head for me tonight in a Twitter discussion kicked off by Steve Werby:

Creating an exploit can often pay anywhere between 1k and 100k (or possibly more in specific circumstances), depending on the researcher’s choice of market and product (or technology). This even affects areas that many users believe unrelated, like mobile OS jailbreaks, which essentially consist of exploits to gain root control despite the operating system’s best efforts to the contrary.

No equivalent market exists for threat-related research. Freelance malware analysts don’t have similar economic drivers because organizations with an interest in this information generally do the research themselves. You can’t monetize malware or attribution the same way. Put another way, nobody believes that Krebs and Danchev get rich from what they do. I don’t think we can “fix” this with the market, although I’d welcome discussion of ideas or evidence to the contrary. But we need to recognize this when thinking about issues around software security and threat identification.

Believing that security, on its own, adds value often turns into a form of the broken windows fallacy. And creating artificial demand for threat intelligence could lead to all sorts of perverse incentives. Some of the same organizations interested in purchasing vulnerabilities and exploits might have an interest in highly-focused intelligence, such as espionage on particular threat groups, but at this point the line between “offense” and “defense” becomes very fuzzy.

I’d love to hear alternate viewpoints and suggestions on where this can go.

On Aaron Swartz and hacktivism

With enough coffee, anything is possible

By now, nearly everybody who would read this blog has probably heard about Aaron Swartz’s suicide. I didn’t know Aaron, though I wish I could have. Many people whom I respect and admire have written eloquently about his life and legacy: Philip Greenspun, Lawrence Lessig, and Tim Berners-Lee. This has left me a lot to think about, from depression (a subject with which I have more personal and intimate familiarity than almost anyone knows) to programming to prosecutorial discretion.

I’ve been thinking for some time on “hacktivism 3.0”, which is a somewhat-misleading term because none of this has truly developed linearly. But if hacktivism has (d)evolved from cDc’s original declaration to Anonymous-style DDOS, it has also grown into full-blown activism “using our powers for good”, changing the world through code and a deep understanding of the technologies that now connect us and define so much of our lives (and not just in the First World). That might mean anything from volunteering at the computer lab at your local library or school to moderating online support communities to running a Tor relay to working with organizations like Citizen Lab.

The need for us – and by us, I mean all hackers – to get involved in making the world a better place is neither directly political nor religious, and certainly not partisan. I have a deeply ingrained belief that everyone should use their talents, skills, and abilities to try to help people around them. For some, that could mean getting involved in politics or religion, certainly, but for others, it could mean something else.

So don’t wait. Brew a pot of coffee and get to work. If you’ve been considering getting involved with a project, do it. If you already have a cause that matters to you, start doing something you can do. The world needs us right now.

Structured data frustrations


I do a lot of work with data from the web, frequently (though not exclusively) as part of my work on OSINT gathering. A great deal of these data come from unstructured sources, requiring screen scraping techniques and, sometimes, a bit of head-banging. Not the fun kind, either, but the kind that ends up with me needing a new keyboard.

If you’re publishing data for other people to use freely, please make their lives a little easier. I understand that supporting an API may be serious overkill for many situations. But if you’re going to publish, say, a blog with IP addresses and domain names used by bad guys, at least set it off in a table or with a specific CSS element or something that allows people to grab it in an automated fashion. After all, you’ve already started the process just by publishing your blog in an RSS feed.
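To show how little it takes on the consumer side once the publisher cooperates, here is a minimal sketch using only Python's standard library. It assumes the publisher wraps each indicator in an element with a CSS class; the class name `ioc`, the sample HTML, and the function names are all hypothetical, not any real site's markup.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}


class IndicatorParser(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.indicators = []
        self._depth = 0      # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # An attribute without a value is reported as None.
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif self.css_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._buffer).strip()
            if text:
                self.indicators.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_indicators(html, css_class="ioc"):
    """Return the text of all elements marked with css_class (hypothetical name)."""
    parser = IndicatorParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.indicators


# Hypothetical blog markup; addresses come from documentation ranges.
SAMPLE = """
<table>
  <tr><td class="ioc">198.51.100.7</td><td>SSH scanner</td></tr>
  <tr><td class="ioc">bad.example.com</td><td>C2 domain</td></tr>
</table>
"""

print(extract_indicators(SAMPLE))
```

Without a marker like that class, the same job turns into guessing at regular expressions over free-form prose, which is where the keyboards start to suffer.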

Alternately, if you have an API, please make it actually useful. As an example, the Pastebin API offers far less read utility than write access, which it supports quite well because Pastebin values inbound data far more than outbound. I like the support for listing trending pastes; a nice follow-up might be an API for listing pastes that match a search (preferably with regex support, but that might be asking too much). If Pastebin provided API support for this, then they could throttle as needed (e.g. only allowing N searches in X time), while hopefully reducing the load from people trying to grab every single paste. Most of the stuff I run across that way turns out to violate their AUP and has no relevance to my work in any case.
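The "N searches in X time" idea is cheap to implement on the server side. Below is a rough sketch of a sliding-window limiter; it is not how Pastebin or anyone else actually does it, and the class name, the per-client key, and the explicit `now` argument (passed in to keep the example deterministic) are my own assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # One queue of request timestamps per client (e.g. per API key).
        self._history = defaultdict(deque)

    def allow(self, client_id, now):
        """Return True and record the request if the client is under its limit."""
        history = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False
        history.append(now)
        return True


# Hypothetical policy: 2 searches per 60 seconds per API key.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
results = [limiter.allow("key-1", t) for t in (0, 10, 20, 60)]
print(results)
```

In a real service `now` would come from something like `time.monotonic()`, and the history would live somewhere shared between workers rather than in one process's memory.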

I should have a lot more to say about Pastebin OSINT soon, for what it’s worth, hopefully before the end of December.

Thoughts on STIX


Threats should be properly understood.

At Black Hat, I saw a presentation by Sean Barnum from MITRE on Structured Threat Information eXpression (STIX). This builds on their work with Cyber Observable eXpression (CybOX), another standard that more or less exists at the same level in the “stack” as OpenIOC. I don’t think I can say much right now about the differences and possible advantages of CybOX or OpenIOC over the other, mostly because I don’t think I understand them quite to that level of detail.

But STIX tries to organize that IOC / observable information and give it some context. For example, using STIX/CybOX, you could collect together a set of observables, link them together into an indicator, and then associate that indicator with known TTPs. That TTP would possibly link to other indicators to help with confirmation and attribution to a threat actor or campaign.
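To make that chain concrete, here is a toy model in Python. I want to be clear that this is not the STIX or CybOX schema, just three hypothetical classes that mirror the relationships described above: observables grouped into an indicator, and indicators associated with a TTP.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observable:
    """A single observed fact, e.g. an IP address or a file hash."""
    kind: str
    value: str


@dataclass
class Indicator:
    """A named set of observables that mean something when seen together."""
    name: str
    observables: list = field(default_factory=list)


@dataclass
class TTP:
    """A tactic/technique/procedure associated with one or more indicators."""
    name: str
    indicators: list = field(default_factory=list)

    def matching_indicators(self, seen):
        """Return names of indicators whose observables were all seen."""
        return [
            i.name
            for i in self.indicators
            if i.observables and set(i.observables) <= seen
        ]


# Hypothetical data; the address is from a documentation range.
scanner = Indicator(
    "ssh-bruteforce",
    [Observable("ipv4", "198.51.100.7"), Observable("port", "22")],
)
dropper = Indicator("dropper", [Observable("filename", "update.exe")])
ttp = TTP("credential-guessing", [scanner, dropper])

seen = {Observable("ipv4", "198.51.100.7"), Observable("port", "22")}
print(ttp.matching_indicators(seen))
```

The point of the extra layers is the last call: once one indicator fires, the TTP tells you which other indicators to go looking for as confirmation.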

I’d like at some point soon to build up a “link database”, something like a triplestore (graph database) that represents all its data in a Subject-Predicate-Object form.

$hash BELONGSTO $malware
$twitterhandle BELONGSTO $person
$ipaddress ISA $sshscanner

(This probably doesn’t look anything like the eventual ontology, and my syntax may or may not fit the RDF standard, but it should illustrate the idea.)
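As a rough illustration of the Subject-Predicate-Object idea, here is a tiny in-memory store in Python. It is a sketch only: the class, the `kind:value` naming of nodes, and the sample values are all made up for the example, and a real triplestore would add indexing, persistence, and a proper query language.

```python
class TripleStore:
    """A minimal in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self._triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


# Hypothetical data mirroring the three example statements above.
store = TripleStore()
store.add("hash:abc123", "BELONGSTO", "malware:examplebot")
store.add("twitter:@example", "BELONGSTO", "person:jdoe")
store.add("ip:198.51.100.7", "ISA", "sshscanner")

print(store.match(predicate="BELONGSTO"))
print(store.match(subject="ip:198.51.100.7"))
```

Every question you can ask is the same shape: fix one or two positions and leave the rest as wildcards.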

As we build up these links, now we start to have a database of objects (possibly using RDF or similar) that link together in various ways, allowing you to perform appropriate analyses to identify clusters of knowledge. This alternate structure probably isn’t usable for STIX per se, but that doesn’t mean a STIX document couldn’t be an object. For example, if we have a STIX document explaining the details of a piece of malware, you could represent that as

$malware ISDESCRIBEDBY $stixdocument

And then once we get to that $malware node in the graph DB, we know where to go for the full low-down. This would work the same way if we had a document (dossier) describing a particular person.
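One simple reading of "clusters of knowledge" is connected components: any two nodes joined by some chain of links, in either direction, land in the same cluster. The sketch below assumes triples are plain tuples and uses invented node names; real analyses would likely weight predicates differently rather than treating every link as equal.

```python
from collections import defaultdict


def clusters(triples):
    """Group nodes into connected components, ignoring edge direction."""
    neighbors = defaultdict(set)
    for subject, _predicate, obj in triples:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)

    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from start.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical triples, including a link to a descriptive document.
triples = [
    ("hash:abc123", "BELONGSTO", "malware:examplebot"),
    ("malware:examplebot", "ISDESCRIBEDBY", "stix:doc-1"),
    ("twitter:@example", "BELONGSTO", "person:jdoe"),
]

for component in clusters(triples):
    print(sorted(component))
```

Here the hash, the malware, and its document form one cluster, while the Twitter handle and its owner form another until some new triple connects them.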

To visualize it, think of something like a Maltego or Casefile graph, only much larger than you can reasonably manipulate with that tool. Which is not a criticism; I love Maltego and use it almost every day, but it is a tool for a particular set of use cases. Ideally, you could export a subgraph to a Maltego file, or of course the inverse: use Maltego to do a bunch of research on something, then import all that data into your DB for archival and possible linking to other nuggets of information.
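Pulling out a subgraph small enough for a desktop tool could be as simple as a bounded breadth-first walk from a node of interest. This sketch only selects the triples; it assumes tuple triples with invented node names and deliberately says nothing about the Maltego file format, which I would treat as a separate serialization step.

```python
from collections import deque


def neighborhood(triples, start, max_hops):
    """Return triples whose endpoints both lie within max_hops of start."""
    adjacent = {}
    for s, _p, o in triples:
        adjacent.setdefault(s, set()).add(o)
        adjacent.setdefault(o, set()).add(s)

    # Breadth-first search recording each node's hop distance from start.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue
        for nxt in adjacent.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    return [t for t in triples if t[0] in distance and t[2] in distance]


# Hypothetical chain of links radiating out from one malware node.
graph = [
    ("malware:examplebot", "USES", "ip:198.51.100.7"),
    ("ip:198.51.100.7", "HOSTS", "domain:bad.example.com"),
    ("domain:bad.example.com", "REGISTEREDBY", "person:jdoe"),
]

print(neighborhood(graph, "malware:examplebot", 1))
print(neighborhood(graph, "malware:examplebot", 2))
```

The hop limit is the knob that keeps the export at a size you can actually manipulate by hand.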

I’ve done some very exploratory work on this, mostly just prototype code related to working with Maltego transforms. Of course my involvement with CIF also has provided some great lessons of all sorts. But I think that, as a question of knowledge management, we can do better than just tracking very basic data (like we do with CIF) or spending a lot of time on enumeration and detailed XSDs. Both of those matter a great deal, and I work on them frequently in my day job, but they just scratch the surface of what we can do with modern data and knowledge management techniques.

Developing with open-source methodologies

Over the last several months, I’ve become more heavily involved in the Collective Intelligence Framework. Wes Young and the other authors have released different parts of the code under the GNU Lesser General Public License (LGPL), a weak-copyleft license that is more permissive than the GPL, and the BSD license.

But developing free (open source) software means much more than just “having the code”. It usually implies a particular culture and way of developing software. Unlike years ago when I wrote a bunch of proprietary stuff, open source development happens in a far more collaborative fashion. This project has reminded me of the greater efficiency of a do-ocracy, where things get done by the people who do them. (Somebody smarter than me has probably put that in koan form.)

I read a parable years ago illustrating the difference between an effective and an ineffective open source programmer[1]. Imagine two programmers, Alex and Jamie, each deciding to get involved with a particular project. Alex got excellent grades as a programmer in school and has a lot of self-confidence. Jamie can get through the code but has to work a lot harder to do it. Alex can work on a feature, working in isolation for hours or days, and send some particularly clever code off to the maintainer, only to have it rejected because it doesn’t match the existing coding style and because other programmers have concerns about possible side effects. Meanwhile, Jamie has a clue where to start and immediately posts some terrible prototype code to the mailing list. Conversations with other developers in IRC clarify things a little further, and after a few rapid iterations, the team accepts the patch for the new feature because they’ve already seen it develop and helped shape it early in the process.

In my case, most of CIF exists as Perl code, but I haven’t written Perl “in anger” for quite a few years. So while I muddled my way through the existing code base at first, it took time for me to disentangle the various bits in my head. Of course, the code exhibited a lot of typical characteristics of prototype projects that grew up faster than anyone expected, and those slowed me down a little more. But I relentlessly asked questions on the mailing lists, poked people in IRC, and did my best to ask smart questions.

The process has helped me do more in the rest of my work (since we use CIF) as well as develop and work more effectively with my distributed team. None of us share an office or much geographical proximity. Instead, we collaborate on Skype, IM, and email, with infrequent travel to see each other in meatspace. I’ve even gotten involved with a few other projects, though on a far more limited basis, contributing code or documentation.

Maybe this process doesn’t work for all software projects; I wouldn’t develop code to run a nuclear reactor this way, certainly. But for most of the behind-the-scenes software that drives our culture today, this works out beautifully and allows me to get far more done, far more effectively, and contribute back to the information ecosystem.

[1]: After hours of searching, I can’t find the source. If anyone ever points it out to me, I will happily update this post to give credit.

China as a threat: a bit of perspective

I got a bit of friendly feedback after recently stating on Twitter that I get tired of all the constant drum-beating about China. That includes some notes from friends and colleagues whom I respect but who do not entirely agree with me. I thought I’d clarify my thoughts on the original APT as a result.

First, anybody who doesn’t recognize that China is engaged in a long-term (and heretofore incredibly successful) campaign of information operations against the West just hasn’t paid attention. We have the evidence, and even the PRC’s protestations to the contrary seem carefully constructed simply to parse meanings and split hairs. They engage in normal diplomatic cover speak, and I can’t fault them for that, but we should still recognize it for what it is. Denials of this reality ring as hollow as denials of the immense volume of fraud and related cyber crimes sourced from Eastern Europe and Russia.

That said, however, I believe some of the reaction in recent months has gone overboard. A number of high-profile individuals have had a significant presence in the press lately, and some of them seem to have the impression that the US should treat this as the most significant issue in its relations with the PRC. Given the range of issues that involve two of the most powerful nations in human history, I find this shortsighted. Climate change, energy policy, human rights, and macroeconomic issues all represent legitimate areas of discussion. Information operations (“warfare” if you like, but I don’t) comprise an important part of those issues but should not overshadow things like nuclear weaponry, for example.

At the same time, they indicate that only the “APT” matters and that professional incident responders only think in terms of campaigns (rather than intrusions). I disagree: other significant issues do exist within our domains of threat intelligence, information security, and incident response, as well as within the separate scope of Pacific Rim foreign policy. When your rhetoric reaches the point where your professional colleagues start to openly wonder if you’ve become completely Sinophobic, then you should take a step back and ponder whether to dial it down slightly.

Yes, China’s IO campaigns certainly present a significant challenge in a number of ways, including the need for public awareness in the West, but that challenge exists within the context of many other important topics. Let’s not get so zoomed into one adversary and one issue that we lose focus on the rest.

Semantic change: APT, Cyberwar, and Hacking

“It’s just semantics!”

I hate that phrase. Words mean things – and “semantics” is the study of those meanings. Most words can push emotional buttons for us, even when we really just use different words to describe the same thing. Think about the range of words that all essentially mean “fecal matter”, running the entire way from baby talk to medical terminology to vulgarity.

And, over time, the meaning of a word can evolve through semantic change. I’d suppose this happens even more frequently with jargon. So I’ve started to change my tune on a few specific bits of jargon that I encounter daily.

First, one of the most common (and controversial) phrases in 2011: “advanced persistent threat” (APT). From my understanding, this term originated with the US Air Force in 2006 to refer to either “any sophisticated adversary engaged in information warfare in support of long-term strategic goals” or, well, China. I do not like this term at all, because we have much better terms now when discussing general classes of attackers. And now that the US government has publicly discussed the ongoing campaign of intrusions from China, rather than just in classified environments, we no longer need to treat the subject so gingerly. My stance has evolved to the point of eschewing the term completely. If you mean “nation-state actors” in general, say that. If you mean China (or Russia, or Israel, or the US), then say that. If you mean adversaries with significant capability, I suppose “APT” is the marketing buzzword these days, but this usually leads to so much FUD that I’d prefer other terms that don’t carry the same baggage.

This year, I still hear “cyberwar” – maybe with even more frequency than in 2011. In my view, individuals and organizations with specific agendas have fanned the flames here to suit their own purposes. I don’t really like this term, because I believe that we should reserve the term “war” for the sort of large-scale “kinetic” conflict traditionally associated with it. General Robert E. Lee said at the Battle of Fredericksburg that “it is well that war is so terrible, otherwise we should grow too fond of it”. By using the word “war” for something that doesn’t result in the broken lives and bodies we see in places like Afghanistan, Somalia, and Uganda, we desensitize ourselves to that harsh reality. (I speak here in general terms: certainly, some individuals who use terms like “cyberwar” have an all-too-horrible familiarity with the reality of war in a way I do not.) With all that said, I’ve come to accept this term grudgingly. Certainly, conflict exists between nations and other organizations, and some of those conflicts extend to networks and other digital systems. At one time, this primarily took the form of a secret war, and the vast majority of the public knew nothing about it beyond what they saw in movies. Nobody denies that these conflicts exist now; we just disagree on who does what, what we should call what they do, and of course what will happen in the future. But if I see this term, I will assume you mean the type of serious conflict that leads to things like Titan Rain and Stuxnet – and that you know a thing or two about it, rather than parroting what you heard in a vendor webinar.

Finally: I refuse to give up the word “hacker”. My last CSO once said in a security meeting that “we don’t hire hackers” – only to have several of us cough politely and catch his eye. (“Well, you know what I mean.”) The term certainly has considerable nuance, but I will almost always use it to refer to a particular subculture of geeks and programmers: Linus Torvalds, Richard Stallman, Grace Hopper, Steve Wozniak – not Albert Gonzalez and Kevin Mitnick. Portmanteaus like “hacktivism” grate on me, but at the moment I don’t know of better alternate terms.

I’d like for us to think of something, though.

Don’t be an evangelist

"Christian Evangelism to the ends of the Earth.   "Christian Evangelism to the ends of the Earth" by Chris Yarzab

Don’t be an evangelist. More precisely, don’t be a tech evangelist.

Without taking a religious position of any sort in this post, I’ll point out that the term has some particularly strong associations for most people, whether good or bad. Christians see an “evangelist” as someone talking about the big questions in life from a particular perspective, and to attempt to put your technology advocacy on the same level as that will undoubtedly step on more than a few toes. Equating your set of practices or products to the Gospel seems like hubris to this group. Non-believers likewise see an “evangelist” as someone who, whether with the best of intentions or trying to pick their pockets, will tell them that they need to change their entire viewpoint and do things the way the evangelist believes.

As an “evangelist” in this sense, then, you want everyone around you to listen to you and accept your view. If you want to sell something, or want people to accept a doctrine on faith, that may work in some cases. But if you want people to listen to you, you need to listen to them – and understand that you don’t always have all of the answers or even understand all of the issues.

Rich Mogull recently wrote a great post on expertise and analysis. One paragraph in particular hit home for me, though the whole thing deserves your time and consideration.

One of the critical skills is the ability to change your position when presented with contradictory yet accurate evidence. Dogma is the antithesis of good analysis. Unfortunately I’d say over 90% of analysts take religious positions, and spend more time trying to make the world fit into their intellectual models than fixing their models to fit the world. When you are in a profession where you’re graded on “thought leadership”, it’s all too easy to interpret that as “say something controversial to get attention and plant a flag”.

Don’t try to make the tech world fit your model, because it won’t. Listen to others, and understand that sometimes you’ll need to change your viewpoint because you won’t always be right. Whether or not you believe that faith and dogma have a place in life, this isn’t it.

Data analysis will change the world

One of my favorite infosec thinkers, Andrew Hay, had a pair of recent posts that have given me lots to chew on.

First, he asked:

This provoked a wide-ranging conversation about what that means. We’ll find tremendous value in applying big data techniques to security data. (Actually, I think data analysis will change the world, but that’s a bit larger scope than this post can comfortably handle.) We can then start to bring in additional data feeds past what traditional SIEMs handle. Think along the lines of more OSINT, network flows, and possibly even business data. At that point, you can really start to grasp the qualitative and quantitative improvements to data protection.

The next day, he wrote an article in which he asked an oft-heard data analysis question: Where’s my ‘Minority Report’ dashboard?. We have to unpack that a little, though, because the data analysis scenes involved a few different useful things.

First, and perhaps most memorably, Cruise’s character used a gesture-based interface to work with the data he had available. As Hay notes, this tech has started to push down into consumer electronics like game consoles, but not generally into business applications like SIEM. While this might seem natural, we will have to move beyond the standard desktop metaphor and start to think of data as objects. It certainly won’t happen completely intuitively, but the long existence of similar ideas in various cultures (think mudras and sign language) and scientific research into the connection between words and gestures seem to indicate that we still have a lot of potential here.

Second, note how many disparate data feeds he had available. Apart from the fictional visualizations from the “precogs” (for which we can use surveillance video as a stand-in), he had social profiles, financial records, and more. While the entities we need to visualize aren’t always so human, we can assume some of the analogues I mentioned above for deploying “big data” tech. Data mining and machine learning will help here, particularly in knowledge discovery to hypothesize and test for correlations among the various data.

Third, the system latency seemed absurdly low. Try running a DB query on unstructured, near-realtime data, and tell me if it happens that immediately. While we’ve seen significant leaps in these areas, we need lots more advancement. Much of the tech today has started to move back towards a batch processing model rather than direct interaction and exploration, for example. Don’t think of this as just an engineering problem, because latency greatly matters when talking about trying to analyze data at anything remotely resembling the speed of thought.

Finally, the analyst clearly had excellent spatial reasoning skills. As younger generations continue to move into adulthood, we’ll likely see more applications of spatial reasoning. This means more research into data dimensionality: human brains don’t really visualize high-dimensional spaces very well, so we need to improve our models and analysts. It might turn out, for example, that we need to conceive of data as a hypercube as we drill down into specific nodes. Analysts already need to understand the foundations of graph theory when working in a lot of knowledge domains.

The future of data analysis excites me, and I really geek out over the possibilities. This has fractal-type potential: no matter whether we’re looking at data science from the MBA-typical “thirty-thousand foot view” or ångström altitude, we can find ways to change the world. (And if you’re working on this stuff and want some cross-domain thinking, let’s talk.)

GNU Testament

I R WATCHIN U

The tools we use can have a drastic effect on how we think, and thus our productivity. Here’s a mini-case study.

I grew up (literally, not in the MBA-speak sense) using a command line. First CP/M, then MS-DOS, with lots of TRS-80 and GW-BASIC usage along the way. I became proficient in Pascal and then C as a teenager. So although I used Windows and its predecessor before I used Unix, my life changed once I found Solaris and then GNU. Unix made sense to me, largely due to its philosophy of “small pieces loosely joined”. As Neal Stephenson has famously explained, Windows (at least in the not-too-distant past) was more of a toy operating system and Unix got things done.

At my desk in my home office, I have two physical desktops on a KVM switch. One runs Windows, primarily for gaming and very light web browsing. The other runs GNU/Linux plus VirtualBox, primarily for everything else. So what I do on a given evening really depends directly on what system I log into first (even though I may toggle back and forth somewhat). I never use my home Windows system to access any sensitive resources: online banking, for example, happens only from my Linux system — or, better, from a VM within it.

I recognize that some of the dichotomy here comes from choices I’ve made. But at my work office, I find myself more productive if I fire up VMware and log into a Linux VM. I may do exactly the same tasks (data analysis, security device administration, light scripting, etc.), but when I see the familiar xterm window and maybe xeyes staring at me, judging, accusing, I know I want to get down to business.

there is no environment but Unix and Stallman is its prophet