
CFAA and foreign computers

As part of some research into “active defense”, I decided to review the actual text of the Computer Fraud and Abuse Act (CFAA). This law has a number of well-documented problems, which I don’t plan to address in this post, partly because IANAL and partly because I want to focus on how the Act describes a “protected computer”:

the term “protected computer” means a computer—
(A) exclusively for the use of a financial institution or the United States Government, or, in the case of a computer not exclusively for such use, used by or for a financial institution or the United States Government and the conduct constituting the offense affects that use by or for the financial institution or the Government; or
(B) which is used in or affecting interstate or foreign commerce or communication, including a computer located outside the United States that is used in a manner that affects interstate or foreign commerce or communication of the United States

(Emphasis mine.) Specifically, I want to think about the implications related to a “computer located outside the United States”. Assuming that such a system doesn’t affect US commerce or communications (whether or not that activity takes place within the US), would it fall under the definition of a protected computer? For example, if a US person gains access to a command-and-control system in another country and takes some action that would otherwise certainly violate the CFAA were the C2 in the United States, perhaps the CFAA does not apply. Or maybe somebody accesses an exploit server or malware host to gather additional information: does the CFAA cover this? (Other statutes, particularly in the host country, may apply, so don’t do anything that might get you thrown in prison, kids. We’re just thinking about what the law may cover.)

Google may have done something akin to this when investigating the Aurora incident. According to the New York Times story after the incident, Google:

managed to gain access to a computer in Taiwan that it suspected of being the source of the attacks. Peering inside that machine, company engineers actually saw evidence of the aftermath of the attacks, not only at Google, but also at at least 33 other companies, including Adobe Systems, Northrop Grumman and Juniper Networks, according to a government consultant who has spoken with the investigators.

(Emphasis mine again.) So, according to this story, Google somehow accessed a system that presumably did not belong to them. Depending on that system’s function, perhaps this didn’t violate the CFAA. Certainly, neither the USSS, the Department of Justice, nor Secretary Clinton publicly expressed concern about this. As far as we know, Google didn’t shut down the system or otherwise damage it, so while the company could have concerns about Taiwanese law if it actually did any of this, it might not have to worry about the CFAA.

This post does not advocate so-called hack back retaliation, but my initial non-lawyerly analysis makes me wonder if other people already depend on this interpretation for various sorts of activities.

Pre-processing threat intelligence

I usually like to think of threat intelligence as either “high level” or “low level”. High level intelligence includes human-understandable information that we can’t immediately parse into specific data, like a warning that “hacktivists” have targeted an organization. In contrast, low level intelligence usually consists of atomic data (network addresses, malware indicators, payment card information, etc.).

This low level threat intelligence often arrives in raw form that requires its own analysis and processing before we can take action on it. Automated actions require a great deal of careful thought before implementation: while we can certainly take a list of, say, IP addresses and put it directly into a firewall, that doesn’t necessarily give the best benefit. What if those addresses mistakenly include internal ranges, or include key business partners? What if the IP addresses have domain names or URLs mixed into them?
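As a rough illustration of those questions, here is a minimal Python sketch that triages a raw indicator list before anything reaches a firewall. The network ranges are assumptions for the example: the internal ranges are the RFC 1918 private blocks, and the "partner" range is a documentation block standing in for a real business partner. The function name `triage_indicators` is hypothetical.

```python
import ipaddress

# Assumed values for illustration only; substitute your own ranges.
INTERNAL_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
# Documentation range standing in for a key business partner (hypothetical).
PARTNER_NETS = [ipaddress.ip_network("203.0.113.0/24")]


def triage_indicators(lines):
    """Split raw feed lines into blockable IPs, protected IPs, and non-IP entries."""
    blockable, protected, not_ip = [], [], []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines
        try:
            addr = ipaddress.ip_address(entry)
        except ValueError:
            # Domain names, URLs, or other junk mixed into the list.
            not_ip.append(entry)
            continue
        if any(addr in net for net in INTERNAL_NETS + PARTNER_NETS):
            protected.append(entry)  # needs a human decision, not a block
        else:
            blockable.append(entry)
    return blockable, protected, not_ip
```

Only the first list would be a candidate for automated blocking; the other two go back to an analyst for review.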

Workflow

Because of these issues, I find it useful to map out a workflow for managing incoming threat intelligence:

High-level threat intel workflow

Organize your incoming threat intelligence into use cases, perhaps based on the source. You may have access to data feeds from a trusted partner, like the US-CERT or an ISAC. Certainly, you should be developing intelligence based on your existing caseload, tugging on the threads you find to unravel the bad guys’ sweater. Various OSINT sources like black lists, pastebin monitoring, or social media might also come into play for some organizations, though these often require more advanced capabilities and understanding. Each type of data will usually require its own use case, though as you work through more of them, you will start to consolidate on a core workflow that will make building new use cases easier.

Pre-processing

CSVs and RSS feeds may look highly parseable, and in a very real sense they are. But that doesn’t mean they don’t need clean-up and validation. Cleaning smaller data sets may only require some massaging in a text editor or spreadsheet, or perhaps a small shell script. For larger data sets, though, you will likely need more power. I like Google Refine for its rich abilities to categorize and edit data, though it can break down at really large scales, such as gigabytes of data. In those cases, you might want a powerful statistical package like R – although you should probably instead re-examine your use case to evaluate whether that data set really will work for threat intelligence.

Validation means trying to eliminate as many spurious alerts as possible, so that you don’t classify http://www.google.com/webhp?hl= as a malicious URL or look for malware with the MD5 hash d41d8cd98f00b204e9800998ecf8427e (the result for the empty string). Maltego does this particularly well, not just because of its visualization abilities but also because of its ability to run transforms (“lookups”) on data sets. As an example, imagine you have a set of suspicious IP addresses. Before leaping to conclusions, you need to know who or what those addresses represent. You can dump the addresses into Maltego and quickly determine their reverse DNS, WHOIS contact info, and a good estimate of their geolocation. From here, you can look over the data to determine whether you’ve possibly cast your net a little too widely, then look through proxy logs based on the reverse DNS lookup. Conversely, if you have a list of DNS names, you can quickly resolve those to IP addresses for blocking in a firewall or looking up in other data sources.
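A small Python sketch of that first kind of check, under some loud assumptions: the allowlist contents and the function names are hypothetical, and a blanket allowlist for a large domain can also hide genuinely malicious content hosted there, so treat it as a prompt for review rather than a verdict. The empty-input digests are computed at import time instead of being typed in by hand.

```python
import hashlib
from urllib.parse import urlsplit

# Digests of zero-byte input; an "indicator" matching one of these
# almost always comes from a broken hashing step upstream.
EMPTY_DIGESTS = {
    hashlib.md5(b"").hexdigest(),
    hashlib.sha1(b"").hexdigest(),
    hashlib.sha256(b"").hexdigest(),
}

# Hypothetical allowlist of domains you would never block outright.
BENIGN_DOMAINS = {"google.com"}


def is_spurious_hash(value):
    """True when the hash is just the digest of empty input."""
    return value.strip().lower() in EMPTY_DIGESTS


def is_spurious_url(url):
    """True when the URL's host is an allowlisted domain or a subdomain of one."""
    host = (urlsplit(url.strip()).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BENIGN_DOMAINS)
```

Both of the examples from the paragraph above would be flagged by these checks and dropped before they could generate alerts.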

To take a related example, the other day I found traffic to a set of addresses that needed a closer look:

72.21.203.26
74.125.113.108
74.125.113.109
74.125.115.108
74.125.115.109
74.125.157.108
74.125.157.109
74.125.65.108
74.125.65.109
98.139.215.231

After pasting these into Maltego and looking for reverse DNS, contact info, and geolocation, I had this graph:

Maltego graph based on IP addresses


So these turned out to be Gmail and Yahoo! Mail servers plus what turns out to be an AWS server. This makes sense given the fact that I saw TCP 993 traffic (Secure IMAP).
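If you don't have Maltego handy, the reverse DNS part of that lookup can be approximated with the Python standard library. This is only a stand-in, not what Maltego does internally, and it covers neither WHOIS nor geolocation. The function name `reverse_dns` and the injectable `resolver` parameter are my own additions so the lookup can be stubbed out when testing offline.

```python
import socket


def reverse_dns(addresses, resolver=socket.gethostbyaddr):
    """Map each address to its reverse DNS name, or None if the lookup fails."""
    names = {}
    for addr in addresses:
        try:
            # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
            names[addr] = resolver(addr)[0]
        except OSError:
            # socket.herror and socket.gaierror are both OSError subclasses.
            names[addr] = None
    return names
```

Calling `reverse_dns(["72.21.203.26", "74.125.113.108"])` would perform live lookups, so the results depend on whatever PTR records exist at the time you run it.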

Good pre-processing eliminates a lot of the pain when getting further along in your workflow and allows you to have more confidence when deciding to pre-emptively block an address or incur the performance cost on a system by looking for specific hashes.

Shrinking my Googleprint

As you can imagine, the recent revelations about Google doing bad things with Safari (and now IE too) have driven me to question why we share so much data, though in a larger context. The New York Times recently published a spectacular article about data mining by retailers, for example: a teenager hadn’t yet confessed to her father that she’d gotten pregnant, and he discovered this upon seeing ads from Target for her based on purchases that might have seemed otherwise innocuous. I don’t believe that we’ve reached the end of the road for privacy intrusions, either. Google has a long history of accusations of evil. I’ve tried to make excuses, but once is an accident, twice is a coincidence, and thrice is a conspiracy.

Choice-making

I’ll allow for power differentials here: despite the recent Path fiasco, that doesn’t look like a major issue because users can decide to avoid that network. Similarly, we can choose not to shop at Target or use a “loyalty card”, although residents in small-town areas may have limited choices. But Google pervades too much of the Internet for us to avoid it completely, especially for people like me who have loyally stuck with them for years now. Still, what if we try to reduce our “Googleprint”? As a side note, take a look at the excellently-named Data Liberation Front for moving data out of Google. This post focuses on what to use instead, but the DLF may help a lot of folks along the way.

We can start with some easy things. And fortunately (?), we Morlocks have additional options not open to the Eloi. (That’s part of the problem, I suppose, but fixing that lies way outside the scope of this post.) Even though dropping Google completely would incur a lot of pain, we can look at starting to make changes in important areas.

  • Latitude just cannot continue to work for me. While I only rarely shared my check-ins publicly, I did use it quite a bit to track my location for later analysis. For now, I have suspended that project until I can figure out a better way to do it.
  • Chrome has an obvious substitute in Firefox, albeit inferior in several ways. (Why should I have to choose nice-looking fonts in Linux over privacy? What decade is this?)
  • Search has a number of competitors; DuckDuckGo has gotten a lot of attention lately. And I can stay logged out of Google for the times I do want to use it for searching, then use a private browsing window or even a dedicated alternate browser.
  • Gmail requires effort: a price I will pay. I’ve used Google Apps to host mail for my private domain for years. My wife uses that interface directly, and my account just forwards over to my regular Gmail account, which I’ve had for nearly a decade now. I can move to an alternate hosting provider of some sort. Hushmail looks good at the moment, but I haven’t really started the research. Anti-spam measures seem to prevent me from hosting my mail completely, like via EC2 or similar. Apart from the really nice handling of “conversations” (threads), I don’t think I’d miss too much.
  • Reader doesn’t have an exact analogue anymore with the demise of Bloglines, although I may still find one. However, I will try an alternate workflow here by combining Yahoo! Pipes and Paper.li to get something a little more modern and focused.
  • Plus doesn’t really need an alternative, at least past Twitter. Despite my enthusiasm for it at first, lately that’s waned for different reasons. The gaming community over there has thrived and I’ve found lots of people with whom to discuss my hobby. But lately, I just haven’t played MMORPGs like I did, except for the first month of SWTOR, and Mass Effect 3 doesn’t launch for a few more weeks. I might check in there again sometime, but it doesn’t really matter much. Twitter does a pretty decent job as a lightweight replacement, albeit with less deep discussion.
  • Docs has a well-known competitor, Zoho, but a good wiki might fill most of my needs that Evernote can’t already handle. I don’t use this service nearly as often as I did in the past, and only spreadsheets still give me pause.
  • OpenID providers exist all over the web. Even better, I can do that myself.
  • Voice provides a real sticking point. I like the ability to manage my voice and SMS communications with such granularity. Skype doesn’t really do the same thing, and apparently other providers have spotty records. I might dump this one last.
  • Android may have a competitor in iOS, but for me that’s not much of a choice. I don’t like Apple any more than I like Google, and owning thousands of dollars worth of Android systems provides a powerful reason not to switch immediately. I will continue to use this OS for now and watch this space in the future.

Action this day

In any case, I think I’ll start by looking for a new mail provider, as well as setting up a new reading workflow. Firefox will take some additional tweaking before I feel like it can handle the big-time, particularly on Windows where malware protection matters a great deal. Setting up an OpenID provider looks like a fun project all on its own anyway. Therefore, my current choices look like this:

  • Latitude → nothing
  • Chrome → Firefox
  • Search → DuckDuckGo
  • Gmail → Hushmail
  • Reader → Pipes + Paper.li
  • Plus → Twitter
  • Docs → self-hosted wiki plus Evernote (or Zoho)
  • OpenID → self-hosting

Voice and Android will remain as-is for now. But one key difference for the future: I’m willing to pay for services to avoid advertising, as well as to keep promising startups from tanking. In fact, I’d rather pay you an appropriate subscription fee than deal with incessant ads and loss of personal data. Call it the public radio model: I’ve had a membership to my local public radio station for years. I’ve kicked in money to community web sites when they needed it, and I’ve bought stuff from web comics to help them thrive. I happily do the same for services like Kanbanery that provide significant value to me.

I’ll post again in the future with lessons as I learn them, including services I may have forgotten this time around.

Does Google exploiting browsers qualify as evil?

No one could properly characterize me as a Google opponent. I’ve used Google for many, many years, and much of my online activity lives in their ecosystem: Reader, Docs, Plus, Mail, Android, Voice, Currents, Chrome, etc. But the news of Google using a bug/feature in Safari to bypass privacy settings troubles me. At some point we have to draw the line and stop falling back.

A little melodramatic in this context? Sure. But where do we draw the line? By auto-submitting an empty form, Google could set a third-party cookie on a browser even when the user had enabled settings to prevent that. (This is a step I usually take in my browser settings, myself.) From that point, Google can then track users across all sites that use their ads. Apparently, other ad networks do the same thing, though we typically try to hold Google to the higher ethical standard they set for themselves: “don’t be evil”.

To be fair, Google sees it differently. In part, they state:

The Journal mischaracterizes what happened and why. We used known Safari functionality to provide features that signed-in Google users had enabled. It’s important to stress that these advertising cookies do not collect personal information.

I have trouble with the phrase “known Safari functionality”. This sounds to me like they excuse their activity in part based on not using a 0-day browser vulnerability. They also state that they allow users to opt out of the behavior with their Ad Preferences Manager. I find that just as inexcusable, because they basically say that they’ll respect an opt-out setting on their site but not coming directly from the browser.

At a minimum, I need to start evaluating the pain of moving off of Google’s platform.

UPDATE: EFF said it better.

Turning the tables with OSINT

Baby platypuses with fedoras

You never know who's on the case

The SANS ISC originally started as a place to share threat intelligence and analysis, partly based on DShield data and partly based on near-real-time input from the wider network security community. These days, it principally acts as a network security blog with little connection to active threat intel, though it does highlight patch releases.

Earlier in the week, an ISC post on OSINT tactics grabbed my attention. While we await the imminent release of the new version of Maltego (and CaseFile), other tools can help as well. FOCA (Spanish-language site) handles metadata parsing from local documents and simplifies using Google for finding interesting documents on a site, aka “Google hacking”. Apparently, it can also try some direct connections (like HTTP brute-forcing and DNS enumeration).

On a related note, as seen in the comments on that post, Cryptome released a DHS document this year entitled “Publicly Available Social Media Monitoring and Situational Awareness Initiative Update”. This really just lists a lot of publicly available social media sites, tools, and aggregators. What you need might not exist, though, and it’s worth understanding APIs like the one Twitter provides. Unfortunately, the Google Social Graph API will go away this spring. I don’t know of any good replacements, but I’d love to find one.

While I have a few concerns related to civil liberties about DHS trolling through all of these, that doesn’t change the fact that your adversaries, regardless of affiliation or organization, will go about this. So while you should think about monitoring your own organization proactively, also consider the possibility and appropriateness of engaging in OSINT against them. Krypt3ia has explained this use of OSINT, though he’s not teaching you to find jihadists. That doesn’t mean, of course, that an intelligent, motivated analyst can’t research techniques and data on his own.

But this can include areas other than getting involved in geopolitical controversy. Perhaps you’re working on an investigation where you have at least some information on the attacker. In some cases, you may choose to take the additional step of gathering further data. (You also might want to consult with your legal counsel, depending on what you choose to do.) I have worked in the past on situations where we identified the attacker in great detail before notifying law enforcement. And because his OPSEC frankly sucked, we could do this through entirely legal and open methods. This distinguishes itself from “hacking back” by restraining itself to gathering information through public data sources only, rather than engaging in any vulnerability exploitation or accessing unauthorized systems and data.

Alternately, consider the uses for other types of investigations. Perhaps you’ve taken an interest in suspicions of local corruption, or you have some reason to believe that somebody on the Internet (e.g. a site, company, or charlatan^W evangelist / expert) has some dirty laundry that needs washing in the open air.

The world has changed, and we can either change with it or hide from it. If you’re already working in network security, only one choice makes any sense.

Musings on personal data mining

"Can House at Nettleton's First Shaft" by Garry

Unless you live in a Montana shack, you’ve heard concerns about governments and corporations mining your personal data for various purposes, not all of which you may like. Surveillance and marketing probably top that list. But, like in most other cases, we can use the basic approach and technology for good instead of evil.

If a pervasive culture of data gathering and access has already started to exist, what insights could we glean from collecting and mining our own personal data? Some obvious answers include health, social connections, news, purchases, locations, and more. So as a first pass, I’d like to look at doing something like the following:

  • Social media (Twitter, Google+, Delicious, blogging): What am I reading? What am I missing that might be more relevant than some things I read now? Who do I talk to? Where can my expertise be more useful?
  • Email: Am I handling it efficiently? What slips through the cracks? How can I process it more effectively?
  • Browser: Where is that article I read last week? Have there been any follow-ups to that story? Have I missed some relevant data sources? Do I waste too much time on some sites without getting enough value in return?
  • Transactions: Where do I spend my money? Which vendors get most of my money? Where should I cut expenses? Can I make my expense reporting for work more efficient?
  • Location: How much time do I spend in my commute? Would alternate routes be more effective? Could I improve my gas mileage?
  • Productivity: What sorts of tasks in my personal kanban get the most attention? Am I estimating task size properly? What keeps getting left behind? What have I not tracked but should?
  • Health data: Besides the obvious things like vital signs (weight, BP, etc.), how do my various choices correlate with my mental state? What times of the day work best for exercise and increased activity? What affects the quality of my sleep?

The really big value comes when you correlate this stuff. At least two dimensions make immediate sense here: time (maybe via an annotated, filtered timeline) and location (plotting social activity, purchases, etc. on a map). We could find more, of course, but those make good starting points.
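To make the time dimension concrete, here is a minimal Python sketch of merging several personal data sources into one timeline and then asking what else happened around a given moment. Everything in it is assumed for illustration: the `(timestamp, description)` event shape, the source names, and the function names `merge_timeline` and `events_near`.

```python
from datetime import datetime, timedelta


def merge_timeline(sources):
    """Merge {source_name: [(timestamp, description), ...]} into one sorted list."""
    merged = [
        (ts, name, text)
        for name, events in sources.items()
        for ts, text in events
    ]
    merged.sort(key=lambda item: item[0])  # chronological order
    return merged


def events_near(timeline, moment, window=timedelta(minutes=30)):
    """Return timeline entries that fall within `window` of `moment`."""
    return [item for item in timeline if abs(item[0] - moment) <= window]


# Hypothetical sample data.
sources = {
    "purchases": [
        (datetime(2012, 1, 14, 12, 30), "lunch"),
        (datetime(2012, 1, 14, 8, 5), "coffee"),
    ],
    "location": [(datetime(2012, 1, 14, 8, 0), "left home")],
}
timeline = merge_timeline(sources)
morning = events_near(timeline, datetime(2012, 1, 14, 8, 0))
```

A location dimension would work the same way, with a distance check in place of the time window.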

Of course, the core idea itself has been around for a while, but we’d want to approach it with security in mind. After all, if you gather all that information in one place, it needs good protection, both at rest and while processing it. This gets even more important when you consider financial data, location over time, and perhaps reading material. Privacy matters, and this entire project focuses on getting the benefits of our own data for ourselves rather than for others.

I have a few ideas of things I want to test over the long weekend, so I should report back next week on early results.

Data flow for personal consumption

This post is mostly for my benefit as I’m sorting out my information flow and consumption. But in addition to the meta-cognition of thinking about what I’m thinking about, I thought I might get some ideas from people. If this seems boring or overly pedantic, feel free to skip it, but I enjoy these sorts of things from time to time.

Input

So, like almost everybody else, I have a surplus of incoming data. The firehose unleashes as soon as I wake up:

  • Work email
  • Personal email
  • Twitter
  • Google+
  • Blogs
  • Reddit / Hacker News / occasional forum usage

Meatspace interactions should probably count here as well, but talking with my wife and kids, or the friendly barista who brews my soy latte, don’t need the same sort of management process. Depending on how much time I spend on the items in that list, or rather how much energy I choose to devote to them, that can become overwhelming. Some of them offer more value or take higher priority. For example, work email gets much more of my attention than Reddit (most days).

Tools

In order to handle that flow, I have several tools with which I’ve grown comfortable (and a few others that I use for experimentation).

This lets me filter and organize diverse inputs, possibly collating them into several tools (e.g. blogs -> RSS feeds -> Google Reader) or even structuring data that may not be presented as such. Yahoo! Pipes in particular may need replacement soon, as I haven’t set up any new projects with it in a while.

Outputs

Sometimes, I want to share what I’ve come across. This might be for fun or it might be due to work needs. Other times, I end up producing something as I integrate and synthesize this information (like in a blog post or internal analysis).

  • Work email
  • Personal email (rare)
  • Blog post
  • Internal document or other work product
  • Sharing (Google+, Twitter)
  • Link blog / social bookmarking

I notice that nothing here really comes from Reddit and Hacker News. That stuff mostly just goes straight to internal consumption; I certainly don’t share back there much except for the occasional comment and really occasional link submission.

Process

I really need to stay focused on continual improvement here, because the real bang for the buck comes from focusing on things that matter. The best example of this? Eliminating almost all Internet fora (message boards) has helped, not just in terms of time spent but also in my general mental state.

However, I make a point of starring things in Twitter or Reader that deserve more attention than I can give at the moment. Emails get flagged for attention so that they show up in my Outlook Tasks, or perhaps get added to my personal kanban. If I’ve read it and think it might be worth someone else’s time, I’ll share it via Delicious. If I think I’d like to invite some discussion on it or find it particularly awesome, I’ll share on Twitter or Google+ (rarely both as I don’t have much intersection between my networks).

When I notice that some class of input seems to require more manual processing than it should, I look for ways to streamline it. That might mean a rule in Outlook or assigning an OIB label, or finding an appropriate method to automate its processing. Like any other optimization process, this usually involves looking for the best bang for the buck — including possibly dropping the input altogether if it doesn’t give enough value.

As part of my job, I often handle incoming threat (or risk) intelligence, including via internal methods like an FS-ISAC alert or via my own open source monitoring. That’s a special case and one I’ll tackle in a future article due to its sensitive and specialized nature.

Spyware Chrome extensions considered harmful

'Carefull what you wish for' by Robbert van der Steeg

Messing around on Reddit tonight, I found a post that disturbed me greatly – not in the usual sense people mean for Reddit.

According to khoker, the Smooth Gestures extension for Chrome is spyware.


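function pl_track(){
    // Skip pages loaded over HTTPS.
    if (window.location.protocol == "https:") return;
    // Only run in the top-level window, not inside frames.
    if (window === window.top)
    {
        // Add the tracking image only once per page.
        if (!document.getElementById('hummingtrack'))
        {
            trackerimg=document.createElement('img');
            trackerimg.id="hummingtrack";
            // Request a 1x1 image, passing the current page's hostname in "events".
            trackerimg.src="http://www.smoothgesturesapp.com/tracking/tracking_ss.gif?events="+window.location.href.split(/\/+/g)[1]+"&r="+Math.random();
            trackerimg.height="1";
            trackerimg.width="1";
            document.body.appendChild(trackerimg);
        }
    }
}
// As written, this calls pl_track() immediately and hands its return value to setTimeout.
setTimeout(pl_track(),1500);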

If somebody has a reasonable explanation for this other than ‘spyware’, I’d love to hear it.
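To see what actually leaves the browser, it helps to reproduce the one interesting expression in that snippet. The Python sketch below mimics it with an equivalent regular-expression split. The function name `reported_value` and the example URLs are hypothetical, and this is a stand-in for the JavaScript behaviour rather than code from the extension.

```python
import re


def reported_value(page_url):
    """Return what the snippet would put in the 'events' parameter for a page."""
    if page_url.startswith("https:"):
        return None  # the snippet returns early on HTTPS pages
    # Same idea as href.split(/\/+/g)[1]: split on runs of slashes and
    # take the second piece, which is the host (plus port, if any).
    return re.split(r"/+", page_url)[1]
```

So a visit to a made-up internal page such as `http://intranet.example.com/payroll/view?id=7` would report `intranet.example.com`: not the full URL, but enough to list every non-HTTPS site you visit.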

The Google Code issue has quite a few comments discussing it further, and you may wish to report it to Google as I did.

For myself, I’ve disabled it for now until this gets resolved. I use my browser for internal corporate stuff as well, and I don’t think anybody needs to know about those sites (though I don’t particularly care about them seeing me waste time on G+ and Reddit :P ). This is very sneaky and potentially illegal. At the least, it’s almost certainly a violation of their terms of service with Google.