Monday, December 19, 2011

Sprint recklessly exposed Carrier IQ logged URL data to easy government access

In recent weeks, there has been considerable controversy around Carrier IQ and the data collected by the firm and the wireless phone companies that have partnered with it. Now that class action lawsuits have been filed, and the FTC is reportedly probing the company, one of the most important questions will be: What is the harm?

As I will attempt to argue in this blog post, by allowing Carrier IQ to collect and retain private user data (such as the URLs of pages viewed), Sprint recklessly exposed this sensitive information, which the government would normally need a court order to obtain, to access with a mere subpoena.

Last week, technical experts Ashkan Soltani and Peter Eckersley reported that Carrier IQ's software was, in some cases, collecting keystrokes and the contents of (SMS) text messages. A 19-page report (pdf) released by Carrier IQ confirmed the researchers' claims, putting the blame on a technical bug and accidental overlogging by Sprint or HTC.

For the purpose of this blog post, let's give Carrier IQ the benefit of the doubt. It is sufficient to focus our attention on one form of intentional data collection that Carrier IQ and its partner Sprint have acknowledged: the URLs of websites visited by handset owners. [There are other kinds of data that the company has intentionally logged too, for example location data, but we don't know as much about those right now, so I'm focusing my analysis on URLs.]

Carrier IQ and Sprint: Yeah, we log URLs

In a letter to Senator Franken (pdf) last week, Carrier IQ acknowledged that its software has been used by one wireless carrier to collect the URLs of webpages viewed by subscribers:
Embedded versions of IQ Agent allow for the collection of URLs if requested by a Network Operator in a profile. These can be collected together with performance metrics so that Network Operators can determine how devices on its network perform for specific web sites... The profile specified by the Network Operator and loaded on the device dictates if this information is actually gathered. The IQ Agent cannot read or copy the content of a website. Only one of Carrier IQ's customers has requested a profile to collect URLs of websites visited on devices on its network.

In its letter to Senator Franken (pdf), Sprint acknowledged that it was the wireless carrier that collected URLs:
Sprint already knows the website of a URL of a website that a user is trying to reach from routing the request on its network. This information may be collected through the Carrier IQ software as part of a profile established to troubleshoot website loading latencies or errors experienced by a population of subscribers.

Let us ignore the fact that, in the same letter, Sprint falsely denies collecting users' search query information (the search terms are right there in the Google/Bing URL); that it failed to disclose that, through Carrier IQ, Sprint collects the URLs of webpages viewed over encrypted HTTPS connections, which it could never learn by watching its network; and that it probably also obtains, through Carrier IQ, the URLs accessed by handset owners when they are using WiFi and not Sprint's network. While these are interesting points (and show that Sprint is either lying to a Senator, or that its legal team is embarrassingly ignorant about technology), they are unnecessary for our analysis.
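To see why the search query denial is nonsense, consider what a logged URL actually contains. Here is a minimal sketch using Python's standard library query-string parser; the example queries are invented for illustration:

```python
from urllib.parse import urlparse, parse_qs

# Why "we log URLs" implies "we log search terms": Google and Bing
# carry the user's query inside the URL itself, in the "q" parameter.
# The example URLs below are invented for illustration.
for url in (
    "https://www.google.com/search?q=divorce+lawyer+cheap",
    "https://www.bing.com/search?q=hiv+test+anonymous",
):
    query = parse_qs(urlparse(url).query)
    print(query["q"][0])  # prints the search terms, decoded
```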

It is also worth mentioning, although similarly unnecessary for our analysis, that Sprint's Electronic Surveillance Manager revealed in comments at the ISS World surveillance conference in 2009 that Sprint allows its marketing department to look through the logs of URLs viewed by its subscribers:
On the Sprint 3G network, we have IP data back 24 months, and we have, depending on the device, we can actually tell you what URL they went to ... If [the handset uses] the [WAP] Media Access Gateway, we have the URL history for 24 months ... We don't store it because law enforcement asks us to store it, we store it because when we launched 3G in 2001 or so, we thought we were going to bill by the megabyte ... but ultimately, that's why we store the data ... It's because marketing wants to rifle through the data.

Legal protections for URL data under US privacy law

It is beyond a cliche at this point to complain that our primary electronic privacy law dates from 1986, and hasn't been substantially updated since. This law varies the legal protections it offers not only based on whether data is content or non-content, but also based on what kind of company is holding the data.

As a Sprint customer, I am obviously unhappy about the fact that the company voluntarily logs and retains the URLs that subscribers visit, records which are subsequently available to the government. However, I can take at least a tiny bit of comfort from the fact that the Electronic Communications Privacy Act requires a court order issued under 18 USC 2703(d) before Sprint can be forced to disclose these records to law enforcement agencies.

Furthermore, if Sprint wished to do so, it could probably argue that URLs contain communications content, and thus should only be disclosed pursuant to a probable cause warrant. [DOJ has acknowledged in its Search and Seizure manual that URLs can contain content, at least in the context of real-time intercepts via a pen register.] However, given Sprint's general pro-government approach to privacy, I wouldn't expect them to lift a finger to protect their customers.

Carrier IQ and ECPA

What about Carrier IQ? Does the government need a court order to get URLs when held by the company?

To be considered a "remote computing service" (RCS) or an "electronic communication service" (ECS) provider under the Electronic Communications Privacy Act (ECPA), you need to actually provide services to the public. Carrier IQ does not do this -- its customers are wireless carriers. On this point alone, user data held by Carrier IQ is simply not subject to the limited protections of ECPA.

Furthermore, even if we ignore the important requirement relating to providing services to the public, a service provider also has to actually provide the ability to send or receive a user's communications for it to be considered an ECS under the law. See Sega Enterprises Ltd. v. MAPHIA, 948 F. Supp. 923, 930-31 (N.D. Cal. 1996) (video game manufacturer that accessed private email of users of another company's bulletin board service was not a provider of electronic communication service); State Wide Photocopy, Corp. v. Tokai Fin. Servs., Inc., 909 F. Supp. 137, 145 (S.D.N.Y. 1995) (financing company that used fax machines and computers but did not provide the ability to send or receive communications was not provider of electronic communication service).

Since Carrier IQ is merely covertly logging the URLs that consumers are viewing, rather than actually delivering web pages to the end user, it isn't covered under ECPA on this ground either.

So what?

As Carrier IQ is neither an RCS nor an ECS under ECPA, any data held by the company can be obtained by the government with a mere subpoena (and potentially, though I'm less sure of this, by a civil litigant too, such as a divorce lawyer).

As Sprint opted to have user data sent to Carrier IQ, where it was held for 30-45 days, rather than having the Carrier IQ software send the data directly to Sprint's servers, I believe that Sprint recklessly exposed this private information to easy access by the government without a court order. There are plenty of ways that the company could have guaranteed that this data would always remain protected under ECPA -- but it didn't do so.

Likewise, while Sprint claims in its letter to Senator Franken that its privacy policy tells customers that it collects information about the sites they visit, it never discloses to subscribers that this private data is collected and stored by a third party, or the important ways this enables government access to that data. Sprint needlessly kept its customers in the dark about how the firm was exposing their data to government access.

In its letter to Senator Franken, Carrier IQ denied getting any requests from law enforcement agencies for user data. Sprint had to issue a much more delicately worded statement: it has not disclosed Carrier IQ data to law enforcement (the reason for this careful wording, I suspect, is the presence of 110 employees in Sprint's Electronic Surveillance team who do nothing but supply user data to law enforcement and intelligence agencies).

Although the recent FOIA response that MuckRock received suggests that the FBI has at least some interest in Carrier IQ data, if we rely on the statements of Carrier IQ and Sprint, then, at least as it relates to URL data, the risks I have described in this blog post are largely theoretical. Even so, that doesn't change the fact that Sprint has demonstrated an extremely cavalier attitude towards user privacy.

In a best case scenario, Sprint's legal team simply didn't consider the ECPA/law enforcement related implications of using Carrier IQ's technology. In a worst case scenario, they knew what they were doing, and didn't care. In either case, the company should be held responsible.

Friday, December 16, 2011

Commerce Dept: export licenses for intercept tech have "exploded" over last 2-3 years

Earlier this year, the Commerce Department's Bureau of Industry and Security held a two-day Conference on Export Controls and Policy. It included a workshop specifically focused on the rules governing the export of encryption technologies (which include intercept equipment). The full transcript can be found here: part 1 (pdf), part 2 (pdf).

As a non-lawyer, and non-expert in export control regulations, I was pretty surprised to learn that the government already strictly regulates the export of covert communications surveillance technology. What this means, of course, is that the Commerce Department already has a list of every foreign buyer of US-made covert surveillance technology. Unfortunately, it won't provide this information to the public, and as far as I know, it won't provide it in response to FOIA requests.

In any case, reading through the transcript of the event, the following section caught my eye, as it specifically addressed the regulations that apply to surreptitious listening technology:

Michael Pender: Licenses [for "surreptitious listening" technology] are required for export to all end users, all destinations, and there's a general policy of denial.

The exceptions are for U.S. government agencies or communication-service providers there in the normal course of their business. So, if you're representing a U.S. law-enforcement agency and you're partnering with some other organization in another country and you need to send something out of the country, you know, contact us. Licenses are authorized for that situation.

If you represent a telecommunications company and you receive court orders for wiretaps from the local law enforcement and you have to comply with those court orders, you know, that's one of the few circumstances in which we can grant a license.

And you wouldn't think there would be that many licenses for these products in general in a year, but the rate at which they're coming in has just exploded over the course of the last 2, 3 years. I mean, I think I went from getting one a year to like five times as many, and then again, it's at least doubled or tripled in just the last year.

Friday, November 11, 2011

Twitter's privacy policy and the Wikileaks case

Summary: The federal judge in the Wikileaks case cited in his order a version of Twitter's privacy policy from 2010, rather than the very different policy that existed when Appelbaum, Gonggrijp and Jonsdottir created their Twitter accounts back in 2008. That older policy actually promised users that Twitter would keep their data private unless they violated the company's terms of service. It is unclear how the judge managed to miss this important detail.


Earlier this week, a federal judge in Virginia handed down an order in the high-profile Twitter/Wikileaks case. That order has already been widely covered by the media, so I won't summarize it here.

In ruling that Appelbaum, Gonggrijp and Jonsdottir did not have a reasonable expectation of privacy in the IP addresses that Twitter had collected, the judge specifically highlighted the existence of statements about IP address collection in Twitter's privacy policy.


(from page 3 of the order)

The judge noted that Twitter reveals in its privacy policy that it collects "many types of usage information, including physical location, IP address, browser type, the referring domain ..." To support this claim, the judge cited the "Bringola declaration" (pdf), which is a collection of screenshots from Twitter's website produced by a paralegal working for Appelbaum's lawyer.

The privacy policy reproduced in the Bringola declaration and cited by the judge was effective as of November 16, 2010, and appears to have been the current privacy policy in March of 2011 when the paralegal made the screenshots. That privacy policy included the following "Log Data" section:

Our servers automatically record information ("Log Data") created by your use of the Services. Log Data may include information such as your IP address, browser type, the referring domain, pages visited, your mobile carrier, device and application IDs, and search terms. Other actions, such as interactions with our website, applications and advertisements, may also be included in Log Data. If we haven’t already deleted the Log Data earlier, we will either delete it or remove any common account identifiers, such as your username, full IP address, or email address, after 18 months.

There is a slight problem with relying on a privacy policy created on November 16, 2010 to decide the reasonable expectation of privacy of these three individuals: They created their Twitter accounts several years before the document was written.

According to the useful website howlonghaveyoubeentweeting.com, Appelbaum's Twitter account was created on February 23, 2008, Gonggrijp created his on September 26, 2008, and Jonsdottir created hers on November 14, 2008.

Thankfully, Twitter seems to archive all the old versions of their privacy policy. It would appear that all three individuals would have "agreed to" (ignoring the fact that none of them likely read the thing in the first place) Version 1 of the privacy policy, dated May 14, 2007. The "Log data" section of that policy reads as follows:

When you visit the Site, our servers automatically record information that your browser sends whenever you visit a website ("Log Data" ). This Log Data may include information such as your IP address, browser type or the domain from which you are visiting, the web-pages you visit, the search terms you use, and any advertisements on which you click. For most users accessing the Internet from an Internet service provider the IP address will be different every time you log on. We use Log Data to monitor the use of the Site and of our Service, and for the Site's technical administration. We do not associate your IP address with any other personally identifiable information to identify you personally, except in case of violation of the Terms of Service.

There are a few things worth noting here:

  1. The term "referring domain" appears in the privacy policy cited by the judge in his court order, but not in Version 1 of the Twitter privacy policy. This strongly suggests that the judge is citing a newer version of the Twitter policy. The term appears to have been added in Version 2 of the privacy policy, dated November 18, 2009.
  2. In Version 1 of its policy, Twitter promised its users that it would not associate their IP addresses with any other personally identifiable information sufficient to identify them personally, unless they violated the Twitter terms of service. This pro-user sentence was removed when Version 2 of Twitter's privacy policy took effect in November 2009.
  3. The government has not alleged that any of the 3 individuals violated Twitter's terms of service. As such, it would appear that they could reasonably rely on Twitter's claims that it wouldn't associate their retained IP address information with their existing account records or any other personally identifiable information.

This is very interesting.

The old version of Twitter's policy that the three individuals "agreed" to also includes the following paragraph about updates to the document:

This Privacy Policy may be updated from time to time for any reason; each version will apply to information collected while it was in place. We will notify you of any material changes to our Privacy Policy by posting the new Privacy Policy on our Site. You are advised to consult this Privacy Policy regularly for any changes.

Note that Twitter didn't say it would send out emails to users when it updated its privacy policy; instead, it advised users to revisit the site on a regular basis to see if the policy had changed. How this sentence passed the laugh test at Twitter's HQ, I do not know.

In subsequent edits to the policy, Twitter reworded this section, so that it now reads:

We may revise this Privacy Policy from time to time. The most current version of the policy will govern our use of your information and will always be at https://twitter.com/privacy. If we make a change to this policy that, in our sole discretion, is material, we will notify you via an @Twitter update or e-mail to the email associated with your account. By continuing to access or use the Services after those changes become effective, you agree to be bound by the revised Privacy Policy.

Got that? As of Version 2 of Twitter's privacy policy, merely by continuing to use Twitter, you agree to be bound by whatever the company adds to the policy. Oh, and it is up to the company to decide if the changes to the policy are important enough to justify telling users.

I know that I am not the first researcher to point out how stupid privacy policies are, or that no one reads them. Many others have done it, and done so far more eloquently than me. My goal in writing this blog post is simple: Not only is a federal judge ruling that 3 individuals have no reasonable expectation of privacy with regard to the government getting some of their Internet transaction data, but the judge isn't even citing the right version of a widely ignored privacy policy to do so. If the judge were to examine the privacy policy that existed when these three targets signed up for a Twitter account, he might decide that they do in fact have a reasonable expectation of privacy and that the government needs a warrant to get the data.

Wednesday, November 02, 2011

Two honest Google employees: our products don't protect your privacy

Two senior Google employees recently acknowledged that the company's products do not protect user privacy. This is quite a departure from the norm at Google, where statements about privacy are usually thick with propaganda, mistruths and often outright deception.

Google's products do not meet the privacy needs of journalists, bloggers, small businesses (or anyone else concerned about government surveillance).

Last week, I published an op-ed in the New York Times that focused on the widespread ignorance of computer security among journalists and news organizations. Governments often have no need to try and compel a journalist to reveal the identity of their sources if they can simply obtain stored communication records from phone, email and social networking companies.

Will DeVries, Google's top DC privacy lobbyist, soon posted a link to the article on his (personal) Google+ page, and added the following comment:

I often disagree with Chris, but when he's right, he's dead right. Journalists (and bloggers, and small businesses) need to take a couple hours and learn to use free, widely available security measures to store data and communicate.

Let me first say that I really respect Will. Many of the people in Google's policy team default to propaganda mode when questioned. Will does not do this - he either speaks truthfully, or declines to comment. I wish companies would hire more people like him, as they significantly boost the credibility of the firm among privacy advocates.

Regarding Will's comment: If Google's products were secure out of the box, journalists would not need to "take a couple hours" to learn to protect their data and communications. Will does not tell journalists to ditch their insecure Hotmail accounts and switch to Gmail, or to ditch their easily trackable iPhones and get an Android device. Likewise, he does not advise people to stop using Skype for voice and video chat, and instead use Google's competing services. He doesn't do that, because if he described these services as more secure and resistant to government access than the competition, he'd be lying.

Google's services are not secure by default, and, because the company's business model depends upon the monetization of user data, the company keeps as much data as possible about the activities of its users. These detailed records are not just useful to Google's engineers and advertising teams, but are also a juicy target for law enforcement agencies.

It would be great if Google's products were suitable for journalists, bloggers, activists and other groups that are routinely the target of surveillance by governments around the world. For now, though, as Will notes, these persons will need to investigate the (non-Google) tools and methods with which they can protect their data.

Google's business model is in conflict with privacy by design

At a recent conference in Kenya, Vint Cerf, one of the fathers of the Internet and Google's Chief Internet Evangelist spoke on the same panel as me. We had the following exchange over the issue of Google's lack of encryption for user data stored on the company's servers (I've edited it to show the important bits about this particular topic - the full transcript is online here).

Me:

[I]t's very difficult to monetize data when you cannot see it. And so if the files that I store in Google docs are encrypted or if the files I store on Amazon's drives are encrypted then they are not able to monetize it....And unfortunately, these companies are putting their desire to monetize your data over their desire to protect your communications.

Now, this doesn't mean that Google and Microsoft and Yahoo! are evil. They are not going out of their way to help law enforcement. It's just that their business model is in conflict with your privacy. And given two choices, one of which is protecting you from the government and the other which is making money, they are going to go with making money because, of course, they are public corporations. They are required to make money and return it to their shareholders.

Vint Cerf:

I think you're quite right, however that, we couldn't run our system if everything in it were encrypted because then we wouldn't know which ads to show you. So this is a system that was designed around a particular business model.

Google could encrypt user data in storage with a key not known to the company, as several other cloud storage companies already do. Unfortunately, Google's ad supported business model simply does not permit the company to protect user data in this way. The end result is that law enforcement agencies can, and regularly do request user data from the company -- requests that would lead to nothing if the company put user security and privacy first.

Monday, September 19, 2011

The forces that led to the DigiNotar hack

Last week, the New York Times finally covered the DigiNotar hacks, more than two weeks after security experts and the tech media first broke the story. Unfortunately, the top 2-3 newspapers in the US (which is what legislative staff, regulators and policy makers read) have missed most of the important details. The purpose of this blog post is to fill in those gaps, providing key context to understand this incident as part of the larger Internet trust (and surveillance) debate.

Lawful access

As consumers around the world have embraced cloud computing, large Internet firms like Google, Facebook, Twitter and Yahoo, all of them based in the United States, increasingly hold users' most private documents and other data. This has been a boon for law enforcement agencies, which can often obtain these files without a court-issued search warrant, and without providing the investigated individual with the kind of prompt notice that would otherwise occur had their home been searched.

Law enforcement and intelligence agencies in the US, EU, Canada, Brazil, India, Japan, Israel and several other countries all regularly obtain private user data from Google. The company will insist on a court order for some kinds of user data, but will disclose many other types of data and subscriber records without first insisting on an order issued by an independent judge. This isn't because Google is evil, but because privacy laws in these countries, the US included, are so weak.

Google does not treat all governments equally though. For example, the company will not honor requests from the governments of Iran, Libya, Zimbabwe, Vietnam and several other countries. You might be inclined to believe that Google has taken this position because of the poor human rights record in these countries - that is part of the reason (but not the whole one, otherwise, Google would refuse requests from the US government which has a documented track record of assassination, rendition/kidnapping and torture). Google's policy of refusing these requests, I believe, largely comes down to the fact that Google does not have an office or staff in those countries. Without a local presence, employees to threaten with arrest or equipment to seize, these governments lack leverage over Google.

This situation is not specific to Google - Facebook, Yahoo, Microsoft and other large US firms all disclose user data to governments that have leverage over them, and ignore requests from others. Thus, lacking any "legitimate" way to engage in what they believe is lawful surveillance of their citizens, these governments that lack leverage have turned to other methods. Specifically, network surveillance.

An unintended consequence of HTTPS by default

When users connect to Facebook, Twitter, or Hotmail—as well as many other popular websites—they are vulnerable to passive network surveillance and active attacks, such as account hijacking. These services are vulnerable because they do not use HTTPS encryption to protect all data as it is transmitted over the Internet.

Such attacks are trivially easy for hackers to perform against users of an open WiFi network using tools like Firesheep. They are also relatively easy for government agencies to perform on a larger scale, when they can compel the assistance of upstream ISPs.
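To illustrate how "trivially easy" this is: when a site skips HTTPS, its session cookies cross the network in cleartext, and anyone on the same open WiFi network can read and replay them. A rough sketch of what a passive observer (or an auditor) sees, assuming a site that still hands out cookies over plain HTTP:

```python
import http.client

# Any Set-Cookie header delivered over plain HTTP is visible to
# everyone on the network path -- which is all Firesheep needed.
conn = http.client.HTTPConnection("example.com")  # note: HTTP, not HTTPS
conn.request("GET", "/")
response = conn.getresponse()
for name, value in response.getheaders():
    if name.lower() == "set-cookie":
        print("cleartext cookie:", value.split(";")[0])
```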

As I described above, because Google will not respond to formal requests for user data from certain governments, it is likely that the state security agencies in these countries have come to depend on network interception, performed with the assistance of domestic ISPs.

Unfortunately for these governments, in January 2010, Google enabled HTTPS by default for Gmail and a few other services. Once the firm flipped the default setting, passive network surveillance became impossible. Thus, in January 2010, the governments of Iran and a few other countries lost their ability to watch the communications of domestic Google users.

For now, these governments can still spy on Facebook, Twitter and Hotmail, as these services do not use HTTPS by default. That is changing though. Following the release of Firesheep in October 2010, (as well as two senior US government officials calling for encryption by default) all three services now offer configuration options to force the use of HTTPS. These firms are all moving towards HTTPS by default - for some firms, it will likely be a matter of weeks until it happens, for others, months.

Governments can see the writing on the wall - HTTPS by default will become the norm. Passive network surveillance will lose its potency as a tool of government monitoring, and once that happens, the state intelligence agencies will "go dark", losing the ability to keep tabs on their citizens' use of foreign, mostly US-based Internet communications services.

HTTPS Certificate Authorities and surveillance

As these large providers switch to HTTPS by default, government agencies will no longer be able to rely on passive network interception. By switching to active interception attacks, these governments can, in many cases, easily neutralize the HTTPS encryption, thus restoring their ability to spy on their citizens. One active attack, known as a "man in the middle" attack, requires that the government first obtain an HTTPS certificate issued by a Certificate Authority (CA) trusted by the major web browsers.
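The reason this attack works is that browsers accept a certificate chaining to any of their trusted roots, so a government-obtained certificate validates without warning. The only client-side tell is that the certificate's issuer quietly changes. Here is a minimal sketch of that issuer check; the pinned issuer string is an invented placeholder, not a real pin:

```python
import socket
import ssl

# If the issuer of a site's certificate suddenly differs from the one
# previously observed, either the site changed CAs or someone is in the
# middle with a different (but still browser-trusted) certificate.
PINNED_ISSUER = "Example Trusted CA"  # hypothetical pinned value

def issuer_common_name(host, port=443):
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    # "issuer" is a sequence of relative distinguished names
    for rdn in cert["issuer"]:
        for key, value in rdn:
            if key == "commonName":
                return value
    return None

observed = issuer_common_name("mail.google.com")
if observed != PINNED_ISSUER:
    print(f"issuer is {observed!r}; possible CA switch or interception")
```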

In March of 2010, Sid Stamm and I published a paper on what we called compelled certificate creation attacks, in which a government simply requires that a domestic Certificate Authority issue it one or more certificates for surveillance purposes. When we released a draft of our paper, we also published a product brochure for a Packet Forensics interception device, which I had obtained in the fall of 2009 at the ISS surveillance conference, that described how the device could be used to intercept communications using these kinds of certificates.

The browsers trust a lot of Certificate Authorities, probably too many. These include companies located in countries around the world, as well as Certificate Authorities operated by government agencies. For example, Microsoft trusts a couple dozen governments, including Tunisia and Venezuela. (It is perhaps worth noting that Microsoft continues to trust the Tunisian government even after it was caught in December 2010 actively hijacking the accounts of Facebook users, an act that led Facebook to enable HTTPS by default for all users in the country.)

In any case, as Sid and I described, governments can compel domestic Certificate Authorities to provide them with the certificates necessary to intercept their own citizens' communications. However, not all governments around the world are as lucky as Tunisia to be trusted by the browsers, nor do all of them have a domestic certificate authority that they can bully around. Some countries, like Iran, have no way to obtain a certificate that will let them spy on Google users (yes, I know that you can buy intermediate CA issuing powers, but I am assuming that no one will sell this to the Iranian gov).

In recent weeks, we have learned that the encrypted communications of 300,000 people in Iran were monitored by an entity using a certificate that DigiNotar issued. While the Iranian government has not admitted to conducting this man in the middle surveillance against its citizens, it seems reasonable to assume they were behind it. The reason for this certificate theft seems pretty clear, when you consider the other details described in this blog post:

Iran wants to spy on its citizens. It wants the same interception and spying capabilities that the US and other western governments have. Unfortunately for the Iranian government, it has no domestic CA, and Google doesn't have an office in Tehran. So, it used a certificate obtained by hacking into a CA already trusted by the browsers - a CA that had weak default passwords, and that covered up the attack for weeks after it learned about it, giving the Iranian government plenty of time to use the stolen certificate to spy on its citizens.

As Facebook, Twitter and other big sites embrace HTTPS by default, the temptation will grow for governments without other ways to spy on their citizens to hack into certificate authorities with weak security. Can you blame them?

NSA and other US government agencies have gambled with our security

In December 2009, after I had obtained Packet Forensics' product marketing materials, I met with a former senior US intelligence official. I told him that I believed that governments around the world were abusing this flaw to spy on their own citizens, as well as foreigners. When I told him I would be going public in a few months, motivated by my concerns about China and other governments spying on Americans, he said I would be aiding "terrorists in Peshawar" by helping to secure their communications. Needless to say, our meeting wasn't particularly productive.

US intelligence agencies have long known about the flaws associated with the current certificate authority web of trust. For example, in 1998, James Hayes, an Air Force captain working for the National Security Agency, published an academic paper in which he described the ease with which certificates could be used to intercept traffic:

Certificate masquerading allows a masquerader to substitute an unsuspecting server’s valid certificate with the masquerader’s valid certificate. The masquerader could monitor Web traffic, picking up unsuspecting victims’ surfing habits, such as the various net shopping malls and stores a victim may visit. The masquerader could change messages at will without detection, or collect the necessary information and go shopping on his or her own time.

Of course, it isn't too surprising that NSA has known about these vulnerabilities. If the agency hadn't known about these risks, it would have been grossly incompetent.

The question to consider then, is what has and hasn't the NSA done with this knowledge. In addition to attacking the computers of foreign governments, NSA is supposed to protect US government electronic assets. In the 10 years since NSA first acknowledged it knew about the problems with certificate authorities, what steps has the agency taken to protect US government computers from these attacks? Likewise, what has it done to protect US businesses and individuals?

The answer, I believe, is "nothing". The reason for this, I suspect, is that NSA wanted to exploit the flaws itself and didn't want to do anything that would lead to the elimination of what is likely a valuable source of intelligence information -- even though this meant that the governments of China, Turkey, Israel, Tunisia and Venezuela would have access to this surveillance method too.

Perhaps this was a reasonable choice to make, when the intelligence agencies abusing the flaw could be trusted to do so discreetly (The first rule of State-run CA Club is...). The Iranians have upset that delicate understanding. They have acquired and used certificates in a manner that is anything but discreet, thus forcing the issue to the front page of newspapers around the world.

Now, any state actor or criminal enterprise with a budget to hire hackers can likely get its hands on fraudulent certificates sufficient to intercept users' communications, as Comodo and DigiNotar will not be the last certificate authorities with weak security to be hacked. Hundreds of millions of computers around the world remain vulnerable to this attack, and will likely stay this way, until the web browser vendors decide upon and deploy effective defenses.

Had the US defense and intelligence community acted 10 years ago to protect the Internet, instead of exploiting this flaw, we would not be in the dire situation that we are currently in, waiting for the next hacked certificate authority, or the next man in the middle attack.

Thursday, August 04, 2011

Warrantless "emergency" surveillance of Internet communications by DOJ up 400%

According to an official DOJ report, the use of "emergency", warrantless requests to ISPs for customer communications content has skyrocketed over 400% in a single year.

The 2009 report (pdf), which I recently obtained via a Freedom of Information Act request (it took DOJ 11 months (pdf) to give me the two-page report), reveals that law enforcement agencies within the Department of Justice sought and obtained communications content for 91 accounts. This number is a significant increase over previous years: 17 accounts in 2008 (pdf), 9 accounts in 2007 (pdf), and 17 accounts in 2006 (pdf).

Background

When Congress passed the Electronic Communications Privacy Act in 1986, it permitted law enforcement agencies to obtain stored communications and customer records in emergencies without the need for a court order.

In such scenarios, a carrier can (but is not required to) disclose the requested information if it, "in good faith, believes that an emergency involving danger of death or serious physical injury to any person requires disclosure without delay of communications relating to the emergency." In practice, this good faith belief often amounts to nothing more than a police officer stating that an emergency exists.

With the passage of the USA PATRIOT Improvement and Reauthorization Act of 2005, Congress created specific statistical reporting requirements for the voluntary disclosure of the contents of subscriber communications in emergency situations. In describing his motivation for introducing the requirement, Representative Lungren stated that:

"I felt that some accountability is necessary to ensure that this authority is not being abused… This information [contained in the reports] I believe should be highly beneficial to the Committee, fulfilling our oversight responsibility in the future … this is the best way for us to have a ready manner of looking at this particular section. In the hearings that we had, I found no basis for claiming that there has been abuse of this section. I don't believe on its face it is an abusive section. But I do believe that it could be subject to abuse in the future and, therefore, this allows us as Members of Congress to have an ability to track this on a regular basis."

The current reports are deeply flawed

The emergency request reports are compiled and submitted by the Attorney General, and only apply to disclosures made to law enforcement agencies within the Department of Justice. As such, there are no statistics for emergency disclosures made to other federal law enforcement agencies, such as the Secret Service, or to state and local law enforcement agencies.

Furthermore, although 18 USC 2702 permits both the disclosure of the content of communications, as well as non-content records associated with subscribers and their communications (such as geo-location data), Congress only required that statistics be compiled for the disclosure of communications content. It is not clear why Congress limited the reports in this way.

Because the reporting requirements do not apply to disclosures made to law enforcement agencies outside the Department of Justice, and do not include the disclosure of non-content communications data and other subscriber records, the reports reveal a very limited portion of the scale of voluntary disclosures to law enforcement agencies.

Likewise, although Congress intended for these reports to assist with public oversight of the emergency disclosure authority, the Department of Justice has not proactively made these reports available to the general public. The reports for 2006 and 2007 were leaked to me by a friend with contacts on the Hill. I obtained the 2008 and 2009 reports via FOIA requests -- and disgracefully, it took DOJ 11 months to provide me with a copy of the 2-page report for 2009.

The emergency requests documented in these reports only scratch the surface

A letter (pdf) submitted by Verizon to Congressional committees in 2007 revealed that the company had received 25,000 emergency requests during the previous year. Of these 25,000 emergency requests, just 300 were from federal law enforcement agencies. In contrast, the reports submitted to Congress by the Attorney General reveal fewer than 20 disclosures for that year. Even though no other service provider has disclosed similar numbers regarding emergency disclosures, it is quite clear that the Department of Justice statistics are not adequately reporting the scale of this form of surveillance. In fact, they underreport these disclosures by several orders of magnitude.

The current reporting law is largely useless. It does not apply to state and local law enforcement agencies, who make tens of thousands of warrantless requests to ISPs each year. It does not apply to federal law enforcement agencies outside DOJ, such as the Secret Service. Finally, it does not apply to emergency disclosures of non-content information, such as geo-location data, subscriber information (such as name and address), or IP addresses used.

As such, Congress currently has no idea how many warrantless requests are made to ISPs each year. How can it hope to make sane policy in this area, when it has no useful data?

Friday, June 24, 2011

Privacy preserving FOIA lawsuits

Several weeks ago, after an extremely successful online fundraising effort to cover the costs, I filed a FOIA complaint in Washington, DC Federal District Court.

Before filing the complaint, I looked through the court website and paid particular attention to a document posted there, titled Information for Parties Who Wish to File a Civil Complaint (pdf), which states:
The name of this Court must be written at the top of the first page [of the complaint]. The complete name and address for each plaintiff must be included in the caption of the complaint. A Post Office Box is insufficient as an address, unless you file a separate motion asking the Court to permit such an address.
Since moving to Washington DC, I've tried to keep my residential address out of databases, primarily by using a PO Box for everything possible. As such, I wasn't too keen on my home address showing up in a public court docket. Following the guidance given by the court, I put my PO box address on my FOIA complaint and filed an accompanying Motion To Include PO Box Address on Complaint.

Two weeks later, I called the court clerk to find out the status of the case and was told that my motion had been rejected and that my complaint and all the accompanying documents had been sent back to me.

The clerk didn't actually tell me why the motion had been rejected, so as soon as I returned to DC, I refiled the complaint with my home address, which was promptly docketed by the clerk.

Several days later, an envelope from the clerk arrived in the mail, which included a copy of the motion that I had filed. Written on it was a note by Judge Royce Lamberth, informing me that my motion was denied, but that the court would reconsider it if I provided my residence address to be filed under seal for the court and defendants.

This news came too late for me -- my home address is now in the DC court docket (something I am still rather upset about), but perhaps this information will be useful to others.

Motion for PO Box Denied

Tuesday, May 24, 2011

Senators hint at DOJ's secret reinterpretation and use of Section 215 of the Patriot Act

Summary

According to two Democratic Senators, the Department of Justice has secretly reinterpreted a controversial provision contained in the USA Patriot Act to give the government surveillance powers that are "inconsistent with the public’s understanding of these laws." The senators also accuse DOJ of misleading the American public when describing the use of this legal authority.

This disclosure builds on previous cryptic statements from DOJ officials regarding the use of "Section 215" powers for a "sensitive collection program," and from Senator Russ Feingold regarding repeated abuses of Section 215 that he was not permitted to publicly describe.

Although FBI Director Robert Mueller revealed earlier this year that the FBI has used Section 215 powers to monitor the sale of hydrogen peroxide, such data collection is unlikely to be the "sensitive collection program" about which several senators have tried to alert the public.

If I had to make a wild guess, I suspect it is likely related to warrantless, massive scale collection of geo-location information from cellular phones.


Secret reinterpretations of the law

Marcy Wheeler reported this evening that Senators Wyden and Udall, both of whom are on the Intelligence Committee, have submitted an amendment (pdf) as part of the rushed, bipartisan effort to reauthorize the Patriot Act. The amendment is noteworthy not because of the changes to the law it proposes, but because of the information it reveals:

(6) United States Government officials should not secretly reinterpret public laws and statutes in a manner that is inconsistent with the public’s understanding of these laws, and should not describe the execution of these laws in a way that misinforms or misleads the public;

(7) On February 2, 2011, the congressional intelligence committees received a secret report from the Attorney General and the Director of National Intelligence that has been publicly described as pertaining to intelligence collection authorities that are subject to expiration under section 224 of the USA PATRIOT Act (Public Law 107–56; 115 Stat. 295); and

(8) while it is entirely appropriate for particular intelligence collection techniques to be kept secret, the laws that authorize such techniques, and the United States Government’s official interpretation of these laws, should not be kept secret but should instead be transparent to the public, so that these laws can be the subject of informed public debate and consideration.

For those of you who don't read legalese, this means that the Department of Justice has secretly reinterpreted a controversial provision in the Patriot Act, likely Section 215, and is using it in a way that is inconsistent with the public's understanding of the law.

DOJ has already admitted that Section 215 is being used for a "sensitive collection program"

On September 22, 2009, Todd Hinnen, then the Deputy Assistant Attorney General for law and policy in DOJ’s National Security Division, testified before the House Judiciary Subcommittee on the Constitution, Civil Rights, and Civil Liberties in support of the reauthorization of key provisions of the USA PATRIOT Act.

During his oral testimony, Mr. Hinnen stated that:
"The business records provision [Section 215] allows the government to obtain any tangible thing it demonstrates to the FISA court is relevant to a counterterrorism or counterintelligence investigation.

This provision is used to obtain critical information from the businesses unwittingly used by terrorists in their travel, plotting, preparation for, communication regarding, and execution of attacks.

It also supports an important, sensitive collection program about which many members of the subcommittee or their staffs have been briefed."

Section 215 has been repeatedly abused

On October 1, 2009, Senator Feingold made several statements regarding abuses of Section 215 during a Senate Judiciary Committee markup hearing:

"I remain concerned that critical information about the implementation of the Patriot Act remains classified. Information that I believe, would have a significant impact on the debate..... There is also information about the use of Section 215 orders that I believe Congress and the American People deserve to know. It is unfortunate that we cannot discuss this information today.



Mr Chairman, I am also a member of the intelligence Committee. I recall during the debate in 2005 that proponents of Section 215 argued that these authorities had never been misused. They cannot make that statement now. They have been misused. I cannot elaborate here. But I recommend that my colleagues seek more information in a classified setting.



I want to specifically disagree with Senator Kyl's statement that there haven't been abuses of the other provisions which are sunsetted. That is not my view of Section 215. I believe Section 215 has been misused as well."

Likewise, after the Senate rejected several reforms of Section 215 powers in 2009, Senator Durbin told his colleagues that:
"[T]he real reason for resisting this obvious, common-sense modification of Section 215 is unfortunately cloaked in secrecy. Some day that cloak will be lifted, and future generations will whether ask our actions today meet the test of a democratic society: transparency, accountability, and fidelity to the rule of law and our Constitution."
Conclusion

Clearly, there are many unanswered questions. We do not know what kind of data collection is occurring, or why it is problematic enough to cause four senators to speak up publicly. But the fact that four senators have now done so strongly suggests that there is something seriously rotten going on.

Tuesday, May 03, 2011

Industry-created "privacy enhancing" abandonware

Industry loves self-regulation, and why shouldn't it? Given the choice between strong enforcement by a federal agency and scout's-honor promises, industry would be foolish to support a strong FTC.

Unfortunately, the self-regulatory groups and organizations that are created in response to the threat of regulation are often extremely short lived.

Pam Dixon noted this in her comment (pdf) submitted in response to the FTC's recent privacy report:
[I]ndustry knows that the Commission’s attention span is limited. When the Commission showed interest in online privacy in the years before 2000, industry responded by developing and loudly trumpeting a host of privacy self-regulatory activities. Most of these activities were strictly for the purpose of convincing policy makers at the Commission and elsewhere that regulation or legislation was a bad idea. All of these activities actually or effectively disappeared as soon as new appointees to the Commission demonstrated a lack of interest in regulatory or legislative approaches to privacy.

[These include:]

The Individual Reference Services Group (IRSG) was announced in 1997 as a self-regulatory organization for companies that provide information that identifies or locates individuals. The group terminated in 2001.

The Privacy Leadership Initiative began in 2000 to promote self regulation and to support privacy educational activities for business and for consumers. The organization lasted about two years.

The Online Privacy Alliance began in 1998 with an interest in promoting industry self regulation for privacy. OPA’s last reported activity appears to have taken place in 2001, although its website continues to exist and shows signs of an update in 2011.

The Network Advertising Initiative had its origins in 1999, when the Federal Trade Commission showed interest in the privacy effects of online behavioral targeting. By 2003, when FTC interest in privacy regulation had evaporated, the NAI had only two members. Enforcement and audit activity lapsed as well. NAI did nothing to fulfill its promises or keep its standards up to date with current technology until 2008, when FTC interest increased.

Industry created privacy enhancing software is made for regulators, not consumers

A few weeks ago, Ryan Singel at Wired wrote about Google's curious lack of support for Do Not Track (DNT). Rather than embracing the DNT header supported by the three other major browser vendors, Google is instead pushing the third-party browser plugins it has released that make it possible for consumers to retain their opt-out cookies.
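For readers unfamiliar with the mechanism: the DNT header is about as simple as web technology gets, which makes Google's reluctance all the more curious. The browser attaches one extra header to every request, and honoring it is entirely up to the receiving server. A minimal sketch:

```python
import urllib.request

# Do Not Track is just an HTTP request header; the server is free to
# honor it or ignore it. No cookies or plugins required.
request = urllib.request.Request(
    "http://example.com/", headers={"DNT": "1"}
)
with urllib.request.urlopen(request) as response:
    print(response.status)
```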

As I told Ryan then:
"[Google's] opt-out cookies and their plug-in are not aimed at consumers," Soghoian says. "They are aimed at policy makers. Their purpose is to give them something to talk about when they get called in front of Congress. No one is using this plug-in and they don’t expect anyone to use it."
Soon after this piece was published, I received a bit of pushback from several friends in Washington, who felt I was unfairly slamming the company.

However, when you actually examine the history of the industry's privacy enhancing technologies, they seem awfully similar to the short-lived self regulatory organizations that Pam Dixon highlighted.

Privacy enhancing abandonware

On March 11, 2009, Google entered the behavioral advertising market. On the same day, Google released its Advertising Cookie Opt-out Plugin for Firefox and Internet Explorer. The browser plugin permanently saves the DoubleClick opt-out cookie, enabling users to retain their opt-out status even after clearing all cookies.
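Mechanically, the plugin's job is tiny: after any cookie wipe, it writes the ad network's opt-out cookie back. Here is a sketch of the core idea; the cookie name and value below match the historical DoubleClick opt-out cookie as I understand it, but treat the details as illustrative:

```python
from http.cookiejar import Cookie, MozillaCookieJar
import time

# The core of "opt-out persistence": after the user clears cookies,
# re-create the ad network's opt-out cookie so the opt-out survives
# the wipe. Cookie name/value are illustrative.
def make_opt_out_cookie(domain=".doubleclick.net", name="id", value="OPT_OUT"):
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=True,
        path="/", path_specified=True,
        secure=False, expires=int(time.time()) + 10 * 365 * 86400,
        discard=False, comment=None, comment_url=None, rest={},
    )

jar = MozillaCookieJar("cookies.txt")
jar.set_cookie(make_opt_out_cookie())
jar.save(ignore_discard=True)  # re-written after every cookie wipe
```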

Google's tool was a genuine innovation in privacy enhancing technologies. Furthermore, as the tool was released under an open source license, I was able to take the source code, expand it, and turn it into TACO, which opted consumers out of dozens of different ad networks.

The initial release of Google's plugin worked with Firefox 1.5 through 3.0.

In June 2009, Mozilla released Firefox 3.5. It took Google nearly two weeks to release an update to its plugin that was compatible with the new version of the browser.

Mozilla released Firefox 3.6 in January 2010. This time, it took more than a month for Google to release an updated version of the add-on.

Most recently, on March 22, 2011, Mozilla released Firefox 4.0. More than 5 weeks later, Google still has not released an updated version of its opt out add-on.

Google can perhaps be forgiven for ignoring the users of its Firefox privacy add-on -- the company's attention seems to have shifted to its new plugin, Keep My Opt Outs, which only supports the company's Chrome browser (the tool was rushed out and announced on the same day that Mozilla announced its support for Do Not Track).

Similarly, in November 2009, the Network Advertising Initiative (an organization representing many of the major ad networks) released its own Firefox plugin that makes opt out cookies permanent. NAI Executive Director Charles Curran told one journalist that "this [tool] has been a recognition of criticism of opt-outs that are recorded in cookies. It's essentially designed to prevent the standard sweep of cookies that you get from a cookie cache dump...It's designed to work with the browser functionality."

As with Google's plugin, although it has been more than 5 weeks since the release of Firefox 4.0, the NAI plugin still has not been updated to support it.

Why updates are important

When a user upgrades to a new version of Firefox, the browser will check for available updates to all installed browser plugins. Any plugins that have not been updated to support the new browser release will be disabled. This is obviously a pretty big problem, which is why Mozilla actively encourages developers to make sure that their add-ons support upcoming versions of the browser. For the 4.0 version of Firefox, which was released in March, Mozilla started harassing add-on developers as far back as November 2010.

As such, there are likely tens of thousands (if not more) of Firefox 4.0 users whose Advertising Cookie Opt-out Plugin is currently disabled due to incompatibility. The moment these users clear their cookies (something many have configured to happen automatically when they restart their browser), they will lose their doubleclick.net behavioral advertising opt-out cookie. Likewise, the thousands of Firefox 4.0 users who had previously installed the NAI opt-out plugin have now lost the opt-out cookie persistence that they were promised.

These firms created privacy enhancing technologies and then loudly advertised them to consumers and regulators. Unfortunately, now that the attention of regulators has shifted to Do Not Track, both Google and the NAI appear to have abandoned the users of their respective plugins. Neither firm has given its users sufficient notice of the impact, or told them what other options they have to continue to maintain their opt-out choices.

Perhaps the FTC will take notice?

Friday, April 22, 2011

How can US law enforcement agencies access location data stored by Google and Apple?

Note: I am not a lawyer. US privacy law is exceedingly complex. If I am wrong, I hope that someone who knows this better will chime in.

Over the past day, the iPhone location scandal has expanded beyond location data retained on the phone to data sent by iPhones and Android devices back to Apple and Google. This raises some really interesting issues, particularly regarding the degree to which these companies can be compelled to disclose that data to law enforcement agencies. In this blog post, I am going to try and examine the limited legal protections afforded to this data.

Introduction

Today, the Wall Street Journal reported that Apple's iPhones and iPads and Google's Android mobile phones all collect and transmit back to the companies data about a device's nearby WiFi access points, geo-location data, and in Google's case, a unique identifier.

According to the Journal, Android phones collect the data every few seconds and transmit it to the company at least several times an hour. Apple, meanwhile, "intermittently" collects the data and transmits it back to the company every 12 hours.

The motivation for this data collection appears to be the creation of a large database of WiFi access points and their associated locations, which can then be used by mobile devices to determine a user's approximate location (doing so via WiFi uses far less battery power than using the GPS chip).
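For the technically inclined, here is a rough Python sketch of how this kind of WiFi positioning works. This is purely illustrative: the access point database, MAC addresses, and coordinates below are all made up, and real systems involve millions of entries and far more sophisticated signal modeling.

    # Hypothetical database mapping WiFi access point hardware (MAC)
    # addresses to (latitude, longitude) learned from earlier observations.
    AP_DATABASE = {
        "00:11:22:33:44:55": (38.8977, -77.0365),
        "66:77:88:99:aa:bb": (38.8979, -77.0360),
        "cc:dd:ee:ff:00:11": (38.8975, -77.0368),
    }

    def estimate_location(visible_aps):
        """Estimate the device's position as a weighted average of the
        known coordinates of visible access points, weighting stronger
        signals more heavily."""
        lat_sum = lon_sum = total_weight = 0.0
        for mac, signal_strength in visible_aps:
            if mac not in AP_DATABASE:
                continue  # unknown access point: no location information
            lat, lon = AP_DATABASE[mac]
            lat_sum += lat * signal_strength
            lon_sum += lon * signal_strength
            total_weight += signal_strength
        if total_weight == 0:
            return None
        return (lat_sum / total_weight, lon_sum / total_weight)

    # A handset that can see two known access points can be located
    # without ever powering up its GPS chip:
    print(estimate_location([("00:11:22:33:44:55", 0.8),
                             ("66:77:88:99:aa:bb", 0.5)]))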

While such collection is likely entirely commercial in nature, this also raises serious privacy concerns regarding the ease with which law enforcement agencies can access this sensitive data.

A quick primer in location privacy law

The primary law in the US that governs the privacy of information kept by Internet and communications companies is the Electronic Communications Privacy Act (ECPA). This law dates back to 1986, long before cloud computing, email inboxes larger than 5 megabytes, or GPS-enabled smartphones. To be quite blunt, the law is hopelessly out of date, and it is for this reason that the House and Senate held multiple hearings over the last two years focused on ECPA reform.

For user data to be protected by ECPA, the service holding it needs to fall into one of two categories:

An "electronic communication service" ("ECS") is "any service which provides to users thereof the ability to send or receive wire or electronic communications." Examples of this include telephone email services.

A "remote computing service" ("RCS") is a "provision to the public of computer storage or processing services by means of an electronic communications system." Roughly speaking, a remote computing service is provided by an off-site computer that stores or processes data for a user. Examples of this likely include data stored in the cloud, such as online backup services.

ECPA provides varying degrees of protection for communications content and non-content data stored by an ECS or RCS (without going too far into the details, communications content generally requires a warrant, and most non-content data can be obtained with a lesser court order). However, if the service is neither an ECS nor an RCS, law enforcement agencies can obtain the information with a mere subpoena, without getting a judge to sign off on the order.

Location data under ECPA

Law enforcement agencies routinely obtain location data from wireless telephone companies. Depending on the kind of data sought (historical or real time, fine-grained or approximate tower data), the kind of court order required varies between a probable cause warrant and an order based upon facts showing that the information will be relevant and material to an ongoing investigation.

It is important to note that the wireless carriers are providing their customers with a communications service, and that the location data is usually generated in the process of users' phones transmitting voice or other data to a tower. While most consumers probably do not realize that the phone companies know where they are whenever they make a call or check their email, consumers are at least knowingly making a call or checking their email. As such, the location data obtained by the government quite clearly falls into the ECS category under ECPA.

Internet companies, location data and ECPA

In 2009, Google launched Latitude, its mobile location check-in competitor to Loopt and Foursquare. Shortly after the launch, the EFF reported that both Loopt and Google had pledged to require a warrant before delivering user location data to law enforcement agencies.

As EFF explained at the time:
When it comes to friend-finding services, we think it’s clear that your location information is the content of a private communication between you and your friends, and that it deserves the same legal protections against wiretapping as the content of your phone calls or your emails.

Because the text of ECPA doesn't actually include the word "location", Loopt and Google tried to get the best protections they could for users' check-in data by arguing that it is in fact a communication transmitted through their service to users' friends. That is, these firms argued that check-in location data is content carried by an ECS.

(Note to legal experts: I am simplifying this a little bit, since these companies actually insisted on a wiretap order. The companies don't keep any historical location data by default, other than the most recent data-point, so they insisted on an intercept order before they would start retaining future location data).

iPhone/Android location data: ECS, RCS or neither?

Now, with this in mind, let's consider the location data transmitted covertly by iPhones and Android devices. Given that the existence of this information collection and transmission wasn't widely disclosed to users (other than in privacy policies that no one reads), that it didn't hit the press until this week, and that users are not knowingly transmitting the information to their friends or anyone else, I think it is going to be pretty tough for these two firms to claim that this location data falls under the ECS protections of ECPA. This location data is simply not a communication by the user.

Similarly, I don't think that these companies can reasonably claim that this location data falls into the category of an RCS, since it isn't a storage or processing service provided to the user. Quite simply, the companies are collecting this data for their own benefit, not the user's, who probably has no idea that it is being collected and transmitted to a server somewhere.

What this means, I think, is that this location data likely does not fall under the protections of ECPA, which means that law enforcement agencies can likely obtain it with just a subpoena.

Now, it is quite possible that if and when these firms receive a request for this data, they could refuse to comply with the subpoena, and argue that it should be subject to the protections of the 4th Amendment. Certainly, some judges around the country have decided that mobile phone location data is sensitive enough to require a probable cause warrant issued by a judge. However, many other judges do not agree with that theory. Without the protections of ECPA, if the courts do not think this data deserves 4th amendment protections, there is nothing to stop law enforcement agencies from getting it with a subpoena.

Conclusion

What should be clear after reading this post is that privacy law in this country is hopelessly out of date. The collection of location information by Apple and Google raises some really troubling questions regarding the degree to which existing law restricts law enforcement access to the data when it is not associated with a communication by the user, but rather, is collected without their knowledge or consent.

As I noted at the beginning of this post, I am not a legal expert (but a computer scientist by training). There are several fantastic privacy law experts out there, and I really hope that they look into this issue, and write their own, far more extensive analysis.

Tuesday, April 12, 2011

How Dropbox sacrifices user privacy for cost savings

Note: This flaw is different than the authentication flaw in Dropbox that Derek Newton recently published.

Summary

Dropbox, the popular cloud-based backup service, deduplicates the files that its users have stored online. This means that if two different users store the same file in their respective accounts, Dropbox will only actually store a single copy of the file on its servers.

The service tells users that it "uses the same secure methods as banks and the military to send and store your data" and that "[a]ll files stored on Dropbox servers are encrypted (AES-256) and are inaccessible without your account password." However, the company does in fact have access to the unencrypted data (if it didn't, it wouldn't be able to detect duplicate data across different accounts).

This bandwidth and disk storage design tweak creates an easily observable side channel through which a single bit of information (whether any particular file is already stored by one or more users) leaks to anyone who cares to look.

If you value your privacy, or are worried about what might happen if Dropbox were compelled by a court order to disclose which of its users have stored a particular file, you should encrypt your data yourself with a tool like TrueCrypt, or switch to one of several cloud-based backup services that encrypt data with a key known only to the user.

Introduction

For those of you who haven't heard of it, Dropbox is a popular cloud-based backup service that automatically synchronizes user data. It is really easy to use and the company even offers users 2GB of storage for free, with the option to pay for more space.

The problem is, offering free storage space to users can be quite expensive, at least once you gain millions of users. In what I suspect was a price-motivated design decision, Dropbox deduplicates the data uploaded by its users. What this means is that if two users back up the same file, Dropbox only stores a single copy of it. The file still appears in both users' accounts, but the company doesn't consume storage space or upload bandwidth on a second copy of the file.
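To make this concrete, here is a minimal Python sketch of how client-side deduplication generally works. This is the textbook approach, not Dropbox's actual (unpublished) protocol: the client hashes the file before uploading it, and the server only asks for the file's bytes if it has never seen that hash before.

    import hashlib

    # Server-side state: a single stored copy per unique content hash,
    # shared across every user's account.
    stored_files = {}  # sha256 hex digest -> file contents

    def client_upload(data):
        """Upload a file, skipping the transfer entirely when the server
        already holds identical content (cross-user deduplication)."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in stored_files:
            # Only the tiny hash crosses the wire, not the file itself.
            return "deduplicated"
        stored_files[digest] = data
        return "uploaded"

    print(client_upload(b"example file contents"))  # "uploaded"
    print(client_upload(b"example file contents"))  # "deduplicated"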

The company's CTO described the deduplication in a note posted in the "Bugs & Troubleshooting" section on the company's web forum last year:
Woah! How did that 750MB file upload so quickly?

Dropbox tries to be very smart about minimizing the amount of bandwidth used. If we detect that a file you're trying to upload has already been uploaded to Dropbox, we don't make you upload it again. Similarly, if you make a change to a file that's already on Dropbox, you'll only have to upload the pieces of the file that changed.

This works across all data on Dropbox, not just your own account. There are no security implications [emphasis added] - your data is still kept logically separated and not affected by changes that other users make to their data.
Ashkan Soltani was able to verify the deduplication for himself a couple of weeks ago. It took just a few minutes with a packet sniffer. A new, randomly generated 6.8MB file uploaded to Dropbox led to 7.4MB of network traffic, while a 6.4MB file that had been previously uploaded to a different Dropbox account led to just 16KB of network traffic.

Claims of security and privacy

There are long-standing privacy and security concerns with storing data in the cloud, and so Dropbox has a helpful page on its website which attempts to address these:
Your files are actually safer while stored in your Dropbox than on your computer in some cases. We use the same secure methods as banks and the military to send and store your data.

Dropbox takes the security of your files and of our software very seriously. We use the best tools and engineering practices available to build our software, and we have smart people making sure that Dropbox remains secure. Your files are backed-up, stored securely, and password-protected.

...

Dropbox uses modern encryption methods to both transfer and store your data...

All files stored on Dropbox servers are encrypted (AES-256) and are inaccessible without your account password

Reading through this document, it would be easy for anyone but a crypto expert to get the false impression that Dropbox does in fact protect the security and privacy of users' data. Many users, and even the technology press, will not realize that AES-256 is useless against many attacks if the encryption key isn't kept private.

What is missing from the firm's website is a statement regarding how the company is using encryption, and in particular, what kinds of keys are used and who has access to them.

Encryption and deduplication

Encryption and deduplication are two technologies that generally don't mix well. If the encryption is done correctly, it should not be possible to detect what files a user has stored (or even if they have stored the same file as someone else), and so deduplication will not be possible.

Dropbox is likely calculating hashes of users' files before they are transmitted to the company's servers. While it is not clear whether the company is using a single encryption key for all of the files users have stored with the service, or multiple encryption keys, it doesn't really matter (from a privacy and security standpoint), because Dropbox knows the keys. If the company didn't have access to the encryption keys, it wouldn't be able to detect duplicate files.
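To illustrate why per-user keys and deduplication are incompatible, here is a toy Python sketch. For brevity it uses an HMAC as a stand-in for real encryption; the point is only that a plaintext fingerprint lets a server match identical files across accounts, while a fingerprint keyed with a per-user secret does not.

    import hashlib, hmac, os

    file_contents = b"the same file, uploaded by two different users"

    # Deduplication-friendly design: fingerprint the plaintext. Identical
    # files produce identical fingerprints, so the server can detect the
    # duplicate -- but only because it effectively has access to the data.
    print(hashlib.sha256(file_contents).hexdigest() ==
          hashlib.sha256(file_contents).hexdigest())  # True: dedup works

    # Privacy-friendly design: each user keys the transformation with a
    # secret only they know (HMAC here stands in for encryption).
    alice_key, bob_key = os.urandom(32), os.urandom(32)
    alice_fp = hmac.new(alice_key, file_contents, hashlib.sha256).hexdigest()
    bob_fp = hmac.new(bob_key, file_contents, hashlib.sha256).hexdigest()
    print(alice_fp == bob_fp)  # False: the server cannot deduplicate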

While the decision to deduplicate data has probably saved the company quite a bit of storage space and bandwidth, it has significant flaws which are particularly troubling given the statements made by the company on its security and privacy page.

Cloud backup providers do not need to design their products this way. SpiderOak and Tarsnap are two competing services that encrypt their users' data with a key known only to that user. These companies have opted to put their users' privacy first, but the side effect is that they require more back-end storage space. If 20 users upload the same file, both companies upload and store 20 copies of that file (and in fact, they have no way of knowing if a user is uploading something that another user has backed up).

Why is this a problem?

As Ashkan Soltani was able to test in just a few minutes, it is possible to determine if any given file is already stored by one or more Dropbox users, simply by observing the amount of data transferred between your own computer and Dropbox's servers. If the file isn't already stored by Dropbox, the entire file will be uploaded. If Dropbox has the file already, just a few kilobytes of communication will occur.
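Exploiting the side channel doesn't require any fancy tooling; the inference boils down to a single comparison. The numbers below are the ones from Ashkan's test, and the 10% threshold is my own arbitrary (but generous) choice:

    def file_known_to_dropbox(file_size_bytes, observed_traffic_bytes):
        """If far less data crossed the wire than the size of the file,
        the service must already have a copy."""
        return observed_traffic_bytes < 0.1 * file_size_bytes

    print(file_known_to_dropbox(6.8e6, 7.4e6))  # False: full upload, new file
    print(file_known_to_dropbox(6.4e6, 16e3))   # True: already stored by someone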

While this doesn't tell you which other users have uploaded this file, presumably Dropbox can figure it out. I doubt they'd do it if asked by a random user, but when presented with a court order, they could be forced to.

What this means is that from the comfort of their desks, law enforcement agencies or copyright trolls can upload contraband files to Dropbox, watch the amount of bandwidth consumed, and then obtain a court order if the amount of data transferred is smaller than the size of the file.

Last year, the New York Attorney General announced that Facebook, MySpace and IsoHunt had agreed to start comparing every image uploaded by a user to an AG supplied database of more than 8000 hashes of child pornography. It is easy to imagine a similar database of hashes for pirated movies and songs, ebooks stripped of DRM, or leaked US government diplomatic cables.

Responsible Disclosure

On April 1, 2011, Marcia Hofmann at the Electronic Frontier Foundation contacted Dropbox to let them know about the flaw, and that a researcher would be publishing the information on April 12th. There are plenty of horror stories of security researchers getting threatened by companies, and so I hoped that by keeping my identity a secret, and having an EFF attorney notify the company about the flaw, I would reduce my risk of trouble.

At 6:15PM west coast time on April 11th, an attorney from Fenwick & West retained by Dropbox left Marcia a voicemail message, in which he revealed that: "the company is updating their privacy policy and security overview that is on the website to add further detail."

Marcia spoke with the company's attorney this morning, and was told that the company will be updating its privacy policy and security overview to clarify that if Dropbox receives a warrant, it has the ability to remove its own encryption to provide data to law enforcement.

While I want to praise the company for being willing to clarify the security statements made on its website, I hope this will be a first step on this issue, and not the last.

It is unlikely that the millions of existing Dropbox users will stumble across the new privacy policy in their regular web browsing. As such, the company should send out an email to its users to let them know about this flaw, and advise them of the steps they can take if they are concerned about the privacy of their data.

I also urge the company to abandon its deduplication system design, and embrace strong encryption with a key only known to each user. Other online backup services have done it for some time. This is the only real way that data can be secure in the cloud.

Wednesday, March 23, 2011

DEA rejects FOIA for 38 pages of docs related to Sprint's digital surveillance API

As some of my regular readers know, in October 2009, I attended an invitation-only surveillance industry conference in Washington DC. It was at that event that I recorded an executive from Sprint bragging about the 8 million GPS queries his company delivered via a special website to law enforcement agencies in a 13-month period.

At that same event, Paul W. Taylor, the manager of Sprint/Nextel’s Electronic Surveillance team, revealed that the wireless carrier also provides a next-generation surveillance API to law enforcement agencies, allowing them to automate and digitally submit their requests for user data:
"We have actually our LSite [Application Programming Interface (API)] is, there is no agreement that you have to sign. We give it to every single law enforcement manufacturer, the vendors, the law enforcement collection system vendors, we also give it to our CALEA vendors, and we've given it to the FBI, we've given it to NYPD, to the Drug Enforcement Agency. We have a pilot program with them, where they have a subpoena generation system in-house where their agents actually sit down and enter case data, it gets approved by the head guy at the office, and then from there, it gets electronically sent to Sprint, and we get it ... So, the DEA is using this, they're sending a lot and the turn-around time is 12-24 hours. So we see a lot of uses there."
My PhD research is focused on the relationship between communications and applications service providers and the government, and the way that these companies voluntarily facilitate (or occasionally, resist) surveillance of their customers. As such, this sounded pretty interesting, and so on December 3, 2009, I filed a FOIA request with the DEA to get documents associated with the Sprint LSite API and the DEA's use of the system.

On March 8, 2011, I received a letter (pdf) from the DEA, telling me that although they found 38 pages of relevant material, they are withholding every single page.

I will of course be appealing this rejection, either by myself, or with any luck, someone experienced with FOIA appeals and litigation will contact me and offer to help.

It is bad enough that Sprint is bending over backwards to assist the government in its surveillance of Sprint customers, but what is even worse is that the DEA is refusing to allow the public to learn anything about this program. If, as Mr. Taylor suggested, there is a computer in every DEA office connected directly to Sprint's computer systems, the public has a right to know.

Monday, March 21, 2011

The negative impact of AT&T's purchase of T-Mobile on the market for privacy

Yesterday, AT&T announced that it will be purchasing T-Mobile, the fourth largest wireless carrier in the US. While there are many who have raised antitrust concerns about this deal due to the impact it will have on the price of wireless services and mobile device/application choice, I want to raise a slightly different concern: the impact this will have on privacy.

While it is little known to most consumers, T-Mobile is actually the most privacy preserving of the major wireless carriers. As I described in a blog post earlier this year, T-Mobile does not have or keep IP address logs for its mobile users. What this means is that if the FBI, police or a civil litigant wish to later learn which user was using a particular IP address at a given date and time, T-Mobile is unable to provide the information.

In comparison, Verizon, AT&T and Sprint all keep logs regarding the IP addresses they issue to their customers, and in some cases, even the individual URLs of the pages viewed from handsets.

While privacy advocates encourage companies to retain as little data about their customers as possible, the Department of Justice wants them to retain identifying IP data for long periods of time. Enough so that T-Mobile was called out (albeit not by name) by a senior DOJ official at a data retention hearing at the House Judiciary Committee back in January:
"One mid-size cell phone company does not retain any records, and others are moving in that direction."
If and when the Federal government approves this deal, T-Mobile's customers and infrastructure will likely be folded into the AT&T mothership. As a result, T-Mobile's customers will lose their privacy preserving ISP, and instead have their online activities tracked by AT&T.

After this deal goes through, there will be three major wireless carriers, all of whom have solid track records of being hostile to privacy:
AT&T, a company that voluntarily participated in the Bush-era warrantless wiretapping program in which it illegally disclosed its customers communications to the National Security Agency.

Verizon, a company that similarly voluntarily participated in the warrantless wiretapping program, and then, when sued by the Electronic Frontier Foundation, argued in court that it had a free speech right protected by the 1st Amendment to disclose that data to the NSA.

Sprint, a company that established a website so that law enforcement agencies would no longer have to go through the trouble of seeking the assistance of Sprint employees in order to locate individual Sprint customers. This website was then used to ping Sprint users more than 8 million times in a single year.

The market for privacy

Today, privacy is largely an issue of risk mitigation for firms. Chief Privacy Officers are tasked with protecting against data breaches and class action lawsuits related to the 3rd party cookies that litter companies' homepages. The privacy organizations within companies do not bring in new customers or improve the bottom line, but protect the firm from regulators and class action lawyers.

Recently, there have been signs that this may be changing. Microsoft and Mozilla are now visibly competing on privacy features such as "Do Not Track" built into their web browsers. Several venture capital firms have invested cash in firms like Reputation.com and Abine, which are selling privacy enhancing products to consumers.

To be clear, the market for privacy is in its infancy. As such, the government should be doing everything possible to nurture and encourage such growth. It is for that reason that the FTC should not permit the one and only privacy protecting major wireless carrier to be swallowed up by AT&T, a company that has repeatedly violated the privacy of its customers.

The FTC should lead the government's investigation into this deal, and should reject it on privacy grounds

When the FTC approved Google's merger with DoubleClick in 2007, then-Commissioner Pamela Jones Harbour raised the issue of privacy in her dissent (pages 9-12). As I think history now confirms, the FTC erred in ignoring Commissioner Harbour and not considering the issue of privacy in the Google deal. However, many of her comments apply similarly to the AT&T/T-Mobile deal.

While the FTC cannot turn back the clock on Google/DoubleClick, it can and should protect the privacy of the millions of T-Mobile subscribers. The FTC should block this merger. However, even if the deal is permitted to go through, the FTC should at least extract strict privacy guarantees from AT&T that include a policy of not retaining IP address allocation or other Internet browsing logs.

If the FTC, Commerce Department and Congress want the market to provide privacy to consumers, then they need to make sure that consumers have options in this area. Without options, informed consumers cannot vote with their wallets. Companies that choose to go the extra mile to protect privacy should be rewarded for doing so, and not, when the market for privacy is so young, be swallowed up by those that steamroll over their customers' desire to keep their data safe.

Friday, March 11, 2011

Federal judge in Twitter/Wikileaks case rules that consumers read privacy policies

Earlier this afternoon, a federal magistrate judge issued an order in the much-hyped Twitter/Wikileaks case. While I will leave it to others in the media to analyze the order and its impact, I do want to focus on one specific issue.

The three individuals who objected to having their Twitter account records obtained by the government (referred to in the order as the petitioners) raised an interesting 4th amendment claim regarding their IP address information. Building on recent developments in the area of location privacy (where the 3rd circuit ruled that consumers do not knowingly transmit their location information to phone companies, because they generally don't understand the technical details of how phones work), the individuals here claimed that they didn't realize that they were conveying their IP addresses to Twitter, and thus maintained a privacy interest in this information.

The judge didn't buy this argument at all -- but rather than focusing on the fact that two of the individuals are skilled security experts who obviously understand how IP addresses work, she instead based her decision on Twitter's privacy policy. From page 13 of her order:
In an attempt to distinguish the reasoning of Smith v. Maryland and Bynum, petitioners contend that Twitter users do not directly, visibly, or knowingly convey their IP addresses to the website, and thus maintain a legitimate privacy interest. This is inaccurate. Before creating a Twitter account, readers are notified that IP addresses are among the kinds of "Log Data" that Twitter collects, transfers and manipulates. See Warshak, 2010 (recognizing that internet service provider's notice of intent to monitor subscribers' emails diminishes expectation of privacy). Thus, because petitioners voluntarily conveyed their IP addresses to Twitter as a condition of use, they have no legitimate Fourth Amendment privacy interest.
A footnote below the paragraph states further that:
At the hearing, petitioners suggested that they did not read or understand Twitter's Privacy Policy, such that any conveyance of IP addresses to Twitter was involuntary. This is unpersuasive. Internet users are bound by the terms of click-through agreements made online. A.V. ex rel. Vanderhye v. iParadigms, LLC, 544 F. Supp. 2d 473, 480 (E.D. Va. 2008) (finding a valid "clickwrap" contract where users clicked "I Agree" to acknowledge their acceptance of the terms), aff'd, A.V. ex rel. Vanderhye v. iParadigms, LLC, 562 F.3d 630, 645 n.8 (4th Cir. 2009). By clicking on "create my account", petitioners consented to Twitter's terms of use in a binding "clickwrap" agreement to turn over to Twitter their IP addresses and more.
Twitter's privacy policy

The facts here are quite a bit different from those in the Vanderhye v. iParadigms case that the judge cites. I will leave it to legal scholars to pick apart and analyze those differences. Instead, I want to highlight the Twitter sign-up process, and then a few other facts which make it clear that it is absolutely insane to assume that consumers have read privacy policies, when all available evidence (and statements by several senior government officials) suggests the opposite.

When you sign up for a Twitter account, you are shown a copy of the 200-line Terms of Service, in a text-box which displays 5 lines of text at a time. Users are not required to scroll to the bottom, or click a checkbox acknowledging that they have read the terms. Instead, right above the clickable "Create My Account" button, there is the following line of text:
By clicking on "Create my account" below, you are agreeing to the Terms of Service above and the Privacy Policy.
The Twitter terms of service do not actually include any mention of IP addresses. Instead, it is Twitter's privacy policy that includes the following section of text in its sixth paragraph:
Log Data: Our servers automatically record information ("Log Data") created by your use of the Services. Log Data may include information such as your IP address, browser type, the referring domain, pages visited, and search terms. Other actions, such as interactions with advertisements, may also be included in Log Data.
Although the judge states in her order that "[b]efore creating a Twitter account, readers are notified that IP addresses are among the kinds of 'Log Data' that Twitter collects, transfers and manipulates," that isn't entirely true.

It would be far more accurate to say that before creating a Twitter account, users are presented with a link to a privacy policy, which includes a statement six paragraphs down about IP address collection. Users are further told that by clicking on a button to create the account, they acknowledge that they have read the linked privacy policy, although Twitter does not actually take any steps to make sure that users clicked on the link or scrolled through the content on that page.

Of course, it wouldn't really matter if Twitter forced people to click on the privacy policy, or scroll through the page, because everyone knows that consumers won't actually read through the text.

The FTC and Supreme Court discuss privacy policies

In introductory remarks at a privacy roundtable in December 2009, Federal Trade Commission Chairman Leibowitz told those assembled in the room that:
We all agree that consumers don’t read privacy policies – or EULAs, for that matter.
Similarly, in an August 2009 interview, David Vladeck, the head of the FTC's Bureau of Consumer Protection, told the New York Times that:
Disclosures are now written by lawyers, they’re 17 pages long. I don’t think they’re written principally to communicate information; they’re written defensively. I’m a lawyer, I’ve been practicing law for 33 years. I can’t figure out what the hell these consents mean anymore. And I don’t believe that most consumers either read them, or, if they read them, really understand it. Second of all, consent in the face of these kinds of quote disclosures, I’m not sure that consent really reflects a volitional, knowing act.
Even the Chief Justice of the US Supreme Court has weighed in on the issue, albeit only in a speech before students in Buffalo, NY last year. Answering a student question, Roberts admitted he doesn’t usually read the terms of service or privacy policies, according to the Associated Press:
It has "the smallest type you can imagine and you unfold it like a map," he said. "It is a problem," he added, "because the legal system obviously is to blame for that." Providing too much information defeats the purpose of disclosure, since no one reads it, he said. "What the answer is," he said, "I don’t know."

Academic research on privacy policies

Among the 222 study participants of the 2007 Golden Bear Omnibus Survey, the Samuelson Clinic found that only 1.4% reported reading EULAs often and thoroughly, 66.2% admitted to rarely reading or browsing the contents of EULAs, and 7.7% indicated that they had not noticed these agreements in the past or had never read them.

Similarly, a survey of more than 2000 people by Harris Interactive in 2001 found that more than 60 percent of consumers said they had either "spent little or no time looking at websites' privacy policies" or "glanced through websites' privacy policies, but . . . rarely read them in depth." Of those individuals surveyed, only 3 percent said that "most of the time, I carefully read the privacy policies of the websites I visit."

However, while the vast majority of consumers don't read privacy policies, some do seem to notice the presence of a privacy policy on a company's website. Unfortunately, most Americans incorrectly believe that the phrase "privacy policy" signifies that their information will be kept private. A 2003 survey by Annenberg found that 57% of 1,200 adults who were using the internet at home agreed or agreed strongly with the statement "When a web site has a privacy policy, I know that the site will not share my information with other websites or companies." In the 2005 edition of the survey, questioners asked 1,200 people whether that same statement is true or false; 59% answered that it is true.

Even if consumers were interested in reading privacy policies -- doing so would likely consume a significant amount of their time. A research team at Carnegie Mellon University calculated the time to read the privacy policies of the sites used by the average consumer, and determined that:
[R]eading privacy policies carry costs in time of approximately 201 hours a year, worth about $2,949 annually per American Internet user. Nationally, if Americans were to read online privacy policies word–for–word, we estimate the value of time lost as about $652 billion annually.
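As a back-of-the-envelope sanity check on those figures (my own arithmetic, derived only from the numbers quoted above):

    hours_per_user = 201          # hours per year
    cost_per_user = 2949          # dollars per year, per user
    national_cost = 652e9         # dollars per year, nationally

    print(cost_per_user / hours_per_user)  # ~14.67: implied dollar value of an hour
    print(national_cost / cost_per_user)   # ~221 million: implied US internet users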
Finally, even if consumers took the time to try to read privacy policies, it is quite likely that many would not be capable of understanding them. In 2004, a team of researchers analyzed the content of 64 popular websites' privacy policies, and calculated the reading comprehension skills that a reader would need to understand them. Their research revealed that:
Of the 64 policies examined, only four (6%) were accessible to the 28.3% of the Internet population with less than or equal to a high school education. Thirty-five policies (54%) were beyond the grasp of 56.6% of the Internet population, requiring the equivalent of more than fourteen years of education. Eight policies (13%) were beyond the grasp of 85.4% of the Internet population, requiring the equivalent of a postgraduate education. Overall, a large segment of the population can only reasonably be expected to understand a small fragment of the policies posted.
Conclusion

I don't know the caselaw well enough to say whether the judge was correct in stating that clickwraps that link to privacy policies are binding. However, even if there is caselaw supporting this decision, it is in no way supported by evidence of actual consumer behavior, or by common sense. If the Chief Justice of the Supreme Court doesn't read privacy policies, how can we expect this of regular consumers?