slight paranoia: April 2011

Friday, April 22, 2011

How can US law enforcement agencies access location data stored by Google and Apple?

Note: I am not a lawyer. US privacy law is exceedingly complex. If I am wrong, I hope that someone who knows this better will chime in.

Over the past day, the iPhone location scandal has expanded beyond location data retained on the phone to data sent by iPhones and Android devices back to Apple and Google. This raises some really interesting issues, particularly regarding the degree to which these companies can be compelled to disclose that data to law enforcement agencies. In this blog post, I am going to try and examine the limited legal protections afforded to this data.

Introduction

Today, the Wall Street Journal reported that Apple's iPhones and iPads and Google's Android mobile phones all collect and transmit back to the companies data about a device's nearby WiFi access points, geo-location data, and in Google's case, a unique identifier.

According to the Journal, Android phones collect the data every few seconds and transmit it to the company at least several times an hour. Apple, meanwhile, "intermittently" collects data and transmits that data to itself every 12 hours.

The motivation for this data collection appears to be the creation of a large database of WiFi access points and their associated locations, which mobile devices can then use to determine the user's approximate location (doing so via WiFi uses far less battery power than using the GPS chip).

While such collection is likely entirely commercial in nature, this also raises serious privacy concerns regarding the ease with which law enforcement agencies can access this sensitive data.

A quick primer in location privacy law

The primary law in the US that governs the privacy of information kept by Internet and communications companies is the Electronic Communications Privacy Act (ECPA). This law dates back to 1986, long before cloud computing, email inboxes larger than 5 megabytes, or GPS enabled smartphones. To be quite blunt, the law is hopelessly out of date, and it is for this reason that the House and Senate held multiple hearings over the last two years focused on ECPA reform.

For user data to be protected by ECPA, it needs to fall into one of two categories:

An "electronic communication service" ("ECS") is "any service which provides to users thereof the ability to send or receive wire or electronic communications." Examples of this include telephone and email services.

A "remote computing service" ("RCS") is a "provision to the public of computer storage or processing services by means of an electronic communications system." Roughly speaking, a remote computing service is provided by an off-site computer that stores or processes data for a user. Examples of this likely include data stored in the cloud, such as online backup services.

ECPA provides varying degrees of protection for communications content and non-content data stored by an ECS or RCS (without going too far into the details, communications content generally requires a warrant, and most non-content data can be obtained with a lesser court order). However, if the service is neither an ECS nor an RCS, law enforcement agencies can obtain the information with a mere subpoena, without getting a judge to sign off on the order.

Location data under ECPA

Law enforcement agencies routinely obtain location data from wireless telephone companies. Depending on the kind of data sought (historical or real time, fine-grained or approximate tower data), the kind of court order required varies between a probable cause warrant and an order based upon facts showing that the information will be relevant and material to an ongoing investigation.

It is important to note that the wireless carriers are providing their customers with a communications service, and that the location data is usually generated in the process of the users' phone transmitting voice or other data to a tower. While most consumers probably do not realize that the phone companies know where they are whenever they make a call or check their email, consumers are at least knowingly making a call or checking their email. As such, the location data obtained by the government quite clearly falls into the ECS category under ECPA.

Internet companies, location data and ECPA

In 2009, Google launched Latitude, its mobile location check-in competitor to Loopt and Foursquare. Shortly after the launch, the EFF reported that both Loopt and Google had pledged to require that user location data would only be delivered to law enforcement agencies in response to a warrant.

As EFF explained at the time:

When it comes to friend-finding services, we think it’s clear that your location information is the content of a private communication between you and your friends, and that it deserves the same legal protections against wiretapping as the content of your phone calls or your emails.

Because the text of ECPA doesn't actually include the word "location", Loopt and Google tried to get the best protections they could for users' check-in data by arguing that it is in fact a communication transmitted through their service to users' friends. That is, these firms argued that check-in location data is covered by the ECS protections.

(Note to legal experts: I am simplifying this a little bit, since these companies actually insisted on a wiretap order. The companies don't keep any historical location data by default, other than the most recent data-point, so they insisted on an intercept order before they would start retaining future location data).

iPhone/Android location data: ECS, RCS or neither?

Now, with this in mind, let's consider the location data transmitted covertly by iPhones and Android devices. Given that the existence of this information collection and transmission wasn't widely disclosed to users (other than in privacy policies that no one reads), that it didn't hit the press until this week, and that users are not knowingly transmitting the information to their friends or anyone else, I think it is going to be pretty tough for these two firms to be able to claim that this location data falls into the ECS protections of ECPA. This location data is simply not a communication by the user.

Similarly, I don't think that these companies can reasonably claim that this location data falls into the category of an RCS, since it isn't a storage or processing service provided to the user. Quite simply, the companies are collecting this data for their own benefit, not the user's, who probably has no idea that it is being collected and transmitted to a server somewhere.

What this means, I think, is that this location data likely does not fall under the protections of ECPA, which means that law enforcement agencies can likely obtain it with just a subpoena.

Now, it is quite possible that if and when these firms receive a request for this data, they could refuse to comply with the subpoena, and argue that it should be subject to the protections of the 4th Amendment. Certainly, some judges around the country have decided that mobile phone location data is sensitive enough to require a probable cause warrant issued by a judge. However, many other judges do not agree with that theory. Without the protections of ECPA, if the courts do not think this data deserves 4th Amendment protections, there is nothing to stop law enforcement agencies from getting it with a subpoena.

Conclusion

What should be clear after reading this post is that privacy law in this country is hopelessly out of date. The collection of location information by Apple and Google raises some really troubling questions regarding the degree to which existing law restricts law enforcement access to the data when it is not associated with a communication by the user, but rather, is collected without their knowledge or consent.

As I noted at the beginning of this post, I am not a legal expert (but a computer scientist by training). There are several fantastic privacy law experts out there, and I really hope that they look into this issue, and write their own, far more extensive analysis.

Tuesday, April 12, 2011

How Dropbox sacrifices user privacy for cost savings

Note: This flaw is different than the authentication flaw in Dropbox that Derek Newton recently published.

Summary

Dropbox, the popular cloud-based backup service, deduplicates the files that its users have stored online. This means that if two different users store the same file in their respective accounts, Dropbox will only actually store a single copy of the file on its servers.

The service tells users that it "uses the same secure methods as banks and the military to send and store your data" and that "[a]ll files stored on Dropbox servers are encrypted (AES-256) and are inaccessible without your account password." However, the company does in fact have access to the unencrypted data (if it didn't, it wouldn't be able to detect duplicate data across different accounts).

This bandwidth and disk storage design tweak creates an easily observable side channel through which a single bit of data (whether any particular file is already stored by one or more users) can be observed.

If you value your privacy or are worried about what might happen if Dropbox were compelled by a court order to disclose which of its users have stored a particular file, you should encrypt your data yourself with a tool like TrueCrypt or switch to one of several cloud-based backup services that encrypt data with a key known only to the user.

Introduction

For those of you who haven't heard of it, Dropbox is a popular cloud-based backup service that automatically synchronizes user data. It is really easy to use and the company even offers users 2GB of storage for free, with the option to pay for more space.

The problem is, offering free storage space to users can be quite expensive, at least once you gain millions of users. In what I suspect was a price-motivated design decision, Dropbox deduplicates the data uploaded by its users. What this means is that if two users back up the same file, Dropbox only stores a single copy of it. The file still appears in both users' accounts, but the company doesn't spend storage space or upload bandwidth on a second copy of the file.

The company's CTO described the deduplication in a note posted in the "Bugs & Troubleshooting" section on the company's web forum last year:

Woah! How did that 750MB file upload so quickly?

Dropbox tries to be very smart about minimizing the amount of bandwidth used. If we detect that a file you're trying to upload has already been uploaded to Dropbox, we don't make you upload it again. Similarly, if you make a change to a file that's already on Dropbox, you'll only have to upload the pieces of the file that changed.

This works across all data on Dropbox, not just your own account. There are no security implications [emphasis added] - your data is still kept logically separated and not affected by changes that other users make to their data.

Ashkan Soltani was able to verify the deduplication for himself a couple weeks ago. It took just a few minutes with a packet sniffer. A new randomly generated 6.8MB file uploaded to Dropbox led to 7.4MB of network traffic, while a 6.4MB file that had been previously uploaded to a different Dropbox account led to just 16KB in network traffic.

Claims of security and privacy

There are long standing privacy and security concerns with storing data in the cloud, and so Dropbox has a helpful page on their website which attempts to address these:

Your files are actually safer while stored in your Dropbox than on your computer in some cases. We use the same secure methods as banks and the military to send and store your data.

Dropbox takes the security of your files and of our software very seriously. We use the best tools and engineering practices available to build our software, and we have smart people making sure that Dropbox remains secure. Your files are backed-up, stored securely, and password-protected.

...

Dropbox uses modern encryption methods to both transfer and store your data...

All files stored on Dropbox servers are encrypted (AES-256) and are inaccessible without your account password

Reading through this document, it would be easy for anyone but a crypto expert to get the false impression that Dropbox does in fact protect the security and privacy of users' data. Many users and even the technology press will not realize that AES-256 is useless against many attacks if the encryption key isn't kept private.

What is missing from the firm's website is a statement regarding how the company is using encryption, and in particular, what kinds of keys are used and who has access to them.

Encryption and deduplication

Encryption and deduplication are two technologies that generally don't mix well. If the encryption is done correctly, it should not be possible to detect what files a user has stored (or even if they have stored the same file as someone else), and so deduplication will not be possible.

Dropbox is likely calculating hashes of users' files before they are transmitted to the company's servers. While it is not clear if the company is using a single encryption key for all of the files users have stored with the service, or multiple encryption keys, it doesn't really matter (from a privacy and security standpoint), because Dropbox knows the keys. If the company didn't have access to the encryption keys, it wouldn't be able to detect duplicate files.
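To make the idea concrete, here is a minimal sketch of hash-based deduplication across accounts. It is purely illustrative: the DedupStore class, its method names, and the choice of SHA-256 are my assumptions, not a description of Dropbox's actual implementation, which has not been published.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one stored copy per unique file content.

    Hypothetical illustration only; not Dropbox's real design.
    """

    def __init__(self):
        self.blobs = {}   # SHA-256 hex digest -> file bytes (stored once)
        self.owners = {}  # SHA-256 hex digest -> set of user names

    def has_blob(self, digest):
        return digest in self.blobs

    def upload(self, user, data):
        """Store data for user and return the content bytes actually transferred."""
        digest = hashlib.sha256(data).hexdigest()
        transferred = 0
        if not self.has_blob(digest):
            # First time anyone uploads this content: the full file is sent.
            self.blobs[digest] = data
            transferred = len(data)
        # Either way, the file now appears in this user's account.
        self.owners.setdefault(digest, set()).add(user)
        return transferred


store = DedupStore()
movie = b"example file contents" * 1000
first = store.upload("alice", movie)   # full upload
second = store.upload("bob", movie)    # deduplicated, nothing re-sent
print(first, second, len(store.blobs))
```

Note that the owners table in this sketch is exactly the kind of record that would let the operator answer the question "which users have stored this file?"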

While the decision to deduplicate data has probably saved the company quite a bit of storage space and bandwidth, it has significant flaws which are particularly troubling given the statements made by the company on its security and privacy page.

Cloud backup providers do not need to design their products this way. Spideroak and Tarsnap are two competing services that encrypt their users' data with a key only known to that user. These companies have opted to put their users' privacy first, but the side effect is that they require more back-end storage space. If 20 users upload the same file, both companies upload and store 20 copies of that file (and in fact, they have no way of knowing if a user is uploading something that another user has backed up).
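The following sketch shows why per-user keys defeat server-side deduplication. It uses a toy stream cipher built from HMAC-SHA256 so that it runs with only the Python standard library; this is an assumption made for illustration, not the scheme SpiderOak or Tarsnap actually use, and it should not be used to protect real data.

```python
import hashlib
import hmac
import os


def _keystream_xor(key, nonce, data):
    """XOR data with an HMAC-SHA256 keystream (toy construction, illustration only)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(d ^ k for d, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def toy_encrypt(key, plaintext):
    nonce = os.urandom(16)  # fresh random nonce per upload
    return nonce + _keystream_xor(key, nonce, plaintext)


def toy_decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    return _keystream_xor(key, nonce, body)


same_file = b"identical document" * 100
alice_key, bob_key = os.urandom(32), os.urandom(32)  # keys never leave the users

alice_blob = toy_encrypt(alice_key, same_file)
bob_blob = toy_encrypt(bob_key, same_file)

# The server only ever sees the encrypted blobs, so hashing them reveals no match.
server_sees_duplicate = (
    hashlib.sha256(alice_blob).digest() == hashlib.sha256(bob_blob).digest()
)
print(server_sees_duplicate)
```

Because the two blobs are unrelated from the server's point of view, it has to store both, which is the storage cost described above.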

Why is this a problem?

As Ashkan Soltani was able to test in just a few minutes, it is possible to determine if any given file is already stored by one or more Dropbox users, simply by observing the amount of data transferred between your own computer and Dropbox's servers. If the file isn't already stored by Dropbox, the entire file will be uploaded. If Dropbox has the file already, just a few KB of communication will occur.
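The inference itself is trivial, as this rough sketch shows. The function name and the 50% threshold are my own assumptions; the traffic figures are the ones Ashkan measured, as reported earlier in this post.

```python
def file_already_stored(file_size_bytes, observed_traffic_bytes, threshold=0.5):
    """Guess whether the service already held the file.

    A full upload generates traffic of roughly the file size or more;
    a deduplicated upload generates only a little protocol chatter.
    The threshold is an arbitrary, assumed cut-off.
    """
    if file_size_bytes <= 0:
        raise ValueError("file size must be positive")
    return observed_traffic_bytes < threshold * file_size_bytes


MB = 1000 * 1000
KB = 1000

# New random 6.8MB file: 7.4MB of traffic observed.
new_file = file_already_stored(int(6.8 * MB), int(7.4 * MB))
# 6.4MB file already uploaded from another account: 16KB of traffic observed.
known_file = file_already_stored(int(6.4 * MB), 16 * KB)
print(new_file, known_file)
```

In practice the traffic counts would come from a packet sniffer watching the upload; that measurement step is left out of the sketch.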

While this doesn't tell you which other users have uploaded this file, presumably Dropbox can figure it out. I doubt they'd do it if asked by a random user, but when presented with a court order, they could be forced to.

What this means is that from the comfort of their desks, law enforcement agencies or copyright trolls can upload contraband files to Dropbox, watch the amount of bandwidth consumed, and then obtain a court order if the amount of data transferred is smaller than the size of the file.

Last year, the New York Attorney General announced that Facebook, MySpace and IsoHunt had agreed to start comparing every image uploaded by a user to an AG supplied database of more than 8000 hashes of child pornography. It is easy to imagine a similar database of hashes for pirated movies and songs, ebooks stripped of DRM, or leaked US government diplomatic cables.

Responsible Disclosure

On April 1, 2011, Marcia Hofmann at the Electronic Frontier Foundation contacted Dropbox to let them know about the flaw, and that a researcher would be publishing the information on April 12th. There are plenty of horror stories of security researchers getting threatened by companies, and so I hoped that by keeping my identity a secret and having an EFF attorney notify the company about the flaw, I would reduce my risk of trouble.

At 6:15PM west coast time on April 11th, an attorney from Fenwick & West retained by Dropbox left Marcia a voicemail message, in which he revealed that: "the company is updating their privacy policy and security overview that is on the website to add further detail."

Marcia spoke with the company's attorney this morning, and was told that the company will be updating its privacy policy and security overview to clarify that if Dropbox receives a warrant, it has the ability to remove its own encryption to provide data to law enforcement.

While I want to praise the company for being willing to clarify the security statements made on its website, I hope this will be a first step on this issue, and not the last.

It is unlikely that the millions of existing Dropbox users will stumble across the new privacy policy in their regular web browsing. As such, the company should send out an email to its users to let them know about this flaw, and advise them of the steps they can take if they are concerned about the privacy of their data.

I also urge the company to abandon its deduplication system design, and embrace strong encryption with a key only known to each user. Other online backup services have done it for some time. This is the only real way that data can be secure in the cloud.