Dropbox, the popular cloud based backup service deduplicates the files that its users have stored online. This means that if two different users store the same file in their respective accounts, Dropbox will only actually store a single copy of the file on its servers.
The service tells users that it "uses the same secure methods as banks and the military to send and store your data" and that "[a]ll files stored on Dropbox servers are encrypted (AES-256) and are inaccessible without your account password." However, the company does in fact have access to the unencrypted data (if it didn't, it wouldn't be able to detect duplicate data across different accounts).
This bandwidth and disk storage design tweak creates an easily observable side channel through which a single bit of data (whether any particular file is already stored by one or more users) can be observed.
If you value your privacy or are worried about what might happen if Dropbox were compelled by a court order to disclose which of its users have stored a particular file, you should encrypt your data yourself with a tool like truecrypt or switch to one of several cloud based backup services that encrypt data with a key only known to the user.
For those of you who haven't heard of it, Dropbox is a popular cloud-based backup service that automatically synchronizes user data. It is really easy to use and the company even offers users 2GB of storage for free, with the option to pay for more space.
The problem is, offering free storage space to users can be quite expensive, at least once you gain millions of users. In what I suspect was a price-motivated design decision, Dropbox deduplicates the data uploaded by its users. What this means is that if two users backup the same file, Dropbox only stores a single copy of it. The file still appears in both users' accounts, but the company doesn't consume storage space nor upload bandwidth on a second copy of the file.
The company's CTO described the deduplication in a note posted in the "Bugs & Troubleshooting" section on the company's web forum last year:
Woah! How did that 750MB file upload so quickly?Ashkan Soltani was able to verify the deduplication for himself a couple weeks ago. It took just a few minutes with a packet sniffer. A new randomly generated 6.8MB file uploaded to dropbox lead to 7.4MB of network traffic, while a 6.4MB file that had been previously uploaded to a different dropbox account lead to just 16KB in network traffic.
Dropbox tries to be very smart about minimizing the amount of bandwidth used. If we detect that a file you're trying to upload has already been uploaded to Dropbox, we don't make you upload it again. Similarly, if you make a change to a file that's already on Dropbox, you'll only have to upload the pieces of the file that changed.
This works across all data on Dropbox, not just your own account. There are no security implications [emphasis added] - your data is still kept logically separated and not affected by changes that other users make to their data.
Claims of security and privacy
There are long standing privacy and security concerns with storing data in the cloud, and so Dropbox has a helpful page on their website which attempts to address these:
Your files are actually safer while stored in your Dropbox than on your computer in some cases. We use the same secure methods as banks and the military to send and store your data.
Dropbox takes the security of your files and of our software very seriously. We use the best tools and engineering practices available to build our software, and we have smart people making sure that Dropbox remains secure. Your files are backed-up, stored securely, and password-protected.
Dropbox uses modern encryption methods to both transfer and store your data...
All files stored on Dropbox servers are encrypted (AES-256) and are inaccessible without your account password
Reading through this document, it would be easy for anyone but a crypto expert to get the false impression that Dropbox does in fact protect the security and privacy of users' data. Many users and even the technology press will not realize that AES-256 is useless against many attacks if the encryption key isn't kept private.
What is missing from the firm's website is a statement regarding how the company is using encryption, and in particular, what kinds of keys are used and who has access to them.
Encryption and deduplication
Encryption and deduplication are two technologies that generally don't mix well. If the encryption is done correctly, it should not be possible to detect what files a user has stored (or even if they have stored the same file as someone else), and so deduplication will not be possible.
Dropbox is likely calculating hashes of users' files before they are transmitted to the company's servers. While it is not clear if the company is using a single encryption key for all of the files users' have stored with the service, or multiple encryption keys, it doesn't really matter (from a privacy and security standpoint), because Dropbox knows the keys. If the company didn't have access to the encryption keys, it wouldn't be able to detect duplicate files.
While the decision to deduplicate data has probably saved the company quite a bit of storage space and bandwidth, it has significant flaws which are particularly troubling given the statements made by the company on its security and privacy page.
Cloud backup providers do not need to design their products this way. Spideroak and Tarsnap are two competing services that encrypt their users' data with a key only known to that user. These companies have opted to put their users' privacy first, but the side effect is that they require more back-end storage space. If 20 users upload the same file, both companies upload and store 20 copies of that file (and in fact, they have no way of knowing if a user is uploading something that another user has backed up).
Why is this a problem?
As Ashkan Soltani was able to test in just a few minutes, it is possible to determine if any given file is already stored by one or more Dropbox users, simply by observing the amount of data transferred between your own computer and Dropbox's servers. If the file isn't already stored by Dropbox, the entire file will be uploaded. If Dropbox has the file already, just a few kb of communication will occur.
While this doesn't tell you which other users have uploaded this file, presumably Dropbox can figure it out. I doubt they'd do it if asked by a random user, but when presented with a court order, they could be forced to.
What this means, is that from the comfort of their desks, law enforcement agencies or copyright trolls can upload contraband files to Dropbox, watch the amount of bandwidth consumed, and then obtain a court order if the amount of data transferred is smaller than the size of the file.
Last year, the New York Attorney General announced that Facebook, MySpace and IsoHunt had agreed to start comparing every image uploaded by a user to an AG supplied database of more than 8000 hashes of child pornography. It is easy to imagine a similar database of hashes for pirated movies and songs, ebooks stripped of DRM, or leaked US government diplomatic cables.
On April 1, 2011, Marcia Hofmann at the Electronic Frontier Foundation contacted Dropbox to let them know about the flaw, and that a researcher would be publishing the information on April 12th. There are plenty of horror stories of security researchers getting threatened by companies, and so I hoped that by keeping my identity a secret, and having an EFF attorney notify the company about the flaw, that I would reduce my risk of trouble.
While I want to praise the company for being willing to clarify the security statements made on its website, I hope this will be a first step on this issue, and not the last.
I also urge the company to abandon its deduplication system design, and embrace strong encryption with a key only known to each user. Other online backup services have done it for some time. This is the only real way that data can be secure in the cloud.