Dropbox, the popular cloud-based backup service, deduplicates the files that its users have stored online. This means that if two different users store the same file in their respective accounts, Dropbox will only actually store a single copy of the file on its servers.
The service tells users that it "uses the same secure methods as banks and the military to send and store your data" and that "[a]ll files stored on Dropbox servers are encrypted (AES-256) and are inaccessible without your account password." However, the company does in fact have access to the unencrypted data (if it didn't, it wouldn't be able to detect duplicate data across different accounts).
This bandwidth and disk storage design tweak creates an easily observable side channel through which a single bit of data (whether any particular file is already stored by one or more users) can be observed.
If you value your privacy or are worried about what might happen if Dropbox were compelled by a court order to disclose which of its users have stored a particular file, you should encrypt your data yourself with a tool like TrueCrypt or switch to one of several cloud-based backup services that encrypt data with a key only known to the user.
For those of you who haven't heard of it, Dropbox is a popular cloud-based backup service that automatically synchronizes user data. It is really easy to use and the company even offers users 2GB of storage for free, with the option to pay for more space.
The problem is, offering free storage space to users can be quite expensive, at least once you gain millions of users. In what I suspect was a price-motivated design decision, Dropbox deduplicates the data uploaded by its users. What this means is that if two users back up the same file, Dropbox only stores a single copy of it. The file still appears in both users' accounts, but the company consumes neither storage space nor upload bandwidth on a second copy of the file.
The company's CTO described the deduplication in a note posted in the "Bugs & Troubleshooting" section on the company's web forum last year:
Woah! How did that 750MB file upload so quickly?
Dropbox tries to be very smart about minimizing the amount of bandwidth used. If we detect that a file you're trying to upload has already been uploaded to Dropbox, we don't make you upload it again. Similarly, if you make a change to a file that's already on Dropbox, you'll only have to upload the pieces of the file that changed.
This works across all data on Dropbox, not just your own account. There are no security implications [emphasis added] - your data is still kept logically separated and not affected by changes that other users make to their data.
Ashkan Soltani was able to verify the deduplication for himself a couple of weeks ago. It took just a few minutes with a packet sniffer. A new randomly generated 6.8MB file uploaded to Dropbox led to 7.4MB of network traffic, while a 6.4MB file that had been previously uploaded to a different Dropbox account led to just 16KB in network traffic.
Claims of security and privacy
There are long standing privacy and security concerns with storing data in the cloud, and so Dropbox has a helpful page on their website which attempts to address these:
Your files are actually safer while stored in your Dropbox than on your computer in some cases. We use the same secure methods as banks and the military to send and store your data.
Dropbox takes the security of your files and of our software very seriously. We use the best tools and engineering practices available to build our software, and we have smart people making sure that Dropbox remains secure. Your files are backed-up, stored securely, and password-protected.
Dropbox uses modern encryption methods to both transfer and store your data...
All files stored on Dropbox servers are encrypted (AES-256) and are inaccessible without your account password
Reading through this document, it would be easy for anyone but a crypto expert to get the false impression that Dropbox does in fact protect the security and privacy of users' data. Many users and even the technology press will not realize that AES-256 is useless against many attacks if the encryption key isn't kept private.
What is missing from the firm's website is a statement regarding how the company is using encryption, and in particular, what kinds of keys are used and who has access to them.
Encryption and deduplication
Encryption and deduplication are two technologies that generally don't mix well. If the encryption is done correctly, it should not be possible to detect what files a user has stored (or even if they have stored the same file as someone else), and so deduplication will not be possible.
Dropbox is likely calculating hashes of users' files before they are transmitted to the company's servers. While it is not clear if the company is using a single encryption key for all of the files users have stored with the service, or multiple encryption keys, it doesn't really matter (from a privacy and security standpoint), because Dropbox knows the keys. If the company didn't have access to the encryption keys, it wouldn't be able to detect duplicate files.
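To make the suspected mechanism concrete, here is a minimal sketch of hash-based deduplication across accounts. It is not Dropbox's actual protocol, which is not public; the DedupServer class, the upload function and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


class DedupServer:
    """Toy stand-in for a storage server that deduplicates by content hash."""

    def __init__(self):
        self.blobs = {}     # content hash -> file bytes, stored only once
        self.accounts = {}  # user name -> set of content hashes they "own"

    def has(self, digest):
        return digest in self.blobs

    def link(self, user, digest, data=None):
        # Store the bytes only when they are supplied (first upload).
        if data is not None:
            self.blobs[digest] = data
        self.accounts.setdefault(user, set()).add(digest)


def upload(server, user, data):
    """Upload a file and return how many payload bytes had to be sent."""
    digest = hashlib.sha256(data).hexdigest()  # hash computed on the client
    if server.has(digest):
        server.link(user, digest)              # duplicate: send the hash only
        return 0
    server.link(user, digest, data)            # new file: send everything
    return len(data)


server = DedupServer()
movie = b"x" * 1_000_000
first = upload(server, "alice", movie)   # full upload
second = upload(server, "bob", movie)    # deduplicated against alice's copy
```

In this sketch the second user sends no file data at all, yet the file shows up in both accounts while the server holds a single copy.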
While the decision to deduplicate data has probably saved the company quite a bit of storage space and bandwidth, it has significant flaws which are particularly troubling given the statements made by the company on its security and privacy page.
Cloud backup providers do not need to design their products this way. Spideroak and Tarsnap are two competing services that encrypt their users' data with a key only known to that user. These companies have opted to put their users' privacy first, but the side effect is that they require more back-end storage space. If 20 users upload the same file, both companies upload and store 20 copies of that file (and in fact, they have no way of knowing if a user is uploading something that another user has backed up).
Why is this a problem?
As Ashkan Soltani was able to test in just a few minutes, it is possible to determine if any given file is already stored by one or more Dropbox users, simply by observing the amount of data transferred between your own computer and Dropbox's servers. If the file isn't already stored by Dropbox, the entire file will be uploaded. If Dropbox has the file already, just a few kb of communication will occur.
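The test boils down to comparing two numbers. The following sketch shows that comparison using the traffic figures from Soltani's experiment; the function name and the 50% threshold are my own assumptions, not part of any tool he used.

```python
def probably_already_stored(file_size, bytes_on_wire, threshold=0.5):
    """Guess whether the server already held the file.

    If far fewer bytes crossed the network than the file contains, the
    upload was most likely skipped. The threshold is an arbitrary
    illustrative cut-off.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    return bytes_on_wire < threshold * file_size


MB = 1_000_000
KB = 1_000

# Figures reported from the packet capture described earlier.
new_file = probably_already_stored(6.8 * MB, 7.4 * MB)    # fresh random file
known_file = probably_already_stored(6.4 * MB, 16 * KB)   # file already stored
```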
While this doesn't tell you which other users have uploaded this file, presumably Dropbox can figure it out. I doubt they'd do it if asked by a random user, but when presented with a court order, they could be forced to.
What this means is that from the comfort of their desks, law enforcement agencies or copyright trolls can upload contraband files to Dropbox, watch the amount of bandwidth consumed, and then obtain a court order if the amount of data transferred is smaller than the size of the file.
Last year, the New York Attorney General announced that Facebook, MySpace and IsoHunt had agreed to start comparing every image uploaded by a user to an AG supplied database of more than 8000 hashes of child pornography. It is easy to imagine a similar database of hashes for pirated movies and songs, ebooks stripped of DRM, or leaked US government diplomatic cables.
On April 1, 2011, Marcia Hofmann at the Electronic Frontier Foundation contacted Dropbox to let them know about the flaw, and that a researcher would be publishing the information on April 12th. There are plenty of horror stories of security researchers getting threatened by companies, and so I hoped that by keeping my identity a secret, and having an EFF attorney notify the company about the flaw, that I would reduce my risk of trouble.
While I want to praise the company for being willing to clarify the security statements made on its website, I hope this will be a first step on this issue, and not the last.
I also urge the company to abandon its deduplication system design, and embrace strong encryption with a key only known to each user. Other online backup services have done it for some time. This is the only real way that data can be secure in the cloud.
Nice article. Well presented. Excellent conclusions. Thanks for the continued work...
I think for the truly paranoid, any service that stores individual files is likely to have privacy leaks, because even with encryption, comparing file sizes can tell you a lot. Storing everything in a big dropbox volume would prevent that at the cost of performance since the whole volume has to be synced as an individual file.
This is why I recommend Wuala, it's all encrypted with your user password, you lose your password, you lose your files. Tight enough they can store user's data on each other's machines without any threat to privacy.
How does this affect users who store data from password applications in Dropbox so that it is accessible by phone, laptop, etc.? I store my 1password data in Dropbox, should I be worried? If so, are there any good alternatives?
Spideroak seems to save space the same way as dropbox. Here's a snip from their website...
Storage Redundancy Savings
Have two copies of the same file? In your SpiderOak account, the 2nd (or 3rd or...) copy doesn't use any more space. Or maybe there are instances when you have a folder with 10 or 20 different "renamed" versions of a similar file as you worked on it over time? SpiderOak internally detects the redundancy in these situations and saves you online storage space.
How do they know what's redundant if it's encrypted?
@Anonymous: Spideroak can simply get a hash of the file encrypted with your password, it would not be a problem if it's the same file on your own account.
"However, the company does in fact have access to the unencrypted data (if it didn't, it wouldn't be able to detect duplicate data across different accounts)."
I don't think you've established that this is necessarily true.
It is possible they hash the file prior to encrypting it, then encrypt it. Afterwards, you simply compare the hash of a newly updated file to the stored hash.
Chris, we dealt with this in the Tahoe-LAFS project.
It is NOT a valid assumption that deduplication requires the keys to be known. Tahoe used a method called convergent encryption to achieve exactly this property, and it does not require the storage provider to have the keys.
You are correct that there is a confirmation attack created, and for this reason we ended up adding what we called a 'convergence secret', a per-user salt that eliminates that attack.
Each user still gets the benefit of deduplication within their account (so backing up the same thing is fast), but there's no confirmation attack against other users.
Ping me and I'll be happy to explain this at length.
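For readers unfamiliar with convergent encryption, a minimal sketch of the key derivation follows. It is not Tahoe-LAFS's actual construction: the function names, the "index:" prefix and the use of plain SHA-256 are assumptions, and the encryption step itself is left out.

```python
import hashlib


def convergent_key(plaintext: bytes) -> bytes:
    """Derive the encryption key from the file contents themselves."""
    return hashlib.sha256(plaintext).digest()


def storage_index(key: bytes) -> str:
    """Identifier shown to the server; derived from the key, not equal to it."""
    return hashlib.sha256(b"index:" + key).hexdigest()


# Two users who hold the same bytes derive the same key and the same index,
# so the server can deduplicate without ever learning the key.
alice_index = storage_index(convergent_key(b"same bytes"))
bob_index = storage_index(convergent_key(b"same bytes"))
other_index = storage_index(convergent_key(b"different bytes"))
```

The matching index is exactly what enables the confirmation attack: anyone who already has the file can compute the index and ask whether it exists.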
"However, the company does in fact have access to the unencrypted data (if it didn't, it wouldn't be able to detect duplicate data across different accounts)."
A) One-way encrypt file 1 and file 2.
B) Compare encrypted versions of files 1 and 2.
C) If encrypted 1 == encrypted 2 then the source files are == too.
Some password databases/comparisons work in a similar fashion. That doesn't mean that you can easily get to the source information without cracking the encryption.
I'm no security expert, just positing a theory that your conclusion isn't necessarily based on fact or even a well structured argument. I'm not saying it isn't true, but this portion of your argument is less than compelling, which makes me less inclined to read the remainder of your post.
Wait, i retract what i said, dropbox would still need to be able to decrypt the data.
In the past I raised similar security concerns about Dropbox on Twitter. I was immediately contacted by a Dropbox representative who was very open with me about the way they secure the data. And yes, data is encrypted over TLS while transferred, and yes, Dropbox will encrypt the data on S3 storage. So yes, Dropbox can access your files, but the employees are of course not allowed to do that. If it needs to be more secure, you should encrypt the data yourself before transferring.
7-Zip will compress and password protect files - and then upload to Dropbox.
I'm confused. How can I reproduce this attack? Aren't the hashes secure hashes, SHA-256? Has SHA-256 been cracked? How can I get a stranger's data. I tried to packet sniff but it looks like all the dropbox traffic from my computer is encrypted over SSL. Can you help me?
Simply put, if dropbox has the ability to recover your lost password, they have to save it somewhere on their servers and therefore technically *can* open your files.
Wuala technology is superior and far more secure:
It seems SpiderOak does not deduplicate data [ https://spideroak.com/blog/20100827150530-why-spideroak-doesnt-de-duplicate-data-across-users-and-why-it-should-worry-you-if-we-did ] but Wuala does [ https://bugs.wuala.com/view.php?id=3339 ]. Bad news for me as an avid Wuala user.
SpiderOak only deduplicates YOUR data, so the only data that is compressed is your personal data.
We do not deduplicate across accounts by choice and by design since its impossible to deduplicate already encrypted data and we do not have the passkeys to decrypt it.
Just wanted to chime in from Dropbox.
We understand the concern that the government could try to guess whether a particular file has been uploaded to Dropbox based on processing times and then request that Dropbox identify a user who has access to that file. However, to seek user content information, the government needs to comply with the provisions of the Electronic Communications Privacy Act by obtaining a warrant supported by probable cause (or in some cases a court order from a judge). Those safeguards protect user privacy. De-duplication does not make users any more vulnerable to intrusive government actions. Today, a government agency could ask any online service to provide the names of all users who have a particular file, whether or not the service employs de-duplication. And in that case, the government would also need to support its request with a warrant or court order. The rules that provide a check against unwarranted government snooping apply to online services equally, regardless of their back-end architecture.
if you want security just encrypt your files - and no more deduplication will be done, although your files will take longer to upload.
anyway, there is no way that a hash could be made without having the complete file, and it is pretty obvious when a 750MB file takes 5 seconds to upload that it is a dupe, no need to deep analyse it or use a packet sniffer for that lol.
and if it is a 1 way process, then it is not a 1 to many relationship - there is no way that a file, could link back to users only users could have link access to files - many to 1. even if that were so, unverified free accounts are unable to be traced back to their owner.
Mmmmm yes if I was writing dropbox I would have done the same thing. Especially if I am paying the bills. Git works exactly the same way in terms of hashes. It is a reasonable way to handle duplicate content.
It almost sounds like you are suggesting they drum up some fake traffic when the hashes match... pretty sure that is a bad idea. And also runs up their bandwidth costs. Maybe on paid accounts?
Cloud storage is that. Storage on another persons machine. Dropbox is dead freaking simple and it just works. For me it is a great interface and if they can save some bandwidth based on a hash that has already been uploaded then more power to them.
What you are asking for is a deniable file system, in which you store something, but there is no evidence to show that you are the one who stored it. This is simply not the goal of Dropbox.
Sure, everyone wants more security, more privacy protection. Then, encrypt your damn files before using a service like Dropbox.
Whether my files are encrypted at rest on Dropbox's server or not is of less concern to me than that they are encrypted as they travel over the wire, or through the air. (i.e. SSL)
But I can see how one might be more paranoid if they were sharing child porn or even pirated movies in their Dropbox.
In which case, if you're going to deal with contraband, I wouldn't depend on a free service to keep you safe from the big bad feds?
At Lockify.com we're using encryption/decryption on the client (and a variety of verification methods) to show and ensure your private communications.
Mention this post and we'll get you early access to the private beta.
I don't understand why this deduplication makes any difference, considering that Dropbox has all the keys.
Is it not the case that Dropbox has access to the encryption keys that protect the user data anyway?
This must be the case because there is a "password reset" operation that you can go through to get access to your files again in case you've forgotten your password. This implies that Dropbox itself has the power to use that same process to get access to your files without knowing your password.
In which case the following set of people can read or alter any of the user's files:
An employee of Dropbox acting according to company policy, an employee of Dropbox acting illicitly, a law enforcement agent who persuades or compels Dropbox to do their bidding, an intruder who illicitly gains access to Dropbox's servers.
In addition, anyone who can bribe or coerce any of those people, steal the laptop or phone from one of those people, or gain access to one of those people's computers (such as through malware) also gains the ability to read or change or delete any file of any user.
In light of this, I don't see why it matters whether any of those people *also* have the ability to detect duplicate files using this hash comparison. They already have the ability to read all of the files contents.
" If the company didn't have access to the encryption keys, it wouldn't be able to detect duplicate files."
This is not true! Before the encryption on the client, dropbox determines the hash of the files. So the server stores the encrypted file and its hash (of the unencrypted content).
It can determine the duplicated file only with the hash and it doesn't have to use the key to decrypt the file.
But sure, it would be more secure if they didn't have the encryption key ;-)
But true, if they have to serve the same file to different people they have to know the encryption key.
Your paranoia is ok for DropBox deduplication. But we know that they know our encryption keys and we have only to evaluate if the service justifies the loss of privacy and security.
But you're right.
I understand the concern here, but at the same time I prefer the faster synchronization. Perhaps an advanced option to disable this feature for an entire account and leverage a more secure process (ie less data leakage) would solve the concern of those who place a premium on privacy.
As Zandr said, we try to get the best of both worlds in Tahoe-LAFS by combining convergent encryption with an added secret.
The reason we invented the "added convergence secret" was not merely due to the "confirmation of a file" attack, which is what your report is about, but also a subtler and potentially more dangerous attack called the "learn the remaining information" attack.
The intuition behind the "learn the remaining information" attack is that if you give someone the secure hash of your data, and if they can perform, let's say, 2⁵⁰ computations (or buy a rainbow table with 2⁵⁰ entries), and if they can know or guess all but 50 bits of the contents of your file, then they can brute force the remaining 50 bits by comparing against the secure hash that you gave them.
For example, if you receive a PDF document from your bank which contains pages and pages of boilerplate, plus your name and account number and current balance, and you store that on a cloud storage provider that does convergent encryption, then the attacker can try different names, account numbers, and balances (assuming that he is a customer of the same bank and knows the contents of the boilerplate).
For another example, if you set up a Linux server and put your MySQL password into /etc/my.cnf and then back up /etc/my.cnf onto a cloud storage service that uses convergent encryption, an attacker now gets the chance to try to brute force your MySQL password. (Of course if your password is long enough they will still fail, but some people rely on the fact that attackers cannot attempt large numbers of guesses (like 2⁵⁰ guesses) against your MySQL password. By using convergent encryption, you may be unwittingly giving attackers that opportunity.)
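A toy version of the "learn the remaining information" attack makes the risk easier to see. In this sketch the attacker knows everything in a config file except a 4-digit PIN, which stands in for the 50 unknown bits; the TEMPLATE, the PIN and the function names are all invented for illustration.

```python
import hashlib

# Hypothetical file layout the attacker is assumed to know in advance.
TEMPLATE = "[client]\nuser=backup\npassword={pin}\n"


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def recover_pin(observed_hash: str):
    """Try every possible PIN until the hash of the guess matches."""
    for n in range(10_000):
        pin = f"{n:04d}"
        if content_hash(TEMPLATE.format(pin=pin)) == observed_hash:
            return pin
    return None


# The storage provider (or anyone who sees the hash) only observes this value.
victim_hash = content_hash(TEMPLATE.format(pin="4821"))
recovered = recover_pin(victim_hash)
```

With only 10,000 candidates the search is instant. The same loop against 2⁵⁰ candidates is far more expensive but follows exactly the same logic.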
Our discovery of the "learn the remaining information" attack was due to Drew Perttula, who thus became the second member of the "I Hacked Tahoe-LAFS!" Hall of Fame:
The solution (originally suggested by Drew) is to add an "added convergence secret" which gets securely mixed into the secure hash before that secure hash is used as an encryption key. Each time you set up a Tahoe-LAFS client it generates a new random added convergence secret and stores it in that Tahoe-LAFS client's configuration directory. This means that (like Spideroak according to Daniel Larsson's comment above) you automatically deduplicate files with yourself but not with anyone else. It also means that nobody can perform either the confirmation-of-a-file attack or the learn-the-remaining-information attack on you.
If you choose to share your added convergence secret with someone else, then you gain automatic deduplication of your files with their files, but you are *still* not vulnerable to either of these two attacks from your Tahoe-LAFS storage provider nor from anyone else in the world except that person whom you shared your secret with! So this is an interesting trade-off between security and efficiency and is arguably a better trade-off than any other deduplication offering that I have seen.
If you choose to set your added convergence secret to a guessable or widely known value such as the empty string, then you gain automatic deduplication of your files with anyone else who set theirs to the same string, but you are also vulnerable to these two attacks by anyone.
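The effect of the added convergence secret can be sketched in a few lines. HMAC-SHA256 is used here as a stand-in for "securely mixing" the secret into the hash; that choice and the function names are assumptions for illustration, not a description of how Tahoe-LAFS implements it.

```python
import hashlib
import hmac
import os


def new_convergence_secret() -> bytes:
    """Random per-client secret, generated once at client setup."""
    return os.urandom(32)


def salted_convergent_key(secret: bytes, plaintext: bytes) -> bytes:
    """Key depends on both the file contents and the client's secret."""
    return hmac.new(secret, plaintext, hashlib.sha256).digest()


doc = b"bank statement"
alice_secret = new_convergence_secret()
bob_secret = new_convergence_secret()

# Same user, same file: same key, so deduplication within the account works.
alice_key_1 = salted_convergent_key(alice_secret, doc)
alice_key_2 = salted_convergent_key(alice_secret, doc)

# Different user, same file: unrelated key, so no cross-user confirmation.
bob_key = salted_convergent_key(bob_secret, doc)

# A widely known secret (the empty string) brings cross-user matching back.
public_key_1 = salted_convergent_key(b"", doc)
public_key_2 = salted_convergent_key(b"", doc)
```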
Thanks for your attention.
This de-duplication feature has saved me lots of bandwidth. Unless one is using dropbox to share illegal files, they shouldn't be worrying about that or the feds.
As far as passwords or other personal documents goes, you can't expect someone else to have that, so no duplication fear, and it is encrypted during transfer.
"Many users and even the technology press will not realize that AES-256 is useless against many attacks if the encryption key isn't kept private."
True. In fact we are doing our best developing this tool: http://tomb.dyne.org
Zooko wrote: "In light of this, I don't see why it matters whether any of those people *also* have the ability to detect duplicate files using this hash comparison. They already have the ability to read all of the files contents."
It matters to the extent that anyone, not just those people you listed, can do an online plaintext confirmation / partial-information attack.
I wrote, "it matters to the extent that anyone ... can do an online plaintext confirmation ... attack."
However, without the keys they wouldn't know which dropbox user had previously uploaded that file, unless this attack was combined with traffic analysis of uploads.
The article started very well, but is misleading in many ways.
1. Encryption, access to data and access to encrypted data are three different things.
In terms of deduplication, a common practice can be - to analyse the bytestream and replace the chunk of data.
As far as Private key and encryption is concerned, As they state: No one can access the (Legible) Data, so unless you have the Private key, you can not access the legible data.
What exact algorithm they have used to analyse the stream, encrypt, replace, manage file system etc. is a proprietary secret, and better that way.
But as far as feasibility - It is possible to deduplicate AND keep your data safe/encrypted at the same time.
So no need for another Privacy Paranoia.
DropBox is a superb service, and innovative in some ways as well.
Of course they have to have access to the encryption keys! Not necessarily to DETECT duplicate files, but to STORE a single copy of a duplicate file (the whole point) and allow multiple users to access it. Think about it!
I'd be more worried if I were storing something I'd be afraid to allow others access to, but I'm not. The only truly secret data I keep in my dropbox is my passwords, and the file they're in is already encrypted.
"However, the company does in fact have access to the unencrypted data (if it didn't, it wouldn't be able to detect duplicate data across different accounts)."
You just need to save a short hash of the data. How can you write this if you don't know how it works?
I think there is a lot of confusion here...
Given a file from a first user, you can either encrypt yourself and/or let DropBox do it for you. DropBox then stores it somewhere with the encryption that it is always there (how can you recover it if DropBox would not have it in first place?).
Now, given another file from a second user, the only way to guarantee that this file is the same as the first one from the first user is to compare sizes, names AND contents, bit by bit. No hash codes over the files can guarantee that two files are identical regardless of their lengths (unless the lengths of the files are smaller than the length of the hash codes).
Therefore, whatever DropBox would do with my files, it has to have my encryption keys with the risks and advantages associated to it.
On the other hand, what is the real benefit of deduplication as it is a complex technique?
Compression and other techniques would be far more effective and fast than this while keeping the privacy.
When I signed up for Dropbox, I didn't provide a crypto key at any time. I didn't have to move one between systems that use my Dropbox account. Therefore, I knew that any encryption that went on was on their end, and they could read any of my files, without any further analysis.
I don't use it for anything I'd consider sensitive. Unless I did the encryption, and managed the key myself, that would be foolish. I use it to hold some of my personal projects that I might want to work on from several places, and if somebody examines them it won't bother me much.
I'm surprised someone so smart doesn't assume that anything they put in the "cloud" will get turned over to law enforcement if a warrant is produced. Same goes for putting something dodgy on such a service.
Honestly, what else did you expect?
Well, there is file level deduplication and block level deduplication. If they are able to detect only a portion of a file has changed and then only upload that portion, they must be doing block level deduplication.
When deduping and storing blocks, the hash of that block is used as an index. When they store that block they encrypt it, not the entire file.
Granted, they would use the same key to encrypt all blocks. However, with that said, from a storage perspective, even if you unencrypted all the blocks, you would have to know how to reassemble the blocks into actual files.
That information could be stored elsewhere and be unique to each user account. That information could then be encrypted by a unique key generated on user account creation using the user's password.
I for one appreciate the reduced bandwidth block level deduplication provides and would hate to have to encrypt and upload the whole file every time I edited a single line. It would make the service impracticable and expensive.
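A minimal sketch of block-level deduplication, as this comment describes it, follows. The 4-byte block size is only there to keep the demo readable, and the dictionary-based store, the manifest format and the sync function are assumptions; nothing here reflects Dropbox's real block size or protocol.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose; real systems use much larger blocks


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def sync(store: dict, data: bytes):
    """Upload only blocks the store lacks.

    Returns (manifest, bytes_uploaded), where the manifest is the ordered
    list of block hashes needed to reassemble the file.
    """
    manifest, uploaded = [], 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()  # hash is the block's index
        if digest not in store:
            store[digest] = block
            uploaded += len(block)
        manifest.append(digest)
    return manifest, uploaded


store = {}
v1 = b"AAAABBBBCCCCDDDD"
v2 = b"AAAABBBBXXXXDDDD"  # one block edited

manifest1, sent1 = sync(store, v1)  # everything is new
manifest2, sent2 = sync(store, v2)  # only the changed block is sent
rebuilt = b"".join(store[d] for d in manifest2)
```

The manifest is the "how to reassemble the blocks" information mentioned above. In this sketch it is kept in the clear, whereas the comment suggests it could be encrypted per user.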
Question... Why can't Dropbox do the file/hash compare once they have obtained the user encryption key after the outside user has logged on? This is a possibility, right?
While it is not clear if the company is using a single encryption key for all of the files users' have stored with the service, or multiple encryption keys, it doesn't really matter (from a privacy and security standpoint), because Dropbox knows the keys.
When I was evaluating Dropbox for business use, I investigated whether or not Dropbox allowed defining your own encryption key. At that time I discovered they use a single encryption key for all files for all users. This may or may not be the case now. Conclusion: don't use Dropbox for sensitive data unless you encrypt it ahead of time.
You get what you pay for.
Another service worth checking out is JungleDisk.
Private crypto keys:
Settings for different types of authentication (see Derek Newton discussion)
Think about it... how could a 750 MB upload even deduplicate on the server and still upload in 5 seconds? You would still have to upload the file. Deduplication is done on the client and is therefore safe.
Dropbox obviously is able to decrypt your data, the web interface is proof of that. How else would they be able to serve the original unencrypted files over HTTP(S)?
You're being totally unreasonable here. There is no reason at all for users to expect that dropbox doesn't have access to their data. The way dropbox stores data is exactly the way I expected it stores data. If a dropbox-like system did encrypt data at rest, it would be advertised as a feature.
You can't expect every piece of software to be suitable for every purpose. Dropbox is for your grandma to upload her files. If she's worried about leaking the fact that the particular files exist, or that they might be subject to a search warrant, then she's got unusual enough security requirements that she should do a lot of reading and think damn hard about what she's doing before uploading those files.
Seems that as long as your concern is only bandwidth, and not storage... there would be no issue. First time a file is uploaded, save two copies to the server and encrypt one with the user's key, and the other with the system's key. When testing a hash of a new upload, if there's a match, decrypt the system's copy, and save a copy encrypted with the new user's key.
Bandwidth seems like it'd be a much larger expense than storage in a system like Dropbox.
There seems to be a common misconception in several of these posts regarding hashes. If two files have the same hash, they are not necessarily the same. If they have different hashes, they are definitely different.
The real problem here is that nearly everyone has unrealistic assumptions and beliefs about what is secure, and what it means to be secure. The fact is, unless the encryption is being done under your control, as close to you as possible, and unless only the encrypted form is being transmitted to the cloud provider, your security and privacy will never be absolute. The sooner and more clearly people are educated about this, in my opinion, the better.
My own assumption is that any file that ever leaves my computer is potentially visible to the whole world. (Files on my computer are also potentially visible, though a bit less so -- though that's another story.) Thus if I ever have a file that I really care to keep secret from a determined opponent -- which I generally don't -- I will use pgp or something similar to encrypt it on my personal computer, and I will only store it or transmit it in that form, and I will guard my keys and password like the crown jewels.
We would do our users more of a service by educating them in this semi-paranoid manner of behavior than by giving them assurances of security and privacy that simply can't hold up under a court order. And that includes any form of encryption that is performed in the cloud, because the provider needs to be able to decrypt it as well, and therefore can be compelled to do so under a court order.
This is a message that no one wants to hear, so no vendors are giving it. Instead, they are lying, or at least heavily shading the truth. Encryption in the cloud is almost certainly adequate for certain kinds of secrets, such as cheating on your spouse. It is generally adequate for others, such as most corporate proprietary data. It is absolutely not adequate for anything that you want to keep from a government with applicable jurisdiction, or from serious, determined hackers.
What dropbox provides is more than adequate for most users. Those with a more stringent need for privacy -- most often because they are breaking either a just or unjust law -- need to take responsibility for their own privacy, not count on a remote, third party service to provide it.
Jungledisk is another option. JD is owned by Rackspace but will store data on either Rackspace or Amazon S3. Users create and keep encryption keys. Various authentication options.
Great read. I look forward to sharing with other technical professionals.
Another anonymous wrote:
"There seems to be a common misconception in several of these posts regarding hashes. If two files have the same hash, they are not necessarily the same."
True, but hash collisions are a problem. See:
I have a question here. If Dropbox identifies that the file that you try to upload has got the same hash with a file they already store, are they performing a subsequent byte-by-byte comparison? Because if not, it might be possible that a hash collision will cause your file not to be uploaded. Quite apart from the security concerns that were the subject of this thread, this may mean simply that the file you thought you safely backed up is not actually there. When attempting to retrieve the file, you would be served with somebody else's file having the same hash.
Can anyone recommend any local free backup software that uses good encryption (eg private key) and then can place the backup in the Dropbox folder, for Windows and Ubuntu?
All of these cloud backup services should provide separate, free encryption software and then a separate method to back up the encrypted data to the cloud.
Really great article. I will not be using Dropbox without a TrueCrypt volume going forward. Means losing access via my iPhone but potentially a small price to pay.
Dropbox uses Amazon S3 for storage. All data stored in S3 is stored encrypted. Both Amazon and Dropbox know this encryption key (though Dropbox may be encrypting it separately on top of that).
One user's files are kept separate from another's logically, probably through a database that Dropbox maintains. All files are likely stored on S3 in one bucket (think: Directory), as per-user subdirectories would not be useful with a deduplication setup, and all the sharing options that Dropbox offers.
Dropbox relies on deduplication to survive. If you were to sign up for S3, upload 2GB, store it all month, and download 4GB, it would cost you about $0.95 each month. That doesn't sound like much, but if you invite the whole free world to store stuff on your dime, that adds up. By only storing files once, you are saving lots of space and bandwidth. Further, Dropbox clients will copy files between each other within a LAN to save even more bandwidth.
Customers that buy 50/100GB plans are getting gouged compared to buying it direct from Amazon, but the value-add that Dropbox has is worth it, and they end up paying for the free users' storage and bandwidth.
This would all fall apart if every user uploaded a 2GB TrueCrypt volume, which they couldn't deduplicate, costing Dropbox a lot to store.
This is for Arash (Dropbox CTO):
The issue is not that we're safeguarded by law (we're not, that kind of thing has been proven to be pretty arbitrary), nor is it about the efficiency or nature of your deduplication technologies. It's that your marketing claimed that my data was completely obfuscated from viewing by your employees, and you've now backtracked and stated that's not the case.
That's a lie.
Arash, you are missing the point. It might be sort of true that "De-duplication does not make users any more vulnerable to intrusive government actions" - what makes users vulnerable is the fact that you, Dropbox, have the key to decrypt the users' data. The point isn't that the government can demand a duplicated file. The point is they can demand *any* file, and you can provide it. The author of this article, Christopher Soghoian, is pointing out that by simply storing decryption keys only with the user (passphrase/etc), you would protect your users from government requests. This would be a compelling feature for some users. It isn't really true to say that "The rules that provide a check against unwarranted government snooping apply to online services equally, regardless of their back-end architecture" - in fact, a service architected so that only users have the keys would be much safer from any government snooping (see Spideroak). But I appreciate that your company has finally clarified that your files are not effectively encrypted on your servers.
If I understand correctly, deduplication can happen pre-encryption, but serving that file back up to you would require DB to decrypt the file from another user's account... Hence DB can decrypt a file and serve it to someone else on the premise that those files were identical pre-encryption.
I applaud the effort to push DB to be more transparent about their security and I feel better off knowing the data isn't protected from court order or extremely dedicated hackers, but I'm still comfortable storing my data there including my 1P files which are encrypted and unique anyway.
One point regarding the other cloud storage systems. Imagine this: a court order reads "the next time user X accesses this data you are instructed to intercept and provide to law enforcement the encryption key for user X's data". All is still OK as long as that key is never transmitted, so the next court order comes down: "you are to modify your software to deliver the encryption key for user X to law enforcement". The point is that any time a third party is involved, LE can compel them to expose your data one way or another.
It seems that the SpiderOak.com service uses a similar system to DropBox.
I use GnuPG to encrypt files before upload. It's an extra step but it ensures they're safe from snoops.
This is a concern even if you aren't sharing contraband. What if you are sharing information that is legal, but damaging to a big corporation or a government?
@Arash (Dropbox CTO):
Sounds like Dropbox has a level of trust and faith in government and the law that I don't share. With the amount of overt abuse of the law in this realm, it's hard to see how such faith is warranted.
And that is precisely why I would not use a system that you designed.
Thanks for spelling the issues out. Finally something readable to link to.
Great blog title as well.
Hi, we've just launched a sync tool, SecretSync, that provides client-side encryption for Dropbox. (In beta.)
SecretSync ensures that your files are encrypted on your computer before being put into Dropbox. It's similar to using TrueCrypt, except it's file-based, like Dropbox. I.e. you just put files in a special folder on your computer, and the encryption happens.
With client-side encryption, de-duplication can't occur. This is because AES encryption, in the modes normally used for files, requires what's called an initialization vector, or IV. This is a 'nonce' that changes every time you encrypt. So every time you update a file, it looks entirely different. If 10 people were to upload the exact same file using SecretSync, it would literally appear as 10 unique file signatures.
If your data is that sensitive, you should be encrypting with your own keys before storing on something like dropbox. It's free / very cheap so don't use it for storing national secrets or pictures of you and that lap dancer. I don't think this is a big deal!
While people are mentioning alternatives that actually care about your privacy, I thought I'd mention a site I launched this week called www.senditonthenet.com
We operate in a similar manner to wuala (in that we don't know what your password is and can't access your data) however we don't require a software download, it runs entirely in your web browser.
just to summarize and follow up on a few things:
dropbox is not the only service that controls both ends and tries to use hashes as proofs that the client has a file. connected (now iron mountain) uses the same method. the aol attachment store uses the same method.
it is risky to use a hash as a proof that the client has a file. consider the case where a piece of malware leaks file hashes to an attacker. the attacker, using their custom client, asserts they have a file with that hash value. the system says "already got it", and gives them access to the file from then on. the implication is you need to protect hashes on stored files as rigorously as you'd protect the files themselves as the hashes "stand in" for access to the contents.
this also suggests that if someone were to merely distribute hashes of popular movies, you could get them from dropbox, as it is likely to have at least one of those.
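The risk described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Dropbox's actual protocol: the `DedupStore` class, its `claim` method, and the user names are all hypothetical. It only shows what goes wrong when a server accepts "I have the file with this hash" as proof of possession.

```python
import hashlib


class DedupStore:
    """Hypothetical server that treats a file hash as proof of possession."""

    def __init__(self):
        self.blobs = {}     # hash -> file content (stored once)
        self.accounts = {}  # user -> set of hashes the user may download

    def upload(self, user: str, content: bytes) -> str:
        """Normal upload: the server sees the bytes and records ownership."""
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)
        self.accounts.setdefault(user, set()).add(digest)
        return digest

    def claim(self, user: str, digest: str) -> bool:
        """Dedup shortcut: the client sends only a hash, never the bytes."""
        if digest not in self.blobs:
            return False  # server would ask for a real upload here
        self.accounts.setdefault(user, set()).add(digest)
        return True       # "already got it"

    def download(self, user: str, digest: str) -> bytes:
        if digest not in self.accounts.get(user, set()):
            raise PermissionError("file is not in this account")
        return self.blobs[digest]


store = DedupStore()
leaked_hash = store.upload("alice", b"alice's private notes")

# An attacker who learns only the hash can attach the file to their account.
claimed = store.claim("mallory", leaked_hash)
stolen = store.download("mallory", leaked_hash)
print(claimed, stolen == b"alice's private notes")
```

In this model the hash is the only credential, which is why it would need the same protection as the file itself.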
regarding hash collisions, usually file size has to compare equal but not file name, as people often change that.
if you implement this, you may want to use an HMAC rather than a standard hash, so that you are not easily compelled to compare values with those in databases of forbidden content just because you can.
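A minimal sketch of the HMAC suggestion, using Python's standard `hmac` module. The key name `SERVICE_KEY` and the helper `dedup_tag` are assumptions made for illustration; how a real service would generate, store, and rotate such a key is not covered here.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret held only by the storage service.
SERVICE_KEY = secrets.token_bytes(32)


def dedup_tag(content: bytes, key: bytes = SERVICE_KEY) -> str:
    """Keyed fingerprint used for deduplication instead of a plain hash."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


content = b"bytes of some widely shared file"

# A plain SHA-256 is what an outside database of known files would contain.
public_hash = hashlib.sha256(content).hexdigest()

# Same content and same key give the same tag, so deduplication still works.
same_tag = hmac.compare_digest(dedup_tag(content), dedup_tag(content))

# The tag does not match the public hash, so the index cannot be compared
# directly against an outside hash list without the service's key.
matches_public = hmac.compare_digest(dedup_tag(content), public_hash)

print(same_tag, matches_public)
```

Note that the service can still recompute a tag for any file it is handed, so this only prevents bulk comparison against precomputed hash lists, not a targeted lookup done with the service's cooperation.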
[i'll be anonymous in this posting, but many of you know me...]
"Is it not the case that Dropbox has access to the encryption keys that protect the user data anyway?
This must be the case because there is a "password reset" operation that you can go through to get access to your files again in case you've forgotten your password. This implies that Dropbox itself has the power to use that same process to get access to your files without knowing your password."
Think again. Password reset will give a Dropbox employee access to a customer's file, but it will also leave the file protected by a new password that is then unknown to the customer. The employee has no way to retrieve the customer's original password to return the account to the state it was in before he gained access to the data. The customer may not know that his privacy has been compromised, but he may become suspicious when his password has been rendered ineffective.
If you are TRULY paranoid, why are you storing your sensitive data in the cloud anyway? Shouldn't it be triple duplicated locally with a remote thermite kill trigger?
I do appreciate you bringing this to light. It is important for companies to clearly display their terms of usage. This changes nothing about how I use Dropbox because nothing I want secret will find its way into the cloud-nether.
I don't think you have any idea of what you are talking about. You could easily compare a hash against another hash, to find a match, without it being unencrypted. encryption of both pieces of data are encrypted, I think you are just jealous Dropbox is doing well. Many of your assumptions just show your lack of knowledge on the subject. This sounds more like a smear attack against Dropbox.
Nevermore: Zooko is right. The point is that if it is possible to implement a "password reset" feature for recovering from forgotten passwords, then the password must not be the thing required to unlock a file. Instead, it must be the case that Dropbox has whatever is needed to recover the file. This means, necessarily, that Dropbox can decrypt any file, and can do it without the user knowing. As an aside, this is an important tradeoff - password recovery would be impossible if Dropbox made themselves unable to comply with government requests, if passwords were the key for decrypting the file.
When I saw the Dropbox service I had a feeling that it wasn't secure... and I was right! Some family members use it and recommend it, but I guessed (and guessed right) that it would not be possible to provide that service without a backdoor... at least in most countries it isn't possible.
Even if you encrypt using some tool like 7-Zip, which uses AES-256... you need at least a 40-character password to achieve the 256 bits of protection... and AES wasn't considered that secure even when it won the AES contest... the best was the Serpent algorithm... but security definitely wasn't the priority... the current AES was considered the algorithm with the smallest security margin (meaning that it would probably be the one most easily attacked of all the finalists, somewhere in the future).
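The password-length figure can be checked with a short calculation. This sketch assumes every character is chosen uniformly at random from the alphabet, which is the best case; human-chosen passwords carry far less entropy per character. The function name `min_length` is made up for illustration.

```python
import math


def min_length(alphabet_size: int, target_bits: int = 256) -> int:
    """Characters needed for a random password to reach target_bits of entropy."""
    bits_per_char = math.log2(alphabet_size)
    return math.ceil(target_bits / bits_per_char)


# 95 printable ASCII characters, 62 letters and digits, 26 lowercase letters.
for size in (95, 62, 26):
    print(size, "symbols ->", min_length(size), "characters")
```

Under that assumption the roughly 40-character estimate is in the right range for a password drawn from all printable ASCII characters, and the requirement grows as the alphabet shrinks.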
You can still use truecrypt volumes, but it's not so "simple" to use... and most people will not use it.
And if the company can be served a court order to reveal the contents, what prevents them from updating their client software to start surveilling their clients?
Any service that can compare your local contents with the ones in their systems is an automatic red flag! If they can prevent you from sending your file because someone else has the same file, and they can "copy" it to your account, double red flag! They can't have the contents encrypted with a code that they don't know, and go there, open the encrypted file, extract the file from the other account and send the same file to your account... unless they know what the secret (password) that protects the file is... or the file isn't encrypted at all!
If they can see both the contents there and on your computer... they can access your account and the files aren't really encrypted... unless the extent of that is a file that contains the hash [for example: SHA-512 with a personal salt (for privacy reasons, always different from user to user to prevent hash comparison with existing databases)] associated with the account, which is compared with your local one to see if files have already been uploaded successfully... because the client can send a file encrypted and separately send a hash so the service knows that there is a new file there... even if they can't see the file itself. But they need to see the file names, and for convenience when they were sent, and how big they are... because the person wants to know what is there and what isn't. I don't know how they would do this... because to be secure it would be difficult to manage.
cj said: "You could easily compare a hash against another hash, to find a match, without it being unencrypted. encryption of both pieces of data are encrypted" No, because then when the 2nd uploader tries to retrieve the file, this user won't be able to get the original without the first uploader's keys for decrypting. The optimization here does in fact mean that Dropbox has the means to decrypt any files.
I'm not a Dropbox user but AFAIK there's something installed on the machine, this piece of software might compute the hash before encryption and send it.
So the knowledge of the keys is not required to compute the hash, Dropbox can do that without it. So it's not a proof they know the keys.
But if they don't publish the key management (where the keys are, who has access, etc.) it's all crap. Security based on speculation about how things work is not security.
In today's world, no-one attacks the algorithms. It's the key management that is attacked (including dictionary and brute-force attacks).
I think a lot of the discussion can be summarised as "to anybody who knew a little bit about how encryption/security works, it was always obvious that in reality, Dropbox employees technically could read customers' files".
And that's absolutely fine in principle-- there are all sorts of centralised systems that we use in our day-to-day lives where employees and Powers That Be can read our data. Even with the knowledge that Dropbox employees could technically read users' files, or that the government could force Dropbox to patch the software so that they could read them, etc., we may well come to the conclusion that the benefits of using the service still outweigh the risks. Even as somebody more able to assess the risks, I still use Dropbox for that very reason and think that it is an extremely useful piece of software when used appropriately. The point is that in order to assess this risk-benefit equation, we need to be open about the risks.
The problem in this case is that Dropbox chose to make the claim that employees couldn't access customers' data. The fact that some users with specialist knowledge could determine the falsity of that claim does not detract from the problem that to the average user, the claim is misleading.
The issue of encryption is a grey area. Dropbox, and surely lots of other services, are bandying "AES-256" about in the knowledge that this may *imply* a higher level of security than is actually the case. As has been pointed out, actually achieving 256 bits of entropy from a password is hugely unlikely with the types of passwords that most users will choose. However, in this case, it's hard to say that an actual false claim is being made: it is technically true (presumably) that they use AES-256 encryption.
"So the knowledge of the keys is not required to compute the hash, Dropbox can do that without it. So it's not a proof they know the keys."
Check the post DIRECTLY ABOVE yours.
"If Dropbox identifies that the file that you try to upload has got the same hash with a file they already store, are they performing a subsequent byte-by-byte comparison?"
They can't be doing that, as that would require the entire file to be uploaded again, and network traffic shows that it isn't.
Now, one thing is what DropBox is doing - it's a reasonable level of 'security' for a free service, and the deduplication can be a big benefit for end-users as well.
The big problem is that THEY HAVE BEEN LYING about security and privacy all along.
What if I want fast, online file storage and sync that doesn't waste precious bandwidth and storage uploading files that have already been stored?
I deliberately use Dropbox for non-private information, such as university coursework. As such it is faster and cheaper to use than other software I use for more private data eg Wuala.
I agree that it would be nice to have people well informed that their data could be accessed by government agencies with a warrant, but it is the economies of scale that data-deduplication affords that allows Dropbox to provide me with a service for next to nothing.
Any thoughts on the security & reliability of CrashPlan? I love the service, and the fact that you can also backup "free" to other computers you own, or to friends who want to share space with you. The pricing plan is also in line with SpiderOak, but obviously CrashPlan is backup-only (no sync, etc).
Would be interesting to hear how you judge the security of TeamDrive (www.teamdrive.com). Does it go beyond the security of SpiderOak?
Also problematic is the fact that Dropbox's CTO does not know how to spell a one-syllable word like "whoa."
How can I trust a man to encrypt my data if he cannot spell as well as I could when I was in second grade?
You cannot assume that because deduplication is used, the files are not properly encrypted.
All you need to do is derive the encryption key from the data in the file. So each file is encrypted with its own key. But if more people have the same file, they will encrypt it the same way, and deduplication will work.
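The scheme described in this comment is usually called convergent encryption. Below is a toy Python sketch of the idea only. It is an assumption-laden illustration: the cipher is a simple SHA-256 counter-mode keystream written for this example, not AES and not safe for real use, and nothing here is known to match what Dropbox does.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key plus a counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(plaintext: bytes):
    """Derive the key from the content itself, then encrypt deterministically."""
    key = hashlib.sha256(plaintext).digest()
    stream = keystream(key, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return key, ciphertext


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    stream = keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


# Two users encrypt the same file independently and get identical ciphertext,
# so a server could deduplicate without ever holding the key.
key_a, ct_a = convergent_encrypt(b"same file contents")
key_b, ct_b = convergent_encrypt(b"same file contents")
print(ct_a == ct_b, decrypt(key_b, ct_a) == b"same file contents")
```

Each user would have to keep the per-file key themselves for this to protect anything. It also leaves the side channel intact: anyone who already has a copy of a file can compute its ciphertext and check whether the server stores it.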
The point is that Dropbox made misleading claims about their security. Fact.
The point is NOT that you should know better than to believe a misleading claim. If you know better, good for you. But that does NOT relieve Dropbox of its obligation to make accurate claims about its security practices.
Dropbox has revised its claims so that they are now more accurate. That is good. But many users signed up for the service, based on a flawed explanation of that service. This is not insignificant.
The point is NOT that Dropbox is a bad service. Dropbox is an excellent service. It just does NOT provide exactly the level of service that it claimed to provide.
CableCat says "if more people have the same file, they will encrypt it the same way" - but how do they decrypt it? Do they need any special information, like a password, to decrypt it? The point is that with deduplication, all parties "owning" that file have to be able to retrieve it. This means that passwords must not be necessary to decrypt files, since everyone only has their own password. So we can make the assumption that dropbox has the means to decrypt any file without the user's permission or knowledge.
And pray tell, where did Mr. Borenstein purchase the crystal ball that told him that people who protect their stuff "most often do so because they are breaking a law". Speculative nonsense.
The issue here is that they made a claim that simply cannot be true, and so it would be better if DropBox just retracted the comment.
I don't have a problem with how dropbox operates, whether the file is encrypted and the user password stored so the dropbox framework can decrypt content on the basis of dedupe, that level of access is acceptable to me..
I would imagine in the instance of duplicated files, DropBox would decrypt the initial uploaded file, and re-encrypt it with a generic shared hash which is associated with all the other users, and I assume this shared hash would be encrypted to each user's own password so at least at the raw user database level no two hash keys are alike - even for duplicated files..
Also, most file systems have some kind of user access control which even at root level requires full access to the physical data in order to serve it. So just like on your corporate storage for example, you'll have standard users unable to access certain areas of your storage, but system level as well as administrators would be able to access all areas.
I would at least hope DropBox only allows root access to the physical data and its decryption at the platform level, and only allows the core platform and the content owner real-world access to it, meaning DropBox staff would NOT be able to access the physical data because of the user access layer in the DropBox architecture..
Nice way of pointing out the completely obvious.
Of course Dropbox has access to the contents of the files - you don't just need to analyze the de-duplication feature to see that.
The public files link feature and the automatic photo galleries make that clear.
I find it bizarre that you're kicking up such a fuss about something that most people knew about years ago. I guess you're just trying to make a name for yourself, but what a curious way to go about it.
The publication of this blog has done a great disservice to the shareholders at Dropbox and to the good citizens of the world.
The only users who truly need to fear the Privacy issue are those that are breaking the law. I for one would be happy to see these people's privacy being broken by the relevant authorities. To me then the chance that child-porn and other illegal activities are placed at risk is a feature, not a bug.
Dropbox offers an excellent service, and provides as good a security level as can reasonably be expected from cloud storage. It would be a shame if their profitability was hit just so that criminals could have their lives made easier.
But you've got to pull readers into your blog, haven't you?
Wuala is a good Swiss alternative (free for the first GB) developed by a Swiss university. With Wuala all your files get encrypted on your computer, so that no one - including the employees at Wuala and LaCie - can access your private files. Apparently the servers are in Switzerland, France and Germany.
I disagree with your assessment. The real issue is not with the government or rights-holders being able to subpoena incriminating evidence, it's with Dropbox employees having access to sensitive personal information. When I signed up for Dropbox, they claimed that their employees had no way of accessing my data. Now I come to find out from a third party that the only thing preventing them from doing so is company policies. This is unacceptable. I replaced the "My Documents" folder with Dropbox and as such have many documents containing account numbers, SSN, etc located on their servers. Even if the employees follow their policies, that's no guarantee that someone couldn't access my sensitive info if the employee's login info or laptop were stolen/compromised.
I will be researching how to encrypt all my data to Dropbox. I know of TrueCrypt, but did not find it very user-friendly when I tried it in the past. Does anyone know of any alternatives?
I'm curious as to how you KNEW they didn't md5sum (or similar) on the user's computer and then check for a similar sum (and maybe even size) on their servers. They wouldn't need the encryption key if that was done, but I see nothing addressing that fact.
@dimecadmoium: It's difficult to tell exactly what you are suggesting. I suppose you're right, that if it made sense for Dropbox to trust the hash generated by the client, then they would know they already had the file. However, that wouldn't matter much, since they would need the encryption key of the client that really uploaded it in order to send the file back to the other, new client.
If dropbox deduplicates files on their servers, it doesn't mean they have plaintext access to them.
Dropbox could simply compare hash values to determine if two files are equal, without knowing anything about the content. If the hash value is provided by the dropbox client within the upload, but before encryption, files in the cloud ARE encrypted.
"If the hash value is provided by the dropbox client within the upload, but before encryption, files in the cloud ARE encrypted."—But, then if the 2nd uploader requests the file, dropbox would have to decrypt the file before sending it back, since the 2nd uploader would not be able to decrypt a file encrypted by the 1st uploader.
I didn't know that they can sacrifice users' privacy. How can I find out if my files have been watched?
I took a look at Wuala for comparison. If everything is encrypted, and their people can't access the files because they don't have the keys- then how does sharing work?
Does someone you share a folder with have to have your key?
Regarding Wuala: I'm speculating here, I haven't used Wuala. Wuala stores the keys used for en/decrypting each file. Those keys are encrypted somehow with users' passwords.
When you share a file with someone else, it's conceivable that Wuala might use public/private keys to encrypt the file key with the public portion of the other party's password-key without knowing that user's private password. Maybe that private key is encrypted with the user's password? Just a guess.
Note that this does not require Wuala to store all the secrets required to decrypt anything after this operation is completed, so they still would be unable to do the deduping that Dropbox does. Also, it seems you can not recover a lost password with Wuala.
It says: Many users and even the technology press will not realize that AES-256 is useless against MANY ATTACKS if the encryption key isn't kept private.
Well, many users and even the technology press will not realize that AES-256 is useless if the encryption key isn't kept private.
Dropbox installs a client on your computer... so they have access to the plaintext version of your file BEFORE it gets transferred.
It is stupid easy for the client to hash the plaintext file, send the file size and hash to the server to see if it's a duplicate of an existing file, and if it is just store your filename and a reference to the existing file on the server.
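The flow described in the paragraph above can be sketched as follows. This is a toy, in-memory model under stated assumptions: the `Server` class, the `fingerprint` and `sync` helpers, and the user names are hypothetical, and nothing here is taken from the real Dropbox client. Encryption is left out entirely, so the sketch says nothing about whether the stored copy could be decrypted for a second user.

```python
import hashlib


class Server:
    """Hypothetical storage server with a deduplication index."""

    def __init__(self):
        self.blobs = {}  # (size, sha256 hex) -> stored content
        self.names = {}  # (user, filename) -> (size, sha256 hex)

    def has(self, fp) -> bool:
        return fp in self.blobs

    def store(self, fp, content: bytes) -> None:
        self.blobs[fp] = content

    def link(self, user: str, filename: str, fp) -> None:
        """Record the filename as a reference to an existing blob."""
        self.names[(user, filename)] = fp


def fingerprint(content: bytes):
    """What the client sends first: file size plus hash of the plaintext."""
    return (len(content), hashlib.sha256(content).hexdigest())


def sync(server: Server, user: str, filename: str, content: bytes) -> bool:
    """Return True if the bytes were uploaded, False if they were deduplicated."""
    fp = fingerprint(content)
    uploaded = False
    if not server.has(fp):
        server.store(fp, content)  # only new content crosses the network
        uploaded = True
    server.link(user, filename, fp)
    return uploaded


server = Server()
first = sync(server, "alice", "song.mp3", b"identical bytes")
second = sync(server, "bob", "renamed.mp3", b"identical bytes")
print(first, second, len(server.blobs))
```

The return value of `sync` is exactly the single bit an observer can learn: whether the server already had the file.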
This wouldn't be a performance improvement if dropbox had to go and decrypt everyone's files just to check if your latest upload is a duplicate. So clearly dropbox is storing your data encrypted, but an index of everyone's data is used for the de-duplication. Since file names are irrelevant to de-duplication, that index probably does not include filenames. But this has nothing to do with privacy, because the public does not have access to the index and, as someone else mentioned, if the government has a warrant they can search or seize your information on the dropbox servers regardless of their performance optimizations.
If an attacker is watching my dropbox connection, de-duplication means that it's HARDER for an attacker to guess which files I am uploading, since duplicate files of any size become a small fixed size upload - they look the same to an attacker. And if I am uploading unique files, the attacker cannot know what they are anyway, unless he watched me download them from the web... in which case, he already knows that I have them without the use of dropbox.
It's possible to deduplicate without knowing the key. Before uploading, calculate a hash and compare it with existing hashes. To store a duplicate, store a link to the original copy and encrypt the link with the user's password. Or is there a flaw in this?
I like this article for attempting to educate people about a possible risk. However, the article is wrong in some key ways. In addition, besides simply educating people, there is no point to this article.
It has been mentioned several times in other comments that it is possible to have deduplication and still maintain encrypted data, so no need for me to reinvent the wheel.
Overall though, does it matter? For starters, deduplication only works on common stuff, like an mp3 file, movie, etc. Do you think the unique hash of your personally written Word document matches anyone else's Word document in any way? Or any other file for that matter? So the only thing that actually gets deduplicated is stuff that is not private at all. Secondly, if you have stuff that really does need to be so secure that you care so much, why are you even considering a cloud provider like Dropbox without adding your own layers of security?
So with the way deduplication works, is it simply deleting the duplicate copies of a particular file? Or is it keeping, say a fifth copy, of that file uploaded in a user's Dropbox account, but decrypting that file and assigning the encryption key to another user?
A few recommendations:
1 - Do not rely on 3rd parties for your only backup - take a local copy. Dropbox could go into administration tomorrow - where is your data or their comforting words then?
2 - If you use Dropbox for filesharing - e.g. on several devices, it is not a good backup. What I mean is it doesn't meet the true definition of a backup, i.e. a secure copy of important data. What happens if your PC has disk corruption on your Dropbox folder during a disk check when it boots up? The corruption is duplicated to the server and all your devices when Dropbox starts. A backup should ideally be separate from normal day-to-day data.
3 - Do not trust seriously private data to any 3rd party. Encrypt it locally with the strongest encryption you can, and if it is that private, save it to an external disk and put it in a safe.
The old methods still work.
An old adage in privacy and security is "Trust No-one".