slight paranoia: data protection

Showing posts with label data protection. Show all posts

Monday, July 09, 2007

Astroglide Data Loss Could Result In $18 Million Fine

[scroll way down for a spreadsheet containing numbers of Astroglide requests per state]

Executive Summary

In April 2007, Biofilm Inc. accidentally published on the Internet the names and addresses of over 200,000 customers who had requested a free sample of their popular sex lubricant Astroglide. This blog post highlights the fact that the leaked data could serve as highly effective bait for targeted phishing attacks and other kinds of scams. A full breakdown of numbers of requests for each state are released. These numbers are then used to estimate potential fines against Biofilm should state Attorneys General wish to get involved.

Introduction

Privacy is a strange beast. It is one of our "rights" least well defined and protected by the law. The U. S. Constitution contains no express right to privacy. Likewise, data protection is something that has yet to be properly addressed by US law.

Consumers regularly surrender their personal information to random strangers in return for t-shirts and teddy bears as credit card sign-up bonuses. Similarly, many consumers permit the tracking of individual items in their supermarket purchases by companies in return for modest discounts or "points" through loyalty schemes.

Data protection and privacy become far more important when they relate to personal and sexual information. Most consumers would probably be more concerned about someone else gaining access to the order info for ther their Good Vibrations (an online seller of marital aids) account than for their past book purchases from Amazon.

Likewise, when Congress rushed to pass extremely pro-privacy restrictions on the release of video-rental records in 1988, it was not because they were concerned about tabloid journalists learning how many times a particular Senator had rented Citizen Kane.

A Slippery Problem

The main subject of this blog post relates to a data loss/accidental release by a California company named Biofilm, Inc. They are the makers of Astroglide, a popular sexual lubricant.

For most of April 2007, a database of names and addresses of individuals who had requested free samples of Astroglide was inadvertently left unprotected on the company's website. In addition to random visitors being able to access the database, Google's search engine spider software made copies of the database - cached copies of which continued to be available online from Google's site for more than a week after Astroglide removed the data from their own website.

Within hours of Wired News picking up the Astroglide story, fellow Indiana University PhD student Sid Stamm and I began frantically downloading all the data from Google's cache. The leaked Astroglide database contains the names and addresses of individuals who requested a samples between 2003 to 2007. With a bit of effort to clean out duplicate entries, we soon had a database of just over two-hundred thousand unique names and addresses.

I've been struggling to come up with an interesting, useful and ethical way to use this data. While the obvious Yahoo Maps mashup is amusing (and scarily mind blowing), it's just not fair to the people who gave Astroglide their data in good faith. They do not deserve to have their privacy violated and abused more than they have already suffered. The screenshot posted at the top of this blog post is real - but out of respect to the people in the database, I will not be putting the mashup online.

More Than Just Embarassment

There is almost no chance that the Astroglide data could be used to steal someone's identity. Unfortunately, the data loss laws passed by the various states only really have identity theft in mind, and so they did not kick-in in this incident. This is primarily due to the fact that the data that was exposed does not match the strict definition of PII (personally identifiable information), as in this case, no social security, credit card or other account numbers were revealed.

Adam Shostack is quite vocal about his belief that data breaches/data loss incidents are not just about identity theft. He writes that "[Data Breaches] are about honesty about a commitment that an organization has made while collecting data, and a failure to meet that commitment."

My immediate reaction and concern when reading about the Astroglide incident was, "how embarrassing." Yes, it would be quite unpleasant for the people in the database if their colleagues, friends and attendees of their church learned that they had requested a sexual lubricant. Having this information come up in a Google search for the person's name could even pose a problem during some job interviews.

The Astroglide incident is bigger than just the issue of embarrassment. The smallest bit of information about an individual can serve as a vehicle for targeted phishing and other kinds of fraud. I discussed this with Prof. Markus Jakobsson and he came up with two fantastic examples of scams that could use this data.

A version of the spanish lottery scam with a spear phishing touch: A would-be phisher could send a postcard to each name on the list, advising them that since they are fans of the product, they were enrolled in an online lottery - and that they have won. All that they need to do is to go online to claim their winnings.

A class action version of the Nigerian 419 scam: A swindler could send a postcard to victims, notifying them of the data loss, and stating that they have been invited to join a class action lawsuit against Biofilm/Astrolide. The victim would be told that they will receive several hundred dollars as part of the settlement, and all that they need to do to claim their share is to fill out the postcard with their banking details and send it off.

These and other similar attacks would be much easier (and cheaper for the attacker) if they could be conducted by email. Turning each of the 200,000 names addresses into a valid email address is not an easy task - thankfully. This at least raises the cost of any attempted scam to the cost of a stamp for each potential victim.

A few months ago, I highlighted an incident at Indiana University where phishers were able to obtain a list of valid email addresses for IU students. They were then able to use this list, which consisted solely of users' names and email addresses to launch a highly successful spear phishing attack against the IU Credit Union.

Likewise, my colleagues in the Stop Phishing Research Group at Indiana University have conducted several targeted phishing studies that have clearly demonstrated the impact that of even the smallest bit of accurate information on a user can have on the effectiveness of a phishing attack. Simply put, Anything that is known about people can be used to win their trust. Such insights are used to improve consumer education in the recent effort www.SecurityCartoon.com.

I suspect that most phishing attacks against credit unions and small regional banks already involve some form of data breach/loss. The economics of phishing simply do not add up otherwise - a phisher would be far better off claiming to be Citibank/Chase if they are sending out an email to 3 million randomly collected email addresses. I predict that we'll see a lot more of these kinds of phishing attacks. Although, due to the fact that notification won't be required in data loss incidents where social security or credit card numbers are not lost, the public will not be told how the phishers got their target list.

Phishers are constantly evolving their techniques. As in-browser anti-phishing technology becomes the norm, and spam filters mature, we will likely see a shift towards more targeted phishing. These attacks involve far less email messages, and are thus likely to better stay below the radar of the anti-phishing blacklist teams at Google, Microsoft and Phishtank. While data loss/breach incidents involving social security numbers of course pose a identity theft risk - the risk of this information being used for phishing and other scam attacks is currently being completely overlooked.

The solution to this, of course, is to amend the data breach/loss notification laws to apply when any customer information is lost or released to unauthorized parties. Companies will fight this, citing the high cost of notification and a desire to avoid needlessly worrying their customers. The laws will stay the same, and phishers will laugh all the way to the bank.

Could Biofilm/Astroglide be fined?

Contrast the Astroglide data loss to a completely separate yet similar incident:

Between August and November of 2002, the order information (name, address, items purchased) for over 560 customers was available to any curious visitor on the website of American underwear retailer Victoria's Secret. This was due to a web security snafu, which was soon fixed after it was reported. The following year, New York Attorney General Eliot Spitzer negotiated a settlement with Victoria's Secret, in which the company agreed to pay the state of New York $50,000 as well as notifying each customer whose data was inadvertently made available online. The New York Times had a full write up of the story online.

I think it's really useful to compare the two different cases. In both, data was accidentally put on the Internet. Neither dataset contained credit card numbers, social security numbers, or what we would usually think of as PII. As such, the various state data breach/loss laws didn't kick in.

However, while the data lost (name and address) wasn't particularly sensitive - after all, in many cases, it can be looked up in the phone book - it is the combination of that data with a highly sensitive and sexual product which would give the average consumer a legitimate cause for concern.

Victoria's Secret agreed to notify every customer whose data was accidentally put online. Astroglide has not told a single customer. Victoria's secret agreed to pay $50,000 to the state of NY for about 560 customers, although only 26 of them were actually NY residents. Astroglide has not paid a single penny to any state as a result of this incident.

I think Biofilm should be held accountable for the accidental publication of the names and addresses of 200,000 customers. To remedy this, I have spent quite a bit of time over the past couple weeks filing complaints with numerous state Attorneys General, including the notoriously pro-privacy AGs in California and New York. I have filed a complaint with the Federal Trade Commission. A few hundred overseas consumers tried to get Biofilm to send them a sample by airmail. Thus, I'm working with The Canadian Internet Policy and Public Interest Clinic to file a complaint with the appropriate Canadian authorities. I've also already filed complaints with the data protection agencies in the UK, Ireland, Belgium, The Netherlands and Finland.

A wise lawyer has informed me that the ultimate way to kickstart things is to find a California resident victim, and have that person file an action under CA Business & Professional Code 17-200. My name is not in the database and I do not live in California. Furthermore, I do not feel that is would be ethical to go through the list of 17 thousand California residents, looking them up in google, hopefully finding an email address, and then contacting those individuals to ask them to file a complaint. Thus, as much as I'd like to get a CA Business & Professional code complaint filed against Biofilm, my hands are currently tied.

There are two ways to judge the cost of data loss per customer for Victoria's Secret. $50,000 divided by 26 New York residents equals approximately $1925 per customer. However, given that no other state fined Victoria's Secret, it is probably safer to divide the $50,000 fine by all 560 customers, which gives us a fine of approximately $90 per customer.

Using that $90 per customer figure, I decided to figure out how large of a fine Astroglide could potentially face, assuming of course, that one or more state Attorneys General began investigating.

I pulled per-state stats from the database - which are broad enough that I feel confident that I can release them without putting any individual user's privacy at risk. Using state population estimates from the US Census Bureau, I was also able to calculate a ratio for the number of people in each state per Astroglide request. As much as I was hoping that KY (Kentucky) would win - I could already visualize the Fark headline - North Dakota won, with one Astroglide sample request per 908 state residents. New Mexico came in "last" with one request per 2656 state residents. Analysis of what these numbers actually mean is an exercise best left to the reader.

While it may not be realistic to expect Biofilm to pay $18 million in fines, it's quite surprising that they've been able to get away without even having to notify all of their customers. My hope is that by putting this limited bit of information online, I can hopefully start a debate on this issue.

Conclusion

This blog post will hopefully raise the profile of the Astroglide data loss incident, which unfortunately disappeared from the headlines after a day or two without Biofilm being held accountable for the massive breach of customer trust. It should also highlight the fact that once data has been cached by Google, putting the proverbial genie back in the bottle is next to impossible. If two PhD students can pull a copy of the database from Google's servers, so can malicious parties, including would-be phishers. It is perfectly reasonable to expect that multiple copies of the database were downloaded before Google heeded Biofilm's request, a few days later, and removed the data from its cache servers. Likewise, it is quite reasonable to expect that at least one of the downloaders has criminal intentions - or at least a willingness to sell the data on to others.

Consumers in the database face more than just embarrassment. To minimize the risk associated with phishing and other scam attacks, Biofilm should be forced to notify each of the 200,000 + exposed individuals. The take home lesson from all of this, is that these kinds of data loss incidents will continue to occur in the future and it's highly unlikely that consumers will be told. Existing data breach/data loss laws have been narrowly focused to target the threat of identity theft, a noble goal, but by no means the only threat that consumers face. These laws should be amended to correct this problem. Consumers have a right to be told whenever their information is inadvertently released to unauthorized parties.

Tuesday, June 26, 2007

Go Fish: Is Facebook Violating European Data Protection Rules?

Update: Facebook has fixed the problem. More here

Executive Summary

Using nothing more complex than an advanced search on Facebook's website, an interested person can learn extremely private pieces of information (sexuality, political leanings, religion) that are stored within another user's private Facebook profile.

Users of Facebook can modify the privacy settings for their profile. This will restrict the public viewing and only permit a person's immediate friends to view their profile. While Facebook does allow users to control their profile's existence in search queries, this second preference is not automatically set when a user makes their profile private - and thus many users do not know to do so.

Users cannot be expected to know that the contents of their private profiles can be mined via searches, and thus, very few do set the search permissions associated with their profile.

It is clear however that users intend for their profiles to not be public. A large number of users have gone to the effort to restrict who can view their profiles, but many, unfortunately, remain exposed to a trivial attack.

The Attack

The attack is very simple. For a specific target, one must simply issue an advanced query for the user's name, and any attribute of the profile that one wishes to search.

For example, I've created a new profile in the name of "Chris Privacy Soghoian", who is socially conservative, a Catholic and lives in London, England. His profile privacy has been set so that only his friends may see his profile. Random strangers should not be able to learn anything about the profile - they cannot click on it or view the profile's information.

By issuing an advanced search request for Name: "Chris Privacy Soghoian" and Religion: "Christian - Catholic", one can learn if the profile for that user has listed Catholicism as his religion. Note: To be able to find this profile, you need to be signed in to a facebook account that is a member of the London, UK network. Anyone can join this and other geography based networks, but you must do so first before searching.

If a profile is returned for the search terms requested, one can be sure that the user in question has the relevant information in his profile. It is also easy to see that the profile has been set to private, as the user's name is in black un-clickable text.

Likewise, a similar search for Chris Privacy Soghoian/Buddhist would come back with no results.

This shows how easy it is to learn confidential information that users believe that random strangers cannot learn when they have set their profile to be private/friends-only.

This attack is very similar to the children's game Go Fish. It won't tell you the contents of a profile, but it will provide you with positive or negative confirmation if you know what you're looking for.

So What's The Big Deal?

I originally wrote about this attack in September of last year. I was mainly focused then with finding out the names of students who admitted (in their private profiles) to working at the local strip clubs, and of those students under the age of 21 who listed beer and alcohol in their hobbies.

Stripping and alcohol are interesting enough - and they prove to be fantastic examples when I use them as a demo of "why you need to be careful on Facebook" when lecturing students in my department. However, in focusing on things that would amuse and scare undergrads, I completely missed the hot potato: Sexuality and Religion.

I attended the Privacy Enhancing Technologies workshop last week. While there, I mentioned the Facebook attack to several attendees. A couple of the Europeans were shocked, and told me that Facebook was almost certainly running afoul of a number of European data protection rules.

Privacy is not something that the US government really cares too much - unless of course, you are a politician or supreme court nominee - in which case, they'll pass watertight legislation to protect your ahem "adult" movie rental records.

The Europeans do care about privacy. Sexuality and Religion are bits of information that they consider to be highly sensitive.. and thus, my little go fish attack is now suddenly a lot more important than it was before. Facebook's default search privacy policies may violate European Data Protection rules.

Sample Queries

The following searches will only work if you are signed in to facebook. It is easy to create an account - anyone can make one, and all that you need is a valid email address.

The queries will search everyone within all of your networks - which will include any university/school/employer that you select, as well as a geographic group. There is no proof required of your current location, and so if you wish to search for everyone in France, it's trivial to make a new account/profile located there.

All women interested in women.
All men interested in men.
All Christian men interested in other men.
All Hindu men interested in other men.
All Muslim men interested in other men.
All Jewish men interested in other men.
All Christian women interested in other women.
All Hindu women interested in other women.
All Muslim women interested in other women.
All Jewish women interested in other women.

Clicking on any one of these - at least when you've joined a decent sized network - will return a large group of people - a fairly significant number of whom have profiles that are marked private, which you cannot click on or learn more about. However, by merely appearing in the list of returned profiles, you can be sure that the person's private profile contains information that matches the search terms. This is a problem.

Fixing The Problem

Facebook's privacy policy essentially states that Facebook is not responsible in any case where a user is able to obtain private information about someone else: Although we allow you to set privacy options that limit access to your pages, please be aware that no security measures are perfect or impenetrable... Therefore, we cannot and do not guarantee that User Content you post on the Site will not be viewed by unauthorized persons. We are not responsible for circumvention of any privacy settings or security measures contained on the Site.

Facebook should be commended for the fact that they have implemented a simple technical solution to the problem. Users can control their search privacy settings - and thus control who can see their profile when searches are issued on the facebook site. This feature did not exist when I first described the vulnerability last year.

The problem is that users must opt-in to this more restrictive privacy setting. Users who have gone to the effort of marking their profiles as private (so that others cannot view them) are not clearly warned that other users may be able to learn bits of information by issuing highly specific search queries. Users should not be expected to know or even understand this.

Facebook should change their defaults, and automatically restrict the profile search settings for any user who makes their profile private. Those users who wished to permit strangers to find them in a search could opt in and modify this setting themselves.

Disclosure

Normally, for something like this, I would follow the norms of responsible disclosure (as I did last month with Firefox/Google) and give Facebook advanced notice of my planned release. However, since I first announced this attack on my blog last September, it doesn't really make sense to try and keep it secret. This post doesn't announce anything new - it merely restates the previously described attack in clearer language, provides a couple screenshots and some sample queries that people can click on.