Monday, October 25, 2010

Eric Schmidt blames the EU for Google's data retention policies

Google CEO Eric Schmidt was interviewed by CNN this past week.

The most interesting bit of the interview is at the beginning:
Schmidt: We keep the searches that you do for roughly a year/year and a half, and then we forget them.

Question: You say that, but can somebody come to you and say that we need information about Kathleen Parker.

Schmidt: Under a federal court order, properly delivered to us, we might be forced to do that, but otherwise no.

Question: Does that happen very often?

Schmidt: Very rarely and if it's not formally delivered, then we'll fight it.

Question: You say you keep stuff for a year/year and a half. Who decides?

Schmidt: Well, in fact, the European Government passed a set of laws that require us to keep it for a certain amount, and the reason is that the public safety sometimes wants to be able to look at that information.

Somehow, in just a few sentences, Schmidt manages to misrepresent the facts several times (the question of whether Schmidt is merely misinformed or actively lying is left as an exercise for the reader).

First, on the subject of retention, it is completely false to say that after a year/year and a half, Google "forgets" searches.

Google's actual data retention policy is that after 9 months, the company deletes the last octet of users' IP addresses from its search logs, and then modifies the cookie in the logs with a one-way cryptographic hash after 18 months.

The company never deletes or "forgets" users' searches. It merely deletes a little bit of data that associates the searches to known Google users.
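
To make the difference between "forgetting" a search and anonymizing a log entry concrete, here is a minimal sketch of the two steps described above. The log format, the field names, and the choice of SHA-256 are my own assumptions for illustration; Google has not published its actual implementation.

```python
import hashlib


def truncate_last_octet(ip: str) -> str:
    """Replace the final octet of a dotted-quad IPv4 address with 0."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    octets[3] = "0"
    return ".".join(octets)


def hash_cookie(cookie_id: str) -> str:
    """One-way hash of a cookie ID (SHA-256 is an assumption, not Google's documented choice)."""
    return hashlib.sha256(cookie_id.encode("utf-8")).hexdigest()


def anonymize(entry: dict, age_months: int) -> dict:
    """Apply the 9-month and 18-month steps to a hypothetical log entry."""
    result = dict(entry)
    if age_months >= 9:
        result["ip"] = truncate_last_octet(entry["ip"])
    if age_months >= 18:
        result["cookie"] = hash_cookie(entry["cookie"])
    # The search query itself is never touched.
    return result


# Hypothetical log entry, using a documentation-range IP address.
entry = {"ip": "192.0.2.147", "cookie": "abc123", "query": "example search terms"}
old_entry = anonymize(entry, age_months=24)
```

In this sketch the query field survives both steps unchanged, which is the point: only the identifiers attached to the search are altered.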

For those of you who may be inclined to give Schmidt the benefit of the doubt, regarding the difference between "forgetting" searches, and deleting a couple bits of an IP address in a log, remember that Schmidt has a PhD in computer science.

Google searches and the EU data retention directive

In May of 2007, Google's Global Privacy Counsel claimed that the European Data Retention Directive might apply to search engines.
Google may be subject to the EU Data Retention Directive, which was passed last year, in the wake of the Madrid and London terrorist bombings, to help law enforcement in the investigation and prosecution of "serious crime". The Directive requires all EU Member States to pass data retention laws by 2009 with retention for periods between 6 and 24 months. Since these laws do not yet exist, and are only now being proposed and debated, it is too early to know the final retention time periods, the jurisdictional impact, and the scope of applicability. It's therefore too early to state whether such laws would apply to particular Google services, and if so, which ones. In the U.S., the Department of Justice and others have similarly called for 24-month data retention laws.

One week later, the European Article 29 Working Party wrote a letter to Google, informing the company that:
As you are aware, server logs are information that can be linked to an identified or identifiable natural person and can, therefore, be considered personal data in the meaning of Data Protection Directive 95/46/EC. For that reason their collection and storage must respect data protection rules.

A month later, Google's Global Privacy Counsel replied to the Working Party:
Because Google may be subject to the requirements of the [Retention] Directive in some Member States, under the principle of legality, we have no choice but to be prepared to retain log server data for up to 24 months

Soon after, the European Commission's Data Protection Unit issued a statement to the media, stating that:
The Data Retention Directive applies only to providers of publicly available electronic communications services or of public communication networks and not to search engine systems . . . Accordingly, Google is not subject to this Directive as far as it concerns the search engine part of its applications and has no obligations thereof

Speaking of Google's claims, Ryan Singel of Wired News wrote that:
"It's a convincing argument, but it’s a misleading one. . . [Google's Global Privacy Counsel] Fleischer has been making this argument for months now, and even Threat Level bought it the first go-round. But let’s reiterate: There is no United States or E.U. law that requires Google to keep detailed logs of what individuals search for and click on at Google’s search engine. It’s simply dishonest to continually imply otherwise in order to hide the real political and monetary reasons that Google chooses to hang onto this data.

Professor Michael Zimmer, an expert on search engine privacy issues, similarly debunked Google's false claims.

Finally, in 2008, the Article 29 Working Party issued an opinion on data retention issues related to search engines, which noted that:
Consequently, any reference to the Data Retention Directive in connection with the storage of server logs generated through the offering of a search engine service is not justified . . . the Working Party does not see a basis for a retention period beyond 6 months. However, the retention of personal data and the corresponding retention period must always be justified (with concrete and relevant arguments) and reduced to a minimum.

As this lengthy summary should have made clear, Eric Schmidt's statements that the company has to retain search data because of EU law are simply bogus.

Wednesday, October 20, 2010

More private data leakage at Facebook

Via an anonymous commenter at the Freedom to Tinker blog, I discovered a recent paper from some researchers at Microsoft Research and the Max Planck Institute, analyzing online behavioral advertising.

The most interesting bit is the following text:

[W]e set up six Facebook profiles to check the impact of sexual-preference: a highly-sensitive personal attribute. Two profiles (male control) are for males interested in females, two (female control) for females interested in males, and one test profile of a male interested in males and one of a female interested in females. The age and location were set to 25 and Washington D.C. respectively.

. . .

Alarmingly, we found ads where the ad text was completely neutral to sexual preference (e.g. for a nursing degree in a medical college in Florida) that was targeted exclusively to gay men. The danger with such ads, unlike the gay bar ad where the target demographic is blatantly obvious, is that the user reading the ad text would have no idea that by clicking it he would reveal to the advertiser both his sexual-preference and a unique identifier (cookie, IP address, or email address if he signs up on the advertiser's site). Furthermore, such deceptive ads are not uncommon; indeed exactly half of the 66 ads shown exclusively to gay men (more than 50 times) during our experiment did not mention "gay" anywhere in the ad text.

This means that simply by clicking on a Facebook ad, a user could be revealing a bit of highly sensitive personal information to an advertiser, simply because the advertiser has targeted only a particular group (gender, sexuality, religion) for that advertisement. Thus, the moment you arrive at the advertiser's website, they know that the IP address and cookie value they have assigned to you is associated with someone who is gay, Muslim, or a Republican.

While it may be obvious that some advertisements are targeted based on these attributes, such as gay dating sites, this study makes it clear that there are some advertisements where such targeting is not intuitive.

Given the privacy firestorm earlier this week, I have a tough time imagining that Facebook will be able to sweep this under the carpet, or that class action attorneys won't jump on this.

As I see it, the company has two options:

1. Do not allow advertisers to target advertisements based on sensitive categories, such as religion, sexuality, or political affiliation.

2. Disclose, directly below the ad, the fact that the ad was targeted based on a specific profile attribute, and state there which attribute that was. Users should also be told, after clicking on the ad, but before being directed to the site, that the advertiser may be able to learn this sensitive information about them, simply by visiting the site.

I suspect that Facebook is not going to want to embrace either option.

Sunday, October 17, 2010

It is time for the web browser vendors to embrace privacy by default

Three times over the past six months, web browsers' referrer headers have played a central role in major privacy issues. Much of the attention has reasonably been focused on the websites that were leaking their users' private data (in some cases, unintentionally, but at least in Google's case, intentionally). It may be time to focus a bit of attention on the role that the web browser vendors play, and on the pathetic tools they offer to consumers to control this form of information leakage.

The current focus by privacy advocates on the browser referrer header stems from a paper (pdf download) written by two researchers last year, who found that Facebook, MySpace and several other online social networks were leaking the unique IDs of their users to behavioral advertising networks. Furthermore, according to a class action lawsuit filed last week, Facebook actually began to leak even more information to advertisers, including users' names, starting in February of this year. It wasn't until the Wall Street Journal called up MySpace and Facebook for quotes in May, that the two companies quickly rolled out fixes (behold, the power of the media).

One month ago, I filed a complaint with the FTC, arguing that Google intentionally leaks its users' search queries to third parties via browser referrer headers. Unlike the Facebook leakage episode, in which it is generally acknowledged that Facebook didn't know about the leakage, Google has repeatedly gone out of its way to make sure this leakage continues, and has publicly confirmed that it is a feature, not a bug.

Now today, the Wall Street Journal has another blockbuster article on referrer leakage. This time, it is Facebook apps that are leaking Facebook user IDs to third parties, including advertising networks and data aggregators like Rapleaf.

It is certainly reasonable to point the finger at companies like Zynga, whose Farmville game has been confirmed by experts to be leaking users' Facebook IDs. However, as the Electronic Frontier Foundation's Peter Eckersley told the WSJ today, "The thing that is perhaps surprising is how much of a privacy problem referers have turned out to be."

These referrer leakage problems are not going to go away, and depending on hundreds of thousands of different websites and apps to take proactive steps to protect their users' privacy is doomed to failure. As such, we need to look to the web browser vendors to fix this problem, since, after all, it is the web browser that sends the referrer header in the first place.

Referrer headers and the browser vendors

The original HTTP standard, dating from 1996, which defined the core technical standard used by web browsers, noted that the referrer header feature had significant potential for privacy problems:
Because the source of a link may be private information or may reveal an otherwise private information source, it is strongly recommended that the user be able to select whether or not the Referer field is sent. For example, a browser client could have a toggle switch for browsing openly/anonymously, which would respectively enable/disable the sending of Referer and From information.

Fast forward 14 years, and only two web browsers, Firefox and Chrome, offer a feature to disable the transmission of the referrer header. Internet Explorer and Safari, which are used by 65% of users on the Internet, include no built-in functionality to scrub or otherwise protect this information.

While Firefox and Chrome do include features to disable the referrer header, these features are not enabled by default, and enabling them requires technical knowledge that is beyond the vast majority of users.

For example, Firefox users must first type "about:config" into the location bar, navigate past a very scary warning, and then change an obscure preference (network.http.sendRefererHeader) from its default of 2 to 0.

Likewise, Chrome requires that users start the browser from the command line with an undocumented parameter (--no-referrers).

It is time to embrace privacy by default

Earlier this summer, the European Article 29 Working Party released an extensive report on privacy and behavioral advertising. The report (pdf) called on web browser vendors to play a more important role in protecting users, and to embrace privacy by default. While the Working Party was primarily describing cookie controls, the same message applies to referrer headers:
"Given the importance that browser settings play in ensuring that data subjects effectively give their consent to the storage of cookies and the processing of their information, it seems of paramount importance for browsers to be provided with default privacy-protective settings. In other words, to be provided with the setting of 'non-acceptance and non-transmission of third party cookies'. To complement this and to make it more effective, the browsers should require users to go through a privacy wizard when they first install or update the browser and provide for an easy way of exercising choice during use. The Working Party 29 calls upon browser makers to take urgent action and coordinate with ad network providers."

It is time for the browser vendors to listen to this advice. Had IE, Firefox, Chrome and Safari blocked (or at least partially scrubbed) referrer headers by default, the leakage from Facebook that the Wall Street Journal highlighted today would never have occurred.
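
As a rough illustration of what "partially scrubbed" could mean, the sketch below trims the referrer down to the site's origin whenever a request crosses to a different site, so that the path and query string (where user IDs and search terms tend to live) are never sent. The policy and the function name are hypothetical, and the example URLs are made up; this is not how any of the browsers above currently behave.

```python
from urllib.parse import urlsplit


def scrub_referrer(current_url: str, destination_url: str) -> str:
    """Return the referrer value a privacy-protective browser might send.

    Hypothetical policy: send the full URL only when staying on the same
    site; otherwise send just the scheme and host of the current page.
    """
    src = urlsplit(current_url)
    dst = urlsplit(destination_url)
    same_site = (src.scheme, src.netloc) == (dst.scheme, dst.netloc)
    if same_site:
        return current_url
    # Drop the path, query string, and fragment for cross-site requests.
    return f"{src.scheme}://{src.netloc}/"


# A profile page whose URL contains a user ID, linking to a third party.
cross_site = scrub_referrer(
    "http://social.example/profile.php?id=12345",
    "http://ads.example/click",
)
same_site = scrub_referrer(
    "http://social.example/profile.php?id=12345",
    "http://social.example/photos",
)
```

Under this policy the third party would still learn which site the visitor came from, but not the ID embedded in the URL.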

Thursday, October 07, 2010

My FTC complaint about Google's private search query leakage

Today, the Wall Street Journal published an article about a complaint I submitted to the FTC last month, regarding Google's intentional leakage of individuals' search queries to third party sites.

The complaint is 29 pages long, and so I want to try to explain it to those of you who don't have the time or desire to read through the whole complaint.

The complaint centers around an obscure feature in web browsers, known as the HTTP referrer header. Danny Sullivan, a widely respected search engine industry analyst, has written that the HTTP referrer header is "effectively the Caller ID of the internet. It allows web site owners and marketers to know where visitors came from." However, while practically everyone with a telephone knows about the existence of caller ID, as Danny also notes, the existence of the referrer header is "little known to most web surfers."

This header reveals to the websites you visit the URL of the page you were viewing before you visited that site. When you visit a site after clicking on a link in a search engine results page, that site learns the terms you searched for (because Google and the other search engines include your search terms in the URL).
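
To show how little effort this takes on the receiving end, here is a minimal sketch of how a website could pull the search terms out of an incoming referrer header. The example URL and the function name are hypothetical, and the code assumes the search terms travel in a query parameter named "q".

```python
from urllib.parse import parse_qs, urlsplit


def search_terms_from_referrer(referrer: str):
    """Return the search terms embedded in a referrer URL, or None.

    Assumes the search engine puts the terms in a "q" query parameter.
    """
    query_string = urlsplit(referrer).query
    params = parse_qs(query_string)
    values = params.get("q")
    return values[0] if values else None


# Hypothetical Referer value received after a visitor clicks a search result.
referrer = "http://www.google.com/search?q=symptoms+of+depression&hl=en"
terms = search_terms_from_referrer(referrer)
print(terms)
```

Combined with the visitor's IP address and any cookie the site has set, those terms are tied to a particular visitor rather than to an anonymous aggregate.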

Google does not dispute that it is leaking users' search queries to third parties. A Google spokesperson told the Wall Street Journal today that its passing of search-query data to third parties "is a standard practice across all search engines" and that "webmasters use this to see what searches bring visitors to their websites."

Thus, we move on to the main point of my complaint, which is that the company does not disclose this "standard practice" to its customers, and in fact, promises its customers that it will not share their search data with others.

For example, of the 49 videos in Google's YouTube privacy channel, not one single video describes referrer headers, or provides users with tips on how to protect themselves from such disclosure. On the other hand, the first video that plays when you visit the privacy channel tells the visitor that "at Google, we make privacy a priority in everything we do." Indeed.

Deceptive statements in Google's privacy policy

Google's current privacy policy states that:

Google only shares personal information with other companies or individuals outside of Google in the following limited circumstances:

* We have your consent. We require opt-in consent for the sharing of any sensitive personal information.

* We provide such information to our subsidiaries, affiliated companies or other trusted businesses or persons for the purpose of processing personal information on our behalf . . .

* We have a good faith belief that access, use, preservation or disclosure of such information is reasonably necessary to (a) satisfy any applicable law, regulation, legal process or enforceable governmental request, (b) enforce applicable Terms of Service, including investigation of potential violations thereof, (c) detect, prevent, or otherwise address fraud, security or technical issues, or (d) protect against harm to the rights, property or safety of Google, its users or the public as required or permitted by law.

The widespread leakage of search queries doesn't appear to fall into these three "limited circumstances." Perhaps Google doesn't consider search query data to be "personal information"? However, at least four years ago, it did. When fighting a much-publicized request from the Department of Justice for its customers' search queries, the company argued that:
"[S]earch query content can disclose identities and personally identifiable information such as user‐initiated searches for their own social security or credit card numbers, or their mistakenly pasted but revealing text."

Until October 3, the company's privacy policy also included the following statement:
We may share with third parties certain pieces of aggregated, non-personal information, such as the number of users who searched for a particular term, for example, or how many users clicked on a particular advertisement. Such information does not identify you individually.

I don't think that it is possible to reasonably claim that millions of individual search queries associated with particular IP addresses are "aggregated, non-personal information".

Google's customers expect their search queries to stay private

In its brief opposing DOJ's request, Google also argued that it has an obligation to protect the privacy of its customers' search queries:
Google users trust that when they enter a search query into a Google search box, not only will they receive back the most relevant results, but that Google will keep private whatever information users communicate absent a compelling reason . . .

The privacy and anonymity of the service are major factors in the attraction of users – that is, users trust Google to do right by their personal information and to provide them with the best search results. If users believe that the text of their search queries into Google's search engine may become public knowledge, it only logically follows that they will be less likely to use the service.

Matt Cutts, a Google senior engineer, argued similarly in an affidavit filed with the court:
"Google does not publicly disclose the searches (sic) queries entered into its search engine. If users believe that the text of their search queries could become public knowledge, they may be less likely to use the search engine for fear of disclosure of their sensitive or private searches for information or websites."

Google already protects some of its users' search queries

Since May of this year, Google has offered an encrypted search service, available at (in fact, it is the only search engine to currently offer such a service). In addition to protecting users from network snooping, one additional benefit of the service is that it also automatically protects users' query data from leaking via referrer headers.
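
A likely reason for this side effect is the HTTP/1.1 specification's recommendation that clients not include a Referer header in a non-secure request when the referring page was loaded over a secure protocol. The sketch below models only that one rule; the function name and URLs are hypothetical, and real browsers may apply additional logic.

```python
from urllib.parse import urlsplit


def sends_referrer(current_url: str, destination_url: str) -> bool:
    """Model the HTTP/1.1 recommendation for secure-to-insecure navigation.

    Returns False when the current page was loaded over HTTPS and the
    destination is plain HTTP; returns True in every other case.
    """
    src_scheme = urlsplit(current_url).scheme.lower()
    dst_scheme = urlsplit(destination_url).scheme.lower()
    return not (src_scheme == "https" and dst_scheme == "http")


# Clicking a result on an encrypted results page that leads to an HTTP site.
from_encrypted = sends_referrer(
    "https://search.example/search?q=private+query",
    "http://news.example/article",
)
# The same click from an unencrypted results page.
from_plain = sends_referrer(
    "http://search.example/search?q=private+query",
    "http://news.example/article",
)
```

Note that under this rule the protection only covers clicks that lead to unencrypted sites; a destination served over HTTPS could still receive the referrer.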

However, Google has done a poor job of advertising the existence of its encrypted search website, and an even worse job of letting users know about the existence of search query referrer leakage. If users don't know that their queries are being shared with third parties, why would they bother to use the encrypted search service in the first place?

The remedy I seek

If Google wants to share its users' search query data with third parties, there is nothing I can do to stop it. That practice, alone, isn't currently illegal. However, the company should not be permitted to lie about its practices. If it wants to share its customers' search queries with third parties, it should disclose that it is doing so. Even more so, it shouldn't be able to loudly and falsely proclaim that it is protecting its users' search data.

However, since the company has for years bragged about the extent to which it protects its customers' data, I think that it should be forced to stand by its marketing claims. Thus, I have petitioned the FTC to compel the company to begin scrubbing this data, and to take appropriate steps to inform its existing customers about the fact that it has intentionally shared their historical search data with third parties. This, I think, is the right thing to do.

Tuesday, October 05, 2010

US Marshals Service's Electronic Surveillance Manual

Last week, the FOIA fairy delivered 25 pages of internal rules that outline when and how the US Marshals Service uses electronic surveillance methods. According to the cover letters accompanying the documents, the policies are "obsolete" and "the office is preparing to rewrite/revise it, which could take 30 days or longer to complete."

The full document can be downloaded here (pdf)

The most interesting things that jumped out at me:

1. One of the most heavily redacted sections relates to the use of trigger fish, or cell site analyzers, which allow the government to locate phones without the assistance of the phone company.

2. The special rules that USMS investigators must follow before wiretapping VIPs such as Members of Congress, Governors and Judges:

3. The revelation that USMS advises investigators to always seek "hybrid" 2703(d) + pen register orders, rather than plain pen register orders when they are investigating a suspect.
