Tuesday, December 21, 2010

Thoughts on Mozilla and Privacy

Mozilla has followed Microsoft's lead and committed to embracing some form of Do Not Track mechanism in the Firefox browser as soon as early 2011. While this is of course great news, the browser vendor still has a long way to go, particularly if it wants to be able to compete on privacy.

Do Not Track

At a presentation earlier this week, Mozilla's new CEO announced that the Firefox browser would soon include enhanced privacy features, stating that "technology that supports something like a Do Not Track button is needed and we will deliver in the first part of next year." This is great news for users of Firefox, and I look forward to seeing Mozilla taking an active role in the Do Not Track debate as it continues to evolve in Washington, DC.

Of course, Mozilla is not the only browser vendor to make a major privacy announcement in the last month -- just a few weeks ago, Microsoft revealed that the forthcoming release candidate of IE9 would include support for an ad tracking blacklist. In order to fully analyze Mozilla's announcement, and the organization's reasons for making it, one must consider it in light of Microsoft's recent announcement, as well as the recent press coverage that both companies have received over their internal deliberations regarding privacy features.

Should Mozilla compete on privacy?

Years ago, when there were just two major browsers, Mozilla had a clear identity. Firefox was the faster, more stable, more secure, standards-compliant browser, with a large number of rich 3rd-party add-ons, including AdBlock Plus. Compared to the sluggish, buggy, popup-ad-plagued Internet Explorer that came pre-installed on each new Windows PC, the decision to install Firefox was a no-brainer. Those consumers still using IE weren't, for the most part, doing so by choice, but because they didn't know there were other options -- hell, as this video demonstrates, they likely didn't even know what a browser was.

Fast forward to 2010, and the browser market has significantly changed.

Apple's seven-year-old Safari browser totally dominates the company's iOS platform (primarily due to the company's terms of service, which long banned competing browsers), comes pre-installed on all Macintosh computers, and has even made its way onto quite a few Windows computers by sneakily leveraging the iTunes software security update process.

Even more interesting has been the rise of Google's two-year-old Chrome browser. It matches Mozilla on standards compliance, supports its own 3rd party extension ecosystem (including AdBlock software), and more importantly, it handily beats the currently shipping version of Firefox on both speed and stability. This has led to a significant number of tech-savvy users ditching Firefox for Chrome.

The reason I mention this isn't to take a position on which browser is faster or more stable -- merely that Mozilla is now under increasing competitive pressure from Google and Apple, competition that simply didn't exist when IE was the only other game in town.

More than ever, Mozilla needs to be able to differentiate its product, and compete on features that it can win on -- beating Google on speed may be possible, but it'll be tough. Beating Google on privacy should be easy though...

Competing on privacy means more transparency

[Warning, browser vendor insider baseball below]

A few weeks ago, the Wall Street Journal revealed that Mozilla had "killed a powerful new tool to limit tracking under pressure from an ad-industry executive." The feature would have made all 3rd party tracking cookies "session cookies" by default (and thus cause them to be deleted after users shut down their browser).
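
For readers who want to see what the change amounted to in concrete terms, here is a minimal sketch of the policy the patch implemented. This is my own Python illustration with invented names; Mozilla's actual change was a patch to Firefox's C++ cookie service, not this code.

    from urllib.parse import urlparse

    def registrable_domain(host):
        # Crude approximation: the last two labels of the hostname.
        # A real browser would consult the Public Suffix List instead.
        return ".".join(host.split(".")[-2:])

    def apply_cookie_policy(cookie, request_url, page_url):
        """Demote any cookie set in a 3rd party context to a session cookie."""
        request_host = urlparse(request_url).hostname
        page_host = urlparse(page_url).hostname
        if registrable_domain(request_host) != registrable_domain(page_host):
            # Stripping the expiration turns this into a session cookie: it is
            # kept in memory only, and discarded when the browser shuts down.
            cookie.pop("expires", None)
            cookie.pop("max-age", None)
        return cookie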

[Full disclosure: I chat regularly with the WSJ journalists covering the web privacy beat, I provided them with background information on this story, and tipped them off to the communication between Simeon Simeonov and Mozilla.]

After post-publication complaints from Mozilla, the Journal added a correction note to the bottom of the article, stating:
Mozilla Corp. said it removed a privacy feature from a development version of its Firefox Web browsing software on June 8 because of concerns inside the company that the feature would spur more surreptitious forms of tracking and hamper the performance of companies that provide Web statistics and host content for other companies. The removal occurred before a conversation between advertising industry executive Simeon Simeonov and Jay Sullivan, Mozilla's vice president of products, which took place on June 9. A Nov. 30 Marketplace article about the removal incorrectly said that the feature was removed on June 10 in response to the concerns raised by Mr. Simeonov during his conversation with Mr. Sullivan.

Even after the correction, the article was not well received by members of the Mozilla Corporation. Asa Dotzler, Mozilla's Director of Community Development, described the Journal article as "bullshit" and "a complete fabrication designed to smear Mozilla and generate controversy and pageviews."

According to Dotzler:

The real timeline was this: Mozilla engineers prototyped the feature and put it into testing. Mozilla engineers discussed what kind of impact it might have on the Web and concluded that not only would it not be very effective and have some undesirable side effects, but that it would drive advertisers to build worse experiences where users had even less privacy and control. So Mozilla scrapped the feature and started work on designing a better feature. Later, some advertising reps met with Mozilla to let Mozilla know what they were up to on the privacy front and to talk with Mozilla about what it was up to.

I have had a few back and forth emails with Asa over the last few days, and have been frustrated by the experience. In any case, I disagree with him, and I actually believe that the WSJ's original timeline is pretty solid.

My understanding is that the timeline is something like this:

May 12, 2010: Mozilla developer Dan Witte files a bug in the Mozilla bug database, proposing a change to the 3rd party cookie handling code.

May 19: Dan creates patch to implement proposed change, uploads patch to bug tracking system for discussion/review.

May 24: Code reviewed and approved by Mozilla developer Shawn Wilsher.

May 28: Dan's patch is merged into Firefox developer tree.

June 3: Word of patch reaches Jules Polonetsky of the Future of Privacy Forum, who blogs and tweets it.

June 4: Simeon Simeonov emails Mozilla CEO John Lilly, after seeing Jules' blog post.

(How do I know Simeon contacted John? Because Simeon called me up at 1:45PM EST on June 4 to tell me he had done so, after which, we spent 20 minutes debating the impact it would have on the ad industry and user privacy).

June 4, 7PM PST: Mozilla VP of Engineering Mike Shaver posts note to bug report, noting that it is a pretty major change, one that he was not aware of, and that there should be "a fair bit of discussion" about it.

June 8: Patch reverted.

While the WSJ's correction notes that the patch was reverted by Mozilla before Simeon Simeonov and Jay Sullivan, Mozilla's vice president of products, spoke on June 9, the story also mentions an earlier communication that took place between Mozilla's CEO and Simeon -- an email communication which no one at Mozilla has directly denied. This occurred several days before the patch was reverted, and 10 hours before Mozilla VP of Engineering Mike Shaver first commented on the patch.

Let me be clear - I do not believe that Mozilla buckled under pressure from the advertising industry. What I do believe, however, is that Mozilla's senior management had no idea that this patch existed, that it had been merged into the Mozilla developer tree several days earlier, or that it would have a major impact on the Internet advertising industry, until Mozilla's CEO was contacted by an advertising industry executive.

Once Mozilla's CEO received the email, he likely forwarded it to several people within Mozilla, and I suspect there were dozens of emails sent back and forth between management and the engineers about the patch and its impact on the Internet. As outsiders, we (Mozilla's users) are not privy to those conversations -- instead, we simply see Mike Shaver's comment about there needing to be more discussion about the issue, and then a few days later, a brief note is posted to the bug to say that the patch was reverted.

Yesterday, Mitchell Baker, the Chair of the Mozilla Foundation, posted a note to her own blog, taking issue with the Journal article. In her response, Baker claimed that the WSJ story was "not accurate in any shape or form", adding that "decision-making at Mozilla is based on the criteria in the Mozilla Manifesto".

One of the principles in the Mozilla Manifesto is that "Transparent community-based processes promote participation, accountability, and trust."

Again, let me be clear - I think there are legitimate reasons for the decision to revert the 3rd party cookie handling patch, and that Mozilla's entire approach to cookies should be rewritten to better protect user privacy. However, I think it is pretty difficult for Mozilla's executives to argue that the decision to revert the patch was done according to the criteria in the Mozilla Manifesto. Simply put, a large part of the discussion happened behind closed doors, in email messages between Mozilla employees, none of which have been made public. There was very little transparency in the process.

There is a pretty significant missing part of the puzzle here, and I think that Mozilla has a responsibility to shine a bit more light on the internal discussions surrounding this patch.

Conclusion

I am a proud and happy Firefox user. I am on good terms with several Mozilla employees, and I have even developed a successful Firefox add-on, which was downloaded more than 700,000 times before I sold it earlier this year. The computer I am typing this blog post on was paid for with the profits from that sale. I want Mozilla to continue to enjoy great success.

I have watched over the last year or two as Google has eaten away at Mozilla's speed and performance advantage, and so I desperately want Mozilla to find an area in which it can outcompete Google. I really do believe that privacy is that area.

However, for Mozilla to win on privacy, it needs to put users first, 100% of the time, and it needs to be very open about it. As an organization that receives the vast majority of its funding from an advertising company (Google), Mozilla needs to hold itself to the highest standard of ethics and permit its users to know the reasoning behind design decisions, particularly those that will impact Google and the other advertising networks.

Tuesday, December 07, 2010

Initial thoughts on Microsoft's IE9 Tracking Protection Announcement

While I am often critical of companies for their privacy practices, when they do good things, I think it is important to publicly praise them for it. As such, Microsoft deserves a significant amount of credit for moving the ball forward on privacy enhancing features in the browser. This blog post will reveal a few of my initial thoughts about Microsoft's announcement, and what I think are the politics behind its decision.

Briefly, Microsoft today announced that it will be improving the InPrivate Filtering feature in its browser -- a feature that would have been great in its original form, had the company not intentionally sabotaged it in response to pressure from people within its own advertising division.

When it was enabled by the user, InPrivate Filtering observed the 3rd party servers that users kept interacting with as they browsed the web, and once a server showed up more than a set number of times, the browser would block future connections to it. The feature was surprisingly effective, but unfortunately, Microsoft decided to require users to re-enable it each time they used their browser, rather than making the preference stick.
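
To make the mechanism concrete, here is a rough sketch of that heuristic in Python. It is my own approximation with an invented threshold value, not Microsoft's code: count how many distinct 1st party sites a given 3rd party server shows up on, and once it crosses the threshold, refuse future connections to it.

    from collections import defaultdict

    THRESHOLD = 10   # illustrative value only

    sites_seen_on = defaultdict(set)   # 3rd party host -> 1st party sites embedding it
    blocked = set()

    def observe(third_party_host, first_party_site):
        """Record that a page on first_party_site loaded content from third_party_host."""
        sites_seen_on[third_party_host].add(first_party_site)
        if len(sites_seen_on[third_party_host]) >= THRESHOLD:
            blocked.add(third_party_host)

    def should_block(third_party_host):
        return third_party_host in blocked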

The company announced today that the forthcoming release candidate of IE9 will replace InPrivate Filtering with a Tracking Protection feature. The company is doing away with the automatic compilation of a list by the browser based on the users' own browsing, and instead shifting to a model where the user can subscribe to a regularly updated list of servers to which the browser will block all 3rd party connections.

If this feature sounds familiar, perhaps it is because Microsoft is essentially building AdBlock Plus into its browser, except that Microsoft itself will not be providing the list of ad networks. It will be up to consumer groups (or perhaps government regulators) to do that themselves.

It is important to note that once a user subscribes to such a list, as with the InPrivate Filtering feature, all 3rd party connections to the servers will be blocked. This means that not only will advertising networks on the list be blocked from tracking users, but IE9 will not even display advertising provided by those firms' servers.
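
In other words, the logic is closer to a firewall rule than to cookie management. Here is a minimal sketch of what a subscribed list does; it is my own illustration with hypothetical domain names, not Microsoft's implementation.

    BLOCKLIST = {"adserver.example", "tracker.example"}   # hypothetical subscribed list

    def allow_request(request_host, page_host):
        """Refuse any 3rd party connection to a host on the subscribed list."""
        third_party = request_host != page_host and not request_host.endswith("." + page_host)
        listed = any(request_host == d or request_host.endswith("." + d) for d in BLOCKLIST)
        # Refusing the connection outright means no cookie is sent, no ad is fetched,
        # and no other server-side tracking technique gets a chance to run.
        return not (third_party and listed)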

Analysis

I have a few thoughts on this announcement. I'm short on time, and so I'm going to list them (in no particular order):

  • Realpolitik. This is a very savvy, strategic decision on Microsoft's part. I think that the company probably thinks its own advertising business (or at least, its own overall bottom line) will suffer less than its competitors'. After all, Google gets most of its money from online advertising, whereas Microsoft still earns a vast sum of money from Office and Windows.

  • Do not track. This is almost certainly designed to impact the current debate on Do Not Track taking place in Washington DC. While the debate has thus far centered around a header based mechanism, Microsoft may well try to make the case that the FTC could supply a subscription list of known tracking servers, which consumers could then subscribe to by visiting www.donottrack.gov, or some similar URL.

  • Multiple domains. Once the EFF, NAI, ACLU and perhaps even FTC start distributing subscription lists of ad network servers, the online advertising industry will likely have to embrace a multi-domain model. That is, if they continue to serve both contextual (non-targeted) and targeted advertisements from the same domain name, then their servers' inclusion in subscription blacklists will mean that consumers will not see any of the advertisements they deliver, and not just avoid the tracking. Faced with the choice of not being able to show any ads, or just not being able to target users, the ad networks may have to swallow their pride, and roll out alternate, non-tracking domains and servers for contextual ads.

  • What is tracking. If the ad networks do shift to a multi-domain model, then they will likely argue that they should still be able to deliver persistent cookies to users from their non-tracking domains, if those cookies are solely used for the purpose of doing frequency capping, and sequencing of multi-creative advertising campaigns. They will also try to argue that retargeting should not be considered tracking. There will likely be an intense lobbying campaign by the advertisers to narrowly define tracking, at least for the purpose of any FTC or other government agency supplied blacklist.

  • First to the party. When Google deployed SSL by default for users of Gmail in January, the company received widespread praise. When Microsoft followed suit in November (albeit not by default), the announcement received significantly less press, and even some criticism (for not doing it sooner, and not by default). The take-home message here is that the first company to roll out a privacy technology is the one that gets all the attention. Now that Microsoft has made this announcement, Google, Apple and Mozilla may be forced to follow, but if and when they do, they won't get nearly as much praise for doing so.

  • Competing on privacy. Microsoft has long wanted, and tried, to compete on privacy, but never quite got it right. Most significantly, the company took the lead in adopting a strong search data retention and IP address anonymization policy, in contrast to Google, which still continues to deceptively claim that its own policy of deleting a single octet from IP address logs is anonymization. While Microsoft offered far better privacy in this space, it failed in the battle to communicate these differences to the press, and Google received praise for offering far less. With this announcement, Microsoft appears to be yet again attempting to compete on privacy -- with any luck, the company will be successful in differentiating its product on these features.

  • Future proofing against 3rd party tracking. By opting to block connections to servers on the blacklist, Microsoft is offering IE9 users protection against more than just cookie based tracking. Flash cookies, evercookie, cache cookies, timing attacks, and even fingerprinting will all be blocked -- as long as the tracking is conducted by 3rd party servers. However, as Craig Wills and Balachander Krishnamurthy have documented, ad networks are increasingly using subdomain alias techniques (e.g. ads.publisher.com points to adserver.com) to bypass browsers' 3rd party cookie blocking features. If ad networks find their servers blocked by IE, we may increasingly see them "innovate" around this blocking by further embracing alias subdomains and other sneaky techniques.
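
To illustrate the aliasing trick mentioned in that last point: a hostname-only blacklist match is easy to evade if ads.publisher.com is merely a DNS alias (CNAME) for adserver.com, since the blacklist never sees the real domain. The sketch below is my own, assumes the third-party dnspython package, and uses hypothetical domain names; it checks whether a supposedly first-party subdomain quietly points at a listed ad server.

    import dns.resolver   # assumes dnspython (pip install dnspython)

    BLOCKLIST = {"adserver.com"}   # hypothetical blacklist entry

    def hides_a_listed_tracker(hostname):
        """True if hostname is just a DNS alias for a domain on the blacklist."""
        try:
            answers = dns.resolver.resolve(hostname, "CNAME")
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
            return False   # not an alias, or the name does not exist
        for record in answers:
            target = str(record.target).rstrip(".")
            if any(target == d or target.endswith("." + d) for d in BLOCKLIST):
                return True
        return False

    # hides_a_listed_tracker("ads.publisher.com") would return True if that
    # subdomain is simply an alias pointing at adserver.com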

Conclusion

This is a great, pro-privacy and strategically savvy move on Microsoft's part. I am delighted to see companies competing on privacy, and building better features into their products. This announcement will likely have a significant impact on the current Do Not Track debate, and it will be interesting to see how the ad industry, the other browser vendors, and government regulators respond.

Thursday, December 02, 2010

DOJ's "hotwatch" real-time surveillance of credit card transactions

A 10-page PowerPoint presentation (pdf) that I recently obtained through a Freedom of Information Act request to the Department of Justice reveals that law enforcement agencies routinely seek and obtain real-time surveillance of credit card transactions. The government's guidelines reveal that this surveillance often occurs with a simple subpoena, thus sidestepping any Fourth Amendment protections.

Background

On October 11, 2005, the US Attorney from the Eastern District of New York submitted a court filing in the case of In re Application For Pen Register and Trap and Trace Device With Cell Site Location Authority (Magistrate's Docket No. 05-1093), which related to the use of pen register requests for mobile phone location records.

In that case, the US Attorney’s office relied on authority they believed was contained in the All Writs Act to justify their request for customer location information. In support of its claim, the office stated that:

Currently, the government routinely applies for and upon a showing of relevance to an ongoing investigation receives “hotwatch” orders issued pursuant to the All Writs Act. Such orders direct a credit card issuer to disclose to law enforcement each subsequent credit card transaction effected by a subject of investigation immediately after the issuer records that transaction.

A search of Google, LexisNexis and Westlaw revealed nothing related to "hotwatch" orders, and so I filed a FOIA request to find out more. If the government "routinely" applies for and obtains hotwatch orders, why wasn't there more information about them?

It took a year and a half to learn anything. The Executive Office for US Attorneys at the Department of Justice located 10 pages of relevant information, but decided to withhold them in full. I filed my first ever FOIA appeal, which was successful, albeit very slow, and finally received those 10 pages this week.



As the document makes clear, Federal law enforcement agencies do not limit their surveillance of US residents to phone calls, emails and geo-location information. They are also interested in calling cards, credit cards, rental cars and airline reservations, as well as retail shopping clubs.

The document also reveals that DOJ's preferred method of obtaining this information is via an administrative subpoena. The only role that courts play in this process is in issuing non-disclosure orders to the banks, preventing them from telling their customers that the government has spied on their financial transactions. No Fourth Amendment analysis is conducted by judges when issuing such non-disclosure orders.

While Congress has required that the courts compile and publish detailed statistical reports on the degree to which law enforcement agencies engage in wiretapping, we currently have no idea how often law enforcement agencies engage in real-time surveillance of financial transactions.

Monday, November 22, 2010

DOJ has granted itself new surveillance powers

Update @ 8PM 11/22/2010: EFF first sounded the alarm about roving 2703(d) orders back in 2005, which were being used to obtain phone information.

Electronic communications privacy law in the United States is hopelessly out of date. As several privacy groups have noted, the statute that governs when and how law enforcement agencies can obtain individuals' private files and electronic documents hasn't really been updated since it was first written in 1986.

Over the past year, privacy groups, academics and many companies have gotten together to push for reform of the Electronic Communications Privacy Act (ECPA). These stakeholders have lobbied for reform of this law, and in turn, both the House and Senate have held hearings on various issues, ranging from cloud computing to cellular location data.

Of course, complaints about the existing statute are not limited to those wishing to protect user privacy -- law enforcement agencies would very much like to expand their authority. However, as I document in this blog post, rather than going to Congress to ask for new surveillance powers, the Department of Justice, and in particular the US Marshals Service, has simply created for itself a new "roving" order for stored communications records.

Let that sink in for a second. Rather than wait for Congress to give it new authority, the Department of Justice has instead just given itself broad new surveillance powers.

Roving Wiretaps

For nearly 15 years, law enforcement agencies have had "roving wiretap" authority, meaning that they can get a court order that does not name a specific telephone line or e-mail account but allows them to wiretap any phone line, cell phone, or Internet connection that a suspect uses. In order to use this expanded authority, prosecutors have to show probable cause to believe that the individual under investigation is avoiding interception at any particular place.

Although there are more than 2000 wiretap orders issued each year, as the table below reveals, federal and local law enforcement agencies rarely seek to use this roving authority.



Roving Pen Registers and Trap & Trace orders

While wiretap orders are used for the real-time interception of communications content, pen register and trap & trace orders are used to intercept, in real-time, non-content information associated with communications. This includes the numbers dialed, to/from addresses associated with emails, etc.

Traditionally, like wiretap orders, pen register/trap & trace orders had to name the recipient (phone company or ISP) in the order. If the government wished to go to a different ISP, they'd need to return to the judge to get another order. However, the USA PATRIOT Act expanded the scope of pen register and trap & trace orders, essentially turning them into roving orders by default:

The [pen register] order . . . shall apply to any person or entity providing wire or electronic communication service in the United States whose assistance may facilitate the execution of the order.

Whenever such an order is served on any person or entity not specifically named in the order, upon request of such person or entity, the attorney for the Government or law enforcement or investigative officer that is serving the order shall provide written or electronic certification that the order applies to the person or entity being served.

Thus, post-PATRIOT Act, law enforcement agencies can use a single court order -- whether a roving wiretap order for content, or a pen register/trap & trace order for real-time non-content data -- to compel assistance from any 3rd party that may have the data, even if the service provider was not named in the original court order.

Stored communications and customer records

The vast majority of surveillance requests are not for real-time data, but for historical information. That is, rather than seeking to intercept emails or web browsing activities as they are transmitted, law enforcement agencies often seek information after the fact. This is both easier, and often much cheaper.

For example, existing surveillance reports reveal that 1773 wiretap orders were issued in 2005, 625 of which were for federal agencies. Similarly in 2005, a total of 6790 pen registers and 4393 trap & trace orders were obtained by law enforcement agencies within the Department of Justice (the FBI, DEA, ATF and the Marshals).

In that same year, Verizon received 36,000 requests for customer information from federal law enforcement agencies and 54,000 requests from state and local law enforcement agencies.

That is, Verizon's requests alone (roughly 90,000) dwarf the roughly 13,000 publicly reported wiretap, pen register and trap & trace orders by nearly 700%. This doesn't mean that the wiretap numbers are incorrect -- merely that the vast majority of requests that Verizon received were for stored records, such as historical information on the phone numbers its customers dialed, old text messages, and stored emails. It is quite reasonable to assume that other major telecommunications carriers receive a similar number of requests.

2703(d) orders

Federal law requires that law enforcement agencies first obtain a special court order (known as a 2703(d) order) before they can compel third party service providers to deliver many types of stored user non-content data. Such court orders must name the service provider that has the data, and unlike in the case of wiretaps and pen registers, Congress has not granted roving authority to law enforcement agencies. This means that law enforcement agencies are supposed to obtain a 2703(d) order naming each ISP or phone company that has data that the government would like to get.

Roving 2703(d) orders

Updated at 8PM on 11/22/2010 to give credit to EFF for first discovering roving d orders

In 2005, the Electronic Frontier Foundation filed a brief in federal court, objecting to a request by the Department of Justice for an order requiring "relevant service providers… to provide subscriber information about [all] numbers obtained from the use of… pen/trap devices" upon oral or written demand by relevant law enforcement officials.

Section 2703 of 18 USC provides that:
"a governmental entity may require a provider of electronic communications service…to disclose a record or other information pertaining to a subscriber or customer of such service…only when the government… obtains a court order for such disclosure under subsection (d) of this section."
As the EFF told the court:
"This language [in 2703] clearly contemplates orders that require disclosure of particular records regarding particular customers of particular providers, not general orders that the government can use on its own discretion to continuously demand unspecified records about unspecified people from unspecified providers, for the entire duration of a related pen-trap surveillance.

. . .

The Stored Communications Act simply does not authorize open-ended or "roving" orders that are enforced based on the government’s oral or written representations of its pen-trap results. Indeed, such orders would leave the government in a dangerously unchecked position to obtain subscriber information for any telephone number without court oversight or approval."

The EFF's 2005 brief objected to the government's attempts to get roving 2703(d) orders for subscriber records from phone companies. It seems that the government has since expanded its use of these roving 2703(d) orders to email providers.

I recently obtained a copy of the US Marshals Service's Electronic Surveillance Manual through a Freedom of Information Act (FOIA) request. As I highlighted in a previous blog post, that handbook reveals that the US Marshals have adopted a policy of always obtaining a 2703(d) order whenever they seek a pen register.


The surveillance manual lists several advantages of obtaining such "hybrid" 2703(d)/pen register orders - such as the ability to get geo-location data from providers, who are prohibited by law from revealing "any information that may disclose the physical location of the subscriber" in response to a pen register order. It is not until a few paragraphs later that another advantage of the hybrid order (and its limitations) is hinted at.


What is happening here is a bit complex. In essence, federal surveillance law does not provide for roving 2703(d) orders, but it does provide for roving pen register authority. Therefore, DOJ believes that when it staples together a pen register order and a 2703(d) order, the roving aspect of the pen register order automatically transfers to the 2703(d) order.

Thus, DOJ believes that law enforcement agencies can send a copy of a hybrid 2703(d)/pen register order to ISPs not named in the order, and force them to disclose stored subscriber records and communications non-content data, such as email headers.

DOJ's reason for doing this, at least according to the Marshals' surveillance manual, is "because we say so":
Although compelling compliance with a Pen/Trap order that also required disclosure of stored records (e.g. subscriber) is unclear under this section, investigators should assert that compliance with the entire order is mandatory irrespective of whether a provider is specifically named in the order.
Again -- even though the law does not grant the government this expanded authority, DOJ urges investigators to assert that companies must comply with the request anyway.

DOJ is using this authority

Nearly a year ago, I obtained an invoice from Google to the US Marshals Service related to a pen register order from December 2007.
The invoice states that:
"We understand that you have requested customer information regarding the user account specified in the Pen Register/Trap Trace, which includes the following information: (1) Subscriber information for the gmail account [redacted]@gmail.com; (2) Information regarding session timestamps and originating IP addresses for recent logins by this account; and a CD containing (3) Header information for the specified date range."

The phrasing of this text reveals that the Marshals first delivered the pen register order to a different ISP, and that the gmail.com account appeared in the data delivered by that other service provider in response to the pen register request. As such, neither Google nor the particular gmail.com address was named in the original pen register order issued by the judge.

Google likely received a hybrid 2703/pen register order from the US Marshals Service, and, even though the company was not named in the original order, it provided historical, stored non-content data and subscriber information to law enforcement officials. The company could very easily have told the Marshals to get lost, and come back with a 2703(d) order signed by a judge, naming Google.

I'm not sure what is more alarming: that the US government abuses its already broad surveillance powers, or that Google, a company that pledges to "be a responsible steward of the information we hold," is not in fact insisting that law enforcement agencies follow the letter of the law.

Thursday, November 11, 2010

Thoughts on Microsoft's Hotmail SSL deployment

Update 10:00pm: I was contacted by an extremely well-informed individual who told me that my speculation about Microsoft's webserver SSL performance was completely wrong. The individual declined to reveal the reason why the company opted to make SSL opt-in, which makes the decision even more curious. Why expose users to needless security risks if protecting them doesn't require significant additional computing resources?


On November 9, Microsoft rolled out opt-in HTTPS (SSL) protection for its Hotmail service, which came just a couple weeks after Firesheep made the importance of such security measures quite clear. For those of you just tuning in to SSL issues, Microsoft's announcement might seem like a great move. This blog post will explain why Microsoft deployed this security enhancement, why it hasn't done it by default, and why it should.

Background

Over the past few years, researchers released several security tools that automated the capture of credentials and session cookies, allowing an attacker to easily hijack user accounts that were logged into over an insecure wifi connection. In October 2008, Mike Perry released Cookiemonster, which made session hijacking against several popular web 2.0 services even easier. Across the board, webmail and social networking services totally ignored the individual pleas from security researchers and academics that they protect their users by default. Google offered SSL, but disabled it by default, and the other big companies (Facebook, Microsoft, Yahoo) didn't offer SSL at all.
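
The underlying problem is embarrassingly simple: over plain HTTP, the session cookie travels in cleartext, and anyone on the same network who captures it can replay it. Here is a sketch of what tools like Firesheep automate; this is my own illustration using the third-party requests library, and the cookie name, value and URL are all invented.

    import requests

    # A session cookie sniffed in cleartext off the open wifi network:
    stolen_cookie = {"session_id": "a1b2c3d4"}

    # Replaying the captured cookie is all it takes; no password is ever needed.
    response = requests.get("http://webmail.example.com/inbox", cookies=stolen_cookie)
    print(response.status_code)   # the attacker now reads the inbox as the victim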

Fed up with the lack of any progress, in June 2009, I published an open letter to Google's CEO, asking him to protect his customers and deploy SSL by default. 37 other big-name security researchers, academics and legal experts signed on, helping to get a bit of press attention. Google soon said it would begin to study the possibility of deploying SSL by default, and then in January 2010, the company did it -- encrypting every Gmail user's entire session by default.

In addition to publishing the open letter, I sent copies of it to privacy bigshots at both Microsoft and Facebook, and told them, essentially, "don't make me write a letter for you too." Individuals at both companies thanked me for the warning, and told me they were looking into the possibility of offering SSL.

In March 2010, outgoing FTC Commissioner Pamela Jones Harbour spent much of her final public speech talking about SSL.
Even though these service providers know about the vulnerabilities, and the ease with which they can be exploited, the firms continue to send private customer information over unsecured Internet connections that easily could have been secured.

My bottom line is simple: security needs to be a default in the cloud. Today, I challenge all of the companies that are not yet using SSL by default. That includes all email providers, social networking sites, and any website that transmits consumer data. Step up and protect consumers. Don’t do it just some of the time. Make your websites secure by default.
Commissioner Harbour's remarks were, to my knowledge, the first time a senior government official had ever weighed in on the issue. The fact that this happened seven months after I joined the FTC is entirely coincidental.

Microsoft's move towards SSL

Just one month later, in April 2010, Microsoft announced that they too would soon offer SSL, although not by default. Fast forward to November 9, 2010, and Microsoft has made good on its promise.

Users who go out of their way to type https://www.hotmail.com will now receive protection for just that session. Furthermore, the first time users type in the https URL, they see a helpful dialog offering to make SSL the default for future connections.



The dialog states that Microsoft recommends the use of HTTPS by default. The problem with this, of course, is that Microsoft only shows this dialog to consumers who know enough about SSL to have visited the secure version of hotmail in the first place.

Consumers who do not know about the risks of using Hotmail over an insecure wifi connection will never see this dialog, and will thus not know that Microsoft recommends they use SSL by default.

That isn't the only way that Hotmail users can discover the availability of SSL and turn it on.

Hotmail users who regularly read the Inside Windows Live blog may have seen Microsoft's announcement of its SSL deployment, where the company announced a special URL that Hotmail users can visit to set the SSL preference: https://account.live.com/ManageSSL (shown below).



Curiously, neither the Inside Windows Live blog, nor the special ManageSSL web page state that Microsoft recommends the use of SSL by default, and the ManageSSL web page even has the "Don't use HTTPS automatically" option pre-selected by default.

Realistically, the vast majority of Hotmail users simply type "www.hotmail.com" into their browser, and do not read the Inside Windows Live blog, and so will be completely unaware that Microsoft now offers an SSL option. There is no mention of SSL on the regular Hotmail front page.

These users are not completely out of luck, as there is a preference within the Hotmail options that they can flip to enable SSL by default. From within their Hotmail Inbox, they need to click on "Options", then "More Options", then "Account details (password, aliases, time zone)", then "Connect with HTTPS" (the last option on the page), then "Use HTTPS automatically", and finally, click "save". See, that was easy. It only took 6 mouse clicks.

Why Microsoft doesn't use SSL by default for Hotmail

At the same time as Microsoft started to offer SSL as an option for Hotmail, it also enabled SSL by default for its SkyDrive, Photos, Docs, and Devices products. What is the difference between these services? Hotmail has lots of users, and no one uses Photos or SkyDrive. Simply put, it is easy (and cheap) to deploy SSL for a service when it only has a few (hundred?) thousand users. Hotmail, which reportedly has over 500 million users, is a bit more expensive to protect.

"Wait a minute.. didn't Google say they didn't need any additional servers for SSL?" you may ask. Yes, it's true. Google was able to deploy SSL by default on their their existing servers, and according to Adam Langley, a senior Google engineer, after tweaking the OpenSSL library used by Google, SSL accounts for just 1% of the CPU overhead on those servers.

However, Google has a top-notch server infrastructure, running on Linux, and a lot of really skilled engineers. Microsoft, on the other hand, uses its own products.

While Microsoft doesn't reveal too many details about the infrastructure hosting Hotmail, from Netcraft, we can see that it is using its own IIS/6.0 webserver (Netcraft lists the OS as Linux, but that is because Akamai is sitting in front of Microsoft's servers). It is of course understandable that Microsoft likes to use its own products -- unfortunately, the IIS webserver isn't very good, does not use OpenSSL, and thus SSL likely consumes quite a bit more CPU than the 1% hit that Google described.

As such, I suspect that Microsoft has instead opted either to pay Akamai to take care of SSL, or to buy a large number of off-the-shelf SSL accelerator devices. In either case, SSL is likely costing Microsoft real money -- and, given that the company's Online Services Division lost half a billion dollars last year, it isn't too surprising that the company might be keen to keep its SSL-related costs to a minimum.

Simply put, if Microsoft is paying a direct financial cost for SSL, then it is easy to understand why it is not offering SSL to its 500 million Hotmail users by default.


What should Microsoft (and other companies) do?

When it comes to privacy and security, I think that the government can play a really important role in protecting consumers, particularly when the market has failed to deliver products that are safe by default. The problems that Firesheep has highlighted have existed for years; in fact, for as long as Hotmail and Facebook have existed, they have been vulnerable to account hijacking. These companies have had more than enough time to protect their customers, and have simply ignored the problem.

While I do think that privacy regulators can play a role here, I don't think it is appropriate for regulators to require that companies deliver specific products -- things get very messy when technology-ignorant bureaucrats mandate product features. However, I do think that governments can, and should compel those companies that have not protected their customers by default to at least warn users about the risks.

Earlier this year, I published a law journal article about encryption in the cloud -- which specifically focuses on the fact that most services don't even offer SSL, let alone turn it on by default. In that article, I argue that if companies do not wish to protect their customers, they should at least warn them about the risks of connecting to their services when using an insecure wifi connection. Knowing that companies are unlikely to voluntarily provide such notices, I call on the government to compel the display of cigarette packet style warnings for insecure cloud based services, such as:

WARNING: Email messages that you write can be read and intercepted by others when you connect to this service using a public network (such as a wireless network at a coffee shop, public library or school). If you wish to protect yourself from this risk, click here for a secure version of this service.

WARNING: The word processing documents that you create using this service can be read and modified by others when you connect to this site using a public network (such as a wireless network at a coffee shop, public library or school). Widely available technologies exist that will protect you from these risks, but this service provider has opted to not offer such protective functionality.


Of course, I suspect that Microsoft and Facebook would rather eat the financial cost of deploying SSL, even if it runs into the millions of dollars, than display such a scary warning... and that is exactly the point. Simply by forcing companies to reveal known risks in their products, governments can gently nudge companies to protect their customers.

Saturday, November 06, 2010

DOJ: Consumers read and understand privacy policies

The Department of Justice has a problem. One by one, judges across the country have been chipping away at DOJ's flimsy legal theories upon which it has for years compelled phone companies to disclose individuals' historical and real-time geo-location information without a warrant.

DOJ's legal theory relies upon the third party doctrine. Essentially, what this means is that companies can be compelled, without a search warrant, to disclose any information that their customers have willingly given them.

One of the most important Supreme Court cases shaping this rule, Smith v. Maryland, focused on the legal process through which law enforcement agencies can obtain the phone numbers dialed by a suspect:
[W]e doubt that people in general entertain any actual expectation of privacy in the numbers they dial. All telephone users realize that they must 'convey' phone numbers to the telephone company, since it is through telephone company switching equipment that their calls are completed.

. . .

[W]hen he used his phone, petitioner voluntarily conveyed numerical information to the telephone company and "exposed" that information to its equipment in the ordinary course of business. In so doing, petitioner assumed the risk that the company would reveal to police the numbers he dialed.

Since that 1979 case, the government has stretched the third party doctrine, from dialed phone numbers to essentially all non-content information transmitted by a telephone, including cell site records revealing where an individual has been.

Unfortunately for the government, the Third Circuit Court of Appeals recently eviscerated the government's legal theory, finding that there is a big difference between dialed phone numbers, and triangulated geo-location information:
A cell phone customer has not "voluntarily" shared his location information with a cellular provider in any meaningful way. As the EFF notes, it is unlikely that cell phone customers are aware that their cell phone providers collect and store historical location information. Therefore, "[w]hen a cell phone user makes a call, the only information that is voluntarily and knowingly conveyed to the phone company is the number that is dialed and there is no indication to the user that call will also locate the caller; when a cell phone user receives a call, he hasn't voluntarily exposed anything at all.

After the Third Circuit decision, magistrate judges took note, asking the Department of Justice to explain why cellular location information should still be disclosed under the third party doctrine, rather than requiring a search warrant based upon a showing of probable cause.

On October 25, the Department of Justice responded in a brief (pdf) filed with a federal magistrate judge in Houston:
Cell phone users also understand that the provider will know the location of its own cell tower, and that the provider will thus have some knowledge of the user’s location. Indeed, providers’ terms of service and privacy policies make clear that the provider’s obtain this information.

. . .

Use of a cell phone is entirely voluntary, and a user will know from his experience with his cell phone and from a provider’s privacy policy/terms of service that he will communicate with a provider’s cell tower and that this communication will convey information to the provider about his location.

A footnote below the first sentence includes some text from T-Mobile's privacy policy, after which, DOJ argues that the privacy policy makes it clear that users understand their location information is communicated to T-Mobile:
The first of these paragraphs demonstrates that a cell phone customer will be aware that T-Mobile obtains information regarding the customer’s location. The second paragraph demonstrates that a customer will be aware that T-Mobile collects this information. The third paragraph demonstrates that the customer will be aware that this information becomes a T-Mobile business record.

Consumers read privacy policies, because we say so

DOJ's argument is essentially this:

  1. Phone companies disclose in their privacy policies that they have access to subscribers' location information (with citation to privacy policies).
  2. (. . .)
  3. Therefore, consumers reasonably understand that their location information is transmitted to the phone company whenever their phone is on, and thus historical location information shouldn't be protected by the 4th Amendment.

What is missing, of course, is a direct claim that consumers read privacy policies. The government can't actually state this claim, because it is frankly laughable. Instead, it argues that:
"[A] user will know from his experience with his cell phone and from a provider’s privacy policy/terms of service"

The implied claim is that consumers read privacy policies. How else would a user know what is in the provider's privacy policy and terms of service unless he or she read the thing? Thus, the government's legal theory still depends upon the idea that consumers, or at least most consumers, read and understand privacy policies.

The FTC and Supreme Court discuss privacy policies

The Department of Justice isn't the only part of the US government to have made official statements regarding privacy policies, and the extent to which consumers read them. The Federal Trade Commission is tasked with protecting consumers' privacy online, and officials there frequently speak about this topic.

In introductory remarks at a privacy roundtable in December 2009, Federal Trade Commission Chairman Leibowitz told those assembled in the room that:
We all agree that consumers don’t read privacy policies – or EULAs, for that matter.

Similarly, in a August 2009 interview, David Vladeck, the head of the FTC's Bureau of Consumer Protection told the New York Times that:
Disclosures are now written by lawyers, they’re 17 pages long. I don’t think they’re written principally to communicate information; they’re written defensively. I’m a lawyer, I’ve been practicing law for 33 years. I can’t figure out what the hell these consents mean anymore. And I don’t believe that most consumers either read them, or, if they read them, really understand it. Second of all, consent in the face of these kinds of quote disclosures, I’m not sure that consent really reflects a volitional, knowing act.

Echoing both of these statements, in an official filing earlier this year with the Commerce Department, the FTC wrote that:
The current privacy framework in the United States is based on companies' privacy practices and consumers' choices regarding how their information is used. In reality, we have learned that many consumer do not read, let alone understand such notices, limiting their ability to make informed choices.

Even the Chief Justice of the US Supreme Court has weighed in on the issue, albeit only in a speech before students in Buffalo, NY just a few weeks ago. Answering a student question, Roberts admitted he doesn’t usually read the terms of service or privacy policies, according to the Associated Press:
It has "the smallest type you can imagine and you unfold it like a map," he said. "It is a problem," he added, "because the legal system obviously is to blame for that." Providing too much information defeats the purpose of disclosure, since no one reads it, he said. "What the answer is," he said, "I don’t know."

Academic research on privacy policies

Academic research seems to uniformly support the FTC's arguments.

Among 222 study participants of the 2007 Golden Bear Omnibus Survey, the Samuelson Clinic found that only 1.4% reported reading EULAs often and thoroughly, 66.2% admitted to rarely reading or browsing the contents of EULAs, and 7.7% indicated that they had not noticed these agreements in the past or had never read them.

Similarly, a survey of more than 2000 people by Harris Interactive in 2001 found that more than 60 percent of consumers said they had either "spent little or no time looking at websites' privacy policies" or "glanced through websites' privacy policies, but . . . rarely read them in depth." Of those individuals surveyed, only 3 percent said that "most of the time, I carefully read the privacy policies of the websites I visit."

American consumers are not alone. In 2009, the UK Information Commissioner's Office conducted a survey of more than 2000 people, and found that 71% did not read or understand privacy policies.

While the vast majority of consumers don't read privacy policies, some do seem to notice the presence of a privacy policy on a company's website. Unfortunately, most Americans incorrectly believe that the phrase privacy policy signifies that their information will be kept private. A 2003 survey by Annenberg found that 57% of 1,200 adults who were using the internet at home agreed or agreed strongly with the statement "When a web site has a privacy policy, I know that the site will not share my information with other websites or companies." In a 2005 follow-up survey, researchers asked 1,200 people whether that same statement was true or false; 59% answered that it was true.

Even if consumers were interested in reading privacy policies -- doing so would likely consume a significant amount of their time. A research team at Carnegie Mellon University calculated the time to read the privacy policies of the sites used by the average consumer, and determined that:
[R]eading privacy policies carry costs in time of approximately 201 hours a year, worth about $2,949 annually per American Internet user. Nationally, if Americans were to read online privacy policies word–for–word, we estimate the value of time lost as about $652 billion annually.
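
As a quick sanity check of how those numbers relate to one another (my own back-of-the-envelope arithmetic, not part of the study):

    \frac{\$652\ \text{billion}}{\$2{,}949\ \text{per user}} \approx 221\ \text{million US Internet users}, \qquad \frac{\$2{,}949}{201\ \text{hours}} \approx \$14.67\ \text{per hour}

That is, the national estimate simply scales the per-user figure across roughly 221 million Internet users, with each hour of reading time valued at just under $15.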

Finally, even if consumers took the time to try and read privacy policies, it is quite likely that many would not be capable of understanding them. In 2004, a team of researchers analyzed the content of 64 popular website's privacy policies, and calculated the reading comprehension skills that a reader would need to understand them. Their research revealed that:
Of the 64 policies examined, only four (6%) were accessible to the 28.3% of the Internet population with less than or equal to a high school education. Thirty-five policies (54%) were beyond the grasp of 56.6% of the Internet population, requiring the equivalent of more than fourteen years of education. Eight policies (13%) were beyond the grasp of 85.4% of the Internet population, requiring the equivalent of a postgraduate education. Overall, a large segment of the population can only reasonably be expected to understand a small fragment of the policies posted.

Conclusion

As the academic research I have summarized here and multiple statements by FTC officials make clear, consumers do not read privacy policies. As such, it is shocking that the Department of Justice would, in representing the official position of the United States Government, argue otherwise before a court.

I hope that responsible persons inside DOJ will take note of this blog post, contact the court, and retract their claim. I also hope that the new White House Interagency Subcommittee on Privacy & Internet Policy will take note of this issue, and make sure that this sort of claim doesn't find its way into any future DOJ legal briefs.

Monday, October 25, 2010

Eric Schmidt blames the EU for Google's data retention policies

Google CEO Eric Schmidt was interviewed by CNN this past week.



The most interesting bit of the interview is at the beginning:
Schmidt: We keep the searches that you do for roughly a year/year and a half, and then we forget them.

Question: You say that, but can somebody come to you and say that we need information about Kathleen Parker?

Schmidt: Under a federal court order, properly delivered to us, we might be forced to do that, but otherwise no.

Question: Does that happen very often?

Schmidt: Very rarely, and if it's not formally delivered, then we'll fight it.

Question: You say you keep stuff for a year/year and a half. Who decides?

Schmidt: Well, in fact, the European Government passed a set of laws that require us to keep it for a certain amount, and the reason is that the public safety sometimes wants to be able to look at that information.

Somehow in just a few sentences, Schmidt manages to misrepresent the facts several times (the question of if Schmidt is merely misinformed, or actively lying is left as an exercise for the reader).

First, on the subject of retention, it is completely false to say that after a year/year and a half, Google "forgets" searches.

Google's actual data retention policy is that after 9 months, the company deletes the last octet of users' IP addresses from its search logs, and then modifies the cookie in the logs with a one-way cryptographic hash after 18 months.

The company never deletes or "forgets" users' searches. It merely deletes a little bit of data that associates the searches to known Google users.
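
To see how little is actually "forgotten," here is a sketch of what the two described operations amount to. This is my own illustration in Python; Google's real log-scrubbing pipeline is not public, and the specific hash function shown is an assumption.

    import hashlib

    def scrub_ip(ip):
        """After 9 months: zero out the last octet of the logged IP address."""
        octets = ip.split(".")
        octets[-1] = "0"
        return ".".join(octets)   # still narrows the user to one of 256 addresses

    def scrub_cookie(cookie_id):
        """After 18 months: replace the cookie ID with a one-way hash of it."""
        return hashlib.sha256(cookie_id.encode()).hexdigest()

    # The log line keeps the full search query; only the identifiers are altered.
    print(scrub_ip("198.51.100.37"))    # -> 198.51.100.0
    print(scrub_cookie("ID=deadbeef"))  # -> a deterministic pseudonym, identical in every log line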

For those of you who may be inclined to give Schmidt the benefit of the doubt, regarding the difference between "forgetting" searches, and deleting a couple bits of an IP address in a log, remember that Schmidt has a PhD in computer science.

Google searches and the EU data retention directive

In May of 2007, Google's Global Privacy Counsel claimed that the European Data Retention Directive might apply to search engines.
Google may be subject to the EU Data Retention Directive, which was passed last year, in the wake of the Madrid and London terrorist bombings, to help law enforcement in the investigation and prosecution of "serious crime". The Directive requires all EU Member States to pass data retention laws by 2009 with retention for periods between 6 and 24 months. Since these laws do not yet exist, and are only now being proposed and debated, it is too early to know the final retention time periods, the jurisdictional impact, and the scope of applicability. It's therefore too early to state whether such laws would apply to particular Google services, and if so, which ones. In the U.S., the Department of Justice and others have similarly called for 24-month data retention laws.

One week later, the European Article 29 working party wrote a letter to Google, informing the company that:
As you are aware, server logs are information that can be linked to an identified or identifiable natural person and can, therefore, be considered personal data in the meaning of Data Protection Directive 95/46/EC. For that reason their collection and storage must respect data protection rules.

A month later, Google's Global Privacy Counsel replied to the Working Party:
Because Google may be subject to the requirements of the [Retention] Directive in some Member States, under the principle of legality, we have no choice but to be prepared to retain log server data for up to 24 months

Soon after, the European Commission's Data Protection Unit issued a statement to the media, stating that:
The Data Retention Directive applies only to providers of publicly available electronic communications services or of public communication networks and not to search engine systems . . . Accordingly, Google is not subject to this Directive as far as it concerns the search engine part of its applications and has no obligations thereof

Speaking of Google's claims, Ryan Singel of Wired News wrote that:
"It's a convincing argument, but it’s a misleading one. . . [Google's Global Privacy Counsel] Fleischer has been making this argument for months now, and even Threat Level bought it the first go-round. But let’s reiterate: There is no United States or E.U. law that requires Google to keep detailed logs of what individuals search for and click on at Google’s search engine. It’s simply dishonest to continually imply otherwise in order to hide the real political and monetary reasons that Google chooses to hang onto this data.

Professor Michael Zimmer, an expert on search engine privacy issues, similarly debunked Google's false claims.

Finally, in 2008, the Article 29 Working Party issued an opinion on data retention issues related to search engines, which noted that:
Consequently, any reference to the Data Retention Directive in connection with the storage of server logs generated through the offering of a search engine service is not justified . . . the Working Party does not see a basis for a retention period beyond 6 months. However, the retention of personal data and the corresponding retention period must always be justified (with concrete and relevant arguments) and reduced to a minimum.

As this lengthy summary should have made clear, Eric Schmidt's statements that the company has to retain search data because of EU law are simply bogus.

Wednesday, October 20, 2010

More private data leakage at Facebook

Via an anonymous commenter at the Freedom to Tinker blog, I discovered a recent paper from researchers at Microsoft Research and the Max Planck Institute analyzing online behavioral advertising.

The most interesting bit is the following text:

[W]e set up six Facebook profiles to check the impact of sexual-preference: a highly-sensitive personal attribute. Two profiles (male control) are for males interested in females, two (female control) for females interested in males, and one test profile of a male interested in males and one of a female interested in females. The age and location were set to 25 and Washington D.C. respectively.

. . .

Alarmingly, we found ads where the ad text was completely neutral to sexual preference (e.g. for a nursing degree in a medical college in Florida) that was targeted exclusively to gay men. The danger with such ads, unlike the gay bar ad where the target demographic is blatantly obvious, is that the user reading the ad text would have no idea that by clicking it he would reveal to the advertiser both his sexual-preference and a unique identifier (cookie, IP address, or email address if he signs up on the advertiser's site). Furthermore, such deceptive ads are not uncommon; indeed exactly half of the 66 ads shown exclusively to gay men (more than 50 times) during our experiment did not mention "gay" anywhere in the ad text.


This means that simply by clicking on a Facebook ad, a user could reveal highly sensitive personal information to an advertiser, because the advertiser has targeted that advertisement exclusively at a particular group (a gender, sexuality, or religion). The moment you arrive at the advertiser's website, it knows that the IP address and cookie value it has assigned to you belong to someone who is gay, Muslim, or a Republican.

While it may be obvious that some advertisements are targeted based on these attributes, such as gay dating sites, this study makes it clear that there are some advertisements where such targeting is not intuitive.
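To see how little the advertiser has to do to exploit this, consider the following hypothetical sketch of the advertiser's side of the transaction. The campaign names, the mapping, and the landing-page parameter are all invented for illustration; the paper does not describe any particular advertiser's systems.

# Hypothetical mapping, known only to the advertiser, from each ad
# campaign to the Facebook targeting criteria it was purchased against.
CAMPAIGN_TARGETING = {
    "nursing-degree-a": {"gender": "male", "interested_in": "men"},
    "nursing-degree-b": {"gender": "female", "interested_in": "men"},
}

def record_ad_click(campaign_id, ip_address, cookie_value):
    # The visitor clicked a neutral-looking ad, but because the campaign
    # was shown exclusively to one demographic, the advertiser can now
    # attach that demographic to the visitor's IP address and cookie.
    inferred = CAMPAIGN_TARGETING.get(campaign_id, {})
    print(ip_address, cookie_value, "->", inferred)

# e.g. the landing page URL carries ?campaign=nursing-degree-a
record_ad_click("nursing-degree-a", "203.0.113.9", "visitor-cookie-1234")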

Given the privacy firestorm earlier this week, I have a tough time imagining that Facebook will be able to sweep this under the carpet, or that class action attorneys won't jump on it.

As I see it, the company has two options:

1. Do not allow advertisers to target advertisements based on sensitive categories, such as religion, sexuality, or political affiliation.

2. Disclose, directly below the ad, the fact that the ad was targeted based on a specific profile attribute, and state there which attribute that was. Users should also be told, after clicking on the ad, but before being directed to the site, that the advertiser may be able to learn this sensitive information about them, simply by visiting the site.

I suspect that neither option is going to be something that Facebook is going to want to embrace.

Sunday, October 17, 2010

It is time for the web browser vendors to embrace privacy by default

Three times over the past six months, web browsers' referrer headers have played a central role in major privacy incidents. Much of the attention has reasonably been focused on the websites that were leaking their users' private data (in some cases unintentionally, but at least in Google's case, intentionally). It may be time to focus a bit of attention on the role that the web browser vendors play, and on the pathetic tools they offer consumers to control this form of information leakage.

The root of the current focus by privacy advocates on the browser referrer header stems from a paper (pdf download) written by two researchers last year, who found that Facebook, MySpace and several other online social networks were leaking the unique IDs of their users to behavioral advertising networks. Furthermore, according to a class action lawsuit filed last week, Facebook actually began to leak even more information to advertisers, including users' names, starting in February of this year. It wasn't until the Wall Street Journal called MySpace and Facebook for comment in May that the two companies quickly rolled out fixes (behold, the power of the media).

One month ago, I filed a complaint with the FTC, arguing that Google intentionally leaks its users' search queries to third parties via browser referrer headers. Unlike the Facebook leakage episode, in which it is generally acknowledged that Facebook didn't know about the leakage, Google has repeatedly gone out of its way to make sure this leakage continues, and has publicly confirmed that it is a feature, not a bug.

Now today, the Wall Street Journal has another blockbuster article on referrer leakage. This time, it is Facebook apps that are leaking Facebook user IDs to third parties, including advertising networks and data aggregators like Rapleaf.

It is certainly reasonable to point the finger at companies like Zynga, whose Farmville game has been confirmed by experts to be leaking users' Facebook IDs. However, as the Electronic Frontier Foundation's Peter Eckersley told the WSJ today, "The thing that is perhaps surprising is how much of a privacy problem referers have turned out to be."

These referrer leakage problems are not going to go away, and depending on hundreds of thousands of different websites and apps to take proactive steps to protect their users' privacy is doomed to failure. As such, we need to look to the web browser vendors to fix this problem, since, after all, it is the web browser that sends the referrer header in the first place.

Referrer headers and the browser vendors

The original HTTP specification, dating from 1996, which defined the core protocol used by web browsers, noted that the referrer header had significant potential for privacy problems:
Because the source of a link may be private information or may reveal an otherwise private information source, it is strongly recommended that the user be able to select whether or not the Referer field is sent. For example, a browser client could have a toggle switch for browsing openly/anonymously, which would respectively enable/disable the sending of Referer and From information.

Fast forward 14 years, and only two web browsers, Firefox and Chrome, offer a feature to disable the transmission of the referrer header. Internet Explorer and Safari, which are used by 65% of Internet users, include no built-in functionality to scrub or otherwise protect this information.

While Firefox and Chrome do include features to disable the referrer header, these features are not enabled by default, and enabling them requires technical knowledge that is beyond the vast majority of users.

For example, Firefox users must first type "about:config" into the location bar, navigate past a very scary warning, and then change the obscure network.http.sendRefererHeader preference from its default value of 2 to 0.





Likewise, Chrome requires that users start the browser from the command line with an undocumented parameter (--no-referrers).
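For the technically inclined, here is a rough sketch of both changes expressed as a single script. The Firefox profile path is a placeholder you would need to adjust for your own system; the preference name and the Chrome flag are the ones described above.

import subprocess
from pathlib import Path

# Placeholder Firefox profile directory -- adjust to your own profile.
profile = Path.home() / ".mozilla" / "firefox" / "xxxxxxxx.default"

# Setting network.http.sendRefererHeader to 0 tells Firefox never to
# send the Referer header.
with open(profile / "user.js", "a") as f:
    f.write('user_pref("network.http.sendRefererHeader", 0);\n')

# Chrome must be launched with the undocumented flag every time.
subprocess.call(["google-chrome", "--no-referrers"])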



It is time to embrace privacy by default

Earlier this summer, the European Article 29 Working Party released an extensive report on privacy and behavioral advertising. The report (pdf) called on web browser vendors to play a more important role in protecting users, and to embrace privacy by default. While the Working Party was primarily describing cookie controls, the same message applies to referrer headers:
"Given the importance that browser settings play in ensuring that data subjects effectively give their consent to the storage of cookies and the processing of their information, it seems of paramount importance for browsers to be provided with default privacy-protective settings. In other words, to be provided with the setting of 'non-acceptance and non-transmission of third party cookies'. To complement this and to make it more effective, the browsers should require users to go through a privacy wizard when they first install or update the browser and provide for an easy way of exercising choice during use. The Working Party 29 calls upon browser makers to take urgent action and coordinate with ad network providers."

It is time for the browser vendors to listen to this advice. Had IE, Firefox, Chrome and Safari blocked (or at least partially scrubbed) referrer headers by default, the leakage from Facebook that the Wall Street Journal highlighted today would never have occurred.

Thursday, October 07, 2010

My FTC complaint about Google's private search query leakage

Today, the Wall Street Journal published an article about a complaint I submitted to the FTC last month, regarding Google's intentional leakage of individuals' search queries to third party sites.

The complaint is 29 pages long, so I want to try to explain it to those of you who don't have the time or desire to read through the whole thing.

The complaint centers around an obscure feature in web browsers, known as the HTTP referrer header. Danny Sullivan, a widely respected search engine industry analyst, has written that the HTTP referrer header is "effectively the Caller ID of the internet. It allows web site owners and marketers to know where visitors came from." However, while practically everyone with a telephone knows about the existence of caller ID, as Danny also notes, the existence of the referrer header is "little known to most web surfers."

This header reveals to the websites you visit the URL of the page you were viewing before you visited that site. When you visit a site after clicking on a link in a search engine results page, that site learns the terms you searched for (because Google and the other search engines include your search terms in the URL).
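To see how little work the receiving site has to do, here is a small sketch of the parsing involved. The referrer value below is a made-up example of the kind of URL a Google results page produces; the site operator simply pulls the q parameter out of it.

from urllib.parse import urlparse, parse_qs

# A hypothetical Referer header value, of the sort a browser sends after
# the user clicks a result on a Google search results page.
referer = "http://www.google.com/search?q=hiv+support+groups+washington+dc"

params = parse_qs(urlparse(referer).query)
print(params.get("q", [""])[0])  # -> "hiv support groups washington dc"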



Google does not dispute that it is leaking users' search queries to third parties. A Google spokesperson told the Wall Street Journal today that its passing of search-query data to third parties "is a standard practice across all search engines" and that "webmasters use this to see what searches bring visitors to their websites."

Thus, we move on to the main point of my complaint, which is that the company does not disclose this "standard practice" to its customers, and, in fact, promises its customers that it will not share their search data with others.

For example, of the 49 videos in Google's YouTube privacy channel, not one single video describes referrer headers or provides users with tips on how to protect themselves from such disclosure. On the other hand, the first video that plays when you visit the privacy channel tells the visitor that "at Google, we make privacy a priority in everything we do." Indeed.

Deceptive statements in Google's privacy policy

Google's current privacy policy states that:

Google only shares personal information with other companies or individuals outside of Google in the following limited circumstances:

* We have your consent. We require opt-in consent for the sharing of any sensitive personal information.

* We provide such information to our subsidiaries, affiliated companies or other trusted businesses or persons for the purpose of processing personal information on our behalf . . .

* We have a good faith belief that access, use, preservation or disclosure of such information is reasonably necessary to (a) satisfy any applicable law, regulation, legal process or enforceable governmental request, (b) enforce applicable Terms of Service, including investigation of potential violations thereof, (c) detect, prevent, or otherwise address fraud, security or technical issues, or (d) protect against harm to the rights, property or safety of Google, its users or the public as required or permitted by law.

The widespread leakage of search queries doesn't appear to fall into these three "limited circumstances." Perhaps Google doesn't consider search query data to be "personal information"? However, at least four years ago, it did. When fighting a much-publicized request from the Department of Justice for its customers' search queries, the company argued that:
"[S]earch query content can disclose identities and personally identifiable information such as user‐initiated searches for their own social security or credit card numbers, or their mistakenly pasted but revealing text."

Until October 3, the company's privacy policy also included the following statement:
We may share with third parties certain pieces of aggregated, non-personal information, such as the number of users who searched for a particular term, for example, or how many users clicked on a particular advertisement. Such information does not identify you individually.

I don't think that it is possible to reasonably claim that millions of individual search queries associated to particular IP addresses are "aggregated, non-personal information".

Google's customers expect their search queries to stay private

In its brief opposing DOJ's request, Google also argued that it has an obligation to protect the privacy of its customers' search queries:
Google users trust that when they enter a search query into a Google search box, not only will they receive back the most relevant results, but that Google will keep private whatever information users communicate absent a compelling reason . . .

The privacy and anonymity of the service are major factors in the attraction of users – that is, users trust Google to do right by their personal information and to provide them with the best search results. If users believe that the text of their search queries into Google's search engine may become public knowledge, it only logically follows that they will be less likely to use the service.

Matt Cutts, a Google senior engineer, argued similarly in an affidavit filed with the court:
"Google does not publicly disclose the searches (sic) queries entered into its search engine. If users believe that the text of their search queries could become public knowledge, they may be less likely to use the search engine for fear of disclosure of their sensitive or private searches for information or websites."

Google already protects some of its users' search queries

Since May of this year, Google has offered an encrypted search service, available at encrypted.google.com (in fact, it is the only search engine that currently offers such a service). In addition to protecting users from network snooping, the service also automatically protects users' query data from leaking via referrer headers, because browsers do not send a referrer when a user clicks from an encrypted (HTTPS) page to an ordinary, unencrypted site.

However, Google has done a poor job of advertising the existence of its encrypted search website, and an even worse job of letting users know about the existence of search query referrer leakage. If users don't know that their queries are being shared with third parties, why would they bother to use the encrypted search service in the first place?

The remedy I seek

If Google wants to share its users' search query data with third parties, there is nothing I can do to stop it. That practice, alone, isn't currently illegal. However, the company should not be permitted to lie about its practices. If it wants to share its customers' search queries with third parties, it should disclose that it is doing so. Even more so, it shouldn't be able to loudly and falsely proclaim that it is protecting its users' search data.

However, since the company has for years bragged about the extent to which it protects its customers' data, I think that it should be forced to stand by its marketing claims. Thus, I have petitioned the FTC to compel the company to begin scrubbing this data, and to take appropriate steps to inform its existing customers about the fact that it has intentionally shared their historical search data with third parties. This, I think, is the right thing to do.

Tuesday, October 05, 2010

US Marshals Service's Electronic Surveillance Manual

Last week, the FOIA fairy delivered 25 pages of internal rules that outline when and how the US Marshals Service uses electronic surveillance methods. According to the cover letters accompanying the documents, the policies are "obsolete" and "the office is preparing to rewrite/revise it, which could take 30 days or longer to complete."

The full document can be downloaded here (pdf).

The most interesting things that jumped out at me:

1. One of the most heavily redacted sections relates to the use of Triggerfish, or cell site analyzers, which allow the government to locate phones without the assistance of the phone company.




2. The special rules that USMS investigators must follow before wiretapping VIPs such as Members of Congress, Governors and Judges:




3. The revelation that USMS advises investigators to always seek "hybrid" 2703(d) + pen register orders, rather than plain pen register orders when they are investigating a suspect.

