Friday, January 21, 2011

The History of the Do Not Track Header

Last month, both the FTC and Commerce Department published privacy reports that mentioned the possibility of a Do Not Track mechanism. Most people, even those who follow privacy issues, didn't really understand how such a mechanism would work, or where the idea came from. The goal of this lengthy blog post is to try and shed a bit of light on that.

The History of Do Not Track

In 2007, several public interest groups, including the World Privacy Forum, CDT and EFF, asked the FTC to create a Do Not Track list for online advertising. In a very savvy move, these groups named their scheme such that it instantly evoked the massively popular Do Not Call list. That is, even if the average person did not know how the Do Not Track list worked, it would sound like a good idea.

The public interest proposal would have required that online advertisers, not consumers, submit their information to the FTC, which would compile a machine readable list of the domain names used by those companies to place cookies or otherwise track consumers. Browser vendors and 3rd party software makers could then subscribe to this list, and effectively block many forms of tracking. It sounded like a great idea, but, it went nowhere, and as the Google Trends chart below shows, it was largely forgotten by the media until 2010.

What happened to bring Do Not Track back to life? FTC Chairman Jon Leibowitz.

On July 27 2010, the Senate Commerce Committee held a hearing on the topic of online privacy. In his oral testimony at the hearing, Leibowitz stated that the commission was exploring the idea of proposing a "do-not-track" list (he appears to have gone off the official script, as the phrase "do not track" does not appear in his formal written remarks.)

Once the concept (even in the abstract) of Do Not Track has been brought back to life, journalists covering the story assumed that it was the public interest groups' proposal that was now actively being considered by policymakers. However, over the space of a few months, a completely different mechanism, one which relies on web browsers sending a header, seemed to gain momentum.

This seems to have caught many in industry and the press off guard. No one knows where the idea came from, or how it managed to displace the previous public interest groups' effort. The purpose of this blog post is to try and clear that up.

Opt Out Cookies

For more than a decade, the major online advertisers have offered "opt out" mechanisms, through which consumers could signal to the companies that they did not want to receive advertisements targeted based on their online browsing habits. These opt outs worked via cookies (one specific to each ad network), which a consumer could either obtain by visiting each advertising network's website, or (if the company was a member of the Network Advertising Initiative (NAI), from the NAI website.

While certainly a step in the right direction when they were first offered, the opt out cookies have numerous flaws, the most important of which, is that as cookies, they are deleted whenever consumers attempts to protect their privacy and erase other tracking cookies. Quite simply, using the built in browser controls, consumers cannot instruct their browser to "keep the opt out cookies, but delete everything else." Consumers thus have to re-obtain these opt out cookies each time they delete their cookies, or, perhaps more likely the case, privacy conscious consumers gave up on the formal opt outs, and instead relied on frequent cookie deletion as a more reliable means to opt out.

In March 2009, Google released a browser add-on that made Google's own behavioral advertising opt out cookie permanent. Thus, with the add-on installed, users could freely delete their cookies whenever they wanted without accidentally removing Google's opt out cookie. While this was a great move on Google's part, there were more than 100 other advertising networks, and so even if Google's opt out cookie persisted, these other opt out cookies would be erased whenever a consumer took steps to protect their privacy.

My Targeted Advertising Cookie Opt-Out (TACO) add-on

A few days after Google released their opt out tool I bumped into security researcher Dan Kaminsky at a conference. I'm afraid I don't remember the specifics of our conversation anymore, but generally, we spoke about flaws in the opt out system, Google's new tool, and possible technical alternatives to cookie based opt outs, including a browser header.

Soon after (and likely inspired by) my conversation with Dan, I downloaded Google's tool (which the company had released under an open source license) and modified it to include the opt out cookies for several other behavioral advertising networks. I published my TACO add-on and within days hundreds of people downloaded and installed it.

A few days later, Dan emailed me, and urged me to include a browser header in TACO -- not because it would have any immediate impact (since no ad network would look for it), but because it would be a clear expression of user intent:
The reality is you can be tracked no matter what you do or don't set. However -- humor me: Just add an "X-No-Track: user-opt-out=explicit" header to all HTTP requests, and add window.tracking-opt-out=explicit to every DOM.

Oh, and put a comment in the source above it, calling it the Holy Hand Grenade :)

Trust me :)
At the time, I dismissed Dan's suggestion. I wanted to build a tool that would actually improve user privacy, and since cookies were the only way for consumers to opt out, I thought my time was best spent improving that experience. However, on the TACO home page, I noted that a header mechanism would be a far superior replacement for opt out cookies:
The use of individual opt-out cookies for each advertising company is sub-optimial (in fact, the current situation totally sucks). We shouldn't have to identify and seek out each company that might track us in order to opt out. This tool currently supports 90 different advertising networks, some of which require multiple cookies (for different domain names). As a result, this tool installs 90+ opt-out cookies into the browser (they're all generic, and contain no unique, or identifiable information). Since there are still quite a few networks that the tool does not support, it is quite easy to see that the tool could eventually install 100 or more cookies in a user's browser. This solution simply does not scale.

In an ideal universe, we would be able to set a single cookie in the browser stating our preference to be not tracked, without needing to first identify individual advertising networks. Consider, after all, the approach taken with the hugely successful do not call list. You add yourself to a single list, which all telemarketers are then required to honor.

However, for privacy reasons, cookies cannot be accessed by websites hosted in domains different than those that set the original cookie. That is, if sets a cookie in your browser, won't be able to read it if you visit their site. For 99% of cookies (such as the session cookies used to authenticate your login to Facebook), this is a really good idea. However, for a universal opt-out cookie, this presents significant problems.

As a result, cookies are the wrong technology for a universal opt-out mechanism.

One alternative approach would be to permit the browser to send an opt-out HTTP header, which it could then transmit to every web server which the user connected. Such a scenario would require that Microsoft, Mozilla, Apple and Google sit down to design such a technical spec. It would also require that the big advertising networks agree to honor such a HTTP header based method for opt-out.
I spent much of the summer of 2009 immersed in the world of online advertising. This included numerous conference calls with attorneys at advertising networks, and evenings spent on the web, locating new advertising networks with opt out cookie I could clone, and add to TACO. This lead to several updates of my increasingly popular tool, which eventually grew to include more than 100 different opt out cookies.

However, it was never my intention to maintain a browser plugin (even a successful one) -- I am a researcher and an activist, and so my goal in creating TACO was primarily to poke the advertising industry in the eye. As such, within weeks of creating TACO, I reached out to the folks at Mozilla, and begged them to take TACO off my hands by building similar functionality into the Firefox browser.

While several individuals at Mozilla were receptive to the idea of TACO (and had installed it onto their own computers), they weren't so in love with the idea of shipping 100 different opt out cookies with their browser, or having to maintain and update the list for new add networks. Quite simply, TACO was an inelegant kludge, and didn't scale. In March of 2009, Mozilla's VP of Engineering Mike Shaver emailed me to state his own preference for a header:
Could we not just standardize/promote a header like X-Tracking-Opt-Out, and ask the tracking groups to honour it? Simpler to specify, simpler to update (the null case, in fact), forward-effective as new ad networks add support, and separated from the implications and implementation of cookies.

The Do Not Track Header

The header approach suffered from a serious chicken and egg problem. No ad network was willing to look for, or respect the header (primarily because no one was sending the header). Likewise, because no one was looking for the header, the browser vendors weren't ready to add support for it to their products.

In July of 2009, I decided to try and solve this problem. My friend and research collaborator Sid Stamm helped me to put together a prototype Firefox add-on that added two headers to outgoing HTTP requests:
X-Behavioral-Ad-Opt-Out: 1
X-Do-Not-Track: 1

The reason I opted for two headers was that many advertising firms' opt outs only stop their use of behavioral data to customize advertising. That is, even after you opt out, they continue to track you. There are a handful of firms though that do promise to no longer track you when you opt out. One big problem is that it is very difficult for consumers to figure out which company is doing what -- since they all use the term opt out.

I assumed that any header-based system would be voluntary, and so by using two different headers, I would be able to play nicely with whatever a firm was willing to do. That is, if a firm currently agreed to opt consumers out of all tracking, then the firm could look for the Do Not Track header, but if the firm refused to provide a tracking opt out, they could at least agree to respect a behavioral advertising opt out header.

In mid July 2009, the Future of Privacy Forum organized a meeting and conference call in which I pitched the header concept to a bunch of industry players, public interest groups, and other interested parties. I was perhaps slightly over-dramatic when I told them that the "day of reckoning was coming", for opt out cookies, and that it was time to embrace a header based mechanism. I told them that I planned to add the headers (enabled by default) to my TACO add-on in a future release, after which, I would be able to argue that hundreds of thousands of consumers were sending this signal that the advertising firms were ignoring.

In the end, none of the advertising firms showed any interest in the header. A couple months later, I started working at the Federal Trade Commission, and ultimately decided against including the header in TACO, as I thought it might rock the boat at my new job.

In mid 2010, when the FTC Chairman breathed life back into the discussion of Do Not Track, the header I had implemented and lobbied for somehow managed to catch the attention of privacy advocates, public interest groups, regulators and even browser vendors. Ultimately, the Behavioral Advertising Opt Out header seems to have been discarded, and instead, focus has shifted to a single header to communicate a user's preference to not be tracked.

The policy of Do Not Track

The technology behind implementing the Do Not Track header is trivially easy - it took Sid Stamm just a few minutes to whip up the first prototype. The far more complex problem relates to the policy questions of what advertising networks do when they receive the header. This is something that is very much still up in the air (particularly since no ad network has agreed to look for or respect the header).

Over the last few months, a number of privacy experts, including Arvind Narayanan and Jonathan Mayer at Stanford, Lee Tien and Peter Eckersley at the Electronic Frontier Foundation, and Harlan Yu at Princeton University have worked to come up with a solid proposal that will help to shape this more complex part of the debate.

If industry (or the FTC, Commerce and Congress) ultimately settle on the header based approach, there will likely be an intense lobbying effort on industry's part to define what firms must do when they receive the header. Specifically, they will seek to retain as much data as possible, even when they receive the header. As such, the devil will be in the details, and unfortunately, these details will likely be lost on many members of Congress and the press.


Anonymous said...

Great write-up Chris. I would love to see these disparate discussions get pulled into a formal meeting on do-not-track in a way that truly addresses consumer concerns and permits online business to flourish. We can make it a topic of discussion for CFP and PETS this year. Let’s come up with a solution that is easy to deploy and validate.


Anonymous said...

How does the Cookie based opt out works? Could you please share some appropriate link(s) to understand more about it.

Anonymous said...

Please supply a link to the "solid proposal" mentioned in your penultimate paragraph

alanthehat said...

Are there any updates to this?
It has been 4 years now