slight paranoia: google


Friday, April 22, 2011

How can US law enforcement agencies access location data stored by Google and Apple?

Note: I am not a lawyer. US privacy law is exceedingly complex. If I am wrong, I hope that someone who knows this better will chime in.

Over the past day, the iPhone location scandal has expanded beyond location data retained on the phone to data sent by iPhones and Android devices back to Apple and Google. This raises some really interesting issues, particularly regarding the degree to which these companies can be compelled to disclose that data to law enforcement agencies. In this blog post, I am going to try to examine the limited legal protections afforded to this data.

Introduction

Today, the Wall Street Journal reported that Apple's iPhones and iPads and Google's Android mobile phones all collect and transmit back to the companies data about a device's nearby WiFi access points, geo-location data, and in Google's case, a unique identifier.

According to the Journal, Android phones collect the data every few seconds and transmit it to the company at least several times an hour. Apple, meanwhile, "intermittently" collects data and transmits that data to itself every 12 hours.

The motivation for this data collection appears to be to create a large database of WiFi access points and their associated locations, which mobile devices can then use to determine the user's approximate location (doing so via WiFi uses far less battery power than using the GPS chip).
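To make the idea concrete, here is a minimal sketch of how a device could turn such a database into a rough position fix. Everything in it is an assumption for illustration: the `AccessPointDB` type, the `estimate_location` function, the sample addresses and coordinates, and the simple averaging approach are all hypothetical, and nothing here describes how Apple or Google actually do it.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical database: access point MAC address (BSSID) -> (latitude, longitude).
AccessPointDB = Dict[str, Tuple[float, float]]


def estimate_location(
    visible_bssids: Iterable[str], db: AccessPointDB
) -> Optional[Tuple[float, float]]:
    """Average the known coordinates of the access points currently in range."""
    known = [db[b] for b in visible_bssids if b in db]
    if not known:
        return None  # no known access point in range; a device might fall back to GPS
    lat = sum(p[0] for p in known) / len(known)
    lon = sum(p[1] for p in known) / len(known)
    return (lat, lon)


# Made-up sample data, for illustration only.
sample_db: AccessPointDB = {
    "00:11:22:33:44:55": (40.0, -75.0),
    "66:77:88:99:aa:bb": (42.0, -73.0),
}

print(estimate_location(["00:11:22:33:44:55", "66:77:88:99:aa:bb"], sample_db))
```

The point of the sketch is that the lookup only works if someone has already collected the access point locations, which is presumably why the phones report them back.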

While such collection is likely entirely commercial in nature, this also raises serious privacy concerns regarding the ease with which law enforcement agencies can access this sensitive data.

A quick primer in location privacy law

The primary law in the US that governs the privacy of information kept by Internet and communications companies is the Electronic Communications Privacy Act (ECPA). This law dates back to 1986, long before cloud computing, email inboxes larger than 5 megabytes, or GPS-enabled smartphones. To be quite blunt, the law is hopelessly out of date, and it is for this reason that the House and Senate held multiple hearings over the last two years focused on ECPA reform.

For user data to be protected by ECPA, it needs to fall into one of two categories:

An "electronic communication service" ("ECS") is "any service which provides to users thereof the ability to send or receive wire or electronic communications." Examples of this include telephone and email services.

A "remote computing service" ("RCS") is a "provision to the public of computer storage or processing services by means of an electronic communications system." Roughly speaking, a remote computing service is provided by an off-site computer that stores or processes data for a user. Examples of this likely include data stored in the cloud, such as online backup services.

ECPA provides varying degrees of protections for communications content and non-content data stored by an ECS or RCS (without going too far into the details, communications content generally requires a warrant, and most non-content data can be obtained with a lesser court order). However, if the service is neither an ECS nor an RCS, law enforcement agencies can obtain the information with a mere subpoena, without getting a judge to sign off on the order.
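The simplified rule in the paragraph above can be written down as a tiny decision function. This is only a sketch of my own summary, not of the statute: the `Service` enum and `required_process` function are hypothetical names, and the real law has many more cases and exceptions than the three branches shown.

```python
from enum import Enum


class Service(Enum):
    ECS = "electronic communication service"
    RCS = "remote computing service"
    NEITHER = "neither"


def required_process(service: Service, is_content: bool) -> str:
    """Return the legal process needed, per the simplified summary above."""
    if service is Service.NEITHER:
        # Outside ECPA's categories: no judge needs to sign off.
        return "subpoena"
    if is_content:
        # Communications content generally requires a warrant.
        return "warrant"
    # Most (not all) non-content data needs only a lesser court order.
    return "lesser court order"


print(required_process(Service.ECS, is_content=True))
print(required_process(Service.NEITHER, is_content=False))
```

The branch that matters for the rest of this post is the first one: data that falls outside both categories gets the weakest protection.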

Location data under ECPA

Law enforcement agencies routinely obtain location data from wireless telephone companies. Depending on the kind of data sought (historical or real time, fine-grained or approximate tower data), the kind of court order varies between a probable cause warrant and an order based upon facts showing that the information will be relevant and material to an ongoing investigation.

It is important to note that the wireless carriers are providing their customers with a communications service, and that the location data is usually generated in the process of the user's phone transmitting voice or other data to a tower. While most consumers probably do not realize that the phone companies know where they are whenever they make a call or check their email, consumers are at least knowingly making a call or checking their email. As such, the location data obtained by the government quite clearly falls into the ECS category under ECPA.

Internet companies, location data and ECPA

In 2009, Google launched Latitude, its mobile location check-in competitor to Loopt and Foursquare. Shortly after the launch, the EFF reported that both Loopt and Google had pledged to deliver user location data to law enforcement agencies only in response to a warrant.

As EFF explained at the time:

When it comes to friend-finding services, we think it’s clear that your location information is the content of a private communication between you and your friends, and that it deserves the same legal protections against wiretapping as the content of your phone calls or your emails.

Because the text of ECPA doesn't actually include the word "location", Loopt and Google tried to get the best protections they could for users' check-in data by arguing that it is in fact a communication transmitted through their service to users' friends. That is, these firms argued that check-in location data is a communication carried by an ECS.

(Note to legal experts: I am simplifying this a little bit, since these companies actually insisted on a wiretap order. The companies don't keep any historical location data by default, other than the most recent data-point, so they insisted on an intercept order before they would start retaining future location data).

iPhone/Android location data: ECS, RCS or neither?

Now, with this in mind, let's consider the location data transmitted covertly by iPhones and Android devices. Given that the existence of this information collection and transmission wasn't widely disclosed to users (other than in privacy policies that no one reads), that it didn't hit the press until this week, and that users are not knowingly transmitting the information to their friends or anyone else, I think it is going to be pretty tough for these two firms to be able to claim that this location data falls into the ECS protections of ECPA. This location data is simply not a communication by the user.

Similarly, I don't think that these companies can reasonably claim that this location data falls into the category of an RCS, since it isn't a storage or processing service provided to the user. Quite simply, the companies are collecting this data for their own benefit, not the user's, who probably has no idea that it is being collected and transmitted to a server somewhere.

What this means, I think, is that this location data likely does not fall under the protections of ECPA, which means that law enforcement agencies can likely obtain it with just a subpoena.

Now, it is quite possible that if and when these firms receive a request for this data, they could refuse to comply with the subpoena, and argue that it should be subject to the protections of the 4th Amendment. Certainly, some judges around the country have decided that mobile phone location data is sensitive enough to require a probable cause warrant issued by a judge. However, many other judges do not agree with that theory. Without the protections of ECPA, if the courts do not think this data deserves 4th Amendment protections, there is nothing to stop law enforcement agencies from getting it with a subpoena.

Conclusion

What should be clear after reading this post is that privacy law in this country is hopelessly out of date. The collection of location information by Apple and Google raises some really troubling questions regarding the degree to which existing law restricts law enforcement access to the data when it is not associated with a communication by the user, but rather, is collected without their knowledge or consent.

As I noted at the beginning of this post, I am not a legal expert (but a computer scientist by training). There are several fantastic privacy law experts out there, and I really hope that they look into this issue, and write their own, far more extensive analysis.

Wednesday, January 19, 2011

Google: Iranian Internet users deserve communications security -- Americans, not so much

From The Guardian today:

Google Earth, Picasa and Chrome will be available for download in Iran for the first time from today after the technology firm was granted a communications trade licence by the US government.

...

[Scott Rubin, Google's head of public policy and communications for Europe, Middle East and Africa] said Google had decided not to make downloads of Google Talk available in Iran because it may have security implications if dissidents used it to communicate. "We're not confident with the security we could provide to keep those conversations private," he said. "Any government that wants to might be able to get into those conversations, and we wouldn't want to provide a tool with the illusion of privacy if it wasn't completely secure."

I am actually quite pleased to see Google acknowledging (1) that it is often very dangerous to offer insecure tools that users might mistakenly believe are in fact secure, and (2) that government agencies can easily monitor the communications of users using insecure tools.

The problem, of course, is that Google Talk is widely used by Google's millions of customers in the United States, Europe, Asia and the Middle East, all of whom are at risk of government surveillance.

Here in the United States, the Federal government for years abused its surveillance powers to spy on the phone calls and Internet communications of US citizens without ever seeking a court order. The FBI has abused its National Security Letter powers that were expanded under the USA Patriot Act, and for years, the agency even embedded phone company employees at its offices, who repeatedly disclosed user data in response to requests submitted on Post-it notes.

All this raises the question: Why is Google more concerned about the privacy of Iranian users than those millions of Google users in the United States?

Google is a US company, is subject to US law, and must disclose communications to the government when law enforcement and intelligence agencies follow the appropriate legal process. As such, no one expects Google to refuse to comply with the law (especially since, as Eric Schmidt has acknowledged, the government has guns, and Google doesn't).

What would be nice, though, would be if Google were equally committed to not giving its US customers the illusion of security and privacy when, as the firm has acknowledged here, its Google Talk product is simply not capable of delivering anything approaching reasonable security.

Monday, November 22, 2010

DOJ has granted itself new surveillance powers

Update @ 8PM 11/22/2010: EFF first sounded the alarm about roving 2703(d) orders back in 2005, which were being used to obtain phone information.

Electronic communications privacy law in the United States is hopelessly out of date. As several privacy groups have noted, the statute that governs when and how law enforcement agencies can obtain individuals' private files and electronic documents hasn't really been updated since it was first written in 1986.

Over the past year, privacy groups, academics and many companies have gotten together to push for reform of the Electronic Communications Privacy Act (ECPA). These stakeholders have lobbied for reform of this law, and in turn, both the House and Senate have held hearings on various issues, ranging from cloud computing to cellular location data.

Of course, complaints about the existing statute are not limited to those wishing to protect user privacy -- law enforcement agencies would very much like to expand their authority. However, as I document in this blog post, rather than going to Congress to ask for new surveillance powers, the Department of Justice, and in particular, the US Marshals Service, have simply created for themselves a new "roving" order for stored communications records.

Let that sink in for a second. Rather than wait for Congress to give it new authority, the Department of Justice has instead just given itself broad new surveillance powers.

Roving Wiretaps

For nearly 15 years, law enforcement agencies have had "roving wiretap" authority, meaning that they can get a court order that does not name a specific telephone line or e-mail account but allows them to wiretap any phone line, cell phone, or Internet connection that a suspect uses. In order to use this expanded authority, prosecutors have to show probable cause to believe that the individual under investigation is avoiding intercepts at a particular place.

Although there are more than 2000 wiretap orders issued each year, as the table below reveals, federal and local law enforcement agencies rarely seek to use this roving authority.

Roving Pen Registers and Trap & Trace orders

While wiretap orders are used for the real-time interception of communications content, pen register and trap & trace orders are used to intercept, in real-time, non-content information associated with communications. This includes the numbers dialed, to/from addresses associated with emails, etc.

Traditionally, like wiretap orders, pen register/trap & trace orders had to name the recipient (phone company or ISP) in the order. If the government wished to go to a different ISP, they'd need to return to the judge to get another order. However, the USA PATRIOT Act expanded the scope of pen register and trap & trace orders, essentially turning them into roving orders by default:

The [pen register] order . . . shall apply to any person or entity providing wire or electronic communication service in the United States whose assistance may facilitate the execution of the order.

Whenever such an order is served on any person or entity not specifically named in the order, upon request of such person or entity, the attorney for the Government or law enforcement or investigative officer that is serving the order shall provide written or electronic certification that the order applies to the person or entity being served.

Thus, post-PATRIOT Act, by using wiretap or pen register authority, law enforcement agencies can use a single court order to obtain real-time non-content data from any 3rd party that may have it, even if the service provider was not named in the original court order.

Stored communications and customer records

The vast majority of surveillance requests are not for real-time data, but for historical information. That is, rather than seeking to intercept emails or web browsing activities as they are transmitted, law enforcement agencies often seek information after the fact. This is both easier, and often much cheaper.

For example, existing surveillance reports reveal that 1773 wiretap orders were issued in 2005, 625 of which were for federal agencies. Similarly in 2005, a total of 6790 pen registers and 4393 trap & trace orders were obtained by law enforcement agencies within the Department of Justice (the FBI, DEA, ATF and the Marshals).

In that same year, Verizon received 36,000 requests for customer information from federal law enforcement agencies and 54,000 requests from state and local law enforcement agencies.

That is, Verizon's requests alone are nearly seven times the number of publicly reported wiretaps and pen registers combined. This doesn't mean that the wiretap numbers are incorrect -- merely that the vast majority of requests that Verizon received were for stored records, such as historical information on the phone numbers its customers dialed, old text messages, and stored emails. It is quite reasonable to assume that other major telecommunications carriers receive a similar number of requests.
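The comparison can be checked with a few lines of arithmetic. The sketch below assumes that the intended comparison is Verizon's combined federal, state and local requests against the sum of the 2005 wiretap, pen register and trap & trace figures quoted above; the variable names are mine.

```python
# 2005 figures quoted in the preceding paragraphs.
wiretaps = 1773
pen_registers = 6790
trap_and_trace = 4393
verizon_federal = 36_000
verizon_state_local = 54_000

# Assumption: all three kinds of reported real-time orders are added together.
reported_orders = wiretaps + pen_registers + trap_and_trace
verizon_requests = verizon_federal + verizon_state_local
ratio = verizon_requests / reported_orders

print(f"{reported_orders} reported orders vs. {verizon_requests} Verizon requests")
print(f"ratio: {ratio:.2f}x")
```

Under that assumption, a single carrier's requests come to just under seven times the publicly reported real-time orders.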

2703(d) orders

Federal law requires that law enforcement agencies first obtain a special court order (known as a 2703(d) order) before they can compel third party service providers to deliver many types of stored user non-content data. Such court orders must name the service provider that has the data, and unlike in the case of wiretaps and pen registers, Congress has not granted roving authority to law enforcement agencies. This means that law enforcement agencies are supposed to obtain a 2703(d) order naming each ISP or phone company that has data that the government would like to get.

Roving 2703(d) orders

Updated at 8PM on 11/22/2010 to give credit to EFF for first discovering roving d orders

In 2005, the Electronic Frontier Foundation filed a brief in federal court, objecting to a request by the Department of Justice for an order requiring "relevant service providers… to provide subscriber information about [all] numbers obtained from the use of… pen/trap devices" upon oral or written demand by relevant law enforcement officials.

Section 2703 of 18 USC provides that:

"a governmental entity may require a provider of electronic communications service…to disclose a record or other information pertaining to a subscriber or customer of such service…only when the government… obtains a court order for such disclosure under subsection (d) of this section."

As the EFF told the court:

"This language [in 2703] clearly contemplates orders that require disclosure of particular records regarding particular customers of particular providers, not general orders that the government can use on its own discretion to continuously demand unspecified records about unspecified people from unspecified providers, for the entire duration of a related pen-trap surveillance.

. . .

The Stored Communications Act simply does not authorize open-ended or "roving" orders that are enforced based on the government’s oral or written representations of its pen-trap results. Indeed, such orders would leave the government in a dangerously unchecked position to obtain subscriber information for any telephone number without court oversight or approval."

The EFF's 2005 brief objected to the government's attempts to get roving 2703(d) orders for subscriber records from phone companies. It seems that the government has since expanded its use of these roving 2703(d) orders to email providers.

I recently obtained a copy of the US Marshals Electronic Surveillance Manual through a Freedom of Information Act (FOIA) request. As I highlighted in a previous blog post, that handbook reveals that the US Marshals have adopted a policy of always obtaining a 2703(d) order whenever they seek a pen register.

The surveillance manual lists several advantages to obtaining such "hybrid" 2703(d)/pen register orders, such as the ability to get geo-location data from providers, who are prohibited by law from revealing "any information that may disclose the physical location of the subscriber" in response to a pen register order. It is not until a few paragraphs later that another advantage of the hybrid order (and its limitations) is hinted at.

What is happening here is a bit complex. In essence, federal surveillance law does not permit roving 2703(d) orders, but it does permit roving pen register authority. Therefore, DOJ believes that when it staples together a pen register order and a 2703(d) order, the roving aspect of the pen register order automatically transfers to the 2703(d) order.

Thus, DOJ believes that law enforcement agencies can send a copy of a hybrid 2703(d)/pen register order to ISPs not named in the order, and force them to disclose stored subscriber records and communications non-content data, such as email headers.

DOJ's reason for doing this, at least according to the Marshals' surveillance manual, is "because we say so":

Although compelling compliance with a Pen/Trap order that also required disclosure of stored records (e.g. subscriber) is unclear under this section, investigators should assert that compliance with the entire order is mandatory irrespective of whether a provider is specifically named in the order.

Again -- even though the law does not grant the government this expanded authority, DOJ urges investigators to still assert that companies must comply with the request.

DOJ is using this authority

Nearly a year ago, I obtained an invoice from Google to the US Marshals Service related to a pen register order from December 2007.

The invoice states that:

"We understand that you have requested customer information regarding the user account specified in the Pen Register/Trap Trace, which includes the following information: (1) Subscriber information for the gmail account [redacted]@gmail.com; (2) Information regarding session timestamps and originating IP addresses for recent logins by this account; and a CD containing (3) Header information for the specified date range."

The phrasing of this text reveals that the Marshals first delivered the pen register order to a different ISP, and that the gmail.com account appeared in the data delivered by that other service provider in response to the pen register request. As such, neither Google nor the particular gmail.com address were named in the original pen register order issued by the judge.

Google likely received a hybrid 2703/pen register order from the US Marshals Service, and, even though the company was not named in the original order, it provided historical, stored non-content data and subscriber information to law enforcement officials. The company could very easily have told the Marshals to get lost, and come back with a 2703(d) order signed by a judge, naming Google.

I'm not sure what is more alarming: that the US government abuses its already broad surveillance powers, or that Google, a company that pledges to "be a responsible steward of the information we hold", is not in fact insisting that law enforcement agencies follow the letter of the law.

Monday, October 25, 2010

Eric Schmidt blames the EU for Google's data retention policies

Google CEO Eric Schmidt was interviewed by CNN this past week.

The most interesting bit of the interview is at the beginning:

Schmidt: We keep the searches that you do for roughly a year/year and a half, and then we forget them.

Question: You say that, but can somebody come to you and say that we need information about Kathleen Parker.

Schmidt: Under a federal court order, properly delivered to us, we might be forced to do that, but otherwise no.

Question: Does that happen very often?

Schmidt: Very rarely, and if it's not formally delivered, then we'll fight it.

Question: You say you keep stuff for a year/year and a half. Who decides?

Schmidt: Well, in fact, the European Government passed a set of laws that require us to keep it for a certain amount, and the reason is that the public safety sometimes wants to be able to look at that information.

Somehow in just a few sentences, Schmidt manages to misrepresent the facts several times (the question of whether Schmidt is merely misinformed or actively lying is left as an exercise for the reader).

First, on the subject of retention, it is completely false to say that after a year/year and a half, Google "forgets" searches.

Google's actual data retention policy is that after 9 months, the company deletes the last octet of users' IP addresses from its search logs, and then modifies the cookie in the logs with a one-way cryptographic hash after 18 months.

The company never deletes or "forgets" users' searches. It merely deletes a little bit of data that associates the searches with known Google users.
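To see how little is actually removed, here is a rough sketch of the two steps described above applied to a single log entry. The log format, the function names, the sample values and the choice of SHA-256 are all my own assumptions for illustration; I have no knowledge of how Google's logging systems actually implement this.

```python
import hashlib
import ipaddress


def drop_last_octet(ip: str) -> str:
    """Zero the final octet of an IPv4 address (a.b.c.d -> a.b.c.0)."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


def hash_cookie(cookie_id: str) -> str:
    """Replace a cookie ID with a one-way hash (SHA-256 picked for illustration)."""
    return hashlib.sha256(cookie_id.encode("utf-8")).hexdigest()


def age_log_entry(entry: dict, age_months: int) -> dict:
    """Apply the 9-month and 18-month steps to a copy of a hypothetical log entry."""
    aged = dict(entry)
    if age_months >= 9:
        aged["ip"] = drop_last_octet(entry["ip"])
    if age_months >= 18:
        aged["cookie"] = hash_cookie(entry["cookie"])
    return aged


# Made-up entry using a documentation-range IP address.
entry = {"ip": "192.0.2.123", "cookie": "abc123", "query": "example search"}
old = age_log_entry(entry, age_months=18)
print(old["ip"], old["query"])
```

In this sketch the query text survives both steps untouched, and because the same cookie always produces the same hash, entries that shared a cookie can still be grouped together afterwards.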

For those of you who may be inclined to give Schmidt the benefit of the doubt regarding the difference between "forgetting" searches and deleting the last eight bits of an IP address in a log, remember that Schmidt has a PhD in computer science.

Google searches and the EU data retention directive

In May of 2007, Google's Global Privacy Counsel claimed that the European Data Retention Directive might apply to search engines.

Google may be subject to the EU Data Retention Directive, which was passed last year, in the wake of the Madrid and London terrorist bombings, to help law enforcement in the investigation and prosecution of "serious crime". The Directive requires all EU Member States to pass data retention laws by 2009 with retention for periods between 6 and 24 months. Since these laws do not yet exist, and are only now being proposed and debated, it is too early to know the final retention time periods, the jurisdictional impact, and the scope of applicability. It's therefore too early to state whether such laws would apply to particular Google services, and if so, which ones. In the U.S., the Department of Justice and others have similarly called for 24-month data retention laws.

One week later, the European Article 29 working party wrote a letter to Google, informing the company that:

As you are aware, server logs are information that can be linked to an identified or identifiable natural person and can, therefore, be considered personal data in the meaning of Data Protection Directive 95/46/EC. For that reason their collection and storage must respect data protection rules.

A month later, Google's Global Privacy Counsel replied to the Working Party:

Because Google may be subject to the requirements of the [Retention] Directive in some Member States, under the principle of legality, we have no choice but to be prepared to retain log server data for up to 24 months

Soon after, the European Commission's Data Protection Unit issued a statement to the media, stating that:

The Data Retention Directive applies only to providers of publicly available electronic communications services or of public communication networks and not to search engine systems . . . Accordingly, Google is not subject to this Directive as far as it concerns the search engine part of its applications and has no obligations thereof

Speaking of Google's claims, Ryan Singel of Wired News wrote that:

"It's a convincing argument, but it’s a misleading one. . . [Google's Global Privacy Counsel] Fleischer has been making this argument for months now, and even Threat Level bought it the first go-round. But let’s reiterate: There is no United States or E.U. law that requires Google to keep detailed logs of what individuals search for and click on at Google’s search engine. It’s simply dishonest to continually imply otherwise in order to hide the real political and monetary reasons that Google chooses to hang onto this data."

Professor Michael Zimmer, an expert on search engine privacy issues, similarly debunked Google's false claims.

Finally, in 2008, the Article 29 Working Party issued an opinion on data retention issues related to search engines, which noted that:

Consequently, any reference to the Data Retention Directive in connection with the storage of server logs generated through the offering of a search engine service is not justified . . . the Working Party does not see a basis for a retention period beyond 6 months. However, the retention of personal data and the corresponding retention period must always be justified (with concrete and relevant arguments) and reduced to a minimum.

As this lengthy summary should have made clear, Eric Schmidt's statements that the company has to retain search data because of EU law are simply bogus.

Wednesday, August 12, 2009

Google's commitment to transparency

From Google's Privacy Page:

"At Google, we’re committed to transparency and choice."

From a February 2009 post to the Official Google Blog by Jonathan Rosenberg, Senior Vice President of Product Management:

"Everyone should be able to defend arguments with data ... Information transparency helps people decide who is right and who is wrong and to determine who is telling the truth ... This is why President Obama's promise to "do our business in the light of day" is important, because transparency empowers the populace and demands accountability as its immediate offspring."

From the February 2009 contract signed between Google and the US General Services Administration, enabling government agencies to use YouTube videos on their web sites:

Confidentiality

The parties shall not disclose to any third parties Confidential Information disclosed by one party to the other under this Agreement. Each party shall protect Confidential Information by applying the same degree of care used by the parties to protect their own confidential information. If any Confidential Information is required to be produced by law, the noticed party will promptly notify the other party, and to the extent allowed by law, cooperate to obtain an appropriate protective order prior to disclosing any confidential information. Both parties agree that, notwithstanding any other provision of this Agreement, Provider may be bound by the Freedom of Information Act, as well as other federal laws and regulations that may require disclosure of information, including disclosure of the fact that an agreement is in place between the parties. Provider agrees that any disclosure of information pursuant to the Freedom of Information Act or other law, regulation or compulsory process requiring disclosure will not, to the extent lawfully permitted, include any Confidential Information. Any required disclosure by Provider of documents that may contain Google Confidential Information will be preceded by notice to Google in accordance with applicable law, regulation and policy including 5 USC 552 and applicable agency rules.

....

Provider acknowledges that, except as expressly set forth in this Agreement, Google uses persistent cookies in connection with the YouTube Video Player. To the extent any rules or guidelines exist prohibiting the use of persistent cookies in connection with Provider Content applies to Google, Provider expressly waives those rules or guidelines as they may apply to Google.

Friday, July 17, 2009

More Mistruths from Google on Privacy

When it comes to discussing the details of the company's privacy policies, Google is rarely forthcoming. Company statements, while technically truthful, are usually very deceptive to all but the expert reader. This allows Google to say one thing, while meaning another.

A fantastic example of this can be seen in statements made during a recent newspaper interview by Marissa Mayer, Google's vice president of search products and user experience:

"When you look at, for instance, search history, which is what personalised search is based on, you can actually see all of the information that Google has about you and you can understand how it's being deployed and you also can decide to opt out of the service entirely, or you can even delete various parts of the data that you don't like or you'd rather we didn't have. So there's a lot of transparency and control available to the user there, and we want to operate with a lot of transparency, because we want our users to be informed about what's going on."

The casual reader might see Mayer's comments, and wrongly believe that they can log in to the Web History page on Google's site, delete the information on their previous searches, causing the information to be deleted from Google's various log files, and thus protect their data from a subpoena submitted by a government investigator, the entertainment industry or a divorce lawyer. Anyone believing this is, unfortunately, dead wrong.

Consider this snippet from the Frequently Asked Questions page for the Google Web History service:

You can choose to stop storing your web activity in Web History either temporarily or permanently, or remove items, as described in Web History Help. If you remove items, they will be removed from the service and will not be used to improve your search experience. As is common practice in the industry, Google also maintains a separate logs system for auditing purposes and to help us improve the quality of our services for users. For example, we use this information to audit our ads systems, understand which features are most popular to users, improve the quality of our search results, and help us combat vulnerabilities such as denial of service attacks.

As this page makes clear, Google does not promise to delete all copies of your old search records when you delete them using the Web History feature. No, the company will merely no longer show them to you, and will no longer use that information to provide customized search.

I'm sure this was an honest mistake on Mayer's part, right? As the company's vice president of search products and user experience, it's not like she should actually be expected to understand the fine-grained details of the company's policies for search and user privacy.

A pattern of deception

Unfortunately, Mayer's misstatement of the facts is not the first time that Google has given misleading statements to the press about its privacy policies.

Last September, Google announced to the world that:

Today, we're announcing a new logs retention policy: we'll anonymize IP addresses on our server logs after 9 months. We're significantly shortening our previous 18-month retention policy to address regulatory concerns and to take another step to improve privacy for our users.

The usually fantastic Ellen Nakashima at the Washington Post was the first to announce the news via an exclusive interview with Google Privacy Czar Jane Horvath. Unfortunately, Nakashima allowed her article to be used as a tool of the Google politburo.

[Horvath] said Google also would anonymize the IP addresses associated with search queries typed in by users into Google's standard search bar nine months after they have been collected. "This really just illustrates how seriously we do take data anonymization,"

Miguel Helft at the New York Times didn't do much better.

It wasn't until I took the initiative to contact Google's PR team a few days later with a series of in-depth technical questions about the specifics of the policy that the truth emerged.

Writing at CNET, I revealed that:

Google announced on Monday that the company will be reducing the amount of time that it will keep sensitive, identifying log data on its search engine customers. To the naive reader, the announcement seems like a clear win for privacy. However, with a bit of careful analysis, it's possible to see that this is little more than snake oil, designed to look good for the newspapers, without delivering real benefits to end users....

Google has now revealed that it will change "some" of the bits of the IP address after 9 months, but less than the eight bits that it masks after the full 18 months. Thus, instead of Google's customers being able to hide among 254 other Internet users, perhaps they'll be able to hide among 64, or 127 other possible IP addresses .... this is a laughable level of anonymity.

Once I pointed out how useless Google's new privacy policy actually was, the tech press soon jumped onboard. The Register called it "Google Privacy Theatre", while ZDNet called it a "farce." Robert X. Cringely wrote that "the announcement was designed to make headlines and appease regulators while doing nothing to release Google's stranglehold on your data."

Google and the Press

In this instance, Google was technically telling the truth. After all, at 9 months, the company does delete some information from their logs. It just happens that the act of deleting one or two bits of data does almost nothing to protect user privacy, and to describe it as "anonymity" is arguably false and deceptive advertising.
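To make the arithmetic concrete, here is a minimal sketch of bit masking on an IPv4 address. It is not Google's actual procedure, which was never published in detail; the function names and the example address (from a documentation range) are my own assumptions. The point is only that zeroing n low bits leaves 2^n candidate addresses, so the raw counts are powers of two and differ slightly from the rounded figures quoted above.

```python
import ipaddress


def mask_low_bits(ip: str, bits: int) -> str:
    """Zero the lowest `bits` bits of an IPv4 address (hypothetical helper)."""
    if not 0 <= bits <= 32:
        raise ValueError("bits must be between 0 and 32")
    value = int(ipaddress.IPv4Address(ip))
    masked = (value >> bits) << bits
    return str(ipaddress.IPv4Address(masked))


def candidate_count(bits: int) -> int:
    """How many distinct addresses collapse to the same masked value."""
    return 2 ** bits


if __name__ == "__main__":
    # 192.0.2.173 is an illustrative address, not real log data.
    for bits in (1, 2, 6, 7, 8):
        print(bits, mask_low_bits("192.0.2.173", bits), candidate_count(bits))
```

Masking one or two bits leaves a user hiding among only two or four addresses, which is why I find it hard to call that anonymity.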

Unfortunately, most of the folks in the tech press are simply not up to the task of reading between the lines of Google's privacy doublespeak -- doing so usually requires the rare combination of expertise in the law as well as strong technical skills.

The true meaning of opt-outs

Don't worry though -- all is not lost. When government officials and regulators turn their gaze upon Google, they are often able to cut through the propaganda, and get to the truth. For some reason, Google seems far less able to lie to the Feds.

A fantastic example of this can be seen in the video clip embedded below, which is from the Behavioral Advertising hearing in the House of Representatives one month ago. Rep. Bobby Rush gets execs from both Google and Yahoo to admit that the companies do not allow consumers to opt out of the collection of data, but merely the use of that data. This is something that most firms are loath to admit in public, and instead leave the consumer hopelessly trying to read between the lines of their multi-page privacy policies.

Tuesday, June 16, 2009

An open letter to Google

This six page letter (pdf) to Google's CEO, Eric Schmidt, is signed by 38 researchers and academics in the fields of computer science, information security and privacy law. Together, they ask Google to honor the important privacy promises it has made to its customers and protect users' communications from theft and snooping by enabling industry standard transport encryption technology (HTTPS) for Google Mail, Docs, and Calendar.

Google already uses industry-standard Hypertext Transfer Protocol Secure (HTTPS) encryption technology to protect customers' login information. However, encryption is not enabled by default to protect other information transmitted by users of Google Mail, Docs or Calendar. As a result, Google customers who compose email, documents, spreadsheets, presentations and calendar plans from a public connection (such as open wireless networks in coffee shops, libraries, and schools) face a very real risk of data theft and snooping, even by unsophisticated attackers. Tools to steal information are widely available on the Internet.

Google supports HTTPS encryption for the entire Gmail, Docs or Calendar session. However, this is disabled by default, and the configuration option controlling this security mechanism is not easy to discover. Few users know the risks they face when logging into Google's Web applications from an unsecured network, and Google's existing efforts are little help.

Support for HTTPS is built into every Web browser and is widely used in the finance and health industries to protect consumers' sensitive information. Google even uses HTTPS encryption, enabled by default, to protect customers using Google Voice, Health, AdSense and Adwords. Google should now extend this degree of protection to users of Gmail, Docs and Calendar.

Rather than forcing its customers to "opt-in" to adequate security, Google should make security and privacy the default.

View the full letter at cloudprivacy.net

Friday, March 13, 2009

Freedom from evil cookies

Executive Summary: I've modified Google's new Advertising Cookie Opt Out Firefox Plugin to allow users to opt-out of the tracking by 16 other advertising companies. The software is super alpha right now (the result of a few hours hacking this afternoon), and will hopefully be available on addons.mozilla.org in the next few days. If you're not a developer, please don't download it yet. If you are, you can find it here

A large number of commercial companies now track users' browsing across the web, in order to profile them, and then serve them targeted advertising. This so called behavioral advertising is a threat to the average user's privacy.

An industry group, The Network Advertising Initiative, provides an easy way for users to opt-out of the tracking performed by its member companies. Users can visit a single web page, and then easily set opt-out web cookies for all of the NAI members' advertising networks.

The problem with this is that the moment a user clears his or her cookies, they also lose the opt-out cookies. Regularly clearing browser cookies, or better, setting the browser to erase them all at the end of a session, is a recommended practice. Unfortunately, by doing this, users are then required to re-visit the NAI opt-out page each time they start browsing the web. This is obviously not a reasonable thing to expect.

Google recently announced that it would be engaging in the large scale collection and use of targeted advertising information. However, in addition to offering an opt-out cookie, the company has also developed a Firefox add-on, so that users can maintain the opt-out cookies, even if they regularly erase the other cookies.

Google should be commended for releasing such a useful privacy enhancing technology (even though their use of targeted advertising is creepy, and should be prohibited by the FTC). If only this add-on could be used to protect people from the prying eyes of the other advertising networks.

Since Google released the Firefox-addon as an open-source project (under the Apache 2.0 license), I have forked the code, and added in the opt-out cookies of 16 other advertising networks.

By installing this add-on, you will receive long-term opt-out cookies for the following NAI member advertising networks:

Google / Doubleclick

Collective Media

Acerno

Turn

Next Action

Audience Science

BlueLithium

Advertising.com

[x+1]

Fox Audience Network

AlmondNet

Safecount

Tacoda Audience Networks

Traffic Marketplace

Tribal Fusion

Undertone Networks

The Bad News

All of the above companies use a cookie similar to "OPT_OUT=1". Unfortunately, some other NAI member companies force a unique tracking ID upon users in the process of opting out of the targeted ad tracking. That is, in addition to an "OPT_OUT=1", they'll also force a "USER=12345678" cookie, which could enable them to uniquely track visitors to their site.

For example, when trying to opt out of Yahoo's tracking, I was given the cookie

B=c97l3894rlqpf&b=3&s=cf.

Similarly, Akamai gave me this cookie

AOOC=368398094.

Simply put, we shouldn't have to trust these companies to not track us. Users should not be given unique IDs in order to opt-out.
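The distinction can be sketched in a few lines of Python. This is a rough heuristic of my own, not the logic used by the add-on: the set of "generic" values and the function name are assumptions, and anything that is not an obviously shared value is simply treated as potentially unique. The cookie strings are the examples quoted above.

```python
# Assumption: values like these are identical for every user and so
# cannot single anyone out. The list is illustrative, not exhaustive.
GENERIC_OPT_OUT_VALUES = {"1", "true", "yes", "optout", "opt_out"}


def is_generic_opt_out(cookie: str) -> bool:
    """Return True if a 'NAME=VALUE' cookie carries only a shared opt-out flag."""
    name, sep, value = cookie.strip().partition("=")
    if not sep or not name:
        return False  # not a NAME=VALUE pair at all
    return value.strip().lower() in GENERIC_OPT_OUT_VALUES


if __name__ == "__main__":
    for cookie in ("OPT_OUT=1", "B=c97l3894rlqpf&b=3&s=cf", "AOOC=368398094"):
        verdict = "generic" if is_generic_opt_out(cookie) else "possibly unique"
        print(cookie, "->", verdict)
```

A check like this errs on the side of suspicion: a long opaque value may be harmless, but from the outside there is no way to tell.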

The following companies force unique IDs upon users wishing to opt-out. This add-on does not currently provide opt-out functionality for these networks, since I don't want to encourage their sketchy ways. Hopefully, being listed here might shame them into providing a more pro-privacy way of opting out.

These companies are:

Akamai

Atlas

Blue Kai

BlueLithium

FetchBack

Interclick

MindSet Media

Media 6 Degrees

24/7 Real Media

Specific Media

Yahoo

Disclaimer: This code is based on the Advertising Cookie Opt Out Plugin by Valentin Gheorghita, a Google Engineer. It was not sanctioned by Google or the Network Advertising Initiative. While the folks at the Berkman Center (who pay me) are huge supporters of privacy, I have done this in my personal capacity, and this is not an officially blessed Berkman project.

Thursday, May 10, 2007

Lessons learned from the patent filing process (three times bitten)

As of the end of last summer, I had three patent applications in the works. I searched the European patent database today, and it seems that my very first application is now public, and available online.

The name is typical lawyer-speak: Method or apparatus for managing a server process in a computer system. The paperwork was filled out in the final weeks of my internship with IBM Research (Switzerland) in August of 2004. It was made public by the European patent office at the end of February 2007.

I am quite proud of this idea, although, the circumstances that followed proved to be a fairly negative, yet extremely useful learning experience. The full description can be seen online. Essentially, the idea involves having a number of short-lived virtual machines each running on one physical machine. Incoming connections are routed to a new virtual machine every few minutes, and thus, the existing virtual machines can be brought down and restarted from a 'known' good state. In addition to making it really difficult for a hacker to leave a machine in a permanently hacked state, it also accelerates the rate at which forensic information can be gathered, and potentially, automatically reacted to.
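As a toy illustration of the rotation idea (not the design in the patent application, and not code from the internship), the sketch below simulates it with plain Python objects. The class names, the `rotate_every` and `drain_after` parameters, and the retirement rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class VirtualMachine:
    vm_id: int
    started_at: float
    connections: list = field(default_factory=list)


class RotatingPool:
    """Routes new connections to a fresh VM every `rotate_every` seconds."""

    def __init__(self, rotate_every: float, drain_after: float):
        self.rotate_every = rotate_every
        self.drain_after = drain_after  # grace period before an old VM is reset
        self._ids = count(1)
        self.live = []      # running VMs, oldest first
        self.retired = []   # VMs shut down and restored to the known good state

    def route(self, connection: str, now: float) -> int:
        """Assign a connection to the newest VM, starting a new one if it is too old."""
        if not self.live or now - self.live[-1].started_at >= self.rotate_every:
            self.live.append(VirtualMachine(next(self._ids), now))
        self._retire_old(now)
        self.live[-1].connections.append(connection)
        return self.live[-1].vm_id

    def _retire_old(self, now: float) -> None:
        """Retire VMs that stopped receiving new traffic long enough ago."""
        keep = []
        for vm in self.live[:-1]:
            if now - vm.started_at >= self.rotate_every + self.drain_after:
                # A real system would capture forensic state before the reset.
                self.retired.append(vm)
            else:
                keep.append(vm)
        self.live = keep + self.live[-1:]


if __name__ == "__main__":
    pool = RotatingPool(rotate_every=300, drain_after=60)
    for name, t in (("a", 0), ("b", 100), ("c", 300), ("d", 400)):
        print(name, "-> VM", pool.route(name, t))
    print("retired:", [vm.vm_id for vm in pool.retired])
```

Anything an attacker plants on a VM lasts only until that VM is retired, and each retired VM is a snapshot that can be inspected afterwards.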

However, the purpose of this blog post is not to discuss the technical details of the patent. It is to describe the problems that came after the patent filing. This problem was not unique to IBM, and thus the very same thing happened to me two years later at Google. I would like to say that I've finally learned. It's also quite reasonable to argue that these two experiences have done much to shape my current ill-feeling towards the patent system.

While the details are slightly different, my two internship patent filing experiences at both IBM Research and Google can essentially be generalized to the following:

1. Come up with cool idea while an intern at company x.
2. Tell my boss/the legal department, and begin the process of filing an invention disclosure.
3. Begin writing code that implements the idea.
4. Run out of time on the internship.
5. Am told by bosses/company lawyers: Here is your $1000 (give or take) patent filing bonus. Thank you very much. The idea is now ours, you are forbidden from telling anyone about the idea, you are forbidden from writing any/completing any code that implements the idea, and you may not publish a research paper on the subject.

Now that my patent application from IBM is public, I suppose that I can finally finish implementing it, should I wish to do so, but arguably, the idea is stale now (it's 3 years old), and the field has moved on.

The two ideas that I came up with at Google met the same fate. For one, I had a good chunk of demo code that at least fleshed out the idea. It would have made a perfect research paper, but my several attempts to get permission to work on the idea post-internship have gotten me nowhere.

The moral of the story, I suppose, is to be careful about disclosing inventions.

Interns really don't have any room to negotiate an NDA. Thus, the only real power you have, is over your choice to disclose inventions or not. Sure, you get a cash bonus, but in the grand scheme of things, it's chump change - esp. if it means that you won't be able to publish a research paper on the subject. After all - research papers are the currency of academia. And were it not for my desire to be an academic, I'd be making real money now, instead of being $100 per month shy of food stamp eligibility.

Saturday, February 03, 2007

Avoiding the NSA through gmail

I've been thinking a fair bit about the EFF's lawsuit against AT&T. According to court papers and press reports, AT&T is giving the NSA a direct network tap at multiple locations around the country, giving the US government access to all unencrypted email/IM conversations and web traffic that flow through AT&T's network. It's probably fair to assume that a few other backbone providers are also doing the same thing.

Consider the following situation:

Alice sends an email from her home computer (connected via Verizon DSL Connection) to her friend Bob, who checks his email from his desktop computer at work. Alice uses Hotmail, and Bob uses his company's email servers.

Alice's web connection to hotmail will most likely flow across AT&T's backbone, and if it doesn't, it'll cross one of the other Big Boys, like Level 3. Once Alice has created her email, it'll flow from Microsoft's email servers to Bob's employer's email server - unencrypted, again, probably over one of the major backbones, until it reaches Bob's desk.

There will be at least a couple chances for the NSA to sniff this.

What if Alice sends an email to her pal Charlie, who also uses hotmail?

Well, again, the spooks will have a chance to watch Alice construct the email, and then will be able to see Charlie login to hotmail and read it. Key to note here, is that since the email stays within Hotmail's network, it never has to flow across the Internet to go from Alice to Charlie.

Which brings me to the subject of gmail.

Google is nice enough to allow SSL encrypted sessions. Whereas Yahoo and Hotmail merely allow you to login via SSL (just to stop a passive network sniffer learning your email password), google allows the entire session to remain encrypted. Thus, any interaction between a user at their home computer, and Google's gmail servers remains secret, providing the user changes the url to be https://

Let us now consider a situation where Alice and Charlie each have gmail accounts, and each login via ssl. Alice's connection to google is encrypted, the email flows from one gmail user to another, so it never leaves google's network as it is transmitted from Alice's outbox to Charlie's inbox, and then Charlie's connection to Google is SSL encrypted, so the contents of his email are not revealed to anyone watching his packets cross the backbone.
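The three scenarios can be compared with a small model. This is only a sketch of the reasoning above, under the simplifying assumption that a backbone tap sees a hop if and only if the hop crosses the backbone unencrypted; the `Hop` type and the hop descriptions are mine.

```python
from typing import NamedTuple


class Hop(NamedTuple):
    description: str
    crosses_backbone: bool
    encrypted: bool


def exposed_hops(path):
    """Hops a passive backbone tap could read under this simplified model."""
    return [h.description for h in path if h.crosses_backbone and not h.encrypted]


# Alice (Hotmail) writes to Bob (company mail server).
hotmail_to_company = [
    Hop("Alice's browser to Hotmail", True, False),
    Hop("Hotmail to Bob's employer's mail server", True, False),
]

# Alice and Charlie both use Hotmail without a fully encrypted session.
hotmail_to_hotmail = [
    Hop("Alice's browser to Hotmail", True, False),
    Hop("Delivery inside Hotmail's network", False, False),
    Hop("Charlie's browser to Hotmail", True, False),
]

# Alice and Charlie both use gmail over SSL.
gmail_ssl = [
    Hop("Alice's browser to gmail", True, True),
    Hop("Delivery inside Google's network", False, False),
    Hop("Charlie's browser to gmail", True, True),
]

if __name__ == "__main__":
    for name, path in (
        ("hotmail -> company", hotmail_to_company),
        ("hotmail -> hotmail", hotmail_to_hotmail),
        ("gmail (SSL) -> gmail (SSL)", gmail_ssl),
    ):
        print(name, exposed_hops(path))
```

Only the last path leaves nothing for a backbone tap to read, and only if both users actually turn SSL on.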

Right now, very few of gmail's users are using SSL. It is turned off by default (mainly for performance reasons, I'm guessing. 10 million users all requiring an SSL handshake is expensive in processing power).

As gmail's user base grows, and if their users can be convinced to embrace SSL, the NSA's wholesale data slurping from the backbone will increasingly become less useful.

"If we all use encrypted email (PGP/GPG), we won't have this problem" - this is the very true. However, I cannot convince my less technically savvy friends/relatives to use PGP. It has far too many usability problems - still.

However, most of my friends already use gmail - due to the way accounts were given out in the early days, gmail has a very geeky user base. All I need to do now, is to convince them to use SSL... Which is where the Customize Google firefox extension comes in useful.

Customize Google is mainly used to screen out google's advertising - both in gmail, and in the "ads by goooogle" that you see everywhere on the web. I typically install this on the computers of most of my less tech savvy friends. In addition to blocking out ads, Customize Google also turns on SSL for all gmail/google calendar sessions, without requiring that the user do any fiddling themselves. Problem solved!
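The SSL part of that behaviour boils down to rewriting a URL's scheme before the request goes out. The sketch below shows the idea in Python; it is not Customize Google's code (which is a Firefox extension), and the host list and function name are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: hosts to upgrade, chosen for illustration only.
FORCE_HTTPS_HOSTS = frozenset({"mail.google.com", "calendar.google.com"})


def force_https(url: str, hosts=FORCE_HTTPS_HOSTS) -> str:
    """Rewrite http:// to https:// for listed hosts; leave everything else alone."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in hosts:
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(force_https("http://mail.google.com/mail/"))
    print(force_https("http://example.com/"))
```

Doing the rewrite automatically matters because it removes the one step (changing the url to https://) that most users will never take on their own.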

Small Print:

This only stops the massive sniffing of data currently done by the US government of backbone traffic. This in no way protects you from the feds asking Google for the contents of your email - either by presenting a warrant, or more likely (since it doesn't involve asking a judge), a national security letter. I have good reason to believe that the FBI did this to me - but that's beside the point. This at least requires them to know who you are, and to be interested in you - whereas under the current NSA sniffing scheme, they can watch all email flow by, and analyze it without knowing who they're interested in spying on.