Out-Law / Your Daily Need-To-Know

Is Google legal?

27 Oct 2006, 1:33 pm

FEATURE: A Belgian court ruled against Google’s use of newspaper stories in early September. If you believe Google, it did nothing wrong and failed to defend itself because it was unaware of the publishers’ lawsuit. If you believe the publishers, Google is lying and infringes copyright on a colossal scale. The parties return to court on 23rd November in a case that finds legal uncertainty looming over the world’s leading search engines.

The case focused on Google’s news aggregation service, which automatically scans the websites of newspapers, extracting headlines and snippets of text from each story. These are displayed at Google News and the headlines link users to the full stories on the source sites. Newspaper group Copiepresse, which represents leading Belgian, French and German publications, said this amounted to copyright infringement and a breach of database rules because its members had not been asked for permission.

Copiepresse could have stopped Google without going to court but chose not to. Instead, it wants Google to continue directing traffic to its sites – and it wants Google to pay for the privilege.

The court also ruled that Google’s cache, which is not part of Google News, infringed copyright.

When a person performs a search at Google, results are displayed with a link to the page on the third party site and also a link to a ‘cached’ copy of the same page stored at Google’s own site. The newspapers say this copy undermines their sale of archive stories. Why buy an archived story if you can find it in Google’s cache? Again, newspapers could have stopped their pages being cached.

Margaret Boribon, Secretary General of Copiepresse, told OUT-LAW that Google’s behaviour is “totally illegal” because it does not seek permission before extracting content for Google News or copying pages to its cache. Google disagrees.

Understanding Google’s position within the law means understanding how the search engine works.

Google uses an automated program, known as its Googlebot, to crawl across the internet. It locates billions of pages and copies each one to its index. In doing so it breaks the page into tiny pieces, analysing and cross-referencing every element. That index is what Google interrogates to return search results for users. When the Googlebot visits a page, it also takes a snapshot that is stored in Google’s cache, a separate archive that lets users see how a page looked the last time the Googlebot visited.
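For readers who want a concrete picture of breaking pages into pieces and cross-referencing them, the following is a minimal sketch of an inverted index, the general technique behind most search engines. It is not Google’s actual implementation; the page URLs, the text and every function name here are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case a page and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical pages standing in for sites a crawler has visited.
pages = {
    "http://example.com/a": "Belgian court rules against Google",
    "http://example.com/b": "Google News links to newspaper stories",
}

index = build_index(pages)   # what a query is run against
snapshots = dict(pages)      # separate store of whole-page copies (the "cache")

print(search(index, "belgian court"))
```

The index holds only word-to-page references, while the snapshot store keeps complete copies. That separation mirrors the article’s distinction between the index and the cache.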

It is easy for a website to keep Googlebot or other search engine robots away from all or particular pages. A standard has existed since 1994 called the robots exclusion standard.

Add ‘/robots.txt’ to the end of any site’s web address and you’ll find that site’s instructions for search engines. Google also offers a simple way to prevent a page being cached: just write the word ‘NOARCHIVE’ in a robots meta tag in the code of a page.
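As an illustration of the two opt-outs, the sketch below uses Python’s standard robots.txt parser to check what a crawler may fetch, and a small HTML parser to look for a NOARCHIVE directive in a robots meta tag. The robots.txt rules, the site address, the ‘OtherBot’ crawler name and the helper functions are all hypothetical; this is one way a well-behaved crawler might honour the instructions, not a description of Googlebot’s code.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /archive/, allow everything else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Disallow:
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, url):
    """True if the robots.txt rules let this crawler fetch the URL."""
    return robots.can_fetch(agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def may_cache(html):
    """True unless the page carries a NOARCHIVE directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


# Hypothetical page that can be indexed but asks not to be cached.
PAGE = '<html><head><meta name="robots" content="NOARCHIVE"></head></html>'

print(may_fetch("Googlebot", "http://news.example.be/archive/story.html"))
print(may_fetch("Googlebot", "http://news.example.be/today.html"))
print(may_cache(PAGE))
```

A publisher in Copiepresse’s position could use the second mechanism alone: the page stays in search results but no cached copy is offered.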

When asked why her members’ news sites didn’t follow these steps to exclude Google, Boribon replied, “then you admit that their reasoning is correct.” She said all search engines should obtain permission before indexing pages that carry copyright notices.

But the real reason for not opting out with a robots.txt file or mandating against caching is that Belgium’s newspapers want to be indexed by Google. “Yes, we have a problem with Google, but we don’t want to be out of Google,” Boribon said. “We want Google to respect the rules. If Google wanted to index us, they need to ask.”

Copiepresse also wants Google to pay for indexing sites. Boribon declined to discuss how or how much. “That has to be negotiated,” she said.

The argument is not unique. The World Association of Newspapers (WAN), which represents 18,000 newspapers in 102 countries, said in January it would “explore ways to challenge the exploitation of content by search engines without fair compensation to copyright owners.”

At that time, WAN did not have a strategy for such a challenge. Copiepresse did. It took direct action and convinced the Brussels Court of First Instance to order Google to withdraw from its sites all the articles and photographs of Copiepresse member sites. Google was given 10 days to comply, under threat of a €1 million fine for each day of delay.

Since the ruling, Google has pulled the plug on the news sites in the lawsuit. They are not just missing from Google News Belgium, they have disappeared from Google’s main index and cache too.

“They have done it to punish us,” said Boribon, who didn’t want Google to go that far. “They have a bad attitude.” Yet Boribon went on to complain that some of her members’ content can still be accessed via Google News France. “They don’t apply the judgment fully so we will ask for the fine,” she said.

Boribon does not seem to think she is cutting off her nose to spite her face. “What I’m achieving now is getting all the information to my European colleagues so we will have other publishers taking part in the court case. Then maybe Google will change its mind. If they see this is not a Belgian case but a concern for all publishers all over the world, they will have to review their business model.”

Her hope is that if enough publishers withdraw their content, Google will have significantly less content to index – and that will force it to the negotiating table.

Copiepresse is using the law as leverage in a commercial argument: its content contributes to Google’s $10 billion a year in revenue and newspapers want a cut. That argument should not focus on Google News because Google News does not display ads. It is only when newspapers’ pages appear in the results of the main search engine that Google serves the ads that fuel the $125 billion company.

Copiepresse told the court that Google damages the publishers’ ad revenue by bypassing their homepages. “We want search engines to send people to our homepage,” Boribon said, explaining that only the homepage always carries ads.

Google says its practices are lawful. It acts as an intermediary that connects users to sites. Europe’s Copyright Directive and E-commerce Directive recognise the role of intermediaries and afford them special legal protection, including a special right for intermediaries to cache material. Confusingly, however, Google’s cache may not be what the lawmakers had in mind.

Internet service providers use caches to save bandwidth on delivering frequently-accessed web pages. Rather than deliver a live page, it is more efficient to deliver a cached copy to customers. The customer will never know the difference because the cached copy is updated when the live page changes. The E-commerce Directive doesn’t distinguish internet service providers from search engine service providers. Instead it says “a service provider is not liable for the automatic, intermediate and temporary storage of that information, performed for the sole purpose of making more efficient the information’s onward transmission to other recipients of the service”. There are other conditions, including that “the provider does not modify the information” and that “the provider complies with conditions on access to the information”.
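The kind of cache the Directive appears to describe can be sketched in a few lines. This is a toy model under stated assumptions: the class name, the version numbers and the dictionary standing in for live websites are hypothetical, and real caches rely on HTTP validation rather than a simple version check.

```python
class TransparentCache:
    """Toy cache that serves a stored copy until the origin page changes."""

    def __init__(self, origin):
        self.origin = origin      # url -> (version, body); stands in for live sites
        self.store = {}           # url -> (version, body) held by the provider
        self.origin_fetches = 0   # number of full pages pulled from the origin

    def get(self, url):
        live_version = self.origin[url][0]   # cheap check; body not transferred
        cached = self.store.get(url)
        if cached is None or cached[0] != live_version:
            # First request, or the live page has changed: refresh the copy.
            self.store[url] = self.origin[url]
            self.origin_fetches += 1
        return self.store[url][1]            # body is passed on unmodified


# Hypothetical live page.
origin = {"http://example.com/story": (1, "First edition")}
cache = TransparentCache(origin)

cache.get("http://example.com/story")
cache.get("http://example.com/story")   # served from the stored copy
print(cache.origin_fetches)
```

The stored copy here exists only to speed up delivery and is replaced as soon as the live page changes, which is the contrast with a cache that deliberately preserves an older snapshot for users to view.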

Google has explained the purpose of its cache before, when the function was challenged in a US court in January. Google listed three purposes for the Nevada District Court: it allows users to view pages that the user cannot access directly, perhaps because the destination site has gone down; it allows users to make comparisons between a live and cached web page; and it allows users to identify search query terms (which are highlighted wherever they appear in the cached page). Copiepresse might argue that these purposes go too far beyond the Directive’s “sole purpose of making more efficient the information’s onward transmission to other recipients of the service”.

Even the legality of the primary search function of a search engine is open to question. The Directive’s condition that a provider “does not modify the information” is arguably breached as soon as a search engine breaks a page into tiny elements for analysis and cross-referencing in its gigantic index. That argument was not raised in court but would cut to the heart of almost any search engine’s operation.

Google won the Nevada case. Its opponent, a lawyer called Blake Field, had “decided to manufacture a claim for copyright infringement against Google in the hopes of making money from Google’s standard practice,” according to Judge Robert Jones. Field knew how the system worked and he placed copyrighted articles on his site, waiting for Google to find and cache his work. When it did, he sued.

The court endorsed Google’s opt-out approach: because Field knew about the robots protocol and the NOARCHIVE command, Field’s conduct was interpreted by Judge Jones “as the grant of a licence to Google for that use.”

Google could use the implied licence argument when the Copiepresse case returns to court. The robots exclusion standard has been around for 12 years; Google could argue acquiescence.

Field also argued that Google’s cache was not “intermediate and temporary storage”, as required by a US law. Judge Jones said that Google’s caching for approximately 14–20 days at a time is temporary. That may or may not influence a European court if it has to decide the same issue: the wording is common to laws on both sides of the Atlantic.

If the legality of the cache is uncertain, the legality of Google News is no clearer. The Belgian court heard that it is an information portal, not a search engine. It uses 4,500 English-language news sources and a few hundred Belgian sources, in many cases without prior permission. Google says that’s okay.

“Copyright law allows for snippets to be published from results,” Google spokesman D-J Collins told OUT-LAW. “That’s why we have argued that the court order was flawed. Google News does not break copyright law.”

Copiepresse disagrees with Google’s view that snippets of text are unprotected. Copyright only protects against substantial copying, but publishers would argue that a snippet can be substantial in a qualitative sense, just as courts will protect short samples from songs. Google takes each story’s headline – the craft of a subeditor – and sometimes the entire first sentence or more from the intro, the most labour-intensive part of a journalist’s writing. The legality has never been fully resolved.

The publishers might also argue that thousands of snippets in aggregate amount to substantial copying in a quantitative sense. Google might counter that it is taking only one snippet of each copyright work – i.e. its thousands of snippets are from thousands of works, not one work.

The Belgian court found that Google had also infringed database laws. The EU’s Database Directive says that the repeated and systematic extraction of insubstantial parts of a database can amount to infringement of a database right.

Some courts have characterised websites as databases and ruled against sites that aggregate content. But that was before controversial rulings by the European Court of Justice in 2004 over the use of horseracing and football fixtures data.

The upshot: many databases are only protected if the owners do not ‘create’ their own data but obtain the data from others.

Google told OUT-LAW that it does not believe that Google News breaks this database law. It did not elaborate, but might argue that a newspaper’s site is not a protected database because the database right does not cover the investment in creating the news; it would only cover the obtaining of news from others. It might say that there is no systematic extraction of a single database; it is systematic extraction from lots of databases. But publishers could argue that news stories are not the same as raw facts such as when two football teams will play each other; and that their websites are not a mere byproduct of investment, unlike the databases in the fixtures cases.

WAN and other publisher groups will watch the rematch between Copiepresse and Google with interest. A week after the September ruling they identified the strategy that they had been seeking since January: the Automated Content Access Protocol, or ACAP.

A briefing paper was sent to OUT-LAW. It describes a system very similar to the robots exclusion standard: “a standardised way of describing the permissions which apply to a website or webpage so that it can be decoded by a dumb machine without the help of an expensive lawyer.”

Angela Mills, executive director of the European Publishers’ Council, told OUT-LAW: “This isn’t about blocking content, it’s about enabling it but with more sophisticated rules than are currently possible. Right now we can say ‘don’t index’ – but that’s not sophisticated enough. It’s very boring to have the choice of yes or no.”

ACAP might say that text can be taken but not images; or that images can be taken on condition that the photographer’s name appears. Demanding payment for indexing might also be part of the protocol, said Mills.

The plan is for ACAP to be a voluntary system. “If people wanted to ignore the rights expression they could,” Mills said, “but that obviously puts them in a much weaker position if challenged in court.”

When asked what it thought of ACAP, Google’s Collins told OUT-LAW, “We welcome any initiative that enables search engines and publishers to work together more closely. We look forward to discussing this proposal with the WAN and in particular how it can build on robots.txt”. But asked if Google would pay publishers to index their content, Collins replied, “That’s not something we do.”

This feature, by OUT-LAW Editor Struan Robertson, originally appeared in Issue 15 of OUT-LAW Magazine.