Message boards on genealogy sites and blogs lit up this past week as Ancestry.com announced the new Internet Biographical Collection. The pros and cons have been discussed ad infinitum elsewhere, so I won't repeat them here. If you have not yet read about this controversy, perform a Google search on the words "Internet Biographical Collection."
Many of this week's discussions debated claims and counterclaims about copyrights, legalities and such. I read a lot of these messages but never found any written by anyone who claimed to have a law degree or other appropriate credentials. It seems a lot of people, including me, were writing about legalities without having the academic qualifications to back up their claims. To be blunt, I don't know if anyone was correct. I also noticed that nobody cited legal precedent, at least not with a case title and source citation.
Shame on all of us! We genealogists should know better than to make claims without source citations.
I have now found one case where a court ruled on the exact issue of the legality of caching other web sites' content and on the copyright laws involved. This landmark case should be required reading for all of us who posted messages either for or against the recent ill-fated Ancestry.com product.
NOTE: For the rest of this article, I am only looking at U.S. copyright issues. If you live outside the United States, laws and court actions may be different in your country.
In Field vs. Google, Blake A. Field objected to Google's caching of web pages from a site he created. Mr. Field is an attorney, so we can assume he has some expertise in copyright law. He at least sat through a number of classes on the topic when he attended law school, which is probably more than the rest of us who posted messages recently can say.
Mr. Field brought the copyright infringement lawsuit against Google after the search engine automatically copied and cached a story he posted on his own website. Mr. Field claimed that Google had violated his copyrights by caching his information and making it available elsewhere without asking for his permission. (Does that sound familiar?) Google responded that its Google Cache feature, which allows Google users to link to an archival copy of websites indexed by Google, does not violate copyright law.
How many cases can you cite in which this specific question on copyright law concerning web site caching has already been decided in Federal court?
On January 12, 2006, the Honorable Robert C. Jones, United States District Court Judge in Nevada, ruled that no copyright infringement had occurred, and that Blake A. Field was not entitled to damages. Specifically, the court granted summary judgment in favor of Google on four independent points:
- Serving a web page from the Google Cache does not constitute direct copyright infringement because it results from automated, non-volitional activity by Google servers;
- Field's conduct (failure to set a "no archive" metatag; posting an "allow all" robots.txt file) indicated that he impliedly licensed search engines to archive his web page;
- The Google Cache is a fair use; and
- The Google Cache qualifies for the DMCA's 512(b) caching "safe harbor" for online service providers.
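For readers unfamiliar with the two signals named in the second point, here is a minimal Python sketch of how a crawler might read them. It is only an illustration: the sample robots.txt text and HTML pages are made up and are not taken from Mr. Field's site, the names `has_noarchive`, `crawler_may_fetch` and `ExampleBot` are my own inventions, and I make no claim that Google's software works this way.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical "allow all" robots.txt: an empty Disallow line blocks nothing.
ALLOW_ALL_ROBOTS = """\
User-agent: *
Disallow:
"""

# Hypothetical page with no "noarchive" metatag in its head.
PAGE_WITHOUT_TAG = "<html><head><title>Story</title></head><body>Text</body></html>"

# The same hypothetical page with a metatag asking that no cached copy be shown.
PAGE_WITH_TAG = (
    '<html><head><meta name="robots" content="noarchive">'
    "<title>Story</title></head><body>Text</body></html>"
)


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None, so fall back to an empty string.
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noarchive(html_text):
    """Return True if the page carries a robots metatag containing "noarchive"."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noarchive" in parser.directives


def crawler_may_fetch(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the robots.txt text permits user_agent to fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)
```

With this sample data, `crawler_may_fetch(ALLOW_ALL_ROBOTS, "http://example.com/story.html")` returns True and `has_noarchive(PAGE_WITHOUT_TAG)` returns False, which is the combination the court described: robots invited in, and no request to withhold a cached copy.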
Keep in mind that this case was decided in FEDERAL court, a point that greatly increases its credibility as a landmark case concerning Federal copyright laws.
You can read the full text of the Judge's ruling at http://www.eff.org/IP/blake_v_google/google_nevada_order.pdf.
As you can see, this landmark case clearly states that caching a web site is not an automatic copyright infringement, and that a cache constitutes fair use under U.S. copyright laws. It also clearly states that the web site owner who does not want his web site cached must take steps to prohibit indexing and caching via a ROBOTS.TXT file or any similar methods available. (ROBOTS.TXT and the "no archive" metatag mentioned in the ruling are the only current methods I know of.)
Again, the court ruled that it is up to the web site owner to say whether distribution of his or her web pages should differ from the normal, default methods in use throughout the Web. (Those are my words, not those of the judge.)
I do not find this unusual. The same has been true for many years for printed text: unless the copyright holder specifically states otherwise, the copyrighted work is handled in the same manner as all other copyrighted works under current laws and court interpretations. If the author wishes something different, he or she must specifically say so. In a printed document, that action is accomplished by words in the printed copyright statement. It seems natural that in a copyrighted digital file, the copyright holder must specifically state exceptions by use of a digital file. With today's technology, the exception(s) are stated in a specific format within ROBOTS.TXT.
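To show what that "specific format" looks like, here is a small sketch in Python of a ROBOTS.TXT file that states two exceptions, along with a check of how a well-behaved robot would interpret it. Everything in it is hypothetical: the site layout, the robot names `ExampleCacheBot` and `OtherBot`, and the helper `is_allowed` are assumptions of mine for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary family-history site.
# The first record shuts one named robot out of the whole site;
# the second record keeps every other robot out of /private/ only.
ROBOTS_WITH_EXCEPTIONS = """\
User-agent: ExampleCacheBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_WITH_EXCEPTIONS):
    """Return True if robots_txt permits user_agent to fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleCacheBot", "http://example.com/biographies/smith.html"),
        ("OtherBot", "http://example.com/biographies/smith.html"),
        ("OtherBot", "http://example.com/private/notes.html"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

Keep in mind that ROBOTS.TXT is only a published request. It works because robots choose to read and honor it, not because it technically blocks anything.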
In all cases, the requirement clearly is on the copyright holder to specifically state the exceptions to industry norms, if any.
If the copyright holder elects to place their content on a web service that does not offer ROBOTS.TXT capability, that person has given an implied license to others to cache the contents. Again, it is up to the copyright holder to make sure that the exceptions are clearly available to all. In this case, that obligation includes selecting a web service that allows for the clear statement of exceptions.
The court decision also refers to the DMCA's 512(b) caching "safe harbor" for online service providers. You can read an explanation of this at http://en.wikipedia.org/wiki/OCILLA. Pay close attention to the section entitled "Requirements to obtain safe harbor."
One item not mentioned in the Field vs. Google case is whether or not hiding the URL of the originating site makes a difference. This was not a factor in the Field vs. Google case, but many message writers felt it was a major issue with Ancestry.com's service. Although the question is interesting, the topic quickly became moot when Ancestry.com added URLs a day or two after product launch.
Does this precedent-setting ruling by a Federal court apply equally to the late Ancestry.com Internet Biographical Collection? I have no idea. I do not have the legal training to make a qualified guess. But I do know it makes for very interesting reading.
In the one legal case I found to directly reflect the same issues as the recent Ancestry.com controversy, the court clearly ruled that web site caching is not a copyright infringement. I am also guessing that, if anyone ever took Ancestry.com to court over the Internet Biographical Collection, the plaintiff would have to clearly show why the Ancestry.com web caching is different from Google's web caching. Otherwise, the judge probably would dismiss the suit by referring to the earlier Field vs. Google case. A clearly established difference would be critical to the new case's success.
Such a court case involving Ancestry.com should be an interesting case with far-reaching ramifications for genealogists. Since Ancestry.com has now canceled the service, I doubt if we ever will find out.
