Internet Biographical Collection is Free at Ancestry.com
Ancestry.com recently introduced a new genealogy-specific web search engine, called the Internet Biographical Collection. The service looks great but created a bit of controversy amongst web site owners whose sites were being indexed and cached. Today, Ancestry.com converted the new search engine to a free service, available to subscribers and non-subscribers alike. That conversion to a free service should eliminate most of the concerns.
Users must first register before using this free service, which seems like a trivial issue to me.
You can read Ancestry.com's announcement at http://blogs.ancestry.com/circle/?p=1785.
I have used the search engine and found it is rather good. Not perfect, but good. I was able to use the Internet Biographical Collection to quickly find information about a number of my ancestors on several web sites. So far, it hasn't uncovered any new ancestors but I plan to keep trying over the next few days. I suggest you try it at http://ancestry.com. A more direct route is http://ancestry.com/search/db.aspx?dbid=1162.


The first I heard about this was the folks at www.usgennet.org were upset on Sunday that their web pages had been cached by Ancestry and the pages were behind the Ancestry paywall. Then the genealogy bloggers found out that some blogs were being cached, and they complained.
Ancestry reacted by putting a "View Live Web Site" link on the search result page today, which leads to the actual web page - not the cached version. Then they made the database free today as you noted.
There is still anger among the bloggers, and probably among some webmasters, that their web pages have been "stolen" or "scraped" and put behind the Ancestry "membership" wall (some people won't register).
Obviously, there are potentially big issues with this type of "database": what if everybody excerpted web pages and linked to the pages, especially if they were subscription sites?
There are still legal issues, I think, and Ancestry should explain their legal basis for doing this the way they did it. Without that, it is a black eye for Ancestry.
I blogged about this in the morning at http://randysmusings.blogspot.com/2007/08/ancestrycom-is-caching-some-web-site.html.
Posted by: Randy Seaver | August 29, 2007 at 01:25 AM
For starters, I have been doing genealogy sites on USGenWeb for years. I really think you have some nerve, stealing our hard work and making money off of it when you don't pay us (not even a subscription or a thank you). Further, I don't make everybody that looks at my sites (and I have many) register to look at them; that is none of your business either. Indexing the information is one thing, linking to it is alright too, but to outright swipe it isn't Kosher. We can't even attempt to make a dime on any of the work we do, not even through your own affiliation program. I will never trust ancestry.com again or add your link to any of my pages (I am removing them, and they are on at least 15 of my sites), and I won't recommend you to another soul. You say you are testing the free Biographical Collection; I just wonder how long that will last.
Noneya Business
Posted by: Noneya Business | August 29, 2007 at 02:27 AM
The point is not where they have now put it. They have pillaged our hard work and made it part of "their" works. Whether it is paid or unpaid, they have violated an ethical code among genealogists.
How better to weed out competition that provides free service than to take all of their work and place it with your site. Am I the only one that thinks this smells and sounds like a familiar childhood game? Can one say MONOPOLY....... ?????
Look at it in these terms: if you own a construction company and you go take all of xyz's customer lists, tools, etc., what do you think's gonna happen? They will fold up their tents and go home?
Ain't gonna happen here baby.... We in for the long haul, so take our websites down and put up work that you pay people to put together from original documents (like we had to do, we had to pay for those books, those copies, we had to spend thousands of hours getting blisters on our fingers and rears to get this information online).
Now do we have to watch closely to make sure our stuff does not start appearing in different formats in different databases without our entire pages being displayed and with no credits to that???????
One also has to wonder if the marriage, death, obits, census, etc. etc. won't be the next "Internet Collection" to go up.
" If Ma Bell could not have a monopoly why should they? "
Posted by: Still Upset | August 29, 2007 at 03:47 AM
Why did any of us post our hard work to free sites? For others to see. I applaud Ancestry for making the information free. I also support their request for persons to register before accessing the free data. It serves as a means to shut down automatic electronic data mining. (A process where a computer accesses a database and obtains all the data in a directory or file and then uses it to spit out other material.)
Ancestry didn't steal anything. All they did was create a search engine that goes out and finds things of particular interest for others. Then they get the item for them. The only thing they did was put it behind their door. A process that is done all the time. How many Societies have libraries and you gotta' pay to get in the door? What's the difference? How many Public Libraries require you to have a "Library Card" to use the facility? The card is free but you have to register for it. What's the difference? Their name isn't Ancestry.com.
So let's not bash Ancestry too much when they do something that is good and make it free. I agree some of their marketing tactics approach being shady or shoddy (whichever you prefer) and need fixing. But, that's another matter. Let's bask in the light this new effort brings to our quest and take advantage of it often.
Posted by: Gerald Eberwein | August 29, 2007 at 06:30 AM
I just clicked on the link in the first "comment" here to see what this collection was like, and got this message: "Check Back Soon
We're sorry but search. is temporarily unavailable. We are undergoing routine maintenance or we may be experiencing unexpected technical problems. We apologize for the inconvenience and ask for your patience as we work to correct the situation. Check back with us shortly."
I then went to a "basic" search on Ancestry and it worked fine. Kind of strange: usually when "search" goes down in Ancestry it's for all searches. But then, maybe there were too many people trying to access it. Who knows...maybe I'll try later, maybe not.
Posted by: Wanda | August 29, 2007 at 07:12 AM
Did you folks get so angry when Google indexed your sites--and then all of the other search engines? Did you demand that they "delist" them?
I assume that you put information on a free site so that others can find and use it. But how do they find it? Normally, by using a search engine.
I find Ancestry.com's search engine provides fewer, but cleaner, hits, so I am thankful for it. I also check the Live Feed and use it to wander around the various sites. I'd think at least some of the site owners would be grateful that they are now getting more attention from their most desired audience.
It's a good thing.
Posted by: Dennis | August 29, 2007 at 07:26 AM
One comment: the so-called "pillaging" and "theft" and all the other words are a bit strange. There is nothing new here. I don't recall anyone complaining about Google, Yahoo, Dogpile and all the other search engines. Yet they have been doing exactly the same thing for years, only on a bigger scale. The general-purpose search engines index and cache ALL the web sites. There are even a number of for-pay search engines that perform specialized searches for paying customers only. NorthernLights.com is one such for-pay search engine. Again, NorthernLights.com has been charging customers for specialized web searches for years.
Ancestry.com's Internet Biographical Collection is a much smaller search engine: it is also a specialized search engine as it only indexes what it believes to be genealogy-related pages. It is designed to make genealogy information much easier to find, regardless of where it is hosted on the web.
I don't see any difference between Ancestry.com and all the other free and for-pay search engines. They all do the same things.
As always, if you do not want search engines to index your site, create a robots.txt file with the proper parameters and place that file in the top-level directory of your web site. That is easy to do. Then search engines that honor the file will not add your site to their databases.
Of course, your site will be a lonely place as users will no longer be able to find you by using search engines.
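As a rough illustration of the robots.txt approach mentioned above, here is a minimal sketch in Python. It assumes a site owner wants to turn away every crawler; the crawler name "SomeCrawler" and the example.com address are placeholders invented for this example, not anything published by Ancestry.com or the other search engines named here. The sketch uses Python's standard urllib.robotparser module to show what such a file tells a crawler that chooses to obey it.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every compliant crawler to stay out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "SomeCrawler" and example.com are placeholders for illustration only.
    page = "http://example.com/family/smith.html"
    print(is_allowed(BLOCK_ALL, "SomeCrawler", page))  # prints False
```

Keep in mind that robots.txt is only a request. It has an effect only on crawlers that are written to read it and respect it.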
- Dick Eastman
Posted by: Dick Eastman | August 29, 2007 at 07:35 AM
Many, many individuals have chosen to make the fruits of their labors available on the web as free historical/genealogical information. Those individuals have chosen to make this information available in the manner in which they have. They also had the choice (or not) to make the same information available to Ancestry.com. Had they wished to have this free information available on a for-pay basis (remember WorldConnect-OneWorldTree) they would have chosen Ancestry.com. They did not. Now Ancestry.com has spidered websites on a nation-wide scale without the knowledge and consent of the owners and copyright holders of that information and made it available (free for now?) on a paid site in which the user must register to search for information and to have it displayed from a cache on Ancestry.com servers. If they choose to remove the information from the web, it will still be displayed from a cache on Ancestry.com servers. Ancestry.com has so far overstepped the bounds of decency and law that this is no longer a copyright issue; it should be turned over to the Dept. of Justice for investigation and action.
Posted by: Callawegian | August 29, 2007 at 07:47 AM
Legal or not, by initially charging for content created by other people Ancestry has not endeared themselves to many members of the genealogical community. I want people to find my webpages. I want people to use that information. I'm not going to "take it down" or install a robots.txt file to prohibit them or any other search engine from providing links to my site. That would be stupid and defeat the purpose of publishing it in the first place. The problem I had was that the pages, as they were first displayed, were made to appear as though they were content on Ancestry.com and the link to the "live" page was not obvious. The description that Ancestry provides for this "database" is misleading and the url for the web page is not displayed anywhere within the context of the detail for the "hit" which still makes it appear as though it is their content. A big blunder on Ancestry's part. Sure. Am I going to cancel my subscription? Probably Not.
Posted by: Becky Wiseman | August 29, 2007 at 08:15 AM
At this hour, the search engine IS working: quite well, in fact. So apart from the ethics issues, I find myself grateful for another fine tool. I would wish for a more advanced input criteria form, however, especially for those of us researching more common names. And it is NOT a complete database. Is there such a thing? I searched for Melzar Canright (how's that for uncommon?) and received ZERO results. Keep collecting, Ancestry, there's more out there.
Happy Dae.
http://www.ShoeStringGenealogy.com/ssg1.htm
Posted by: Happy Dae | August 29, 2007 at 09:17 AM
I may be the rarity here, but I would love for my site to be listed on ancestry.com. That would mean that more people would have access to it. As it stands now, it takes a few days for my new pages and information to hit the search engines, and then people only get to my site if they search for particular words - or so it seems from my point (I am able to view the searches that lead to my page). Do I want my page to be behind a fee based site? No - that's not fair to me or those who want to see my page. Do I want it listed on ancestry for free? Absolutely - that gives me a chance to share my info with more people.
As for this comment from above: "If they choose to remove the information from the web, it will still be displayed from a cache on Ancestry.com servers." Any person can have a cache of a site - it does not have to be a major entity. For instance, when I leave this page, there is a cache file on my computer that will come up if I go to the file. So if Mr. Eastman were to delete his page today, I could still view it tomorrow, the next day or weeks from now, as long as I haven't cleared my cache files. Just because one removes info from the web doesn't mean that it is gone from everyone - just those that have not yet viewed it.
Posted by: Marsanne Petty | August 29, 2007 at 09:26 AM
Dick, theft is a pretty accurate description of what Ancestry.com have done here.
This is not the same as Google indexing a site for their search engine. When you do a Google search, they show you the URL of the site and, when you click on the link in the search result, it takes you to the original site.
Ancestry makes a complete copy of your website and takes people to the cached copy by default. They show it on a page with Ancestry.com headers, in its entirety. While facts cannot be copyrighted, the textual presentation of those facts can. Ancestry ignores this by showing potentially copyrighted works, in their entirety, without the consent of the copyright holders. In addition, their source for the database does not credit the original website in any way, making it look like the data was generated by Ancestry.
Finally, it is also "theft" in the sense that they are stealing traffic from the original website. Any advertising, or revenue generating content, placed on the original website after they spidered the site will not be present in the cache. By not directing traffic back to the originator, they are stealing the potential revenue dollars from the creator of the data.
Posted by: David Watkin | August 29, 2007 at 09:26 AM
Dick, are you deliberately being obtuse?
I understand now why you claim "straight talk"; everything you post is either a press release or crooked as hell.
Posted by: Shawshank Redemption | August 29, 2007 at 09:29 AM
Since they have now placed all this in the free section and added a link to the live sites, I feel a little better.
However, Ancestry.com has a lot to learn from Google, et al. Ancestry's search results are still appalling and result in the inclusion of totally nonsensical "matches" through which one must scroll to find any truly pertinent ones.
I feel like a broken record complaining to Ancestry about their lack of good search engines and a way to use Boolean searches. Efficient searches would be the best news ever for those of us who break down and actually pay this company to access information.
Penny
Posted by: Penny Hayes | August 29, 2007 at 09:34 AM
As the years pass, I am becoming increasingly disturbed with the trend toward corporatizing (aka "privatizing") public documents and public functions, not to mention the exorbitant fees charged for private or corporate profits from public documents. I do not understand why our public documents have been turned over to Ancestry, nor why Ancestry uses our public data for their corporate gain (and this whole thing about passwords and registration is sheer nonsense). I'm for taking corporations out of the loop when it comes to publishing images or transcriptions of public documents that we've already paid for with our tax dollars.
Public documents are public documents, whether on a local, state, or federal level. They are not now, nor were they ever, private documents intended for private corporations to gain a private profit. Remember, the information genealogy researchers want was freely given to local, state, and federal agencies at one time (census data was given freely, for instance, even if it won't be made public until 72 years after it was obtained), and we all pay taxes so that people at those governmental agencies can keep our records straight and/or look up and copy those documents for us (but we pay a fee to public agencies to obtain copies of those documents, even if it's for/about us). Our tax dollars pay the salaries of public employees - I don't object to that; everyone needs to earn a living and good public servants more than amply deserve their salaries, and some are woefully underpaid (I was once a public employee, albeit in a different field). Our tax dollars also go toward building and maintaining the locations where those documents are stored and employees work. We pay for the maintenance and safekeeping of those public documents through our tax dollars on a local, state, or federal level, in other words.
I strenuously object when private corporations (like Ancestry.com and others) use our public documents for private gain. If our tax dollars were spent to host a free public web site where we could obtain those same documents (local, state, or federal level) after a name search on a flexible search engine and/or the ability to scroll through microfilm images, I would not object, although like census documents there needs to be a cutoff date for privacy purposes. (I was not pleased when a friend who just got a one-month trial membership at Ancestry found my birth data recently from the state index; since I was born after 1930, this crosses boundaries where identity theft could become an issue. Prior to that, I was unaware that Ancestry had any birth data on me or anyone else of my generation in their database because I'm not a member of their web site; I think Ancestry's fees are outrageous, and other friends have been very inconvenienced when Ancestry has charged their credit cards without authorization.)
My shining example of "how to do things right" for hosting a free public web site for public documents or transcribed data from public records with a flexible search engine ('starts with' option is necessary because of multiple alternate spellings with local dialects) remains Norway. There are two sections for the Digitalarkivet web site. One section is for transcribed data (census, emigration, other transcribed records), and the newest additions are the microfilm images of church records one may scroll through (.jpg images can be downloaded) which should be completed by sometime next year (privacy cutoff date in the 1930s for microfilm images). Digitalarkivet updates their web site with new data at least weekly. No registration, no passwords. One just goes to the Digitalarkivet web site and selects English for the language and then chooses the section in which one needs to search (records are in Norwegian, of course, as is the transcribed data, but column headers and search criteria are displayed in English for transcribed data if one has selected the English language option).
Denmark is a close second and their information is also free, search criteria and column headers can also be displayed in English, but they have two web sites; one for transcribed census and emigration data, and the second web site has microfilm images of church records one may scroll through (registration and password required, but it's free). I'm more familiar with the Norwegian Digitalarkivet web site because once I documented my family I expanded and started researching data for people who married into my maternal and paternal families, and most of them had Norwegian ancestors, so I do research on their web site weekly, if not daily. In many cases, including my own Norwegian lineage, I've been able to document genealogies clear back to the 1600s. The only "cost" is the time investment involved with searching and entering the data in my genealogy program.
In other words, people in other countries grasp the concept of 'public' documents for genealogy purposes, and display them as 'public' documents (not for profit of corporations who use 'public' documents for private or corporate profits). The closest any government web site has in the US to what I consider good use of public tax money is the BLM web site where images of Land Grants can be viewed and downloaded for free.
Otherwise, here in the US the only time I can find public documents or transcriptions of public documents is when generous and selfless volunteers have gone to the trouble to add information to a few trusted genealogy web sites where the information can be obtained. I symbolically bow in humble gratitude at their feet every time I find accurate transcribed or image data online that has been put there by volunteers; after documenting thousands of people in my own genealogy research (and sharing it on my web sites, along with links back to the information I find online if I haven't transcribed it from microfilm images or other documents I obtained years ago), I know how difficult it can be to make sure data is as accurate as humanly possible.
For Ancestry.com to cache or otherwise use freely available public data from other web sites as their own (whether "free" or for a fee, and only with registration and passwords) is at least unethical, if not illegal. The rest of us who have genealogy web sites where we freely publish our genealogy data that we've obtained with lots of hard work (sometimes lots of money in a few cases) over many decades - with sources listed - become marginalized on a Google search, and Ancestry.com or other web sites who are only seeking profits become the top listing on search engines. That takes all the fun out of a very enjoyable avocation and the purpose of freely sharing genealogy data in the first place.
Posted by: Bev Anderson | August 29, 2007 at 09:41 AM
The comment was made that Google does not cache web pages. This is not true. Although the primary link for each Google search citation is to the actual web page, at the bottom right of each citation is a link to the cached page. I have often used the cached page when the original page is not available for some reason. Also, for a large or complex page, the cached version has an advantage since the words in the search criteria are clearly highlighted.
Posted by: Clay Nenno | August 29, 2007 at 10:10 AM
Dick wrote: " Ancestry.com's Internet Biographical Collection is a much smaller search engine: it only indexes what it believes to be genealogy-related pages."
There are many things other than genealogical content contained in the "Collections" of Ancestry.
Try putting in the first three letters of a forbidden word and see what you get. If "it" believes that is genealogy related, then their filters are not any good.
Unlike the normal search engines, which pull up the cached page from the last spidering only, this one is not a search engine. A search engine was used to create this "Collection"; the search part of it only searches and indexes what has been cached on their servers, where code has been added to private pages by way of a no-responsibility disclaimer.
Posted by: Noname | August 29, 2007 at 10:16 AM
I have mixed feelings about Ancestry.com's pirating other programs for information. I've done lots and lots of research in original records. I was new to genealogical research, when the wonderful 1880 census was opened to the public. Now, we have the censuses up to and including the 1930 census available to us. I spent a lot of time typing letters and sending letters before the days of computers as we know them today. The first computers I ever saw weighed several tons each and reached almost from floor to ceiling and had small boards, which were "wired." The company, where I worked, paid IBM $1,000 a month rent for each of these machines and then had to hire one person, whose sole job was to "wire the boards." I still prefer looking in the original records in a state and then copying those records. I never opened a website for myself, for the very reason I didn't want it pirated. I will gladly share information with anyone and do so appreciate all, who have shared information with me. When I share information I give my source, so that that person can double check my data. ALWAYS check the original source of any information you receive from another person or from a website. There are rare times, when one has to accept preponderance of evidence, as there is no one source with all the data you feel you need for that person.
Posted by: Jennie Vertrees | August 29, 2007 at 10:33 AM
Public records are public records. We can still get them the "free" way, just not always while sitting at our computers.
Posted by: Laura | August 29, 2007 at 10:40 AM
Dick,
The problem with creating a robots.txt file is that I don't want my data picked up by Ancestry.com or NorthernLights.com. I don't mind other search engines picking it up. To specifically block it, however, you need to know the name of their spider.
For example, Google's spider is called "Googlebot", and if you don't want your site picked up by Google, then you write an exclusion for it.
So, as a person of some stature in the genealogy community, can you ask Ancestry.com on your readers' behalf to give us the name of their spider so we can CHOOSE whether or not to be a part of this database?
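To make the selective exclusion described above concrete, here is a small Python sketch. The spider name "ExampleGenealogyBot" is purely hypothetical, since the real name of Ancestry.com's spider is not given anywhere in this discussion, and example.com is likewise a placeholder. The file below turns away that one named spider and leaves the site open to everyone else, and the standard urllib.robotparser module is used to check how a well-behaved crawler would read it.

```python
from urllib.robotparser import RobotFileParser

# "ExampleGenealogyBot" is a made-up spider name used only for illustration.
# The first group shuts that one spider out of the whole site; the second
# group, with an empty Disallow line, leaves everything open to all others.
SELECTIVE = """\
User-agent: ExampleGenealogyBot
Disallow: /

User-agent: *
Disallow:
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/family/smith.html"  # placeholder address
    print(crawler_may_fetch(SELECTIVE, "ExampleGenealogyBot", page))  # prints False
    print(crawler_may_fetch(SELECTIVE, "Googlebot", page))            # prints True
```

This only works if the spider announces itself under a known name and chooses to honor the file, which is why knowing the spider's actual name matters.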
Posted by: Concetta | August 29, 2007 at 10:58 AM
In answer to Bev's comments: Yes, the documents, birth certificates, marriage licenses, death certificates - are free, possibly, IF you go get them yourself. But, IF you contact my local genealogical society, by phone or website, for a copy of a local death certificate, you will have to donate dollars to have someone go get it for you! - not for the document itself. Ancestry is just "getting it for you", so I don't think your analogy fits. Ancestry pays thousands, or maybe millions, who knows, for all the 'stuff' they do to keep that site running. You can go find all the censuses on record - if you know where to go, and how to look - but, how nice to sit at a home-office and see them all for a 'click'? It is marvelous! The amount I pay yearly is nothing compared to my getting to Salt Lake City, staying in a hotel, buying meals and more... This doesn't address the issue of "stealing" one's web work. However, I'm curious about those who put up their "work" on a free site! If it's really your hobby and you're trying to find an extended family, what better way than to have the biggest genealogy site in the world point people in YOUR direction! Just a thought...
Posted by: JINNY COLLINS | August 29, 2007 at 11:16 AM
All I have to say is that Ancestry must be getting desperate to enter into such sleazy operations. One day they will meet their match - what goes around comes around.
The only reason I subscribe to Ancestry is for their Census and its search capabilities, and in order to have the Census and its research capabilities I have to buy the whole US package.
Ancestry should stick to what it is good at and stop trying to be a glutton - in the long run I believe they will do themselves in. There are ways around them but do the novices they suck into membership realize this? It seems Ancestry wants to suck in novices before they get cyber-wise - Let the buyer beware!
Footnote.Com is a great site - constantly growing - with lower fees to join and much better viewing capabilities for original documents archived by NARA, to cite just one example.
Why couldn't Ancestry just provide links to the Live Web Pages without uploading Cached images?
Posted by: Ginny in Poughkeepsie | August 29, 2007 at 11:23 AM
Of course, the Wayback Machine caches web pages for YEARS! You can see your old web pages at http://www.archive.org
Again, hundreds of web sites do exactly the same thing that Ancestry.com does. Some of them, such as NorthernLights.com, even charge money to access their specialized searches. Some of them are significantly more expensive than an Ancestry.com subscription.
- Dick Eastman
Posted by: Dick Eastman | August 29, 2007 at 11:28 AM
I'd like to make a comment on this discussion as I've subscribed to this newsletter for free (thank you Mr. Eastman for providing this!) and also subscribe to Ancestry with a full membership since 2000.
I realize there are many who have gone before me and worked very hard without compensation to record and make available records that I have accessed and incorporated into my family file without my acknowledgement or thanks, which I would now like to give publicly, although it comes up short as those folks may not ever read this. Kind of like the many hours of my own research on my family! So with this discussion I must stop and ask myself at any point along the way, why am I doing this, am I being compensated (how does working pretty hard for free and without a thank you or compensation make me feel, REALLY??). I know that if it weren't for the internet, or Ancestry or now Footnote, I wouldn't have ANY interest or participation, let alone data. For this I am very truly grateful, and hopefully the work I have done will someday provide another with this marvelous experience I've had over the last 7 years or so. And that's my best attempt at essentially a selfless act of giving, which I perceive this study to really be. And if all of the above bothers me, well I'll quit and take "MY" data offline. I guess I just don't see it as "MINE". But I can understand how a person could see it that way. But is it really? Great discussion and I enjoy it. Thanks again for providing the space for it!
Posted by: Ronda | August 29, 2007 at 11:43 AM
It beats me why anyone should want to use this new Ancestry database when they have Google.
Posted by: Caroline Gurney | August 29, 2007 at 11:57 AM