Free Genealogy Books Available on The Internet Archive

When I started researching my family tree more than thirty years ago, I purchased a reprint of a genealogy book first published in 1920: “The Harmon Genealogy, comprising all branches in New England,” written by Artemas C. Harmon. The book mentions my great-grandmother, Lucy Harmon, and documents her Harmon ancestry back to 1667. It is a wonderful resource, and I have referred to this book often over the years.

I paid more than $100 for this reprinted book many years ago. Today I found the same book online. Anyone may find the book online and then download it to their own computer or to a flash drive or even print it out. (That last option will consume a LOT of paper and some storage space on a shelf, however.) The cost is ZERO.

Which version would you prefer?

I can download the entire book to my hard drive or to a jump drive or save it to an online storage service. I can print one page, multiple pages, or even the entire book. Even better, I can electronically search the entire book within seconds for any word or phrase. Not only can I search for names, but I can also search for towns, dates, occupations, or any other words of interest. Try doing that with a printed book!

The Internet Archive, home of the “Wayback Machine,” is a 501(c)(3) non-profit that was founded to build an Internet library. Its purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format.

The Internet Archive is well known for storing terabytes of old web pages. However, the organization has also expanded its role to digitize and store all sorts of public domain material, including old books, movies, audio recordings, radio shows, and more. I have also found a few modern books on The Internet Archive that were legally contributed by the copyright holders themselves.

The site’s Text Archive contains a wide range of fiction, popular books, children’s books, historical texts and academic books. The list includes genealogy books as well. The Internet Archive is working with several sponsoring libraries to digitize the contents of their holdings. In addition, private individuals are invited to scan the public domain books in their personal libraries and upload them as well. (See http://www.archive.org/about/faqs.php#195 for information about contributing your books.)

The result is a huge resource of books in TXT, PDF, and other formats, books that you can download to your computer, save, and then search for any word. The same books are also visible to Google and other search engines, so every word in them can turn up in an ordinary online search.

The PDF versions contain images of every page in the original books. That makes them easy to read. I prefer to look at PDF versions of a book whenever possible. However, searching the PDF versions electronically does not work very well.

You should be aware of a couple of shortcomings of books converted to plain text, however. First, the TXT files have lost the formatting of the original books; there is no bold or italics or underlining since such formatting is not supported by the TXT format. In addition, paragraph indentations and other “spacing” often are lost.

Secondly, many of the books available were converted to TXT format by OCR software. OCR never converts all words perfectly, so you can expect to find numerous OCR errors in these documents. For instance, “The Harmon Genealogy, comprising all branches in New England” has some words mis-scanned, and many dates have errors in them. The most common one was substituting the letter “I” in place of the number one, such as “I920” instead of “1920.” This will cause difficulty if you are electronically searching for specific words or numbers.
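If you are comfortable with a little scripting, one way around the “I920” problem is to search a downloaded TXT file with a pattern that accepts the look-alike characters. The following is only a minimal sketch in Python: the look-alike table and the function names are my own illustrative assumptions, not anything supplied by The Internet Archive, and the table will need extending for the typefaces in your own books.

```python
import re

# Characters that OCR commonly confuses with each other. This table is an
# illustrative assumption and is far from complete.
OCR_LOOKALIKES = {
    "1": "1Il",  # digit one, capital I, lowercase L
    "0": "0O",   # digit zero, capital O
}

def ocr_tolerant_pattern(term):
    """Build a regex that matches `term` even where look-alikes were swapped."""
    parts = []
    for ch in term:
        if ch in OCR_LOOKALIKES:
            # Accept any of the characters OCR might have produced here.
            parts.append("[" + OCR_LOOKALIKES[ch] + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def find_matches(text, term):
    """Return every spelling of `term` found in `text`, OCR errors included."""
    return ocr_tolerant_pattern(term).findall(text)

sample = "First published in I920; reprinted 1920."
print(find_matches(sample, "1920"))  # ['I920', '1920']
```

To search a whole book, read the downloaded TXT file into a string and pass it to find_matches in place of the sample sentence.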

The Internet Archive presently digitizes more than 1,000 books a day and has millions of “texts” (books and other printed material) available online. There is also a collection of 300,000 modern eBooks that may be borrowed or downloaded by the print-disabled at OpenLibrary.org. If you do not find what you want today, come back in a few months and try again. It may have been added by then.

Of course, the Internet Archive is not the only source of digitized books. In fact, Google Books is a well-known source of digitized books. Operated by a well-funded commercial company, Google Books gets most of the publicity. However, with commercial ownership come proprietary business methods. Google Books has now stopped adding new books to the collection. However, all books previously digitized remain available online at http://books.google.com.

The Internet Archive also provides most books in http, EPUB, Kindle, Daisy, and DjVu formats in addition to TXT and PDF. As a result, the books and other documents can be read on almost any ebook reader as well as on computers, iPads, and most cell phones that have web browsers.

The Internet Archive does not yet have all the genealogy books ever published. In fact, nobody seems to know how many genealogy books are available this way. Even the folks at The Internet Archive don’t know. They simply scan everything they can find and don’t worry much about classifying the topics. However, it is known that the Archive’s ever-expanding collection of genealogy resources includes items from the Allen County Public Library Genealogy Center in Fort Wayne, Indiana; Robarts Library at the University of Toronto; the University of Illinois Urbana-Champaign Library; Brigham Young University in Provo, Utah; the National Library of Scotland; the Indianapolis City Library’s Indianapolis City Directory and Yearbooks Collection; The Leo Baeck Institute Archives of German-speaking Jewry; and the Boston Public Library.

Resources include, among many other things, books on surname origins, vital statistics, parish records, census records, passenger lists of vessels, and other historical and biographical documents, as well as individual volumes contributed by thousands of users from around the world. Most of the genealogy books are published in English, but there are numerous exceptions.

I searched the “Texts” section of The Internet Archive for the word “genealogy” and found 128,198 results. By searching in “Texts,” I was able to ignore the “hits” found on the Internet and in other sources. That’s not a definitive answer as the word “genealogy” obviously will exist more than once in most books. However, it does provide a rough idea of the popularity of the word in The Internet Archive’s books, magazines, and other texts. Whatever the true number, there must be thousands of genealogy books available today on The Internet Archive, and the number is growing rapidly.

The Internet Archive also has scanned and digitized the U.S. Census records from 1790 through 1930. Unlike the commercial providers of census data, the versions provided by The Internet Archive have not been indexed. They are useful only if you already know where to look for your ancestors. Small towns can easily be searched one page at a time while cities probably are best searched if you already know the enumeration districts involved.

Also unlike the commercial providers of census data, the census information on The Internet Archive is available free of charge to everyone.

In fact, everything on The Internet Archive is free. There is never a charge for anything on The Internet Archive. As a non-profit, however, the organization does accept donations, which are tax-deductible for Americans.

In a casual search, I found all sorts of material of interest to genealogists on The Internet Archive, including these:

Compiled service records of soldiers who served in the American Army during the Revolutionary War

Polk Lafayette, Indiana, city directory (Volume yr. 1891)

Preakness and the Preakness Reformed church, Passaic County, New Jersey: a history, 1695-1902, with genealogical notes, the records of the church and tombstone inscriptions

The history of ancient Wethersfield, Connecticut: comprising the present towns of Wethersfield, Rocky Hill, and Newington, and of Glastonbury prior to its incorporation in 1693: from date of earliest settlement until the present time (Volume 1,pt.2)

Ziegler Genealogy by John A. M. Ziegler

Genealogy of the Beaudry Family of Northern Ontario and Relatives

Morse genealogy by Morse & Leavett

Genealogy of the Spotswood family in Scotland and Virginia

The Lenher family: a genealogy by Sarah Marion Lenher

The above is only a tiny fraction of the many books available free of charge on The Internet Archive.

The Internet Archive isn’t perfect, but it does provide a great resource for genealogists, historians, and others. If you are looking for information about your family tree, I’d suggest that you check out The Internet Archive at http://www.archive.org. You can read about the Internet Archive’s genealogy collection at https://archive.org/details/genealogy.

If you are interested in The Harmon Genealogy, comprising all branches in New England, go to https://archive.org/details/harmongenealogyc00harm. Caution: This book is great; but, like most genealogy books, it does contain a few errors. Author Artemas C. Harmon did a very good job of research, but his work was not perfect.

9 Comments

I do want to respond to one statement you make here:
“However, searching the PDF versions electronically does not work very well.”
Although it is true that not all PDF versions of digitized books at the Internet Archive have been saved with “searchable text,” most seem to have been. If you have Adobe Acrobat (admittedly commercial software) or FlexiPDF (from the German company SoftMaker, also commercial but less expensive than Acrobat), you can convert most of the non-searchable PDFs you download from the Internet Archive to searchable/editable text while maintaining the book’s original form. Clearly you can only do that with typeset text, and some PDFs at IA have quite a strong sepia background and faint text, so those may be a challenge. I do notice that dates and numbers very often do not convert to proper searchable form because, just as Dick notes here, OCR confuses the number one in old printed documents with “I,” the serif capital i. In many (but not all) old typefaces the number one looks just like that letter, in effect like the Roman numeral one. I have, however, downloaded old digitized PDF books printed using what is now the standard “1,” and numbers can be searched fine in those documents. That shortcoming has never caused me a lot of grief, though.
I have around 30 gigabytes of material, almost all published books, downloaded from the Internet Archive and there is a significant volume of valuable genealogical material you will only discover by downloading digitized books from the Internet Archive (or possibly other sources) and exploring them.

I want to join Dick in Highly Recommending Internet Archive for books, books, and more books on early American ancestors. Most are now in digital (colored) formats, and many years ago when I first discovered the site they also had Google Books titles and microfilm PDF files for books (Google used to be better and all free; then they started publishing reprints for a fee, so it’s increasingly difficult to get a link to the free volumes they have/had). Of the 250+ titles in my PDF-downloaded Books file (on my laptop plus backups on jump drives), most are from Internet Archive. I use Firefox as my web browser because it has a side bar where I have files of links. One of them I made for Books, and most of those links are to genealogy-related volumes at Internet Archive. Yes, I keep the links as well as the downloads in case I need to send the links to others who have the same ancestors so they can download the volumes from there.
Some of my book “gems” include a little-known pdf of a book (with photos in most cases) of people who fought in “the war” (WWI) and a little three or four line blurb of where and when they served. Another is a local history of the Red River Valley with bios which sometimes yield names of areas where they came from in “the old country.” I know others have found little-known books of “local histories” (often published when communities have jubilee or centennial celebrations). Yes, they’re often “vanity publications” in connection with local community celebrations, but at least they give researchers some idea about what their ancestors were like as people, if not more concrete facts about their lives. There is a copy of a local history of Vinalhaven, Maine (lots of mentions of my ancestors, altho I had already purchased the paperback of the book several years earlier from Picton Press which is no longer in existence; I miss that publishing house because they had lots of New England genealogy and history books). Internet Archive even turned up a little-known Nordmændene i Amerika (Gothic text, no less) in Norwegian – specifically, Dano-Norsk (Astri My Astri publishing company has a modern English translation of the book as well as other Norwegian historical and biographical texts translated into English, all in my private library before I discovered the little Dano-Norsk text online).
In the nineteenth century, Rhode Island’s state legislature ordered their early records be published (I know; legislators used to do things beneficial to the people who elected them at one time in our nation’s history – Amazing!). “Vital Record of Rhode Island: 1636-1850” and a great many of those 20+ volumes have found their way to Internet Archive files – info for BMD, deeds, court proceedings, etc., from their original local records. There was/is a large Quaker population in RI, and their calendar dates in their official records have been rendered into a format used by most of the rest of the US. [No such luck with Find-A-Grave records where people have incorrect dates translated from numbers on Quaker headstone markers because they didn’t convert the Quaker dating system to what non-Quakers use for everyday language.] Internet Archive also has the privately-published books for my Rodman and Sherman ancestors found in RI. Caution on the Sherman name; one is from Massachusetts (and in my maternal line)…, and the other is connected to my paternal Sherman line from Rhode Island (three Sherman brothers came to the colonies; my Philip Sherman is one of the signers of the Portsmouth Compact, as well as John Coggeshall, another ancestor of mine – bios of both can be found on Wikipedia pages). Apparently there is no connection to the earlier Massachusetts Sherman line that produced William Sherman who married Desire Doty, daughter of Edward Doty of the Mayflower (the litigious Edward Doty also has a Wikipedia page).
The OCR text files I try to avoid unless it’s a matter of only a couple of paragraphs I can compare word-for-word to the image of the original since no one took the time to go through and correct the OCR mistakes before putting the text online. My “work-around” for that is to do a screenshot of book pages (or a portion of a page thereof, then include the specific source info below the image), even if I have to cut and paste to get a larger version of a text for clarity (depending on how much info I’m dealing with, how many pages), and include that as an image file for certain ancestors. In Increase Mather’s tome of King Philip’s War this comes in handy because it’s a good example of when printers used f for s in so many words. While one can transcribe and/or “translate” that into modern words, it’s not the same as seeing the original text in a colored image format. [Two of Mather’s footnotes pertain to two different families in my files, one being a direct maternal ancestor, the other a brother of one of my direct paternal line ancestors, only Mather uses one of the alternative spellings of the name.]
A search for my first Joseph Jenckes ancestor turned up a paragraph in an obscure text called “Catalogue of an Exhibition of Early American Engraving Upon Copper 1727-1850” that mentioned my first Joseph Jenckes on page iv in the Introduction (Jenckes made the dies for the coin); fast forward to a few years ago and finding the cornerstone contents of a building in MA and one of the coins was a 1652 Pine Tree Shilling (pictured in a newspaper photo online)…!
In any case, if one has early American ancestors, a name and/or a book title to reference (old or “new” – some 20th century book titles from local histories and bios are in Internet Archive’s files), search Internet Archive’s database (or do a regular online search and pick the Internet Archive link to look at), and one may be in luck. I admit, however, I don’t understand the newest feature about “borrowing” books. If I can’t download the file to peruse at my leisure or share with other people doing research or interested in our family’s history, it’s of no use to me. [I did go to the time, trouble, and expense of getting reprints of two of the more reliable and researched family history books written on my family via Higginson Books several years apart; it takes a few weeks because they don’t print a book without an order, but I got lucky and got very nice hardcover volumes, so to me it was worth it. YMMV.]
And let’s face it: the price is right at Internet Archive – their info is free and easily accessible and downloadable…!!! What more could one want? A genius IQ to sort through tangled pedigree lines in a population bottleneck…??? 😉
Happy Searching and Good Luck!

I just want to clarify one thing: The Wayback Machine is a specific “utility” at the Internet Archive (i.e., the Internet Archive is not known as the Wayback Machine). The Wayback Machine saves “snapshots” of web pages over the years. Say your local historical society used to have content on their website that you would like to see again. You can use the Wayback Machine to do so. It does have limits, though, since it can’t possibly save the complete functionality of old sites. It can be very handy, though, for finding “old” online resources and for deciphering who owned old websites with cryptic names!

Anyone can contribute to Internet Archive. I put up the books my great-grandparents wrote about their families and church cookbooks that are out of copyright. There are tools to convert PDFs to e-books.

In addition to finding family history in digitized books, it is also important to search for published information about individuals.
I was researching a man born in 1893 in St. Louis. He was one of nine siblings. His widowed mother died when he was nine. At eleven, he worked in a brick manufacturing plant and was paid $.50 per day. He slipped on grease on the floor and broke his right arm above the elbow. In a 1907 book about appealed court cases, his attorney sued to set aside a $500 award in favor of a larger settlement. He didn’t prevail.
The man’s identity was confirmed in a note on his WWII draft registration. The doctor wrote that he could not fully extend his right arm due to an injury.
I found the book of appealed court cases at Google Books, but I also search at Internet Archive.

I agree that the Internet Archive is a great source. They are a fantastic organization.
This book, and many others, are also available in the FamilySearch Digital Library.

There are many local histories and history journals as well. A valuable source, and it’s not trying to sell something, but it needs donations to keep going.

It should also be noted that Reclaim the Records uploads all the records they retrieve to archive.org.

What’s new on the bookshelf:
(genealogy) AND mediatype:(texts) AND scandate:(202009*)
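For anyone who would rather run that query from a script than paste it into the search box, here is a rough sketch in Python that turns it into a search URL. It assumes the Archive’s advancedsearch.php endpoint and its q, fl[], rows, and output parameters behave as I remember them, so check the Archive’s own documentation before relying on it. The function name is just an illustration.

```python
from urllib.parse import urlencode

def build_search_url(query, rows=50):
    """Build an Internet Archive advanced-search URL that asks for JSON.

    The endpoint and parameter names are assumptions based on the Archive's
    advancedsearch.php interface; verify them before relying on this.
    """
    params = [
        ("q", query),             # the search expression itself
        ("fl[]", "identifier"),   # fields to return for each hit
        ("fl[]", "title"),
        ("rows", str(rows)),      # how many results to return
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)

query = "(genealogy) AND mediatype:(texts) AND scandate:(202009*)"
print(build_search_url(query))
```

Opening the printed URL in a browser should list the matching items. Changing 202009 to another year and month looks at a different scanning period.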
