Google Books Reduces its Digitizing and Preservation of old Books while Internet Archive Increases its Efforts at the Same Thing

An article in The Message states that Google is reducing its efforts at digitizing old books. That certainly is a loss for genealogists, historians, and many others. In what appears to be an unrelated move, the Internet Archive is INCREASING its efforts at digitizing old books, adding 1,000 books to the online collection EACH DAY. Perhaps there is hope for genealogists after all.

In 2004, Google Books signaled the company’s intention to scan every known book, partnering with libraries and developing its own book scanner capable of digitizing 1,000 pages per hour. Since then, the company has digitized millions of old books, creating a valuable archive. Google Books is still online, but has curtailed its scanning efforts in recent years, likely discouraged by a decade of legal wrangling still in appeal. The Google Books Blog stopped updating in 2012 and the Twitter account has been dormant since February 2013.

In contrast, the Internet Archive, a non-profit organization, has created one of the world’s largest open collections of digitized books, over 6 million public domain books, and an open library catalog. The digitized books available from the Internet Archive also are available in many more formats than those from any other online service, including PDF, Kindle, EPUB, and more. Of course, you can also read any book simply by displaying it on your screen in a web browser.

The Internet Archive has also digitized 1.9 million videos, home movies, and 4,000 public-domain feature films. It has also added 2.3 million audio recordings, including over 74,000 radio broadcasts, 13,000 78rpm records, 1.7 million Creative Commons-licensed audio recordings, and more than 137,000 concert recordings, nearly 10,000 of them from the Grateful Dead alone. Other items added to the FREE online archives include more than 10,000 audiobooks from LibriVox, 668,000 news broadcasts with full-text search, and the largest collection of historical software in the world.

The Internet Archive also offers scanning services. The non-profit offers FREE and open access to scan complete print collections, with 1,500 books scanned daily. Best of all, the scanning of books is performed in a non-destructive manner. That means there is no need to cut the bindings off the books before scanning. The Internet Archive either operates or partners with 33 scanning centers on 5 continents.

You can read more about the demise of Google Books and the rise of the Internet Archive at http://goo.gl/DFYq7W. The Internet Archive may be found at http://archive.org. Information about the Internet Archive book digitization efforts may be found at http://archive.org/scanning.

My thanks to newsletter reader Doris Wheeler for telling me about the business shift in Google Books.

23 Comments

I use the Wayback Machine at the Internet Archive at least several times a week, due to the ever-growing collection of dead links in my bookmarks folder. I usually find what I’m looking for there, with rare exceptions. It is truly THE library of the internet.

I will take Internet Archive anytime over Google if there’s a copy. Quality is very high.

Google Books obviously doesn’t make any money for them.

It might have helped if it were halfway competent. In the Internet Archive, if I find a book, I can read it. In Google Books, there might not be any view at all or there might be only a snippet view. I have seen a number of references in mailing lists to books published back in the 1800s that are visible to users of Google Books in the USA – I can’t see anything inside them from the UK. Either Google couldn’t be bothered to find out what the copyright rules were for these books (unlike other books from the same era that are visible in the UK) or they just couldn’t be bothered.

Oh – don’t forget HathiTrust http://www.hathitrust.org/ for archived books.

    Hathi Trust is the only one that has the one family history book I’m missing. There is no download button…, but there is a message to subscribe (to apparently use their books only online?), and that one must be connected to an institution to use their facilities.

    I don’t understand it.

    Internet Archive is the best and easiest to use (but NOT the OCR text – I wish they’d get someone to clean up all those errors). I just go straight for the books (genealogy, vital statistics, history), usually download them, and bookmark the interesting volumes.

    Google is not so “free” about turning up links to their free downloads; they usually show only the ones they are reprinting for a profit now, even though the copyrights have long since expired.

One of the problems with Google books: they do not make all their books available, even those long out of copyright. For many books, all we can see is a description. Access to the content seems reserved for these companies specialized in print-on-demand service…

I can certainly attest to how fantastic some of these books are from Internet Archive and Google Books…, and they DO have certain family genealogy volumes online for free that are staples of Colonial New England family history (caveat: some of the details in earlier works are wrong, other books have corrected previous errors, so newbies need to be aware of that little factoid).

Between the time I got my first PC and before Google Books plans were announced, I splurged and got a nice hardcover reprint of a nineteenth century volume for one of my family lines for $50. A few years ago I found the same work in microfilm on Google Books and not too much later, the same work in colored digital format online through Internet Archive (I am an enthusiastic fan of colored digital images since they’re easier to read than dark or light microfilm images). I downloaded each and added them to backup files on jump drives I reserve exclusively for downloaded books, and keep links to both in my files. On a name search I found a letter to the editor written by my grandfather in 1915 in a magazine that is no longer published.

Another fluke of a search for something else came up with three or four nineteenth century Swedish-English translating dictionaries, so one sees both Gothic typeface and plain typeface. Translating dictionaries are handy for genealogy researchers who do research in other countries.

I happened to run across a mention of a book about King Philip’s War (Increase Mather wrote the book, Cotton Mather authored another section), and it has the old style printing where f replaces s. It makes for interesting reading. [I had both maternal and paternal ancestors who were there. One came away with PTSD, the other was killed and his son was executed for treason.]

One can never have too much knowledge! 🙂

    Incidentally, it isn’t a letter “f” that replaces some occurrences of the letter “s”. If you look really closely, the “f” has a cross-bar that goes across the vertical, but the ‘cross-bar’ on the long “s” is only on the left of the vertical. That’s because it’s actually part of a single stroke of the pen that goes from the bottom left of the ‘cross-bar’, up through the curled top, to the right-hand top of the curl.

    Bev,
    Can you remember how you pulled up the 19th century Swedish-English dictionaries? The earliest I could pull up on a direct search was one from 1900. Thanks,
    Martha

    Martha –

    I included the titles, authors, year of publication, and links in a very long comment to Dick’s posting on 17 March 2015 which sort of ended up being a kind of primer for Swedish Genealogy Research 101 (other links with online dictionaries, maps, etc.).
    https://blog.eogn.com/2015/03/17/arkivdigital-in-sweden-is-having-another-free-weekend-this-weekend/

    The short answer about the translating dictionaries (copy-paste from that comment):
    If you understand the Gothic alphabet, this is a free Google Book to download – words in Swedish (Gothic typeface), definitions in English:

    Swedish and English pocket dictionary, 1829:
    http://books.google.com/books?id=BMEDAAAAQAAJ

    Svensk-engelsk ordbok [Swedish-English Dictionary; Carl Gustaf Björkman, 1889]:
    http://books.google.com/books?id=ZXstAAAAYAAJ

    Svensk-Engelsk hand-ordbok [Swedish-English hand dictionary; Victor Emanuel Öman, 1872]:
    http://books.google.com/books?id=oOAFAAAAQAAJ

    Svenskt och engelskt lexicon [Swedish and English Dictionary; Gustaf Widegren, 1788]:
    http://books.google.com/books?id=hIECAAAAQAAJ

    If you want to know what words sound like when pronounced, copy-paste or type them into the Text box, select the Language and click Say It:
    http://www.oddcast.com/home/demos/tts/tts_example.php
    ~~~~~~~~~~~~~
    I downloaded the books and also bookmarked the links; I have a file folder for Books in my browser.

    When I want some full-immersion language sounds for Swedish or Danish, I watch the various movies and TV shows on Hulu (I haven’t found any Norwegian shows on Hulu yet, but in the early ’80s I did study Norwegian for two years – only to find out twenty years later that their pre-1910 records are in Dano Norsk – long story that started ca 1347-49). I’m always pleased when actors speak slowly enough that I don’t have to rely on the closed captioning. I know the common genealogy terms for finding data in their records, spelling differences or similarities after all these years. Of the three Scandinavian languages, Danish is the most difficult to understand (for me) because it sounds the most Germanic and guttural to my ears.

    Have fun! 🙂

    Bev,
    Thanks for replying. I hadn’t looked at the original blog item about AD free days since I get a notice from them. I have a number of modern Swedish dictionaries, but when researching the 17th to early 19th centuries, I often run into words that are not in the modern dictionaries. I bought a wonderful Swedish-French dictionary from the mid-1800s at a college book de-acquisition sale, but unfortunately only the first volume was there. These will be a big help. I studied Swedish for two years and have done a lot of Swedish research. There are three other web sites that might be of interest to you: one free, one with a very modest fee, and one another database with some differences from AD. They are:
    Anbytarforum, aforum.genealogi.se/discus (free, but you must register to post questions), where you can ask questions about people and places and get reading help with Gothic script and undefined words;
    DIS, http://www.dis.se, a secondary source where people have contributed their genealogy and where you may encounter relatives; and
    SVAR (Svensk Arkivinformation), http://www.svar.ra.se, which is a complete database of parish registers and hfl and also has early tax records from 1642, which sometimes will help extend a lineage before the parish examination rolls (hfl) begin. It also has National Census records from 1880 to 1910 or 1920 for most provinces. If you are near a Family History Center where they have Arkiv Digital online, a subscription to SVAR might be of more value to you. (Sorry, I can’t seem to find a way to make these addresses links here.)
    Norwegian detective films are broadcast here from time to time, so maybe you’ll be getting them on Hulu soon.
    Happy hunting!
    Martha

Sad news … I find most of the Internet Archive ‘stuff’ so full of weird symbols and bad formatting that it is unreadable …

    I suspect you’re looking at the text version, which is the wrong side of the OCR problems that Bob mentions. Google’s text versions will be equally bad. What you need to do is bring up the scanned images of the pages. It’s unfortunate that a search seems to land you on IA’s text version, rather than the corresponding scan image.

    Lynn, that’s probably the text version you’re looking at. If you go to the top of the page and click the title of the book, it will bring you to the main screen. You can then view an online scanned version of the book. If you scroll down, it will give you a bunch of different ways to download the book that looks like this:
    ABBYY GZ
    DAISY
    EPUB
    FLIPPY ZIP
    FULL TEXT
    KINDLE
    PDF
    SCAN FACTORS
    SINGLE PAGE ORIGINAL JP2 TAR
    SINGLE PAGE PROCESSED JP2 ZIP
    TORRENT

This is part of an article I’m writing, but has relevance here:
Books/Newspapers/Periodicals (Printed Matter) – There are many Internet sources for this kind of data: Mocavo, FultonHistory, Google Books, and Archive.org, to name a few. Notwithstanding that the information they contain could be wrong, there is a basic problem in finding the data in the first place. There are 3 issues: the quality of the material to be scanned, the quality of the scanner, and the quality of the OCR (Optical Character Recognition) software. I guess a fourth issue is the quality of the person(s) doing the work. The way it works is that each page of a book is scanned by a device that makes a picture of the page, similar to a camera. Then computer software attempts to turn the text on each page into searchable text. In a perfect world, every letter of every word is converted correctly. But even at 99.9% accuracy, the hundreds of thousands of letters in a book can leave a lot of words incorrectly converted. Let’s say there are 100,000 words, each with 6 characters. That is 600,000 characters, so at 99.9% accuracy about 600 characters, and therefore up to 600 words, could be wrong. Of course, one of those words is your ancestor’s name. Here’s a way to improve your odds. Assuming you find the book, check to see if there’s an index. Since most of the scanned books are old, because of copyright laws, and are difficult to scan clearly, I think you’ll find more instances of your ancestor’s name through the index than you can find by on-line searching of the book. Mocavo permits multiple wildcards in its search functionality; it really helps with these issues.
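
For anyone who wants to check that arithmetic, here is a rough Python sketch. It assumes every character is misread independently at the same rate, which real OCR does not do, and the function names and the sample word list are made up for illustration. The wildcard part uses Python's shell-style patterns from the standard `fnmatch` module as a stand-in; it is not Mocavo's actual search syntax.

```python
import fnmatch


def expected_char_errors(words, chars_per_word, accuracy):
    """Expected number of misread characters in a scanned book."""
    total_chars = words * chars_per_word
    return total_chars * (1.0 - accuracy)


def word_read_correctly_probability(chars_per_word, accuracy):
    """Chance that every character of one word is read correctly,
    assuming (simplistically) that character errors are independent."""
    return accuracy ** chars_per_word


def wildcard_hits(ocr_words, pattern):
    """Return the OCR'd words matching a shell-style wildcard pattern,
    where '?' stands for any single character and '*' for any run."""
    return [w for w in ocr_words if fnmatch.fnmatchcase(w, pattern)]


# The figures from the paragraph above: 100,000 words of 6 characters
# each, converted at 99.9% character accuracy.
print(round(expected_char_errors(100_000, 6, 0.999)))  # 600

# Hypothetical OCR output in which a surname was misread in two places.
sample = ["Wheeler", "Whee1er", "Wheeier", "Walker"]
print(wildcard_hits(sample, "Whee?er"))
```

Under the same simplifying assumption, `word_read_correctly_probability(6, 0.999)` comes out at about 0.994, so any single 6-letter name is usually read correctly; the trouble is that a book contains so many words that a few hundred bad ones are still expected, and a wildcard in place of a doubtful letter catches some of them.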

    This is why one downloads the book – in colored digital images or in microfilm images and hope the latter’s images are not too dark or too light. Unfortunately, it must be in pdf files (I don’t have the newest high-tech gadgets or ‘readers’ so I don’t know how the other formats work), and pdf is one format I loathe.

    Ignore the bad OCR text formats unless you love editing and correcting what should have been edited and corrected by someone else before it was put online. Instead, do your own transcribing, and be sure you check for your own typos and misspellings.

Old news. As the link below explained several years ago, Google has been running out of books to scan and has pretty much completed its contracts with participating libraries. A slowdown for those reasons is hardly a “demise”.
http://chronicle.com/article/Google-Begins-to-Scale-Back/131109/

    —> Google has been running out of books to scan and has pretty much completed its contracts with participating libraries.

    I suspect your statement is true in regards to Google’s contractual obligations. However, the company never fulfilled its initial promise to “digitize all the world’s books by the year 2020.”

    I just looked at the calendar. It’s still only 2015.

With respect to the OCR problem, there is Project Gutenberg (http://www.gutenberg.org) and its associated volunteers, who proofread the OCRed text of digitized books to create the eBooks (Distributed Proofreaders, http://www.pgdp.net/c/).

DEBORAH H DAMELIO March 29, 2015 at 5:17 pm

Re: HathiTrust
It’s very easy to register, no cost, instant login.
Yes, it’s a bummer not being able to cut and paste, but on the left of a page showing text, there are options to download the whole book (if full view) or a page, or a number of pages, so I think those are great options.

(USA) I would choose Google’s display way over Internet Archive, although the latter certainly has more books. Searching in G is a cinch. Copying is a cinch. IA’s print is hard to read and nearly impossible to search. I’ve run into many books that I’ve searched and it won’t even find a name that’s in the book. Indexes are great, if they are correct. I always use the page scan. If you get an OCR book in G, you can change it to page scan through the “A” options.
I’ll try the color in IA to see if it’s any better.

To select passages to copy, just use the Snipping Tool in Windows; type “snip” into the search box and it will pop up. I have mine pinned to my menu for ease of use. I love it. If I have various items I want to snip and save, I put each snip in my Word document and save them all at one time.

Reblogged this on On Granny's Trail and commented:
I love using Internet Archive, so this is great news on that front.

Excellent article, thanks for sharing.
