MyHeritage (the sponsors of this newsletter) today made an announcement at the Who Do You Think You Are? Live! conference in Birmingham, England, of a new technology that has not been used previously in genealogy, to my knowledge. So far, MyHeritage is the only company with a cloud-based service that AUTOMATICALLY finds matches for people in your family tree in the company’s collection of almost 450,000 digitized historical books. Yes, that’s nearly a half million books of interest to genealogists.
Lots of other web sites contain digital images of books, typically as PDF files. These sites allow you, the user, to find the books and then to read them yourself the old-fashioned way: one page at a time. Some of these web sites will search books electronically to find the words or phrases you specify. Some of them are every-word-searchable but, again, only one book at a time. The user has to enter the words or phrases one at a time in order to search a book. That is last year’s genealogy technology.
In contrast, MyHeritage’s new Book Matching technology searches for ALL the names of interest in your database at once in nearly a half million books and then notifies you of probable matches found within any of the books. The technology goes far beyond simple name matches, which we all know are close to useless for common names. Instead, Book Matching first reads the records in your personal MyHeritage database, then goes looking for entries in the books that match not only the name of the individual but also dates, locations, and even names of parents, spouse(s), siblings, and children.
If you are looking for an ancestor named John Smith, MyHeritage’s Book Matching technology will find all the John Smiths who lived in the same place and same years as your ancestor PLUS had a father named William AND a mother named Julia AND a sister named Helen AND a brother named Ebenezer AND a son named Gabriel AND a daughter named Julia (named after her mother) AND on AND on AND on… Obviously, I am making these names up as a typical example. The Book Matching technology uses the names in YOUR database to find matches. When it finds probable matches to your ancestor and his or her relatives as defined in your database, a notice is sent to you.
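To make the idea concrete, here is a minimal sketch in Python of why corroborating details matter more than the name alone. It is only an illustration under my own assumptions: the `Person` structure, the `match_score` function, the two-year tolerance, and the place names are all hypothetical, and nothing here is taken from MyHeritage's actual software.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class Person:
    """A person from a family tree or from a book (hypothetical structure)."""
    name: str
    birth_year: Optional[int] = None
    place: Optional[str] = None
    # Each relative is a (role, first name) pair, e.g. ("father", "William").
    relatives: Set[Tuple[str, str]] = field(default_factory=set)


def match_score(tree: Person, book: Person, year_tolerance: int = 2) -> int:
    """Count corroborating details; return 0 when the names differ."""
    if tree.name.lower() != book.name.lower():
        return 0
    score = 1  # the name itself matches
    if tree.birth_year is not None and book.birth_year is not None:
        if abs(tree.birth_year - book.birth_year) <= year_tolerance:
            score += 1
    if tree.place and book.place and tree.place.lower() == book.place.lower():
        score += 1
    # Every relative found in both records adds further evidence.
    score += len(tree.relatives & book.relatives)
    return score


# Made-up records, in the spirit of the John Smith example above.
tree_john = Person("John Smith", 1820, "Bangor, Maine",
                   {("father", "William"), ("mother", "Julia"), ("sister", "Helen")})
book_a = Person("John Smith", 1821, "Bangor, Maine",
                {("father", "William"), ("mother", "Julia")})
book_b = Person("John Smith", 1875, "Leeds, England", {("father", "Thomas")})

print(match_score(tree_john, book_a))  # name, year, place, and two relatives agree
print(match_score(tree_john, book_b))  # only the name agrees
```

Both book entries are for a John Smith, but only the first one agrees on the year, the place, and the parents, so it scores far higher. A real system would presumably also need to cope with spelling variants, nicknames, and missing data, which this sketch ignores.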
The real “magic” in this is that MyHeritage uses new software that utilizes artificial intelligence methods to convert free-form text into machine-searchable data. This is more difficult than you might first think.
Almost all genealogists are familiar with census records, tax lists, and other records that list names in nicely formatted columns of information. They look like spreadsheets with all the first names in one column, all the surnames in a second column, and so forth. In contrast, MyHeritage’s Book Matching technology searches books where the information is NOT listed in columns.
For instance, let’s think about the typical obituary. Again, it does not list the names of everyone in attendance in columns. Instead, the information is listed in a manner that closely resembles human speech.
In a typical obituary, the Book Matching technology searches for phrases such as “He was the son of…” and the software then knows those words are normally followed by a name or by two names: the father and mother of the deceased. The Book Matching technology extracts those names and indexes them. Also, the same obituary might say, “He is survived by his widow …” and “He is also survived by three children…” Again, these phrases are normally followed by names. Obituaries also often list the names of others in attendance, such as the deceased person’s siblings and even pallbearers, as well as friends or relatives of the deceased.
Even better, the same obituaries often mention significant events in the life of the deceased, such as “a graduate of Harvard University” or “served in World War I” and so on and so on. This data is analyzed, discovered, extracted, and indexed, all without human intervention. The number of errors in the OCR process appears to be very low. Not zero but certainly a very small percentage.
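For readers who like to see how such cue phrases could be turned into data, here is a rough sketch in Python using ordinary regular expressions. This is my own simplified illustration, not MyHeritage's method, which the company describes as artificial intelligence and semantic analysis. The `NAME` pattern, the `CUES` table, and the `extract_relations` function are hypothetical names I made up, and the sample obituary text is invented.

```python
import re

# Crude assumption: a name is two or more capitalized words in a row.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"

# Cue phrases that are normally followed by one or two names.
CUES = {
    "parents": re.compile(rf"[Hh]e was the son of ({NAME})(?: and ({NAME}))?"),
    "widow": re.compile(rf"survived by his widow,? ({NAME})"),
}


def extract_relations(text):
    """Return a dict mapping each role to the names found after its cue phrase."""
    found = {}
    for role, pattern in CUES.items():
        match = pattern.search(text)
        if match:
            # Drop the optional second name when it is absent.
            found[role] = [name for name in match.groups() if name]
    return found


obituary = ("He was the son of William Smith and Julia Smith. "
            "He is survived by his widow, Mary Smith.")
print(extract_relations(obituary))
```

Fixed patterns like these break down quickly on real books, where wording, punctuation, and OCR quality vary widely. That is a good reason to expect that the real technology is considerably more sophisticated than a handful of patterns.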
I am using an example here of obituaries simply because most genealogists are familiar with them. However, MyHeritage’s new Book Matching technology is NOT limited to obituaries. It can read almost all formats that are commonly found in books of genealogical interest. For each person mentioned in a book, the names, dates, locations, and other important information (military service, education, occupation, and more) are extracted, indexed, and saved.
Am I enthused about this new automated data analysis and extraction technology? You bet I am! I suspect a number of other genealogy web sites will follow and add similar technologies within a few years. However, today Book Matching technology is available only on MyHeritage.
The following is the official announcement from MyHeritage:
MyHeritage users to automatically receive relevant excerpts from digitized books that reveal information about their ancestors and relatives
TEL AVIV, Israel & LEHI, Utah, April 7, 2016 — MyHeritage, the fastest-growing destination for discovering, preserving and sharing family history, has launched today a revolutionary addition to its suite of technologies: Book Matching. This innovation automatically researches users’ family trees in historical books with high precision.
In April 2012 MyHeritage launched SuperSearch™, a search engine for historical records, which has since then grown to include 6.6 billion historical records, including birth, marriage, death and census records. By implementing its vision of enhancing genealogy with technology, MyHeritage then developed a line of unique and sophisticated technologies that automatically match the records from the search engine to the 32 million family trees uploaded by its users.
In December 2015, MyHeritage expanded its data collections to include digitized historical books, with an initial corpus of 150,000 books of high genealogical value. This collection was tripled last week to 450,000 books with 91 million pages. With a team of more than 50 dedicated curators, MyHeritage aims to add hundreds of millions of pages of digitized books to the collection each year.
As of today, MyHeritage users will receive matches between profiles in their family trees and the books from this collection. The Book Matching technology analyzes the book texts semantically, understanding complex narrative that describes people, and matches it to the 2 billion individuals in MyHeritage family trees with extremely high accuracy. This breakthrough technology is the first of its kind, and is exclusive to MyHeritage.
Book Matching has produced more than 80 million matches, and this number will continue to grow as the collection grows and as the family trees on MyHeritage continue to expand. Book Matching is currently available for English books, and the technology is being enhanced to cover additional languages. In addition, de-duplication technology is being added in the next few weeks to remove duplicate books that have been scanned and OCRed more than once by different sources.
“No one has ever done this before,” said MyHeritage Chief Technology Officer, Sagi Bashari. “Our Book Matching technology reads hundreds of thousands of books for you, every hour, comparing them to your family tree and pointing you to relevant excerpts about your ancestors with almost no false positives. MyHeritage is the first to offer full semantic text analysis in this way, and the genealogical breakthroughs speak for themselves. You will be amazed at the value of books for your research.”
“I’ve personally seen what this new technology can do, using my own family tree,” said blogger and lifelong genealogist Leland Meitzler. “It found well over 500 books with information on my family, most of which I’d never seen before. All kinds of ancestors and relatives can now be added to my tree! To say that this new search technology changes everything would be an overstatement, but not by much.”
Genealogist James Tanner said: “This advanced technology from MyHeritage opens up a whole new world of research possibilities that were almost completely unavailable in the past. I have always valued the content of the older genealogy books because the people who wrote them were contemporaries with my ancestors. Being able to search these books on a large scale will change the way most of us have been doing genealogy and our attitude towards the books that have been there all along but were not searchable.”
Dick Eastman, of Eastman’s Online Genealogy Newsletter, summed up MyHeritage’s latest innovation: “MyHeritage Book Matching is like having a huge library at your fingertips, with a twist; there is a magical librarian who tells you exactly which books have information about your ancestors.”
Book Matches are available at www.myheritage.com and are generated automatically for any family tree built on the website or imported into it. A Data subscription is required to view Book Matches.
MyHeritage is the world’s fastest-growing destination for discovering, preserving and sharing family history. As technology thought leaders, MyHeritage is transforming family history into an activity that’s accessible and instantly rewarding. Its global user community enjoys access to a massive library of historical records, the most internationally diverse collection of family trees and groundbreaking search and matching technologies. Trusted by millions of families, MyHeritage provides an easy way to share family stories, past and present, and treasure them for generations to come. MyHeritage is available in 42 languages. www.myheritage.com.