The following announcement was written by MyHeritage:
We are pleased to announce the publication of a huge collection of historical U.S. city directories, an effort that has been two years in the making. The collection was produced exclusively by MyHeritage from 25,000 public U.S. city directories published between 1860 and 1960. It comprises 545 million aggregated records, consolidated from 1.3 billion original records, many of which were similar entries for the same individual. This addition brings the total number of historical records on MyHeritage to 11.9 billion.
The new city directories collection on MyHeritage is a rich source of information for anyone seeking to learn more about their family in the United States in the mid-19th to mid-20th century. The directories contain valuable insights on everyday American life spanning the time period from the Civil War to the Civil Rights Movement.
What are City Directories?
Cities in the United States have been producing and distributing directories since the 1700s as an up-to-date resource to help residents find local individuals and businesses. City directories typically list names (and spouses), addresses, occupations, and workplaces. Sometimes they include additional information.
Thanks to their level of detail, city directories can serve as an alternative to U.S. census records in non-census years: federal censuses are taken only once every ten years, whereas city directories were in many cases published annually. They can also fill gaps where census records were lost or destroyed. In 1921, a fire at the U.S. Department of Commerce destroyed most of the records from the 1890 census. Much of that data can nevertheless be reconstructed using the 1890 city directories on MyHeritage, which consist of directory books from 344 cities across the country, including 88 of the 100 most populous cities that year.
Unique processing by MyHeritage
The city directories in this collection were published by thousands of cities and towns all over the U.S., and each directory is formatted differently. The huge amount of content and its variety made the project more challenging and required the development of special technology to process the city directories.
We first used Optical Character Recognition (OCR) to convert the scanned images of the directories into text. This process can result in errors in the output, and we created algorithms to detect and correct some of these errors.
Then, we needed to parse the records to identify the different fields in each record: names, occupations, addresses, and more. The differences in formatting between the books presented an additional challenge. Our team employed methods such as Named Entity Recognition (NER) and Conditional Random Fields (CRF) to train an algorithm using a per-book model, meaning that for each of the 25,000 books, we manually labeled a sample of the records and used it to teach the algorithm how to parse that directory. Using this model, the algorithm was able to parse the entire book into a structured index of valuable historical information.
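To give a feel for what a sequence tagger such as a CRF works from, here is a minimal sketch of per-token feature extraction. It covers only the feature step, not the training, and it is not MyHeritage's actual pipeline: the feature names, the function names, and the sample entry line are all assumptions made for illustration.

```python
def token_features(tokens, i):
    """Describe token i with simple features a sequence tagger could learn from."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "is_short": len(tok) <= 3,  # abbreviations such as "av" or "clk" tend to be short
        "position": i,
        # Neighbouring tokens give the model context about field order.
        "prev": tokens[i - 1].lower() if i > 0 else "<START>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }


def line_features(line):
    """Turn one directory line into a list of feature dictionaries, one per token."""
    tokens = line.split()
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hypothetical entry line, made up for this sketch.
for feats in line_features("Kiner Ralph M h5801 Yorkshire av"):
    print(feats)
```

In a per-book setup, feature lists like these would be paired with the manually labeled sample (for example surname, given name, address) to train one model per directory.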
In the example below of a city directory record for Ralph McPherran Kiner, an American Major League Baseball player and broadcaster, we see how our system overcame and corrected an OCR error. The incorrect address in the 1957 record is 55801 Yorkshire av, whereas the 1958 and 1960 records list the address as h5801 Yorkshire av, and the “h” implies that Ralph is the homeowner. We inferred that the first “5” in the first record was an OCR error and should actually be an “h”, and were therefore able to determine that Ralph lived at the same address during these years.
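The cross-year check described above can be sketched roughly as follows. This is a simplified illustration, not the production algorithm: the function name and the confusion table are assumptions, and only the "5" read in place of "h" comes from the example.

```python
from collections import Counter

# Assumed table of characters that OCR may produce in place of a status letter.
# Only the "5" -> "h" pair is taken from the example above.
OCR_CONFUSIONS = {"5": "h"}


def correct_address(address, other_years):
    """Fix the first character of an address if doing so makes it match
    the address most often listed for the same person in other years."""
    if not address or not other_years:
        return address
    most_common, _ = Counter(other_years).most_common(1)[0]
    if address == most_common:
        return address
    candidate = OCR_CONFUSIONS.get(address[0], address[0]) + address[1:]
    # Only accept the correction when it agrees with the other editions.
    return candidate if candidate == most_common else address


print(correct_address("55801 Yorkshire av",
                      ["h5801 Yorkshire av", "h5801 Yorkshire av"]))
```

Requiring agreement with other editions keeps the correction conservative: an address that genuinely starts with a 5 is left alone unless the neighbouring years contradict it.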
Consolidating records and creating a searchable index
After all the information was parsed, we consolidated the records in an unprecedented way. We identified records thought to describe the same individual who lived at one particular address over several years, as published in multiple editions of the city directories. We then consolidated all of those entries into one aggregated record that covers a span of years. This reduced “search engine pollution,” wherein a search for a person would have returned multiple, very similar entries from successive years, obscuring other records. The aggregation makes it easier to spot career changes, approximate marriage dates, re-marriages, and plausible death dates. To our knowledge, the algorithmic deduction of marriage and death events from city directories is unique to MyHeritage.
In the example below, we consolidated 31(!) records from the years 1912–1959 into a single record. Based on the information collected over the years, it is likely that Alfred and Mary Albert married circa 1914. We were also able to determine that Alfred died circa 1959.
The aggregated record also shows that Alfred changed his profession several times during these years, and he went from being a conductor to a carpenter to a motorman.
This is the power of consolidation: it converts many “dull” records into a single, rich biography that tells a life story!
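As a rough sketch of the consolidation idea, the snippet below merges yearly entries that share a name and address into one aggregated record. It assumes exact matching on both fields, which is far simpler than identifying records "thought to describe the same individual". The field names, the address, and the year attached to each occupation are made up for illustration; only the 1912–1959 span and the sequence of occupations come from the example above.

```python
from collections import defaultdict


def consolidate(records):
    """Merge yearly entries with the same (name, address) into aggregated records."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["name"], rec["address"])].append(rec)

    aggregated = []
    for (name, address), recs in groups.items():
        recs.sort(key=lambda r: r["year"])
        occupations = []
        for r in recs:
            occ = r.get("occupation")
            # Record an occupation only when it differs from the previous one.
            if occ and (not occupations or occupations[-1] != occ):
                occupations.append(occ)
        aggregated.append({
            "name": name,
            "address": address,
            "first_year": recs[0]["year"],
            "last_year": recs[-1]["year"],
            "occupations": occupations,
            "source_count": len(recs),
        })
    return aggregated


# Illustrative entries; the address and the individual years are hypothetical.
sample = [
    {"name": "Alfred Albert", "address": "1 Example st", "year": 1959, "occupation": "motorman"},
    {"name": "Alfred Albert", "address": "1 Example st", "year": 1912, "occupation": "conductor"},
    {"name": "Alfred Albert", "address": "1 Example st", "year": 1930, "occupation": "carpenter"},
]
print(consolidate(sample))
```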
Examples of challenging problems – and how we solved them
Many published city directories saved on typesetting (which was expensive) and paper by using a symbol, such as ditto marks or dashes, to indicate that multiple entries shared the same last name. Some entries continued onto a second line, while others occupied only one. The algorithm had to distinguish a surname from other text that often appears directly below it, such as the continuation of the previous entry.
For instance, in the example below, the record extraction algorithm successfully inferred that Bartsch is a surname and that the ditto mark in the next line also means Bartsch.
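A simplified version of the ditto-mark step might look like the sketch below. It assumes each line has already been isolated and starts with either a surname or a ditto symbol; the set of symbols, the function name, and the given names and occupations in the sample lines are assumptions for illustration.

```python
# Assumed set of symbols that stand for "same surname as the entry above".
DITTO_MARKS = {'"', "''", "-", "--"}


def expand_surnames(lines):
    """Replace a leading ditto mark with the most recent explicit surname."""
    expanded = []
    current_surname = None
    for line in lines:
        first, _, rest = line.strip().partition(" ")
        if first in DITTO_MARKS:
            if current_surname is None:
                raise ValueError("ditto mark found before any surname")
            expanded.append(f"{current_surname} {rest}")
        else:
            current_surname = first
            expanded.append(line.strip())
    return expanded


# Hypothetical lines; only the surname Bartsch comes from the example above.
print(expand_surnames(["Bartsch Aug lab", '" Chas clk']))
```

This sketch does not address continuation lines, which need a separate decision about where each record begins and ends.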
The algorithm also determines where a record begins and where it ends. For example, the record below spans one line:
This record, however, spans two lines:
If the algorithm hadn’t inferred this, we would have created an additional record for “Waller” and missed identifying it as the street name in the record about Wm F. While this process works very well, there are still some directories in which this type of record extraction is not 100% robust.
A table of common abbreviations appears at the beginning of each city directory, listing abbreviations for names, occupations, residence status, and addresses that are used throughout the book. The records are often hard to decipher without the use of the abbreviation tables.
To integrate the abbreviation tables into the collection, we manually keyed in the table from each book and used it to expand the abbreviations in the records.
Our handling of first-name abbreviations in this collection is particularly helpful: if you’re searching for a “Patrick”, we’ll find him for you even in records where he’s listed as “Patk”. You won’t have to think about all the possible ways to search for each name – we’ve got you covered!
In the following example, we’ve expanded the abbreviations for the occupation sten to stenographer, clk to clerk, the workplace Fla Natl Bank to Florida National Bank, and residence status r to rents. This improves readability and enables searching and matching to family trees with much higher accuracy.
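Abbreviation expansion of this kind can be sketched as a per-field lookup. The tables below contain only the abbreviations mentioned in this section; the field names and the function are assumptions, and a real table would be keyed in separately for each book, as described above.

```python
# Per-field tables, because the same letters can mean different things
# in different fields (for example "r" as a residence status).
FIRST_NAMES = {"Patk": "Patrick"}
OCCUPATIONS = {"sten": "stenographer", "clk": "clerk"}
WORKPLACE_WORDS = {"Fla": "Florida", "Natl": "National"}
RESIDENCE_STATUS = {"r": "rents"}

TABLES = {
    "first_name": FIRST_NAMES,
    "occupation": OCCUPATIONS,
    "workplace": WORKPLACE_WORDS,
    "status": RESIDENCE_STATUS,
}


def expand_record(record):
    """Expand known abbreviations word by word; unknown words are kept unchanged."""
    expanded = {}
    for field, value in record.items():
        table = TABLES.get(field, {})
        expanded[field] = " ".join(table.get(word, word) for word in value.split())
    return expanded


# The combination of fields in one record is illustrative.
print(expand_record({
    "first_name": "Patk",
    "occupation": "sten",
    "workplace": "Fla Natl Bank",
    "status": "r",
}))
```

Storing the expanded form alongside the original lets a search for "Patrick" match a record that was printed as "Patk".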
Important insights from the collection
Inferred life events
Consolidated city directory records enabled MyHeritage to automatically infer dates of marriage or death based on changes in the record data.
In the example below, Henry Bennett from Oakland, California most probably got married in late 1923 or early 1924, since the Oakland City Directory from 1924 lists Nancy as his wife. We therefore created a marriage event with Nancy, dated circa 1924 and clearly marked as implicit.
In the example below, Matthew and Sally Lewin are listed as spouses and reside together at 305 New Scotland Ave in Albany, New York until 1945. In the 1946 listing Sally appears as widowed, so we inferred that Matthew died circa 1946.
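The two inferences above follow the same pattern: compare consecutive editions and look for a change. Here is a minimal sketch of that pattern. The field names and event labels are assumptions, and the 1923 entry without a spouse is assumed to exist for the sake of the example.

```python
def infer_events(entries):
    """Infer approximate life events from changes between consecutive yearly entries."""
    entries = sorted(entries, key=lambda e: e["year"])
    events = []
    for prev, cur in zip(entries, entries[1:]):
        # A spouse appearing for the first time suggests a recent marriage.
        if not prev.get("spouse") and cur.get("spouse"):
            events.append({"type": "marriage", "circa": cur["year"],
                           "spouse": cur["spouse"], "implicit": True})
        # A change to widowed status suggests the spouse died around this year.
        if not prev.get("widowed") and cur.get("widowed"):
            events.append({"type": "death of spouse", "circa": cur["year"],
                           "spouse": prev.get("spouse"), "implicit": True})
    return events


henry = [
    {"year": 1923, "spouse": None},      # assumed: no spouse listed
    {"year": 1924, "spouse": "Nancy"},
]
sally = [
    {"year": 1945, "spouse": "Matthew", "widowed": False},
    {"year": 1946, "spouse": None, "widowed": True},
]
print(infer_events(henry))
print(infer_events(sally))
```

Every event carries an "implicit" flag so that inferred dates are never confused with dates stated in a source.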
Change in homeowner status
Throughout the records we can see if the person living at any address was a renter, denoted by an “r” in most records, if they were a boarder, denoted by a “b”, or if they were the homeowner, denoted by an “h”.
By following a consolidated record over the years, we could see if someone changed from renting to owning their home at the same address.
In this example, we see that James Thompson was a renter until 1921. Sometime between 1921 and 1923 he became the owner of his residence.
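Detecting such a change amounts to comparing status codes across the years of a consolidated record. A small sketch, assuming the record is available as (year, code) pairs; the function name and the output format are illustrative.

```python
# Status codes as described above.
STATUS = {"r": "renter", "b": "boarder", "h": "homeowner"}


def status_changes(entries):
    """Return (last_year_of_old_status, first_year_of_new_status, old, new) tuples."""
    changes = []
    ordered = sorted(entries)
    for (year0, code0), (year1, code1) in zip(ordered, ordered[1:]):
        if code0 != code1:
            changes.append((year0, year1, STATUS[code0], STATUS[code1]))
    return changes


# James Thompson: renter in 1921, homeowner by 1923.
print(status_changes([(1921, "r"), (1923, "h")]))
# prints [(1921, 1923, 'renter', 'homeowner')]
```

When an edition is missing, as 1922 is here, the change can only be placed within the gap between the two surrounding years.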
Finding others who lived at the same address
The city directories collection allows users to see who else has lived at the same address. Simply click on “See who else lived at this address” in the record page to run a search by address.
This feature can be useful for locating ancestors, descendants, or other family members of the person you are researching who lived at the same address in other periods. Often multiple generations of a family lived at the same address, or a family home may have been passed on from one generation to the next.
In the following example, James and Glenna Japhet lived at 623 W Olmos Drive in San Antonio, Texas.
When checking to see who else lived at the same address in city directory records, we see that aside from James and Glenna, another person with the last name Japhet is also listed in the directories as having lived at that address: a woman named Laverne Japhet.
It seems that Laverne is either James’ second wife or the same person as “Glenna L”. This opens new avenues for further research.
Searching the U.S. City Directories is free, but a subscription is required to view the records.
Users with a Data or Complete subscription can view the full records including the high-resolution scans of the original directories, confirm Record Matches, extract information from the record straight to their family trees, and view Related Records for the person appearing in a historical record they are currently viewing.
The U.S. City Directories collection on MyHeritage is a treasure trove for anyone searching for more information about their ancestors in the United States. We have worked very hard to prepare this collection for our users, and believe it is the smartest online U.S. city directory collection ever made. Over the next few months, we are planning to expand this important collection even further by publishing thousands of additional city directories. This addition will include directories from more cities, and directories published prior to 1860 and after 1960.