(+) OCR Explained

The following is a Plus Edition article written by and copyright by Dick Eastman. 

Do you have a document or even a full-length book that you would like to enter into a computer’s database or word processor? You could re-type the entire thing. If your typing ability is as bad as mine, that will be a very lengthy task. Of course, you could hire a professional typist to do the same, but that is expensive.

We all have computers, so why not use a high-quality scanner? You will also need optical character recognition (OCR) technology.

OCR is the technology long used by libraries and government agencies to make lengthy documents available electronically. As OCR technology has improved, it has been adopted by commercial firms, including Archive CD Books USA, Ancestry.com, ProQuest (producers of HeritageQuest Online), Google Books, and many other companies.

For many purposes, OCR is the most cost-effective and speedy method available. OCR is much better and cheaper than hiring an army of clerk typists. In some cases, you may be able to have an image of a document converted to text free of charge by using OCR services “in the cloud.” OCR does, however, have drawbacks.

OCR is actually the second step in the conversion process. The first step is to scan the document or book in question, much the same as you would scan a photograph. The scanner converts each printed page to a bitmap file, a pattern of dots that together make up an electronic image of the page. Software that comes with the scanner stores the file on the computer’s hard drive in TIFF, JPG, or some other image format.
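To make the "pattern of dots" idea concrete, here is a minimal sketch in Python. The 5x5 grid, the letter shape, and the function names are all made up for illustration; a real scanned page has millions of dots and is stored in a format such as TIFF or JPG rather than as a Python list.

```python
# A tiny 1-bit bitmap: 1 = dark dot (ink), 0 = light dot (paper).
# The grid size and the shape (a rough capital "A") are illustrative only.
PAGE_FRAGMENT = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]


def render(bitmap):
    """Return the bitmap as text, one line per row of dots."""
    return "\n".join(
        "".join("#" if dot else "." for dot in row) for row in bitmap
    )


def count_dark_dots(bitmap):
    """Count how many dots in the bitmap are dark."""
    return sum(sum(row) for row in bitmap)


print(render(PAGE_FRAGMENT))
```

At this stage the computer holds only dots. Nothing in the grid says "this is the letter A," which is why a separate recognition step is needed.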

Next, specialized optical character recognition (OCR) software is used to examine every word in the image and convert it to text. Older OCR software would compare the individual letters in a stored image against stored bitmaps of specific fonts.
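The comparison against stored font bitmaps can be sketched roughly as follows. This is a simplified illustration, not the method of any particular OCR product: the 3x5 letter shapes, the three-letter "font," and the names `parse`, `mismatches`, and `recognize` are all assumptions invented for the example. It counts how many dots differ between a scanned letter and each stored template, then picks the template with the fewest differences.

```python
def parse(rows):
    """Turn rows of '#' and '.' into a grid of 1s (dark) and 0s (light)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


# Stored bitmaps for a hypothetical three-letter font.
TEMPLATES = {
    "I": parse(["###", ".#.", ".#.", ".#.", "###"]),
    "L": parse(["#..", "#..", "#..", "#..", "###"]),
    "T": parse(["###", ".#.", ".#.", ".#.", ".#."]),
}


def mismatches(a, b):
    """Count the dots that differ between two same-sized bitmaps."""
    return sum(
        x != y
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the letter whose stored bitmap differs least from the glyph."""
    return min(TEMPLATES, key=lambda letter: mismatches(glyph, TEMPLATES[letter]))


# A scanned "T" with one stray dark dot, such as a speck on the page.
scanned = parse(["###", ".#.", ".#.", "##.", ".#."])
print(recognize(scanned))
```

Because the match depends on the stored shapes, an approach like this only works for fonts and sizes it already has bitmaps for.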

The remainder of this article is for Plus Edition subscribers only. SUBSCRIBE NOW to read this article.

If you have a Plus Edition user ID and password, you can read the full article right now at no additional charge in this web site’s Plus Edition at http://eogn.com/wp/?p=31639. This article will remain online for several weeks.

If you do not remember your Plus Edition user ID or password, you can retrieve them by going to http://www.eogn.com/wp/ and clicking on “Forgot password?”

If you decide to subscribe to the Plus Edition right now, you will be able to immediately read this article online.

For more information about subscribing to the Plus Edition of Eastman’s Online Genealogy Newsletter, visit http://blog.eogn.com/subscribe-to-the-plus-edition.
