NOTE: This is an update to an article I wrote eight months ago. I have added new information about several new methods of converting PDF documents to text in Windows plus two methods that will work for everyone, including Macintosh and Linux users. I have now switched from the program I was using eight months ago to a new one that I feel is better.
Converting a document to PDF format is a simple and free process, as described in my earlier “Convert Documents to PDF” article. However, converting in the opposite direction used to be more difficult. Luckily, several software tools are now available that will convert PDF files back to the original formats, be it Word’s DOC files, Excel’s XLS files, PowerPoint’s PPT files, Notepad’s TXT files, or others. You can maintain fonts, keep colors, and even preserve tables.
Extracting original data out of a PDF file would have been almost impossible a few years ago. However, a number of products available today can do the job.
PDF files used to be considered "secure." That is, nobody could ever take your PDF document, import it into a word processor, and then use your data. However, that has now changed. In fact, I often use a program that easily converts PDF files to Microsoft Word files.
As an illustration of the changing security considerations of PDF files, here is an excerpt from an article that I wrote more than six years ago, in the February 25, 2002, edition of this newsletter:
By setting security options in Acrobat, the author can give his or her PDF documents a certain level of copy protection. One of the options available within the Adobe Acrobat program that creates PDF files will prevent users from copying text or images, effectively disabling the normal “copy-and-paste” functions. Other options prevent users from printing the document or changing the features that the author has set. You can even set a password to prevent viewing by would-be users who do not have the password. To be sure, anyone who can view a document can always re-type the information by hand. However, PDF files make it very difficult to electronically extract bits and pieces of information from within a document.
I should point out that this protection is not 100% guaranteed. In fact, sophisticated hackers have succeeded in “cracking” Adobe PDF files and extracting the original information. However, a lot of software skills are needed to “crack” a PDF file. Even owners of the Adobe software that creates PDF files cannot easily “crack” a PDF file created by someone else. Only a handful of people have ever managed to open a PDF file.
My, how the world has changed in six short years! Today there are a number of programs that will extract data from a PDF file. Adobe has since given up all ideas of protecting their file format. In fact, PDF now is an open standard and is becoming ISO 32000.
The first programs to appear for extracting data from PDF were difficult to use. A person had to be a techie with a lot of knowledge of the underlying technology in order to use most of them. Even then, the data extracted often lost its formatting or looked a bit "weird" after being extracted. With most of these programs, the user still needed to do a lot of "clean up" work.
In the past few years, several new programs have appeared that are easy to use and require little technical knowledge. You can now easily convert PDF files to Word files or other formats with a simple point-and-click.
I often use a program that is so easy to use that anyone who can use a word processor can convert a PDF file. There is no need for deep technical knowledge. You do not need Adobe Acrobat or Reader to use this new program. In addition, 99.9% of the text, formatting, and images are preserved. The converted file will look exactly like the PDF file, except that you can then edit it and add to it as you wish. In short, using this program is almost as simple as falling off a log.
I even used this program to convert an entire genealogy book of several hundred pages from a PDF file to Microsoft Word.
The remainder of this article is for Plus Edition subscribers only.
If you have a Plus Edition user ID and password, you can read the article right now at no additional charge in this web site's Plus Edition at http://plus.eogn.com. If you do not remember your Plus Edition user ID or password, you can retrieve them at the same place: http://plus.eogn.com.
If you decide to subscribe to the Plus Edition right now, you will be able to immediately read this article online.
For more information about subscribing to the Plus Edition of Eastman's Online Genealogy Newsletter, visit http://blog.eogn.com/eastmans_online_genealogy/plusedition.html.