A newsletter reader asked a question that I receive frequently. Here is a (slightly edited) copy of her message:
“I’d love to know how you handle the thousands of .JPG images of genealogy document scans and how to attach sources to them. I tried copying my .JPGs into Word, adding a title and source as text boxes. It was easy enough, but Word degraded the .JPG image so much that writing from earlier documents was almost unreadable. I’m trying it now in PowerPoint files with much better luck. I maintain .JPG integrity, can add titles and sources, and have multiple pages. I can copy the .JPG into other formats or convert the file into a .PDF. I would still love to know what you use before I get too involved in this format.”
I did answer her in email, but I thought I would also share my answer here in case others might have the same questions:
You can find dozens of methods of storing and labeling files. In fact, there are several sophisticated document management systems available for Windows, Macintosh, and Linux that are designed just for that purpose. These programs are popular in corporations that need to keep track of thousands, perhaps millions, of documents. However, most of these programs cost more money than I care to spend. Also, with most of these software solutions, you are “locked into” the producing company’s methodologies more or less forever. Converting from one company’s (older) solution to a newer solution provided by a different company can be a complex procedure.
My needs are a bit simpler. I have about 30,000 pictures and documents stored on my hard drives and find that I can locate almost any of them within seconds, should I need to. My method is free and very flexible. I also believe I could easily convert it to a different system in the future, should I ever decide to do that.
First, let me emphasize that I don’t believe there is any “perfect way” to file any kind of document or image. As long as the method you choose works for you, it is a good method. However, I do find it helps to be consistent. My method certainly is simplistic, but it works well for me. It may or may not work for someone else who has more demanding requirements.
To begin with, I don’t focus only on images. I use the same filing system for ALL my documents, including newsletter articles, checkbook records, tax records, receipts, insurance papers, the electric bill in yesterday’s mail, pictures taken with my cell phone’s camera, scanned images, and most other items I create. I mix images, Word documents, text files, and more all together, grouping them by topic, not by file type.
I am not afraid to use subdirectories of subdirectories of subdirectories, sometimes going four, five, or more levels deep into the file tree. In my mind, this is similar to using physical filing cabinets in which every cabinet and every drawer has a number or a name. Logical file and folder names lead to an easily retraced path to whatever document I need at the moment.
I put a lot of effort into folder and file names, trying to make them as descriptive as possible. For instance, here are some of my folder names:
NOTE: These examples use Macintosh and Linux file names. Windows users could use the same names, except that the slashes go in the opposite direction (\).
/Dropbox/genealogy/reunions/1958 Eastman family reunion/
/Dropbox/genealogy/Eastman/1842 deed for Washington Harvey Eastman/
/Dropbox/genealogy/1880 U.S. census records/
/Dropbox/genealogy/1880 U.S. census records/Maine/Bangor/
/Dropbox/vacations/2010 genealogy cruise/
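For readers who like to script their filing, Python’s standard pathlib module can build these same folder names in either convention, so the slash direction takes care of itself. Here is a minimal sketch using the Bangor folder from the list above; the C: drive letter on the Windows side is an assumption for illustration only:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Folder names from the article's Bangor example, one list item per level.
parts = ["Dropbox", "genealogy", "1880 U.S. census records", "Maine", "Bangor"]

# "Pure" paths only build names; they never touch the disk, so this runs anywhere.
mac_or_linux = PurePosixPath("/", *parts)
windows = PureWindowsPath("C:\\", *parts)  # assumption: files live on drive C:

print(mac_or_linux)  # /Dropbox/genealogy/1880 U.S. census records/Maine/Bangor
print(windows)       # C:\Dropbox\genealogy\1880 U.S. census records\Maine\Bangor
```

In everyday scripts, plain pathlib.Path uses whichever convention matches the computer the script runs on, so you rarely need to think about slash direction at all.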
I generally try to make file names equally descriptive, and I don’t hesitate to make long file names. Here are some examples of file names, including the directory path to each file:
/Dropbox/genealogy/1880 U.S. census records/Maine/Corinth/Page 24.JPG
/Dropbox/Income Taxes/2016/Federal/Complete return.PDF
/Dropbox/appliances/Samsung television/owners manual.PDF
You can see the pattern in the above examples.
I also try to digitize every scrap of paper in my possession. My goal is to keep no paper documents at all. Once a document has been scanned, I usually destroy the original unless it is something of unique value. Telephone bills received in the mail, my own handwritten notes from research trips, and similar documents that have no special value to me are destroyed as soon as I scan them. However, I do save all old original documents and anything else that I think is valuable to me or possibly to someone else. If it is an original wedding certificate or something similar, I either save it myself or give it to someone else who cares about that document.
So far, I have achieved about 95% success with my digitization efforts, but I still must keep a few things on paper, such as automobile titles, my driver’s license, my passport, birth certificates, and similar documents.
One group of files that requires specialized file names is in my Receipts folder. These are digitized receipts that I save for income tax purposes. I try to add the date of the expense and the dollar amount to the file name, such as:
/Dropbox/Income Taxes/2018/Receipts/2018-07-12-$124.95-Southwest Airlines ticket to Denver.PDF
/Dropbox/Income Taxes/2018/Receipts/2018-08-18-$34.10-Prescription for Metoprolol
By writing the date in YYYY-MM-DD format, such as 2018-08-18, I ensure the files always display in chronological order when sorted by name, whether in the Finder on a Mac or in Windows Explorer on Windows. I also add the dollar amount to the file name because I find that speeds things up at tax time, when I only need to see the amount and payee of each receipt and do not need to open each receipt’s image. Of course, the complete images of all receipts are still available in case of an IRS audit.
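If you create a lot of receipt names, a small script can keep the pattern consistent. Here is a minimal Python sketch; the function name receipt_name and the default .PDF extension are my own assumptions, not part of any program. It also shows why the YYYY-MM-DD form matters: sorting the names alphabetically puts them in date order.

```python
from datetime import date
from decimal import Decimal


def receipt_name(day: date, amount: Decimal, payee: str, ext: str = "PDF") -> str:
    """Build a name such as 2018-07-12-$124.95-Southwest Airlines ticket to Denver.PDF"""
    # date.isoformat() always gives zero-padded YYYY-MM-DD, so sorting the
    # names alphabetically also sorts them chronologically.
    # Decimal avoids floating-point rounding surprises; :.2f keeps two decimals.
    return f"{day.isoformat()}-${amount:.2f}-{payee}.{ext}"


names = [
    receipt_name(date(2018, 8, 18), Decimal("34.1"), "Prescription for Metoprolol"),
    receipt_name(date(2018, 7, 12), Decimal("124.95"), "Southwest Airlines ticket to Denver"),
]

for name in sorted(names):
    print(name)
# 2018-07-12-$124.95-Southwest Airlines ticket to Denver.PDF
# 2018-08-18-$34.10-Prescription for Metoprolol.PDF
```

A date written as 7/12/2018 would not sort this way, because "10/1/2017" comes before "7/12/2018" alphabetically even though it is later in the year.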
NOTE: The Internal Revenue Service does not care about seeing ORIGINAL receipts. In case of an audit or other requests, the Internal Revenue Service PREFERS electronic copies or, if no other choice is available, photocopies. The tax examiners do not want to wrestle with hundreds of different-size pieces of paper. Looking at PDF files or JPG files is much more convenient for the tax examiner. Details may be found at: https://blog.expensify.com/2010/03/02/electronic-receipts/.
Dollar signs in file names work well on a Macintosh but can cause trouble on Windows. Windows uses a trailing $ in network share names (such as C$) to mark hidden or administrative shares, and some Windows scripting tools, such as PowerShell, treat the dollar sign as a special character. I would suggest that Windows users avoid dollar signs. However, even Windows users can write the file names without the dollar sign, such as:
/Dropbox/Income Taxes/2018/Receipts/2018-07-12-124.95-Southwest Airlines ticket to Denver.PDF
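At tax time, the same pattern can be read back out of the file names without opening a single image. The following sketch (the regular expression and the parse_receipt name are hypothetical, written just for this example) accepts names with or without the dollar sign, with or without a file extension, and totals the amounts. It assumes amounts are written without thousands separators (1234.56, not 1,234.56).

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical pattern for names like
#   2018-07-12-$124.95-Southwest Airlines ticket to Denver.PDF
#   2018-07-12-124.95-Southwest Airlines ticket to Denver.PDF   (no dollar sign)
# The dollar sign and the 2-4 character file extension are both optional.
RECEIPT = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})-\$?(?P<amount>\d+\.\d{2})-(?P<payee>.+?)(?:\.[A-Za-z0-9]{2,4})?$"
)


def parse_receipt(name: str) -> Optional[tuple[str, Decimal, str]]:
    """Return (date, amount, payee), or None if the name doesn't fit the pattern."""
    m = RECEIPT.match(name)
    if m is None:
        return None
    return m["date"], Decimal(m["amount"]), m["payee"]


receipts = [
    "2018-07-12-124.95-Southwest Airlines ticket to Denver.PDF",
    "2018-08-18-$34.10-Prescription for Metoprolol",
]

total = Decimal("0")
for name in receipts:
    parsed = parse_receipt(name)
    if parsed:
        day, amount, payee = parsed
        print(day, amount, payee)
        total += amount
print("Total:", total)
```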
I don’t use dates in many other file names, only in items where dates are very important, such as receipts that I save for tax purposes. I do sometimes add dates to file names of photographs in order to record the date each photo was taken.
Nothing is ever perfect. In a few cases, I might store duplicate copies of files. However, I try to minimize duplicates whenever possible.
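If you want to hunt for duplicates, one common approach is to compare a checksum of each file’s contents, so copies are found even when their names differ. Here is a minimal Python sketch, assuming SHA-256 checksums; find_duplicates is a hypothetical helper, not a feature of Dropbox or of the operating system:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in 1 MB chunks so big scans stay small in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files anywhere under root whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_digest[file_digest(p)].append(p)
    # Only checksums shared by two or more files indicate duplicates.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

For example, find_duplicates(Path.home() / "Dropbox") would return groups of identical files under a Dropbox folder in your home directory. Deciding which copy to keep is still up to you.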
I typically leave the images in whatever format they were created in, be it .JPG or .PDF or some other file type. Common image file formats can always be converted later to almost any other format if you have a specific need for a particular file format.
Some people ask how to add descriptive metadata to a digital image. I never do that. I don’t see much need to combine an image and the notes about that image into one file. Instead, I have a simple method for storing notes about images. When I wish to save notes about a particular image, I type them into a plain text (.TXT) file and store it in the same folder as the image, with the same file name as the image except for the file name extension.
For instance, I might store an image of an 1880 census record as a .JPG file and the notes about that image stored as a text file with (nearly) the same name and filed in the same folder:
The image might be saved as:
/Dropbox/genealogy/1880 U.S. census records/Maine/Corinth/Page 24.JPG
and the accompanying text notes are then stored as:
/Dropbox/genealogy/1880 U.S. census records/Maine/Corinth/Page 24.TXT
The two file names are identical except that the image has a .JPG file extension while my notes have a .TXT file extension. I prefer to use ASCII text notes (.TXT), although I could use Word documents (.DOCX) or any other word processing format. I think that .TXT files will still be around long after .DOCX files have been replaced by something newer, so I prefer .TXT files. However, that’s based solely on guesswork and my personal preferences. You may prefer a different format.
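Because the notes file differs from the image only in its extension, a script can find it directly, or list the images that don’t have notes yet. Here is a short Python sketch; the set of image extensions and the helper names are assumptions for illustration. Note that Linux file names are case-sensitive, so Page 24.TXT and Page 24.txt are different files there, while a default Mac or Windows drive treats them as the same file.

```python
from pathlib import Path

# Assumption: which extensions count as "images" is up to you.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".pdf"}


def notes_path(image: Path) -> Path:
    """Page 24.JPG -> Page 24.TXT in the same folder (only the extension changes)."""
    return image.with_suffix(".TXT")


def images_without_notes(folder: Path) -> list[Path]:
    """List images in one folder that do not yet have a matching .TXT notes file."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and not notes_path(p).exists()
    )


image = Path("/Dropbox/genealogy/1880 U.S. census records/Maine/Corinth/Page 24.JPG")
print(notes_path(image))
```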
NOTE: Anyone can create .TXT files with Windows Notepad or Macintosh TextEdit. However, more sophisticated (and free) text editors are available from several places. I prefer BBEdit on the Macintosh and NoteTab on Windows. You may find a different product that works better for you.
I do use software tools to quickly find any document I wish to retrieve. Every Macintosh has Spotlight installed, while later versions of Windows have a loosely similar Search program that can be invoked by clicking on START and then on SEARCH. In both cases, every word in the files on the hard drive has already been indexed, and you can quickly locate any words or phrases in a file by typing them into the search box. If I am looking for any files about Bangor, Maine, I open the search box and type Bangor Maine. That should locate all files that contain both of those words, either inside the files or as part of the file names.
NOTE: These programs are great for looking inside text files, word processing files, spreadsheets, and other text-oriented files; however, they won’t find those words inside a picture.
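Spotlight and Windows Search build their index for you. Purely to illustrate the idea, here is a minimal, unindexed Python sketch that scans a folder tree and reports files whose folder path, file name, or .TXT contents contain every word typed. It is far slower than the built-in tools and, like them, cannot see words inside a picture; the search function and its matching rules are assumptions for this example.

```python
from pathlib import Path

TEXT_SUFFIXES = {".txt"}  # assumption: only plain-text notes are read


def search(root: Path, query: str) -> list[Path]:
    """Return files under root whose path or .TXT contents contain every query word."""
    words = [w.lower() for w in query.split()]
    hits = []
    for p in root.rglob("*"):
        if not p.is_file():
            continue
        # Folder and file names count, so descriptive folders help the search.
        haystack = str(p.relative_to(root)).lower()
        if p.suffix.lower() in TEXT_SUFFIXES:
            haystack += "\n" + p.read_text(encoding="utf-8", errors="replace").lower()
        if all(w in haystack for w in words):
            hits.append(p)
    return sorted(hits)
```

Calling search(Path.home() / "Dropbox", "Bangor Maine") would, for example, find both a scan filed under .../Maine/Bangor/ and a notes file that mentions Bangor, Maine in its text.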
An excellent tutorial for searching text files on a Macintosh may be found at: https://superuser.com/questions/72774/search-through-text-files-in-mac-os-x.
Of course, I back up every single file in multiple places. On my Macintosh, I use a free program that ships with every Mac, called Time Machine. It backs up every file on the Mac’s hard drive to an 8-terabyte USB hard drive that I plugged into a USB port on the computer. It not only saves present files, but it also saves every version of every file that existed in the past. If I accidentally deleted a file last year, Time Machine still has a backup copy of that old file available.
Time Machine stores everything until the USB hard drive fills up. As the backup drive approaches its capacity, Time Machine deletes the oldest backups to make room for newer ones (and will alert you if the “Notify after old backups are deleted” option is selected in Time Machine preferences). So far, my 8-terabyte external USB hard drive hasn’t filled up, so I have copies of all files from the five-plus years since I purchased the Mac and the external USB disk drive. There are a few limitations regarding old files. Details may be found on Apple’s support site at http://support.apple.com/kb/HT1427.
In addition, all my documents and images are stored in a Dropbox folder or in a folder used by some other service that saves files in a private and secure area of the cloud.
NOTE: You can find an excellent article that compares several of the more popular cloud-based file storage services at: https://www.zdnet.com/article/best-cloud-storage-services/.
I don’t save all files in Dropbox. Instead, I only save the documents and images that are important to me. The result is that all the specified documents and images are copied to Dropbox.com’s web servers in the cloud for off-site storage. (I do pay for extra file space on Dropbox. I consider this to be a cheap investment to protect my files. In fact, paying for 50 gigabytes of space on Dropbox is cheaper than buying a hard drive and is more reliable and more secure as well.)
If I ever need to retrieve one or more of my files, even when traveling and away from my computer, I can do so at any time by opening a web browser on anyone’s computer, going to www.Dropbox.com, and logging in with my user name and password. I can even retrieve files to my cell phone or tablet computer via wireless networking. Dropbox also copies every one of those documents and images to my laptop’s hard drive the next time I power on the laptop and connect to the Internet. Therefore, I always have an additional backup copy on the laptop.
Today’s cloud-based backup services generally provide more security than simply storing the same things on your computer’s hard drive, where they are accessible to hackers around the world as well as to babysitters, carpenters, plumbers, delivery men, and other visitors to your home. I once had a laptop stolen from the trunk of my automobile, so the thief obtained easy access to all the files on my laptop. Off-site storage in the cloud is much more secure than that. (I also now use encryption on my new laptop.)
If you are concerned about the security of your files while stored on Dropbox’s servers, read the section about security on the Dropbox web site at: https://www.dropbox.com/security. Amongst other things, that web page states:
“Dropbox protects files in transit between our apps and our servers, and at rest. Each file is split into discrete blocks, which are encrypted using a strong cipher.”
Of course, Dropbox has many competitors these days and almost all of them have somewhat similar security policies. However, I would always read the security policy of any service before signing up for it in order to make sure I am comfortable with the company’s security policy. In addition, I manually encrypt a few of my more sensitive documents before moving them to my Dropbox folder.
In short, I trust cloud file storage services, such as Dropbox. Some other people who do not understand encryption are not as trusting. You need to make up your own mind about security.
Finally, I also back up all documents and images to still another cloud-based backup service: Amazon S3. If I wished, I could create even more backups to flash drives, portable hard drives, CD-ROM or DVD-ROM images, or whatever I wish. One can never have too many backups! A fire, flood, tornado, hard drive crash, or burst water pipe will never destroy all my files. I worked hard to create them, and I want to make sure they are available to me forever.
In summary, I find that grouping images and documents within logical, easy-to-remember folder names works for me. I probably have 30,000 or so images and documents filed that way. Again, this “system” might not work for others, but it works well for me.