GEDCOM Explained

I frequently mention the acronym “GEDCOM” in this newsletter. This week a reader wrote to me with an excellent question: “What is GEDCOM?” I realized that I haven’t explained this buzzword in a long, long time. So, here is a brief, non-technical explanation of the term for the newer subscribers to this publication.

GEDCOM is an abbreviation that stands for GEnealogical Data COMmunication. In short, GEDCOM is the language by which different genealogy software programs talk to one another. The purpose is to exchange data between dissimilar programs without having to manually re-enter all the data on a keyboard.

To illustrate the importance of GEDCOM, step back in time with me for a moment.

Back before the invention of GEDCOM and before the invention of the home computer, I used 80-column punch cards to record the names and limited information about 200 or so of my ancestors. I did this after work hours in my employer’s data center. I then used the employer’s mainframe computer that cost hundreds of thousands of dollars to sort the data and to print a few crude reports. Luckily for me, my employer allowed me to use all the mainframe time I wanted late at night, after the company finished its daily work.

Around 1980, I built my own home computer. I decided to put my genealogy database onto the new system, but it would not read the 80-column punch cards I had used earlier. I manually re-typed every bit of data into a dBASE-II program that I wrote. I had discovered more ancestors by that time, so I had to manually enter data about 400 or so individuals. I stored the information on 8-inch floppy disks attached to my homemade 8-bit CP/M computer, which had 64 KB (kilobytes) of memory.

Some time later I discovered a CP/M genealogy program that would operate on my home computer. (CP/M was an operating system that was popular before MS-DOS which, in turn, was popular before Windows.) Unlike my crude, homemade dBASE-II program, this new genealogy program printed pedigree charts, family group sheets, and other reports. I decided to convert to the new, more powerful program (although I must say that it was rather elementary when compared to today’s powerful programs). At this point my database had grown to about 600 individuals, and I could not find any method of easily copying that data into the new program. I first printed out the information from the dBASE-II database. Then I sat at my computer for several evenings, reading the information on paper and re-typing every bit of it into my new program.

I bet you can guess the next step: I purchased an IBM clone in 1984 and decided to move my data to this new powerhouse. After all, it had 640 kilobytes of memory and a 20-megabyte hard drive, which I was certain that I could never fill. Having been rather active in my genealogy research, I now had information about 1,200 people to re-enter. I printed out the entire database from the old system onto paper and then manually re-typed it into the new PC powerhouse. That effort took weeks, and I promised myself, “Never again!”

Newer genealogy programs appeared in the following years, each with new features that I found enticing. However, I continued to use the same program simply because I didn’t want to go through the keyboard effort again.

Roughly twenty-five years ago, the Church of Jesus Christ of Latter-day Saints announced something new: a file format called GEDCOM. This new proposed standard file format was designed to allow different genealogy programs to exchange data. There was only one problem at the time: the only program that could read and write GEDCOM data was the one written by the Church of Jesus Christ of Latter-day Saints.

GEDCOM is a standard, not a program. As such, genealogy programs that are going to use the same data have to be written by the programmers to handle GEDCOM files. If you are trying to transfer data from one program to another, only to discover that one of the programs does not support GEDCOM, you are out of luck. To complete the exchange of data, both programs have to support GEDCOM.

Slowly, over a period of several years, other genealogy programs began to add the ability to read and write GEDCOM files. It became possible to move data from one genealogy program to another without manually re-typing everything. Now you can simply export your file from one genealogy program in GEDCOM format and then import that GEDCOM file into another genealogy program.

GEDCOM files usually have a file name ending in “.GED”, such as myfamily.GED.

You can use GEDCOM files to exchange genealogy data with your distant cousin in Poughkeepsie as well as to upload data to the many online databases.

The author of the genealogy program that I used never did add GEDCOM capability. Luckily for me, someone else eventually wrote a small routine that would export data from this program in GEDCOM format, and I was then able to move my data to increasingly powerful new programs.

By 1990, I was writing articles on CompuServe, advising everyone to never use a genealogy program that lacked GEDCOM capabilities. Luckily, that is no longer an issue. All of today’s genealogy programs will import and export GEDCOM data. Data transfer may still be a problem for those using older genealogy programs without GEDCOM capability; many people still find their data trapped in these “islands.” For them, there is no easy solution.

Unlike the “dark ages” of the 1980s, it is now common for people to use two or three or even more genealogy programs, including those on handheld smartphones and tablet computers. You may find one program that you prefer to use for storing all the bits of information that you encounter in your research efforts. However, you might prefer the printed reports or multimedia scrapbook features of a different program. Thanks to GEDCOM, you can easily move your data from one program to another. You can also share information with distant cousins using yet other genealogy programs by sending GEDCOM files to each other by e-mail.

The instructions for creating or reading GEDCOM files will vary from one program to another. You need to consult the program’s HELP files to find the exact sequence of instructions your genealogy program requires.

GEDCOM files can be read by a human although it would be tedious to do so. Here is an extract from the beginning of a typical GEDCOM file:

0 HEAD
1 SOUR Legacy
2 VERS 4.0
2 NAME Legacy (R)
2 CORP Millennia Corp.
3 ADDR PO Box 66
4 CONT El Mirage, AZ 85335
1 DEST Gedcom55
1 DATE 16 Oct 2004
1 SUBM @S0@
1 FILE Kennedy.ged
0 @S0@ SUBM
1 NAME Not Given
1 ADDR Not Supplied
0 @I1@ INDI
1 NAME Joseph Patrick /Kennedy/
2 GIVN Joseph Patrick
2 SURN Kennedy
1 BIRT
2 DATE 6 Sep 1888
2 PLAC Boston, MA
2 SOUR @S2@
3 PAGE pg 56
3 QUAY 3
1 DEAT
2 DATE 18 Nov 1969
2 PLAC Hyannis Port, MA

(rest of file omitted)

The file contains genealogy data in a structured format. It utilizes numbers to indicate the hierarchy and tags to indicate individual pieces of information within the file. A number of zero indicates the first line within a single record, and the letters, or tag, after the zero indicate the type of record. The top line in any GEDCOM file is the HEADER record, indicating that it is the beginning of the file. Words that are more than four letters long are typically abbreviated. In this case, the word HEADER is written as HEAD.

A number “1” shows that the line in question is one level below the “zero” line. This indicates that this line is one level subservient to the zero line and contains additional information. In the case of the second line in the above file, the entry of “1 SOUR Legacy” indicates that this file was created by (SOURCE) Legacy, a popular genealogy program for Windows.

The number “2” on the next line shows that it is subservient to the preceding line with a number 1 in it. In this case, the line of “VERS 4.0” indicates that the file was written with version 4 of Legacy. Below that you see a line labeled ADDR (address) and another labeled CONT (the previous line is CONTinued here).
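For readers who like to see such things in code, here is a minimal Python sketch of how a program might split one GEDCOM line into its level number, optional record identifier, tag, and value. The names GedcomLine and parse_line are my own inventions for illustration; they are not part of the GEDCOM standard or of any particular genealogy program.

```python
from typing import NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int            # hierarchy depth: 0 starts a new record
    xref: Optional[str]   # record identifier such as "@I1@", if present
    tag: str              # short tag such as HEAD, SOUR, or VERS
    value: str            # remaining text on the line (may be empty)


def parse_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into level, optional identifier, tag, and value."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    # A second token wrapped in "at" signs is a record identifier, not a tag.
    if len(parts) > 1 and parts[1].startswith("@") and parts[1].endswith("@"):
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        xref = None
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
    return GedcomLine(level, xref, tag, value)


for text in ["0 HEAD", "1 SOUR Legacy", "2 VERS 4.0", "0 @I1@ INDI"]:
    print(parse_line(text))
```

A real importer would also have to cope with blank lines, character encodings, and malformed input, all of which this sketch ignores.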

Scanning a bit further down the file, you will see the following:

0 @I1@ INDI

Again, the zero indicates this is the beginning of a new record. The “at” signs bracket the record number. In this case, the record is of an INDIvidual, and it is individual #1 (I1) in the database. Succeeding lines show events, such as birth, marriage, and death, along with subsequent data listing dates and places. You will also note an entry of “2 SOUR @S2@,” which indicates that a source citation for the event can be found in SOURce entry S2 to be found later in this file.
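The idea of records and pointers can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names split_records and cited_sources are hypothetical, and the contents of source record S2 (the TITL line) are invented, since the extract above does not show that record.

```python
SAMPLE = """\
0 @I1@ INDI
1 NAME Joseph Patrick /Kennedy/
1 BIRT
2 DATE 6 Sep 1888
2 PLAC Boston, MA
2 SOUR @S2@
0 @S2@ SOUR
1 TITL Hypothetical family history book
"""


def split_records(text):
    """Group lines into records; each level-0 line starts a new record."""
    records = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        level, _, rest = raw.strip().partition(" ")
        if level == "0":
            first, _, second = rest.partition(" ")
            if first.startswith("@"):
                # Looks like "@I1@ INDI": identifier first, then record type.
                key, record_type = first, second
            else:
                # Looks like "HEAD": no identifier, so use the tag as the key.
                key, record_type = first, first
            current = {"type": record_type, "lines": []}
            records[key] = current
        elif current is not None:
            tag, _, value = rest.partition(" ")
            current["lines"].append((int(level), tag, value))
    return records


def cited_sources(records, person_id):
    """Return the records that a person's SOUR pointers refer to."""
    found = []
    for _level, tag, value in records[person_id]["lines"]:
        if tag == "SOUR" and value.startswith("@") and value in records:
            found.append(records[value])
    return found


records = split_records(SAMPLE)
for source in cited_sources(records, "@I1@"):
    print(source["type"], source["lines"])
```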

INDI, NAME, BIRT, DEAT, SEX, SOUR and the other record types are called GEDCOM “tags.” There are many available tags within the GEDCOM standard and even a capability to create user-defined tags for those situations not covered by the standard. Of course, user-defined tags are usually not understood by the receiving program, so they seem to be somewhat useless. They may help define data within the program in which they were created, but they will not translate to a new program via the GEDCOM format.
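If you are curious which user-defined tags lurk in a file, they are fairly easy to spot: in GEDCOM 5.5 they are expected to begin with an underscore. The short Python sketch below counts them. The function name custom_tags and the sample tags _NICK and _HAIR are made up for illustration; your own program may invent entirely different ones.

```python
from collections import Counter

SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 _NICK Jack
1 _HAIR brown
0 @I2@ INDI
1 NAME John /Doe/
1 _NICK Johnny
"""


def custom_tags(gedcom_text):
    """Count tags that start with an underscore, the marker for user-defined tags."""
    counts = Counter()
    for raw in gedcom_text.splitlines():
        tokens = raw.split()
        if len(tokens) < 2:
            continue
        # Skip a record identifier such as @I1@ to reach the tag.
        if tokens[1].startswith("@") and len(tokens) > 2:
            tag = tokens[2]
        else:
            tag = tokens[1]
        if tag.startswith("_"):
            counts[tag] += 1
    return counts


for tag, count in sorted(custom_tags(SAMPLE).items()):
    print(tag, count)
```

Running something like this before a transfer gives a rough idea of how much data the receiving program is likely to ignore.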

This is a very abbreviated explanation of the internals of a GEDCOM file. You can find a detailed explanation at

Later versions of GEDCOM have been proposed but have never been widely implemented. In February of 2012 at the RootsTech 2012 conference, FamilySearch outlined a major new project around genealogical standards called GEDCOM X, and invited collaboration. In August of 2012 FamilySearch employee and GEDCOM X project leader Ryan Heaton dropped the claim that GEDCOM X is the new industry standard, and repositioned GEDCOM X as another FamilySearch open source project. However, no genealogy programs that I am aware of support GEDCOM X.

You can read more about GEDCOM X at

You need to be aware that the creation of the GEDCOM standard was not a perfect implementation. For one thing, not all the data fields are specified precisely in the GEDCOM specifications. Next, not all the programmers of the various genealogy programs interpreted the specifications in exactly the same manner. For instance, your present genealogy program might be perfectly happy with a birth date listed as, “after 1847 but before 1852.” However, once that information is exported in a GEDCOM file and then imported into a different program, the birth date may say something else. The receiving program may expect exact dates and not be able to handle anything that says “after” or “before,” especially not both in the same statement. Typically, the receiving program simply leaves the line blank. Sadly, one or two genealogy programs will accept the first date found on the line and then will disregard any further information.
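To make the date problem concrete, here is a small Python sketch. In GEDCOM 5.5, a date such as “after 1847 but before 1852” would most likely be written as “BET 1847 AND 1852”, with BEF and AFT used for one-sided ranges. The two functions below are hypothetical: parse_date_value keeps the range, while first_year_only imitates the kind of importer that grabs the first date it finds and discards the rest.

```python
import re


def parse_date_value(value):
    """Interpret a GEDCOM date value, keeping range qualifiers.

    Returns (kind, first, second), where kind is "exact", "before",
    "after", or "between".
    """
    text = value.strip()
    match = re.fullmatch(r"BET (.+) AND (.+)", text, flags=re.IGNORECASE)
    if match:
        return ("between", match.group(1), match.group(2))
    upper = text.upper()
    if upper.startswith("BEF "):
        return ("before", text[4:], None)
    if upper.startswith("AFT "):
        return ("after", text[4:], None)
    return ("exact", text, None)


def first_year_only(value):
    """Mimic a lossy importer that keeps only the first year on the line."""
    match = re.search(r"\b\d{4}\b", value)
    return match.group(0) if match else None


print(parse_date_value("BET 1847 AND 1852"))  # ('between', '1847', '1852')
print(first_year_only("BET 1847 AND 1852"))   # 1847
```

The second result is the troublesome one: a range of five years has quietly become a single, apparently exact year.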

Another problem is that not all genealogy programs have the same ideas about databases. One program may have only one field for “occupation,” assuming that every person on the face of the earth never, ever changed careers. Another genealogy program may have the ability to record multiple occupations during the person’s lifetime. When transferring data via GEDCOM from the more powerful program to the simpler one, some of these occupations will be lost.
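A short Python sketch shows how that loss happens. GEDCOM records an occupation with the OCCU tag, and one individual may have several. The person below is invented, and import_single_field is a hypothetical stand-in for a simpler program that has room for only one occupation.

```python
# One invented individual, already split into (level, tag, value) tuples.
PERSON_LINES = [
    (1, "NAME", "John /Smith/"),
    (1, "OCCU", "Farmer"),
    (2, "DATE", "1870"),
    (1, "OCCU", "Blacksmith"),
    (2, "DATE", "1880"),
]


def occupations(lines):
    """Collect every OCCU value recorded for one individual."""
    return [value for level, tag, value in lines if level == 1 and tag == "OCCU"]


def import_single_field(lines):
    """Mimic a simpler program that has room for only one occupation."""
    found = occupations(lines)
    kept = found[0] if found else None
    dropped = found[1:]
    return kept, dropped


kept, dropped = import_single_field(PERSON_LINES)
print("kept:", kept)        # kept: Farmer
print("dropped:", dropped)  # dropped: ['Blacksmith']
```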

The GEDCOM standard was invented long before genealogy programs started saving pictures, videos, and other multimedia files. As a result, transferring information from one genealogy program to another results in the loss of multimedia files.

Another problem is the transfer of notes. Some genealogy programs store only one note per individual while a different program may store different notes for physical description, medical history, occupations, DNA information, and more. GEDCOM was invented long before genealogy programs reached that level of sophistication so it cannot accurately transfer all variations of notes.

These are a few simple examples; you can find numerous other inconsistencies when moving data between dissimilar programs.

There is another problem with translating from one program to another: that of data integrity. Translating from one program’s database to GEDCOM is sort of the same as translating from one spoken language to another. The basics work, but subtleties and details sometimes do not translate well. Then, when translating to the third language (the receiving genealogy program’s database), more translation losses creep in. I well remember reading a technical manual some years ago that had been written in Japanese and then translated into Chinese. At a later date, the Chinese version was translated into English. The resultant English manual was barely readable. The same may happen with translating a database from Program A into GEDCOM and then from GEDCOM into Program B.

A new method of transferring data between different genealogy programs was announced several years ago by Wholly Genes Software. Their GenBridge technology reads data from one program directly into a second program without requiring a “double translation” via GEDCOM. The result is a much more accurate transfer process. However, very few genealogy developers have adopted GenBridge. To date, this technology is only available in a few programs.

Despite all the shortcomings, GEDCOM is still a simple and somewhat effective method of transferring genealogy data from one program to another. Most of the data will transfer properly, and then there are easy ways of reviewing the data to look for errors. The names, dates, and locations normally transfer correctly. Text, events, notes, and source citations may not always work perfectly. The exact problems encountered will depend upon the two genealogy programs involved.

Most modern genealogy programs will create an error log of GEDCOM data imported but not understood by the receiving program. You can read that log file to see what the program detected as inconsistent, then manually go in and fix the errors. While tedious, this is still a lot better than re-keying everything!
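Here is a rough Python sketch of how such an error log might be produced. It is not how any particular program works. The SUPPORTED_TAGS set is a deliberately tiny, hypothetical list of tags the receiving program understands, and import_with_log is a made-up function that accepts recognized lines and logs everything else, including lines that sit beneath an unrecognized tag.

```python
SUPPORTED_TAGS = {"HEAD", "INDI", "NAME", "SEX", "BIRT", "DEAT", "DATE", "PLAC", "TRLR"}

SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
1 OCCU Farmer
2 DATE 1870
1 BIRT
2 DATE 1 Jan 1850
0 TRLR
"""


def import_with_log(gedcom_text, supported=SUPPORTED_TAGS):
    """Return (accepted lines, log entries) for a GEDCOM text."""
    accepted, log = [], []
    skip_level = None  # level of the unsupported line whose children are skipped
    for number, raw in enumerate(gedcom_text.splitlines(), start=1):
        tokens = raw.split()
        if len(tokens) < 2:
            continue
        level = int(tokens[0])
        # Skip a record identifier such as @I1@ to reach the tag.
        if tokens[1].startswith("@") and len(tokens) > 2:
            tag = tokens[2]
        else:
            tag = tokens[1]
        if skip_level is not None and level > skip_level:
            log.append(f"line {number}: skipped (belongs to unsupported tag): {raw}")
            continue
        skip_level = None
        if tag not in supported:
            log.append(f"line {number}: unsupported tag {tag}: {raw}")
            skip_level = level
            continue
        accepted.append(raw)
    return accepted, log


accepted, log = import_with_log(SAMPLE)
for entry in log:
    print(entry)
```

In this example the occupation and its date end up in the log rather than in the database, which is exactly the sort of entry you would then fix by hand.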

I offer this article as a non-technical explanation of GEDCOM plus some commentary on its use. For more details and for technical explanations of the inner workings of GEDCOM, I would suggest that you read the following:


Thanks, John


Back in the 90s when I was compiling info on the Parkmans to publish a book, I had a few people offer to send me Gedcom files of their info instead of printouts. I tried it. I discovered that I actually spent more time checking and double-checking to find all the pieces that did not get transferred properly than if I had just keyed the info in from the printouts. So while it involved more typing, I achieved my results in less time and with greater accuracy by not using Gedcom.

Over the years I’ve just assumed that it must have been improved upon and that all those problems would have been addressed. If I’ve understood this post correctly, that is not the case. I’m amazed that people are still trying to use a flawed standard from so long ago.


I’m so glad someone asked this question and thank you for explaining.


Chiquita Hutchinson May 25, 2014 at 10:01 am

I made the mistake of entering all the family lines that I researched into the same family tree file in My Heritage software (mine, my husband’s, my half-brothers’, my aunts’, my brothers-in-law’s). I regret that now because some of these lines I do not want to continue updating with matches but I don’t want to delete the work that I did on them. Is there a way with GEDCOM to load my file into additional trees and then delete the people from each tree that do not belong there … without losing that data from the original tree?


    —> Is there a way with GEDCOM to load my file into additional trees and then delete the people from each tree that do not belong there.

    Yes, if your software allows that.

    MOST of the better genealogy programs of today allow you to export a GEDCOM of just SOME of the people in your database. For instance, you should be able to create a GEDCOM file of “all the ancestors of John Smith” or “all the relatives of John Doe” or something similar. The exact terminology and the menu commands to accomplish that will vary from one program to another, however. Once you have exported that data and saved it someplace, you can then delete those people from your primary database.

    I created multiple databases in my favorite genealogy program: one is for my ancestors, another is for my ex-wife’s ancestors, another is for John F. Kennedy’s family tree (I use that often to demonstrate genealogy software to others) and my favorite is a database of Donald Duck’s family tree. (Seriously!) With almost all genealogy programs, it is easy to switch from one database to another while keeping them all separate and isolated from each other.

    Again, that is MOST of today’s genealogy programs, but not all. Some of the simplistic genealogy programs will not do that.
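For the curious, the idea behind exporting “all the ancestors of John Smith” can be sketched in Python. GEDCOM links a child to a family with the FAMC tag and a family to its parents with HUSB and WIFE. The function ancestors_of, the two lookup tables, and every identifier below are hypothetical; a real program would build the tables from the file and then write out only the selected records.

```python
def ancestors_of(person_id, child_of, parents_of):
    """Return the identifiers of a person and all of that person's known ancestors.

    child_of maps an individual to the family in which they are a child (FAMC);
    parents_of maps a family to its husband and wife (HUSB, WIFE).
    """
    selected = set()
    pending = [person_id]
    while pending:
        current = pending.pop()
        if current is None or current in selected:
            continue
        selected.add(current)
        family = child_of.get(current)
        if family is not None:
            pending.extend(parents_of.get(family, ()))
    return selected


# Invented data: @I1@ is the starting person, @I9@ is an unrelated in-law.
child_of = {"@I1@": "@F1@", "@I2@": "@F2@", "@I9@": "@F7@"}
parents_of = {
    "@F1@": ("@I2@", "@I3@"),
    "@F2@": ("@I4@", None),      # mother unknown
    "@F7@": ("@I10@", "@I11@"),
}

print(sorted(ancestors_of("@I1@", child_of, parents_of)))
# ['@I1@', '@I2@', '@I3@', '@I4@']
```

Everyone not in the returned set, such as @I9@ here, would be left out of the exported file.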


A year or so ago, I purchased an excellent genealogy support program called GenDetective – I think you wrote about it. Upon exporting a gedcom from Family Tree to this program, I discovered that there was no support in the FT output for media files. When I called FT, I was informed that Version 12 provided that output – I was on Version 11. I would add this requirement to anyone searching for a genealogy program; not just exporting, but, also, importing.


A timely article. I am thinking of moving to the Family Historian database, which claims to be 100% GEDCOM compatible (that is far from the only reason for my move). It seems to be gaining popularity, so perhaps GEDCOM is far from dead. Although this might limit things like Witnesses, without a new “GEDCOM” standard, portability of data is a pain. I have tried to design my own data structure within TMG such that I can port it to things like Rootsweb Family Trees and Ancestry.com and other programs and share with others. GEDCOM might be past its sell-by date, but it’s all we’ve got.


This may be a stupid question but – is this how I can update my “tree” in Family Search from Ancestry tree?


How can you convert a PDF to GEDCOM without typing in over 20,000 names one by one? I belong to two enormous trees.


    I am afraid I do not know of any method of converting a PDF file to GEDCOM. Can anyone else help?


    It depends on the format of the PDF file. If it’s a genealogical report format, and can be OCR’d, you can copy and paste each piece of information into your genealogical software. That isn’t retyping, but it would still be quite tedious.


    Much depends on the format in which your data is displayed, but have you considered OCRing the PDF and then converting it to GEDCOM format?
    MANY years ago I got tired of retyping information on Family Group Sheets, and set up a system to store my genealogy data on a computer. (Yes, Dick, I used 8″ floppy disks too!). When they came out with PAF I jumped for joy, but despaired of having to retype all my data, so I contacted Family Search to find out the data storage format used in PAF. They were in the process of developing GEDCOM at the time, and they sent me the GEDCOM structural format information. From that I was able to write a routine to convert my data to the GEDCOM format and then load it into PAF.
    If you would like to send me a sample of your data, I will be happy to review it and see if I can help you to convert it to GEDCOM.


Great intro to GED format. I can’t help but notice a passing resemblance to the XML format standard.
It would be great if an XML data dictionary for genealogical information were created and made a standard.
Some background on progress to date is at:


Is cause of death one of the items which is not accepted into Gedcom?


I am using Generations 6.0 Easy Tree (SIERRA) and while it indicates that it is exporting a GEDCOM file, other programs do not recognize it as a GEDCOM file. It does not even have the .ged extension. Can anyone help me with this?
I really need to get my data off an old computer into a newer program.

