
August 04, 2008

GEDCOM Explained

I frequently mention the acronym "GEDCOM" in this newsletter. This week a reader wrote to me with an excellent question: "What is GEDCOM?" I realized that I haven't explained this buzzword in a long, long time. So, here is a brief, non-technical explanation of the term for the newer subscribers to this publication.

GEDCOM is an abbreviation that stands for GEnealogy Data COMmunications. In short, GEDCOM is the language by which different genealogy software programs talk to one another. The purpose is to exchange data between dissimilar programs without having to manually re-enter all the data on a keyboard.

To illustrate the importance of GEDCOM, step back in time with me for a moment. Back before the invention of GEDCOM and before the invention of the home computer, I used 80-column punch cards to record the names and limited information about 200 or so of my ancestors. I did this after work hours in my employer's data center. I then used the employer's mainframe computer that cost hundreds of thousands of dollars to sort the data and to print a few crude reports. Luckily for me, my employer allowed me to use all the mainframe time I wanted during the evening, after the company finished its daily work.

Around 1980, I built my own home computer. I decided to put my genealogy database onto the new system, but it would not read the 80-column punch cards I had used earlier. I manually re-typed every bit of data into a dBASE-II program that I wrote. My database had grown, so I had to enter data about 400 or so individuals. I stored the information on 8-inch floppy disks attached to my homemade 8-bit CP/M computer, which had 64 KB (kilobytes) of memory.

Some time later I discovered a CP/M genealogy program that would operate on my home computer. (CP/M was an operating system that was popular before MS-DOS which, in turn, was popular before Windows.) Unlike my crude, homemade dBASE-II program, this new genealogy program printed pedigree charts, family group sheets, and other reports. I decided to convert to the new, more powerful program (although I must say that it was rather elementary when compared to today's powerful programs). At this point my database had grown to about 600 individuals, and I could not find any method of easily copying that data into the new program. I first printed out the information from the dBASE-II database. Then I sat at my computer for several evenings, reading the information on paper and re-typing every bit of it into my new program.

I bet you can guess the next step: I purchased an IBM clone in 1985 and decided to move my data to this new powerhouse. After all, it had 640 kilobytes of memory and a 20-megabyte hard drive, which I was certain that I could never fill. Having been rather active in my genealogy research, I now had information about 1,200 people to re-enter. I printed out the entire database from the old system onto paper and then manually re-typed it into the new PC powerhouse. That effort took weeks, and I promised myself, "Never again!"

Newer genealogy programs appeared in the following years, each with new features that I found enticing. However, I continued to use the same program simply because I didn't want to go through the keyboard effort again.

In the mid-1980s, the Church of Jesus Christ of Latter-day Saints announced something new: a file format called GEDCOM. This proposed standard file format was designed to allow different genealogy programs to exchange data. There was only one problem at the time: the only program that could read and write GEDCOM data was the one written by the Church of Jesus Christ of Latter-day Saints.

GEDCOM is a standard, not a program.

As such, each genealogy program has to be specifically written by its programmers to read and write GEDCOM files. If you try to transfer data from one program to another and discover that one of the programs does not support GEDCOM, you are out of luck. To complete the exchange of data, both programs have to support GEDCOM.

Slowly, over a period of several years, other genealogy programs began to add the ability to read and write GEDCOM files. It became possible to move data from one genealogy program to another without manually re-typing everything. Now you can just export your file from one genealogy program in GEDCOM format and then import that GEDCOM file into another genealogy program.

All of today's genealogy programs support GEDCOM.

You can use GEDCOM files to exchange genealogy data with your distant cousin in Poughkeepsie as well as to upload data to RootsWeb, Ancestry.com, FamilySearch.org, OneGreatFamily.com, Geni.com, FamilyBuilder, and many other online databases.

The author of the genealogy program that I used never did add GEDCOM capability. Luckily for me, someone else eventually wrote a small routine that would export data from this program in GEDCOM format, and I was then able to move my data to increasingly powerful new programs.

By 1990, I was writing articles on CompuServe, advising everyone to never use a genealogy program that lacked GEDCOM capabilities. Luckily, that is no longer an issue. All of today's major genealogy programs will import and export GEDCOM data. Data transfer may still be a problem for those using older genealogy programs without GEDCOM capability; many people still find their data trapped in these "islands." For them, there is no easy solution.

Unlike the "dark ages" of the 1980s, it is now common for people to use two or three or even more genealogy programs. You may find one program that you prefer to use for storing all the bits of information that you encounter in your research efforts. However, you might prefer the printed reports or multimedia scrapbook features of a different program. Thanks to GEDCOM, you can easily move your data from one program to another. You can also share information with distant cousins using yet other genealogy programs by sending GEDCOM files to each other by e-mail.

The instructions for creating or reading GEDCOM files will vary from one program to another. You need to consult the program's HELP files to find the exact sequence of instructions your genealogy program requires.

GEDCOM files can be read by a human although it would be tedious to do so. Here is an extract from the beginning of a typical GEDCOM file:

0 HEAD
   1 SOUR Legacy
      2 VERS 4.0
      2 NAME Legacy (R)
      2 CORP Millennia Corp.
         3 ADDR PO Box 66
            4 CONT El Mirage, AZ 85335
   1 DEST Gedcom55
   1 DATE 16 Oct 2004
   1 SUBM @S0@
   1 FILE Kennedy.ged
   1 GEDC
      2 VERS
      2 FORM LINEAGE_LINKED
   1 CHAR ANSI
0 @S0@ SUBM
   1 NAME Not Given
   1 ADDR Not Supplied
      2 CONT
0 @I1@ INDI
   1 NAME Joseph Patrick /Kennedy/
      2 GIVN Joseph Patrick
      2 SURN Kennedy
   1 SEX M
   1 BIRT
      2 DATE 6 Sep 1888
      2 PLAC Boston, MA
      2 SOUR @S2@
         3 PAGE pg 56
         3 QUAY 3
   1 DEAT
      2 DATE 18 Nov 1969
      2 PLAC Hyannis Port, MA

            (rest of file omitted)

The file contains genealogy data in a structured format. It uses numbers to indicate the hierarchy and tags to identify individual pieces of information within the file. A level number of zero marks the first line of a record, and the letters (the tag) after the zero indicate the type of record. The top line in any GEDCOM file is the HEADER record, indicating the beginning of the file. Words longer than four letters are typically abbreviated; in this case, the word HEADER is written as HEAD.

A number "1" shows that the line in question is one level below the nearest preceding "zero" line: it is subordinate to that line and contains additional information about it. For example, the second line in the above file, "1 SOUR Legacy," indicates that the SOURce of this file, the program that created it, was Legacy, a popular genealogy program for Windows.

The number "2" on the next line shows that it is subordinate to the preceding line with a number 1 in it. In this case, the line "2 VERS 4.0" indicates that the file was written with version 4.0 of Legacy. A little farther down you see a level 3 line labeled ADDR (address) and, below it, a level 4 line labeled CONT (the previous line is CONTinued here).
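For readers who want to see how a program can use those level numbers, here is a minimal Python sketch; it is not taken from Legacy or any other real genealogy program. It assumes a well-formed file in which each line's level is at most one greater than the line before it, and it ignores leading indentation like that in the extract above. The names Node, parse_line, and parse_gedcom are made up for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One GEDCOM line: level number, tag, optional value and @ID@, plus sub-lines."""
    level: int
    tag: str
    value: str = ""
    xref: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def parse_line(line: str) -> Node:
    """Split 'LEVEL [@ID@] TAG [VALUE]' into a Node; leading spaces are ignored."""
    level_str, _, rest = line.strip().partition(" ")
    xref = None
    if rest.startswith("@"):
        xref, _, rest = rest.partition(" ")  # record ID such as @I1@
    tag, _, value = rest.partition(" ")
    return Node(int(level_str), tag, value, xref)


def parse_gedcom(text: str) -> List[Node]:
    """Build a tree: each line becomes a child of the latest line one level up."""
    roots: List[Node] = []
    stack: List[Node] = []  # stack[i] holds the most recent node at level i
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        node = parse_line(raw)
        del stack[node.level:]  # forget nodes at this level or deeper
        if node.level == 0:
            roots.append(node)  # level 0 starts a new record
        else:
            stack[-1].children.append(node)  # parent is the level above
        stack.append(node)
    return roots
```

Each level-0 record becomes a root, and every deeper line is attached to the most recent line one level up, which is the parent/child relationship the numbers express.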

Scanning a bit further down the file, you will see the following:

0 @I1@ INDI

Again, the zero indicates the beginning of a new record. The "at" signs bracket the record's identifier, which other lines can use to point to this record. In this case, the record is of an INDIvidual, and it is individual #1 (I1) in the database. Succeeding lines show events, such as birth, marriage, and death, along with subsequent data listing dates and places. You will also note an entry of "2 SOUR @S2@," which indicates that the source citation for the event is in SOURce record S2, found later in this file.
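The @I1@ and @S2@ notation works like a pointer: one line names a record, and other lines refer to it by that name. The minimal Python sketch below, which uses simple regular expressions rather than a full parser, indexes every level-0 record by its identifier and reports whether each pointer can be resolved. Run against only the extract above, @S2@ would come back unresolved, because that SOURce record sits in the omitted part of the file. The function names are hypothetical.

```python
import re

# A level-0 record header such as "0 @S2@ SOUR": captures the ID and record type.
RECORD_RE = re.compile(r"^\s*0\s+(@[^@]+@)\s+(\S+)")
# Any line whose entire value is a pointer, such as "2 SOUR @S2@".
POINTER_RE = re.compile(r"^\s*\d+\s+(\S+)\s+(@[^@]+@)\s*$")


def index_records(lines):
    """Map each record ID (e.g. '@S2@') to the record's type (e.g. 'SOUR')."""
    index = {}
    for line in lines:
        match = RECORD_RE.match(line)
        if match:
            index[match.group(1)] = match.group(2)
    return index


def check_pointers(lines):
    """List (line number, tag, pointer, target type or None if unresolved)."""
    index = index_records(lines)
    results = []
    for number, line in enumerate(lines, start=1):
        match = POINTER_RE.match(line)
        if match:
            tag, pointer = match.groups()
            results.append((number, tag, pointer, index.get(pointer)))
    return results
```

An unresolved pointer does not necessarily mean the file is damaged; it may simply mean, as here, that you are looking at only part of a file.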

INDI, NAME, BIRT, DEAT, SEX, SOUR, and the other record types are called GEDCOM "tags." There are many available tags within the GEDCOM standard and even a capability to create user-defined tags for situations not covered by the standard. Of course, user-defined tags are usually not understood by the receiving program, so they are of limited value. They may help define data within the program that created them, but they will not carry over to a new program via the GEDCOM format.
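GEDCOM 5.5 asks that user-defined tags begin with an underscore, which makes them easy to spot before you send a file to someone who uses a different program. This Python sketch simply counts such tags; the function names and the tags in the sample data (such as _MEMO) are invented for illustration.

```python
from collections import Counter


def tag_of(line):
    """Return the tag on a GEDCOM line, skipping the level and any @ID@."""
    fields = line.split()
    if len(fields) > 2 and fields[1].startswith("@"):
        return fields[2]  # e.g. "0 @I1@ INDI" -> "INDI"
    return fields[1] if len(fields) > 1 else ""


def user_defined_tags(lines):
    """Count tags that start with an underscore, the user-defined convention."""
    return Counter(tag for tag in map(tag_of, lines) if tag.startswith("_"))
```

A tag that appears in this count is a good candidate for a note in the receiving program, since it will probably be dropped on import.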

This is a very abbreviated explanation of the internals of a GEDCOM file. You can find a detailed explanation at http://homepages.rootsweb.com/~pmcbride/gedcom/55gctoc.htm.

You need to be aware that the GEDCOM standard is far from perfect. For one thing, not all the data fields are specified precisely in the GEDCOM specification. For another, not all the programmers of the various genealogy programs interpreted the specification in exactly the same manner.

For instance, your present genealogy program might be perfectly happy with a birth date listed as, "after 1847 but before 1852." However, once that information is exported in a GEDCOM file and then imported into a different program, the birth date may say something else. The receiving program may expect exact dates and not be able to handle anything that says "after" or "before," especially not both in the same statement. Typically, the receiving program simply leaves the line blank. Sadly, one or two genealogy programs will accept the first date found on the line and then will disregard any further information.
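GEDCOM 5.5 does have a way to write such a range, BET 1847 AND 1852, along with AFT and BEF for open-ended dates; the trouble lies in how the receiving program reads it. The Python sketch below handles only those three forms plus a plain date (not ABT, EST, FROM/TO, and so on). It contrasts a range-aware reader with a hypothetical naive_import that keeps only the first year it finds, the behavior described above. All function names are made up.

```python
import re
from typing import Optional, Tuple

YEAR_RE = re.compile(r"\b(\d{3,4})\b")


def first_year(text: str) -> Optional[int]:
    """Return the first three- or four-digit year in text, or None."""
    match = YEAR_RE.search(text)
    return int(match.group(1)) if match else None


def parse_range(value: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (earliest, latest) years for BET/AND, AFT, BEF, or a plain date."""
    v = value.strip().upper()
    between = re.fullmatch(r"BET (.+) AND (.+)", v)
    if between:
        return first_year(between.group(1)), first_year(between.group(2))
    if v.startswith("AFT "):
        return first_year(v[4:]), None  # no upper bound
    if v.startswith("BEF "):
        return None, first_year(v[4:])  # no lower bound
    year = first_year(v)
    return year, year


def naive_import(value: str) -> Optional[int]:
    """Hypothetical receiving program: keep the first year seen, drop the rest."""
    return first_year(value)
```

For "BET 1847 AND 1852", parse_range keeps both bounds, while naive_import silently turns the range into an exact year of 1847.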

Another problem is that not all genealogy programs have the same ideas about databases. One program may have only one field for "occupation," assuming that every person on the face of the earth never, ever changed careers. Another genealogy program may have the ability to record multiple occupations during the person's lifetime. When transferring data via GEDCOM from the more powerful program to the simpler one, some of these occupations will be lost. These are a couple of simple examples; you can find numerous other inconsistencies when moving data between dissimilar programs.
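In a GEDCOM file, one individual can carry several OCCU (occupation) lines. The sketch below, with invented record data and hypothetical function names, shows what happens when a program that has only a single occupation field imports such a record: it keeps one value, and the others have nowhere to go.

```python
from typing import List, Tuple


def occupations(record: List[str]) -> List[str]:
    """Collect every level-1 OCCU (occupation) value in one INDI record."""
    found = []
    for line in record:
        fields = line.split(maxsplit=2)
        if len(fields) == 3 and fields[0] == "1" and fields[1] == "OCCU":
            found.append(fields[2])
    return found


def import_single_field(record: List[str]) -> Tuple[str, List[str]]:
    """Hypothetical program with one occupation field: returns (kept, lost)."""
    values = occupations(record)
    kept = values[0] if values else ""
    return kept, values[1:]
```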

Another limitation is that the present GEDCOM standard was created before multimedia became popular. You can transfer textual data, such as names, dates, and locations, rather well in GEDCOM. However, transferring scanned images, sound clips, and movies from one genealogy program to another is almost impossible to accomplish via GEDCOM files. The present GEDCOM implementation can point to the location of multimedia files on a hard drive. In theory, this should suffice. However, in my experience of moving data around in many genealogy programs, I have rarely seen multimedia files handled properly.
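In a GEDCOM 5.5 file, a picture is typically linked with an OBJE line and a subordinate FILE line holding the path to the file on the original computer. A path such as C:\Genealogy\Photos\grandma.jpg rarely means anything on a cousin's computer. This Python sketch, with hypothetical function names and invented sample data, lists the referenced paths and reports which of them cannot be found relative to a chosen folder.

```python
import os
from typing import List, Tuple


def split_line(line: str) -> Tuple[int, str, str]:
    """Split 'LEVEL [@ID@] TAG [VALUE]' into (level, tag, value), ignoring any ID."""
    level, _, rest = line.strip().partition(" ")
    if rest.startswith("@"):
        _, _, rest = rest.partition(" ")  # skip the record ID
    tag, _, value = rest.partition(" ")
    return int(level), tag, value


def media_paths(lines: List[str]) -> List[str]:
    """Return the FILE values that sit one level below an OBJE line."""
    paths = []
    obje_level = None  # level of the OBJE structure we are inside, if any
    for line in lines:
        if not line.strip():
            continue
        level, tag, value = split_line(line)
        if obje_level is not None and level <= obje_level:
            obje_level = None  # this line is no longer part of the OBJE structure
        if tag == "OBJE":
            obje_level = level
        elif (obje_level is not None and level == obje_level + 1
              and tag == "FILE" and value):
            paths.append(value)
    return paths


def missing_media(lines: List[str], base_dir: str = ".") -> List[str]:
    """List referenced media files that cannot be found under base_dir."""
    return [p for p in media_paths(lines)
            if not os.path.exists(os.path.join(base_dir, p))]
```

The GEDCOM file carries only the path, so the picture itself still has to be copied to the new computer separately, and into a matching folder.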

There is another problem with translating from one program to another: that of data integrity. Translating from one program's database to GEDCOM is much like translating from one spoken language to another. The basics work, but subtleties and details sometimes do not translate well. Then, when translating to the third language (the receiving genealogy program's database), more translation losses creep in. I well remember reading a technical manual some years ago that had been written in Japanese and then translated into Chinese. At a later date, the Chinese version was translated into English. The resultant English manual was barely readable. The same may happen when translating a database from Program A into GEDCOM and then from GEDCOM into Program B.

Some time ago, Wholly Genes Software announced a new method of transferring data between different genealogy programs. Its GenBridge technology reads data from one program directly into a second program without requiring a "double translation" via GEDCOM. The result is a much more accurate transfer process. However, only a few genealogy developers have adopted GenBridge.

Despite all its shortcomings, GEDCOM is still a simple and reasonably effective method of transferring genealogy data from one program to another. Most of the data will transfer properly, and there are easy ways to review the results for errors. Names, dates, and locations normally transfer correctly. Text, events, notes, and source citations may not always work perfectly. The exact problems encountered will depend upon the two genealogy programs involved.

Most modern genealogy programs will create an error log of GEDCOM data imported but not understood by the receiving program. You can read that log file to see what the program detected as inconsistent, then manually go in and fix the errors. While tedious, this is still a lot better than re-keying everything!
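Such a log is easy to picture. The sketch below assumes a hypothetical receiving program whose entire vocabulary is the small KNOWN_TAGS set; real programs each understand a different and much larger set of tags. Every line whose tag falls outside that set is written to the log with its line number, so you know exactly where to look.

```python
from typing import Iterable, List

# Hypothetical vocabulary of a simple receiving program (illustration only).
KNOWN_TAGS = {"HEAD", "TRLR", "INDI", "FAM", "NAME", "SEX", "BIRT", "DEAT",
              "DATE", "PLAC", "SOUR", "PAGE", "NOTE", "CONT", "CONC"}


def import_log(lines: Iterable[str], known=KNOWN_TAGS) -> List[str]:
    """Return one log entry per line whose tag the program does not recognize."""
    log = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line; a real importer would log it too
        has_id = fields[1].startswith("@") and len(fields) > 2
        tag = fields[2] if has_id else fields[1]
        if tag not in known:
            log.append(f"Line {number}: unrecognized tag {tag}: {line.strip()}")
    return log
```

Here OCCU is a perfectly standard GEDCOM tag; it lands in the log only because this imaginary program has no place to put it.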

A new GEDCOM standard based upon XML, a markup language widely used on the World Wide Web, has been proposed. This new standard should greatly improve data transfer accuracy. See http://www.familysearch.org/GEDCOM/GedXML60.pdf for details. However, don't look for this new GEDCOM 6.0 any time soon. It has been a proposal for several years, and nothing has happened in that time. GEDCOM 6.0 appears to be going nowhere.

Older versions of GEDCOM have been around for more than twenty years, and only minor improvements have been made in that time. I expect that GEDCOM 6.0 will not appear in genealogy programs for several more years, if ever.
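To give a feel for what an XML encoding of GEDCOM data looks like, here is a Python sketch that nests GEDCOM lines as XML elements according to their level numbers, using Python's standard xml.etree.ElementTree module. The element layout (a GEDCOM root element, tags as element names, record identifiers as an id attribute) is invented for this illustration; it is not the schema in the GEDCOM 6.0 proposal.

```python
import xml.etree.ElementTree as ET


def gedcom_to_xml(lines):
    """Nest GEDCOM lines as XML elements by level number (illustrative layout)."""
    root = ET.Element("GEDCOM")
    stack = [root]  # stack[level] is the parent element for a line at that level
    for line in lines:
        if not line.strip():
            continue
        level_str, _, rest = line.strip().partition(" ")
        level = int(level_str)
        xref = None
        if rest.startswith("@"):
            xref, _, rest = rest.partition(" ")  # record ID such as @I1@
        tag, _, value = rest.partition(" ")
        del stack[level + 1:]  # pop back to this line's parent
        elem = ET.SubElement(stack[-1], tag)
        if xref:
            elem.set("id", xref.strip("@"))
        if value:
            elem.text = value
        stack.append(elem)
    return root
```

Converting the Kennedy birth record from the extract this way yields <INDI id="I1"> with <NAME> and <BIRT> elements nested inside it, so the hierarchy is carried by the markup itself rather than by level numbers.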

I offer this article as a non-technical explanation of GEDCOM plus some commentary on its use. For more details and for technical explanations of the inner workings of GEDCOM, I would suggest that you read the following:

   

Comments


Hi Dick

The dates in the sample GEDCOM file that you used from Legacy do not actually comply with the GEDCOM 5.5 standard.

Keith

What's wrong with the dates? They look fine to me.
But it is not a real Legacy file. Legacy does not add indentation.
Dick must have added that himself. It is wrong.

I have been taught, that if you are going to bring a question to the table, please bring a solution.

Dick,

Thanks for the explanation of GEDCOM. I appreciate that you gave both non-technical and technical aspects of the data standard, because I am referring several friends to the article, and they would not understand the database explanation.

I remember dBASE II and can't imagine writing a genealogy database program in that language. Ugh.

Dick,

Thanks for the information on gedcoms.

Since genealogy programs are databases, is it possible to import data from either another database or a spreadsheet? (assuming some familiarity with the process). What are the pros and cons of this process, if possible?

genejoan

Each genealogy program is different, depending upon the design parameters of the different programming teams.

The Master Genealogist allows for import/export via spreadsheets and databases. I believe that RootsMagic allows for exports only. I am not aware of any other genealogy programs with that capability, although there may be some.

- Dick Eastman

Reunion for Macintosh can both import and export a text based "spreadsheet" file.

The import doesn't allow for any linking to happen automagically, but it will let you choose what columns from the spreadsheet go to what fields in each person's record as they're imported.

Roger

Dick:

Like you, I started with computers back when we wired boards and shot packs of cards all over the floor when we dropped the boxes. I started on an Osborne CP/M system at home in 1980 and then had to re-enter the data when I changed programs. Fortunately, when I moved to DOS I went to PAF and stayed there until GEDCOM came along. I also beat the system somewhat by turning in my family group sheets to the LDS Church in the early 1980s, and they entered my data into their system. I then downloaded it from their CDs at Salt Lake City as a GEDCOM file and imported it into my PAF program. They did a lot of the work for me.

The trouble is that GEDCOM 6.0 is not coming at all. There is not a single sign of it, even though new genealogy programs or new versions of old programs are released every year. Although it is XML-based and has a better, clearer, and more versatile format, it seems that no one has started to push for it. The question is why. Here are a few hypotheses and my answers.

H1: GEDCOM 5.5 does a decent enough job of converting genealogical data between various systems. Maybe there is no need for more features?
A1: True, but there is room for a lot of improvement.

H2: No vendor wants to be the first to invest in writing programs to read and write GEDCOM 6 data when no one else may follow.
A2: True, but XML has other intrinsic advantages; for example, it can be displayed by Internet Explorer.

H3: Maybe the new standard was not developed together with other players, such as genealogy software companies, and with their commitment?
A3: Likely true. It was developed by the LDS Church.

H4: Does support for GEDCOM 6 bring any business advantage to a company or to an end user?
A4: It reduces the end user's dependency on particular software tools.

And maybe a de facto standard will replace GEDCOM 5.5.

If Gedcom 6 does not show up, then something will come instead.

Dick,

The current GEDCOM 5.5 standard does, in fact, support embedded multimedia information. This is explained
here: http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gcch2.htm#MULTIMEDIA_RECORD
here: http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gcch2.htm#MULTIMEDIA_LINK and
here: http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gcappe.htm

The problem lies with the genealogy programs. As you can imagine, embedding multimedia information makes a GEDCOM file extremely large. My family tree is stored in a 1MB GEDCOM file, but my scanned documentation takes up over 1GB.

There are some genealogy programs that can export spreadsheet and database files; these include Family Tree Legends, MyHeritage Family Tree Builder (which incorporates functionality from FT Legends), Clooz, and The Master Genealogist, to name just a few.

Just as proprietary file formats such as Word .doc and Excel .xls have become popular despite not being designed as universal formats, Family Tree Maker's .ftw format can be imported into some rival programs that offer this functionality (such as FT Legends, TMG, and RootsMagic). Despite some flaws, FTM is one of the most popular genealogy programs on the market, so maybe the .ftw format will become a rival to GEDCOM over time.

It seems from my perspective that companies are starting to focus much more on direct interaction between the internet and genealogy programs. Many genealogy programs are being changed to communicate directly with websites like FamilySearch and Ancestry. Because of this there is a dwindling need for gedcoms. Many programs now also allow you to import information directly from another popular program like Legacy, PAF, RootsMagic, and Family Tree Maker. Gedcoms are still useful, but seem to be going out of style.

---> The current GEDCOM 5.5 standard does, in fact, support embedded multimedia information.

Yes, as I wrote in the article, "The present GEDCOM implementation can point to the location of multimedia files on a hard drive. In theory, this should suffice. However, in my experience of moving data around in many genealogy programs, I have rarely seen multimedia files handled properly."

- Dick Eastman

----> Despite some flaws FTM is one of the most popular genealogy programs on the market, so may be the .ftw format may become a rival to GEDCOM over time.

Please no!!!!

Roger

Would love to know what everyone's favorite genealogy program is, as I am shopping for one. I'm not so much interested in sharing as in having enough room to write a short bio or notes for each person, as well as a good, powerful publishing program. I want something formal without the "cheesy" robotic info I see with some of them. I'm leaning towards The Master Genealogist but wonder what anyone might not like about it. Mary B.

I've seen this word in writing but have never heard anyone say it out loud. Is it pronounced with a G like "garbage" or a G like "genealogy"?

Thanks from someone new to genealogy.

I have only heard it as "JEDCOM," and since the "G" stands for Genealogy I assume that is correct, but it's a good question.

Mr Eastman,

I find it sad that the release information on the various versions of GEDCOM on Wikipedia says GEDCOM 5.5, from 1996, is the de facto standard, whereas, according to the same Wikipedia article, the Church of Jesus Christ of Latter-day Saints' own FamilySearch.org and PAF software have gone further: "PAF 5.2 uses UTF-8 as its internal character set, a feature which was introduced in the GEDCOM 5.5.1 draft, and can output a UTF-8 GEDCOM."

I also notice that a number of alternative XML-based file formats are listed for genealogy programs, e.g.:

1)Microsoft Family.Show - .familyx file format (Open Package Convention)(new format)

2)GenoPro uses XML as its core file format.(no one else seems to be using it)

3)GRAMPS - GRAMPS XML. (been around a while)(Wikipedia lists other programs that support this, but it looks more like grasping at straws at the moment)

I hope that this is progress(!); maybe the program makers can be convinced to merge their formats and support each other?

I would like to be able to export my own projects to other software. The GEDCOMs I work with today seem to fail when it comes to exporting or importing sources and citations.
We seem so in need of third party solutions that will allow us to communicate not only names, dates and events, but the important record of the evidence that supports our findings.
Me thinks we have grown much since 5.5. --GJ

Did anyone ever respond to a questioner as to what is their favorite genealogy program?

I love using iFamily for Leopard, but also wonder if there's one that would come in second to that. I've taken a very brief look at Mac software at the Apple store, but it seemed somewhat busy.
