Don’t Want to Lose (Parts of) Your Genealogical Data?

The following is an article written by guest author Bob Coret and is copyright by him. The article is published here with the permission of Bob Coret:

A recent research report by Genealogy Online shows that genealogists run a high risk of losing (parts of) their genealogical data when transferring a GEDCOM file from one family tree program or service to another. This is because most family tree programs and services do not follow the GEDCOM specification to the letter, and because many undocumented “user-defined tags” are used.

Recently, Nigel Munro Parker made his GEDCOM validator GED-inline [http://ged-inline.elasticbeanstalk.com/validate] available for re-use. GED-inline reads a GEDCOM file and checks whether it follows the rules of the applicable GEDCOM specification. You get a report almost instantly (and for free). Besides statistics, the report shows the number of warnings and user-defined tags, as well as a list of all warnings. Genealogy Online (a service for easily publishing your family tree online) recently deployed the open-sourced GED-inline in its infrastructure. Genealogy Online [https://www.genealogieonline.nl/en/] now checks every GEDCOM file it receives for online publication and notifies the user when the file produces warnings.

In order not to lose genealogical information when it is transferred from “A” to “B”, agreements on how the information is recorded are of great importance. If both “A” and “B” adhere to these agreements, then the information will come across properly – without loss of information! Agreements about the format of genealogical information are laid down in the GEDCOM specification. The most recent GEDCOM version is 5.5.5, which is published on http://www.gedcom.org [https://www.gedcom.org/].

As a genealogist you do not have to dive into these GEDCOM specifications. The specifications are intended for the suppliers of family tree programs and services (more specifically, their developers). But as a genealogist you should make sure that the GEDCOM function of your family tree program or service adheres to the GEDCOM specifications! After all, if a family tree program or service does not adhere to the GEDCOM specifications, there is a risk of information loss when the genealogical information is transferred!

As a genealogist you can check the quality of your GEDCOM file too! If you’re not using Genealogy Online, just go to GED-inline [http://ged-inline.elasticbeanstalk.com/validate] directly and upload your GEDCOM file. See how many warnings the validation report contains. The number of warnings says nothing about your genealogical information; you didn’t do anything wrong. The warnings concern how well the GEDCOM file complies with the GEDCOM specification. If there are warnings, there is a good chance that another family tree program or service will not fully understand the GEDCOM file, and that there is a risk of information loss!
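
As a rough illustration of the kind of rule a validator checks, the sketch below tests just one GEDCOM line-structure rule: the first line must be level 0, and each line's level number may be at most one greater than the previous line's. It is a minimal, hypothetical example (the function `check_levels` and its simplified line pattern are illustrative only), not GED-inline's actual code; a real validator also checks which tags are allowed where, how often they may occur, and the format of their values.

```python
import re

# Simplified GEDCOM line pattern (illustrative, not the full spec grammar):
# level number, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\S+)(?: (.*))?$")


def check_levels(lines):
    """Return (line_number, message) warnings for level-number problems only."""
    warnings = []
    previous_level = None
    for number, raw in enumerate(lines, start=1):
        # Drop surrounding whitespace and a possible byte-order mark.
        line = raw.strip().lstrip("\ufeff")
        if not line:
            continue  # this sketch simply skips blank lines
        match = LINE_RE.match(line)
        if match is None:
            warnings.append((number, "line does not match the GEDCOM line format"))
            continue
        level = int(match.group(1))
        if previous_level is None and level != 0:
            warnings.append((number, "first line must have level 0"))
        elif previous_level is not None and level > previous_level + 1:
            warnings.append((number, f"level jumps from {previous_level} to {level}"))
        previous_level = level
    return warnings


if __name__ == "__main__":
    sample = [
        "0 HEAD",
        "1 GEDC",
        "2 VERS 5.5.1",
        "0 @I1@ INDI",
        "1 NAME John /Smith/",
        "3 DATE 1900",  # deliberately wrong: level 3 directly under level 1
        "0 TRLR",
    ]
    print(check_levels(sample))  # [(6, 'level jumps from 1 to 3')]
```

A file that passes this check can still produce many warnings in a full validator; the point is only to show that "warnings" are about the file's structure, not about the genealogy it contains.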

Another number to pay attention to in the GED-inline report is the User-defined value. This number is the count of lines in the GEDCOM file that use a so-called user-defined tag. Such tags are valid within GEDCOM, but their meaning is not laid down in the GEDCOM specification, and often these user-defined tags are not publicly documented either. So if program “A” stores certain information in a user-defined tag, chances are that program “B” does not know what that information is or what to do with it. In the best case, these values are imported as a comment; in the worst case, they are ignored. So user-defined tags also increase the risk of information loss.
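
In GEDCOM 5.5.x, user-defined tags are recognizable because they begin with an underscore (for example `_UID` or `_RUFNAME`). The minimal sketch below counts the lines that use such a tag and tallies them per tag, which is roughly what the User-defined value measures. Whether GED-inline counts in exactly this way is an assumption, and the function name `count_user_defined` is hypothetical.

```python
import re
from collections import Counter

# Simplified pattern: level number, optional @XREF@, then capture the tag.
TAG_RE = re.compile(r"^\s*\d+ (?:@[^@]+@ )?(\S+)")


def count_user_defined(lines):
    """Count lines whose tag starts with '_' and tally them per tag."""
    per_tag = Counter()
    for line in lines:
        match = TAG_RE.match(line)
        if match and match.group(1).startswith("_"):
            per_tag[match.group(1)] += 1
    return sum(per_tag.values()), per_tag


if __name__ == "__main__":
    sample = [
        "0 @I1@ INDI",
        "1 NAME John /Smith/",
        "1 _UID 0123ABCD",
        "1 OBJE @O1@",
        "2 _PRIM Y",
        "0 TRLR",
    ]
    total, per_tag = count_user_defined(sample)
    print(total, dict(per_tag))  # 2 {'_UID': 1, '_PRIM': 1}
```

The per-tag tally is the useful part when contacting a vendor: it shows exactly which undocumented tags your file relies on.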

Genealogy Online’s ‘GED-inline validation statistics’ [https://www.genealogieonline.nl/en/GED-inline/] report shows that 1,215,130,449 lines of GEDCOM were inspected, 8,129,466 warnings were given (that’s 0.7%), and 93,365,260 lines contained user-defined tags (that’s 7.7%). With these shocking numbers, you have to wonder: just how much genealogical data is lost in transfer?
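
The percentages follow directly from the quoted counts. The short calculation below reproduces them; to two decimals they come out at about 0.67% and 7.68%, which the report rounds to 0.7% and 7.7%.

```python
# Counts quoted from the 'GED-inline validation statistics' report above.
lines_inspected = 1_215_130_449
warnings = 8_129_466
user_defined_lines = 93_365_260

warning_rate = warnings / lines_inspected * 100
user_defined_rate = user_defined_lines / lines_inspected * 100

print(f"warnings relative to lines inspected: {warning_rate:.2f}%")  # 0.67%
print(f"lines with user-defined tags: {user_defined_rate:.2f}%")     # 7.68%
```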

What can you, as a genealogist, do to reduce the risk of information loss?

If, after checking the quality of your GEDCOM file, you find that there is a risk of information loss, contact the supplier of your family tree program or service. Ask them to improve their GEDCOM support (and to minimize the use of user-defined tags and document the ones they use), so that parts of your genealogical data are not lost during export (and import)!

When you contact the vendor, you can send the GED-inline validation report for your GEDCOM file along with a link to www.gedcom.org, where the GEDCOM specifications are published. If the supplier does not consider the quality of the GEDCOM export (your genealogical data!) important, it may be time to look for another family tree program or service.

17 Comments

“The most recent GEDCOM version is 5.5.5, …”

Information from FamilySearch (when asked about 5.5.5):
“The Church of Jesus Christ has the copyright on the Gedcom Specification since 1987. There has not been a legal transfer of the rights we have to the Gedcom Specification.”

So 5.5.5 is not a legal GEDCOM version.

    —> So 5.5.5 is not a legal GEDCOM version.

    I am not a lawyer but I don’t believe having a copyright has anything to do with version numbers under U.S. laws. I am not sure about other countries, however.

    In this case, Company A can create version 1.0 of anything and copyright the product. Company B can then legally create version 2.0 of the same thing. Company C can then legally create version 3.0 of the same thing.

    If either Company B or Company C then attempts to SELL their new version, then U.S. copyright laws will be involved. But simply announcing a new and improved standard is never illegal. U.S. copyright laws only deal with the rights to copy a product and reproduce it elsewhere, not with simply suggesting improvements to something and then publishing the new improvements’ specifications. If the suggested improvements are published for all to read and there is no claim of new copyrights, I believe the suggestions qualify as “open source.” (Available to all.)

    If I am wrong, would someone with legal expertise please let me know?

    FamilySearch owns the copyrights for GEDCOM and probably will do so forever. However, that does not affect your right or my right or anyone else’s rights to suggest improvements.

    I can’t reply to Dick, so I am commenting here.

    I’m not a lawyer either.
    But I think most of the structure, tags, records, and perhaps the explanatory text of 5.5.5 look identical to the LDS versions of GEDCOM. I see it as a copy.

    If you copied most of Microsoft’s DOCX documentation and changed a few pages, do you think you would have the right to call it “DOCX2019”?
    I stand by my opinion: Tamura’s 5.5.5 document is not a legal GEDCOM version.

    Greetings from Germany, Stefan.

    If FamilySearch were to sue for infringement of their GEDCOM copyright, on what grounds would the suit be based? Loss of revenue? 🙂

    It is not allowed to publish copyrighted texts as one’s own. You also can’t simply sing a Madonna song and publish it as a new song.

This article says, in other words: “User-defined tags are evil! The more lines with user-defined tags your GEDCOM file has, the lower its quality.”
But it is not as simple as it sounds.
Some user-defined tags, like “_UID”, appear in nearly every GEDCOM file and cause no problems at all.
User-defined tags are a valid mechanism, intended by the GEDCOM standard, for storing data for which no standard tag exists (home person, personal tasks, additional location information, …).
What should a vendor do when users ask about “disturbing” user-defined tags? Leave out some of the information? No! The goal should be to write all user data to the GEDCOM file.
The better way is that a) software should give a detailed import report of what data is ignored, and b) vendors should share information about their user-defined tags (as the German GEDCOM-L group does; see http://wiki-de.genealogy.net/GEDCOM/_Nutzerdef-Tag).
And believe me: standard tags are no guarantee that importing software won’t ignore the data. Sometimes the importing software has fewer capabilities, and the user loses data for that reason.

Regards, Dirk (www.ahnenblatt.com).

    Dirk, nearly all data can be stored in GEDCOM files without the use of user-defined tags. Just use the EVEN.TYPE or FACT.TYPE tags that are already defined.
    I have written many articles about different applications’ compliance (or lack thereof) with the GEDCOM 5.5.1 standard. I have also notified all the developers about the problems. Most of them are not interested in improving their GEDCOM compliance.
    Keith Riggle (GenealogyTools.com)

    I don’t agree.
    I’m in the same German GEDCOM-L group as Dirk. We looked for a way to export the German “Rufname” (the given name a person is actually called by). It is not a nickname, and there is no way to express it in any GEDCOM version. So we agreed on _RUFNAME as a new tag, and it works fine for all developers represented in the GEDCOM-L group.
    Or take locations stored in a place-management module. We agreed on this (a completely new record):

    0 @@ _LOC
    1 NAME {1:M}
    2 DATE {0:1}
    2 _NAMC {0:1}
    2 ABBR {0:M}
    3 TYPE {0:1}
    2 LANG {0:1}
    2 <> {0:M}
    1 TYPE {0:M}
    2 DATE {0:1}
    2 <> {0:M}
    1 _FPOST {0:M}
    2 DATE {0:1}
    1 _POST {0:M}
    2 DATE {0:1}
    2 <> {0:M}
    1 _GOV {0:1}
    1 _FSTAE {0:1}
    1 _FCTRY {0:1}
    1 MAP {0:1}
    2 LATI {1:1}
    2 LONG {1:1}
    1 _MAIDENHEAD {0:1}
    1 EVEN [|] {0:M}
    2 <> {0:1}
    1 _LOC @@ {0:M}
    2 TYPE {1:1}
    2 DATE {0:1}
    2 <> {0:M}
    1 _DMGD {0:M}
    2 DATE {0:1}
    2 <> {0:M}
    2 TYPE {1:1}
    1 _AIDN {0:M}
    2 DATE {0:1}
    2 <> {0:M}
    2 TYPE {1:1}
    1 <> {0:M}
    1 <> {0:M}
    1 <> {0:M}
    1 <> {0:1}

    How could you manage this with only the tags from any GEDCOM version?

    Greetings from Germany, Stefan.
    Keith, I agree that user-defined tags should be avoided whenever possible, but sometimes extra information can only be stored in user-defined tags, such as a person’s main profile picture (OBJE._PRIM, which works fine with PAF, Legacy, Family Tree Builder, …) or tasks for a person (INDI._TODO, which works fine with Legacy, RootsMagic, …).
    This information may be ignored by some other software, that’s true, but mainly because that software has no such features (main profile pictures or to-do lists). Judging data quality only by counting lines with user-defined tags would penalize software that tries to save as much information as possible instead of omitting this data from the GEDCOM export. I think statements like “bad GEDCOM quality when using user-defined tags” will confuse many users and are not helpful.
    “… I have also notified all the developers about the problems. …” – but not me so far. You can contact me by mail to discuss further details if you like. Perhaps you’ll find the first gensoft developer who is interested in GEDCOM compliance … 😉
    Regards, Dirk (http://www.ahnenblatt.com).

    Dirk, I haven’t contacted you because I haven’t tested Ahnenblatt. I will add it to my ever-growing list.

The website destroyed my posts 😦
Stefan

Stefan, which major apps or websites outside the GEDCOM-L group are using your new record type? Family Tree Maker? RootsMagic? Family Tree Builder? The problem with user-defined tags is that other apps can and will ignore them.
You can represent any type of name, not just a nickname, with the NAME.TYPE structure, which is mandatory anyway. The PERSONAL_NAME_PIECES with NAME_PIECE_NICKNAME is optional. So, for example, you could have:
n NAME
+1 TYPE RUFNAME
You can have as many name structures attached to an INDI record as you want.

    I don’t know which other programs have place management and need to export that data.
    Not all of the programs on the GEDCOM-L list need it (or use it).

    There is no difference between
    n NAME
    +1 TYPE RUFNAME
    and
    n NAME
    +1 _RUFNAME

    Neither is documented in the GEDCOM standard, and programs will ignore both, because no program knows a type “RUFNAME”. As a user-defined TYPE, no program can evaluate it.

    So both _RUFNAME and TYPE RUFNAME are possible in GEDCOM. Many (German) programs know and use _RUFNAME, so users can transfer the data (including the Rufname) without problems.

    Greetings, Stefan

    Stefan, it does not matter that “RUFNAME” is not in the GEDCOM standard. NAME.TYPE IS in the standard, page 38 of the original version. It is not user-defined. The TYPE can be anything: Byname, Confirmation name, Patronym, etc., just like NAME can be anything, even “Stefan Mettenbrink”. 🙂 That’s the whole point of the TYPE field: it can hold any content (up to the size limit), and apps and websites must import it (most do in fact import it).

    You are right. But if I export a German Rufname that way (my program has a separate option for it), I expect an English (or other-language) program to try to translate it. When the user brings the GEDCOM file back, I get:
    n NAME
    +1 TYPE callname (or something else)
    How could I import that in my field for the Rufname?
    A user-defined type is fine for a field that is not otherwise defined. But in this case, I need a way to do it without complications.

    BTW, _RUFNAME is legal in GEDCOM, too. So why shouldn’t I use it? It works fine for me and many other programs.
    I think a German user who needs the Rufname will use a program that supports it.

    Greetings from Germany, Stefan.

    PS: I have just updated my GEDCOM-Validator. You can now choose English as the language, too (Russian has just been started). So if you like, just try it. It’s free.
