April 06, 2005

Comments

Mac Young

Dick, your excellent soundex article seems similar to your previous soundex article which I can't date at the moment -- to which I posted a comment that got no response. Anyway, I'm going to again post this observation because it involves what is arguably the most popular employer of soundex coding that we genealogists use daily, and that is the Ancestry.Com census databases.

My testing indicates that Ancestry's soundex searches employ an index that does *not* use the "H & W" rule. My test case uses the Brooklyn, N. Y. census for 1870, and the surname of our friend, Tony Burroughs (who last Saturday gave us an excellent Saturday night banquet speech at the NERGC conference) -- and who bears the problematic surname case where his correct soundex code is B-620, but without the H & W rule, is B-622. An Ancestry.com "exact" search for "Burroughs" yields 25 hits, whereas a "soundex" search yields 80 hits -- but *zero* of them are "Burroughs" or its variants. Checking a few of these 80 hits shows they all seem to match B-622. If we then fake the search out by dropping the final "s" and do a soundex search on "Burrough" -- which codes as B-620 even using the incorrect coding rule -- we get 2,182 hits, including the missing 25 "Burroughs" plus numerous useful variants like "Burrows", etc., which is exactly what we want from a soundex search.

Tony, in his original article, I'm sure explained all this, but did he point out that this hugely popular census database suffers this coding problem? And even if he did, we genealogists need to know what we're dealing with and how to cope with it, so it probably deserves being spotlighted again. Are there other popular databases out there with the problem?

Roger Moffat

The only way to be sure is to search for both the "correct" Soundex code and the "incorrect" soundex code.

I discovered this H/W anomaly quite a few years ago (1998/1999 I think it was) while trying to determine all the rules so that I could write the calculations to have FileMaker Pro do soundex coding on databases I was developing for our genealogy society.

At that time the only online reference I could find to the H/W rule was at the Clayton County, Texas Library's web site. Nothing at our local library, or at the State of Michigan Library mentioned the H/W rule, but yet it was obvious this rule had been used when the US Censuses were indexed in the 1930s since Ashcroft was coded in these indexes as A261, rather than A226 which the common rules would determine.
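For readers who want to try the two codings side by side, here is a minimal Python sketch. The function name `soundex` and the `hw_rule` flag are illustrative assumptions; this is not the FileMaker Pro calculation mentioned above, nor whatever Ancestry.com actually runs. With `hw_rule=True`, H and W are skipped without separating consonants of the same code; with `hw_rule=False`, they are treated like vowels, which is the simplified variant.

```python
# Letter groups of American Soundex: group 1 = b f p v, group 2 = c g j k q s x z, ...
CODES = {ch: str(d) for d, group in enumerate(
    ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1) for ch in group}

def soundex(name, hw_rule=True):
    """Return a 4-character Soundex code (hypothetical helper for illustration)."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code suppresses an identical neighbor
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:     # adjacent letters with the same code count once
                digits.append(code)
            prev = code
        elif hw_rule and ch in "hw":
            continue             # H/W rule: h and w do not separate equal codes
        else:
            prev = ""            # vowels (and h/w in the simplified variant) separate
    return (first.upper() + "".join(digits) + "000")[:4]

for surname in ("Ashcroft", "Burroughs", "Burrough"):
    print(surname, soundex(surname, hw_rule=True), soundex(surname, hw_rule=False))
```

Under these assumptions, Ashcroft comes out as A261 with the H/W rule and A226 without it, and Burroughs as B620 versus B622, while Burrough is B620 either way.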

A discussion on soc.genealogy.computing followed and the problem became more publicised.

The databases I create now calculate Soundex both ways - using the H/W rule and ignoring it, so for example if you go to

http://data.wmgs.org:591/KentCountyObits/FMPro?-db=KentCountyObituaries&-lay=Listing&-format=search.htm&-view

and search for Last Name Begins with Burroughs you'll see in the results that the Soundex could be either B620 or B622, so conversely you'd find Burroughs by searching for either B620 or B622.

Cheers

Roger

Roger Moffat

Actually it's "older" than I thought. Google turned up the archive of my first post to GENCMP-L about this in January 1997

http://archiver.rootsweb.com/th/read/GENCMP/1997-01/0853821393

but I can't find the archive of the discussion that involved Tony Burroughs and I think Bob Velke (author of TMG) was involved also.

Roger

Bioubiou

Hi,
Does anybody know the names of these "numerous other improved Soundex methods"?

Md. Rakibul Haque

Hi,
I understand how to convert a name to a soundex code. But after getting the code, how do I get the similar names back from it?
That is: name to code, then code to some similar names. I need help with the code-to-names step, finding names which are phonetically similar.
Please help me.
Bye.

Bob Richardson

To Md. Rakibul Haque - converting from a soundex code back to names. To be practical, you would first need a list of all possible/reasonable names. Then the program would pick out from the list, and display, all those names that fit the soundex code provided.

If you are a doctor (Md ?) you may be wanting to convert from a soundex code to a medicine. Your database would contain all possible medicines. You could type in the soundex code and the program would display the few medicines that fit that code.
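The lookup described in this reply can be sketched in Python roughly as follows. Everything here is an assumption for illustration: the `NAMES` list is a made-up sample, and `build_index` and `names_for_code` are hypothetical helper names. Each name is filed under both its H/W-rule code and its simplified code, so a search on either code finds it.

```python
from collections import defaultdict

CODES = {ch: str(d) for d, group in enumerate(
    ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1) for ch in group}

def soundex(name, hw_rule=True):
    """4-character Soundex; hw_rule=False treats h and w like vowels."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    digits, prev = [], CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:
                digits.append(code)
            prev = code
        elif not (hw_rule and ch in "hw"):
            prev = ""  # a separator: the next equal code is counted again
    return (letters[0].upper() + "".join(digits) + "000")[:4]

def build_index(names):
    """Map each Soundex code to the set of names that produce it."""
    index = defaultdict(set)
    for name in names:
        for hw_rule in (True, False):  # file the name under both variants
            index[soundex(name, hw_rule)].add(name)
    return index

def names_for_code(index, code):
    """Return the known names filed under a code, sorted alphabetically."""
    return sorted(index.get(code, set()))

# Made-up sample list; a real program would load a full surname list.
NAMES = ["Burroughs", "Burrows", "Burrough", "Barrows", "Ashcroft", "Smith"]
index = build_index(NAMES)
print(names_for_code(index, "B620"))
print(names_for_code(index, "B622"))
```

A code can only be turned back into names that are already in the list, so the quality of the results depends entirely on how complete that list is.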

Scott Alan Johnson

I am a data analyst profiler, and this article, with its rich use of links to sources, has been a delight to find. Thank you for your research and sharing of information.
