Tuesday, January 21, 2014

Soundex: A Blast from the Past OR A Peek Behind the Curtain

So what is Soundex? What does that click box do to my search? Why do people keep telling me to use it when I do a search?

If you started your genealogy research B.C. (before computers) you probably know these answers. You also may have just nodded and sort of smiled about the "good old days." But if you are newer to genealogy research -- A.C. (after computers), especially after the early years -- you may not know that there is a lot behind that simple click on a search form. And that's okay; today we're going to change that.

Soundex is one of many phonetic algorithms that allow us to index words (mainly names) by the sound of the word. So regardless of minor spelling differences the words are grouped (indexed) together.

For genealogists, that means we can find all the Smith, Smyth, Smythe, etc. names in one spot. This makes it easier for us because, as we go back in time, spelling was not standardized, and more people were illiterate and may not have known how to spell their own name anyway. Soundex gives us a fighting chance to find them in many cases.

According to Wikipedia, Soundex was developed and patented in 1918 and 1922. A variation called American Soundex was used in the 1930s to index the US Census from 1890-1920. The National Archives and Records Administration (NARA) maintains the rules for implementation for the US Government.

Rules? Yes, rules. [Please read on and learn all about it. Or read on and see how much you remember from the "good old days."]

Soundex converts words to a letter and three numbers -- no matter how many letters make up the word. If you have a Michigan driver's license, the letter and first three numbers of your license number are the Soundex code for the surname on your license. Note: not all states use this as part of the driver's license number.

Before computers (for some of you this equals before you were born), we figured the code with paper and pencil. [It's okay if you use a blank notepad or Word document file but it isn't quite the same as the "old days."]

Take a surname, any surname, and write it down.  Then put four underscores/dashes ( _ _ _ _ ) to the right of the surname or above the surname. As you figure the Soundex code this is where you are going to put your "answers" as you determine the code for the surname you wrote down.

1. The letter portion of the code is always the first letter of the surname/word you are converting.
It does not matter if that first letter is a consonant or a vowel. So write that letter on the first underscore/dash.

Now we figure out the number part of the code (three numbers) from the remaining letters of the surname.

2. Eliminate (cross out) the vowels and a few other letters in the rest of the surname:
A, E, I, O, U, H, W, Y

3. Below the remaining letters of the surname, convert each letter to the appropriate number from the list below.
1 = B, F, P, V
2 = C, G, J, K, Q, S, X, Z
3 = D, T
4 = L
5 = M, N
6 = R
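
For anyone who would rather let a computer do the lookup, the list above can be written as a small Python table. This is only a sketch: the names SOUNDEX_GROUPS, LETTER_TO_DIGIT and digit_for are made up for this example, and it only does the letter-to-number step, not the additional rules.

```python
# Soundex number groups, copied from the list above.
SOUNDEX_GROUPS = {
    1: "BFPV",
    2: "CGJKQSXZ",
    3: "DT",
    4: "L",
    5: "MN",
    6: "R",
}

# Flip the groups around so each letter looks up its own number.
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in SOUNDEX_GROUPS.items()
    for letter in letters
}


def digit_for(letter):
    """Return the Soundex number for a letter, or None if it is crossed out."""
    return LETTER_TO_DIGIT.get(letter.upper())


# The letters of Lincoln after the first one: i, n, c, o, l, n
print([digit_for(ch) for ch in "Lincoln"[1:]])  # [None, 5, 2, None, 4, 5]
```

The None entries are the crossed-out letters from step 2 (A, E, I, O, U, H, W, Y), which have no number.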

4. Now read and apply any of these additional rules to the surname you wrote down.
Double Letters
If the surname has any double letters, ignore (cross out) the second occurrence of the same letter.

Letters Side-by-Side that Convert to the Same Soundex Number
If the surname has two different letters side-by-side that become the same Soundex number, ignore (cross out) the second occurrence of the number. This includes situations where the first letter of the surname (which remains a letter) and the second letter would code to the same number.

Names with Prefixes
In this situation, you need to convert the surname to two different Soundex codes: one using the prefix and one not using the prefix. Note: Mac and Mc are not considered prefixes, while Van, Le, De, etc. are prefixes. Coding the name both ways covers you for the different indexing (non-coded) methods that were used.

Names with Consonant Separators
If a vowel (a, e, i, o, u) separates two consonants with the same number code, the consonant to the right of the vowel is coded -- you use the second occurrence of the code number. But if the letters h or w separate the two consonants with the same number code, you do not use the second occurrence of the code number.

Out of Letters
If you run out of letters, use a 0 (zero) to fill in any of the three Soundex numbers still vacant.

5. Following all the rules, now you should have the number portion of the Soundex code for the surname you wrote down. Transfer your three numbers to the remaining underscores/dashes.
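
The paper-and-pencil steps above can also be sketched in a few lines of Python. Treat this as a rough sketch rather than an official implementation: the names GROUPS, CODES and soundex are made up for this example, it assumes plain A-Z letters (anything else is dropped), and it does not handle the prefix rule, which needs the name coded twice.

```python
# Number groups from step 3; letters not listed here have no number.
GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def soundex(name):
    """Return a letter plus three numbers for a surname (sketch of steps 1-5)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter stays a letter, but its number still counts when
    # deciding whether the next letter repeats the same number.
    previous = CODES.get(first, "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != previous:
                digits.append(code)
            previous = code
        elif ch not in "HW":
            # A vowel (or Y) separates: a repeated number is coded again.
            previous = ""
        # H and W do not separate: a repeated number stays ignored.
    # Keep three numbers, filling with zeros when the letters run out.
    return first + "".join(digits)[:3].ljust(3, "0")


for surname in ["Smith", "Smyth", "Smythe", "Ashcraft"]:
    print(surname, soundex(surname))
# Smith S530
# Smyth S530
# Smythe S530
# Ashcraft A261
```

Smith, Smyth and Smythe all land on the same code, which is the whole point. Ashcraft (a name not used elsewhere in this post) shows the h rule: the S and C both code to 2 and are separated only by an h, so the C is ignored.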

Lincoln  =  L524  (L, 5 for N, 2 for C, 4 for L)

Wellington =  W452 (W, 4 for L, ignore the second L, 5 for N, 2 for G, remaining coded consonants are ignored)

Pfropper = P616 (P, ignore F since it codes the same as P, 6 for R, 1 for P, ignore second P, 6 for R)

See = S000 (S, e is a vowel which is ignored and there are no remaining letters so use 000)
Sy = S000 (S, y is not coded and there are no remaining letters so use 000)

The National Archives and Records Administration's explanation of the rules has a good example of the consonant separator rule.
Tymczak = T522 (T, 5 for M, 2 for C, ignore Z since it codes to 2 also, vowel separates so 2 for K)

with prefix = V522 (V, 5 for N, 2 for G, the vowel o separates the two Gs so 2 for the second G is used)
without prefix = G200 (G, 2 for G, no letters remain so use 00)
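
The "code it both ways" idea from the prefix rule can be sketched like this. It is only an illustration under some assumptions: the PREFIXES list holds just the three prefixes named in this post (the "etc." means it is incomplete), the function name spellings_to_code is made up, and the name must be typed in mixed case so the prefix is visibly set off ("VanDeusen" or "Van Deusen"). VanDeusen is simply an example name chosen for this sketch.

```python
# Prefixes named in this post. Mac and Mc are left out on purpose,
# because they are not treated as prefixes.
PREFIXES = ("Van", "Le", "De")


def spellings_to_code(surname):
    """Return the spellings to convert: the full name, plus the name minus its prefix."""
    variants = [surname]
    for prefix in PREFIXES:
        if not surname.lower().startswith(prefix.lower()):
            continue
        rest = surname[len(prefix):]
        # Only split when the prefix is set off by a space or a capital letter,
        # so a name like Vance is not mistaken for "Van" + "ce".
        if rest[:1] == " " or rest[:1].isupper():
            variants.append(rest.strip())
    return variants


print(spellings_to_code("VanDeusen"))  # ['VanDeusen', 'Deusen']
print(spellings_to_code("McDonald"))   # ['McDonald']
print(spellings_to_code("Vance"))      # ['Vance']
```

Each spelling returned would then be converted to its own Soundex code, and you would search under both.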

Want to check if you figured your code correctly? There's a converter for that. In the early days of genealogy on the internet, an automatic Soundex Converter was "a big thing." Today, you can use it to check your work, or use it just for the fun of it. One Soundex Converter is hosted by RootsWeb. There are likely more still out there on the internet. Almost all genealogy programs have a feature to tell you the Soundex code for a surname.

Now this indexing system takes into account many spelling variations. But not all of them. [Doesn't there almost always seem to be a caveat?]

Soundex will not help you if the first letter of the surname was switched, like when a census enumerator (not of the same ethnicity as the resident) heard a V when a resident with a German accent said a name spelled with a W. In German, a W is pronounced more like a V. Thus Wandschneider can become Vonsnider on a census. And as you can see, the Soundex code for Wandschneider (W532) is not the same as the code for Vonsnider (V525), so you won't find these different spellings in the same place (group). So remember to think about how someone said something and how it may have been heard. You may need to play with letters a bit.

So, besides the caveat, that is the mystery behind the curtain of today's simple click to use Soundex in your search form. Started your genealogy B.C.? Hope your memory wasn't too rusty.

