Friday, June 28, 2013

Ancestry New Search and Phonetic Searches

It had been a while since I had used Ancestry's "New Search," and because "Old Search" is going to disappear soon, I started to explore a bit.

One of the first things I saw briefly got me excited. In addition to the standard Soundex, they offered a 'phonetic' search. So I thought I would try it out with the one surname of mine that has several different phonetic variations, but for which I feel Soundex is a complete failure. Soundex may have been a useful algorithm when it was invented in 1918, but better algorithms have to be possible by now. Apart from entering the surname, I set no restrictions:

Here were the results:

Over three million hits for phonetic variations of Cruvant? Really? This is what I expect from Soundex. Other surnames that share Cruvant's Soundex code (C615) include Carpenter and Corbin. These surnames are nowhere near a phonetic match for Cruvant, but Soundex treats them as one. It was clear to me that whatever algorithm Ancestry was using for 'phonetic' was no better than Soundex. I started browsing through the result pages to see where the non-matching names would begin.

A little under 300 matches for Cruvant, and over three million other surnames, none of which are likely to be the variations I was hoping for: Kruvant, Kroovant, Cruvand, and Kruvand. There is no reason the first two shouldn't come up under any algorithm designed for English phonetics. The latter two would be nice, but I would understand why they got missed.
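To see why Soundex behaves this way, here is a minimal sketch of the standard American Soundex rules in Python (the function name `soundex` is my own, not anything Ancestry uses). Soundex keeps the first letter as-is and codes only the next three consonant sounds, so Carpenter and Corbin land on the same code as Cruvant, while Kruvant and Kroovant, which sound identical to Cruvant, start with a different letter and never match.

```python
# Standard American Soundex: first letter kept, then up to three digits.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        raise ValueError("name must contain at least one letter A-Z")
    result = letters[0]            # the first letter is never converted to a digit
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue               # H and W are skipped and do not separate equal codes
        code = CODES.get(ch, "")   # vowels and Y get no code
        if code and code != prev:
            result += code         # a repeated code is written only once
        prev = code                # a vowel resets prev, so codes on either side both count
        if len(result) == 4:
            break
    return result.ljust(4, "0")    # pad short codes with zeros


for surname in ["Cruvant", "Carpenter", "Corbin", "Kruvant", "Kroovant", "Cruvand", "Kruvand"]:
    print(surname, soundex(surname))
```

Cruvant, Carpenter, Corbin, and Cruvand all come out as C615, while Kruvant, Kroovant, and Kruvand come out as K615. Because the first letter is part of the code, no amount of phonetic similarity bridges a C/K spelling difference.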

When I found Ancestry's explanation of Phonetic Variations, I understood:

"There are other name matching algorithms that we can use to help identify records to consider for your results. If you choose phonetic we will identify appropriate algorithms that apply to specific data collections and if a record has one of those names, we will use it as a possible record for your results set. For example, if you are prioritizing Jewish Collections first, we would choose the Daitch-Mokotoff phonetic algorithm."

In other words: We know of other algorithms. We aren't going to tell you what those algorithms are except for one we randomly choose for an example. There may be others. There may not be. We're not going to tell you. You can't select to use any of these algorithms instead of Soundex. If we decide an algorithm might be appropriate for a particular collection of records, we might use it. We might not. We probably won't even tell you when we do. Selecting 'Phonetic' is really the same as selecting 'Soundex' unless we decide it isn't going to be for a particular search, and we get to decide, you don't. But if selecting 'Phonetic' makes you feel good, go ahead. That's what we're here for, to convince you to click on checkboxes that 9 times out of 10 won't do anything.

Note: I did conduct a Phonetic search on a collection of Jewish records, and a search on the surname "Kruvant" turned up no results even though there were "Cruvant" records. See images below.

According to this Daitch-Mokotoff converter, they have matching codes, which means Ancestry chose not to use this algorithm for that search. What a shame. The one algorithm they do mention, on exactly the kind of collection they imply they might use it for, and they didn't.

(Cruvant, Kruvant, Cruvand, Kruvand, and Kroovant all have the same Daitch-Mokotoff code - 597630. For my own personal research, I would love to use this algorithm with every database at Ancestry. Ancestry claims that they have the algorithm coded into their software. Why can't I use it?)
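For the curious, here is a rough illustration of where 597630 comes from. This is a deliberately tiny subset of the Daitch-Mokotoff rules, written by me in Python and covering only the letters that appear in these five spellings; the real algorithm has many multi-letter rules, letters with alternative codes, and a rule for collapsing repeated codes, none of which this sketch implements. The name `dm_subset` is hypothetical, and it refuses any letter outside its table rather than guess.

```python
# Hypothetical, deliberately partial Daitch-Mokotoff coder: single-letter
# codes only, for the letters in Cruvant/Kruvant/Cruvand/Kruvand/Kroovant.
# Multi-letter rules, alternative codes, and repeat collapsing are omitted.
SUBSET_CODES = {"C": "5", "K": "5", "R": "9", "V": "7", "N": "6", "T": "3", "D": "3"}
VOWELS = set("AEIOU")  # coded "0" only at the start of a name, otherwise skipped


def dm_subset(name: str) -> str:
    digits = []
    for i, ch in enumerate(name.upper()):
        if ch in VOWELS:
            if i == 0:
                digits.append("0")
        elif ch in SUBSET_CODES:
            digits.append(SUBSET_CODES[ch])
        else:
            raise ValueError(f"{ch!r} is outside this sketch's letter subset")
    # Daitch-Mokotoff codes are six digits: truncate long ones, pad short ones with zeros
    return "".join(digits)[:6].ljust(6, "0")


for surname in ["Cruvant", "Kruvant", "Cruvand", "Kruvand", "Kroovant"]:
    print(surname, dm_subset(surname))
```

Every spelling reduces to the consonants 5 (C or K), 9 (R), 7 (V), 6 (N), 3 (T or D), padded with a zero to 597630. Unlike Soundex, the first letter is coded as a sound, so C and K no longer split the family apart.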

I realize this has nothing to do with the difference between Old Search and New Search, since Old Search doesn't provide you with the 'phonetic' checkbox. I suspect someone might be able to point me to a collection where the Daitch-Mokotoff algorithm is used, but that's not the point.
