Friday, July 20, 2012

Correcting a Serious Mistake

I’ve uploaded parts of my genealogy database to a few different websites. If I am going to make the tree public, I’ve naturally always trimmed it a bit. My standard procedure is to delete every record of a living person. Most of the websites have privacy mechanisms that obscure information for living individuals, but which information they obscure sometimes varies, and technology can always fail. So no information is better, in my opinion, than obscured information. What the software doesn't know, it can't reveal to the world.

But a recent occurrence made me realize this isn’t always enough. A cousin contacted me, rightfully upset because some private information appeared in an entry on WikiTree. How did this happen? I had placed information about a living person within the source notes for their ancestor.

I actually do this a lot, but not to the extent I did in this case. I copy and paste obituary notices all the time for the individuals in my database, and those obituaries usually contain the names of children and grandchildren. However, few would complain about their name appearing in the obituary of their grandparent.

This lapse was a lot more serious. I had copied and pasted an entire email from a cousin, detailing names and dates of birth for several generations. I naturally pasted it into the notes for the original ancestor, and entered ‘see notes for X’ in the notes for the later generations. This email was my ‘source’ for much of the information, so it made sense to copy and paste it into the database. I just forgot I had done this when I uploaded the tree to WikiTree. To cap it off, the email also contained a couple of email addresses and a postal address. (This contact information has now been deleted from my database. It didn't need to be there as part of the source material.)

At least WikiTree is a wiki, which makes quick edits easy. Within minutes of getting the email, I had removed the text of the email from the profile. Of course, as is standard for wiki software, the removed text was preserved in the “Change History.”

I emailed the WikiTree staff and asked whether they could override the settings and delete the Change History. I learned this isn’t possible. The only solution would be to delete the individual’s profile and then recreate it manually. I had no problem with this solution (in my email I had actually told the WikiTree staff that I would do this if they couldn't), but I didn’t want to do it immediately.

Why not? Google had already cached the private information. [I suspect this is how my cousin discovered the private information was there - through a search on their name.] It can take weeks, or months, before Google updates a cache, and it is my understanding that it can take longer if the page no longer exists. WikiTree names pages in the “Surname-XX” format, where XX is a number. I knew deleting and recreating the profile would likely generate a different number, and thus a different page from Google’s perspective, leaving the old cached copy in place.

I didn’t want to wait weeks or months, though, so I conducted a search to see if I could manually speed up Google’s process, and I learned I could.

To remove a cached page, you need to provide:
1) The URL of the page you want removed
2) A word that appears on the cached page but no longer appears on the live page.

This process will not work to update a page where you have only *added* information. It is precisely intended for those times where information is deleted for reasons of privacy or legality. My situation fit perfectly.
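Finding that word amounts to comparing the two versions of the page: any word on the cached copy that is missing from the live page will do, and if you have only added information there is no such word, which is why the process doesn't apply in that case. The short Python sketch below illustrates this idea. The function names and the sample text are hypothetical and made up for illustration; this is not part of Google's tool, which simply asks you to type the word into the form.

```python
import re


def words(text: str) -> set[str]:
    # Lowercase and split into runs of letters/digits, so "jane@example.com"
    # becomes {"jane", "example", "com"}. (Hypothetical helper.)
    return set(re.findall(r"\w+", text.lower()))


def removed_words(cached_text: str, live_text: str) -> set[str]:
    """Words present in the cached copy but gone from the live page.

    Any of these is a candidate for the "word that no longer appears" field.
    An empty result means nothing was deleted (information was only added),
    so this kind of removal request would not apply.
    """
    return words(cached_text) - words(live_text)


if __name__ == "__main__":
    # Made-up sample text standing in for the cached and live versions.
    cached = "John Smith was born 1920. Contact: jane@example.com"
    live = "John Smith was born 1920."
    print(sorted(removed_words(cached, live)))
```

Run as a script, it prints the words that were deleted from the page (here, the pieces of the email address), any one of which could go into the form.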

I submitted the request in the evening, and by noon the next day the cache was gone.

I then deleted the profile, along with the change history, and recreated the profile manually with only the information I had intended to make public.

My next step was to remove the cache from Bing. They have a similar form to fill out; however, they require the original page to no longer exist before they remove the cache, which is why I deleted the profile first. They also require the form to be filled out by someone associated with the website, and specifically ask for an email contact, so I sent the information to the WikiTree team, and they submitted it for me. Three days later, the cache is gone. (Since Yahoo uses Bing's search engine, this removed it from Yahoo as well.)

I suspect the information may still be cached on some minor search engines; however, a search at DogPile and MetaCrawler (which both search multiple search engines simultaneously) no longer produces a cached result. Any other search engines will update their information in the normal course of events.

I am very thankful to the WikiTree team for the quick assistance they provided me in dealing with this situation.
