Tuesday, August 17, 2010

Tuesday Tech Tip: Google's "Site:" command is not always better

Kerry Scott over at the Clue Wagon discusses Google's site search function.
  1. Go to the Google homepage.
  2. Type in site:[name of site] [keywords] For example, if you wanted to search for mentions of Glenbeulah on this site, you’d type in site:cluewagon.com Glenbeulah.
As she points out, this can be a very useful function for searching websites that don't have a search function.  However, she concludes:
I almost never use a site’s native search box, because I prefer the consistent results of doing it this way.
Be careful.  Google doesn't index every page of every website.  And on the pages it does index, it doesn't always capture every word.  In many cases, if a blog or website provides a native search function, it will actually yield more results than Google will.

Searching my non-genealogy blog for the word 'Napoleon' with Google's site: command yields 8 results.
The WordPress search box on the same blog yields 9 results.  The entry Google missed contains the word 'Napoleonic.'  That's only part of the reason Google missed it.
Two identical searches, conducted about 3 minutes apart, with different results.  So Google's results aren't even consistent.  However, even if they were consistent, WordPress's ability to find different word-endings of the same word makes it a superior method.
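To illustrate the word-ending point, here is a minimal Python sketch.  I don't know how Google or WordPress match words internally, so this is only an assumption about the kind of difference involved; the sample post titles and the function names are made up for the example.

```python
import re

def whole_word_hits(posts, term):
    """Return posts containing `term` as a complete word only."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [p for p in posts if pattern.search(p)]

def word_start_hits(posts, term):
    """Return posts containing any word that begins with `term`."""
    pattern = re.compile(r"\b" + re.escape(term), re.IGNORECASE)
    return [p for p in posts if pattern.search(p)]

# Hypothetical post titles, not taken from the actual blog.
posts = [
    "Napoleon at Waterloo",
    "The Napoleonic wars",
    "A post about something else",
]

print(len(whole_word_hits(posts, "Napoleon")))   # whole-word matching
print(len(word_start_hits(posts, "Napoleon")))   # also catches 'Napoleonic'
```

A search that only matches whole words skips the 'Napoleonic' post, while one that accepts different endings finds it.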

What about for this blog? 
Here I use Blogger, which is owned and operated by Google.  Does this make a difference?

When I use Blogger's search function on this blog, and search for Horton, I get 17 results.

Google gives me 68 results.  This at first glance seems a marked improvement.  However, the word "Horton" appears on my blog's sidebar, and Google's search function can't tell the difference between the sidebar and the blog entry.  Theoretically, Google should return all 925 entries as hits, if it counts the sidebar.  Not counting the entries in the results that only appear there because of the sidebar, I think there are 11 results.  I don't feel like spending the time figuring out which entries it missed, but I'm pretty sure it did miss some.  And the extra chaff it provides makes it more difficult to find the wheat.

What about using the site: command at Google's Blogsearch?

This provides interesting results.  Blogsearch finds all 17 entries for Horton (and only these 17 entries).  Google's Blogsearch understands the difference between sidebar and entry, while Google's main search site doesn't.  And (maybe) because Blogger is a Google product, they've indexed Blogger blogs better.

My search for Napoleon at Blogsearch yields only two results.

This isn't an adequate study, but I think I might avoid using Blogsearch for non-Blogger blogs.

Blogger and WordPress blogs make up a sizable number of the blogs out there, and I am quite happy with the results their database search functions yield.  Google's results are inconsistent and incomplete.  But if you are at a site that doesn't have a search function, Google's site: command is useful to remember.  It's better than nothing.
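If you'd rather not type the query by hand each time, the site: search can also be put together as a link.  This is a small Python sketch, not anything official: the function names are my own, and it assumes Google's usual search address with the query passed in the q parameter.

```python
from urllib.parse import urlencode

def site_query(site, keywords):
    """Build the text you would type into Google: site:[name of site] [keywords]."""
    return f"site:{site} {keywords}"

def site_search_url(site, keywords):
    """Build a Google search link for the same query (assumed URL format)."""
    return "https://www.google.com/search?" + urlencode({"q": site_query(site, keywords)})

# The example from the steps quoted at the top of this post.
print(site_query("cluewagon.com", "Glenbeulah"))
print(site_search_url("cluewagon.com", "Glenbeulah"))
```

The first line printed is exactly what you would type into the search box; the second is the same search as a link you could bookmark.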
