One factor I haven't heard mentioned in these discussions is how often the Internet Wayback Machine at Archive.org will 'crawl' your site.
Archive.org has been preserving webpages for over a decade. This can be very useful. Bloggers may wonder what will happen to their blog posts if their blog disappears -- but if it's being archived somewhere else, survival isn't completely dependent upon the blogger's backup regimen.
I recently discovered that:
1) This blog, which currently has over 200 subscribers according to Google Reader, has been 'crawled' by the archival spiders at the IWM a grand total of 2 times, both in 2008, and a total of 67 pages have been preserved.
2) A blog I have maintained since 2002 on my personal domain, and which currently has 6 subscribers according to Google Reader, has been 'crawled' 52 times since 2006, and a total of 5359 pages have been preserved. (Note: This is a WordPress blog, and has separate 'pages' for comments, trackbacks, and RSS feeds, so the equivalent number is probably closer to 1000 pages.)
3) I decided to look at the results for some other blogs. I've decided not to name them. Those who are curious about their own blogs can follow the links above and replace the URLs for my own blogs with any other site they wish to test.
I looked at three other popular genealogy blogs maintained on Blogspot, all with more subscribers than I have through Google Reader. Two have been blogging since 2008, and one since 2006. The former two have been crawled twice each, with 15 and 23 pages preserved. The one blogging since 2006 has been crawled 7 times, and has 346 pages preserved.
Then I looked at two popular geneabloggers, both blogging since 2006, who switched to a personal domain back in 2008. Their Blogspot blogs were crawled 2 and 7 times, with 60 and 783 pages preserved respectively. Their personal domains have been crawled 37 and 40 times since 2008, with 427 and 2118 pages preserved respectively. While the numbers are different, moving to a personal domain clearly benefited both on this measurement.
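For anyone who would rather run this check from a script than through the Wayback Machine's web pages, here is a minimal sketch. It assumes the Internet Archive's public CDX search endpoint and its `url`, `matchType`, `output`, and `fl` parameters behave as I understand them; the function names are my own, and the counts it produces (capture days, distinct URLs) will not necessarily match the "crawled" and "pages" figures quoted above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine's CDX search service.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(site):
    """Build a query URL listing every capture whose URL starts with `site`."""
    params = {
        "url": site,
        "matchType": "prefix",       # include all pages under the site
        "output": "json",            # rows come back as a JSON array of arrays
        "fl": "timestamp,original",  # keep only capture time and captured URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def fetch_rows(site):
    """Download the capture list (makes a network request)."""
    with urlopen(build_cdx_url(site), timeout=30) as response:
        return json.load(response)


def summarize(rows):
    """Count captures, distinct capture days, and distinct URLs.

    `rows` is the parsed JSON: a header row followed by
    [timestamp, original_url] rows, or an empty list if nothing was archived.
    """
    captures = rows[1:]  # skip the header row
    days = {timestamp[:8] for timestamp, _ in captures}  # YYYYMMDD prefix
    urls = {original for _, original in captures}
    return {
        "captures": len(captures),
        "capture_days": len(days),
        "distinct_urls": len(urls),
    }
```

To try it, call `summarize(fetch_rows("transylvaniandutch.blogspot.com"))`, or substitute any other site you wish to test.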
Following some links on the archived pages results in this error:
Some references to the Blogspot robots.txt file suggest its primary purpose is to prevent the 'duplicate' pages that otherwise might result, as exemplified by the more than 5000 pages the Internet Wayback Machine has preserved for my WordPress blog. But it appears to be having a larger impact than that.
The robots.txt file is present on the Custom Domains as well, so it's not the entire explanation. The Internet Wayback Machine might treat Blogspot, in general, differently.
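To see what a robots.txt file like this permits, you can test individual URLs against its rules. The sketch below uses Python's standard `urllib.robotparser`. The sample rules are an illustrative assumption, not a verified copy of the Blogspot file, and `ia_archiver` is the user-agent name I understand to have been associated with the Internet Archive's crawls; substitute the real file and agent name for a real test.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only: block search/label listings
# (a common source of duplicate pages) and allow everything else.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def check_urls(robots_txt, user_agent, urls):
    """Map each URL to whether the crawler would be allowed to fetch it."""
    return {url: is_allowed(robots_txt, user_agent, url) for url in urls}
```

Under these sample rules, a label listing such as `http://example.blogspot.com/search/label/genealogy` is blocked, while an individual post URL is allowed. Python's parser applies the first matching rule in the file; other crawlers may resolve Allow/Disallow conflicts differently, so treat the result as a rough guide.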
Why did I originally set my genealogy blog up on Blogspot?
I didn't at first. For the first few months, all my genealogy-related posts were a subset of the personal blog referenced in (2) above. But as I grew more obsessed with genealogy, I knew I needed a separate space devoted to the one topic. So many other geneabloggers were using Blogspot, and it was easy to use, so that's the direction I went.
It wasn't a mistake, per se. Blogspot has been a fine home. But I've considered moving the blog back 'home' before, and this was just the proverbial straw for me.
All of this explains why, as of this post, this blog is no longer located at http://transylvaniandutch.blogspot.com but is now at http://blog.transylvaniandutch.com
All links to the former Blogspot version should forward automatically to the new page.