I have a personal blog where I talk about everything that comes to my mind. This is what it looks like now.
This is what it looked like in March 2005, July 2004, December 2003, and a full five years ago in August of 2002. All the links are courtesy of the Wayback Machine. Archive.org has been archiving the internet for several years now. Here's what another website I maintain looked like in October of 2000.
There has been some talk about Ancestry’s caching of genealogical websites (such as USGenNet and genealogy blogs), for example at Genea-Musings, About.com’s Guide to Genealogy, and Genealogue.
When I blog I know what I blog may appear elsewhere. I consider myself a poet, and have included some poetry in some of my blog posts. I’ve had some of this poetry appear on other sites without credit. (In these instances I emailed the owners and asked them to include a byline…which they did.) I’ve also had poetry I’ve written appear on websites with credit, but without anyone asking permission first, which legally they are required to do…but I’m not wealthy enough to take them to court, and I don’t really mind, usually. I now have a Creative Commons copyright notice on the blog which allows people to distribute the content as long as they don’t make any money off of it, and as long as they give me credit. I don't have that notice on this blog. It's probably not going to appear here.
Of course, USGenNet doesn’t have a Creative Commons copyright notice on their site. And if you search for their archives at archive.org you will be able to access their archived homepage, but when you try to follow a link, you will receive the error message: "We're sorry, access to [url] has been blocked by the site owner via robots.txt." Basically, robots.txt files are files webmasters put on their sites to tell search bots which pages they shouldn’t crawl or archive. I could put these on my site, but I don’t. USGenNet does. Understandably, too. Bots are still physically able to ignore the requests and archive the pages; it's just that respectable archival search engines (such as Google and Archive.org) don’t ignore them, probably in part out of fear of legal retribution. Ancestry, apparently (key word: I'm still stating an opinion here), is ignoring these electronic requests. Note: I've been assured they didn't ignore robots.txt files.
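For the curious, here is a rough sketch of what "respecting robots.txt" looks like from the bot's side, using Python's standard urllib.robotparser module. The robots.txt contents, the bot names (ExampleArchiver, ExampleBot), and the example.com URLs are all made up for illustration; this is not USGenNet's actual file, and I'm not claiming any real crawler works exactly this way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
# It shuts out one archiving bot entirely and keeps every other
# bot out of a single directory.
ROBOTS_TXT = """\
User-agent: ExampleArchiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ExampleArchiver", "http://example.com/index.html"),
        ("ExampleBot", "http://example.com/index.html"),
        ("ExampleBot", "http://example.com/private/page.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if may_fetch(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent} -> {url}: {verdict}")
```

The thing to notice is that the check happens entirely on the bot's end. Nothing on the website enforces it: a well-behaved bot asks before fetching, and a badly behaved one simply skips the question.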
Like others who have written about this, I can't say what it means legally. I’m not a lawyer. I took a media law course in college over ten years ago, and have some clues, and this looks suspicious, but I am certainly not an expert. It will be interesting to watch whether Ancestry insists that what they appear to be doing is legitimate, as there are a whole bunch of companies, completely outside of the genealogy industry, who might justifiably be worried about the results of a court case in Ancestry's favor. If a court decides Ancestry can cache pages on sites with robots.txt files specifically requesting pages not be cached, will Google and Archive.org decide to still be nice? I suspect every newspaper in the country has a stake in the answer to that question.
And while it certainly feels more reprehensible for Ancestry to charge for viewing their cached files, I suspect that newspapers, and any other websites that wish to protect their content, hope that's not the deciding factor, as archival websites making their content available for free likely isn't an acceptable solution from their perspective.