Showing posts with label Blogger. Show all posts

Thursday, February 14, 2013

CAPTCHA Cha-Cha

CAPTCHA is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart.

A Turing Test - named after Alan Turing - is a test of a machine's ability to simulate a human. A test to tell computers and humans apart.

[Obviously, the CHA part of the acronym is redundant and repetitive. Apparently those who coined the term wanted to add half of a dance step to it.]

Some people refer to a CAPTCHA as a "Reverse Turing test," since it is a computer program providing a test to a human (or another computer program). You're probably very familiar with CAPTCHAs. And you probably hate them.


Above is the CAPTCHA form you will find on blogs using the Blogger software. Often they are a lot more difficult to read than the sample above. I have been blogging since 2002, and up until November of 2012 I avoided installing a CAPTCHA. Here's a 2005 post on my non-genealogy blog explaining my aversion. (At that time I used a blogging system called MovableType. A week later I switched to Blogger. In February of 2006 I switched to WordPress. When I created this separate genealogy blog, though, I chose Blogger. MovableType still exists; maybe someday I will experiment with it again, or try something else.)

Where was I? In November of 2012, on this blog, I was getting so much spam in my moderation queue each day that deleting it all was taking too much time. I discovered Blogger's CAPTCHA had an audio option for the visually impaired, overcoming one of my major complaints about CAPTCHA in 2005. So I added the CAPTCHA, and that solved the problem instantly. For me. I'm sure it created problems for some of my readers. Most of you were kind enough not to say anything. (I do have an email address posted, even if you found yourself unable to leave comments.) However, I have received some comments recently almost begging me to remove the CAPTCHA.

I decided to test removing it, and see how much spam I received. November might have been a temporary fluke. For a few days the CAPTCHA hasn't been there. I have been getting some spam, but not nearly as much as I was getting in November. I can handle it at this level.

Some other bloggers recommended that I remove the "Anonymous" option in my comments. It is true that all the spam in my moderation queue, both back in November and today, has been anonymous. However, I feel the Anonymous option is essential, at least for me.

Here is what the "Choose an Identity" portion of my blog's comment screen currently looks like:
For someone who doesn't own a Google Account, and has no clue what OpenID even means, much less how to use it, the only two choices are Name/URL and Anonymous.  If I could, I would remove just the Anonymous option. But on Blogger, removing "Anonymous" also removes "Name/URL."

Leaving:
There are many many many internet users who would look at that, scratch their head in consternation, and probably curse a little. CAPTCHAs might be irritating, but in my estimation this would result in greater confusion and annoyance. I want to scare away as few distant cousins as possible.

So What Happens if Your Spam Increases Again?
Will the CAPTCHA return?
Maybe not.

I've been investigating several other options.

1) Leave Blogger

WordPress allows for much greater flexibility in spam control. Their Akismet plug-in is exceptionally good at catching spam. I'm surprised Blogger isn't as good, since the spam filtering in Google's Gmail is also exceptional. If the Akismet plug-in started to fail, there are some plug-ins with significantly more benign CAPTCHAs. Honeypots and Checkboxes are my current favorites.

A Honeypot Captcha inserts an invisible input box into the comment form; that is, it is invisible to humans. The spambots will see the box, and fill it in. The biggest problem with honeypots is that many people use automatic scripts on their browsers to fill in forms. Those scripts see the honeypots too.
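As a rough illustration of the honeypot idea, here is a minimal Python sketch. It is not how any particular WordPress plug-in works; the field name `homepage`, the form markup, and both function names are made up for the example. The form includes a text box hidden from human visitors, and the server treats any submission where that box is filled in as likely spam.

```python
from typing import Mapping

# Hypothetical field name, chosen to look tempting to a spambot.
HONEYPOT_FIELD = "homepage"


def comment_form_html() -> str:
    """Return a comment form containing a text box humans never see."""
    return (
        '<form method="post" action="/comment">\n'
        '  <textarea name="comment"></textarea>\n'
        # The honeypot: hidden with CSS and skipped by the Tab key.
        f'  <input type="text" name="{HONEYPOT_FIELD}" '
        'style="display:none" tabindex="-1" autocomplete="off">\n'
        '  <button type="submit">Post</button>\n'
        '</form>'
    )


def is_probably_spam(form: Mapping[str, str]) -> bool:
    """A human leaves the hidden box empty; a bot usually fills it in."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

Note that a form-filling browser script would trip `is_probably_spam` just as a spambot would, which is the weakness described above.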

Checkboxes, as the name suggests, insert a simple checkbox asking if you're human. The spambots aren't (currently) expecting this. I suspect as checkbox CAPTCHAs become more common, the spambots will become smarter.

I prefer Blogger's user interface, but this is a possibility. There are other blogging systems I might investigate. My website host offers easy installation of either B2Evolution or Nucleus - neither of which I am familiar with. I'd appreciate insight from anyone who has used either of them. 

2) Install Disqus or IntenseDebate

I've heard that both of these comment systems can be installed on Blogger, and both have greater means of spam control. I would appreciate any insight from those who have used either of them, from the perspective of either blogger or commenter. I don't receive a lot of comments on my posts, so my main reason for using these would be spam control.

3) Install a completely separate comment script on my website, and link to it from my blog

There are lots of options here. I have a very simple guestbook/messageboard on another site that is completely spam-free. This would allow those who can't navigate the CAPTCHA to leave a visible comment for me.
 
Whatever I do, I will always provide some means for people to leave comments, and I will try to make it as user friendly as possible.
 
If anyone has any thoughts on this topic, please feel free to share them in the comments.

Saturday, November 26, 2011

The Internet Wayback Machine - and Blogspot

There are occasionally discussions regarding whether a blogger should host their blog for free at Blogspot, Wordpress, etc., or buy their own domain name.

One factor I haven't heard mentioned in these discussions is: How Often Will the Internet Wayback Machine at Archive.org 'crawl' your site?

Archive.org has been preserving webpages for over a decade.  This can be very useful.  Bloggers may wonder what will happen to their blog posts if their blog disappears -- but if it's being archived somewhere else, survival isn't completely dependent upon the blogger's backup regimen.

I recently discovered that:

1) This blog, which currently has over 200 subscribers according to Google Reader, has been 'crawled' by the archival spiders at the IWM a grand total of 2 times, both in 2008, and a total of 67 pages have been preserved.

2) A blog I have maintained since 2002 on my personal domain, and which currently has 6 subscribers according to Google Reader, has been 'crawled' 52 times since 2006, and a total of 5359 pages have been preserved.  (Note: This is a Wordpress blog, and has separate 'pages' for comments, trackbacks, and rss feeds, so the number is probably closer to an equivalent of 1000 pages.)

3) I decided to look at the results for some other blogs.  I've decided not to name them.  Those who are curious about their own blogs can follow the links above and replace the URLs for my own blogs with any other site they wish to test.

I looked at three other popular genealogy blogs maintained on Blogspot, all with more subscribers than I have through Google Reader.  Two have been blogging since 2008, and one since 2006.  The former two have been crawled twice each, with 15 and 23 pages preserved.  The one blogging since 2006 has been crawled 7 times, and has 346 pages preserved.

Then I looked at two popular geneabloggers, both blogging since 2006, who switched to a personal domain back in 2008.  Their Blogspot blogs were crawled 2 and 7 times, with 60 and 783 pages preserved respectively.  Their personal domains have been crawled 37 and 40 times since 2008, with 427 and 2118 pages preserved respectively.  While the numbers are different, moving to a personal domain clearly benefited both on this measurement.

4) The last page preserved for each Blogger-blog has the exact same filename, and may be part of the reason why so few pages are preserved:  robots.txt.

    Following some links on the archived pages results in this error:

    From what I have found researching so far, Google added the robots.txt files to Blogger blogs in 2007. (This perhaps explains why those blogging since 2006 were crawled a little more.) This file, which cannot be changed, is preventing search 'robots' from following certain links on the blog.  I'm not entirely certain which links are blocked, and which ones aren't. It's certainly not stopping Google from indexing their blogs.  Google has owned Blogger and Blogspot since 2003, and certainly wouldn't do that.  But it appears to have an impact on how other robots crawl the site.

    Some references to the Blogspot Robots.txt suggest its primary purpose is to prevent the 'duplicate' pages that otherwise might result, as exemplified by the 5000 pages the Internet Wayback Machine has preserved for my Wordpress blog.  But it appears to be having a larger impact than that.

    The Robots.txt file is on the Custom Domains as well, so it's not the entire explanation.  The Internet Wayback Machine might treat Blogspot, in general, differently.
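For anyone curious how a well-behaved robot interprets a robots.txt file, here is a small Python sketch using the standard library's `urllib.robotparser`. The rules, the robot name `ExampleArchiveBot`, and the `blog.example.com` URLs are all invented for illustration; they are not Blogger's actual file, and I don't know how the Internet Wayback Machine's own crawler applies such rules.

```python
from typing import Dict, List
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every robot is asked to stay out of /search.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def check_urls(robots_txt: str, user_agent: str, urls: List[str]) -> Dict[str, bool]:
    """Map each URL to True if the rules allow this robot to fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


results = check_urls(
    SAMPLE_ROBOTS_TXT,
    "ExampleArchiveBot",  # made-up robot name
    [
        "http://blog.example.com/2011/11/some-post.html",
        "http://blog.example.com/search/label/Blogger",
    ],
)
for url, allowed in results.items():
    print("allowed" if allowed else "blocked", url)
```

Under these made-up rules the individual post is allowed and the label listing is blocked, which is the kind of 'duplicate page' filtering described above.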


    Why did I originally set my genealogy blog up on Blogspot?

    I didn't at first.  For the first few months all my genealogy-related posts were a subset of the personal blog referenced in (2) above.  But as I grew more obsessed with genealogy, I knew I needed a separate space devoted to the one topic.  So many other geneabloggers were using Blogspot, and it was easy to use, so that's the direction I went.

    It wasn't a mistake, per se. Blogspot has been a fine home.  But I've considered moving the blog back 'home' before, and this was just the proverbial straw for me.



    All of this explains why as of this post, this blog is no longer located at http://transylvaniandutch.blogspot.com - but is now at http://blog.transylvaniandutch.com

    All links to the former Blogspot version should forward automatically to the new page.

      Wednesday, June 9, 2010

      Regarding the Blogger Outage

      Judging from comments in the Blogger Discussion Forum, from Sunday at about 8pm to Monday at about 3pm (Central Time Zone) – Blogger was completely inaccessible to many users – mostly in the Central Time Zone in the US and Canada. That’s nineteen hours.

      As one of those impacted by this outage, there are a few observations I’d like to point out. [Despite the date-stamp I used, my Amanuensis Monday post this week was delayed until Tuesday morning.]

      First – It’s frustrating that there was no mention of this outage in any online newspapers or major tech newsblogs I could find. Think about this. A major blogging platform. For 19 hours many users couldn’t access it. I can’t help but think if it had been East Coast or West Coast users who were impacted, at least the tech blogs would have mentioned it. But since the impacted users were all in ‘flyover country’ the story was ignored. At least, that is how it looks to me.

      Second – whenever something like this happens, users question whether to stay or go elsewhere. There were comments in the discussion forum from users who were saying they were finally going to take their blogs to their own domain because of this.

      My reaction, however, was exactly the opposite. I’ve been there. I have a handful of personal domains. All of which have been down at some point or another. And I had to deal personally with the support staff each time. Often they are friendly, sure. They may even be friendlier than the Google Support Staff. But are they likely to get the site fixed more quickly? No. And I often have to be more hands-on with the tech support issues. I don’t really have the time for that.

      Once I knew the Google team was working on the issue, I could focus on other tasks. It was frustrating it took that long, but I have no reason to believe they were dallying.

      There are other blogging platforms such as Typepad, Wordpress, Posterous, and Vox. I'd probably switch to using one of those if I ever felt Google's Blogger was becoming too unreliable for me.  But one major outage in the past three years is not going to convince me to jump ship.

      Wednesday, January 20, 2010

      Pages come to Blogger-in-Draft

      Note: As of 9 AM Pacific time on January 21, this feature was 'temporarily' disabled. While pages already created still work, new pages can't currently be added. Pages were restored at 9:32 PM Pacific according to the link above.


      If you're actually reading this on the website, and not on Facebook, or through a newsfeed, you will see links at the top of the blog to several pages I have created.

      Pages are something new for Blogger - released today in "Blogger-in-Draft." If you have a 'Blogger' blog, you can access Blogger-in-Draft at draft.blogger.com. New features for the blogging software are released there first, giving users a chance to "beta-test" them.

      Those of you who use WordPress, perhaps Blogger's biggest competitor, may be scratching your head in disbelief, as WordPress has had pages for years. Basically, pages are the same thing as posts, but they don't have dates attached to them. So they don't appear in your archive. You link to them in your sidebar or underneath your header as I have done.

      The Blogger-in-Draft blog has an explanation, with more details.

      The pages I created are copies of pages I had created at my personal domain, and had been linking to from my sidebar. I removed the links, as they will be easier to edit and update here. Several of them do need some updating.

      If you try it, and discover that your paragraph breaks are disappearing, under Post Options change "convert newlines" to "ignore newlines." That should work. Blogger will probably fix this bug soon, and you won't have to do that. Playing with something when it is in 'beta' always has its quirks.

      A hat tip to Jasia, who mentioned this new feature on Facebook, and Miriam who shared it on Google Reader. Otherwise, I wouldn't have known about it.

      Friday, October 9, 2009

      Blog Indexing

      Apple writes about Creating a Blog Index.

      Her post made me think about my attempts at organizing the growing information on my blog. Originally, like her, I focused on the labels that Blogger provides. (On other blogging systems they are often called categories.) And I had all the labels listed in my sidebar. A list that grew and grew.

      Recently Blogger added a feature that allows you to selectively add labels to your sidebar. (It's no longer all or none.) So now I have three listings: 1) Posts by Surnames 2) Posts by Locale 3) Carnivals and Memes. There are many other labels I have used in the past, and continue to use, but they are no longer in the sidebar.

      The only 'true' index I have created is an index of my Amanuensis Monday transcriptions, which I have placed on my main website - where I also have created some lists of genealogy resources. This blog may someday leave Blogspot and return to its original location. I moved it here at a time when there was nothing else genealogical on my website and I wanted to make this separate.

      The search features Blogger offers eliminate some need for indexing. The search box in the "Blogger bar" doesn't work very well -- it doesn't conduct an every-word search on the posts, and misses a lot. However, the Search Gadget they provide for the sidebar does a much better job. And recently they added the ability to search every page linked to from the blog, as well. I'm not certain how comprehensive this feature is, but it looks impressive.