Sunday, March 7, 2010

Preservation of the 2010 Census

This was going to be part of my weekly links, but I realized I had enough commentary for a separate post.

NARAtions, the blog of the US National Archives, dispels the rumors that 2010 census data will be destroyed.

These rumors began because Data Killers, a data destruction company, announced in a press release that they had been hired by the Census Bureau for "on-site media destruction services." Obviously, any government agency continuously has data that needs to be destroyed. But some individuals, working from very little information and without conducting further research, jumped to the wrong conclusion and went into Chicken Little mode.

I will do my best to avoid that. But we now have more information, and some of it is worrying.
NARA has not officially received and registered a proposed records retention schedule from the Census Bureau for the 2010 census.
The 2010 census is planned as an all-electronic census which will affect the format in which permanent records are preserved. The Census Bureau will scan the respondent questionnaires as part of its business process for compiling the census. The draft schedule calls for the permanent retention of the scanned digital images. These scanned images are the 21st century equivalent to the microfilm copies of census forms generated for previous decennial censuses.
This sounds good: the data will not be destroyed. However, the draft plan calls for the census to be 'all-electronic,' which suggests there will be no paper backup. Is this a problem? Are scanned images the 21st century equivalent to microfilm copies? I don't know. I'm not an expert.

But I remember Sally Jacobs, the Practical Archivist, saying Scan-and-Dump will not be OK "until we solve the problem of long-term digital preservation." In the same article she said "film and dump" is OK, as long-term preservation of microfilm has been proven. Jill Hurst-Wahl at Digitization 101 also questioned the practice of destroying originals.

These articles are almost three years old, though. A lot can change in the world of technology in three years.

As I understand it, this is one of the issues of long-term preservation. Thirty years ago, computer data was stored on punch cards, and punch card readers are hard to come by today. I will go out on a limb here and say that no one on Earth knows for sure what we will be storing data on 72 years from now.

As I understand it, another issue of long-term preservation is that digital media does degrade over time. Unlike microfilm, you can't put it on a shelf, keep it in a temperature-controlled space, and be assured that all of it will be there to view 72 years later -- even if you have the proper viewer.

Has the problem of long-term digital preservation been solved? Can we assume that the National Archives knows what they are doing, and that the final records retention schedule will include the steps necessary to make certain the census data will be there in 72 years? I would expect they have some of the best archivists in the nation working for them.

I don't know. I'm not an expert. I'll wait for someone with more background to weigh in on the subject.

A second part of the Census Bureau's statement is also worth noting.
In addition, the Census Bureau is also proposing permanent retention for the unedited file containing response data, with linkage information to the scanned images. This means that once the census is opened to the public 72 years from the enumeration date of the 2010 census, genealogists will have two means of searching for their ancestors. They can search the database, which will contain all the data (including names and addresses) from the respondent forms. They can also use the database to locate and retrieve images of the forms themselves.
If I understand this correctly, they have a 72-year plan to replace Ancestry, Footnote, FamilySearch, and other genealogy websites when it comes to accessing census records. I don't think any of them can complain about 72 years' notice.

This also reminds me of a post I wrote in 2008, "2010 Census - Will Genealogists Care in 2082?"
Records on people prior to 1900 and in the early part of the 20th century are sparse. That's a lot of the reason we love the census. In some cases, it's all we've got. But it's not like relatives are untraceable after 1930. Record keeping starts to improve greatly in the 20th century. Vital Records are going online. In 2082, 72 years after 2010, when the 2010 census is released, I'd be surprised if there was much information on it that wasn't obtainable easily somewhere else. Maybe genealogists will welcome it as a verification of what they have learned elsewhere. Or maybe we will enjoy reading what our ancestors "said" as opposed to what we already "know."
If the National Archives makes a mistake, the retention schedule it produces ends up not working, and all the digital data disappears into the digital ether before 2082, will that leave the same black hole in genealogists' research that the missing 1890 census does today? I am unable to see the future, but I don't think so.

The one exception I see is genealogists with illegal immigrants as ancestors. Officials are working to convince illegal immigrants to fill out the census forms because, even though they are here illegally, they use the schools and the hospitals, and the amount of money each state receives for these services depends on the census results. The census may be one of the few government documents that manages to record an illegal immigrant in 2010.
