It's a huge challenge: how to store digital files so that future generations can access them, from engineering plans to census records to family photos. The documents of our time are being recorded as bits and bytes with no guarantee of readability down the line. And as technologies change, we may find our files frozen in forgotten formats. Popular Mechanics magazine asks, "Will an entire era of human history be lost?"
The article quotes Ken Thibodeau, head of the National Archives and Records Administration's Electronic Records Archive (ERA). The National Archives is charged with the task of preserving all historically relevant documents and materials generated by the federal government, everything from White House e-mails to the storage locations of nuclear waste.
The entire article sounds a bit alarmist to me, but still it is interesting reading. You can find it at http://www.popularmechanics.com/technology/industry/4201645.html.
Massachusetts is one of the first governmental entities to adopt the OpenDocument standard for all public documents. The OpenDocument standard does not include proprietary/licensed technology, unlike the document standard Microsoft is promoting. Like PDF/A mentioned in the Popular Mechanics article, OpenDocument is an important step towards serious archival options.
Posted by: Denise Olson | November 21, 2006 at 05:50 PM
The issue of future readability is why I write my journal in HTML.
I figure that the Internet has to support old standards. So HTML is likely to be supported for a longer time than other formats. Plain text would be even safer, but I occasionally want formatting, color, font or such.
Posted by: Lorin Lund | November 22, 2006 at 09:11 AM
Digital preservation is a long-standing problem. The most famous case I remember hearing about in library school, some 20 years ago, was that of the 1960 Census, for which the Hollerith cards (those "IBM" punch cards) had become too brittle to be read, so they were microfilmed (obviously in hopes that the film would one day be scanned by machine and the data converted to usable form). Essentially, we did not have the raw data of the 1960 census, only the summaries. (This situation may have changed.)
Among the suggestions in the article (esp. "gold" CDs), everyone should aim for the following:
1. Redundancy: have data in multiple places (online, on disc, on hard drive, and on paper);
2. Test the data and the files periodically;
3. Migrate data and files to new media and (if needed) new formats if the formats are proprietary.
(Note on open formats mentioned above: OpenDocument and HTML are useful, but use with some caution: I recall previous attempts at open formats--some less successful than others. Not all of the Netscape and IE extensions to HTML in older versions of HTML are acceptable in the latest HTML standards, but you usually can find a browser which will "render the code gracefully" [to quote programmers]--one hopes. Always make sure there's software to open, edit, print, share and save your files. Sometimes that may mean keeping an old piece of software or an old computer--just as NARA and the Library of Congress have been forced to do.)
4. Track down stray files. E-mail is especially prone to being "fugitive," as the archivists at NARA point out. If you don't have an automated system to save e-mails (which are sometimes stored in proprietary databases like MS Outlook), then you should exercise self-discipline and either create a digital filing system or print out copies of important messages.
5. And please don't forget "old fashioned" media. I recall the sf story (was it _A Canticle for Leibowitz_?) in which a man survives a nuclear holocaust but no one understands the microfilm strapped to his body because there are no electrical lights, no magnifying lenses, and no microfilm readers. Worst case, yes, but if your research is valuable, you'll keep it available at the most accessible level of technology.
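For anyone who wants to automate points 1 and 2 (keep redundant copies, and test them periodically), here is a minimal sketch in Python. It is only one way to do it, not something taken from the article: the function names (sha256_of, build_manifest, check_copy) are made up for illustration, and the choice of SHA-256 checksums is my assumption. The idea is to record a checksum for every file in the master copy, then compare each backup copy against that record from time to time.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_copy(manifest: dict[str, str], copy_root: Path) -> dict[str, list[str]]:
    """Compare a copy against the manifest; report missing and changed files."""
    report = {"missing": [], "changed": []}
    for rel_path, expected in manifest.items():
        candidate = copy_root / rel_path
        if not candidate.is_file():
            report["missing"].append(rel_path)
        elif sha256_of(candidate) != expected:
            report["changed"].append(rel_path)
    return report
```

To use it, build the manifest once from the master folder, keep it alongside each copy, and run check_copy against every backup on a schedule. Anything listed under "missing" or "changed" should be restored from a good copy. Note that this only detects damaged or lost files; it does not tell you whether a file's format can still be opened, so point 3 (migration) still needs a human.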
Thanks for the article link, Dick.
Posted by: Paul Romaine | November 22, 2006 at 06:11 PM