Will digital data still be here tomorrow?
Governments that rush to digitize their data today could be jeopardizing their ability to retrieve that data tomorrow. The problems of preserving a digital legacy have not been solved.
Greater reliance on desktop computing, which does not handle backup, recovery and archiving particularly well, is reducing retrieval capabilities below what existing legacy computing platforms make possible.
The twin challenges are to persuade the information technology industry to make more efficient archiving tools and to convince governments to set policies for long-term management of digital information.
There is a conflict between the continual march of new media technology and its ability to survive. Acid-free paper and microfilm last up to 500 years, but a CD-ROM may be usable for only 200 years. Life expectancy is about 50 years for new-media hardware and less than 10 years for the software that accesses or processes the data. Digital media is becoming like a single issue of a newspaper: both will last less than a generation, about 30 to 50 years.
Data freshness is squeezing out data longevity, as most new media sacrifice physical longevity for current capacity. Governments, meanwhile, spend money to refresh technology, allowing more and more data to be digitized, but they fail to budget for long-term data preservation.
As a result, data is moved offline relatively quickly and stored in rapidly disintegrating media. Today's data has moved beyond letters and spreadsheets to simulations and Web pages, and the mean lifetime of a Web page is about 70 days. Many documents contain a useful interactivity that will be lost. Without the means of displaying, navigating and interpreting the data, users may not be able to reconstruct a Web page, much less go to its related resources. Compounding the problem are the complexities of scrambling data through compression and encryption.
So what can IT administrators do? Turn to industry: it has made a start on the data architecture needed to solve this problem. Metadata can serve as a Rosetta stone of interpretation. Combine it with uniform standards for conveying data, maintaining data integrity and retrieving specific data from a storage or transmission system. Finally, add a universal translator, which converts documents continually to new formats and media as needed.
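One way to picture the data-integrity piece of that architecture is a checksum manifest: record a digest for every stored file, then recompute the digests later to see whether anything has gone missing or silently changed. The sketch below is illustrative only. The function names and the choice of SHA-256 are assumptions made for this example, not something the architecture described above prescribes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

An archive might build the manifest at ingest, store it apart from the data, and run the verification each time the holdings are refreshed or copied to new media.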
An effective solution also needs processes to manage digital continuity, to determine what information to preserve and the best way to preserve it. Unused data dies unless managers use these processes to refresh it. Finally, you need broad strategies for preservation, ranging from saving everything (did you know the Web is saved in its entirety every day?) to employing strategies for refreshing documents and data.
Consider saving in the most common file formats and avoid compression and encryption wherever possible. Using as much metadata as possible with a log of processes and changes to the digital object gives you a fighting chance of keeping that data usable for a long time to come.
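As a rough illustration of metadata kept alongside a log of processes and changes, the sketch below builds a small record for a digital object and appends an entry each time the object is ingested or migrated to a new format. The field names (`format`, `sha256`, `events`) and the helper functions are hypothetical choices for this example, not drawn from any particular preservation standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def log_event(record: dict, action: str, note: str) -> None:
    """Append a timestamped entry to the object's change log."""
    record["events"].append({
        "when": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "note": note,
    })


def new_record(name: str, content: bytes, file_format: str) -> dict:
    """Create a metadata record describing a newly ingested object."""
    record = {
        "name": name,
        "format": file_format,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "events": [],
    }
    log_event(record, "ingest", "Object accepted into the archive")
    return record


def record_migration(record: dict, new_content: bytes, new_format: str) -> None:
    """Log a format migration, then update the descriptive fields."""
    log_event(
        record,
        "migrate",
        f"{record['format']} -> {new_format}; previous sha256 {record['sha256']}",
    )
    record["format"] = new_format
    record["sha256"] = hashlib.sha256(new_content).hexdigest()
    record["size_bytes"] = len(new_content)


def to_json(record: dict) -> str:
    """Serialize the record as plain, uncompressed JSON text."""
    return json.dumps(record, indent=2, sort_keys=True)
```

Writing the record out as plain JSON text follows the same advice: a common, uncompressed, unencrypted format that a future reader can open without special software.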
Government must take the lead in protecting the data entrusted to it, and the IT industry can play a critical role in helping government achieve this goal. Success can be declared when future scholars have ready access to records from the latter half of the 20th century onward.
Otto Doll, South Dakota's chief information officer, formerly worked in federal information technology and is a past president of the National Association of State Information Resource Executives.