NIST workshop takes first steps toward standards for preserving digital data
Congress has mandated that agencies preserve digitally born records, and the nation’s laboratories, libraries, museums and other institutions are preserving huge amounts of existing analog data in digital formats. But much of this information is at risk of being lost or becoming inaccessible because of a lack of common standards.
“Everybody is doing their own thing to preserve data, but they are not doing it in a common way,” said Wo Chang, a computer scientist at the National Institute of Standards and Technology. “This is a huge problem.”
Digital preservation experts took up the problem at a workshop hosted by NIST this week to identify requirements for an international standard. An approved standard is probably at least two years away, and even then it will address only a preliminary set of needs, Chang said. “We want to address as big a picture as possible, but we have to prioritize.”
But that standard would be the beginning of a framework for preserving the terabytes, petabytes, and exabytes of data being created and saved each year in a bewildering variety of formats.
The U.S. Workshop on Roadmap for a Digital Preservation Interoperability Framework, held in Gaithersburg, Md., March 29 through 31, was co-sponsored by NIST with the U.S. International Committee for Information Technology Standards, the International Organization for Standardization and the International Electrotechnical Commission. It will be followed by an international symposium in Dresden, Germany, beginning April 21.
“Both roadmaps will be combined and provided to the ISO/IEC study group to standardize a digital preservation interoperability framework,” said Chang, manager of the Digital Media Group in the Information Access Division of NIST’s Information Technology Lab and chairman for the program.
Chang called the initial workshop a success. “I was only expecting about 60 people,” he said, and more than 100 showed up. “We had a lot of good conversations.”
Presentations were given in three tracks: Content Organization, covering the operations of agencies that produce, manage and preserve the data; Technology, a survey of the available means of preserving data; and Standards and Best Practices, focusing on how technology is applied.
Because of the amount of digital data already in existence, any standard will have to work within existing technology and infrastructure, Chang said.
“That is a fundamental approach,” he said. “Some experiments you can’t reproduce,” and there is no way all existing data could be reformatted to accommodate a new standard.
Finding a standard framework that will incorporate the current state of technology will be a challenge. “We don’t know how to tackle it,” Chang said. But there are ideas. “I have a solution, but it is not the solution.”
His vision is to encapsulate existing data in an envelope, adding new metadata that describes the formatting and the metadata of the contents. This would let users identify the data and determine which data is usable. It would not make the data universally accessible, however, and would be only a beginning.
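The encapsulation idea can be illustrated with a short Python sketch. This is not Chang's design or any proposed standard; the field names, the JSON layout and the functions `encapsulate`, `describe` and `extract` are all hypothetical, chosen only to show how a wrapper could carry descriptive metadata alongside data that is left in its original format.

```python
import base64
import hashlib
import json


def encapsulate(payload: bytes, format_name: str, existing_metadata: dict) -> str:
    """Wrap unmodified data in a hypothetical JSON envelope."""
    envelope = {
        # All field names below are illustrative assumptions.
        "envelope_version": "0.1-hypothetical",
        "format": format_name,                      # describes the payload's formatting
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "existing_metadata": existing_metadata,     # metadata the data already carried
        "payload_base64": base64.b64encode(payload).decode("ascii"),
    }
    return json.dumps(envelope, sort_keys=True)


def describe(envelope_json: str) -> dict:
    """Return the envelope metadata without decoding the payload."""
    envelope = json.loads(envelope_json)
    return {key: value for key, value in envelope.items() if key != "payload_base64"}


def extract(envelope_json: str) -> bytes:
    """Recover the original bytes and check them against the recorded digest."""
    envelope = json.loads(envelope_json)
    payload = base64.b64decode(envelope["payload_base64"])
    if hashlib.sha256(payload).hexdigest() != envelope["sha256"]:
        raise ValueError("payload does not match recorded checksum")
    return payload


if __name__ == "__main__":
    original = b"temperature,42.0\n"
    wrapped = encapsulate(original, "text/csv", {"instrument": "example-sensor"})
    print(describe(wrapped)["format"])
    print(extract(wrapped) == original)
```

As in the approach described above, a reader of such an envelope could learn what the data is and whether a tool exists to open it, but would still need software that understands the original format to use the contents.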
Is it possible to preserve everything in an interoperable way? “Nobody knows,” Chang said. But agencies have a mandate to preserve their records and other data and their needs will provide a starting point for identifying the use cases to be prioritized and addressed in a standard.
“There will be many other use cases not addressed,” he said. “We can’t do everything at once.” Future iterations of any standard will build on progress achieved in the first versions. One thing that could help adapt data to a common standard for preservation would be to adopt a common workflow for capturing metadata in a systematic way. Doing this at the beginning would aid standardization going forward.
The ISO will meet in June to set preliminary priorities based on the results of the Gaithersburg and Dresden meetings, and a final set of priorities based on feedback from participating national bodies will be established at a meeting in late August or early September, Chang said. If the standard is put on a fast track with four meetings a year, a final product could be available for approval in 18 to 24 months.