- By Patricia Daukantas
- Aug 27, 2003
How NASA standardized data archiving
Donald M. Sawyer, interim head of NASA's National Space Science Data Center, has spent eight years working on an international standard for organizing digital archives.
NASA archivist Donald M. Sawyer has guided a digital archiving framework to adoption as an International Organization for Standardization (ISO) standard.
The product of eight years of work, NASA's Open Archival Information System, or OAIS, this year became ISO 14721. Sawyer is interim head of the National Space Science Data Center at Goddard Space Flight Center in Greenbelt, Md.
OAIS establishes definitions and terminology for users and systems that interact with a digital archive, Sawyer said, so that everyone is on the same conceptual page. More information about OAIS appears at ssdoo.gsfc.nasa.gov/nost/isoas.
Founded during the Apollo era, the space science data center was NASA's first official archive. 'At that time, we were collecting everything,' Sawyer said.
Over the years, archiving chores were distributed as the space agency built active archives for various user communities. The center, whose Web address is nssdc.gsfc.nasa.gov, shifted its role to long-term data preservation.
The OAIS project began in 1995 when ISO asked the Consultative Committee for Space Data Systems, an international group in which Sawyer participated, to develop a standard archive format.
What did you say?
'We knew we could not get agreement on a standard archiving format because we really didn't have the terminology to communicate effectively,' Sawyer said.
Even within NASA, people in charge of the various data archives were using different terminology. In the Web's early days, data might be posted online and called an archive without regard to hardware obsolescence or proprietary formatting.
'It's not a simple matter to preserve information in digital form,' Sawyer said. A reference model is necessary for long-term preservation because 'digital information is inherently fragile,' he said. 'It's not something you can readily see, and it requires active management to stay on top of preservation.'
The hardware for accessing digital data, especially disk drive storage, keeps improving and getting cheaper, so older systems quickly become obsolete. That means archivists need an active process for managing migrations, as well as a system of checks to ensure that bits aren't corrupted during transfers.
'Your technology can move out from under you in three to five years, and you need to migrate, so you have to do something active to maintain the information on short time scales,' Sawyer said.
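To make the "system of checks" concrete, here is a minimal Python sketch of one common approach, assuming the check is a cryptographic checksum (SHA-256 here) computed before and after each copy. The article does not say which technique the center uses, and the names `sha256_of`, `migrate_with_check` and `FixityError` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


class FixityError(Exception):
    """Raised when a migrated copy does not match its source bit for bit."""


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_check(src: Path, dst: Path) -> str:
    """Copy src to dst (e.g. onto newer storage) and verify the copy.

    Returns the verified digest so it can be kept as metadata for later audits.
    """
    before = sha256_of(src)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    after = sha256_of(dst)
    if before != after:
        raise FixityError(f"checksum mismatch migrating {src} -> {dst}")
    return after


if __name__ == "__main__":
    # Demo: "migrate" one file from an old directory to a new one.
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old_disk" / "obs.dat"
        old.parent.mkdir()
        old.write_bytes(b"example science data")
        new = Path(tmp) / "new_disk" / "obs.dat"
        print(migrate_with_check(old, new))
```

Storing the returned digest alongside the archived file lets the archive re-verify the same bits at the next migration, three to five years later, without needing the original media.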
A successful archive also needs adequate metadata so that everyone using it can understand the format of its information content, Sawyer said. That's especially true of scientific data, which tends to carry a lot of jargon and specialized terminology used by small groups of experts.
In 1995 the international committee held its first workshop on a reference model for data archives. The workshop drew about 60 participants, and a core group of 10 to 15 started meeting several times yearly to work on a reference model. Sawyer co-chaired the group along with Lou Reich of Computer Sciences Corp.
'We didn't want this to be just a reference model for science archives,' Sawyer said. To influence vendors of archiving systems, the group believed, the result would have to apply to digital archives in other fields and be independent of any specific technology or programming language.
The OAIS principles can apply to any organization that maintains digital archives for longer than one technology change cycle, which might last just a few years, Sawyer said.
In the OAIS model, the archive interacts with three conceptual roles called producer, management and consumer. The producer role provides the data to be preserved, the management role sets archiving policies and the consumer role interacts with the archive to use the preserved data.
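Because OAIS is deliberately independent of any technology or programming language, it prescribes no code. Still, a toy Python sketch can show how the three roles divide responsibilities around one archive. The class and method names (`Archive`, `Policy`, `ingest`, `set_policy`, `access`) and the single policy field are hypothetical illustrations, not terms defined by ISO 14721.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Archiving policy set by the management role (hypothetical field)."""
    migration_interval_years: int = 3


@dataclass
class Archive:
    policy: Policy = field(default_factory=Policy)
    _holdings: dict[str, bytes] = field(default_factory=dict)
    _metadata: dict[str, str] = field(default_factory=dict)

    # Producer role: supplies data to be preserved, with a format description.
    def ingest(self, item_id: str, data: bytes, description: str) -> None:
        if item_id in self._holdings:
            raise ValueError(f"{item_id} is already archived")
        self._holdings[item_id] = data
        self._metadata[item_id] = description

    # Management role: sets the policies the archive operates under.
    def set_policy(self, policy: Policy) -> None:
        self.policy = policy

    # Consumer role: retrieves preserved data together with its metadata.
    def access(self, item_id: str) -> tuple[bytes, str]:
        return self._holdings[item_id], self._metadata[item_id]
```

Returning the description with the data reflects the article's point that consumers need metadata to interpret what they retrieve.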
Recently Sawyer gave a talk on OAIS to staff members at the Library of Congress, which is applying OAIS principles to the National Digital Information Infrastructure and Preservation Program (www.digitalpreservation.gov).
Also, the National Archives and Records Administration has used OAIS in its work with the San Diego Supercomputer Center on NARA's Electronic Records Archives.
Sawyer's center uses OAIS as a framework for re-engineering archival ingest and storage processes.