Digital records swamp NARA




Agency struggles to keep afloat amid sea of data

The National Archives and Records Administration received more electronic files from the Clinton administration than it had in the previous 30 years from all of government.

This electronic deluge has forced NARA officials to take stock of the data tsunami heading their way.

'The rapid evolution of IT has produced huge volumes of diverse and complex digital records,' said John Carlin, archivist of the United States. 'NARA still lacks proven methods for preserving most forms of e-records that will be created in the near- and long-term future.' For example, NARA officials said they expect over the next 10 years to receive 1 billion electronic personnel files from the Defense Department, along with more than 40 terabytes of data from the 2000 Census. And that's just a sampling of the electronic records agencies are slated to send NARA in coming years.

'E-government is exploding, and electronic record-keeping is not keeping up,' said Reynolds Cahoon, assistant archivist for human resources and information services at NARA. 'We have to find a way to free e-records from the hardware, software and data formats that created them and do it in a way that ensures access to those records for as long as they are needed.'

NARA is working with agencies and industry to develop an electronic archival system that is independent of any specific hardware or software, maintains record authenticity, ensures context, and guarantees that records will never become obsolete and will remain retrievable forever, Cahoon said.

With more than 4,800 formats in use by agencies and technology advancing at a rapid pace, NARA is concerned that electronic records will become inauthentic or, worse, unavailable if newer technology does not support old formats, Cahoon said.

Cahoon compared NARA's dilemma to opening a Microsoft Word 3.1 document in Microsoft Windows XP: It could be done, but XP would not support the document's formatting and the record would lose its authenticity.

Whatever system NARA creates must store and read data in its original format no matter how it was created, he said. This is a complex issue that has been troubling the government's archivists since the dawn of computing. But only over the last decade has the magnitude of electronic data begun to swamp the agency in ways comparable to the paper records catacomb maintained by NARA.

NARA is working with other federal groups on the essential data standards for different types of records, and it will organize working groups by lines of business to make it easier for agencies to agree on those standards, said Dan Jansen, a staff member on the electronic-records archives team.

NARA is under pressure to develop a system because, on top of the normal increase in electronic records, agencies face the October deadline of the Government Paperwork Elimination Act of 1998. The law directs agencies to make all transactions electronic, driving the creation of even more data.

Although NARA has been working on electronic archiving systems for nearly two decades, in 1998 it began a new 10-year effort to refocus on preserving, managing and providing access to billions of government documents, photographs and recordings, Carlin said.

'Many records have survived hundreds of years, but the same cannot be said for e-records,' he said. 'Records created just a few years ago are already unreadable by today's technology.'

NARA late last year released a request for information about storage technology from integration and software vendors. It received more than 40 responses.

Next, the agency plans to release a solicitation with the goal of awarding a contract before next year. The system's initial deployment is scheduled for 2007, said Ken Thibodeau, director of NARA's Electronic Records Archives Program.

In the meantime, NARA will conduct two pilots to test features the future system should include, Thibodeau said.

Unearthing archives

In the first, NARA soon will unveil software called Access to Archived Databases that will give researchers a tool for searching for specific records in the agency's databases.

The agency also will test software called the Presidential Electronic Records Pilot System to pull specific records from hard drives taken by the FBI from the White House at the end of the George H.W. Bush administration. The FBI pulled the hard drives from the 286-processor PCs in use during an inquiry by an independent counsel. After the counsel issued a report on the inquiry, the bureau turned the hard drives over to NARA.

'We want to see how this type of system will meet our operational needs and see how well it will give access to unlimited databases,' Thibodeau said. 'It also will give us experience from the user's point of view.'

Thibodeau said that by the end of this year, NARA wants to make roughly 50 databases available for searches.

As it moves ahead with its archival system development, NARA will settle on standard metadata elements to describe the content, context, structure and presentation of the records that agencies submit, Jansen said. NARA will create templates using the standard elements, he said.

From these templates, the working groups will decide on other data elements specific to agencies' lines of business, Jansen said.
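The template approach Jansen describes can be pictured with a small sketch. This is only an illustration, not NARA's design: the class name, the method names and every element name (such as "creating_agency" or "service_branch") are hypothetical. Only the four aspects of a record (content, context, structure and presentation) come from the article. The sketch shows a base template of standard elements, a line-of-business template built on top of it, and a check that reports which required elements a submitted record lacks.

```python
from dataclasses import dataclass, field

# The four aspects of a record named in the article.
STANDARD_ASPECTS = ("content", "context", "structure", "presentation")


@dataclass(frozen=True)
class MetadataTemplate:
    """Required metadata element names, grouped by aspect (hypothetical model)."""
    name: str
    elements: dict = field(default_factory=dict)  # aspect -> tuple of element names

    def extend(self, name, extra):
        """Return a new template that adds line-of-business elements."""
        merged = {aspect: tuple(names) for aspect, names in self.elements.items()}
        for aspect, names in extra.items():
            merged[aspect] = merged.get(aspect, ()) + tuple(names)
        return MetadataTemplate(name, merged)

    def missing(self, record_metadata):
        """List required element names that a submitted record does not supply."""
        required = [n for names in self.elements.values() for n in names]
        return [n for n in required if n not in record_metadata]


# Base template: all element names below are made up for illustration.
base = MetadataTemplate("base", {
    "content": ("title",),
    "context": ("creating_agency", "date_created"),
    "structure": ("file_format",),
    "presentation": ("rendering_software",),
})

# A working group adds an element specific to its line of business.
personnel = base.extend("personnel", {"content": ("service_branch",)})

# A submitted record that lacks two of the required elements.
record = {
    "title": "Example personnel file",
    "creating_agency": "Defense Department",
    "date_created": "2001-01-19",
    "file_format": "text",
}

print(personnel.missing(record))  # ['service_branch', 'rendering_software']
```

Keeping the base template immutable and deriving each line-of-business template from it means every agency's records carry the same standard elements, while the extra elements can differ from one working group to the next.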
