NARA lacks processing power

The National Archives doesn't have the computing power it will need to copy and
store the millions of electronic files headed its way, Archives officials said.


Deputy Archivist Lewis Bellardo described the problem facing archivists during a
speech to members of the Association for Federal Information Resources Management in
Washington.


When President Clinton's term expires, Bellardo told the group, the National Archives
and Records Administration will receive more than 20 million files from the executive
office.


"With the current tools we have at our disposal, if we started processing that
documentation when Clinton leaves office," he said, "it would take us 10 years
using our full capacity--doing nothing with the rest of the federal government--to
preserve it once.


"Then, after 10 years, we have to preserve it again."


To handle the flood of individual documents--some as small as single-page e-mail
files--NARA is considering using a Defense Advanced Research Projects Agency supercomputer
to add the necessary kick to its Archival Preservation System (APS), Bellardo said.


The supercomputer is only one of the computing options NARA is considering.


"We're looking for advanced computational capabilities beyond
state-of-the-art," said Ken Thibodeau, director of NARA's Center for Electronic
Records.


Most modern computer systems handle fewer than 2 million files annually, he said,
"which is not in the ballpark that we need to deal with."


APS converts electronic files into more secure formats. To do this, archivists must
first determine whether the media containing the documents (diskettes, tapes, compact
discs) are in good condition, Thibodeau said.


Some have been subject to years of environmental stress and mishandling, he said.


"You don't know [a file's] life history, so the first step is to get it onto a
medium that you can trust," Thibodeau said. "That's the core function of the
APS."


The result is an archive of documents in a standard digital text format.


"It may be going from a square tape to a round disk, but ... they're all
completely standardized," Thibodeau said.


Tapes were considered the only reliable storage mechanism when APS was designed in
1990. But that all changed when the courts ruled for the first time that electronic
documents are records and must be handled as such.


"We were suddenly saddled with responsibility for 6,000 volumes of stuff, mainly
backup tapes coming off various [executive office] systems," Thibodeau said. "At
that point we had zero in-house capability."


Even before APS was delivered, NARA had it redesigned to process the new electronic
records.


"We went from handling several hundred files to several thousand, which is not too
bad," he said. "But that's not going to get us to several million."


Expanding the current system won't work, Thibodeau said. APS operates on PCs linked by
Ethernet to an IBM RS/6000 server running an Oracle database.


"If we got a hundred of these PCs on the floor, it wouldn't handle the
workload," Thibodeau said. "Before we finished copying the files, we'd have to
start migrating them" to the archives.


In addition to talks with DARPA, NARA is looking at the possibility of combining
individual files into one file.


"If I can put 1,000 messages into one file, my information bottleneck is reduced a
thousandfold," Thibodeau said.
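The bundling idea Thibodeau describes can be illustrated with a minimal sketch. It assumes Python's standard `tarfile` module as the container format; the article does not say which format NARA would use, and the function names `bundle_messages` and `unbundle_messages` are hypothetical.

```python
import io
import tarfile


def bundle_messages(messages: dict[str, bytes]) -> bytes:
    """Pack many small messages into one in-memory tar archive.

    `messages` maps a (hypothetical) file name to its contents. Copying the
    returned bytes is one large sequential transfer instead of thousands of
    separate per-file operations.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in messages.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unbundle_messages(blob: bytes) -> dict[str, bytes]:
    """Recover the original messages from an archive made by bundle_messages."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        for member in archive.getmembers():
            extracted = archive.extractfile(member)
            if extracted is not None:  # None only for non-regular entries
                result[member.name] = extracted.read()
    return result


if __name__ == "__main__":
    # 1,000 short e-mail messages, mirroring the example in the quote.
    mail = {f"msg{i:04d}.txt": f"Message {i}\n".encode() for i in range(1000)}
    blob = bundle_messages(mail)
    assert unbundle_messages(blob) == mail
    print(f"{len(mail)} messages packed into 1 archive of {len(blob)} bytes")
```

The gain here is in the number of files the system must open, copy and track, not in bytes: tar adds a 512-byte header per member and pads each member to a 512-byte boundary, so the archive is larger than the raw messages.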


The large number of individual files wreaks havoc on the system.


"We can copy 200MB in 15 minutes, but we can't copy 100MB spread across thousands of
files in 50 hours," he said.


NARA officials hope archiving experts can come up with options.


"We have no predilection about what platform it may eventually run on," he
said. NARA wants to engineer an ideal system first and then decide which platform to run
it on, he said.

