For the record, NARA techie aims to preserve

BY RICHARD W. WALKER | GCN STAFF

Question: How are the government's electronic records going to be preserved over multiple generations of technology so that future archivists and historians can access them?

Answer: Nobody knows yet.

But Kenneth Thibodeau, director of the National Archives and Records Administration's Electronic Records Archives (ERA) program in College Park, Md., thinks he's on the trail of a solution.

It's called persistent object preservation.

Computer scientists and engineers at the San Diego Supercomputing Center at the University of California at San Diego, one of the ERA program's main research partners, are developing the technique.

'Persistent object preservation is essentially designed to give us applications that are independent of the technology they use at any given time,' Thibodeau said.

Other proposed techniques for preserving electronic records over long periods have the liability of depending on the technologies that deliver them, technologies that are bound to become obsolete over time.

Emulation, for example, keeps the original software and data formats working by imitating the original hardware on newer machines.

Migration to new software formats scraps the original hardware but requires you to reformat the data every time a new technology comes along.

When you're talking about accessing digital records over centuries, not years or even decades, those methods clearly aren't feasible.

Persistent object preservation takes an entirely different approach, one that liberates a record from the computer that created it.

'The simple idea behind persistent object preservation is that you focus on the properties of the record you're trying to preserve rather than the artifacts of the technology you use either to create or store it,' Thibodeau said.

State it simply

In the persistent object method, the structure of a record and of aggregates of records is described in plain language (simple tags and schemas) so that any future technologies, and people, will recognize the essential properties of the record and be able to access it, he said.

That gives managers the ability to change hardware and software over time with no significant impact on the records that are being managed and preserved.
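
As a rough illustration of the idea, and not the San Diego team's actual software, the sketch below describes one e-mail record with plain-text tags and then reads it back using nothing but those tags. The tag names (`record`, `sender`, `subject` and so on), the sample message and both helper functions are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET


def describe_record(record: dict) -> str:
    """Write a record as self-describing tagged text (hypothetical tag names)."""
    root = ET.Element("record", attrib={"type": "email"})
    for field, value in record.items():
        # Each property of the record becomes a plainly named tag.
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_record(tagged_text: str) -> dict:
    """Recover the record's properties from the tags alone."""
    root = ET.fromstring(tagged_text)
    return {child.tag: child.text for child in root}


# Hypothetical sample record, invented for this sketch.
message = {
    "sender": "a@example.gov",
    "recipient": "b@example.gov",
    "date": "2000-05-15",
    "subject": "Budget memo",
    "body": "Draft attached.",
}

tagged = describe_record(message)
restored = read_record(tagged)
print(tagged)
print(restored == message)
```

The reader needs no knowledge of the mail program that produced the message, only the tags. That is the property the technique relies on, though a real archive would also need a schema and a way to describe collections of records.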

'What San Diego is telling us is that records in this format should be good for 300 to 400 years,' Thibodeau said.

Not that NARA is going to take anyone's word for it.

'In all of our research, we require empirical demonstrations,' he said. 'Don't just tell us this can be done. Do it, and show us.'

To gather proof, scientists at the San Diego facility have tested the persistent object method on a variety of electronic records.

Among them are databases that span 30 years of database technology, a collection of a million e-mail messages, and a group of 10,000 digital images of artworks and artifacts from museums, Thibodeau said.

So far, San Diego's demonstrations of the technique have been consistently successful, making it the most promising method yet for preserving digital information and electronic records, he said.

That marks a major step forward for the ERA program, which was born about two years ago amid a sense of impending crisis at NARA.

'NARA's top management took stock of where we were and realized that we were really facing a major crisis in our future,' Thibodeau said. 'And that is, if we don't figure out ways to preserve electronic records over time, there simply will no longer be a national archives for the records that the government is creating now and will be creating in the future because those records are increasingly electronic.'

When the program was launched, there were no models for long-term preservation, Thibodeau said.

'Vendors are not producing for a 400-year market,' he said. 'When we started, we looked around to see if there was anything we could take as precedents for building a solution, and we just didn't find anything relevant.'

As a result, the ERA program's work makes it a leader in the search for records preservation technologies.

But Thibodeau emphasized that the program is a pure research effort, the ultimate goal being records preservation over time, not technology advancement per se.

'We want to drive the research to the breaking point before making any commitments,' Thibodeau said. 'What we want to do in the research environment is develop the knowledge that a solution is going to work or not going to work.'

ERA's research at the San Diego center is conducted through the National Science Foundation's National Partnership for Advanced Computational Infrastructure, to which NARA contributes.

Thibodeau finds his place at the leading edge of records preservation research somewhat paradoxical, having toiled in government records management for a quarter of a century.

'I'm an old fogey who feels he's at the beginning of his career because, after 25 years, it finally looks like we're going to be able to do some good stuff,' he told a group at the NARA Records Administration Conference in Washington in May.

Thibodeau never planned a career in records management. He started out as an academic.

In the early 1970s, after earning a doctorate in the history of science from the University of Pennsylvania, he began teaching at the University of Notre Dame in South Bend, Ind.

Extra credit

On the side, he pursued postdoctoral work in computer science, an area that increasingly captured his interest.

One day, he got a call from a professor at the University of Illinois who was overseeing Thibodeau's postdoctoral studies. 'He asked me if I would be interested in a job with the government working on electronic records,' Thibodeau said. 'My initial reaction was, there's no way I want to work for the government. But I was willing to listen, and the more I learned, the more I got interested.'

Thibodeau decided to take the job. So in 1975 he found himself in Washington working at what was then called the National Archives and Records Service.

Three years later, he left the archives to take a records management position at the National Institutes of Health, eventually becoming chief of records management.

At NIH, he delighted in records management's close association with information technology.

'When the PC started making inroads into NIH offices, we got linked up with initial support for PCs,' he said. 'That was very exciting. I wound up planning local area networks and developing NIH's first strategic information resources management plan.'

But in 1988, Thibodeau was lured back to NARA when the agency dangled the directorship of its new Center for Electronic Records in front of him.

'It was fun because I'm an entrepreneur,' he said. 'It was a building experience. In the 1990s, we developed applications that are still in use and still being enhanced today.'

At the end of 1998, it seemed only natural for Thibodeau to take the helm of the fledgling Electronic Records Archives program.

'Envisaging the future is really what the ERA program is all about,' he said.

What does he like best about his work at ERA?

'I thrive on challenges and this is probably the biggest challenge of my life,' he said.
