Library develops tools to bag and tag data for large file transfers
One of the tools developed to assist the Library of Congress in its National Digital Information Infrastructure and Preservation Program (NDIIPP) is the BagIt specification, a file packaging format that lets organizations bag and tag data for transferring large files.
The goal was simplicity and ease of use, said Martha Anderson, NDIIPP program director.
“A ‘bag’ has just enough structure to safely enclose a brief descriptive ‘tag’ and a payload but does not require any knowledge of the payload's internal semantics,” according to the specification.
BagIt grew out of an archive ingest and handling project, conducted by NDIIPP and a number of universities, that simulated the transfer of 50G files across a network.
“What we learned is that we needed a kind of common container to move things around,” Anderson said. The project team experimented with a number of existing formats and tools, such as Zip files, but found that they lacked a manifest element describing the contents of the envelope, were too complicated and could not handle large enough files.
“This specification can handle arbitrary sizes of data,” she said.
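For readers curious about what a bag looks like on disk, here is a minimal Python sketch. It is not the Library's own tooling. It assumes the layout of the version of BagIt later published as RFC 8493: a bagit.txt declaration file, a data/ directory holding the payload and a manifest-sha256.txt file that lists a checksum for every payload file. The helper names make_minimal_bag and verify_bag are hypothetical, payload files are assumed to sit directly under data/, and the optional descriptive tag file (bag-info.txt) is omitted for brevity. Files are hashed in fixed-size chunks, so payload size is not limited by available memory.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large payloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(bag_dir, payload):
    """Hypothetical helper: write `payload` (file name -> bytes) as a bag in bag_dir.

    Assumes the RFC 8493 layout; file names must not contain subdirectories.
    """
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)
    # Bag declaration: a small tag file marking this directory as a bag.
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as fh:
        fh.write("BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
    manifest_lines = []
    for name, content in sorted(payload.items()):
        path = os.path.join(data_dir, name)
        with open(path, "wb") as fh:
            fh.write(content)
        # Manifest entry: checksum, whitespace, then the path relative to the bag root.
        manifest_lines.append(f"{sha256_of(path)}  data/{name}\n")
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w", encoding="utf-8") as fh:
        fh.writelines(manifest_lines)


def verify_bag(bag_dir):
    """Hypothetical helper: recompute checksums for files listed in the manifest.

    Returns False on the first mismatch. Simplification: it does not check for
    payload files that are missing from the manifest.
    """
    with open(os.path.join(bag_dir, "manifest-sha256.txt"), encoding="utf-8") as fh:
        for line in fh:
            expected, rel_path = line.rstrip("\n").split(maxsplit=1)
            if sha256_of(os.path.join(bag_dir, rel_path)) != expected:
                return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_minimal_bag(tmp, {"report.txt": b"hello"})
        print(verify_bag(tmp))  # True
        # Simulate corruption in transit by changing the payload after bagging.
        with open(os.path.join(tmp, "data", "report.txt"), "wb") as fh:
            fh.write(b"tampered")
        print(verify_bag(tmp))  # False
```

The receiving side does not need to understand the payload. It only compares each file against the manifest, which is the "no knowledge of the payload's internal semantics" property described in the specification.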
BagIt is not an official standard, but “we make it the standard for any content sent to the library,” she said. “It’s proving itself to be quite useful. We are promoting it widely.”