GPO dives into digital future
Begins testing automated management system
- By Joab Jackson
- May 04, 2007
'When the file comes to GPO, it would be a fairly automated process to ensure the information was all there.' Mike Wash, Chief Technical Officer
The Government Printing Office is ready to test the ambitious Future Digital System, a content management system designed to handle the many documents GPO publishes and posts for the rest of government.
In August 2006, GPO awarded Harris Corp. a four-year, $29 million contract to build out the initial capabilities of FDSys. The company will put the first elements into operation this month for internal testing.
'The first version of this is really for detailed testing,' said Mike Wash, GPO's chief technical officer.
The program's goal is to digitize nearly every federal document published since the birth of the country. People can then search, view and download documents via a Web portal.
Historically, federal agencies would submit publications, and GPO would print them for both the public and libraries. Although agencies will continue to submit publications, GPO will now work to disseminate the information both electronically and in print.
'Access to government information is widely expected to be electronic,' said GPO manager Kate Zwaard, who spoke at the Interoperability Week held recently by the National Institute of Standards and Technology. Part of the challenge is to ensure that the versions are correct and are posted almost immediately, especially for time-sensitive documents such as the Congressional Record.
Adding to the challenge, GPO must keep the most important of the documents it publishes 'in perpetuity,' Zwaard said. As a result, GPO will keep both archived copies of all submitted publications, and copies that can be used as originals and from which additional copies can be made.
New workflow
With FDSys, GPO has mapped out an entire life cycle for documents, from submission to archiving.
Participating agencies will submit documents in what GPO calls submission information packages. The document can be in any format, from Quark to Extensible Markup Language. In addition, the package can have supporting materials, such as images, as well as some metadata, such as how long GPO should keep the document available.
Because the SIP will be in a standardized format, Wash said he could see the eventual possibility of industry or other agencies creating SIP plug-ins for publication design tools, such as QuarkXPress or Adobe InDesign. Such plug-ins would automatically save a document in the SIP format.
'So when the file comes to GPO, it would be a fairly automated process [for GPO] to ensure the information was all there,' Wash said.
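The kind of automated completeness check Wash describes might look something like the following Python sketch. It is illustrative only: the class, the field names and the list of required metadata keys are assumptions made for this example, not part of GPO's actual SIP specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubmissionPackage:
    """Hypothetical stand-in for a submission information package (SIP)."""
    document_path: str                 # the publication itself, in any format
    document_format: str               # for example "quark" or "xml"
    supporting_files: List[str] = field(default_factory=list)  # images and so on
    metadata: Dict[str, str] = field(default_factory=dict)


# Assumed for illustration: metadata keys a complete package must carry.
REQUIRED_METADATA = ("title", "agency", "retention_period")


def missing_items(sip: SubmissionPackage) -> List[str]:
    """Return what is missing; an empty list means the package looks complete."""
    problems = []
    if not sip.document_path:
        problems.append("document")
    for key in REQUIRED_METADATA:
        if not sip.metadata.get(key):
            problems.append(f"metadata:{key}")
    return problems
```

A package that arrives without a retention period would be flagged with a single entry, `metadata:retention_period`, and could be bounced back to the submitting agency without manual review.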
To work with the metadata, FDSys will use a schema registry, which would allow the agency to add new schemas as they change and evolve. 'We know we can't decide what schemas we will be using in the future,' Zwaard said.
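To make the idea of a schema registry concrete, here is a minimal Python sketch. It assumes a schema is nothing more than a named, versioned set of required fields; the class and method names are hypothetical and do not describe how FDSys implements its registry.

```python
class SchemaRegistry:
    """Toy registry: new schema versions can be added without changing code."""

    def __init__(self):
        # Maps (schema name, version number) to the set of required fields.
        self._schemas = {}

    def register(self, name, version, required_fields):
        """Add a schema, or a new version of an existing one."""
        self._schemas[(name, version)] = frozenset(required_fields)

    def latest_version(self, name):
        """Return the highest registered version number for a schema name."""
        versions = [v for (n, v) in self._schemas if n == name]
        if not versions:
            raise KeyError(f"no schema registered under {name!r}")
        return max(versions)

    def validate(self, name, version, record):
        """Return the required fields that the metadata record lacks, sorted."""
        required = self._schemas[(name, version)]
        return sorted(required.difference(record))
```

Because records are validated against a named version, older documents can stay valid under the schema they were submitted with while newer submissions use a later one.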
From SIP submissions, GPO will create archival information packages for long-term archiving. GPO will create a schedule to accompany the package, detailing how long the document should be archived and what procedures should be used to access it.
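As a rough illustration of an archival package paired with its schedule, consider the Python sketch below. Every name in it is hypothetical, and the rule that a missing retention period means 'in perpetuity' is an assumption made for the example rather than documented GPO policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ArchiveSchedule:
    archive_until: Optional[date]   # None is assumed to mean 'in perpetuity'
    access_procedure: str           # how the archived copy may be retrieved


@dataclass(frozen=True)
class ArchivalPackage:
    source_document: str
    schedule: ArchiveSchedule


def build_archival_package(document_path: str,
                           received: date,
                           retention_years: Optional[int],
                           access_procedure: str = "archivist request"):
    """Derive an archival package and its schedule from a submitted document."""
    if retention_years is None:
        until = None
    else:
        try:
            until = received.replace(year=received.year + retention_years)
        except ValueError:
            # Feb. 29 in a non-leap target year: fall back to Feb. 28.
            until = received.replace(year=received.year + retention_years, day=28)
    return ArchivalPackage(document_path, ArchiveSchedule(until, access_procedure))
```

Keeping the schedule as a separate, immutable object means the retention terms travel with the package and cannot be altered in place once it has been created.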
To escape obsolescence, the system was designed so it does not rely on specific hardware. Migration rules have been set up to keep the documents moving to current versions of software as well. GPO will depend on the Open Archival Information System standard, a framework approved by the International Organization for Standardization.
Only archivists will have access to the archival information packages. Another package derived from the SIP, called the access information package, will be used to make copies or post the document. Finally, each of the various formats in which the document will be rendered, such as Web pages, printed materials or PDFs, will be known as dissemination information packages.
Internal test
With the pilot, 'we'll have the basic functionality where we'll be able to accept content into the system, manage it and create the packages that are implicit in the system. We'll also be able to do some testing of the access of the system to make sure it'll do everything we want it to,' Wash said. Live data will not be used for the pilot.
The FDSys pilot will be tested internally, and the first full release of the program is expected in December, Wash said.
That version will have the basic functionality. Two subsequent versions, Release 2 and Release 3, should roll out in 2008. Each of those releases will have new features, such as preservation, processing and enhanced access tools.
'So think of Release 1 as putting the basic building blocks in place, and releases 2 and 3 start using those blocks,' Wash said. And the system is expected to keep growing.
'From our planning perspective, we captured the need for three versions of FDSys to get to the full capability, but expectations and requirements will continue to evolve,' Wash said.