New software gets GPO over the XML hurdle
- By William Jackson
- Sep 19, 2012
The Government Printing Office is beginning the phased implementation of a new software system that will enable the direct formatting of XML documents for electronic and print publication.
“We’re replacing a 30-year-old composition engine called Microcomp used for the composition of the majority of our congressional documents,” said CTO Ric Davis.
The new system, the XML Professional Publisher (XPP) from the U.K. information and document management company SDL, will streamline and standardize the preparation of documents that arrive in a variety of formats, said Jim Bender, GPO’s assistant production manager. It will specifically ease the handling of documents in Extensible Markup Language, which government is adopting as the standard for managing digital information, he said.
XML documents now have to be converted to a proprietary Microcomp format for typesetting. “We will avoid that transformation step and publish directly from the XML,” he said.
XPP is an industrial-strength tool built to handle the variety, volume and rapid publishing schedule of the GPO, one of the world’s largest printers, said John Hoffman, principal project manager at SDL.
“It’s not something that’s built for a mom and pop shop,” Hoffman said. He praised the GPO staff for its ability to meet tight production schedules using the home-grown Microcomp, which relies on a collection of different codes for typesetting different types of documents and exists in a number of versions. “They are using an assembly line made for Edsels, and are still getting cars out,” he said. “When something works and you have a lot of deadlines you don’t want to stop to fix it. But the world changes.”
Those changes include the growing use of electronic publishing for many government documents, and the adoption of XML by Congress as a tool for producing official documents published by GPO. The transition to a new composition system while maintaining a demanding production schedule will be challenging, however.
“You’re bringing a new system into a functioning production environment,” Hoffman said. “The deadlines don’t stop.”
The first phase of the transition, which will begin this fall, will be an in-house proof-of-concept to test the new XML workflow, said Matt Landgraf, lead process planner at GPO. A core staff will be trained on the new system to produce copies of legislation from the House and Senate, while also doing their “day jobs” of publishing.
“We chose congressional bills to start with because they are being authored on the Hill in XML, so that’s a good first step,” Landgraf said. When that process is working smoothly, bills will be published online using XPP, and new products, including the Congressional Record and the Federal Register, will be transitioned to the new system. That probably will not happen until next year.
GPO has been adopting digital printing processes and electronic publishing over the past several decades, and today many of its most-accessed documents are available online through the Federal Digital System (FDsys). It also has agreements with a number of distributors to publish documents as electronic books in formats for popular e-readers.
The composition engine is the software that applies formatting, such as fonts, page parameters and links, to a document in electronic form. Davis said Microcomp has worked well, but has become dated.
Microcomp is a batch processing system that is actually a collection of 30 programs with more than 212,000 lines of code. Some 700 related applications and utilities, such as translation tools and delivery tools, have been developed to sustain it as GPO's production and publishing requirements have evolved. While GPO still has to support the outdated software, institutional knowledge of the programs is disappearing as staff is reduced and workers with expertise retire.
Most copy comes to GPO in Microsoft Word format, but it also comes in as hard copy, Portable Document Format and ASCII text in a variety of forms, including XML tagging. The House and Senate have standardized on XML for producing documents, and efforts were made to modify Microcomp to process XML. It eventually was decided to just translate XML documents into other formats for Microcomp in the short run and to add XML to finished documents afterwards for publishing. The long-term plan was to completely replace Microcomp when possible.
Planning for the Composition System Replacement program began in 2006, and in 2010 GPO issued a request for information for a commercial system that would accept XML as an input and interface with FDsys.
XPP eventually was selected for the replacement. It will accept XML documents directly, apply formatting from a library of styles and produce a PDF document that can be used for online access and for printing. Hoffman said the throughput for the tool is high enough to accommodate a printer with hundreds of products being turned around on a daily schedule, and it includes APIs for Web access so it can be used remotely.
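The general idea of composing directly from XML, with formatting looked up in a style library, can be illustrated with a minimal Python sketch. It is not XPP's API or GPO's workflow: the element names, the `STYLE_LIBRARY` table and the `compose` function are hypothetical, and the sketch prints styled text lines rather than producing a PDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical style library: maps an XML element name to a formatting rule.
# These names and values are illustrative assumptions, not GPO or XPP styles.
STYLE_LIBRARY = {
    "title": {"font": "Bold", "size": 14},
    "section": {"font": "Roman", "size": 10},
}
DEFAULT_STYLE = {"font": "Roman", "size": 10}


def compose(xml_text):
    """Return (text, style) pairs for every element that has its own text."""
    root = ET.fromstring(xml_text)
    composed = []
    for element in root.iter():
        text = (element.text or "").strip()
        if not text:
            continue  # container elements carry no text of their own
        # Unknown elements fall back to the default style.
        style = STYLE_LIBRARY.get(element.tag, DEFAULT_STYLE)
        composed.append((text, style))
    return composed


# Hypothetical, heavily simplified bill markup.
SAMPLE_BILL = "<bill><title>A Bill</title><section>Be it enacted.</section></bill>"

for text, style in compose(SAMPLE_BILL):
    print(f"[{style['font']} {style['size']}pt] {text}")
```

Because the formatting is chosen from the element names already present in the XML, no intermediate conversion to a typesetting-specific format is needed in this sketch, which is the step Bender said the new system avoids.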
Eventually, as the new composition engine comes into use throughout GPO, all documents will be converted to XML before moving into XPP. Because documents now are being published in XML, end users of GPO systems should not see any differences.
“We will be able to automate a lot of things that are done manually now,” such as adding references to PDF documents, Bender said.
William Jackson is a freelance writer and the author of the CyberEye blog.