GPO moving beyond print and Web
- By Susan M. Menke
- Nov 02, 2004
Michael L. Wash, GPO's digital engineer
Named 1996 inventor of the year by the Intellectual Property Owners Association, Michael L. Wash came to the Government Printing Office in June as the holder of 18 U.S. patents.
His GPO assignment: Map out a digital breakthrough for one of the government's most paperbound agencies.
Wash's patented inventions in 26 years at Eastman Kodak Co. mostly involved digital photo-finishing equipment for the consumer market. He was instrumental in developing the information exchange system for Kodak's Advantix Advanced Photo System, which won him the 1996 award.
Before joining GPO, Wash was executive director of product management at Gerber Scientific Products Inc. of South Windsor, Conn. He also has held engineering and executive positions at Colorado Memory Systems Inc. of Loveland, Colo., and Combyte Inc. of Fort Collins, Colo.
Wash received a bachelor's degree in electrical engineering from Purdue University.
GCN chief technology editor Susan M. Menke interviewed Wash at GPO headquarters in Washington.
GCN: How did you happen to get this job as a reinventor?
WASH: Most of my career was at Eastman Kodak Co. The most significant part was as a change manager for digital photo processing. Kodak needed to go from the traditional making of film and developing photos to a workflow for either film-based or digital images.
I was responsible for the architecture of that new workflow and for the business unit to support the services to the consumer. That background was what brought me to the attention of GPO and Bruce James, the public printer.
I approached this job by trying to learn as much as possible about what GPO has done over the years, what the challenges are and what the future state is supposed to deliver. I moved into a stage of trying to create the future system of operations, the document that will serve as a roadmap ... .
If Bruce James is comfortable with the roadmap, we'll start talking about it with the Federal Repository Library Council. Ultimately some version of it will be presented to Congress, because funding is required.
GCN: How much will it cost?
WASH: I can't estimate it at this time. It won't be a system that's turned on at one point in time; it will be evolutionary. There are business functions going on today that will continue to need to be done.
Think of it almost as dovetailing. The foundation of the new content management system will be put in place, and the new content will come in over the course of the next three years. There'll be a fairly significant event when the majority of it is in place by about October 2007, about the same time that GPO will be moving to a new headquarters.
The cornerstone of the plan is world-class information content management for public dissemination. We are expected to preserve the federal publications that are available to citizens, when needed and at a level of authenticity that can be trusted.
That requires a strong focus on preserving content so that it can be used in the way originally intended: in print. Or, 20 years from now, the same content might be used in ways that require digital display or other means that haven't been invented yet.
The key functional requirement is to be able to do those things as technology changes. We have to create a system that can accommodate that. Flexibility and extensibility are key aspects.
GCN: Do you plan to use XML?
WASH: Tagged Extensible Markup Language is likely to be one of our approaches. But it's a formatting solution of today. Twenty years from now, there will be something else. What we have to do is make sure the system we invent will not have to be re-engineered in 20 years. Formatting technology and packaging technology are going to change.
GCN: What's the foundation: a relational database system?
WASH: There are different schools of thought on how to store the content. What are the requirements for accepting content? What are the requirements for dissemination? Take that as the guiding light. Then do a few technology selections based on what needs to be done.
As for managing the storage, there's a centralized storage model and a distributed model. We haven't made a decision yet about that. My personal opinion is that we will have a mix of those. For no downtime or to recover from a disaster, the distributed models are really good. But for access and rapid availability of information, you need a centralized model.
We have three basic types of content. First is the legacy collection, all tangible and stored in the depository library program today in some shape or form. That information needs to be digitized and put into a content package that conforms to the expectations of the content management system, reformatted as a compliant digital package.
Second is information somewhere out there in digital form, but it isn't part of our collection. It needs to be found. We call that 'harvested content.' There's lots of technology out there, and we need to select technology to find that content, determine whether it's in the scope of the federal publications for which we're responsible, and put it into a compliant package for our content management system.
Third is what we call 'born digital,' today and in the future. Today we have many ways of creating a digital document. In the future, there should be more of an interface specification, or an ingest requirement to accept that information so that it will comply with the content management system.
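The three content types Wash describes all have to end up as a compliant package. A minimal Python sketch of that idea follows. The names Source, Submission, REQUIRED_FIELDS and ingest_problems are hypothetical, and the required fields are an assumed stand-in for whatever ingest requirement GPO eventually specifies.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    LEGACY = "legacy"              # tangible documents that were digitized
    HARVESTED = "harvested"        # digital content found outside the collection
    BORN_DIGITAL = "born_digital"  # content created digitally


# Assumption: an illustrative ingest requirement, not GPO's actual specification.
REQUIRED_FIELDS = ("title", "issuing_agency", "encoding")


@dataclass
class Submission:
    source: Source
    payload: bytes
    metadata: dict


def ingest_problems(sub: Submission) -> list[str]:
    """Return the reasons a submission fails the assumed ingest requirement."""
    problems = [
        f"missing {name}" for name in REQUIRED_FIELDS if not sub.metadata.get(name)
    ]
    if not sub.payload:
        problems.append("empty payload")
    return problems
```

In this sketch the same check applies regardless of source, so a digitized legacy document and a born-digital file are held to one package standard; an empty list means the submission would be accepted.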
All these types will feed into a large storage mechanism. The storage could be centralized, distributed or hybrid. Each package contains information that tells you how to read the bits. An example from today would be ASCII text: As soon as you know the bits are ASCII, you know how to read the content.
The next element that must be applied to the package is preservation information. What are the rules associated with its long-term preservation?
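The package Wash outlines, with bits plus instructions for reading them plus preservation rules, could be modeled roughly as below. This is a sketch under stated assumptions: ContentPackage and its field names are hypothetical, and the preservation entries are placeholder examples, not rules from GPO.

```python
from dataclasses import dataclass, field


@dataclass
class ContentPackage:
    bits: bytes
    representation: str  # tells a reader how to interpret the bits, e.g. "ascii"
    preservation: dict = field(default_factory=dict)  # assumed long-term rules

    def read(self) -> str:
        # The package itself says how to decode its bits.
        return self.bits.decode(self.representation)


pkg = ContentPackage(
    bits=b"Sample federal publication",
    representation="ascii",
    preservation={"retain": "permanent", "fixity_check_days": 365},  # placeholders
)
print(pkg.read())
```

Because the decoding instruction travels with the content, a future reader would not need outside knowledge to interpret the stored bits, which is the point of Wash's ASCII example.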
Those content packages will have to be stored under a content management system that can continue to make them accessible.
GCN: How far along are you in designing this?
WASH: We're going through a classic phases-and-gates design. The phases each have their own separate activities, and to say you've completed a phase, you have to pass through a gate.
The gate we're heading toward right now is completion of the concept of operations document. In this phase, we have eight or 10 dedicated representatives from the core functional areas of GPO: the information dissemination group, the customer service group that interfaces with agencies, the CIO's organization, the plant, the general counsel, finance and so on.
They are building the concept for how the agency customer and end user will relate to this system. As we go into the next phase, the team will change to technical and engineering folks specifying storage, access times and so on.
We are the ones closest to our customer needs, so the initial phase has been in-house. We recently brought in a contractor from Integrated Computer Engineering [Inc. of Arlington, Va., part of American Systems Corp. of Chantilly, Va.], which has been working on the National Archives and Records Administration's concept of operations for the Electronic Records Archive. ICE is making sure our concept of operations is compliant with the IEEE [enterprise content management] standard and as robust as it can be.
The next phase will require working very closely in collaboration with the IT organization here at GPO, because it will be the ultimate owner of the system we design.
We know that we will continue to ingest information as text for print, but that could change to add video and audio streams. An audio stream, for example, would have to be voice-recognized and composed. The content going out might be information synchronized between hard copy and a video feed that accompanies it, to give the end user a full, rich experience of what that publication is.
The content management system has to be capable of supporting all that. A content provider today is a printer or a Web site. A content provider of the future could be a satellite feed. It could be another channel on your satellite network.
So we have to go from print to Web to some sort of other rich content in the future. It's all a matter of making sure you don't box yourself in. That's hard to do, but there's great systems engineering knowledge that tells you what not to do.