- By Patricia Daukantas
- Mar 26, 2002
Census turns to XML for forms prep
At the Census Bureau in Suitland, Md., Larry Blum (left) and Dennis Wagner use XML to streamline preparations for the 2002 economic census.
The Census Bureau's Generalized Instrument Design System is using Extensible Markup Language to speed the layout and assembly of economic census forms that will go to millions of businesses this December.
GIDS has a central metadata repository plus four software applications that use the metadata to automate forms layout.
Every five years, the economic census collects data on 6.5 million U.S. businesses, said Larry Blum, assistant division chief for collection activities in the bureau's Economic Planning and Coordination Division in Suitland, Md. Commerce Department analysts use the results to compute the gross domestic product and other measures of U.S. economic health.
About 1.5 million of the companies are small enough that the Census Bureau gets all the data it needs about them from IRS payroll and income tax files, Blum said.
Most of the rest do business in only one location, so they fill out a single set of economic census forms. The questions answered by these single-establishment businesses depend on their categories under the North American Industry Classification System.
Custom forms
The bureau makes up different forms for each NAICS category so that businesses needn't wade through a single gigantic form full of inapplicable questions, said Steven A. Schafer, chief technology officer of Fenestra Technologies Corp. of Germantown, Md. Fenestra developed GIDS for the bureau.
Large companies that do business in multiple locations fill out one set of census forms per location, Blum said. With paper forms running six to 20 pages each, some big corporations must answer hundreds or thousands of pages of questions.
The forms are 'sometimes delivered on pallets,' said Rick Rogers, Fenestra's chief executive officer.
Without GIDS, Census workers never would have been able to design electronic and paper forms for each of the 650 NAICS categories, Blum said.
Laying out each form with a graphics design package produces good-looking questionnaires, but it's time-consuming and tedious, said Dennis Wagner, special assistant in the Census Bureau's economic planning division and GIDS team leader.
So Fenestra created a forms designer application for layout, an autoformat application to assemble the pages into forms, a preview tool and a surveyor tool to display electronic versions of the forms and collect responses.
Although the 650 forms are tailored to different industries, many questions do apply to multiple forms, Wagner said. GIDS automates the layout of more than 90 percent of the pages. The XML metadata attached to each question in the repository controls the typography and placement of questions on each page.
'If the question's the same, you only have to design the content once for all the forms,' Blum said.
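To make the design-once idea concrete, here is a minimal sketch in Python of how shared question metadata could feed several forms. The XML element names, attributes, question wording and form categories below are invented for illustration; the article does not describe the actual GIDS schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository excerpt. Element names, attributes and categories
# are assumptions made for this sketch, not the GIDS schema.
REPOSITORY_XML = """
<repository>
  <question id="payroll" font="Helvetica" size="9">
    <text>Annual payroll before deductions</text>
  </question>
  <question id="employees" font="Helvetica" size="9">
    <text>Number of paid employees</text>
  </question>
  <question id="seats" font="Helvetica" size="9">
    <text>Seating capacity</text>
  </question>
  <form category="A" questions="payroll employees seats"/>
  <form category="B" questions="payroll employees"/>
</repository>
"""


def build_forms(xml_text):
    """Return {category: [question metadata, ...]} with shared question objects."""
    root = ET.fromstring(xml_text)

    # Each question is parsed once, along with its typography metadata.
    questions = {}
    for q in root.findall("question"):
        questions[q.get("id")] = {
            "text": q.findtext("text"),
            "font": q.get("font"),
            "size": float(q.get("size")),
        }

    # Each form only lists question ids, so a shared question is reused
    # rather than redesigned for every form it appears on.
    forms = {}
    for form in root.findall("form"):
        ids = form.get("questions").split()
        forms[form.get("category")] = [questions[i] for i in ids]
    return forms


forms = build_forms(REPOSITORY_XML)
for category, items in forms.items():
    print(category, [item["text"] for item in items])
```

In this sketch, a change to the payroll question's metadata would show up on both hypothetical forms, because each form refers to the question by id instead of holding its own copy.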
Past economic censuses used legal-size paper for many forms, but surveys found that respondents preferred standard letter-size paper.
'We're going to give them what they want, but I don't think they're going to be too happy about it,' Blum said. The smaller page size has substantially increased the number of pages in some forms.
Precision design
The software had to create pages to exacting specifications, with tolerances as small as 0.001 inch, for compatibility with the revamped data capture system.
The last economic census, in 1997, used a Digital Equipment Corp. key-from-paper system to capture data, Blum said.
For this year's tally, workers will scan paper forms using some leftover equipment from Census 2000 [GCN, Feb. 7, 2000, Page 1], then key the data from electronic images of the forms, Blum said. That system is now being built in the bureau's National Processing Center in Jeffersonville, Ind.
The key-from-image system will use special templates that block out portions of the questionnaire and show the data entry workers only the fields they need to read, Rogers said. The templates get the precise coordinates of the data fields from GIDS.
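As a rough illustration of how field coordinates could drive such a template, the Python sketch below converts field positions given in inches into pixel boxes on a scanned image and tests whether a pixel falls inside a visible field. The Field structure, the sample coordinates and the 300-dpi scan resolution are all assumptions made for the example; the article does not say how the templates are stored or applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical data-field record; all measurements are in inches."""
    name: str
    x: float       # left edge, measured from the left of the page
    y: float       # top edge, measured from the top of the page
    width: float
    height: float


def to_pixels(field, dpi=300):
    """Convert a field's position in inches to a (left, top, right, bottom) pixel box."""
    left = round(field.x * dpi)
    top = round(field.y * dpi)
    right = round((field.x + field.width) * dpi)
    bottom = round((field.y + field.height) * dpi)
    return (left, top, right, bottom)


def is_visible(px, py, boxes):
    """Return True if pixel (px, py) lies inside any unmasked field box."""
    return any(l <= px < r and t <= py < b for (l, t, r, b) in boxes)


# Assumed sample field: 1.250 in. from the left, 2.000 in. from the top.
payroll = Field("payroll", x=1.250, y=2.000, width=2.000, height=0.250)
boxes = [to_pixels(payroll)]
print(boxes)
print(is_visible(400, 610, boxes), is_visible(100, 100, boxes))
```

Everything outside the listed boxes would be blocked out, so a keying worker would see only the fields that need to be read. At the assumed 300 dpi, one pixel is about 0.003 inch, so coordinates specified to 0.001 inch are finer than the image grid they are rounded to.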
Bureau officials are also using GIDS layout and content metadata to design downloadable electronic versions of many of the economic census forms.
In 1997, Fenestra conducted an electronic-filing pilot for 20 of the 600 forms used in that year's economic census. Because data was checked as it was entered, the results were cleaner, with higher data integrity, than manually keyed responses, Rogers said.
About 300,000 companies took the 1997 economic census electronically. Bureau officials hope to increase that figure to between 550,000 and 700,000 this year, Blum said.
Although some past projects used metadata, the economic metadata repository, housed in an Oracle9i database management system, is the bureau's first such central database, Wagner said.
'If you don't have reusable metadata, you've spent a lot of time doing something and don't realize its full potential,' Rogers said.