Advanced OCR system will capture Census 2000
- By Frank Tiboni
- Dec 14, 1998
The Census Bureau's Data Capture System 2000 (DCS 2000) will let the agency process more
forms electronically than it did for the last decennial census, reducing data entry errors
and personnel costs.
Because the system uses an optical character recognition program running under
Microsoft Windows NT 4.0 to capture data from scanned documents, the Census Bureau will
need fewer part-time employees to key in data, bureau officials said.
Census in March 1997 awarded a six-year, $49 million contract to Lockheed Martin Corp.
to build DCS 2000. The system will process about 130 million forms in a 100-day period
starting in March 2000.
Lockheed Martin integrated off-the-shelf products for the high-speed, automated
check-in and electronic imaging of forms and optical recognition of respondents'
marks and characters, said Richard E. Taylor, a Lockheed Martin federal systems
architect and DCS 2000's designer.
To create the system, Lockheed Martin enlisted Electronic Data Systems Corp., which
helped plan the equipment for the Census Bureau's four regional processing centers, and Eastman
Kodak Co., which designed the scanning software and hardware.
The system performs data capture in phases: scanning and image capture, assessment of
image quality, character and mark recognition, correcting minimal information, and
detecting and correcting errors, Taylor said.
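One way to picture this phased design is as an ordered pipeline that every form record passes through. In the minimal sketch below, the `Phase` names, the dictionary-based form record and `run_pipeline` are all hypothetical illustrations of the sequence Taylor describes, not Lockheed Martin's or Kodak's actual software.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple

class Phase(Enum):
    """Hypothetical labels for the five capture phases Taylor lists, in order."""
    SCAN_AND_IMAGE = 1      # scanning and image capture
    IMAGE_QUALITY = 2       # assessment of image quality
    RECOGNITION = 3         # character and mark recognition
    MINIMAL_CORRECTION = 4  # correcting minimal information
    ERROR_CHECK = 5         # detecting and correcting errors

Handler = Callable[[dict], dict]

def run_pipeline(form: dict, handlers: Dict[Phase, Handler]) -> Tuple[dict, List[str]]:
    """Pass one form record through every phase in definition order.

    Phases with no registered handler leave the record unchanged; the returned
    history lists every phase the record passed through.
    """
    history: List[str] = []
    for phase in Phase:  # Enum iteration follows definition order
        handler = handlers.get(phase)
        if handler is not None:
            form = handler(form)
        history.append(phase.name)
    return form, history

# Usage: only the recognition phase does any work in this toy run.
record, steps = run_pipeline(
    {"form_id": "0001"},
    {Phase.RECOGNITION: lambda f: {**f, "surname": "SMITH"}},
)
```

Keeping the phases in a fixed, ordered type makes it easy to see that a form cannot reach recognition before its image quality has been assessed.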
The bureau's four centers will have clusters of scanning stations. Each cluster
will have three Kodak Digital Science Document Scanner 9500 systems. In total, 40
clusters are planned for processing the 2000 Census, he said.
One cluster will handle about 82,100 Census short-form questionnaires per day over two
eight-hour shifts. The bureau plans to vary the number of clusters in use at its
processing centers based on anticipated volumes, Taylor said.
Census has installed the system at its National Processing Center in Jeffersonville,
Ind. The bureau ran the system through a dress rehearsal in March, and currently is using
the system to capture economic census data on small and minority-owned businesses.
The bureau next year will roll out the system at its three other processing centers in
Baltimore, Phoenix and Pomona, Calif.
Processing begins with the delivery of Census forms mailed to the centers.
Sorters read the envelopes' bar codes and count, sort, open and prepare the forms for
scanning. The bar codes, visible through the envelope windows, identify respondents'
addresses. The address data is forwarded to Census headquarters for use by census takers
in follow-up interviews, Taylor said.
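A minimal sketch of how check-in data could feed follow-up work, assuming each bar code decodes to an address ID and that census takers visit addresses on a master list that returned no form. The address IDs, the master list and the duplicate check are hypothetical; the article says only that the bar codes identify addresses and that the address data goes to headquarters for follow-up interviews.

```python
from collections import Counter
from typing import Iterable, Set

def check_in(barcodes: Iterable[str]) -> Counter:
    """Tally envelopes by the address ID decoded from each window bar code."""
    return Counter(barcodes)

def follow_up_list(master_list: Set[str], received: Counter) -> Set[str]:
    """Addresses on the master list with no returned form (hypothetical follow-up rule)."""
    return master_list - set(received)

def duplicates(received: Counter) -> Set[str]:
    """Addresses that mailed back more than one envelope."""
    return {addr for addr, n in received.items() if n > 1}

# Usage with made-up address IDs
received = check_in(["A-001", "B-002", "A-001"])
to_visit = follow_up_list({"A-001", "B-002", "C-003"}, received)
```

Here `to_visit` holds only `C-003`, the one address that sent nothing back, while `duplicates(received)` flags `A-001` for mailing two envelopes.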
The scanners capture digital images of the forms, and the imaging recognition engine
deciphers handwritten check-box and text responses, converting them to ASCII text. It ranks
each result as either high-confidence or low-confidence. High-confidence data is ready for
further processing; low-confidence data is verified and, if need be, corrected by a data-entry operator.
DCS 2000 will alert systems administrators and engineers to potential problems with
data collection. The system also will collect performance and accuracy metrics so
supervisors can gauge the integrity of the raw data, Taylor said.
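The high-confidence/low-confidence split and the accuracy metrics Taylor mentions can be sketched together. The threshold, the `Field` record and both metric definitions below are hypothetical; the article does not say how the recognition engine scores confidence or which metrics DCS 2000 actually reports.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

CONFIDENCE_THRESHOLD = 0.90  # hypothetical cut-off; the article gives no value

@dataclass
class Field:
    form_id: str
    name: str
    value: str
    confidence: float  # recognition score, assumed to lie in [0, 1]

def route(fields: List[Field],
          threshold: float = CONFIDENCE_THRESHOLD) -> Tuple[List[Field], List[Field]]:
    """Split fields into (high-confidence, low-confidence) lists."""
    high = [f for f in fields if f.confidence >= threshold]
    low = [f for f in fields if f.confidence < threshold]
    return high, low

def quality_metrics(fields: List[Field], corrected: Set[Tuple[str, str]],
                    threshold: float = CONFIDENCE_THRESHOLD) -> Dict[str, float]:
    """Share of fields sent to operators, and share of those the operators changed.

    `corrected` holds (form_id, field name) pairs that an operator edited.
    """
    _, low = route(fields, threshold)
    low_rate = len(low) / len(fields) if fields else 0.0
    changed = sum(1 for f in low if (f.form_id, f.name) in corrected)
    correction_rate = changed / len(low) if low else 0.0
    return {"low_confidence_rate": low_rate, "correction_rate": correction_rate}

# Usage with made-up recognition results
fields = [
    Field("0001", "surname", "SMITH", 0.97),
    Field("0001", "age", "3?", 0.41),
    Field("0002", "surname", "JONES", 0.95),
    Field("0002", "age", "52", 0.62),
]
high, low = route(fields)
stats = quality_metrics(fields, corrected={("0001", "age")})
```

In this toy run half the fields go to operators, and operators change half of those. A rising low-confidence rate could point to scanner or image-quality trouble, while a high correction rate suggests the threshold is admitting too many doubtful readings.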
At the end of each day, the system forwards the extracted ASCII data to
number-crunching systems at Census headquarters for further processing.
DCS 2000 should make the 2000 Census the most accurate and efficient one
ever, Taylor said.