Census revs up 10,000 systems

Bureau spins IT web for 2000 count

By Patricia Daukantas

GCN Staff

To count all 275 million Americans in roughly six months, the Census Bureau has amassed more than 10,000 computers, ranging from PCs to a supercomputer.

Besides the bureau headquarters in Suitland, Md., Census 2000 will depend on the work done by four data capture centers, 12 regional Census centers and 520 local offices.

The current decennial census will rely more on off-the-shelf technology and outsourcing than the last nationwide head count in 1990, and on many leased systems, too.







The Census Bureau will convert data to an electronic format for the 2000 decennial census.



For the first time, an image-capture system will help process poorly completed forms and save Census workers from having to do time-consuming rechecks against original paper forms.

And, when Census 2000 is complete, the bureau will merge the results into its online data dissemination system.

For the 1990 decennial count, the bureau set up and managed seven temporary data capture centers, said J. Gary Doyle, Census 2000's manager for systems integration. This time, the bureau hired contractors to build and run four data capture centers, or DCCs.

Ten years ago, the bureau captured census forms onto film, which was scanned into VAX 8500 and 8810 minicomputers from Digital Equipment Corp. Bureau workers had to type in any handwritten information, Doyle said.

No more illegibility

This year the four DCCs will use optical character and mark recognition systems to read in data and store facsimile images of forms.

No longer will Census workers dig out the original paper forms to deal with bad handwriting or poorly marked bubble choices, Doyle said. Instead they will look at images on a monitor to sort things out.

Each DCC has nine Docutronix 2000 envelope sorter-readers from Docutronix Inc. of Homestead, Fla., and 30 9500D scanners from Eastman Kodak Co.

The DCCs (in Essex, Md., Jeffersonville, Ind., Phoenix and Pomona, Calif.) each house two Dell PowerEdge 6100 and two PowerEdge 6300 servers, each with 2G of RAM and running Microsoft Windows NT. Each DCC also has about 500 Dell GX1 desktop PCs, more than 50 Dell Precision 410 workstations and about 40 Precision 610s.

For backup, each DCC has two TimberWolf 9714 automated tape libraries from Storage Technology Corp. of Louisville, Colo., along with two DLT 7000 streaming tape drives from Quantum Corp. of Milpitas, Calif., and two Sun Microsystems Ultra workstations.

Lockheed Martin Corp. did the systems development and integration for the four DCCs. TRW Inc. is responsible for construction and day-to-day management.

Besides being a DCC, the Jeffersonville site houses the bureau's permanent National Processing Center, whose Compaq AlphaServer 4100/1000 cluster running OpenVMS handles many bureau projects.

The address labels on the Census 2000 forms are bar coded to show who has responded. As data from returned questionnaires flows from the DCCs to Suitland headquarters, addresses will be checked against the Master Address File that Census has compiled over the past two years. Headquarters will compile lists of unresponsive addresses and forward them to 520 local collection offices, which will send temporary workers to the addresses.

During their visits, the workers will fill out the same forms that the residents were supposed to complete on their own. The DCCs will scan those forms, too.
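The follow-up step described above amounts to a set difference: every address in the Master Address File, minus the addresses whose bar-coded forms have come back, grouped by the local office that will send a worker. The sketch below is a minimal illustration under stated assumptions, not the bureau's actual software. The function name, the address IDs, the office labels and the dictionary layout are all hypothetical.

```python
def unresponsive_addresses(master_address_file, returned_ids):
    """Group addresses with no returned questionnaire by local office.

    master_address_file: dict mapping a hypothetical address ID to the
        hypothetical local office responsible for it.
    returned_ids: iterable of address IDs read from returned forms.
    """
    returned = set(returned_ids)  # set gives fast membership checks
    followup = {}
    for address_id, office in master_address_file.items():
        if address_id not in returned:
            followup.setdefault(office, []).append(address_id)
    return followup


# Hypothetical sample data, invented for illustration only.
maf = {"A001": "office-01", "A002": "office-01", "A003": "office-02"}
returned = ["A002"]

followup = unresponsive_addresses(maf, returned)
for office, addresses in followup.items():
    print(office, addresses)
```

Each office's list is what a headquarters system would forward so that temporary workers know which addresses to visit.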

Each of the 520 local offices, opened over the past year, has one Dell 2300 server, 15 Dell GX1 PCs with 17-inch monitors and five Hewlett-Packard 8000N printers. Each local server has 256M of RAM and either a 350-MHz Pentium II or a 500-MHz Pentium III processor.
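As a rough check on the "more than 10,000 computers" figure, the per-site counts quoted so far for the data capture centers and local offices can be tallied. The sketch below treats "about 500," "more than 50" and "about 40" as exact values, which is an assumption, and it leaves out backup hardware, printers, scanners and everything at the regional centers and headquarters.

```python
# Site counts as given in the article.
DCC_COUNT = 4
LOCAL_OFFICE_COUNT = 520

# Per-site machine counts; approximate figures are treated as exact here.
dcc_per_site = {
    "PowerEdge servers": 2 + 2,        # two 6100s and two 6300s
    "GX1 desktop PCs": 500,            # "about 500"
    "Precision 410 workstations": 50,  # "more than 50"
    "Precision 610 workstations": 40,  # "about 40"
}
local_per_site = {
    "Dell 2300 server": 1,
    "GX1 PCs": 15,
}

dcc_total = DCC_COUNT * sum(dcc_per_site.values())
local_total = LOCAL_OFFICE_COUNT * sum(local_per_site.values())
combined = dcc_total + local_total

print("Data capture centers:", dcc_total)
print("Local offices:", local_total)
print("Combined:", combined)
```

Under these assumptions the two tiers alone come to 10,696 machines, which is consistent with the headline figure.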

Twelve regional Census centers supervise the local offices. Each center has two four-way, 600-MHz Compaq AlphaServer 4100 clusters running Unix. One handles payroll and personnel matters, the other operations.

The regional offices use Oracle Corp. databases, PowerBuilder database applications from Sybase Inc. and payroll software from PeopleSoft Inc. of Pleasanton, Calif.

A Novell NetWare 4.11 network connects the regional and local offices to headquarters. "It's a very large Novell installation," Doyle said.

Each regional office has a Dell PowerEdge 6100 server running NetWare for production processing, plus a PowerEdge 4200 failover server. Another PowerEdge 2300 distributes software to local offices.

Last year, bureau and Novell representatives met to ensure that the company would support the bureau's NetWare installation for the duration of the census, Doyle said.

The bureau manages NetWare 4.11 with Novell ManageWise and ZENworks 2.0 products. TaskMaster 3.0 from Avanti Technology Inc. of Colorado Springs, Colo., replicates the servers every night and guards against unauthorized activity such as posting games on the bureau's servers, Doyle said.

For e-mail, the regional centers use Lotus Notes hosted on a Dell PowerEdge 6300 server with 1G of RAM. For backup, the centers have StorageTek 9740 tape libraries with 496 cartridges each, Doyle said.

Each regional center also has a pair of high-performance SGI Origin2000 shared-memory computers, each with eight 300-MHz processors. They generate the map files for temporary workers who follow up with nonfilers.

"That's a lot of printing and a lot of computing," Doyle said. "It's a map of the nation."

TIGER maps

The regional offices send the maps on CD-ROM to the local offices for printing and distribution. The map data comes from Census' own Topologically Integrated Geographic Encoding and Referencing system. It resides on two Compaq VAX 8048 clusters running OpenVMS. Each cluster has five 625-MHz Alpha processors, and together the clusters provide more than 3T of disk storage.

After the DCCs send the data to headquarters, the Decennial Systems and Contract Management Office in Bowie, Md., processes it on a cluster of two Compaq AlphaServer GS60s, each with two 525-MHz processors, 4G of RAM and OpenVMS. The cluster has 2.4T of RAID storage, and a tape library provides up to 21T of backup.

To study and edit the census data, bureau analysts use an array of software tools through an agencywide license with SAS Institute Inc. of Cary, N.C.

Once complete, Census 2000 data will go into the Data Access and Dissemination System, the Census Bureau's tabulation and publication system. DADS data is available online at factfinder.census.gov.

"That's a big service that we're giving our customers," Doyle said. Census 2000 results will hit the Web early next year, he said.
