Federal and state partners review Census numbers with OLAP and GIS tools
BY PATRICIA DAUKANTAS | GCN STAFF
Before the release of state-by-state Census 2000 figures in March, demographers went through the numbers with the software equivalent of a fine-toothed comb.
Tools that weren't available at the time of the 1990 decennial count, such as online analytical processing and geographic information systems, helped them speed through quality assurance checks of the 2000 results.
For the first time, the Census Bureau also deputized demographers from 39 states and Puerto Rico to lend their expertise to the review through the Federal-State Cooperative Program for Population Estimates, said Mike Batutis, chief of the Population Division's count review staff.
A decade ago, the bureau used Unisys Corp. mainframes and Digital Equipment Corp. minicomputers for its quality check. But some workers reviewed the numbers the old-fashioned way: printing out tables to hand-check figures.
Today's data processing tools from such vendors as SAS Institute Inc. of Cary, N.C., have 'really opened up the gates' to OLAP, said Richard A. Denby, the agency's assistant division chief for household economic statistics.
File watch
Federal and state officials started reviewing the files last October and finished five months later, Batutis said.
Because Denby's division had a great deal of SAS programming experience, it created a review application based on SAS statistical software. The Population Division developed its GIS using software from Environmental Systems Research Institute of Redlands, Calif.
With the OLAP tools, Denby's division built multidimensional database cubes for each state. The cubes let the analysts drill down to examine any data subset.
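As a rough illustration of what drilling down into a cube means, the sketch below rolls up a handful of made-up records by one dimension and then narrows to a single county to see its tracts. It is plain Python, not the bureau's SAS application; the field names (`county`, `tract`, `age_group`, `count`) and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical summary records; names and numbers are illustrative only.
RECORDS = [
    {"county": "A", "tract": "001", "age_group": "0-17", "count": 120},
    {"county": "A", "tract": "001", "age_group": "18+", "count": 300},
    {"county": "A", "tract": "002", "age_group": "0-17", "count": 80},
    {"county": "A", "tract": "002", "age_group": "18+", "count": 210},
    {"county": "B", "tract": "001", "age_group": "0-17", "count": 50},
    {"county": "B", "tract": "001", "age_group": "18+", "count": 140},
]


def roll_up(records, dims):
    """Sum counts over every combination of the given dimensions."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        totals[key] += rec["count"]
    return dict(totals)


def drill_down(records, fixed, dims):
    """Keep only records matching `fixed`, then roll up by finer dimensions."""
    subset = [r for r in records if all(r[k] == v for k, v in fixed.items())]
    return roll_up(subset, dims)


by_county = roll_up(RECORDS, ["county"])
tracts_in_a = drill_down(RECORDS, {"county": "A"}, ["tract"])
```

A real OLAP server would precompute such aggregates rather than rescan the records on every request; the sketch only shows the idea of moving from a coarse total to a finer subset.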
The count review involved the so-called edited files from the Decennial Division, not the raw data scanned from original Census 2000 forms [GCN, Feb. 7, 2000, Page 1].
Demographers compared unedited and final edited responses to ensure that quality was high and that the Decennial Division's edits made sense, Denby said. For example, the bureau would not list as heads of households people who were listed as 5 years old.
Comparing unedited with edited records for each of 281 million U.S. residents doubled the amount of data to scrutinize at any one time, Denby said.
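The kind of comparison Denby describes can be pictured with a small sketch: for one person record, list the fields the edit changed and confirm that the edited version no longer shows an implausibly young householder. This is plain Python under stated assumptions, not the bureau's edit logic; the field names, the sample record and the age threshold are all hypothetical.

```python
# Assumed cutoff for illustration only; the article gives no actual rule.
MIN_HOUSEHOLDER_AGE = 15


def implausible_householder(record):
    """True if the record lists a householder younger than the assumed cutoff."""
    return (
        record["relationship"] == "householder"
        and record["age"] < MIN_HOUSEHOLDER_AGE
    )


def review(unedited, edited):
    """Return the fields changed by the edit and whether the result still fails."""
    changes = {
        field: (unedited[field], edited[field])
        for field in unedited
        if unedited[field] != edited[field]
    }
    return changes, implausible_householder(edited)


# Hypothetical unedited and edited versions of the same record.
raw = {"id": 1, "age": 5, "relationship": "householder"}
fixed = {"id": 1, "age": 5, "relationship": "child"}

changes, still_bad = review(raw, fixed)
```

Here `changes` shows which field the edit touched, and `still_bad` tells a reviewer whether the edited record would still need attention.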
The SAS software let the demographers merge data sets and build multidimensional cubes easily, Denby said. The client-server application ran on a six-processor Sun Microsystems Enterprise 6500 server with 10G of memory and more than 500G of storage.
That much computational horsepower was necessary to deal with data sets as large as California's 23G or OLAP cubes the size of Texas', which had more than 5.7 million possible record combinations based on geographic census blocks.
[Photo caption: Census demographers Richard Denby, left, and Mike Batutis rely on OLAP tools to comb through millions of 2000 Census Bureau records without printouts.]
The application used SAS Version 6.12 with the SAS/MDDB Server component, SAS/EIS rapid application development tools and some SAS/Frame and screen control language commands.
Under Batutis' direction, the state representatives reviewed basic counts for very small pieces of geography. A contractor, ArcBridge Consulting and Training Inc. of Herndon, Va., developed an application using ESRI's ArcView, which supplemented the SAS tools from Denby's division.
'We realized early on that it would be nice to give [state demographers] a graphical interface and a map so they could orient themselves within their states,' Batutis said.
The ArcView client-server application ran on three Dell Computer Corp. servers under Microsoft Windows NT.
'We wanted to maintain some redundancy so in case one system went down, we'd still be able to review with another system,' Batutis said.
'We were on a very tight timetable, given the legal deadlines for releasing data. We had some folks who had no GIS experience whatsoever and others who were pretty much power users. So we had to develop a dedicated application so that anybody could come in,' he said.
The Census Bureau offered all 50 state governments the chance to join in the count review. Participants had to come to Suitland, Md., and be sworn in as special bureau agents to work on the confidential data.
Historical data account
Besides comparing edited and unedited records, Denby and his staff developed another multidimensional database with historical data. The so-called benchmark database merged 1990 and 2000 federal Census data with some independent population estimates.
The 'test of reasonableness' let analysts spot quickly the large or small geographic areas with substantial growth or decline, Denby said.
'I'm not sure we could have done that as well as we did without something like SAS,' he said.
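One simple way to picture a reasonableness test of this kind is to compute the percent change between the 1990 and 2000 counts for each area and flag anything beyond a chosen threshold. The sketch below is plain Python with made-up area names and counts; the 25 percent threshold is an assumption for illustration, not a figure from the bureau.

```python
def percent_change(old, new):
    """Percent change from the old count to the new count."""
    return (new - old) * 100.0 / old


def flag_outliers(counts_1990, counts_2000, threshold_pct=25.0):
    """Return areas whose growth or decline meets the assumed threshold."""
    flagged = {}
    for area, old in counts_1990.items():
        new = counts_2000.get(area)
        if new is None or old == 0:
            continue  # no basis for comparison; skipped in this sketch
        change = percent_change(old, new)
        if abs(change) >= threshold_pct:
            flagged[area] = round(change, 1)
    return flagged


# Hypothetical counts for three areas.
counts_1990 = {"X": 1000, "Y": 2000, "Z": 500}
counts_2000 = {"X": 1050, "Y": 2600, "Z": 300}

flagged = flag_outliers(counts_1990, counts_2000)
```

A flagged area is not necessarily wrong; it is only a place where an analyst would want to look more closely, for instance against the independent population estimates mentioned above.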
The technology 'really enhanced our ability to drill down into the data,' Batutis said. 'We got a much more in-depth review using all these automated tools than we've ever had before.'
Now that the main results have been reviewed, Denby's division will start studying data from the Census 2000 long forms, which were sent to one out of every six households. The long form asked detailed questions about socioeconomic status and housing. Results will be made public early next year.
Bureau officials hope to use the more sophisticated OLAP tools in SAS Version 8 to review the long-form data, Denby said.