Census tool delivers results
Bureau's behind-the-scenes data warehouse keeps Census 2000 on track
The Census 2000 Cost and Progress System has a simplified Windows 95 interface from which managers can drill down into a data warehouse that pools data from other bureau systems daily.
By Patricia Daukantas
Before the Census Bureau can start counting an estimated 273 million Americans on April 1, it must meet many internal deadlines for hiring workers, connecting computers, printing forms, tracking budgets and keeping Congress informed along the way.
To hew to the strict schedule, managers use the Census 2000 Cost and Progress System, a data warehouse that will eventually hold more than a terabyte. The system draws information from many bureau computers and repackages it into easy-to-use internal reports, project manager Gail S. Davidson said.
Of the 200 users, 80 percent work at Census headquarters in Suitland, Md., Davidson said. Most are managers of general bureau operations or of specific Census 2000 programs, and many use the data to create reports for Congress and local governments.
Beyond the general goal of providing timely, useful information on census operations as they happen, Davidson and her development team set out to make every predefined report display on users' client machines in less than five seconds.
Davidson's six-member programming group developed the system using software from SAS Institute Inc. of Cary, N.C., whose products the bureau has licensed for many years.
'Our system had to be real easy to navigate and find data in, and SAS is good about that,' Davidson said. From a Microsoft Windows 95 interface, users can call up a report or graph and drill down to underlying values.
The Cost and Progress System grew out of a series of meetings Davidson's group held with users in late 1996. 'For us to design a warehouse, we had to understand the [decennial] census operations,' she said. The programmers demonstrated two potential warehouse designs and quizzed the users about ease and performance issues.
The SAS data warehouse at the heart of the Cost and Progress System resides on an SGI Challenge L server in the bureau's Bowie, Md., computer center. It pulls payroll, budget and schedule data from a heterogeneous mix of feeder systems, including the Operational Control System, which tabulates work assignments at 12 regional and 520 local offices.
The data warehouse queries most of the feeder systems daily, usually in the early morning to avoid interfering with other work, Davidson said.
'We're just a reporting system, and we have to be aware that the feeder systems are doing the real census work,' she said.
In addition, files from Primavera Project Planner, a scheduling package from Primavera Systems Inc. of Bala Cynwyd, Pa., are stored in an embedded Btrieve database and pulled into the Cost and Progress System when requested, Davidson said.
In the client application, the Census programmers enhanced some of the default behaviors of SAS multidimensional data structures and hid others to make the read-only reports more useful to managers.
Users access the Cost and Progress System over regular TCP/IP connections on the bureau's network. The system has neither dial-in Internet access nor public access, Davidson said.
Cynthia Eurich, a program manager in the Decennial Management Division, uses the Cost and Progress System to oversee part of the Master Address File project, which lists home addresses for the whole nation for distributing next April's questionnaires.
Because Eurich has prior experience with SAS, she can do her own ad hoc analyses in the Cost and Progress System. Most managers, she said, view the data in the canned reports.
Eurich praised the system for alerting her to discrepancies between budget and payment data, so that she can resolve them even before the standard federal financial management reports arrive. 'That could be a week and a half into the next month,' she said. 'It's too late to react then. Millions of dollars are spent and gone.'
Davidson expects that by the end of the 2000 Census, the Cost and Progress System will hold roughly a terabyte. At present about 400G of online disk space is available, plus a TimberWolf 9714 robotic tape library from Storage Technology Corp. of Louisville, Colo. The tape system's response time is somewhat slower than disk access but sufficient, Davidson said.
One key use for the terabyte of data will be planning for the next nationwide census, in 2010. 'Our goal is not just to manage this census but to manage the next one,' Davidson said.
All the MIS reports from the 1990 Census were preserved on paper, not in databases, which made scheduling Census 2000 very time-consuming, Davidson said. Fortunately, year 2000 issues have not been a big concern.
'I don't think any of our feeder systems use two-digit dates,' Davidson said.