Census' online tool carves out custom data

Census' Richard Denby estimates the data tool project cost about $1 million and took a year of part-time work.

Rick Steele

Eight Asian farmers live in Kansas, according to the Census Bureau's Census 2000 Equal Employment Opportunity Data Tool.

The comprehensive decennial census collects plenty of raw data about the U.S. population. The trick is making it comprehensible to other agencies, companies and the public, most of whom want only a particular subset of the data.

Last year, the Census division that handles household economic statistics set up a Web site where users can slice and dice the raw data into statistics tailored to their needs. The site went live in December 2003.

The Census 2000 Equal Employment Opportunity Data Tool, at www.census.gov/eeo2000/index.html, generates reports on the fly. It works quickly even though it draws on an enormous pool of data about 294 million people: the query about Asian farmers in Kansas took about a minute.

SAS 9 business intelligence software from SAS Institute Inc. of Cary, N.C., creates ad hoc tables from the data.

'Users can point and click their way through the options, and the site generates the table,' said Richard Denby, the assistant division chief and project manager. 'It happens dynamically from the selections. The tool does a lookup and finds the right variables to display.'
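Denby's description of the lookup step can be pictured as a table that maps the labels a user clicks to variable names in the underlying data sets. The article does not describe the tool's actual metadata, so the labels and variable names below (`AGEGRP`, `RACETH` and so on) are hypothetical; this is only a sketch of the idea.

```python
# Hypothetical mapping from menu labels to data-set variable names.
# These names are illustrative assumptions, not the tool's real metadata.
VARIABLE_LOOKUP = {
    "Age": "AGEGRP",
    "Sex": "SEX",
    "Race/Ethnicity": "RACETH",
    "Education": "EDUC",
    "Earnings": "EARN",
}


def variables_for(selections):
    """Return the variables to display for the user's menu selections,
    in the order the user chose them."""
    missing = [s for s in selections if s not in VARIABLE_LOOKUP]
    if missing:
        raise KeyError(f"unknown selection(s): {', '.join(missing)}")
    return [VARIABLE_LOOKUP[s] for s in selections]


print(variables_for(["Sex", "Earnings"]))  # ['SEX', 'EARN']
```

Keeping the mapping in one table means new menu options only require a new entry, not new page logic.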

Census data is the definitive source of U.S. employment statistics, Denby said. Companies and government agencies consult it to draw up equal-opportunity plans by comparing their own workforce characteristics with those of surrounding geographic areas.

The Justice and Labor departments and the Office of Personnel Management contracted with Census to generate the summary employment data and make it publicly accessible.

Denby estimated the project cost about $1 million and took a year of part-time work.

The first task was to compile the data set, known as the special equal-employment opportunity tabulation, from the responses of 40 million citizens who described their jobs.

Denby's team organized that data into 24 summary SAS data sets covering 509 kinds of jobs.

Unrefined data

A SAS data set is a file of raw values plus descriptive external data and formulas for parsing the data. The tables connect the occupations with variables such as age, gender, race, ethnicity, education, home location, industry and earnings.

In addition to the Web site, Census also provides data sets on CD-ROM.

Next, Denby's team had to create the tool's user interface. Tim Braan, a Census IT specialist, hand-coded the Web pages that users step through.

Braan incorporated a JavaScript error-proofing mechanism to keep people from constructing bad queries.
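The tool's check ran in JavaScript in the browser. For consistency with the other examples here, the sketch below shows the same kind of pre-submission check in Python. The field names and rules (`geography`, `table`, `area`) are hypothetical, since the article does not list the real ones.

```python
# Hypothetical form fields and allowed values; not the tool's actual rules.
ALLOWED = {
    "geography": {"national", "state", "county"},
    "table": {"occupation_by_sex", "occupation_by_race", "occupation_by_age"},
}


def validate_query(form):
    """Return a list of problems; an empty list means the query may be sent."""
    errors = []
    for field, allowed in ALLOWED.items():
        value = form.get(field)
        if value is None:
            errors.append(f"missing {field}")
        elif value not in allowed:
            errors.append(f"invalid {field}: {value!r}")
    # Sub-national tables also need a specific area to be chosen.
    if form.get("geography") in {"state", "county"} and not form.get("area"):
        errors.append("a state or county must be chosen")
    return errors


print(validate_query({"geography": "state", "table": "occupation_by_sex"}))
# ['a state or county must be chosen']
```

Rejecting an incomplete query before it is submitted spares the server from running a request that could only fail.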

When a user chooses specific variables, the tool automatically forms a Structured Query Language query, SAS public-sector director Rich Bishop said.
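Here is one way selections could become an SQL query. This is a sketch that assumes a hypothetical summary table named `eeo_summary` with made-up columns. It uses Python's built-in `sqlite3` module purely for demonstration; the real tool ran on SAS. Filter values are passed as bound parameters, and column names, which cannot be bound, are checked against a whitelist.

```python
import sqlite3

# Hypothetical schema; the article does not describe the real one.
COLUMNS = {"state", "occupation", "sex", "race"}


def build_query(group_by, filters):
    """Turn the user's selections into a parameterized SQL query."""
    unknown = (set(group_by) | set(filters)) - COLUMNS
    if not group_by or unknown:
        raise ValueError(f"bad selection: {sorted(unknown)}")
    cols = ", ".join(group_by)
    sql = f"SELECT {cols}, SUM(persons) FROM eeo_summary"
    if filters:
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c in filters)
    sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql, list(filters.values())


# Tiny in-memory table with made-up counts (not Census figures).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE eeo_summary "
    "(state TEXT, occupation TEXT, sex TEXT, race TEXT, persons INTEGER)"
)
conn.executemany(
    "INSERT INTO eeo_summary VALUES (?, ?, ?, ?, ?)",
    [("KS", "farmer", "F", "Asian", 1),
     ("KS", "farmer", "M", "Asian", 2),
     ("NE", "farmer", "M", "Asian", 3)],
)

sql, params = build_query(["sex"], {"state": "KS", "race": "Asian"})
print(sql)
print(conn.execute(sql, params).fetchall())  # [('F', 1), ('M', 2)]
```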

A Common Gateway Interface program translates the SQL query into SAS macro variables. SAS 9 returns a table in SAS data step code, which is embedded into the Web page sent back to the user.
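The article does not show the CGI program or spell out how the query maps onto macro variables. As a hypothetical stand-in, the Python sketch below takes the user's selections as a CGI query string and emits SAS `%let` statements that a SAS program could then reference as `&state`, `&race` and so on. The parameter names are assumptions. Values containing characters that SAS macro processing treats specially are rejected rather than quoted, to keep the sketch short.

```python
import re
from urllib.parse import parse_qs

# SAS macro variable names: letter or underscore first, then letters,
# digits or underscores, at most 32 characters in total.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def to_sas_macro_vars(selections):
    """Render selections as SAS %let statements, one per line."""
    lines = []
    for name, value in selections.items():
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"invalid macro variable name: {name!r}")
        text = str(value)
        # ; ends the %let statement; & and % trigger macro resolution.
        if any(ch in text for ch in ";&%"):
            raise ValueError(f"unsafe value for {name}: {text!r}")
        lines.append(f"%let {name}={text};")
    return "\n".join(lines)


def query_string_to_sas(qs):
    """Parse a CGI query string (first value per key) into %let statements."""
    params = {k: v[0] for k, v in parse_qs(qs, strict_parsing=True).items()}
    return to_sas_macro_vars(params)


print(query_string_to_sas("state=KS&race=Asian&groupby=sex"))
# %let state=KS;
# %let race=Asian;
# %let groupby=sex;
```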

The Census division runs the SAS 9 Application Server software on a dual-processor, 1.9-GHz Hewlett-Packard ProLiant DL580 server with 6G of storage, under Enterprise Linux Advanced Server from Red Hat Inc. of Raleigh, N.C.

Census procured the SAS software as part of a $16.5 million, five-year, unlimited-use contract with SAS Institute.
