IRS musters IT to snare cheats
- By Florence Olsen
- Apr 27, 1998
After eight years of preparation, the IRS last week loaded 6 million tax records into a
data warehouse that will fundamentally change the way the service deals with errant taxpayers.
IRS compliance officials are counting on a Teradata warehouse server from NCR Corp. of
Dayton, Ohio, to help them through the agency's reorganization. The warehouse is a
component of the tax systems modernization plan.
"The system allows us to do a lot of things without the services of a programmer,
things we couldn't do with just the raw data sitting out there on our old systems,"
said Howard Oglesvy, director of the IRS Office of Research.
As the system went live, officials at the IRS computing center in Detroit completed
loading several terabytes of taxpayer data into a relational data model, which lies at the
heart of the Compliance Research Information System, or CRIS. It runs on an NCR WorldMark
"This is the fourth and final design that we've rolled out," Oglesvy said.
His office has spent more than three years preparing IRS research analysts and
statisticians to use CRIS.
The new system will be a more efficient tool than individual audits for getting
taxpayers to comply with the tax code, said Jim Alzheimer, chief of the CRIS support team.
CRIS will give IRS employees new insights into tax filer behavior, he said, which in
turn will lead to future educational campaigns designed for tax preparers, the real estate
industry, filers of earned-income credits and other groups.
The insights will let the agency take a wholesale approach to compliance, as opposed to
dealing one-on-one with individual tax filers, "which is much more costly and much
more intrusive," Alzheimer said.
If the IRS finds that a half-million people are making the same mistake, for example,
it can determine why and perhaps come up with a way to correct the problem without sending
out a half-million notices.
Other IRS divisions have had less success with data warehouse technology, he said,
because they cannot deal with the volume of data generated by 120 million tax filers.
"The sheer amount of data prevents some of the other IRS functions from
accomplishing [warehousing] in a production mode," Alzheimer said.
CRIS takes about 10 minutes to respond to standard queries, he said, because the
relational data model contains 2,600 data elements.
But that's 13 months faster than previous answers had been taking, Alzheimer said.
The warehouse's 300,000 records represent a weighted sample of the 120 million people
who file tax returns annually. The IRS keeps three years' worth of records in the
warehouse. The newest data is from 1995.
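To illustrate how a weighted sample can stand in for the full filing population: 300,000 records representing 120 million filers works out to an average of about 400 filers per record. The sketch below is not the IRS's actual method or data model. It only shows the general idea of summing per-record weights to estimate a population total, and the field names `weight` and `has_error` and the toy figures are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SampledReturn:
    # Hypothetical fields, not taken from the CRIS data model.
    weight: float    # number of filers this sampled record stands for
    has_error: bool  # whether the return shows the mistake under study


def estimate_population_count(sample):
    """Estimate how many filers share the mistake by summing the
    weights of the sampled records that show it."""
    return sum(r.weight for r in sample if r.has_error)


def estimate_population_share(sample):
    """Estimate the fraction of all filers who share the mistake."""
    total_weight = sum(r.weight for r in sample)
    if total_weight == 0:
        raise ValueError("sample has no weight")
    return estimate_population_count(sample) / total_weight


# Toy data: three sampled records standing for 1,000 filers in total.
sample = [
    SampledReturn(weight=400.0, has_error=True),
    SampledReturn(weight=400.0, has_error=False),
    SampledReturn(weight=200.0, has_error=True),
]

estimated_count = estimate_population_count(sample)
estimated_share = estimate_population_share(sample)
print(estimated_count, estimated_share)
```

With weights like these, a query that touches only the sampled records can still produce an estimate for the whole population, which is why a sample far smaller than the full filing base can be useful for research.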
"We do an awful lot of conditioning up front to load data into our model,"
Alzheimer said. Conditioning the data requires a skilled IRS programmer who knows tax law
and ALC, the machine language of the oldest IRS mainframes. "That's not a skill most
people have," he said.
Analysts and statisticians who use the new system work in the agency's 33 district
field offices on Pentium PCs equipped with Andyne GQL, an automated Structured Query
Language generator from Andyne Computing Ltd., which was recently acquired by Hummingbird
Communications Ltd. of Markham, Ontario.
Their queries travel over an IRS network to NCR's Top End middleware applications
running on the WorldMark Teradata warehouse server in Detroit. The analysts then load the
data that comes back to them into a statistical application from SPSS Inc. of Chicago for
Alzheimer said the research system fulfills the recommendations of a six-year-old IRS
district office study and the recommendations of the agency's Compliance 2000 document.
"We've created a research organization in the field where there was none
before," he said.