IRS to upgrade file system gradually

Tax agency abandons plan for parallel systems in switching from master file

By Patricia Daukantas

GCN Staff

The IRS will transfer its 1960s-era taxpayer data files incrementally from tape into modern databases, instead of trying to keep both records systems synchronized as originally planned.

In about three years, taxpayer accounts will be deleted from the agency's master file as they are added into the Customer Account Data Engine, or CADE, which will use relational database technology.

The piece-by-piece approach supplants the IRS' original schedule for upgrading its master file system. A 1997 blueprint for IRS modernization called for the old and new systems to run parallel for as long as a decade, IRS chief information officer Paul J. Cosgrave said.

The IRS and its Prime modernization contractor, Computer Sciences Corp., have since decided the strategy is too cumbersome and risky for the core listing of all U.S. taxpayers' accounts.

Cosgrave described the master file system as 'a tape-based system that dates back to when John F. Kennedy was president.'

Because accounts stored on the tapes can't be randomly accessed, the huge tape system is updated weekly, said Donna Dickinson, an architecture and engineering consultant with CSC, which holds the $15 billion, 15-year Prime contract.
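The constraint Dickinson describes is typical of sequential media: a record can be reached only by reading past everything before it, so changes are collected and applied in a single pass. The Python sketch below illustrates that general batch-merge technique. It is not the IRS's code; the record layout (an account ID and a balance), the function name and the sample figures are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (account_id, balance). Not the real IMF format.
Record = Tuple[int, int]


def weekly_batch_update(master: Iterable[Record],
                        transactions: Iterable[Record]) -> Iterator[Record]:
    """Apply a batch of transactions to a master file in one sequential pass.

    Both inputs must already be sorted by account_id, as they would be on
    tape. Each input is read front to back exactly once.
    """
    tx_iter = iter(transactions)
    tx = next(tx_iter, None)
    for account_id, balance in master:
        # Skip transactions for accounts that are not on the master file.
        while tx is not None and tx[0] < account_id:
            tx = next(tx_iter, None)
        # Apply every transaction that belongs to the current account.
        while tx is not None and tx[0] == account_id:
            balance += tx[1]
            tx = next(tx_iter, None)
        # Write the (possibly updated) record to the new master file.
        yield (account_id, balance)


if __name__ == "__main__":
    old_master = [(101, 500), (102, 0), (105, 250)]
    week_of_changes = [(101, -200), (105, 50), (105, 25)]
    for record in weekly_batch_update(old_master, week_of_changes):
        print(record)
```

Because every update requires a full pass over the file, changes tend to be queued and applied on a fixed cycle rather than as they arrive.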

The tape system has such a limited capacity that multiple changes to a single account can end up overwriting some of the older data, Dickinson said. IRS programmers have struggled to increase the field sizes to keep up with personal wealth figures never envisioned in the 1960s.

Despite its antiquity, the flat-file system, based at the IRS processing center in Martinsburg, W.Va., is the 'main source of all taxpayer data,' Cosgrave said.

Because of the master file's limited capability, the IRS over three decades has added more than 180 satellite systems to extract data for other functions, said Atul Kapoor, a CSC project director. As a result, no single, comprehensive snapshot of each individual's tax account exists, Kapoor said.

Modern moves

'This is really at the heart of modernizing the IRS,' Cosgrave said of the master file. 'We need to replace this system if we're going to move the Internal Revenue Service forward.'

The 1.5-terabyte master file system consists of three flat files. The individual master file, or IMF, holds data about individual taxpayers, and the business master file, or BMF, handles business returns.

The third file, called the nonmaster file, holds cases that don't fit into the other two categories, such as the growing number of innocent-spouse relief claims, Cosgrave said.

The IRS is restructuring itself into four business units to handle four categories of taxpayers: wage and investment taxpayers, which will cover most individuals; small businesses; midsize and large businesses; and nonprofit organizations.

This categorization of the taxpayer base lends itself to a segmented approach to transferring records from the old system to the new one, Cosgrave said.

He and IRS Commissioner Charles O. Rossotti decided to abandon the original plan for parallel systems after concluding that it just wouldn't work, Cosgrave said. The 1997 modernization blueprint called for running both systems 'for the length of time that it was going to take us to transition from one environment to the other,' or most of this decade.

'Running in parallel normally doesn't work for more than about a day, and lots of times it doesn't even do that very well,' Cosgrave said. 'So the notion that you would keep systems running in parallel for 10 years was an impossible task, in our minds.'

Under the original approach, several years' worth of taxpayer records would have resided simultaneously in both environments. 'We're getting away from that, a breakthrough, if you will,' Cosgrave said.

After CSC landed the Prime contract in December 1998, its first task was to accelerate the master-file replacement schedule while reducing the risk, Kapoor said. He worked with IRS senior technical adviser Tom Lucas on the strategy.

Growing and shrinking

'We would ideally like the authoritative representation of an account to be in one place,' Dickinson said.

As each tax account goes into CADE, it will be deleted from the master file, so that over time the old system will shrink while CADE grows.
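The move-then-delete approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CADE design: both systems are modeled as in-memory dictionaries keyed by a hypothetical account ID, and the `filing` field, the eligibility rule and the function names are invented for the example.

```python
from typing import Callable, Dict, List

Account = Dict[str, str]  # hypothetical account record


def migrate_release(master_file: Dict[int, Account],
                    cade: Dict[int, Account],
                    is_eligible: Callable[[Account], bool]) -> List[int]:
    """Move eligible accounts from the old store to the new one.

    Each account is added to the new store and then deleted from the old
    one, so after the call it exists in exactly one place.
    """
    moved = []
    for account_id in list(master_file):  # copy keys; we delete as we go
        account = master_file[account_id]
        if not is_eligible(account):
            continue
        cade[account_id] = account       # add to the new system first
        del master_file[account_id]      # then remove from the old one
        moved.append(account_id)
    return moved


def lookup(account_id: int,
           master_file: Dict[int, Account],
           cade: Dict[int, Account]) -> Account:
    """Return the single authoritative copy of an account."""
    if account_id in cade:
        return cade[account_id]
    return master_file[account_id]


if __name__ == "__main__":
    master = {1: {"filing": "TeleFile"},
              2: {"filing": "1040"},
              3: {"filing": "1040EZ"}}
    new_system: Dict[int, Account] = {}
    simple = {"TeleFile", "1040EZ"}
    moved = migrate_release(master, new_system,
                            lambda a: a["filing"] in simple)
    print(moved, sorted(master), sorted(new_system))
```

Because no account is ever held in both stores, there is nothing to keep synchronized, which is the property the parallel-systems plan lacked.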

The IRS will first transfer the 'easiest taxpayers, meaning the ones that tend to use TeleFile or Form 1040EZ,' Cosgrave said.

Individuals' tax accounts will be selected for transfer to CADE based on past filing characteristics, Kapoor said.

The current schedule calls for a pilot release of CADE in 2002, Dickinson said. That would transfer about 6 million TeleFile accounts into CADE.

CADE's first production release, perhaps in 2003, would transfer about 30 million electronic filers, Dickinson said. The second production release would transition 40 million paper filers.

After that, the IRS would migrate 34 million people who report income through Schedules C, E or F. Under the reorganization plan, they will come under the agency's small-business division, rather than the wage and investment division.

The goal is to get all individual filers onto the new system within two or three years, Cosgrave said. Kapoor said CSC hopes to shut down the IMF system after five or six CADE releases.

The IRS will then start to move the business filers, although Kapoor and Dickinson said the BMF replacement schedule isn't set.

CADE will use the Structured Query Language and could be built on either Oracle Corp. technology or IBM's DB2 Universal Database. 'We've not made a final decision there but we've narrowed it down to those two choices,' Cosgrave said.
