HCFA tests data warehouse

Agency to convert tape records to a warehouse holding 16T of Medicare claims

By Patricia Daukantas

GCN Staff

To replace a tape records system that takes weeks to answer queries, the Health Care Financing Administration plans to build a modern data warehouse for up to 16T of Medicare claims data.

HCFA officials were unsure, however, whether such a large electronic storehouse would work. They also wanted to find out whether it would return queries within a reasonable time and how hard it would be to maintain.

HCFA chose IBM Corp., whose DB2 Universal Database software the agency had selected during a previous search, to stress-test a 1.5T prototype of the massive data warehouse. Working at the agency's Baltimore site and an IBM facility in Poughkeepsie, N.Y., IBM tuned the prototype to answer queries within minutes or hours.

HCFA's planned data warehouse will hold the Medicare National Claims History File, said Betty Jackson, director of the agency's Enterprise Databases Group. The file stores all claims submitted by Medicare beneficiaries nationwide.

HCFA now stores eight years' worth of Medicare claims data on 'pretty antiquated technology using flat-file access,' Jackson said. The eight years of claims constitute 8 billion records, totaling about 16T, she said. Running a single Cobol query against the 800 tapes of records can take two weeks to a month.
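The weeks-long turnaround follows from flat-file access itself: with no index, a query must read every record on every tape in sequence. A minimal sketch of the contrast, using an in-memory list of hypothetical claim records (the record layout is illustrative, not HCFA's):

```python
# Hypothetical illustration of why flat-file access is slow: without an
# index, a lookup must scan every record, like reading a tape end to end.
records = [{"id": i, "amount": i * 1.5} for i in range(1_000_000)]

def scan(records, claim_id):
    """Sequential scan, analogous to a flat-file tape pass."""
    for r in records:
        if r["id"] == claim_id:
            return r
    return None

# A one-time index build turns each later query into a single lookup,
# which is what moving the data into a DBMS such as DB2 buys.
index = {r["id"]: r for r in records}

assert scan(records, 999_999) == index[999_999]
```

The scan touches up to a million records per query; the index answers in one step, which is why a database-backed warehouse can cut response times from weeks to minutes.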

Based on a study a few years ago, HCFA chose IBM's DB2 Universal Database, running on the IBM S/390 platform, for the data warehousing effort. Before starting the warehousing project, however, HCFA officials wanted additional feasibility tests, Jackson said. So IBM designed a two-phase, proof-of-concept test.

Ask me

First, the HCFA-IBM team designed 12 multilevel Structured Query Language queries based on 78 questions submitted by database users in HCFA's regional offices. The queries tested three years' worth of claims files from three states.
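The article does not reproduce the test queries, but a multilevel SQL query of the kind described typically layers an aggregate over a filtered subquery. A hypothetical sketch, using Python's built-in sqlite3 in place of DB2 and an invented, simplified claims schema:

```python
import sqlite3

# Hypothetical, simplified schema standing in for the National Claims
# History File; table and column names here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (
    claim_id   INTEGER PRIMARY KEY,
    state      TEXT,
    claim_year INTEGER,
    amount     REAL
);
INSERT INTO claims VALUES
    (1, 'MD', 1997, 120.00),
    (2, 'MD', 1998, 250.00),
    (3, 'NY', 1998,  75.50),
    (4, 'NY', 1999, 310.25);
""")

# A "multilevel" query in the spirit of the test set: state-level
# aggregates computed over a year-filtered subquery.
rows = conn.execute("""
    SELECT state, COUNT(*) AS n_claims, SUM(amount) AS total
    FROM (SELECT * FROM claims WHERE claim_year >= 1998)
    GROUP BY state
    ORDER BY state
""").fetchall()

for state, n_claims, total in rows:
    print(state, n_claims, total)
```

Against the real warehouse the same shape of query would run over hundreds of millions of rows, which is why the team validated the logic on a small extract first.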

The first phase simply tested the logic of data extraction, data transformation and related functions, Jackson said.

'We didn't want to go [to IBM] and do this large proof of concept unless we were sure that we had everything correct first,' she said.

The second phase involved taking the test data to IBM's S/390 Teraplex Integration Center in Poughkeepsie, one of four centers the computer maker has built to test business intelligence systems before they go into production.

The National Claims History File will be one of the largest data warehouses ever built by IBM users, said Malcolm Nolan, an S/390 market development leader.

For the Teraplex Center test, HCFA selected three years' worth of Medicare claims data from 37 states, Jackson said. The data set totaled about 740 million claims, or about 1.5T.

When HCFA workers went to Poughkeepsie to load their data at the Teraplex Center, Jackson said, they wanted to see whether they could perform functional queries, operational updates and basic maintenance within the agency's time requirements.

The crucial question, Jackson said, was 'can we take the system down on the weekend and do updates in that short time frame, and then be up and running for business on Monday morning?'

Over a three-month span, IBM tested the system for scalability in terms of users, processors and data, Jackson said. The company's testers 'maxed the system,' she said, to test its input-output capability and performance with the agency's own management utilities from BMC Software Inc. of Houston.

The Teraplex Centers, one each for IBM's RS/6000, S/390, AS/400 and Netfinity platforms, give customers the chance to test software and systems with their own data instead of benchmarks. Working with IBM staff members on the prototyping also helps customers improve their skills, Nolan said.

Ready now

'We gave them real-time access to their data, which I don't think they had originally,' Nolan said. 'Their users wanted more rapid access to make more timely decisions.'

The IBM centers have firewalls for security, said Darren Swank, a technical team leader at the S/390 Teraplex Center. The company signs nondisclosure agreements with the data owner, and at the end of testing, the data is either destroyed or returned to the owner, he said.

Although the IBM engineers detailed the fine-tuning necessary to make the final data warehouse run more efficiently than the prototype, they found that the prototype would answer queries in a few minutes to a few hours, depending on complexity.

'That's astronomically faster than what we're living with today,' Jackson said.

The IBM team also made recommendations for the warehouse's disk storage, technical support and maintenance, she said.

With the test results in hand, HCFA officials hope to complete the first stage of the data warehouse by fall, including conversion of the first two years of claims-history data to the DB2 format, Jackson said. They want to have the final data warehouse running in about 18 months.

'We should be able to do simple queries against it, which is a lot easier than going through and reading sequential records on a tape,' Jackson said.

Year 2000 preparations caused a temporary lull in the development project, however.

To protect the privacy of individual Medicare claimants, the National Claims History File will have a midtier level of data marts holding summarized information, Jackson said. Only authorized users will be able to drill down to individual claims.
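The data-mart idea can be sketched in a few lines: detail claims are rolled up into de-identified summaries for general use, while row-level access is gated by authorization. This is a minimal illustration with invented record layouts, not HCFA's design:

```python
from collections import defaultdict

# Hypothetical detail records; field names are illustrative only.
detail = [
    {"beneficiary": "A123", "state": "MD", "amount": 120.00},
    {"beneficiary": "A123", "state": "MD", "amount": 250.00},
    {"beneficiary": "B456", "state": "NY", "amount": 75.50},
]

def build_summary_mart(claims):
    """Roll up per state, dropping beneficiary identifiers entirely."""
    totals = defaultdict(lambda: {"n_claims": 0, "total": 0.0})
    for c in claims:
        t = totals[c["state"]]
        t["n_claims"] += 1
        t["total"] += c["amount"]
    return dict(totals)

def drill_down(claims, state, authorized):
    """Individual claims are released only to authorized users."""
    if not authorized:
        raise PermissionError("summary-level access only")
    return [c for c in claims if c["state"] == state]

mart = build_summary_mart(detail)
print(mart["MD"])  # {'n_claims': 2, 'total': 370.0}
```

Most users would query only the summary tier; the identifiable detail tier sits behind the authorization check.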

'We're defining different layers of security, because our prime objective is to keep information private and secure,' she said.

Use of the Teraplex Center was free of charge because HCFA had hired IBM to work on the modeling and proof of concept, Jackson said.

She declined to specify the total cost of the collaboration.

'Right now, our enterprise cost of system development and maintenance is quite large, because we have redundancy of processing and data. We're hoping to eliminate the redundancy and reduce the cost' through the warehouse, Jackson said.
