Data mining digs up the dirt
- By Florence Olsen
- Dec 15, 1997
Most organizations are only talking about it, but data mining is already working for
the Defense Department, which now has tools to spot billing and health services fraud.
Providers who try to double-bill the government health care system are getting caught
with evidence uncovered by DOD personnel who sift through terabytes of data.
Optenberg is proud that the data warehouse and mining tools he and his colleagues built
at Fort Sam Houston, Texas, stopped a thief who had defrauded the government of $1.7
million by double-billing the Civilian Health and Medical Program of the Uniformed
Services (CHAMPUS).
At the same time the provider was submitting bills to CHAMPUS, he was seeking payment
for the same services from the Medicare program.
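The article does not describe how the CHES tools matched claims across programs. As a rough illustration only, the Python sketch below flags services billed to more than one program by grouping claims on a shared key. The `Claim` fields, the program labels and the matching key are all hypothetical, not details of the Army system.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    # Hypothetical claim layout; the real CHAMPUS and Medicare formats are not given.
    program: str
    provider_id: str
    patient_id: str
    service_date: date
    service_code: str
    amount: float


def find_cross_program_duplicates(claims):
    """Return groups of claims for the same service billed to more than one program."""
    by_service = defaultdict(list)
    for claim in claims:
        # Assumed matching key: same provider, patient, date and service code.
        key = (claim.provider_id, claim.patient_id,
               claim.service_date, claim.service_code)
        by_service[key].append(claim)
    return {
        key: group
        for key, group in by_service.items()
        if len({c.program for c in group}) > 1
    }


claims = [
    Claim("CHAMPUS", "P1", "A", date(1996, 3, 1), "X100", 250.0),
    Claim("MEDICARE", "P1", "A", date(1996, 3, 1), "X100", 250.0),
    Claim("CHAMPUS", "P2", "B", date(1996, 3, 2), "X200", 90.0),
]
suspects = find_cross_program_duplicates(claims)
```

In this made-up data set, only the service billed by provider P1 appears under both programs, so it is the only group returned for review.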
Optenberg, chief of the Analysis Branch of the Army Center for Healthcare, Education
and Studies, or CHES, said the Army's surgeon general asked the center five years ago for
an archival database management system that would help financial officers manage CHAMPUS
CHAMPUS had been a separate line item in the DOD budget, so no single organization felt
accountable for it.
"Nobody really cared when CHAMPUS was a $500 million line item, but then it
started growing about 12 percent a year," Optenberg said. "Pretty soon it broke
$4 billion a year and got on the congressional radar screen."
Congress told the services that CHAMPUS would be folded into their operational budgets
as "a guns or bandages type of thing," Optenberg said.
His group took on the task of designing the algorithms, metafiles and indexing methods
that now make it hard for criminals to hide under a mountain of paper and electronic
Baseline SAS from SAS Institute Inc. of Cary, N.C., was Optenberg's starting point for
building the Unix system. "We used the deep code in SAS where the real power is for
developers," Optenberg said.
His own expertise in structuring episodic data proved essential.
One hospitalization can generate 25 or more CHAMPUS claims. Optenberg used his
knowledge of episode-builder techniques to convert those 25 claims into a single record so
administrators could see the total cost of that hospitalization. It took a large number of
SAS subroutines "with hundreds of do loops and array processes," he said.
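Optenberg's SAS subroutines are not reproduced in the article. The general idea of an episode builder can be sketched in Python, assuming that claims for one patient belong to the same episode when their dates overlap or fall within a small gap of each other. The field names and the one-day gap rule are illustrative assumptions, not the CHES logic.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Claim:
    # Hypothetical claim fields, for illustration only.
    patient_id: str
    start: date
    end: date
    amount: float


@dataclass
class Episode:
    patient_id: str
    start: date
    end: date
    total_cost: float
    claim_count: int


def build_episodes(claims, max_gap_days=1):
    """Roll claims up into one record per episode of care.

    Claims are sorted by patient and start date. A claim joins the current
    episode when it is for the same patient and starts no later than
    max_gap_days after the episode's current end date.
    """
    gap = timedelta(days=max_gap_days)
    episodes = []
    current = None
    for c in sorted(claims, key=lambda c: (c.patient_id, c.start)):
        if (current is not None
                and c.patient_id == current.patient_id
                and c.start <= current.end + gap):
            current.end = max(current.end, c.end)
            current.total_cost += c.amount
            current.claim_count += 1
        else:
            current = Episode(c.patient_id, c.start, c.end, c.amount, 1)
            episodes.append(current)
    return episodes


claims = [
    Claim("A", date(1997, 1, 5), date(1997, 1, 9), 1200.0),  # hospital stay
    Claim("A", date(1997, 1, 5), date(1997, 1, 5), 300.0),   # service during the stay
    Claim("A", date(1997, 1, 7), date(1997, 1, 7), 450.0),   # service during the stay
    Claim("A", date(1997, 3, 1), date(1997, 3, 2), 800.0),   # later, separate stay
    Claim("B", date(1997, 1, 6), date(1997, 1, 6), 100.0),   # different patient
]
episodes = build_episodes(claims)
```

Here the first three claims collapse into a single episode, so an administrator sees one total cost for the stay instead of three separate line items.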
Any database supports many-to-few data extraction. But a warehouse, Optenberg said, was
"the only logical way of managing our combined requirement for many-to-many and
With the CHES IT Infrastructure System's restructured data, administrators for the
first time could mine DOD health services records to learn the total cost of, say, a bone
marrow transplant or the average yearly cost of treating diabetes.
No one could ever do that with unmodified claims records, Optenberg said, because
"you'd be dealing with 1.2 million records in a sequential format, reading one at a
time."
Data in the CHES IT Infrastructure System warehouse gets reindexed in memory each time
the number of records reaches a predefined threshold.
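The article gives no detail on how the threshold works. One possible reading is that an in-memory index is rebuilt each time a fixed number of new records has accumulated. The Python sketch below follows that reading; the class name, the key field and the threshold value are hypothetical.

```python
from collections import defaultdict


class ThresholdIndexedStore:
    """Record store that rebuilds its in-memory index every `threshold` additions."""

    def __init__(self, key_field, threshold):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.key_field = key_field
        self.threshold = threshold
        self.records = []
        self.index = {}          # key value -> positions in self.records
        self.indexed_count = 0   # number of records covered by the index
        self.rebuilds = 0

    def add(self, record):
        self.records.append(record)
        # Rebuild once the count of unindexed records reaches the threshold.
        if len(self.records) - self.indexed_count >= self.threshold:
            self.reindex()

    def reindex(self):
        index = defaultdict(list)
        for pos, rec in enumerate(self.records):
            index[rec[self.key_field]].append(pos)
        self.index = dict(index)
        self.indexed_count = len(self.records)
        self.rebuilds += 1

    def lookup(self, key):
        # Indexed records come from the index; the unindexed tail is scanned.
        hits = [self.records[i] for i in self.index.get(key, [])]
        tail = self.records[self.indexed_count:]
        hits.extend(r for r in tail if r[self.key_field] == key)
        return hits


store = ThresholdIndexedStore("patient_id", threshold=3)
for pid in ["A", "B", "A", "A"]:
    store.add({"patient_id": pid})
matches = store.lookup("A")
```

With a threshold of three, the index is rebuilt once after the third record, and the fourth record is found by scanning the short unindexed tail.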
The client-server SAS database system runs on a Sun Microsystems Inc. Ultra Enterprise
6000 server that is "fully loaded with 5G of RAM," Optenberg said. Two front-end
Sun Ultra Enterprise 1000 servers act as session managers.
The unlimited storage available for CHES IT Infrastructure System users at the research
center comes from internal redundant disk arrays, online optical jukeboxes, offline
optical disks and workstation optical drives.
Users work primarily at Sun Ultra and Sparcstation workstations with 100Base-T Ethernet
connections to the Solaris 2.6 server.
The center compresses data 4-to-1 and delivers it under public-key encryption on two T1
lines to the Internet.
The 250 officials around the globe who analyze the health services data run a SAS query
application on their PCs under Microsoft Windows, Optenberg said.
The query application, the Medical Analysis Support System, uses automated data access
controls to manage access to the data warehouse and tools.
Optenberg's group is beta testing a prerelease version of another SAS product, SAS
IntrNet software for Windows NT, to see whether analysts could get at the data more easily
by using their Netscape Communications Corp. browsers.
Archival and current health services data for the warehouse comes from a DOD megacenter
in Aurora, Colo. Other sources send mainframe IDMS and VSAM, Unix tape backup, dBase III,
spreadsheet and other data on every possible media type, from cartridges and digital audio
tape to optical disks and PKZip files on 1.44M floppies.
"I can't think of a data source that we haven't received, and for us that's not a
problem. It's fun," Optenberg said.
The civilian researcher started building the warehouse with 200 reels of archival
data--standard IBM compressed variable-length record streams "not designed to be
mucked with," he said.
His expert colleagues sometimes had to work down at the bit level to resurrect some of
the data, which dated from 1987. The mishmash of data sources took lots of scrubbing.
The warehouse at first held only claims data but has expanded to include eligibility,
clinical, commercial and tumor registry data from more than 100 different databases.
Since the word has gotten out about the system Optenberg and his colleagues have built
at Fort Sam Houston, their phones have been ringing.
The system will be integrated in another year or so with other DOD decision-support
systems into an even larger Corporate Executive Information System, Optenberg said.
For more information, contact the Army Center for Healthcare, Education and Studies at
210-221-9333, ext. 0278.