Data mining prospects for diamonds in the rough
- By Richard W. Walker
- Feb 22, 1999
The Army's Scott Optenberg says data mining has saved $1 for every 11 cents invested in the CHAMPUS medical
Data, data everywhere. Your data warehouse is awash in data. How do you dig deep into
that data, discern patterns and trends, and apply what you learn to advancing your
One way to plumb and analyze vast amounts of data is data mining. Data mining uses a
variety of techniques to uncover patterns and relationships in large databases that the
user may not have known about. Ultimately, data mining serves a higher end, giving the
user a business advantage.
"There is no silver bullet that does data mining," said Christopher Westphal, chief
executive officer of Visual Analytics Inc. in Bethesda, Md., and co-author with Teresa
Blaxton of Data Mining Solutions, published by John Wiley & Sons Inc. of New York.
"Rather, data mining is a combination of technologies and techniques," he said.
There are several key differences between data mining and conventional technologies for
analyzing data such as online analytical processing. With OLAP tools, you must know
exactly what patterns and trends you're looking for.
Data mining permits fishing expeditions in the depths of the data. The software can
ferret out trends and patterns in data; you don't have to know what you're
looking for. But the two tools can work together. "They're not competing
technologies," said Mark Brown, data mining program manager at SAS Institute Inc. of
Cary, N.C. "They're very complementary. We often see people using data mining to
understand the key drivers of a certain type of activity. Then they use that to determine
how they want to roll the data up in an OLAP-type environment."
Data mining is not new. What is new is the increasing availability of sophisticated,
off-the-shelf data mining tools.
"Over the years, we developed data mining techniques that didn't exist
before. Now the technology has caught up with us," said Scott Optenberg, chief of the
Analysis Branch at the Army's Center for Healthcare Education and Studies at Fort Sam
Scott headed a team that built a 1.5-terabyte data warehouse containing medical billing
data for the Civilian Health and Medical Program of the Uniformed Services (CHAMPUS).
About 250 Army managed-care analysts around the world access the CHAMPUS data via the
Web, using data mining techniques to uncover instances of fraud, abuse and waste.
When Optenberg and his team started building the CHAMPUS warehouse just a few years
ago, there were no off-the-shelf tools that could use the data mining techniques required
to analyze such huge amounts of data. So they wrote the programs using C and source code
from SAS Institute.
The CHAMPUS system is migrating to shrink-wrapped software that can perform the same
data mining functions. One package the CHAMPUS team is using is SAS Institute's
Enterprise Miner software, a collection of tools that share a common user interface.
The CHAMPUS warehouse's track record illustrates just how stunningly data mining
can pay off. In one case, analysts discovered that a single provider had double-billed
CHAMPUS and Medicare to the tune of $1 million. Overall, data mining has saved the Army
$28 million in fraud and waste over the last three years.
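To make the double-billing case concrete, here is a minimal sketch in Python of one way such a check could work: group claims by provider, patient, service date and procedure, then flag any service billed to more than one payer. It is not the CHAMPUS team's method, which the article says was written in C with SAS source code. The record layout, field names and sample values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical claim records; field names and values are illustrative only
# and do not reflect the actual CHAMPUS warehouse schema.
claims = [
    {"payer": "CHAMPUS", "provider": "P-100", "patient": "A1",
     "service_date": "1998-03-02", "procedure": "99213", "amount": 85.0},
    {"payer": "Medicare", "provider": "P-100", "patient": "A1",
     "service_date": "1998-03-02", "procedure": "99213", "amount": 85.0},
    {"payer": "CHAMPUS", "provider": "P-100", "patient": "B2",
     "service_date": "1998-03-05", "procedure": "99214", "amount": 120.0},
    {"payer": "CHAMPUS", "provider": "P-200", "patient": "C3",
     "service_date": "1998-03-02", "procedure": "99213", "amount": 85.0},
]


def find_double_billing(records):
    """Return services billed to more than one payer.

    The result maps (provider, patient, service_date, procedure) to a
    dict of total amount billed per payer.
    """
    billed = defaultdict(dict)
    for rec in records:
        key = (rec["provider"], rec["patient"],
               rec["service_date"], rec["procedure"])
        per_payer = billed[key]
        per_payer[rec["payer"]] = per_payer.get(rec["payer"], 0.0) + rec["amount"]
    # Keep only services that appear under two or more distinct payers.
    return {key: payers for key, payers in billed.items() if len(payers) > 1}


def flagged_services_by_provider(flagged):
    """Count flagged services per provider so analysts can rank them."""
    counts = defaultdict(int)
    for key in flagged:
        counts[key[0]] += 1
    return dict(counts)


flagged = find_double_billing(claims)
provider_counts = flagged_services_by_provider(flagged)
```

A flagged service is only a lead for an analyst, not proof of fraud: some services are legitimately billed to a primary and a secondary payer, so a real check would need the payers' coordination-of-benefits rules.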
Optenberg estimates that data mining saves $1 for every 11 cents invested in the
system. "That's better than you can do in the stock market," he said.
"This is one of the few software activities that has a large, measurable return on
investment," Brown said. "You can clearly measure the impact of this type of
work for a government agency."
Although data mining in the federal government is still limited, some agencies are
using it for specific operations. The IRS, for instance, is using data mining to find
non-compliant tax filers.
The Treasury Department's Financial Crimes Enforcement Network is applying the
technique to money-laundering investigations.
Other agencies using data mining include the Federal Aviation Administration for
aircraft safety records, the Customs Service for narcotics smuggling investigations and
the Defense Department for counter-drug initiatives.
For government data mining to succeed, agencies' databases must have good, clean data.
"The quality of the data is a big issue," Brown said. "The results of
data mining are only going to be as good as the data you have."