Data mining prospects for diamonds in the rough

The Army’s Scott Optenberg says data mining has saved $1 for every 11 cents invested in the CHAMPUS medical system.





Data, data everywhere. Your data warehouse is awash in data. How do you dig deep into
that data, discern patterns and trends, and apply what you learn to advancing your
agency’s mission?


One way to plumb and analyze vast amounts of data is data mining. Data mining uses a
variety of techniques to uncover previously unknown patterns and relationships in large
databases. Ultimately, data mining serves a higher end: giving the user a business
advantage.


There is no silver bullet that does data mining, said Christopher Westphal, chief
executive officer of Visual Analytics Inc. in Bethesda, Md., and co-author with Teresa
Blaxton of Data Mining Solutions, published by John Wiley & Sons Inc. of New York.
Rather, data mining is a combination of technologies and techniques, he said.


There are several key differences between data mining and conventional data analysis
technologies such as online analytical processing (OLAP). With OLAP tools, you must know
exactly what patterns and trends you’re looking for.


Data mining permits fishing expeditions in the depths of the data. The software can
ferret out trends and patterns in data—you don’t have to know what you’re
looking for. But the two tools can work together. “They’re not competing
technologies,” said Mark Brown, data mining program manager at SAS Institute Inc. of
Cary, N.C. “They’re very complementary. We often see people using data mining to
understand the key drivers of a certain type of activity. Then they use that to determine
how they want to roll the data up in an OLAP-type environment.”


Data mining is not new. What is new is the increasing availability of sophisticated,
off-the-shelf data mining tools.


“Over the years, we developed data mining techniques that didn’t exist
before. Now the technology has caught up with us,” said Scott Optenberg, chief of the
Analysis Branch at the Army’s Center for Healthcare Education and Studies at Fort Sam
Houston, Texas.


Optenberg headed a team that built a 1.5-terabyte data warehouse containing medical
billing data for the Civilian Health and Medical Program of the Uniformed Services
(CHAMPUS).


About 250 Army managed-care analysts around the world access the CHAMPUS data via the
Web, using data mining techniques to uncover instances of fraud, abuse and waste.


When Optenberg and his team started building the CHAMPUS warehouse just a few years
ago, there were no off-the-shelf tools that could use the data mining techniques required
to analyze such huge amounts of data. So they wrote the programs using C and source code
from SAS Institute.


The CHAMPUS system is migrating to shrink-wrapped software that can perform the same
data mining functions. One package the CHAMPUS team is using is SAS Institute’s
Enterprise Miner software, a collection of tools that share a common user interface.


The CHAMPUS warehouse’s track record illustrates just how stunningly data mining
can pay off. In one case, analysts discovered that a single provider had double-billed
CHAMPUS and Medicare to the tune of $1 million. Overall, data mining has saved the Army
$28 million in fraud and waste over the last three years.
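
As a rough illustration of the kind of check that could surface double billing, the sketch below groups claim records by provider, patient, service date and procedure, then flags any service billed to more than one payer. The record layout, field names and sample values are hypothetical; the CHAMPUS analysts' actual method is not described here.

```python
from collections import defaultdict

# Hypothetical claim records: field names and values are invented for illustration.
claims = [
    {"provider": "P001", "patient": "A17", "date": "1998-03-02",
     "procedure": "99213", "payer": "CHAMPUS", "amount": 85.0},
    {"provider": "P001", "patient": "A17", "date": "1998-03-02",
     "procedure": "99213", "payer": "Medicare", "amount": 85.0},
    {"provider": "P002", "patient": "B04", "date": "1998-03-05",
     "procedure": "71020", "payer": "CHAMPUS", "amount": 120.0},
    {"provider": "P003", "patient": "C22", "date": "1998-03-09",
     "procedure": "99213", "payer": "Medicare", "amount": 85.0},
]


def find_double_billing(records):
    """Return services that were billed to more than one payer."""
    payers_by_service = defaultdict(set)
    for rec in records:
        # Treat provider + patient + date + procedure as one service.
        key = (rec["provider"], rec["patient"], rec["date"], rec["procedure"])
        payers_by_service[key].add(rec["payer"])
    return {
        key: sorted(payers)
        for key, payers in payers_by_service.items()
        if len(payers) > 1
    }


flagged = find_double_billing(claims)
for service, payers in flagged.items():
    print(service, "billed to", payers)
```

A flagged service is only a lead for an analyst to review, not proof of fraud; some services can legitimately involve more than one payer.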


Optenberg estimates that data mining saves $1 for every 11 cents invested in the
system. “That’s better than you can do in the stock market,” he said.
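
To put that ratio in more familiar terms, the short calculation below converts "$1 saved for every 11 cents invested" into savings per dollar and a net percentage return. It uses only the two figures Optenberg cites; the variable names are illustrative.

```python
# Figures from Optenberg's estimate: $1 saved per 11 cents invested.
invested = 0.11
saved = 1.00

# Gross savings for each dollar put into the system.
savings_per_dollar = saved / invested

# Net return: savings minus cost, relative to cost.
net_return_pct = (saved - invested) / invested * 100

print(f"Savings per dollar invested: ${savings_per_dollar:.2f}")
print(f"Net return on investment: {net_return_pct:.0f}%")
```

That works out to roughly $9 saved per dollar invested, or a net return of about 800 percent.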


“This is one of the few software activities that has a large, measurable return on
investment,” Brown said. “You can clearly measure the impact of this type of
work for a government agency.”


Although data mining in the federal government is still limited, some agencies are
using it for specific operations. The IRS, for instance, is using data mining to find
non-compliant tax filers.


The Treasury Department’s Financial Crimes Enforcement Network is applying the
technique to money-laundering investigations.


Other agencies using data mining include the Federal Aviation Administration for
aircraft safety records, the Customs Service for narcotics smuggling investigations and
the Defense Department for counter-drug initiatives.


For government data mining to succeed, agencies’ databases must have good, clean
data.


“The quality of the data is a big issue,” Brown said. “The results of
data mining are only going to be as good as the data you have.”    
