Categorizing is key for effective data mining

By Florence Olsen

GCN Staff

Contrary to popular opinion, data mining is no clean, antiseptic process, according to data mining consultant Philip Matkovsky.

As with old-fashioned statistical analysis, organizations have to categorize data minutely before they can mine it for patterns such as fraud.

'The federal government, interestingly enough, has not built up a large structured data set of known fraud cases,' said Matkovsky, manager of operations for analytical systems at Federal Data Corp. of Bethesda, Md. Lacking such resources, federal data miners must use unsupervised learning methods, such as disjoint cluster analysis, to train their favorite algorithms to sniff out fraud, he said.

If analysts had more examples of known fraud cases, Matkovsky said, they could fall back on supervised learning methods to train the algorithms to detect evidence of new frauds.
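The unsupervised approach Matkovsky describes can be illustrated with a minimal sketch. The example below is not his method or any vendor's implementation; it uses plain k-means as one common form of disjoint clustering, in which each record belongs to exactly one cluster. The claim records, the starting centroids and the distance cutoff are all hypothetical values chosen for illustration. Records that sit far from every cluster center are set aside for human review.

```python
from math import dist
from statistics import mean


def kmeans(points, centroids, iterations=20):
    """Disjoint clustering: every point is assigned to exactly one cluster."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = [
            tuple(mean(axis) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def flag_outliers(points, centroids, max_distance):
    """Return points farther than max_distance from their nearest centroid."""
    return [p for p in points if min(dist(p, c) for c in centroids) > max_distance]


# Hypothetical records: (claim amount in $1,000s, claims filed per month).
claims = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1),
          (5.0, 6.0), (5.2, 5.9), (4.8, 6.1),
          (9.0, 1.0)]

# Assumed starting centroids; a real tool would choose or randomize these.
centroids = kmeans(claims, [(1.0, 2.0), (5.0, 6.0)])
suspects = flag_outliers(claims, centroids, max_distance=3.0)
print(suspects)
```

No labeled fraud cases are needed here, which is the point of the unsupervised route. With a library of known cases, the same records could instead carry fraud or not-fraud labels and feed a supervised classifier.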

In selecting data mining tools, Matkovsky looks for a toolset that allows both supervised and unsupervised learning 'because you don't always have perfect information,' he said. The data generally 'is going to be a mess.' With the cost of data mining tools running about $30,000 per seat, most agencies can afford only one toolset.

'Look for a toolset with multiple algorithms,' Matkovsky advised, because data mining problems can be unpredictable.

He said modern neural network algorithms are far more sensitive to subtle variations in the underlying data than were the earliest neural net programs. Differences in sensitivity are 'largely a matter of parameter setting,' he said.

Some neural network algorithms permit considerable user involvement in setting search parameters, 'which is a good thing,' he said, 'because you can fine-tune your algorithm and get rid of false alarms bit by bit.' Training a neural network is an abstract, creative activity that does not fit easily into a 9-to-5 schedule, Matkovsky said.
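The bit-by-bit tuning Matkovsky mentions can be sketched in a simplified form. The example below does not tune a neural network's internal parameters. It adjusts a single alert threshold applied to the scores a model has already produced, which is an assumption made to keep the sketch short. The scores and the reviewer verdicts are hypothetical.

```python
def evaluate(scores, confirmed, threshold):
    """Count false alarms and missed cases for a given alert threshold."""
    alerts = [s >= threshold for s in scores]
    false_alarms = sum(a and not c for a, c in zip(alerts, confirmed))
    missed = sum((not a) and c for a, c in zip(alerts, confirmed))
    return false_alarms, missed


# Hypothetical model scores and the verdicts analysts reached after review.
scores = [0.10, 0.35, 0.40, 0.55, 0.62, 0.80, 0.91]
confirmed = [False, False, False, False, True, True, True]

# Raise the threshold step by step and watch the trade-off.
for threshold in (0.3, 0.5, 0.6, 0.85):
    false_alarms, missed = evaluate(scores, confirmed, threshold)
    print(f"threshold={threshold}: false alarms={false_alarms}, missed={missed}")
```

Raising the threshold removes false alarms until, past a certain point, it starts to hide real cases. Finding that point is the kind of judgment call that makes tuning an iterative task.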

But for anyone who has enthusiasm for tools such as Intelligent Miner from IBM Corp. and SAS Enterprise Miner from SAS Institute Inc. of Cary, N.C., the rewards are great, Matkovsky said.

'You're using a machine to explain variance with no human intervention except for the initial input. That's a very cool thing,' he said.

