Categorizing is key for effective data mining

By Florence Olsen

GCN Staff

Contrary to popular opinion, data mining is no clean, antiseptic process, according to data mining consultant Philip Matkovsky.

As with old-fashioned statistical analysis, organizations have to categorize data minutely before they can mine it for patterns such as fraud.

'The federal government, interestingly enough, has not built up a large structured data set of known fraud cases,' said Matkovsky, manager of operations for analytical systems at Federal Data Corp. of Bethesda, Md. Lacking such resources, federal data miners must use unsupervised learning methods, such as disjoint cluster analysis, to train their favorite algorithms to sniff out fraud, he said.
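As a rough illustration of the unsupervised approach, the sketch below uses k-means, one common form of disjoint cluster analysis, and flags records that sit far from every cluster center. The two-number records, the cutoff of 3.0 and the function names are hypothetical; the article does not describe the tools or data Matkovsky actually uses.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: every point is assigned to exactly one (disjoint) cluster.

    Simplification: the first k points serve as the starting centroids.
    """
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids

def flag_outliers(points, centroids, cutoff):
    """Return the points whose distance to the nearest centroid exceeds the cutoff."""
    return [p for p in points if min(math.dist(p, c) for c in centroids) > cutoff]

# Hypothetical, already-scaled records: two ordinary groups and one oddity.
records = [
    (1.0, 1.0), (5.0, 5.0), (1.2, 0.9), (5.1, 4.9),
    (0.9, 1.1), (4.9, 5.2), (1.1, 1.0), (5.0, 5.1),
    (9.0, 1.0),
]

centroids = kmeans(records, k=2)
flagged = flag_outliers(records, centroids, cutoff=3.0)
print(flagged)
```

No record here is labeled as fraud in advance. The clustering only points an analyst toward records that do not resemble the rest, and someone still has to investigate them.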

If analysts had more examples of known fraud cases, Matkovsky said, they could fall back on supervised learning methods to train the algorithms to detect evidence of new frauds.
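For contrast, a supervised method starts from cases whose outcome is already known. The minimal sketch below uses a nearest-centroid classifier, chosen here only for brevity: it averages the labeled examples for each class and assigns a new case to the closest average. The labels, feature values and function names are invented for illustration and are not drawn from any federal data set.

```python
import math

def train_centroids(examples):
    """Average the feature vectors for each label and return {label: centroid}."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def classify(features, centroids):
    """Assign the label whose centroid is closest to the new case."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Hypothetical labeled history: known legitimate and known fraudulent cases.
labeled_cases = [
    ((1.0, 1.0), "legitimate"),
    ((1.2, 0.9), "legitimate"),
    ((0.9, 1.1), "legitimate"),
    ((8.0, 7.5), "fraud"),
    ((7.5, 8.2), "fraud"),
]

model = train_centroids(labeled_cases)
print(classify((7.0, 7.0), model))
print(classify((1.5, 1.2), model))
```

The quality of such a model depends on how many known fraud cases are available for training, which is the resource Matkovsky said the government lacks.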

In selecting data mining tools, Matkovsky looks for a toolset that allows both supervised and unsupervised learning 'because you don't always have perfect information,' he said. The information generally 'is going to be a mess.' With the cost of data mining tools running about $30,000 per seat, most agencies can afford only one toolset.

'Look for a toolset with multiple algorithms,' Matkovsky advised, because data mining problems can be unpredictable.

He said modern neural network algorithms are far more sensitive to subtle variations in the underlying data than were the earliest neural net programs. Differences in sensitivity are 'largely a matter of parameter setting,' he said.

Some neural network algorithms permit considerable user involvement in setting search parameters, 'which is a good thing,' he said, 'because you can fine-tune your algorithm and get rid of false alarms bit by bit.' Training a neural network is an abstract, creative activity that does not fit easily into a 9-to-5 schedule, Matkovsky said.
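The effect of such tuning can be sketched with a single parameter. The example below is not a neural network; it assumes a model has already given each case a fraud score between 0 and 1, and it shows how raising a hypothetical alert threshold trades false alarms against missed cases. The scores and outcomes are made up for illustration.

```python
def alarm_counts(scored_cases, threshold):
    """Count false alarms and missed frauds when alerting on score >= threshold."""
    false_alarms = sum(
        1 for score, is_fraud in scored_cases if score >= threshold and not is_fraud
    )
    missed = sum(
        1 for score, is_fraud in scored_cases if score < threshold and is_fraud
    )
    return false_alarms, missed

# Hypothetical (score, actually_fraud) pairs from some already-trained model.
scored_cases = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.65, False), (0.40, False), (0.30, False), (0.10, False),
]

# Sweep the threshold and record (false alarms, missed frauds) at each setting.
results = {t: alarm_counts(scored_cases, t) for t in (0.3, 0.5, 0.7, 0.9)}
for threshold, (false_alarms, missed) in results.items():
    print(f"threshold={threshold}: false alarms={false_alarms}, missed frauds={missed}")
```

In this toy data the false alarms fall from four to none as the threshold rises, but the strictest setting also lets one real fraud slip through, which is why the adjustment is made bit by bit.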

But for anyone who has enthusiasm for tools such as Intelligent Miner from IBM Corp. and SAS Enterprise Miner from SAS Institute Inc. of Cary, N.C., the rewards are great, Matkovsky said.

'You're using a machine to explain variants with no human intervention except for the initial input. That's a very cool thing,' he said.
