William Jackson | Cybereye: First things first: Does data mining even work?
- By William Jackson
- Feb 04, 2007
The specter of agencies trawling databases of personal information in search of clues to terrorist activity raises difficult questions about balancing privacy with security, but witnesses at a recent Senate Judiciary Committee hearing helpfully cut through a lot of the fog to simplify this issue.
The first question we need to ask about data mining is, does it even work?
"We don't even need to raise privacy questions until we see whether or not it's working," said Leslie Harris, executive director of the Center for Democracy and Technology. No one has definitively answered that question, but Harris said "the executive branch is bewitched with this technology."
So bewitched, in fact, that incoming Judiciary Committee chairman Sen. Patrick Leahy (D-Vt.) devoted the committee's first hearing of the new Congress to the privacy implications of government data mining. The senator said effective congressional oversight is long overdue and cited a 2004 Government Accountability Office report that identified 199 data-mining programs in 52 agencies.
Leahy used a loose definition of data mining that includes any type of database search. True data mining goes further, using algorithms not only to search data from multiple sources but also to analyze it for patterns and relationships.
The government has plenty of these programs planned. The Homeland Security Department's $40 million Analysis, Dissemination, Visualization, Insight and Semantic Enhancement (ADVISE) program is slated for review by the DHS inspector general before going into operation. The Treasury Department is working on a program to gather and mine data on as many as 500 million financial transactions involving wire transfers of $3,000 or more across U.S. borders. The Treasury program is three years behind schedule and has drawn complaints from the banking industry and privacy advocates that it is burdensome and intrusive.
Will these systems work? It depends on what you expect from them. Properly tuned, they could be useful investigative tools, providing avenues for follow-up in the wake of an event. But as a predictor of terrorist activity? Not likely, said Jim Harper, director of information policy studies at the Cato Institute.
"Data mining is not, and cannot be, a useful tool in the anti-terror arsenal," Harper told the Judiciary Committee. "The incidence of terrorism planning is too low for there to be statistically sound modeling of terrorist activity."
In other words, the familiar disclaimer brokers attach to investment pitches applies here: Past performance is not an indicator of future performance. And because real terrorist plots are so rare, a data-mining system tuned broadly enough to catch anyone would flag an unacceptably large number of innocent people as false positives.
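The false-positive problem Harper describes is a base-rate effect, and a few lines of arithmetic show why. The sketch below is purely illustrative: the population size, the number of actual plotters and the screen's accuracy rates are hypothetical assumptions chosen for round numbers, not figures from the hearing or from any real program, and `screening_outcomes` is a made-up helper name.

```python
# Hypothetical base-rate illustration; none of these figures come from the hearing.

def screening_outcomes(population, true_targets, sensitivity, false_positive_rate):
    """Return expected counts for a screen applied to an entire population.

    sensitivity: share of real targets the screen flags (assumed).
    false_positive_rate: share of innocent people it also flags (assumed).
    """
    innocents = population - true_targets
    true_positives = true_targets * sensitivity
    false_positives = innocents * false_positive_rate
    flagged = true_positives + false_positives
    # Precision: the fraction of flagged people who are actual targets.
    precision = true_positives / flagged if flagged else 0.0
    return {
        "flagged": flagged,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "precision": precision,
    }


# Assumed inputs: 300 million people, 3,000 actual plotters, and a screen
# that catches 99% of plotters while wrongly flagging 1% of everyone else.
result = screening_outcomes(300_000_000, 3_000, 0.99, 0.01)
print(f"flagged: {result['flagged']:,.0f}")                  # flagged: 3,002,940
print(f"false positives: {result['false_positives']:,.0f}")  # false positives: 2,999,970
print(f"real plotters among flagged: {result['precision']:.4%}")  # real plotters among flagged: 0.0989%
```

Even with these generous assumed accuracy rates, roughly 1,000 innocent people are flagged for every real plotter, because the innocent population is so much larger than the target population.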
Not everyone agrees, of course. Kim A. Taipale, executive director of the Center for Advanced Studies and Technology Policy, said the false-positive problem can be overcome, but only if development of the technology is allowed to continue unfettered. The answer to privacy concerns is not restricting the technology, he said, but setting effective rules on the use of data.
Even if data mining is shown to be an effective tool for preventing terrorism, some big hurdles remain for its appropriate use. These include deciding who owns the personal data the government collects and how to protect individuals from the mistakes that inevitably occur.
So far, the government has shown itself incapable of resolving problems even with the relatively simple terrorist watch lists, which repeatedly flag innocent travelers trying to board commercial flights. Until it shows that it can handle this kind of data responsibly, the government should not attempt anything more complex.
William Jackson is a freelance writer and the author of the CyberEye blog.