Data use vs. privacy

'I think the fact that this is haphazard is a major concern.'

Ari Schwartz, Center for Democracy and Technology

Henrik G. de Gyor

Growth of mining techniques raises concerns on protections

Federal officials are beginning to feel unexpected ripple effects from the data mining technology used in many government programs.

Agency officials and lawmakers say that existing privacy laws may not be up to the task of protecting personal privacy from powerful analytical tools that probe personal information.

The General Accounting Office last week issued the first of two reports on data mining commissioned by Sen. Daniel K. Akaka.

The Hawaii Democrat requested the report following last year's flap at the Defense Department over the Terrorism Information Awareness program.

The initial GAO report, Data Mining: Federal Efforts Cover a Wide Range of Uses, found that 52 of the 128 agencies the congressional audit agency surveyed were using or planning to use data mining tools in 199 projects.

Of those activities, 122 involved the use of personal information and 46 involved sharing personal information among agencies.

'The personal information used in these data mining activities includes credit reports, credit card transactions, student loan application data, bank account numbers and taxpayer identification numbers,' Akaka said.

'I am disturbed by the high number of data mining activities in the federal government involving personal information,' he said. 'The federal government collects and uses Americans' personal information and shares it with other agencies to an astonishing degree, raising serious privacy concerns.'

Akaka said it is unlikely that the public realizes the extent to which the government collects and uses personal information and shares it among agencies.

Privacy advocates from the American Civil Liberties Union, Center for Democracy and Technology, and Electronic Privacy Information Center expressed concern about the GAO findings in a letter to Akaka. The groups cited the fact that of the 54 federal government projects that mine private-sector data, 36 involve personal information, such as credit reports or credit card transactions.

According to GAO, 77 of the projects it found involved mining of data from other agencies and 46 of those used personal information.

Ari Schwartz, CDT associate director, said he was concerned by the rapid increase in data mining projects. 'I think the fact that this is haphazard is a major concern,' he said. 'CDT has concerns about how guidelines for data mining projects would be written, but you can't do it at all if they are increasing in this haphazard way.'

Akaka spokesman Paul Cardus said the second section of the GAO report is due next year and would include case studies and an analysis of privacy issues associated with data mining.

'When we have that we will have the ability to make more informed judgments,' Cardus said. 'Pending the release of the second part of the report there will be no legislation drafted.'

Mundane tasks

The GAO report described how data mining has spread across the federal government, not only for investigative tasks but also for more mundane operations such as improving safety and health regulations and supporting scientific research.

Jay Mattos, deputy director of the Program Evaluation and Information Resources office at the Labor Department's Mine Safety and Health Administration, said MSHA has been expanding its data mining operations over the past three years.

The agency used a Teradata system from NCR Corp. of Dayton, Ohio, to create its data warehouse. MSHA uses the data mining tool BI Query from Hummingbird Ltd. of Toronto to extract information from the warehouse.

'We take key information from [safety and health, as well as civil penalty] databases and refresh the data warehouse. For example, our enforcement staff can look at civil penalties, which were in two different databases, and also look at accident and injury information to get a better picture of the mining industry,' Mattos said.

MSHA users also analyze the Teradata warehouse's contents with more common tools, such as Microsoft Excel spreadsheets.

'We strip out the Privacy Act information from the data store,' Mattos said. The sensitive data also resides in other databases, and the officials responsible for them take steps to protect the Privacy Act data at its source, he said. 'We have Privacy Act assessments published for all that.'
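The refresh Mattos describes can be pictured with a minimal Python sketch. It is an illustration only, not MSHA's actual process: the field names (`mine_id`, `name`, `ssn`), the record layout and the helper functions are all hypothetical. The sketch drops the assumed Privacy Act fields from each record, then groups civil penalties from two source databases with accident records for the same mine.

```python
# Hypothetical set of fields treated as Privacy Act information.
PRIVACY_FIELDS = {"name", "ssn"}


def strip_privacy_fields(record):
    """Return a copy of the record without the assumed Privacy Act fields."""
    return {key: value for key, value in record.items() if key not in PRIVACY_FIELDS}


def refresh_warehouse(penalties_a, penalties_b, accidents):
    """Group penalty and accident records by mine after stripping private fields.

    penalties_a and penalties_b stand in for the two separate civil penalty
    databases; accidents stands in for the accident and injury database.
    """
    warehouse = {}
    for record in penalties_a + penalties_b:
        entry = warehouse.setdefault(record["mine_id"], {"penalties": [], "accidents": []})
        entry["penalties"].append(strip_privacy_fields(record))
    for record in accidents:
        entry = warehouse.setdefault(record["mine_id"], {"penalties": [], "accidents": []})
        entry["accidents"].append(strip_privacy_fields(record))
    return warehouse
```

Grouping by a shared mine identifier is what would let enforcement staff see penalties and injuries side by side; stripping fields before loading means the combined store never holds the sensitive values.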

GAO appears to have taken a rather broad approach to defining the term data mining when it compiled the report. For example, the audit agency cited the Education Department's National Student Loan Data System as using data mining technology to help identify ineligible federal student loan applicants.

But Pam Eliadis, acting director of the system, said her agency simply runs incoming student loan application data against its database of borrowers to find any red flags for loan ineligibility. All this is done in an IBM DB2 database.

'It is not a data mining tool as you would recognize it,' she said, contrasting the advanced analysis that data mining tools conduct with normal program operations.

'It is a program within NSLDS,' Eliadis said.

For example, running the incoming student loan application form data against the database can flag applicants who have defaulted on student loans, which would make them ineligible for future loans.
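The kind of check Eliadis describes amounts to a simple lookup rather than pattern discovery. The Python sketch below illustrates that idea under stated assumptions: it uses an in-memory SQLite database as a stand-in for the IBM DB2 system, and the table name `borrowers`, its columns and the `flag_ineligible` function are hypothetical, not drawn from NSLDS.

```python
import sqlite3


def flag_ineligible(conn, applications):
    """Return the applications whose applicant matches a borrower in default."""
    flagged = []
    for application in applications:
        row = conn.execute(
            "SELECT 1 FROM borrowers WHERE borrower_id = ? AND in_default = 1",
            (application["applicant_id"],),
        ).fetchone()
        if row is not None:
            flagged.append(application)
    return flagged


# Hypothetical borrower records standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE borrowers (borrower_id TEXT PRIMARY KEY, in_default INTEGER)"
)
conn.executemany(
    "INSERT INTO borrowers VALUES (?, ?)",
    [("A100", 0), ("A200", 1)],
)

incoming = [
    {"applicant_id": "A100"},
    {"applicant_id": "A200"},
    {"applicant_id": "A300"},
]
flagged = flag_ineligible(conn, incoming)
```

Each incoming application triggers one exact-match query against known records, which is why Eliadis distinguishes it from the advanced analysis that data mining tools perform.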

At NASA's Goddard Space Flight Center, officials said they use data mining tools to extract meaningful patterns from the terabytes of data that the center receives daily from the Hubble space telescope. None of that data involves personal information. But using data mining tools to analyze it speeds the research process dramatically, they said.
