To share is human
Anonymous data-mining technology protects privacy
- By William Jackson
- Dec 07, 2004
Sharing data among government agencies is easier said than done. Many groups want or need to protect the confidentiality of their information, meaning they'd like a secure way of letting other agencies know what data they have without showing them the actual data.
Systems Research and Development of Las Vegas, a developer of identity recognition software, has come up with a scheme to let organizations share and compare data without compromising privacy.
SRD's Anonymous Entity Resolution software, dubbed Anna, uses a hashing algorithm to create a unique identifier for each piece of personal data in a file. Once hashed, the data is recognizable only by software and cannot be converted back to its original form. Hashed identifiers from different databases can then be compared for matches without revealing the identity of the underlying individual.
'In the last two years we have seen the need for sharing more information while protecting privacy,' said John Slitz, chief executive officer of SRD. 'This is a technique that allows us to look at large quantities of data and only evaluate that data that is common to both sets.'
The technology caught the eye of In-Q-Tel, the CIA's technology investment incubator, when it was in prototype stage. The group partially funded development of the Anna technology, In-Q-Tel CEO Gilman Louie said.
Louie said homeland security programs depend on the ability of agencies to share data and use private-sector information securely, while assuring the public that privacy is not being compromised.
'Unless there is a technical solution to enable the policies, we are not going to get there,' he said. 'In some places we're at a logjam.'
He cited the discontinued Total Information Awareness program and the Computer Aided Passenger Prescreening System II, both of which fell victim to privacy concerns.
'There is no perfect technology,' he said. 'But this could be a critical enabling technology, providing a degree of comfort.'
Next-generation analysis
Anna builds on two previous SRD products: the Erik Identity Recognition Architecture, which standardizes names, cleans up personal data and puts it into a common format for comparison; and Non-Obvious Relationship Awareness (Nora), which looks for pieces of information that could link individuals, such as shared telephone numbers and addresses.
Anna adds anonymization to the mix and a tool to compare anonymous lists of hashed data.
A hashing algorithm creates a unique signature when it's run against any piece of digital data. The hashing process cannot be reversed to reveal the original data, but two identical pieces of data will produce the same hash signature. Identical names, Social Security numbers and other identifying data each produce the same hash if the holders of the data use the same hashing tools.
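The property described above can be illustrated with a minimal Python sketch. It is not SRD's implementation: the article does not say which algorithm Anna uses, so SHA-256 from Python's standard library is an assumption, and the function names and sample names are hypothetical. The normalization step stands in for the kind of cleanup the article attributes to Erik.

```python
import hashlib


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not change the hash."""
    return " ".join(value.lower().split())


def hash_identifier(value: str) -> str:
    """Return the SHA-256 hex digest of the normalized value (algorithm choice is an assumption)."""
    return hashlib.sha256(normalize(value).encode("utf-8")).hexdigest()


# Made-up names, for illustration only.
a = hash_identifier("John  Q. Public")
b = hash_identifier("john q. public ")
c = hash_identifier("Jane Q. Public")

print(a == b)  # identical data after normalization yields the same signature
print(a == c)  # different data yields a different signature
```

Both parties would have to apply the same normalization and the same algorithm, which is the point the article makes about data holders using the same hashing tools.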
This would let airport screeners, for instance, compare anonymized passenger data with government no-fly lists without releasing identifying data from either list. When matches are found, the appropriate parties could be notified for any further action.
Slitz envisions an infrastructure in which a trusted third party, either a government agency or a private organization, would compare anonymized data from other parties. The third party would not have access to any identifiable data.
'They're not doing anything but providing computer space where the hashed values are compared,' he said.
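A rough sketch of that arrangement, under stated assumptions: each data holder hashes its own list, a third party intersects the two sets of hashes, and only the holder can map a matching hash back to a record. SHA-256, the function names and the sample names are all hypothetical and are not drawn from SRD's product.

```python
import hashlib


def anonymize(values):
    """Map each hash to its original value; this mapping stays with the data holder."""
    return {
        hashlib.sha256(v.strip().lower().encode("utf-8")).hexdigest(): v
        for v in values
    }


def third_party_compare(hashes_a, hashes_b):
    """The third party sees only hash strings and returns those present in both sets."""
    return set(hashes_a) & set(hashes_b)


# Made-up sample data, for illustration only.
passengers = anonymize(["Alice Example", "Bob Sample", "Carol Placeholder"])
watch_list = anonymize(["bob sample", "Dave Nobody"])

# Only the keys (hashes) are handed to the third party.
matches = third_party_compare(passengers.keys(), watch_list.keys())

# The passenger-list holder resolves matching hashes against its own records.
flagged = [passengers[h] for h in matches]
print(flagged)
```

In this sketch the comparison step never touches a name, only hash strings, which mirrors the role Slitz describes for the third party.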
The cryptography behind hashing algorithms is probably not a complete solution to maintaining anonymity, said Peter Swire, a professor of law at Ohio State University who served as chief counselor for privacy in the Office of Management and Budget during the Clinton administration.
'Every encryption technique is subject to constant challenge. Something that works today is likely to be broken in the future,' Swire said. 'But having state-of-the-art security is much better than no security at all.'
He said the greatest obstacle to the adoption of Anna and similar technologies is political will. Many agencies might still prefer to use identifiable information for security applications.
Even if the technology works as hoped, 'there are still many policy hurdles to overcome,' Louie said.
The number and types of data fields that must be identical before identities can be matched will vary by application.
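One way to picture such a policy choice is a threshold on the number of hashed fields that agree. The sketch below is purely illustrative: the field names, the threshold values and the use of SHA-256 are assumptions, and the article does not describe how Anna makes this decision.

```python
import hashlib


def hash_fields(record):
    """Hash every field of a record separately so fields can be compared one by one."""
    return {
        field: hashlib.sha256(str(value).strip().lower().encode("utf-8")).hexdigest()
        for field, value in record.items()
    }


def is_match(hashed_a, hashed_b, fields, min_matching):
    """Declare a match when at least min_matching of the named fields have equal hashes."""
    matching = sum(
        1
        for f in fields
        if f in hashed_a and f in hashed_b and hashed_a[f] == hashed_b[f]
    )
    return matching >= min_matching


# Made-up records, for illustration only.
rec_a = hash_fields({"name": "Bob Sample", "dob": "1970-01-01", "phone": "555-0100"})
rec_b = hash_fields({"name": "bob sample", "dob": "1970-01-01", "phone": "555-0199"})

fields = ["name", "dob", "phone"]
loose = is_match(rec_a, rec_b, fields, min_matching=2)   # name and birth date agree
strict = is_match(rec_a, rec_b, fields, min_matching=3)  # phone numbers differ
print(loose, strict)
```

A stricter threshold reduces false matches at the cost of missing records that differ in one field; where to set it is the kind of policy decision the article refers to.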
Ultimately, however, surveillance policies should not be tailored for the convenience of agencies, Swire said. 'The ability of data mining to explode personal privacy is terrifying to many citizens.'