NSA looks to informatics to connect dots
- By Joab Jackson
- Jan 21, 2005
Current apps cannot filter data quickly, finely enough, agency's research chief says
NSA's Eric Haseltine says he envisions software that can distinguish among people, places, events and time in documents and then relate that data to comparable information in other documents.
Every bit counts, literally, at the National Security Agency.
That's why NSA needs new informatics software that can gather individual bits of information from many sources and assemble them to produce 'actionable knowledge,' said Eric Haseltine, NSA research director.
Today's commercial intelligence software just isn't powerful enough to do the job of sorting through the immense amount of data NSA collects, Haseltine said at a recent meeting of the Technology Council of Anne Arundel County, Md.
'In our business, actionable knowledge is someone we need to get before they get us,' he said.
Haseltine envisions software that can, as he says, connect the dots. It must distinguish among people, places, events and time in documents and then relate these data entities to those in other documents and produce a summary of the combined elements.
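As a rough illustration of what relating entities across documents could look like, here is a minimal Python sketch. It is not NSA's software or any vendor's product. It assumes the people, places and times have already been extracted and tagged; the document IDs, entity values and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical documents whose entities are already tagged by type.
# A real system would need an extraction step to produce these.
documents = {
    "doc1": {"person": {"person_a"}, "place": {"city_x"}, "time": {"2005-01-10"}},
    "doc2": {"person": {"person_a"}, "place": {"city_y"}, "time": {"2005-01-12"}},
    "doc3": {"person": {"person_b"}, "place": {"city_y"}, "time": {"2005-01-12"}},
}

def build_index(docs):
    """Map each (entity type, value) pair to the documents that mention it."""
    index = defaultdict(set)
    for doc_id, entities in docs.items():
        for kind, values in entities.items():
            for value in values:
                index[(kind, value)].add(doc_id)
    return index

def shared_entities(docs):
    """Keep only the entities that appear in more than one document."""
    return {
        key: sorted(doc_ids)
        for key, doc_ids in build_index(docs).items()
        if len(doc_ids) > 1
    }

def summarize(docs):
    """Produce one summary line per shared entity, in sorted order."""
    links = shared_entities(docs)
    return [
        f"{kind} '{value}' links {', '.join(doc_ids)}"
        for (kind, value), doc_ids in sorted(links.items())
    ]

for line in summarize(documents):
    print(line)
```

The inverted index is the simplest way to find overlaps: each dot is stored once, and any dot that points at two or more documents is a candidate connection.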
To get this new analysis software, NSA has awarded a one-year, $445,000 contract to the Chesapeake Innovation Center of Annapolis, Md., to seek out information assurance technologies and informatics software being developed by industry and academia.
NSA is experiencing a vast increase in the data it must monitor, Haseltine said. New communications media such as instant messaging, cellular telephones and Web pages flood the organization.
But when NSA invited a representative from a large Silicon Valley relational database-mining company to discuss ways of capturing, filtering and ingesting such data, the agency found the company had little to offer.
'We told him our problems and he said, "That's way beyond anything we can do," ' Haseltine recalled.
NSA is looking for software that can pinpoint trends buried within large data sets, using a minimum of data. Haseltine likened the process to how astronomers find planets in other solar systems that cannot be viewed through telescopes. They record individual photon activity, or 'tiny little signals,' for extended periods.
'In other words, if you listen to extremely faint signals over a large period of time, [they will] tell you things,' he said. 'We call that "turning volume into your friend." '
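The statistical idea behind listening to faint signals over a long period can be sketched with simple averaging. The Python example below is a generic illustration, not a description of how NSA or astronomers actually process data. The signal level, noise level and function names are assumptions chosen for the demonstration.

```python
import random
import statistics

def observe(signal, noise_sd, n, rng):
    """Simulate n noisy readings of a constant, faint signal."""
    return [signal + rng.gauss(0.0, noise_sd) for _ in range(n)]

def estimate(readings):
    """Average the readings and report the standard error of the mean.

    The standard error shrinks roughly as 1/sqrt(n), so more readings
    give a tighter estimate of the underlying signal.
    """
    mean = statistics.fmean(readings)
    stderr = statistics.stdev(readings) / len(readings) ** 0.5
    return mean, stderr

rng = random.Random(0)  # fixed seed so runs are repeatable
for n in (10, 1000, 100000):
    mean, stderr = estimate(observe(0.1, 1.0, n, rng))
    print(n, round(mean, 3), round(stderr, 4))
```

With the noise ten times stronger than the signal, a handful of readings says almost nothing, while a large number pins the signal down. That is one sense in which volume becomes a friend.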
Mehmet Dalkilic, an associate professor at Indiana University's School of Informatics, defines informatics as the study of how humans use IT.
'Scientists can no longer be experts in IT,' he said. Given the growing complexity of most IT products, professionals are finding it more difficult to make full use of the capabilities of new software. Conversely, most technology specialists do not fully understand their users' domains, so it becomes increasingly difficult for them to know which new capabilities to add to a program. Informatics bridges the two worlds, he said.
Semantic thumbnails
Dalkilic himself is heading up a project to develop 'semantic thumbnails,' a method of automatically summarizing text documents by using controlled domain vocabularies to highlight important parts of documents.
Search engines can use semantic thumbnails to produce more accurate results.
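The article does not describe how Dalkilic's project works internally, so the following Python sketch shows only the general idea of using a controlled vocabulary to pick out the important parts of a text. The vocabulary, the scoring rule and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical controlled vocabulary for a single domain.
VOCABULARY = {"protein", "gene", "enzyme", "mutation"}

def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score(sentence, vocabulary):
    """Count how many words in the sentence belong to the vocabulary."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(1 for w in words if w in vocabulary)

def thumbnail(text, vocabulary, max_sentences=2):
    """Return the highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: (-score(sentences[i], vocabulary), i),
    )
    keep = sorted(
        i for i in ranked[:max_sentences] if score(sentences[i], vocabulary) > 0
    )
    return [sentences[i] for i in keep]

sample = (
    "The lab opened in spring. A mutation in the gene alters the enzyme. "
    "Funding was renewed. The protein folds normally."
)
print(thumbnail(sample, VOCABULARY))
```

A search engine could index such a thumbnail instead of, or alongside, the full text, so that matches on domain terms carry more weight than matches on incidental words.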
Perhaps the key to working with large amounts of data is not relying solely on the software applications themselves, but placing them within a larger iterative process that can filter out the unwanted data.
This is the approach taken by Rand Corp. of Santa Monica, Calif. The nonprofit research organization has developed a new workflow schema that could help U.S. intelligence agencies better sift through massive amounts of data, said John Hollywood, a Rand researcher who helped craft the architecture.
'We specifically designed it for counterterrorism, but a lot of the overall approach would apply to a wide range of intelligence analysis,' Hollywood said.
The Atypical Signal Analysis and Processing architecture sorts data through multiple steps. Data about suspicious people, places, things and financial activities is first collected from a number of government external databases. This information is supplemented by field reports of unusual activity.
Software agents scan the information for relationships and form hypotheses about how the different bits are related. The system then prioritizes its findings and alerts human analysts to the most important outcomes.
Doing it all
'You're filtering incoming information, flagging the most useful pieces, linking incoming data elements together and generating hypotheses of the information,' Hollywood said.
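The steps Hollywood lists (filter, flag, link, hypothesize) can be laid out as a small pipeline. The Python sketch below is not Rand's implementation. The `Report` type, the `unusual` flag, the shared-entity scoring rule and all sample data are hypothetical, chosen only to show how the stages might hand results to one another.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Report:
    report_id: str
    entities: frozenset
    unusual: bool  # hypothetical flag set by a field report or database check

def filter_reports(reports):
    """Step 1: drop reports that show nothing out of the ordinary."""
    return [r for r in reports if r.unusual]

def link_reports(reports):
    """Step 2: pair up reports that share at least one entity."""
    links = []
    for a, b in combinations(reports, 2):
        shared = a.entities & b.entities
        if shared:
            links.append((a.report_id, b.report_id, shared))
    return links

def hypothesize(links):
    """Step 3: turn each link into a candidate hypothesis with a crude score."""
    return [
        {"reports": (a, b), "shared": sorted(shared), "score": len(shared)}
        for a, b, shared in links
    ]

def prioritize(hypotheses, top_n=1):
    """Step 4: surface the highest-scoring hypotheses for a human analyst."""
    return sorted(hypotheses, key=lambda h: -h["score"])[:top_n]

reports = [
    Report("r1", frozenset({"person_a", "account_1"}), True),
    Report("r2", frozenset({"person_a", "account_1", "city_x"}), True),
    Report("r3", frozenset({"city_x"}), True),
    Report("r4", frozenset({"person_a"}), False),
]

alerts = prioritize(hypothesize(link_reports(filter_reports(reports))))
print(alerts)
```

Keeping each stage as a separate function mirrors the iterative process described in the article: any one stage can be rerun or tightened as new data arrives, and only the top-ranked hypotheses reach a human.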