NSA's intelligence gathering puts the spotlight on metadata
- By Rutrell Yasin
- Jun 26, 2013
The revelation that the National Security Agency has been collecting the phone records of millions of Verizon customers has sparked concern that the spy agency could glean more information about a person from the data than NSA officials are willing to admit.
But aside from all the debate that has followed, the news, first reported June 5 by the Guardian, might have put a smile on the faces of some data mavens, because a better understanding of metadata came out of the ordeal, according to some data scientists.
NSA officials defended the program by saying the agency does not listen in on phone calls but instead collects the metadata of those calls (such as phone numbers and the time and length of each call) as part of its attempts to foil terrorist attacks.
Metadata is often described as data about data, an ambiguous definition. More specifically, it is descriptive information about a particular data set, object or resource, including how it is formatted, and when and by whom it was collected.
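To make the distinction concrete, here is a minimal sketch of a phone call modeled as a record. The class and field names are hypothetical, chosen only to mirror the items officials listed (phone numbers, time and length of the call); they are not taken from any real NSA or carrier format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CallMetadata:
    """Descriptive information about a call (hypothetical field names)."""
    caller: str
    callee: str
    started_at: datetime
    duration: timedelta


@dataclass(frozen=True)
class Call:
    """A call is its metadata plus its content, the conversation itself."""
    metadata: CallMetadata
    audio: bytes  # the content; placeholder bytes in this sketch


call = Call(
    metadata=CallMetadata(
        caller="555-0100",
        callee="555-0199",
        started_at=datetime(2013, 6, 5, 9, 30),
        duration=timedelta(minutes=4),
    ),
    audio=b"...",
)

# Everything below is derived without ever touching call.audio.
ended_at = call.metadata.started_at + call.metadata.duration
```

Even in this toy form, the metadata alone answers who called whom, when, and for how long.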
“One of the things about the recent news I appreciated is the defining of the word metadata,” said Marion Royal, program director for Data.gov, the government website that gives the public access to machine-readable datasets generated by federal agencies.
“We struggled with [how to define metadata] when we started Data.gov,” Royal said recently at an FCW Executive Briefing in Washington, D.C. Describing metadata as the information about the phone numbers, rather than the content of the conversation that takes place over the phone line, makes it simple for people to understand the difference between metadata and data, Royal said.
“I had the same reaction: ‘Now we will understand metadata,’” said Suzi Iacono, deputy assistant director of Computer and Information Science and Engineering at the National Science Foundation, who also spoke on the panel. “There is this clear understanding it is not the content of the phone call that is being analyzed but the information about the location of the phone call,” Iacono said.
And therein lies the problem for some critics of the NSA’s collection of troves of phone records. Research shows that the NSA could obtain very specific information about individuals just from phone call logs, according to a recent article in the MIT Technology Review.
The article points to a study published in March that analyzed 15 months of anonymous call records from 1.5 million people. Using records provided by a European wireless carrier, the researchers were "able to uniquely pinpoint the movements of 95 percent of people from only four records, using only the location of a nearby cellular station and the time each call was made," the MIT Technology Review article states. Connecting those movements with a person’s real identity would then be easy: it would only take cross-referencing the records with other data sources.
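The idea behind that finding can be sketched in a few lines. The example below is a simplified illustration, not the study's method: the three-user call log is invented, and the `unicity` function (a hypothetical name) counts a user as identified if at least one set of k of their own (tower, hour) points matches nobody else, whereas the researchers sampled points at random from real records.

```python
from itertools import combinations

# Invented toy call log: (user_id, cell_tower, hour_of_call).
RECORDS = [
    ("u1", "towerA", 8), ("u1", "towerB", 12), ("u1", "towerC", 18),
    ("u2", "towerA", 8), ("u2", "towerB", 12), ("u2", "towerD", 19),
    ("u3", "towerA", 8), ("u3", "towerE", 13), ("u3", "towerC", 18),
]


def traces(records):
    """Group each user's (tower, hour) points into a set."""
    out = {}
    for user, tower, hour in records:
        out.setdefault(user, set()).add((tower, hour))
    return out


def is_unique(user, points, all_traces):
    """True if no other user's trace contains all of the given points."""
    return all(
        not points <= trace
        for other, trace in all_traces.items()
        if other != user
    )


def unicity(records, k):
    """Fraction of users singled out by at least one set of k of their points."""
    all_traces = traces(records)
    identified = 0
    for user, trace in all_traces.items():
        candidates = combinations(sorted(trace), k)
        if any(is_unique(user, set(c), all_traces) for c in candidates):
            identified += 1
    return identified / len(all_traces)


one_point = unicity(RECORDS, 1)   # u1 shares every single point with someone
two_points = unicity(RECORDS, 2)  # two points are enough for all three users
```

In this toy log, one point singles out two of the three users and two points single out all of them, which is the same pattern the study reported at much larger scale.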
As a result, some researchers think that the NSA is downplaying the significance of metadata, or that its use of the word is misleading.
“I felt that NSA’s use of the term metadata was misleading the public into thinking it was less than it really is,” said Brand Niemann, a data scientist and former data architect for the Environmental Protection Agency. What the NSA calls metadata is not like original metadata, which is analogous to the library catalog numbers used to classify books and their locations. The agency actually has most of the data about a phone call, which agents can use to see if they need to listen to actual conversations, Niemann explained.
“Knowing the call exists, when it was placed and for how long, provides more than library catalogs and supports lots of data analyses to gain more data and insights into us than we can get from library catalogs,” Niemann said.
The NSA’s activities aside, collecting and analyzing metadata can be valuable for conducting research in areas other than intelligence or counter-terrorism, the experts noted.
New metadata tools can be applied to unclassified research on energy, weather or public safety to extract valuable knowledge, helping researchers pinpoint the needle in the haystack of a dataset, NSF’s Iacono said. Using these tools, researchers would not have to federate databases or archives, she said. A federated database is a relational database that presents data from multiple sources as though it were a single, large database, which can then be accessed with traditional SQL queries.
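For readers unfamiliar with federation, the sketch below approximates the idea using SQLite's `ATTACH DATABASE`, which lets one connection query several databases with a single SQL statement. It is only an analogy under stated assumptions: a real federated system typically spans separate servers, and the table names and figures here are invented for illustration.

```python
import sqlite3

# One connection, two independent in-memory databases attached under aliases.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS weather")
conn.execute("ATTACH DATABASE ':memory:' AS energy")

# Each source keeps its own table (hypothetical schemas and values).
conn.execute("CREATE TABLE weather.readings (day TEXT, temp_c REAL)")
conn.execute("CREATE TABLE energy.consumption (day TEXT, mwh REAL)")
conn.executemany(
    "INSERT INTO weather.readings VALUES (?, ?)",
    [("2013-06-01", 31.0), ("2013-06-02", 24.5)],
)
conn.executemany(
    "INSERT INTO energy.consumption VALUES (?, ?)",
    [("2013-06-01", 410.0), ("2013-06-02", 355.0)],
)

# A single ordinary SQL query spans both sources as if they were one database.
rows = conn.execute(
    """
    SELECT w.day, w.temp_c, e.mwh
    FROM weather.readings AS w
    JOIN energy.consumption AS e ON e.day = w.day
    ORDER BY w.day
    """
).fetchall()
conn.close()
```

The metadata tools Iacono describes aim to avoid this step, pointing researchers to the right source without first joining everything together.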
Using these new metadata tools, “you can figure out from the metadata where you need to go to extract the knowledge you need. I think that is a helpful way for us to think about moving forward,” Iacono said.