NSA delves into next level of data analysis

The National Security Agency has launched an initiative to better trace and record the origins and accuracy of the data the agency collects and its movement between databases, a discipline known as data provenance, which is becoming more important as the intelligence community attempts to fuse and analyze troves of data from a variety of sources.

NSA has a pilot initiative that runs on top of a standard big data cloud architecture and lets the agency track the entire life cycle of data, said Neal Ziring, technical director of NSA’s Information Assurance Directorate.

Big data technologies offer potential advantages for extracting knowledge and actionable information from mountains of raw data, but there are still challenges that government and industry must overcome to reap the benefits, Ziring said.

Related coverage:

Keys to big data in the brain, not the computer, former NSA exec says

How cloud can improve intell community's analyses 

The intelligence community needs big data technologies to make sense of the complex patterns and behaviors of adversaries who have become increasingly sophisticated, he said.

Ziring spoke June 13 at the National Institute of Standards and Technology’s Big Data Workshop in Gaithersburg, Md., where he cited several challenges that keep government agencies from getting the full benefit of big data analytics: fusing data from multiple sources, handling data subject to different forms of constraint, supporting analytic multi-tenancy and enabling exploration and discovery.

Often intelligence analysts want to derive actionable knowledge from data but find that there are constraints or restrictions on the data. They get a data feed from a source that comes with strings attached: It might be top secret, privacy protected or subject to legal considerations.

Access control is the simplest piece of this issue: people have privileges to access certain information, which works if all a person is doing is searching and retrieving. Analyzing data, however, requires a way to express those constraints in a computation-friendly form.

So an analyst can say, “I want to perform this type of computation; therefore, I want to use this field [of data] but I can’t use that other field,” Ziring said. The capability is there for access control but not for analysis, he said, adding that the intelligence community is working on standardizing the simplest aspects of this area. “This area will benefit from standardization," he said.
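A minimal sketch of what such a computation-friendly constraint might look like: each data field carries a tag listing the operations it may participate in, and a planned computation is checked against those tags before it runs. The field names and policy vocabulary below are hypothetical illustrations, not an actual intelligence-community standard.

```python
# Hypothetical field-level policy: which operations each field may be used in.
ALLOWED_OPS = {
    "timestamp": {"search", "aggregate"},   # usable in analytic computations
    "subscriber_id": {"search"},            # retrievable, but not analyzable
}

def check_fields(operation, fields):
    """Return the subset of requested fields the policy permits for this operation."""
    return [f for f in fields if operation in ALLOWED_OPS.get(f, set())]

# An analyst planning an aggregation learns which fields may be used:
print(check_fields("aggregate", ["timestamp", "subscriber_id"]))  # → ['timestamp']
```

In this sketch, the analyst's statement "I want to use this field but I can't use that other field" becomes a machine-checkable query rather than a manual review step, which is the kind of expression standardization would make portable across systems.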

It is one thing to put complete classification markings onto data objects when you have 1,000 objects, but it is harder when you have 10 billion objects, he noted.

Data provenance, or data pedigree as some in the intelligence community call it, is a related challenge to handling data subject to constraints. Such technologies help organizations determine whether data has been compromised and provide metrics for estimating the data's reliability.
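One common design for this kind of life-cycle accounting is an append-only chain of custody, where each collection or transfer event is recorded and a running hash makes tampering detectable. The sketch below is an illustrative design under that assumption, not a description of NSA's actual system.

```python
# Minimal provenance ("pedigree") chain: each event appends an entry whose
# digest covers both the event and the previous digest, so any later edit
# to the history breaks verification.
import hashlib
import json

def append_event(chain, event):
    """Record an event (ingest, transfer, transform) in the provenance chain."""
    prev = chain[-1]["digest"] if chain else ""
    body = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    chain.append({"event": event, "digest": digest})
    return chain

def verify(chain):
    """Recompute every digest; False means the recorded history was altered."""
    prev = ""
    for entry in chain:
        body = json.dumps(entry["event"], sort_keys=True)
        if hashlib.sha256((prev + body).encode()).hexdigest() != entry["digest"]:
            return False
        prev = entry["digest"]
    return True

chain = []
append_event(chain, {"source": "collector-A", "action": "ingest"})
append_event(chain, {"source": "db-1", "action": "transfer"})
print(verify(chain))   # True
chain[0]["event"]["source"] = "spoofed"
print(verify(chain))   # False: the history no longer accounts for the life cycle
```

A verifiable chain like this is what would let an agency answer "I can account for its entire life cycle" with evidence rather than assertion; the hard part at scale is attaching such records to billions of objects.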

“Can you say, ‘Yes, senator, I do know where that data came from, and I can account for its entire life cycle’? Can we say that today about all of our data in our big data repositories?” Ziring asked. “In my world, sometimes you get that kind of question.”

About the Author

Rutrell Yasin is a freelance technology writer for GCN.


Reader Comments

Mon, Jun 25, 2012 Washington DC

It appears that some COTS tools today are already able to track data provenance. As long as our Govt stays with COTS, we the taxpayers will be better off than wasting valuable money, resources and time on dead-end technologies. I would hope the NSA and others in Govt are looking at COTS solutions....

Tue, Jun 19, 2012 Dave Washington DC

Next after provenance is a computable error term for the data in use. The error term, expressed in second-order terms, allows the accuracy of the table being used to propagate through the analytic process to create a quality factor. Hence for any decision being based on a set or sets of data, the quality of that decision may be judged and, conversely, if only sparse older data exists, the effect of using that data may be judged....

Tue, Jun 19, 2012 Art Conroy Chantilly, VA

I am a former Marine pilot who has been designing mental models and training intelligence community clients to think and see differently. Seeing patterns in big data with greater speed and accuracy is a skill that can be taught with great effectiveness at low cost. Glad to see the idea is catching on...
