Connecting classified and unclassified big data

DIG IT AWARD FINALIST: OPEN DATA

Some of the data used to fight terrorism is classified, but much of it is not. That makes it difficult to cross-reference and share information while still enforcing the appropriate level of security.

Dig IT Award Finalists

The GCN Dig IT Awards celebrate discovery and innovation in government IT.

There are 36 finalists this year. Each will be profiled in the coming days, and the winners for each category will be announced at the Oct. 13 Dig IT Awards gala.

To address that problem, the Department of Homeland Security created the DHS Data Framework, which consists of two Hadoop data lakes (or data management platforms) that can handle large volumes of information. It also uses attribute-based access controls so that only designated users can see the data, which protects privacy, civil rights and civil liberties.

“There are a number of different problems that we’re looking to solve with the data framework,” said Paul Reynolds, director of the DHS Data Framework. “Many of them can’t be solved unless you bring the data into one location.”

Law enforcement officials who are investigating a terrorism suspect, for instance, need to look at classified and unclassified data. Until the data framework, there wasn’t an efficient way to do that, especially not in real time, Reynolds said.

The system takes the unclassified data and moves it up to the classified networks, “so the data itself is still unclassified, but it’s sitting in a classified spot,” he said.

The classified and unclassified data sit in two separate Hadoop data lakes that use a cross-domain guard to share data in near-real time. When the framework is fully operational, DHS officials expect to have 20 to 25 databases in the lakes. Right now, four are fully operational and nine are being populated.

And they aren’t small databases. Reynolds said one of them has about 70 billion records in it.

The framework is currently being used only for counterterrorism purposes, but Reynolds said he expects it will ultimately be used for additional mission areas.

About the Author

Matt Leonard is a reporter/producer at GCN.

Before joining GCN, Leonard worked as a local reporter for The Smithfield Times in southeastern Virginia. In his time there he wrote about town council meetings, local crime and what to do if a beaver dam floods your back yard. Over the last few years, he has spent time at The Commonwealth Times, The Denver Post and WTVR-CBS 6. He is a graduate of Virginia Commonwealth University, where he received the faculty award for print and online journalism.

Contact Leonard at mleonard@gcn.com or follow him on Twitter @Matt_Lnrd.
