DIG IT AWARD FINALIST: OPEN DATA
Connecting classified and unclassified big data
- By Matt Leonard
- Oct 12, 2016
Some of the data used to fight terrorism is classified, but much of it is not. That makes it difficult to cross-reference and share information while still enforcing the appropriate level of security.
To address that problem, the Department of Homeland Security created the DHS Data Framework, which consists of two Hadoop data lakes (or data management platforms) that can handle large volumes of information. It also uses attribute-based access controls so that designated users can see data while protecting privacy, civil rights and civil liberties.
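The article names attribute-based access control as the mechanism that lets designated users see data while protecting privacy. A minimal sketch of how such a check works is below; the attribute names, levels, and policy logic are illustrative assumptions, not DHS's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical clearance ordering for the sketch.
LEVELS = {"unclassified": 0, "secret": 1, "top_secret": 2}

@dataclass(frozen=True)
class User:
    clearance: str   # e.g. "secret"
    mission: str     # e.g. "counterterrorism"

@dataclass(frozen=True)
class Record:
    classification: str
    allowed_missions: frozenset

def can_access(user: User, record: Record) -> bool:
    """Grant access only when every required attribute matches:
    the user's clearance dominates the record's classification,
    and the user's mission is one the record is released to."""
    return (LEVELS[user.clearance] >= LEVELS[record.classification]
            and user.mission in record.allowed_missions)
```

In an attribute-based model like this, access decisions come from comparing user and data attributes at query time rather than from static role lists, which is what allows one platform to serve users with different clearances and missions.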
“There are a number of different problems that we’re looking to solve with the data framework,” said Paul Reynolds, director of the DHS Data Framework. “Many of them can’t be solved unless you bring the data into one location.”
Law enforcement officials who are investigating a terrorism suspect, for instance, need to look at both classified and unclassified data. Before the data framework, there was no efficient way to do that, especially not in real time, Reynolds said.

The system takes the unclassified data and moves it up to the classified networks, “so the data itself is still unclassified, but it's sitting in a classified spot,” he said.
The classified and unclassified data sit in two separate Hadoop data lakes that use a cross-domain guard to share data in near-real time. When the framework is fully operational, DHS officials expect to have 20 to 25 databases in the lakes. Right now, four are fully operational and nine are being populated.
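The one-way flow described above — unclassified records copied up through a guard so they sit, still labeled unclassified, on the classified side — can be sketched as follows. The guard's validation rule and record shape here are assumptions for illustration; a real cross-domain guard is specialized hardware and software, not application code.

```python
def guard_transfer(low_side, high_side, validate):
    """Move records from the low (unclassified) lake to the high side.

    The guard enforces direction: it only reads from the low side and
    only writes to the high side. Each record keeps its original
    'unclassified' label even after it lands in the classified store.
    """
    for record in low_side:
        if validate(record):           # guard inspects before passing
            high_side.append(dict(record))  # copy up; low side untouched
    return high_side

# Hypothetical validation rule: require a label and an id.
def validate(record):
    return record.get("label") == "unclassified" and "id" in record
```

The key property the article describes is that the classification of the *data* and the classification of the *network it sits on* are tracked separately, so unclassified records can be cross-referenced against classified ones without being reclassified.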
And they aren’t small databases. Reynolds said one of them has about 70 billion records in it.
The framework is currently used only for counterterrorism, but Reynolds said he expects it will ultimately support additional mission areas.
Matt Leonard is a former reporter for GCN.