Building the infrastructure for big data
- By Amanda Ziadeh
- Oct 07, 2015
NASA knows big data. The agency's Center for Climate Simulation, for example, produces petabytes of material in support of NASA's Earth science modeling communities. But when it comes to making all that data available and usable, there's still ample room for improvement.
Daniel Duffy, the center's high performance computing lead, explained at an Oct. 6 big data event that, more often than not, scientists just download the data in 5-10 terabyte chunks onto their desktop. "They run their analytics, which is a custom Python script they wrote or something like that," he said. "They'll then delete that 5-10 terabytes of data, and download the next 5-10 terabytes and repeat until they're done."
The whole process, he said, "typically takes weeks if not months of time" -- and must be repeated in its entirety if analytics need to be re-run.
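The cycle Duffy describes can be pictured as a simple loop: pull down one multi-terabyte chunk, run a custom analysis on it, delete it to free space, and move on to the next. The sketch below is purely illustrative; the chunk names, file layout and analysis function are hypothetical placeholders, not NASA's actual tooling.

```python
# Hypothetical sketch of the chunked download-analyze-delete workflow
# Duffy describes. All names here are illustrative placeholders.
import os
import tempfile

WORKDIR = tempfile.mkdtemp()  # stand-in for the scientist's local disk

def download_chunk(chunk_id: str) -> str:
    """Placeholder: fetch one multi-terabyte chunk to local storage."""
    path = os.path.join(WORKDIR, f"{chunk_id}.nc")
    # ... the actual multi-day transfer from the data center goes here ...
    open(path, "w").close()  # empty file stands in for the downloaded data
    return path

def run_analytics(path: str) -> dict:
    """Placeholder for the scientist's custom Python analysis script."""
    return {"chunk": os.path.basename(path), "result": 0.0}

results = []
for chunk_id in ["chunk-001", "chunk-002"]:  # each ~5-10 TB in practice
    local_path = download_chunk(chunk_id)
    results.append(run_analytics(local_path))
    os.remove(local_path)  # delete to make room for the next chunk

# Because the raw chunks are deleted, re-running the analytics means
# repeating every download from scratch -- the "weeks if not months"
# problem that server-side analytics is meant to eliminate.
```

The key inefficiency is in the last comment: nothing persists locally between runs, so the cost of the transfers is paid again in full every time the analysis changes.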
"We have to do a better job of providing data access," Duffy said. "Not just to the experts but for a large amount of external use." The plan, he said, is to improve both storage and server-side analytics capabilities, allowing all that data to be more efficiently used, analyzed and stored.
"NASA's product is data," Duffy stressed. "If at some point the public can't just download and consume all the data, there's going to be questions as to whether or not NASA can be more efficient."
Gary Good, a strategist in the Army’s Office of Business Transformation, and Ryan Swann, the General Services Administration's director of data analytics for the Office of Governmentwide Policy, also spoke at the event, which was presented by GCN and its sister publication, FCW.
They echoed Duffy's concerns about putting proper infrastructure in place, but also stressed the cultural challenges that agencies face in bringing big data to bear on critical agency missions.
"Big data is really about leveraging the data we collect," Swann said, "but also integrating the data to actually answer true business questions."
And the concerns voiced by the panelists are common in federal agencies attempting to launch big data initiatives, according to a new poll on the subject. The survey, which was underwritten by Unisys and released at the event, found that -- in addition to overall costs, security risks and a lack of qualified staff -- strains on existing IT infrastructure topped IT leaders' lists of worries about such projects.
Beacon Technology's James Warrick, who presented the survey data at the event, described the challenge of deploying a true big data analytics program as a "three-legged stool. It requires staffing, it requires infrastructure and it requires very robust tools."
"There is an uneasiness" among federal IT and business leaders, he said "that their agencies' existing infrastructure can really handle this."
And Government Accountability Office Chief Scientist Tim Persons, who also spoke at the event, noted that justifying the investments can be difficult -- a concern cited by 63 percent of survey respondents who had big data projects underway.
"Data economics is a relatively nascent field," Persons said. "We really don't know how to value big data."
The Army's Good, for his part, said that resource issues are surmountable, so long as the value of the analytics can be clearly linked to an agency's mission.
The Army’s big data focus has lately been on cyber and intelligence, Good said, but he is pushing to use tactical big data more broadly, applying it to a wide range of Army business processes.
"It's all about decision making," Good said. "We have to do decision-point analytics... and that's a whole different thing."
Duffy agreed, and he repeated his desire to push big data beyond the small circle of scientists and specialists.
"We're talking about actionable data here," he said. "We've got to turn big data into small data that you can consume, run some diagnosis on and have something actionable."
Amanda Ziadeh is a former reporter/producer for GCN.