White House launches $200M 'Big Data R&D' initiative

The Obama administration has launched a “Big Data Research and Development Initiative” aimed at improving the tools and techniques required to access, organize and glean pertinent information from huge volumes of digital data.

Six federal departments and agencies announced more than $200 million in new commitments to achieve the goals of the initiative, including the Defense and Energy departments, Defense Advanced Research Projects Agency, National Institutes of Health, National Science Foundation and the U.S. Geological Survey.

NASA and the National Oceanic and Atmospheric Administration, which gather large volumes of data, were not represented at the event but will join the initiative in time, John Holdren, assistant to the president for science and technology and director of the White House Office of Science and Technology Policy, said March 29 at a big data event in Washington, D.C. Many other agencies will become engaged in the initiative in the future, Holdren said.

The initiative responds to recommendations by the President’s Council of Advisors on Science and Technology, which last year concluded that the federal government is under-investing in technologies related to big data, Holdren said. As a result, OSTP launched a Senior Steering Group on Big Data to coordinate and expand the government’s investments in this area.

Holdren said the government encourages industry to help further the goals of the big data initiative.

“The administration’s work to advance research and funding of big data projects, in partnership with the private sector, will help federal agencies accelerate innovations in science, engineering, education, business and government,” said David McQueeney, vice president of software with IBM Research.

“The federal government faces significant challenges when it comes to effectively extracting and leveraging big data, especially in real time,” said Randall Jackson, vice president of MarkLogic Public Sector, a company that specializes in analytics and big data.

“This is mostly due to the underlying technology that is traditionally used," Jackson said. "Much of the data is in ‘silos’ that do not quickly or easily interact with each other.”  Additionally, the government has to contend with the velocity of data and the fact that it is distributed around the nation, making it hard to aggregate, he noted.

The big data initiative shows “how cooperation and collaboration among agencies, researchers, the private sector, universities and others can overcome these challenges, to unlock the true power of big data,” Jackson said.

The first round of agency commitments to support the Big Data initiative includes:

National Science Foundation and the National Institutes of Health. NSF and NIH will advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large and diverse datasets. This will accelerate scientific discovery and lead to new fields of inquiry that would otherwise not be possible. NIH is particularly interested in imaging, molecular, cellular, electrophysiological, chemical, behavioral, epidemiological, clinical, and other datasets related to health and disease.

National Science Foundation.
In addition to funding the big data solicitation, and in keeping with its focus on basic research, NSF is implementing a comprehensive, long-term strategy that includes: new methods to derive knowledge from data; infrastructure to manage, curate and serve data to communities; and new approaches to education and workforce development.

Subra Suresh, director of NSF, outlined six specific areas that the organization is working on: 

  • Encouraging research universities to develop interdisciplinary graduate programs to prepare the next generation of data scientists and engineers.
  • Funding a $10 million Expeditions in Computing project based at the University of California, Berkeley, that will integrate three powerful approaches for turning data into information: machine learning, cloud computing, and crowdsourcing.
  • Providing the first round of grants to support "EarthCube," a system that will allow geoscientists to access, analyze and share information about our planet.
  • Issuing a $2 million award for a research training group to support training for undergraduates to use graphical and visualization techniques for complex data. 
  • Providing $1.4 million in support for a focused research group of statisticians and biologists to determine protein structures and biological pathways.
  • Convening researchers across disciplines to determine how big data can transform teaching and learning.

Defense Department. DOD is “placing a big bet on big data,” said Zachary Lemnios, assistant secretary of defense for research and engineering. DOD is investing approximately $250 million annually (with $60 million available for new research projects) across the military departments in a series of programs. The programs will harness massive data in new ways, bringing together sensing, perception and decision support to create truly autonomous systems that can maneuver and make decisions on their own. DOD officials also intend to improve situational awareness to help warfighters and analysts and provide increased support to operations.

To accelerate innovation in Big Data that meets these and other requirements, DOD will announce a series of open prize competitions over the next several months.

In addition, DARPA is beginning the XDATA program, which intends to invest approximately $25 million annually for four years to develop computational techniques and software tools for analyzing large volumes of data, both semi-structured and unstructured.

National Institutes of Health. NIH is announcing that the world’s largest set of data on human genetic variation, produced by the International 1000 Genomes Project, is now freely available on the Amazon Web Services cloud. At 200 terabytes (the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs), the current 1000 Genomes Project data set is a prime example of big data, where datasets become so massive that few researchers have the computing power to make best use of them, Francis Collins, director of NIH, said. AWS is storing the 1000 Genomes Project as a publicly available dataset for free, and researchers will pay only for the computing services they use.

Energy Department. DOE will provide $25 million in funding to establish the Scalable Data Management, Analysis and Visualization Institute. The SDAV Institute will bring together the expertise of six national laboratories and seven universities to develop new tools to help scientists manage and visualize data on the department’s supercomputers, with the goal of further streamlining the processes that lead to discoveries made by scientists using the department’s research facilities.

U.S. Geological Survey. USGS announced the latest awardees for grants it issues through its John Wesley Powell Center for Analysis and Synthesis. The center catalyzes innovative thinking in earth system science by providing scientists a place and time for in-depth analysis. These big data projects will improve understanding of issues such as species response to climate change, earthquake recurrence rates, and the next generation of ecological data.

About the Author

Rutrell Yasin is a freelance technology writer for GCN.

