Bringing big data down to size

To make it easier for scientists to stream, archive and analyze massive research datasets, the Department of Energy announced it is investing $13.7 million in projects that will address the challenges of moving, storing and processing data produced by observatories, experimental facilities and supercomputers.

The projects – led by five universities and five DOE National Laboratories across eight states – aim to develop mathematical and computer-science techniques that will effectively reduce the size of the datasets by removing trivial or repetitive data while preserving the trustworthiness of the information.

The projects will tackle:

  • Compressing streaming data: Researchers at Oak Ridge National Laboratory will develop ways to compress data coming directly from a scientific instrument or a computer model by integrating advanced machine-learning techniques.
  • Selecting and tuning compression techniques: Researchers at Texas State University will investigate data compression techniques and select the best method based on a user’s requirements for fidelity, speed and memory usage.
  • Compressing groups of datasets: Researchers at the University of California, San Diego will develop scalable techniques for compressing multiple related streams of data, such as those from multiple sensors observing the same physical system, by leveraging the relationships between the datasets.
  • Programming custom hardware accelerators for streaming compression: Researchers at Fermi National Accelerator Laboratory will work on encoding advanced compression and filtering as custom hardware accelerators for use in a wide array of experimental settings.

“Scientific user facilities across the nation, including the DOE Office of Science, are producing data that could lead to exciting and important scientific discoveries, but the size of that data is creating new challenges,” DOE Associate Director for Advanced Scientific Computing Research Barb Helland said. “Those discoveries can only be uncovered if the data is made manageable, and the techniques employed to do that are trusted by the scientists.”

About the Author

Connect with the GCN staff on Twitter @GCNtech.

