DOE investing in machine learning tools for data analysis
To help researchers better analyze the massive amount of data they collect from their experiments, the Department of Energy is dedicating $29 million to develop new machine learning tools and advanced algorithms that will benefit multiple scientific fields and inform cutting-edge solutions for a variety of complex problems.
Today’s scientific facilities, instruments and high-performance computing (HPC) simulations regularly generate terabytes of data -- so much that traditional analysis methods can struggle to interpret the data efficiently. More advanced machine learning tools can identify patterns in data that humans cannot detect, running up to thousands of times faster than traditional data analysis techniques.
“As research tools like computers or microscopes have gotten more powerful, the amount of data they can gather has gotten overwhelming—and scientists need new capabilities to make sense of it all,” Energy Secretary Jennifer M. Granholm said. “Advanced analysis methods will help them unlock the full potential behind all this data, so that we can solve even our most complex challenges.”
Three factors are driving this need. First, emerging scientific computing technologies (such as the convergence of HPC, massive data and artificial intelligence/machine learning on increasingly heterogeneous architectures) will require new analysis techniques. Second, the growing use of neural networks that can implicitly learn from massive amounts of training data will likely change the way applications are programmed. Finally, new approaches will be needed to realize the full potential of AI/ML for scientific discovery.
Up to $21 million will focus on high-impact approaches to machine learning under the Data-Intensive Scientific Machine Learning and Analysis program. The principal goal is the development of reliable and efficient AI/ML tools for managing massive, complex and multi-modal scientific data.
Rather than incrementally extend current research, the program aims to explore unconventional approaches to solving challenges posed by AI/ML for scientific inference and data analysis, the announcement said. Possible approaches might feature “asynchronous computations, mixed precision arithmetic, compressed sensing, coupling frameworks, graph and network algorithms, randomization, Monte Carlo or Bayesian methods, differentiable or probabilistic programming, or other relevant facets.”
The remaining $8 million is dedicated to the Randomized Algorithms for Extreme-Scale Science program, which aims to make large datasets easier to understand. Its goal is to explore the use of “randomized” algorithms, which use random sampling to simplify extremely large datasets for analysis and are much more accurate than current methods.
In this case, DOE said it is looking for algorithms “that use some form of randomness in their internal algorithmic decisions to achieve faster time to solution, better algorithmic scalability, enhanced reliability or robustness, or other improvements in scientific computing performance.”
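To make the idea concrete, here is a minimal Python sketch of one of the simplest randomized techniques: estimating a statistic of a large dataset from a uniform random sample rather than a full pass. It is an illustration only, not a method named in the DOE announcement. The function name `sampled_mean`, the sample size and the synthetic data are all assumptions chosen for the example.

```python
import random
import statistics


def sampled_mean(data, sample_size, seed=0):
    """Estimate the mean of `data` from a uniform random sample.

    Hypothetical helper for illustration; returns the estimate and
    its standard error so the caller can judge the approximation.
    """
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)  # uniform, without replacement
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return estimate, std_error


# Synthetic stand-in for a large instrument dataset (assumed values).
data_rng = random.Random(42)
data = [data_rng.gauss(10.0, 2.0) for _ in range(200_000)]

# Look at only 1% of the records instead of all of them.
estimate, std_error = sampled_mean(data, sample_size=2_000)
exact = statistics.fmean(data)

print(f"sampled estimate: {estimate:.3f} +/- {std_error:.3f}")
print(f"exact mean:       {exact:.3f}")
```

The sketch trades a small, quantifiable error for a large reduction in the amount of data touched, which is the general bargain randomized algorithms offer. Production methods for scientific computing, such as randomized matrix sketching, are considerably more sophisticated.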
Possible research topics include:
- High computation and communication complexity and the development of efficient algorithms.
- High data dimensionality and finding sparse representations for data from scientific instruments and user facilities.
- Better algorithm scalability for low-power, high-performance edge computing.
- Improved algorithm reliability and robustness to noise.
This investment “will boost scientific breakthroughs and assist the United States with analyzing and solving some of the greatest challenges facing our nation, like climate change, new cures for quality healthcare and cybersecurity,” said Rep. Darren Soto (D-Fla.).