Titan supercomputer

Supercomputer-powered training puts machine learning on the fast track

Neural networks offer the promise of making sense of massive datasets, but only if they can be trained to know precisely what to look for in order to produce valid results.

Creating such networks is labor intensive, often taking months. It involves ensuring that the right kinds of layers -- convolution layers, pooling layers, fully connected layers -- and the right number of layers combine in a way that can accurately classify what's in the data the network is intended to analyze.
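For illustration only, here is a minimal sketch in PyTorch of the kind of layer stack being described. The layer types, sizes and the 28x28 grayscale input are assumptions made for the example, not ORNL's design.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Toy stack of the layer types named above: convolution, pooling, fully connected."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolution layer
            nn.ReLU(),
            nn.MaxPool2d(2),                             # pooling layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Fully connected layer; 32 * 7 * 7 assumes a 28x28 input image.
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Example usage: net = SimpleCNN(); out = net(torch.randn(1, 1, 28, 28))
```

Choosing how many such layers to stack, and in what order, is exactly the design burden the ORNL tools aim to automate.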

“There is a lot of design process to architecting what the network looks like,” said Travis Johnston, a postdoctoral researcher at Oak Ridge National Laboratory.

Johnston and his ORNL colleague Steven Young have been tapping into the lab's Titan supercomputer to automate the design of neural networks. They have now developed two pieces of code that, running on Titan, dramatically speed up the process.

The Multi-node Evolutionary Neural Networks for Deep Learning (MENNDL) code, under development for a couple of years, is a genetic algorithm that selects the best-performing networks to create a next-generation version, optimizing until the best solution has evolved. The other code, RAvENNA, is for “more fine-grained tinkering,” Johnston told GCN.

MENNDL builds the network from the ground up, making decisions on the number of layers and what each layer will do. It starts by randomly guessing how to assemble the networks and then tests those networks against the datasets they are being built to analyze.
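The article does not publish MENNDL's code, but the evolutionary loop it describes can be sketched in a few lines of Python. Everything here -- the list-of-layer-widths representation and the random_architecture, train_and_score and mutate helpers -- is a hypothetical stand-in, faked so the loop runs on its own; the real system trains actual networks across thousands of Titan nodes.

```python
import random

def random_architecture():
    # An "architecture" here is just a list of layer widths (a stand-in).
    return [random.choice([16, 32, 64, 128]) for _ in range(random.randint(2, 6))]

def train_and_score(arch):
    # Placeholder fitness: real code would train the network and return
    # its validation accuracy on the target dataset.
    return sum(arch) / (10 * len(arch)) + random.random()

def mutate(arch):
    # Copy a parent and randomly change one layer.
    child = list(arch)
    child[random.randrange(len(child))] = random.choice([16, 32, 64, 128])
    return child

def evolve(population_size=20, generations=10):
    # Start with randomly assembled networks, as the article describes.
    population = [random_architecture() for _ in range(population_size)]
    for _ in range(generations):
        # Rank candidates by fitness, keep the best performers...
        scored = sorted(population, key=train_and_score, reverse=True)
        parents = scored[: population_size // 5]
        # ...and mutate them into the next generation.
        population = parents + [mutate(random.choice(parents))
                                for _ in range(population_size - len(parents))]
    return max(population, key=train_and_score)

print(evolve())
```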

RAvENNA takes these macroscale network suggestions and makes more micro-level adjustments, such as the number of neurons in a layer.
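Again as a hypothetical sketch, this kind of fine-grained refinement can be pictured as a simple hill climb over per-layer neuron counts. The score function below is a stub standing in for actually training and validating the network; nothing here reflects RAvENNA's real internals.

```python
import random

def score(layer_sizes):
    # Stub: real code would train the fixed-architecture network and
    # return its validation accuracy. This toy version peaks at width 48.
    return -sum((n - 48) ** 2 for n in layer_sizes)

def refine(layer_sizes, rounds=200):
    best, best_score = list(layer_sizes), score(layer_sizes)
    for _ in range(rounds):
        candidate = list(best)
        i = random.randrange(len(candidate))
        # Nudge one layer's neuron count up or down by up to 25 percent,
        # keeping the change only if the score improves.
        candidate[i] = max(1, round(candidate[i] * random.uniform(0.75, 1.25)))
        if score(candidate) > best_score:
            best, best_score = candidate, score(candidate)
    return best

print(refine([32, 64, 32]))
```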

Both tools can generate and train as many as 18,600 neural networks simultaneously and have achieved a peak performance of 20 petaflops on Titan, ORNL officials said. In practical terms, that translates to training 40,000 to 50,000 networks per hour.

Researchers at Fermilab had been working for three months on creating a neural network for their research observing neutrinos passing through detectors. Using its two new codebases and 4,000 nodes of Titan, ORNL created a better network in 24 hours, Johnston said.

“They had a pretty decent network,” he said. “But MENNDL was able to come up with one from scratch that dramatically outperformed what they had done.”

This means scientists can spend more time on research than on building neural networks. Running MENNDL and RAvENNA on a supercomputer dramatically reduces the time to solution, Young said.

The process is limited by the number of networks that can be evaluated and trained in a given period of time, Johnston said. When ORNL brings its Summit supercomputer online in the coming months, efficiency will increase even more.

Summit will have more and more-powerful GPUs, which will be able to test more and larger networks. "Out of the box, without tuning to Summit's unique architecture, we are expecting an increase in performance up to 50 times," Johnston said.

The researchers also plan to combine MENNDL and RAvENNA into a single piece of software, so RAvENNA can directly refine the results from MENNDL. They also want to change the backend they use to train the network to allow for more flexibility in what the networks look like, Johnston said.

Researchers outside ORNL have shown interest in the applications, Johnston said, so they’re looking at software licensing and open source possibilities.

About the Author

Matt Leonard is a reporter/producer at GCN.

Before joining GCN, Leonard worked as a local reporter for The Smithfield Times in southeastern Virginia. In his time there he wrote about town council meetings, local crime and what to do if a beaver dam floods your back yard. Over the last few years, he has spent time at The Commonwealth Times, The Denver Post and WTVR-CBS 6. He is a graduate of Virginia Commonwealth University, where he received the faculty award for print and online journalism.

Leonard can be contacted at mleonard@gcn.com or followed on Twitter at @Matt_Lnrd.
