Supercomputing takes new direction at Oak Ridge
An ongoing program to upgrade the Jaguar supercomputer at the Oak Ridge National Laboratory in Tennessee will not only result in a much more powerful computer but also create a new and unique research tool. When it is complete, the updated machine, which will be called Titan, will be able to perform calculations at speeds approaching 20 petaflops, reclaiming its title as the world's fastest computer.
The Titan effort is the beginning of a new architecture design path toward much more powerful and energy-efficient computers, said Jack Wells, director of science at the laboratory’s Oak Ridge Leadership Computing Facility (OLCF). Graphics accelerators are key to this new approach because they provide great processing power in a very efficient power-and-size combination. There are other options for achieving efficiency, but this is Oak Ridge’s current approach, he told GCN.
One reason for new architecture considerations is to help increase performance. Clock speeds for microchips peaked in 2004 when it became impractical to mass produce faster chips, Wells said.
Since then, manufacturers and computer designers have moved on to multicore methods, with multiple processor cores operating in parallel to achieve improved performance. “If clock speed doesn’t increase, you see an increase in parallelism,” he said.
But although parallel processing is increasing in high-performance computing, general-purpose processors are not energy efficient. Increased parallelism requires both better performance and lower energy consumption, Wells said. This is one reason that supercomputer designs are now relying on hardware such as Nvidia graphics processing units (GPUs) and Intel multicore processors, he added.
GPUs act as accelerators to speed up computer performance while staying within the same physical footprint and power envelope, Wells said. The upgrades during the Jaguar transition to Titan will push the machine’s processing speed up in increments from 10 to 20 and perhaps 30 petaflops, he said.
The upgrade and transition to Titan is part of the Energy Department’s user facility concept. The department meets its mission by providing large-scale, unique facilities and systems that university and corporate computers cannot replicate, Wells said. The Oak Ridge supercomputing facility falls under the department’s mandate for “big science” operations, he added.
DOE provides its facilities to researchers through calls for proposals. At the OLCF, the most compelling research projects that require the processing and modeling power of Jaguar/Titan are considered, Wells said. This process makes the facility reliant on user proposals. “We’re not executing a research program; we’re executing a user program,” he said.
When the Titan program was proposed in 2009, there was some question within DOE about whether any users would be able to use such a powerful computer, Wells said. The primary challenge was to establish that a variety of scientific software could run on the new machine.
Some of the software codes that run on Titan include S3D, which is used in the direct numerical study of combustion, and De Novo, software that models radiation transport. Radiation transport modeling is a critical part of the work at Oak Ridge because it is used to model neutron transfer in reactor cores. Other types of scientific software model everything from molecular dynamics to atmospheric movements.
The software will run on Titan to support a variety of government, academic and commercial research projects. These include an effort, run by the Sandia National Laboratory in New Mexico, to better understand turbulent fuel combustion. This kind of research is challenging because chemically complex fuels burning under high pressures are difficult to model, Wells said.
The Oak Ridge/Sandia project slated to run on Titan is part of a federal program to study how efficiently biofuels combust. Results from the modeling software will affect how commercial industry designs and studies combustion, Wells said. He added that automobile manufacturers have proprietary software used in engine design, and data from tests such as this will be plugged into those companies’ modeling computers.
Oak Ridge tried to keep its users happy during the recently completed first phase of the upgrade by minimizing downtime and moving more slowly with the transition. This was achieved by shutting down half of the computer for upgrades while keeping the other half up and running, Wells said. When the newly upgraded half was reactivated, an initial user launch allowed testing of the new system. During the acceptance period, when the initial Titan upgrade was being tested, users with large jobs were brought in to help stress test the system. The machine was then briefly taken down to fully upgrade the remaining portion, he said.