A benchmark of its own
SPECIAL REPORT | DOD's disciplined approach to supercomputing could change how HPC systems are developed
Power Line: Cray Henry says the High Performance Computing Modernization Program sets target times for real workloads.
GCN Photo by Zaid Hamid
Twice a year, work being done by the world's fastest supercomputers comes to a screeching halt so the systems can run a benchmark called Linpack to determine how fast they are, at least in relation to one another. Linpack, which measures how many trillions of floating-point operations per second a machine is capable of executing, is the benchmark used to rank the fastest supercomputers in the world in the twice-annual Top 500 List.
As an exercise in flexing muscle, Linpack is about as useful as any other benchmark. But as a tool for judging supercomputing systems in a procurement process, it is limited at best. The Defense Department, through its High Performance Computing Modernization Program, is shaking up the supercomputing world by applying a more disciplined approach to purchasing big iron.
Instead of using a generic benchmark to compare models, the program issues a set of metrics that carefully codifies its own workload. Program leaders then ask vendors to respond with the best, yet most cost-effective, systems they can provide to execute such a workload.
'We don't specify how big the machine is,' said Cray Henry, head of the program. 'We will run a sample problem of a fixed size, and call the result our target time. We then put a bid on the street and say we want you to build a machine that will run this twice as fast.' It is up to the vendor to figure out how that machine should achieve those results.
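The target-time rule Henry describes reduces to a simple check. The sketch below is illustrative only: the function names and the sample times are hypothetical, and the default factor of two comes from his "twice as fast" example rather than from any published HPCMP rule.

```python
def required_time(target_time_s: float, speedup: float = 2.0) -> float:
    """Longest run time a bid may post, given the program's target time."""
    if target_time_s <= 0 or speedup <= 0:
        raise ValueError("target time and speedup must be positive")
    return target_time_s / speedup


def meets_requirement(measured_time_s: float, target_time_s: float,
                      speedup: float = 2.0) -> bool:
    """True if the proposed machine runs the fixed-size sample problem fast enough."""
    return measured_time_s <= required_time(target_time_s, speedup)


# Hypothetical numbers: a one-hour target means a bid must finish in 30 minutes.
print(required_time(3600.0))              # 1800.0
print(meets_requirement(1700.0, 3600.0))  # True
print(meets_requirement(2000.0, 3600.0))  # False
```

Because the problem size is fixed, the check says nothing about how a vendor reaches the number, which is the point of the approach.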
Sounds simple, but in the field of supercomputers, this common-sense approach is rather radical.
'It's a well-oiled process,' agreed Alison Ryan, vice president of business development at SGI. She said that for vendors, 'this kind of procurement is actually difficult. It takes a lot of nontrivial work. It's easier to do a procurement based on Linpack.' But in the end, the work is worthwhile for both DOD and the vendor, because 'it's actually getting the right equipment for your users.'
'They've done a great job on the program in institutionalizing the [request for proposal] process,' said Peter Ungaro, chief executive officer at supercomputer company Cray.
DOD created HPCMP in 1994 as a way to pool resources for supercomputing power. Instead of having each of the services buy supercomputers for its own big jobs, the services could collectively buy an array of machines that could handle a wider variety of tasks, including large tasks.
On the rise
Today, the program has an annual budget of about $250 million, including $50 million for procuring two new supercomputers. Eight HPCMP shared-resource centers, which house the systems, tackle about 600 projects submitted by 4,600 users from the military services, academia and industry.
As of December 2006, the program had control of machines that could do a total of 315.5 teraflops, and that number grows by a quarter each year, as the oldest machines are replaced or augmented by newer technologies.
And over the years, the program has developed a painstakingly thorough process of specifying what kind of systems it needs.
What about HPCMP is so different? It defines its users' workload, rather than relying on a set of generic performance goals.
Henry said that most of the workloads on the program's systems fall into one of about 10 categories, such as computational fluid dynamics, structural mechanics, chemistry and materials science, climate modeling and simulation, and electromagnetics. Each type of job has its own performance characteristics and runs best on a particular combination of processors, memory, interconnects and software.
'This is better because it gauges true workload,' Ryan said.
To quantify these types of jobs, HPCMP came up with a computer program called the linear optimizer, which calculates the overall system performance for handling each of these jobs. It weights each job by how often it is executed. It also factors in the price of each system and existing systems that can already execute those tasks.
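The article does not describe how the linear optimizer works internally, so the following is only a rough sketch of the weighting idea, not the program's actual method. It assumes each proposal reports a run time per benchmark job, scores it by a frequency-weighted average speedup over the target times, and divides by price. The names `Proposal`, `weighted_speedup` and `rank_by_value`, and all figures, are hypothetical. Credit for existing systems is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Proposal:
    name: str
    price_musd: float            # hypothetical bid price, in millions of dollars
    run_times: Dict[str, float]  # benchmark job -> measured run time in seconds


def weighted_speedup(proposal: Proposal,
                     target_times: Dict[str, float],
                     job_weights: Dict[str, float]) -> float:
    """Average of (target time / measured time), weighted by how often each job runs."""
    total_weight = sum(job_weights.values())
    score = 0.0
    for job, weight in job_weights.items():
        score += weight * (target_times[job] / proposal.run_times[job])
    return score / total_weight


def rank_by_value(proposals: List[Proposal],
                  target_times: Dict[str, float],
                  job_weights: Dict[str, float]) -> List[Proposal]:
    """Order proposals by weighted speedup per million dollars, best first."""
    return sorted(
        proposals,
        key=lambda p: weighted_speedup(p, target_times, job_weights) / p.price_musd,
        reverse=True,
    )


# Hypothetical workload: fluid dynamics jobs run three times as often as chemistry jobs.
targets = {"cfd": 100.0, "chem": 200.0}
weights = {"cfd": 3.0, "chem": 1.0}
bids = [
    Proposal("A", 10.0, {"cfd": 50.0, "chem": 200.0}),
    Proposal("B", 5.0, {"cfd": 100.0, "chem": 50.0}),
]
print([p.name for p in rank_by_value(bids, targets, weights)])
```

In this made-up case both bids earn the same weighted speedup, so the cheaper one ranks first. A real evaluation would also have to account for the usability and past-performance factors the program considers.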
Once numbers have been generated for each proposed system, the program takes usability into consideration. Henry admitted that usability is hard to quantify, but it includes factors such as what third-party software is available for the platform and what compilers, debuggers and other development tools are available.
Once these performance and usability numbers are calculated, they are weighed against the past performance of the vendors. From there, the right system may be obvious, or the decision may come down to a narrow choice among a handful of systems.
'It's not often they need the same type of system year after year,' Ungaro said.
Bottom line
Although DOD generally is well-represented on the twice-annual list of the world's fastest computers (it had 11 in the June 2007 Top 100 ranking, for instance), the true beneficiaries are the researchers who can use the machines.
The biggest benefit? 'Time to solution,' Henry said.
DOD might need to know the performance characteristics of an airplane fuselage. Running a highly accurate simulation saves the money and time that testing actual fuselages would require.
'Typically, the kind of equations we're trying to solve require from dozens to thousands of differential calculations,' Henry said. And each equation 'can require a tremendous number of iterations.'
Imagine executing a single problem a million or even tens of millions of times at once, with each execution involving thousands of calculations. That's the size of the job these systems usually handle.
DOD has many problems to test against. Programs track how toxic gas releases spread across an environment. They help develop better algorithms for tracking targets on the ground from moving radars. They speed development of missiles. In one example, supercomputing shortened the development time of the Hellfire missile to just 13 months, allowing it to be deployed in Iraq two years earlier than otherwise would have been possible.
By providing the fastest computing power available, the program in its modest way can assure the Defense Department stays ahead of the enemy.