When data centers lose their cool
As processors get more powerful and servers get hotter, system modernization takes on new urgency
- By Joab Jackson
- May 15, 2006
'Every 10-degree [heat] increase doubles the failure rate of that system.'
Wu Feng, former team leader at Los Alamos National Laboratory
Data center managers like to say they create weather. They're not exaggerating.
To prepare for a new supercomputer he'll oversee, Ramesh Kolluru, director of Louisiana's Center for Business and Information Technologies, set up an ad hoc data center within his research facility. Two rows of server cabinets, packed with SGI Altix 350 systems, run 15 feet down the center of the lab. When you start walking down the aisle between those racks, the temperature is a pleasant 78 degrees; by the time you reach the end, about seven paces later, the temperature has plunged to a wintry 40.
Lately, the weather inside data centers has grown unmanageably hot, forcing IT managers to reconsider plans they'd taken for granted. And data center modernization, the unglamorous second cousin to today's business systems modernization, has become essential. Often, an agency can't successfully complete the latter without seriously considering the former.
'You cannot build data centers like you used to,' said Richard Sawyer, director of data center technology for American Power Conversion Corp. of West Kingston, R.I.
That's partly because modern servers can deliver a lot more processing power in a lot less space than they could a decade ago. While the benefits are immediate, the costs are often more subtle and, over the long run, can adversely affect an agency's ability to accomplish its mission.
At first you might simply notice servers on the top racks failing more often than those at the bottom. You may see groups of servers running hotter than the rest ('hot spots,' as they're known). You may find you're running your backup cooling unit around the clock, instead of just for emergencies. Bring in a cadre of blade servers, and suddenly the heat problem grows, as do the electricity bills. At worst, you may end up with entire racks of servers, or 'dark clusters,' powered down until electricity and cooling concerns are addressed.
According to Gary Spilde, site planning manager for Mountain View, Calif.-based SGI, 'The government has an awful lot of 20- and 25-year-old data centers. Typically, there is a lot of retrofitting to be done.'
Power surge
Part of the problem stems from the law of unintended consequences. In October 1995, following the lead of the commercial enterprise sector, the Office of Management and Budget issued a bulletin calling for agencies to consolidate operations into data centers as much as possible.
'There was a push to ... put more gear into the same bit of real estate,' said Douglas Alger of the Cisco Systems Inc. Data Center Infrastructure team. As organizations rented or built data centers, they looked to cut real estate costs as much as possible, so it made sense to buy small.
Server manufacturers heard the call for compactness. Servers that used to take up 3.5 inches of vertical space in a rack, called 2U servers, were replaced with more powerful units that took up half the space. Today's blade servers are even more space-efficient.
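The rack arithmetic behind that shift is simple. A minimal sketch, assuming the standard 1.75-inch rack unit and a hypothetical 42U cabinet (a common size, but not one named in this article), shows how halving server height doubles the number that fit:

```python
# One rack unit (1U) is 1.75 inches of vertical space, so a 2U server is 3.5 inches.
RACK_UNIT_INCHES = 1.75

def height_inches(units: int) -> float:
    """Vertical space taken by a server of the given height in rack units."""
    return units * RACK_UNIT_INCHES

def servers_per_rack(rack_units: int, server_units: int) -> int:
    """Servers of a given height that fit in a rack, ignoring switches, power strips, etc."""
    return rack_units // server_units

# Hypothetical 42U cabinet, assumed for illustration only.
for size in (2, 1):
    print(f"{size}U server: {height_inches(size)} in, "
          f"{servers_per_rack(42, size)} per 42U rack")
```

Every server added to the same cabinet also adds its power draw, which is where the trouble starts.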
But packing more'and more powerful'processors into a cabinet also requires more juice. Much more juice. By some industry estimates, each new generation of servers requires 30 to 50 percent more power.
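To see how those estimates compound, here is a small sketch. The 30 and 50 percent figures are the industry estimates cited in the article; the three-generation horizon is an arbitrary assumption chosen for illustration:

```python
def relative_power(generations: int, growth: float) -> float:
    """Power draw relative to today after n generations, each needing `growth` more power."""
    return (1 + growth) ** generations

# Hypothetical horizon of three server generations.
for growth in (0.30, 0.50):
    print(f"{growth:.0%} more per generation -> "
          f"{relative_power(3, growth):.2f}x today's power after 3 generations")
```

At the low end the power bill roughly doubles in three generations; at the high end it more than triples.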
'I don't know where the cut-off is when a department chair says 'This is too much power,' ' confessed Thomas Zacharia, associate lab director at the Energy Department's Oak Ridge National Laboratory.
Perhaps a bigger problem than soaring electric bills is the heat that servers emit. 'When you increase power, you're doing more work, and you're also creating more heat,' said Brad Nacke, who heads government relations for cooling equipment vendor Liebert Corp. of Columbus, Ohio. Some of that heat comes from the memory modules and some from the disk drives but, in most cases, at least half the heat comes from the processor.
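Because nearly all the electricity a server draws ends up as heat, the power figure and the cooling load are two views of the same number. A rough sketch, using the standard conversions of about 3.412 BTU per hour per watt and 12,000 BTU per hour per ton of cooling capacity, applied to a hypothetical 10-kilowatt rack:

```python
WATTS_TO_BTU_PER_HR = 3.412   # 1 W of electrical load is roughly 3.412 BTU/hr of heat
BTU_PER_HR_PER_TON = 12_000   # one "ton" of cooling capacity

def cooling_load(watts: float):
    """Return (BTU/hr, tons of cooling) for an IT load, assuming all power becomes heat."""
    btu_hr = watts * WATTS_TO_BTU_PER_HR
    return btu_hr, btu_hr / BTU_PER_HR_PER_TON

# Hypothetical rack drawing 10 kW, chosen only for illustration.
btu, tons = cooling_load(10_000)
print(f"{btu:,.0f} BTU/hr, about {tons:.1f} tons of cooling")
```

The all-power-becomes-heat assumption is a simplification, but a close one for IT equipment.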
Providing proper cooling escalates electricity bills even more. Poorly cooled servers run slow or even shut down when they get too hot. Heat also accelerates equipment breakdown, said Wu Feng, a former team leader at Los Alamos National Laboratory and now part of the computer science department at Virginia Polytechnic Institute.
'Every 10-degree increase doubles the failure rate of that system,' Feng said. Failure rates, of course, mean replacement costs and manpower costs for repairing or replacing the components.
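Taken literally, Feng's rule of thumb is exponential: the failure rate is multiplied by 2 raised to the power (temperature rise / 10). The sketch below assumes that simple form. The quote doesn't say whether the degrees are Fahrenheit or Celsius, so the doubling step is left as a parameter:

```python
def failure_rate_multiplier(delta_degrees: float, doubling_step: float = 10.0) -> float:
    """Relative failure rate after a temperature rise, if every `doubling_step` degrees doubles it."""
    return 2 ** (delta_degrees / doubling_step)

for rise in (5, 10, 15, 20):
    print(f"+{rise} degrees -> {failure_rate_multiplier(rise):.2f}x failure rate")
```

Under this rule, a hot spot running 15 degrees warmer than its neighbors would fail roughly 2.8 times as often.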
Data center architects didn't spend much time on cooling issues, Nacke said. For instance, rack-mountable servers are designed to pull cold air in from the front and emit hot air out the back. Often server cabinets are arranged front-to-back, meaning the hot air from one server gets sucked into the one behind it. But of course, spreading out servers is rarely a practical solution, given real-estate prices and the per-foot costs of intrarack cabling.
Technology to the rescue
There are a number of ways agencies can approach data center modernization projects (see sidebar for cooling strategies). Perhaps the most significant is to pursue server technologies that require less energy. Processor vendors, in particular, have been working to address customers' call for more energy-efficient systems.
Faced with the task of building the world's largest computer system, one made up of hundreds of thousands of processors, engineers from Lawrence Livermore National Laboratory and IBM Corp. decided the fastest CPUs weren't necessarily optimal.
'No one would have been able to afford the power bill,' said Herb Schultz, a program director in IBM's deep computing group. The BlueGene/L supercomputer runs more than 130,000 processors and will eventually execute up to 360 trillion floating-point operations per second.
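Dividing the peak figure by the processor count shows how modest each individual processor is. A quick calculation with the article's round numbers; because the machine has more than 130,000 processors, the result is an upper bound:

```python
peak_flops = 360e12   # 360 trillion floating-point operations per second (article figure)
processors = 130_000  # "more than 130,000" processors (article figure, a lower bound)

per_processor = peak_flops / processors
print(f"at most ~{per_processor / 1e9:.2f} billion floating-point operations per second per processor")
```

That works out to under 3 billion operations per second per processor; the machine's speed comes from sheer numbers rather than fast individual chips.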
With so many processors churning away, engineers looked at broader measures of performance, such as performance-per-watt and performance-per-square-meter, both of which factored into technology and cooling decisions.
As a result, they equipped each node with two IBM PowerPC 440 processing cores. The chips don't execute as many operations per second as the finest from Advanced Micro Devices Inc. and Intel Corp., but because the IBM chips run slower, they run cooler. More nodes can fit into a given area.
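Why slower chips run cooler follows from the standard first-order model of CMOS switching power, P = a·C·V²·f (activity factor, switched capacitance, supply voltage, clock frequency). Power falls linearly with clock speed, and a slower clock usually permits a lower supply voltage, which pays off quadratically. The parameter values below are hypothetical, chosen only to show the scaling, and the model ignores leakage power:

```python
def dynamic_power(activity: float, capacitance_f: float, voltage_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power in watts: P = activity * C * V^2 * f (leakage ignored)."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

# Hypothetical chips: same design, but the "slow" one runs at half the clock and a lower voltage.
fast = dynamic_power(activity=0.2, capacitance_f=1e-9, voltage_v=1.3, freq_hz=3.0e9)
slow = dynamic_power(activity=0.2, capacitance_f=1e-9, voltage_v=1.0, freq_hz=1.5e9)
print(f"slow chip uses {slow / fast:.0%} of the fast chip's switching power")
```

With these made-up numbers, halving the clock and trimming the voltage from 1.3 to 1.0 volts cuts switching power to roughly 30 percent of the original, while raw speed only halves.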
'BlueGene/L is a model for where high-density computing is going,' APC's Sawyer said. In other words, all agencies have to start thinking about data centers in a way that balances performance against other costs.
Earlier this year, Sun Microsystems Inc. introduced a new performance measure called the Space, Wattage and Performance (SWaP) metric. SWaP, very simply, is performance divided by the product of space consumed and power used, said Fadi Azhari, director of marketing for the scalable systems group at Sun. An organization can get a better handle on the true cost of a server through this approach, he said.
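A minimal sketch of the metric as Azhari describes it. It assumes space is measured in rack units and power in watts, and both servers are hypothetical; performance can be any benchmark score, as long as the same one is used for every machine compared:

```python
def swap(performance: float, rack_units: float, watts: float) -> float:
    """SWaP as described: performance / (space * power). Higher is better."""
    return performance / (rack_units * watts)

# Two hypothetical servers scored on the same (unspecified) benchmark.
server_a = swap(performance=100, rack_units=2, watts=500)  # faster, but bigger and hungrier
server_b = swap(performance=80, rack_units=1, watts=400)   # slower, but compact and frugal
print(f"A: {server_a:.2f}  B: {server_b:.2f}")
```

Here the smaller server wins on SWaP (0.2 versus 0.1) despite its lower raw score, which is exactly the trade-off the metric is meant to expose.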
Regardless of metrics, companies are starting to acknowledge the new realities of the data center. Both AMD and Intel have introduced dual-core processors, which can execute more than one thread at a time even though they have lower clock rates than their single-core predecessors. Both companies are planning processors with four or more cores.
In June 2006, Intel is scheduled to roll out its next-generation server processors, code-named Woodcrest. According to company officials, this line of processors should ultimately boost performance by 80 percent while reducing power consumption by 35 percent compared to a 2.8-GHz Intel Xeon chip. The savings will come from the chip's multiple cores and smaller transistors, which require less voltage to switch.
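Intel's two figures can be combined into a single performance-per-watt estimate. This is back-of-the-envelope arithmetic on the company's stated targets, not a measured result:

```python
perf_gain = 1.80    # "boost performance by 80 percent" relative to a 2.8-GHz Xeon
power_ratio = 0.65  # "reducing power consumption by 35 percent"

# Performance per watt scales as (new performance / old) / (new power / old).
perf_per_watt_gain = perf_gain / power_ratio
print(f"performance per watt improves about {perf_per_watt_gain:.2f}x")
```

If both targets are met, performance per watt would improve by a factor of roughly 2.8 over the Xeon baseline.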
Azhari said midsize data centers (those with thousands of servers) can save $4 million to $5 million per year in electricity and floor space costs by switching to more efficient processors.
In fact, incremental, thoughtful adjustments to a data center plan are all it takes to overhaul operations. In that respect, data center modernization should be easier than the business systems modernization programs that sometimes bog down agencies.
According to APC's Sawyer, 'Marginal changes in efficiency can have big impact on operational budget.'