Government may double computer cluster use within two years

Agencies plan to double the number of high-performance computer clusters they use over the next two years, according to a recent survey.

The survey was conducted on behalf of a hardware manufacturer by John Payne, who runs the consulting business JLP Associates of Saratoga, Calif. Although Payne could not reveal the name of the company that commissioned the study, he was allowed to release some of his findings.

Payne spoke at the most recent meeting of the Baltimore-Washington Beowulf User Group, held in McLean, Va.

Payne surveyed 45 organizations, including 20 government agencies, that collectively run about 400 different systems. These organizations plan to increase the number of systems they run to just under 800, Payne said. The 80,000 processors this group now collectively deploys would double to 160,000 within two years.

The systems, all clusters, range in size from 32 nodes to as many as 2,000 processors. Payne interviewed participants between February and May 2004. Government did not differ statistically from the rest of the industries surveyed, which included automobile companies and the oil industry.

In his survey, Payne also asked a number of questions to find out what issues program managers routinely face.

"In general, there were no catastrophic problems. People are pretty happy," he said. He noted general dissatisfaction with file systems, inadequate ways of cooling the systems, data loss and the time it takes to get a system into operation, which managers still see as too long.

One issue that managers did not see as significant was overall reliability. Payne found this surprising. He surmised that, since most clusters routinely experience node failure, program managers are more tolerant of component failure than other IT managers. Such failures do not cripple the entire system, but rather slow performance.

Payne found that a system with "thousands" of processors will experience at least a few failures each week. A system with a hundred processors can expect a failure around once a quarter.
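As a rough check on how those two figures relate, the sketch below assumes each processor fails independently at a constant rate and that a quarter is about 13 weeks. Neither assumption comes from Payne's survey, and the function and variable names are illustrative only.

```python
# Assumption (not from the survey): a quarter is treated as about 13 weeks.
WEEKS_PER_QUARTER = 13


def expected_failures_per_week(processors, per_processor_weekly_rate):
    """Expected failures per week if every processor fails independently
    at the same constant weekly rate (a simplifying assumption)."""
    return processors * per_processor_weekly_rate


# Per-processor rate implied by the small-system figure:
# 100 processors, roughly one failure per quarter.
implied_rate = 1 / (100 * WEEKS_PER_QUARTER)

for processors in (100, 1000, 2000):
    weekly = expected_failures_per_week(processors, implied_rate)
    print(f"{processors} processors: about {weekly:.2f} failures per week")
```

Under these assumptions, a 2,000-processor system would see roughly 1.5 failures a week, the same order of magnitude as the "few failures each week" reported for the largest clusters.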

"These systems are generally not being run by the IT bureaucracy, so it doesn't have [CIOs] watching over top of them. They're run by scientists and engineers who don't worry about it too much," he said.

Payne did note that as vendors try to push high-performance computing clusters into more traditional agency environments, CIOs might be more skeptical of this fault-tolerant mindset.

About the Author

Joab Jackson is the senior technology editor for Government Computer News.
