Government may double computer cluster use within two years

Agencies plan to double the number of high-performance computer clusters they use over the next two years, according to a recent survey.

The survey was conducted on behalf of a hardware manufacturer by John Payne, who runs the consulting business JLP Associates of Saratoga, Calif. Although Payne could not reveal the name of the company that commissioned the study, he was allowed to release some of his findings.

Payne spoke at the most recent meeting of the Baltimore-Washington Beowulf User Group, held in McLean, Va.

Payne surveyed 45 organizations, including 20 government agencies, that collectively run about 400 different systems. These organizations plan to increase the number of systems they run to just under 800, Payne said. The 80,000 processors this group now collectively deploys would double to 160,000 within two years.

The systems, all of them clusters, range in size from 32 nodes to as many as 2,000 processors. Payne interviewed participants between February and May 2004. Government agencies' responses did not differ in any statistically significant way from those of the other industries surveyed, which included the automobile and oil industries.

In his survey, Payne also asked a number of questions to learn which issues program managers routinely face.

"In general, there were no catastrophic problems. People are pretty happy," he said. He did note general dissatisfaction with file systems, inadequate cooling, data loss and the time it takes to get a system into operation, which managers still consider too long.

One issue that managers did not see as significant was overall reliability, which Payne found surprising. He surmised that, because most clusters routinely experience node failures, their program managers are more tolerant of component failure than other IT managers are. Such failures do not cripple the entire system; they only slow performance.

Payne found that a system with "thousands" of processors will experience at least a few failures each week. A system with a hundred processors can expect a failure around once a quarter.
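As a rough consistency check on these two figures, the sketch below takes the once-a-quarter rate for a 100-processor system and scales it up to larger clusters. It assumes node failures are independent and grow linearly with processor count, and that a quarter is about 13 weeks; these are illustrative assumptions, not findings reported by Payne, and the function name is hypothetical.

```python
# Rough sanity check of the failure figures quoted above.
# Assumptions (not from the survey): failures are independent and scale
# linearly with processor count; one quarter is roughly 13 weeks.

WEEKS_PER_QUARTER = 13


def weekly_failures(processors, base_processors=100, base_failures_per_quarter=1.0):
    """Estimate expected node failures per week for a cluster of a given size.

    The per-processor weekly failure rate is derived from the baseline of
    one failure per quarter on a 100-processor system, then multiplied by
    the number of processors in the larger system.
    """
    rate_per_processor_week = base_failures_per_quarter / (base_processors * WEEKS_PER_QUARTER)
    return processors * rate_per_processor_week


for n in (100, 2_000, 5_000):
    print(f"{n:>5} processors: {weekly_failures(n):.2f} failures/week")
```

Under that assumption, a 2,000-processor cluster would see about 1.5 failures a week and a 5,000-processor cluster nearly four, the same order of magnitude as Payne's estimate for systems with thousands of processors.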

"These systems are generally not being run by the IT bureaucracy, so it doesn't have [CIOs] watching over top of them. They're run by scientists and engineers who don't worry about it too much," he said.

Payne did note that as vendors try to push high-performance computing clusters into more traditional agency environments, CIOs might be more skeptical of this fault-tolerant mindset.

About the Author

Joab Jackson is the senior technology editor for Government Computer News.