Government may double computer cluster use within two years

Agencies plan to double the number of high-performance computer clusters they use over the next two years, according to a recent survey.

The survey was conducted on behalf of a hardware manufacturer by John Payne, who runs the consulting business JLP Associates of Saratoga, Calif. Although Payne could not reveal the name of the company that commissioned the study, he was allowed to release some of his findings.

Payne spoke at the most recent meeting of the Baltimore-Washington Beowulf User Group, held in McLean, Va.

Payne surveyed 45 organizations, including 20 government agencies, that collectively run about 400 different systems. These organizations plan to increase the number of systems they run to just under 800, Payne said. The 80,000 processors this group now collectively deploys would double to 160,000 within two years.

The systems, all of them clusters, range in size from 32 nodes to as many as 2,000 processors. Payne interviewed participants between February and May 2004. Government did not differ statistically from the rest of the industries surveyed, which included automobile companies and the oil industry.

In his survey, Payne also asked a number of questions to find out what issues program managers routinely face.

'In general, there were no catastrophic problems. People are pretty happy,' he said. He noted general dissatisfaction with file systems, inadequate ways of cooling the systems, data loss and the time it takes to get a system into operation, which managers still see as too long.

One issue that managers did not see as a significant problem was overall reliability. Payne found this surprising. He surmised that, since most clusters routinely experience node failures, program managers are more tolerant of component failure than other IT managers. Such failures do not cripple the entire system but rather slow its performance.

Payne found that a system with 'thousands' of processors will experience at least a few failures each week. A system with a hundred processors can expect a failure around once a quarter.

'These systems are generally not being run by the IT bureaucracy, so it doesn't have [CIOs] watching over top of them. They're run by scientists and engineers who don't worry about it too much,' he said.

Payne did note that as vendors try to push high-performance computing clusters into more traditional agency environments, CIOs might be more skeptical of this fault-tolerant mindset.

About the Author

Joab Jackson is the senior technology editor for Government Computer News.