Energy links supercomputers
Program changes emphasis from distributed computing to distance computing
By William Jackson
The Energy Department is putting a lot of energy into its Accelerated Strategic Computing Initiative supercomputers, 'trying to reach 100 trillion floating-point operations per second by 2004,' said John Naegle, a top member of the technical staff at Sandia National Laboratories in Albuquerque, N.M.
Several supercomputers already process in the 1- to 2-TFLOPS range at Sandia in Albuquerque and Livermore, Calif., at Lawrence Livermore National Laboratory in Livermore, and at Los Alamos National Laboratory in Los Alamos, N.M.
A 10-TFLOPS machine is coming online at Lawrence Livermore, and a 30-TFLOPS machine will go in at Los Alamos early next year.
But it does not make economic sense for every lab to have the biggest, fastest supercomputer, said Rich Gay, a senior member of Sandia's technical staff.
'Rather than spend money replicating them, we want to build communications,' Gay said.
DisCom2, Energy's Distance and Distributed Computing and Communication program, will network the high-end computing resources so that scientists at the different labs can share them.
Shifting gears
The emphasis is on distance computing (providing access from remote sites) rather than distributed computing, which puts multiple computers to work on one problem.
The distance computing network will require some big pipes to feed the supercomputers.
The rule of thumb, Naegle said, is that each TFLOPS of computing power requires about 1 Gbps of WAN bandwidth. That much bandwidth will come from an asynchronous transfer mode WAN.
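As a rough illustration of Naegle's rule of thumb, the sketch below applies 1 Gbps per TFLOPS to the machine sizes mentioned in this article. The function name and the dictionary labels are hypothetical, chosen only for this example, and the rule is treated as a simple linear estimate rather than a sizing formula the labs are confirmed to use.

```python
# Rule of thumb quoted in the article: about 1 Gbps of WAN bandwidth per TFLOPS.
GBPS_PER_TFLOPS = 1.0


def required_wan_gbps(tflops: float) -> float:
    """Estimate the WAN bandwidth (Gbps) suggested by the rule of thumb."""
    if tflops < 0:
        raise ValueError("tflops must be non-negative")
    return tflops * GBPS_PER_TFLOPS


# Machine sizes taken from the article; the labels are illustrative only.
machines_tflops = {
    "Lawrence Livermore (coming online)": 10,
    "Los Alamos (planned)": 30,
    "2004 program goal": 100,
}

if __name__ == "__main__":
    for label, tflops in machines_tflops.items():
        print(f"{label}: {tflops} TFLOPS -> about {required_wan_gbps(tflops):.0f} Gbps")
```

Because the estimate is linear, the bandwidth target grows in step with every increase in computing power.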
ATM router modules bridge Gigabit Ethernet LANs at four national labs to a 2.5-Gbps ATM backbone that lets researchers share Energy's supercomputers.
To connect the ATM backbone to Ethernet LANs at the labs, Energy is trying out a new ATM Routing Module from Cisco Systems Inc. of San Jose, Calif. The module has been installed in Cisco Catalyst 8540 switches at each of the labs, Gay said.
Cisco developed the module because of the growing demand to bridge IP enterprise networks using Fast and Gigabit Ethernet onto ATM backbones, said Indrajit Roi, marketing manager for Cisco's metropolitan services business unit.
LAN Emulation (LANE) does so by encapsulating Ethernet and token-ring packets and converting them to ATM cells. The technology works well for small, flat networks, but not so well for large, hierarchical networks, Roi said.
'LANE is not so hot right now,' he said. 'LANE is a CPU hog.'
The Cisco modules support LANE as well as the Internet Engineering Task Force's Request for Comments 1483 standard for multiprotocol encapsulation in Layer 3-to-ATM integration over WANs.
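To give a feel for what RFC 1483 encapsulation costs on the wire, here is a minimal sketch that counts the ATM cells needed for one routed IPv4 packet. It assumes the LLC/SNAP form of RFC 1483 carried over ATM Adaptation Layer 5, with the standard 8-byte LLC/SNAP header, 8-byte AAL5 trailer and 53-byte cells carrying 48 bytes of payload. The function names are hypothetical, and nothing here describes how the Cisco module is implemented.

```python
import math

ATM_CELL_BYTES = 53       # 5-byte header + 48-byte payload
ATM_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8    # AAL5 CPCS trailer: UU, CPI, length, CRC-32

# RFC 1483 LLC/SNAP header for a routed IPv4 packet:
# LLC AA-AA-03, OUI 00-00-00, EtherType 08-00.
LLC_SNAP_ROUTED_IPV4 = bytes([0xAA, 0xAA, 0x03, 0x00, 0x00, 0x00, 0x08, 0x00])


def cells_for_packet(ip_packet_len: int) -> int:
    """Number of ATM cells needed to carry one IPv4 packet of the given size."""
    if ip_packet_len <= 0:
        raise ValueError("ip_packet_len must be positive")
    pdu_len = len(LLC_SNAP_ROUTED_IPV4) + ip_packet_len + AAL5_TRAILER_BYTES
    # AAL5 pads the PDU up to a whole number of 48-byte cell payloads.
    return math.ceil(pdu_len / ATM_PAYLOAD_BYTES)


def wire_efficiency(ip_packet_len: int) -> float:
    """Fraction of transmitted bytes that are IP packet bytes."""
    return ip_packet_len / (cells_for_packet(ip_packet_len) * ATM_CELL_BYTES)


if __name__ == "__main__":
    for size in (40, 576, 1500):
        print(size, cells_for_packet(size), round(wire_efficiency(size), 3))
```

Under these assumptions, small packets pay the heaviest overhead, because the headers and padding are a larger share of what is sent.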
The Catalyst 8540 module has two Gigabit Ethernet ports or one Gigabit Ethernet and one packet-over-Synchronous Optical Network port. It supports up to 256,000 routing table entries, freeing up the CPU cycles commonly used for address resolution.
The labs' Catalyst 8540 switches each have an OC-48 card coming in and two Gigabit Ethernet ports coming out of the ATM module. The labs are standardizing on Cisco products for DisCom2. 'Interoperability is the key here,' Gay said. 'We can pool our knowledge if we all own the same things.'
As the switches come online, the backbone will have to be beefed up. Current bandwidth providers are AT&T Corp. and TeleRio Inc. of Albuquerque, N.M.
'The intranet right now is OC-3, and that is vastly underpowered for computing in the tens of teraFLOPS,' Naegle said.
It will be bumped up to a 2.5-Gbps OC-48 pipe. The first OC-48 leg will open between Sandia and Lawrence Livermore next month, and the rest of the intranet will be expanded in the first half of 2001. It should be available to early adopters by March.
The bandwidth of the OC-48 backbone still falls below his 1-Gbps-per-TFLOPS rule of thumb for distributed supercomputing, Naegle said, 'but 2.5 Gbps is a lot of bandwidth. We think we can learn a lot about our needs.'
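The gap Naegle describes can be put into numbers with a short sketch that inverts the rule of thumb to ask how much computing power each backbone could feed. It assumes the standard SONET line rates (OC-n is n times 51.84 Mbps) and ignores framing and ATM overhead, so usable throughput would be somewhat lower. The function names are hypothetical.

```python
OC1_MBPS = 51.84  # SONET OC-1 line rate; OC-n is n times this


def oc_rate_gbps(n: int) -> float:
    """Line rate of an OC-n circuit in Gbps (before framing overhead)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return n * OC1_MBPS / 1000.0


def tflops_supported(link_gbps: float, gbps_per_tflops: float = 1.0) -> float:
    """TFLOPS a link could feed under the 1-Gbps-per-TFLOPS rule of thumb."""
    return link_gbps / gbps_per_tflops


if __name__ == "__main__":
    for n in (3, 48):
        rate = oc_rate_gbps(n)
        print(f"OC-{n}: {rate:.3f} Gbps, enough for about "
              f"{tflops_supported(rate):.2f} TFLOPS by the rule of thumb")
```

By this estimate OC-48 carries 16 times the traffic of OC-3, yet it matches only about 2.5 TFLOPS, well short of the 10- and 30-TFLOPS machines mentioned earlier.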