70 CPUs add up to big power

Researchers at the Energy Department’s Los Alamos National Laboratory—tired
of prohibitive pricing and long queues for supercomputing time—took matters into
their own hands and wired 70 CPUs into one of the fastest and cheapest supercomputers in
the world.


Their homespun improvisation, called Avalon, can perform 20 billion floating-point
operations per second, or 20 GFLOPS.


It helped crack a prominent elliptic-curve cryptosystem challenge, was nominated for
the Gordon Bell prize for significant achievement in parallel processing and recently
ranked as the world’s 315th fastest supercomputer in a competition with 500
multimillion-dollar commercial systems at the Supercomputer ’98 conference in
Mannheim, Germany.


But that’s not enough for the Los Alamos team. By Labor Day, the engineers will
more than double its size to 144 CPUs.


“It’s ready to go,” said David Neal, a systems administrator for Los
Alamos’ Center for Nonlinear Studies. “The only thing holding us back is we have
to get more [electrical] power in the computer room.” Avalon’s processing power
could rise as high as 60 GFLOPS, he said.


The original Avalon has attracted wide attention.


It gave Los Alamos researchers cheaper supercomputing cycles, and it won users more
influence over the future design of the world's fastest machines.


Use of off-the-shelf parts makes supercomputing possible for many more scientists, said
Avalon chief architect Michael Warren of the Los Alamos lab’s Theoretical
Astrophysics Group.


“Michael was the one who made it all come together,” Neal said. Warren drew up the
design, handed the blueprint to Neal and said, “Go buy it.”


With the help of David Moulton and Aric Hagberg from the lab’s Mathematical
Modeling and Analysis Group, Warren and Neal built Avalon for $152,000. Team members
noted, however, that price wasn’t the only breakthrough.


Benchmarks put Avalon “within factors of two or three in absolute performance of
64-processor parallel supercomputers” such as the Silicon Graphics Inc. Origin2000,
Cray Research Inc. T3E and IBM Corp. SP2, all of which cost $1 million or more, Warren
said in applying for the Gordon Bell prize.


Although the team is pleased by the honors, Warren was quick to say, “You
shouldn’t think we were looking necessarily to win awards. What we wanted was a tool
that would help us get our research done.”


Building Avalon took about 28 staff hours. On April 10, the lab took delivery of 70
Digital Equipment Corp. computers, or nodes, fully assembled with the operating system
installed and configured. Each node consisted of:


The nodes are networked by two 36-port Fast Ethernet SuperStack II 3900 switches from
3Com Corp. of Santa Clara, Calif., four 3Com gigabit uplink modules, a 12-port 3Com
Gigabit Ethernet SuperStack II 9300 switch and three Cyclom-32-YeP serial concentrators
from Cyclades Corp. of Fremont, Calif.


“What’s amazing is the time,” Neal said. “We decided to build it
last Feb. 28.” In March they sent out a request for bids for the hardware, and in
April the machine was running.


“Hardware was no big problem,” Warren said. “What people in our
situation are usually concerned about is finding or creating the kind of software that can
operate the machine.”


The Los Alamos team used the open-source Linux operating system and GNU software
tools, downloaded free from the Internet. Avalon executes one instance of the OS on each
node, Neal said. The master node runs Version 2.0.35 of Linux, while the remaining nodes
run Version 2.1.103. The Linux distribution, from Red Hat Software Inc. of Research
Triangle Park, N.C., is priced at $50 and includes support.


“Linux was great because it allowed us to do some parallel processing, bundling
many chips together to create a system much more powerful than its individual parts,”
Warren said.


Warren has since done some tweaking to improve performance, Neal said. Among other
things, Warren replaced the operating system kernel with a newer version, Hagberg said.


“Compiling and replacing a kernel is a straightforward procedure that many Linux
users do regularly to take advantage of new features or performance improvements,”
Hagberg said.


“It was and is this ability to modify the kernel that has contributed
significantly to Avalon’s success,” Neal said. “For example, Mike modified
the network driver used by the nodes for improved performance.”


The only application software Avalon runs that’s not standard with Linux is the
freeware mpich, which consists of libraries of routines that aid in designing parallel
codes for Avalon, Hagberg said.


“Mpich is a freely available portable implementation of MPI, the new standard for
message-passing libraries,” he said.
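

For illustration only (this is a hypothetical sketch, not code from the Avalon
project), a minimal MPI program in C shows the style of parallel programming mpich
supports: each copy of the program learns its rank in the cluster, computes a partial
result and sends it to node 0.

    /* Hypothetical example; not Avalon project code. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, local, total;

        MPI_Init(&argc, &argv);               /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's ID */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many processes */

        local = rank + 1;                     /* each node's piece of work */

        /* combine the partial results on node 0 */
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d nodes: %d\n", size, total);

        MPI_Finalize();                       /* shut down cleanly */
        return 0;
    }

Built with mpich's mpicc compiler wrapper and launched with mpirun, one copy of the
program runs on each node while the library carries the messages among them.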


One of Avalon’s first tasks was to simulate a shock wave going through 60 million
atoms. The simulation ran for more than 300 hours calculating at an average 10-GFLOPS
rate.
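

Those figures imply a hefty total workload: as rough arithmetic, 300 hours at 10
billion operations per second works out to 300 x 3,600 x 10 billion, or roughly
1.1 x 10^16 floating-point operations over the course of the run.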


Next on Avalon’s agenda: simulation of galaxies.  


Although the Avalon may be one of the world’s
fastest supercomputers, it is not the first homegrown one.


Thomas Sterling conceived the idea in 1994 while working at
NASA’s Goddard Space Flight Center in Greenbelt, Md. His system, called Beowulf,
boasted many of the same price and performance advantages as the Avalon.


“Beowulf was also created out of necessity because, six or
seven years ago, supercomputer companies that solved problems associated with nuclear
weapons and arsenals began going out of business,” Sterling said. “They began to
go under in part because of their multimillion-dollar prices and in part because of the
highly specialized uses.”


The Beowulf project simply interconnected CPUs built from
mass-market components and ran free software. A Beowulf supercomputer has minimal
requirements:


With Sterling’s creation of the first Beowulf system, of
which Avalon is the latest incarnation, the mold was cast. In 1996, Los Alamos National
Laboratory’s Michael Warren and scientists in the Theoretical Astrophysics Group
faced a situation similar to Sterling’s. They opted to build their own computer from
off-the-shelf parts. Called Loki, it cost only about $55,000 to construct.


Loki is a 16-node parallel machine with 2G of RAM—128M per
node—and 50G of disk storage. Each node has a 200-MHz Pentium Pro CPU with 256K
integrated Level 2 cache, a 3.2G hard drive and a Fast Ethernet adapter.


“Most scientists just want to work their problems, not
develop computers,” Sterling said. “But many of them who need big power and
number-crunching ability are going out on their own and improvising.”


Sterling, who now works at the California Institute of Technology
in Pasadena, Calif., said, “In the long run, it makes more sense to have your own
machine than to have supercomputer access for only two hours per week—often in the
middle of the night—to run big problems.”


Machines such as Avalon are best for preliminary problems and
number crunching but are slow compared with the state-of-the-art machines at many
national laboratories, Sterling and Warren agreed.


“Machines like the Avalon do one thing really great: They
help me solve more problems faster,” Warren said. 


—Jonathan Ewing
