What coders can learn from supercomputing
- By Joab Jackson
- Jan 03, 2008
COULD SOFTWARE PROGRAMMERS TAKE ADVANTAGE of techniques first developed in the high-performance computing community? For some time, that community has had to divide its programs across hundreds or even thousands of processors in large clusters.
The approach of choice has been the Message Passing Interface (MPI), an open-source library of language bindings that programs use to coordinate tasks among processors.
'MPI is an interface for the hardware to pass [intra-program] messages around,' said Matthias Gobbert, a mathematics professor at the University of Maryland, Baltimore County, and an administrator for UMBC's Center for Interdisciplinary Research and Consulting. His comments came during an on-campus talk about MPI. Volunteers originally developed MPI, and the Defense Advanced Research Projects Agency and the National Science Foundation funded later work.
In an ideal world, a program designed to run on two processors would run twice as fast as it does on one, and on three processors, three times as fast. However, programs designed for concurrent execution rarely hit such benchmarks.
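One textbook reason, known as Amdahl's law, can be sketched in a few lines: if any fraction of a program must run serially, that fraction caps the speedup no matter how many processors are added. The 90-percent-parallel figure below is an arbitrary choice for illustration:

```python
def speedup(n_procs, parallel_fraction):
    """Amdahl's law: predicted speedup on n_procs processors when
    only parallel_fraction of the program can run concurrently."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_procs)

# A perfectly parallel program doubles with two processors...
print(round(speedup(2, 1.0), 2))   # 2.0
# ...but with just 10 percent serial work, 32 processors
# deliver well under a 32-fold improvement.
print(round(speedup(32, 0.9), 2))  # 7.8
```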
On the hardware side, Gobbert noted, performance might be hampered by how fast the network fetches and retrieves data, and by processors competing for shared resources such as memory and communication buses. Gobbert said he has seen cases where a 32-processor system, with two processors per node, barely outperforms a system made up of 16 single-processor nodes. The dual-processor systems suffered when running a large job with frequent calls to memory, where large amounts of data were stored. The two processors sharing a node competed so much for memory resources that the improvements were minimal over the system with half as many processors.
MPI was designed to help the programmer reduce the amount of inter-processor communication that a program undertakes. 'If you can get the number of program communications down, you can get the best performance,' Gobbert said. There are two versions of MPI: MPI-1.2, which has been widely implemented, and the newer, though less widely used, MPI-2.1.
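Gobbert's point about reducing communication can be seen in a toy message count. To sum data spread across processors, having each processor combine its share locally and send one partial result generates far less traffic than shipping every raw value to the head node; the counts below are a simplified model that ignores message size and latency:

```python
def messages_all_to_head(n_items, n_procs):
    # Naive strategy: every raw value is shipped to the head node,
    # so message traffic grows with the size of the data.
    return n_items

def messages_partial_results(n_items, n_procs):
    # Each processor sums its share locally and sends one partial
    # result, so traffic grows only with the processor count.
    return n_procs - 1

print(messages_all_to_head(1_000_000, 32))      # 1000000
print(messages_partial_results(1_000_000, 32))  # 31
```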
'MPI actually forces you to make decisions,' Gobbert said. Many older vector-based and shared-memory computer systems divided the work automatically. However, behind the scenes, the operating system furiously worked to eliminate cache incoherence or update all the different copies of a piece of data across the system when a change was made. Doing all this work slowed performance. MPI allows programmers to boost performance by identifying the particular parts of a program that can best be executed simultaneously on different machines.
Of course, the problem is the programmer must know how to do this.
One aspect of MPI is that it doesn't greatly alter a programmer's environment. The library is primarily a set of bindings, available for Fortran, C and C++ applications, among others. The program code remains a single file, even if different processes are carved off for different processors to tackle.
With MPI, the programmer does not need to know in advance how many processors the program will run on. The programmer defines that number as a variable and lets the program obtain it from the operating system at runtime. 'The program says split [the work] across all processors,' Gobbert said. Each MPI-driven program has a head node, which assigns work to the other nodes and coordinates the results when they are returned.
Once the programmer is finished, the application is compiled with an MPI wrapper that runs in conjunction with the compiler for the native language of the program itself. Many Linux distributions offer mpicc and mpiCC, MPI wrappers for C and C++ compilers, respectively. The MPI commands are handled by the wrapper while the non-MPI aspects are handled by the native compiler.
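On a system with an MPI implementation such as MPICH or Open MPI installed, the compile-and-run cycle typically looks like the following; the file name and process count are placeholders:

```shell
# mpicc wraps the native C compiler, adding MPI's headers and libraries
mpicc -o hello hello.c

# launch the compiled program as four cooperating MPI processes
mpirun -np 4 ./hello
```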
MPI is not a cure-all for concurrent programming, Gobbert said.
How the tasks of a program are divided up is where the genius resides in any parallel program, and it can make all the difference in how well the program performs. This discipline requires plenty of smarts and practice. 'You have to turn around your thinking sometimes, compared to serial thinking,' Gobbert said.
For a Web site of MPI resources, maintained by Argonne National Laboratory, go to GCN.com/890.