Does parallel processing require new languages?
Now that almost all new servers and computers run processors with multiple cores, the software-design community is trying to figure out the best way to make use of this architecture. Unfortunately, the community is divided about what the best way would be to split programs across these multiple cores.
Getting the full benefit of multicore processors can be tricky because, for a program to make use of more than one core, it must divvy up its workload in such a way that the division doesn't cost more effort than the gains achieved by adding more cores. Most programming languages were written on the assumption that just one processor would be working through the code sequentially, line by line.
"The challenge is that we have not, in general, designed our applications to express parallelism. We haven't designed programming languages to make that easy," said James Reinders, who works in Intel's software-products division and is the author of a book on parallel programming titled "Intel Threading Building Blocks."
Parallel programming requires attention to two areas, Reinders explained. One is "decomposing" the problem in such a way that it can be run in multiple chunks, side by side. "The art of figuring out how to divide things up so that they are independent is a big challenge," he said. One operation can't be dependent on another operation that hasn't been completed yet.
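To make the idea of decomposition concrete, here is a minimal Python sketch. It is an illustration only, not something Reinders presented: the function names split_into_chunks, sum_of_squares and parallel_sum_of_squares are hypothetical, and the sum-of-squares workload is an assumption chosen because each chunk can be computed without looking at any other chunk.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, non-overlapping slices."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size = -(-len(items) // n_chunks)  # ceiling division
    if size == 0:
        return []  # nothing to split
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """Work on one chunk; it reads only its own slice and shares no state."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_chunks):
    """Hypothetical example: decompose, compute chunks side by side, combine."""
    chunks = split_into_chunks(items, n_chunks)
    if not chunks:
        return 0
    # The chunks are independent, so the order in which they run does not matter.
    with ThreadPoolExecutor() as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    # The only dependency is this final combine step, which needs every partial sum.
    return sum(partial_sums)
```

The threads here show the structure of the decomposition rather than a speedup: in standard CPython builds, CPU-bound threads do not run Python code simultaneously, so a process pool would be the usual choice for real gains.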
The second area requiring attention is that of scalability. The programmer does not know how many processors his or her creation will run on, just that it should run on as many processors as are available for the task. If the code specifies how many processors are being used, then it is badly written code, Reinders said.
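A short Python sketch of that scalability point follows. It is an assumption-laden illustration, not code from Intel: available_workers and scalable_map are hypothetical names, and the idea is simply that the program asks the runtime how many CPUs it can see instead of carrying a fixed number in the source.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def available_workers():
    """Ask the runtime how many CPUs it sees instead of assuming a number."""
    return os.cpu_count() or 1  # os.cpu_count() may return None


def scalable_map(func, items, workers=None):
    """Apply func to every item with as many workers as the machine offers."""
    items = list(items)
    if not items:
        return []
    n = workers if workers is not None else available_workers()
    n = max(1, min(n, len(items)))  # never more workers than work items
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(func, items))
```

The same call, such as scalable_map(len, ["a", "bb", "ccc"]), runs unchanged on a dual-core laptop or a many-core server; only the value returned by available_workers differs.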
The Defense Advanced Research Projects Agency (DARPA) has been working on the issue through its High Productivity Computing Systems (HPCS) program, at least for what is called coarse-grained parallelism, or programs that run across many processors. It has funded the development of a number of new languages that developers could use to write such programs.
DARPA's new languages use a model called the Partitioned Global Address Space. PGAS does two things: It allows multiple processors to share a global pool of memory, but at the same time it allows the programmer to keep individual threads in specified logical partitions so they will be as close to the data as possible, thereby taking advantage of the speed boost brought about by "locality," as this is called.
"This is an attempt to get the best out of both worlds," explained Tarek El-Ghazawi at a PGAS Birds-of-a-Feather session at the SC08 conference, held in Austin, Texas, last winter. El-Ghazawi is a George Washington University computer science professor who has helped guide the development of PGAS.
"The idea is to have multiple threads, concurrent threads…all seeing one big flat space. But in addition, the threads would be locality-aware, and you as a programmer would know what parts are local and what parts are not," he said.
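The model El-Ghazawi describes can be pictured with a toy Python class. This is a sketch under stated assumptions, not the syntax of UPC, Chapel or X10: the class PartitionedArray and its methods owner, is_local and local_indices are hypothetical, and the even block distribution of indices across threads is one possible layout chosen for simplicity.

```python
class PartitionedArray:
    """Toy model of a PGAS array: one global index space, block-distributed."""

    def __init__(self, size, n_threads):
        if size < 1 or n_threads < 1:
            raise ValueError("size and n_threads must be positive")
        self.size = size
        self.n_threads = n_threads
        self.block = -(-size // n_threads)  # ceiling division: elements per thread
        self.data = [0] * size  # the single flat space every thread can see

    def owner(self, index):
        """Return the thread whose partition holds this global index."""
        if not 0 <= index < self.size:
            raise IndexError(index)
        return index // self.block

    def is_local(self, index, thread_id):
        """True when thread_id can reach this element without leaving its partition."""
        return self.owner(index) == thread_id

    def local_indices(self, thread_id):
        """Global indices that live in thread_id's own partition."""
        start = thread_id * self.block
        return range(min(start, self.size), min(start + self.block, self.size))
```

Any thread may read or write any element of data, which is the "one big flat space"; the owner and is_local calls are what let the programmer see which accesses are local and arrange the work so that most of them are.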
One DARPA language created under this model is Chapel, which is being developed by Cray. Chapel was designed to "reduce the gap between parallel languages and the mainstream" languages, said Cray engineer Brad Chamberlain.
IBM is creating another DARPA-funded language called X10, which can run on a Java Virtual Machine, making it usable across multiple platforms. Again the focus is on familiarity. The plan was to "start with a sequential programming model that works" and add more elements of concurrency and distribution, explained IBM researcher Vijay Saraswat.
But is it really necessary to develop entirely new languages? Reinders argues that extending commonly used languages, rather than building parallel-specific languages anew, would better suit programmers' needs.
"It is an interesting thought exercise to ask if we were to start from scratch to build the perfect parallel programming language, what would we do? X10 and Chapel are very interesting projects and are very exciting but I don't see them catching on in any big way," he said. Why? They are too radically different from the programming languages most coders are used to. They would be too difficult to learn.
Look back over the last decade, Reinders urges. The languages that caught on, such as Java and C#, were not that different from languages that were widely used at the time, such as C++ or Visual Basic. "They felt familiar" and so it was easy for programmers to adopt them. Hence their success.
Likewise, any move into the exciting world of parallel programming will follow the easiest path forward.
"People with legacy code need tools that have strong attention to the languages they've written and give them an incremental approach to add parallelism," Reinders said. If languages like X10 and Chapel do turn out to be popular, their advancements will be integrated into more popular languages.
Not surprisingly, Intel itself has taken this approach. It has developed an extension to C++ called Threading Building Blocks (TBB).
To build TBB, Intel developers rewrote those aspects of C++ that might lead to unpredictable results when used in a multiprocessor or multicore environment, such as memory management.
To use TBB, developers just include the TBB header files in their code and link against the TBB library, and the TBB functions will be built into the program. Intel itself offers an extension to Visual Studio, called Intel Parallel Studio, that supports TBB. Using TBB, programmers don't even have to worry about whether they are writing for multiple processors or for multicore processors.
Reinders offered an example of how a TBB-enhanced C++ app could work. Say a program is running across all four cores of a quad-core processor. When another program, such as a virus checker, is loaded onto one of the cores, the piece of the program running on that core slows down, which in turn slows the entire program. TBB would automatically detect that drop in performance and move that portion of the program off that core and onto the other three.
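The rebalancing Reinders describes can be illustrated with a small Python sketch. It is not TBB code and does not reproduce TBB's scheduler, which relies on work stealing; it substitutes a simpler shared queue from which each worker pulls its next task whenever it is free. The function name run_with_dynamic_balancing, the delays parameter and the squaring workload are all hypothetical, and a sleep stands in for a core that is busy with another program.

```python
import queue
import threading
import time


def run_with_dynamic_balancing(tasks, delays):
    """Run tasks on len(delays) workers; delays[i] simulates a busy core i."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = []
    counts = [0] * len(delays)  # how many tasks each worker completed
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                task = pending.get_nowait()  # take the next unit of work
            except queue.Empty:
                return  # nothing left to do
            time.sleep(delays[worker_id])  # a contended core takes longer
            value = task * task  # stand-in for real work
            with lock:
                results.append(value)
                counts[worker_id] += 1

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(len(delays))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results), counts
```

Calling run_with_dynamic_balancing(range(40), [0.0, 0.0, 0.0, 0.02]) models four cores with the last one slowed. Because work is handed out in small pieces rather than in four fixed quarters, the slowed worker typically ends up with only a few of the tasks while the other three absorb the rest, although the exact split varies from run to run.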
TBB is not the only parallel-focused extension to popular languages. At that same SC08 PGAS session, other researchers showed how they were extending popular languages for parallel duties. For instance, GWU researcher Bill Carlson is developing Unified Parallel C (UPC), an extension of the C programming language for parallel environments. Over at the University of California, Berkeley, work is being done on a dialect of Java called Titanium. Elsewhere, John Mellor-Crummey presented his work on Co-Array Fortran, an extension of Fortran 95 that is also being prepared for the next version of that language.
Whatever the approach, the goal of writing programs that can run concurrently on several processors or processor cores remains elusive. "Concurrency is complicated. As an industry, we are still coming to grips with the programming model," said Sun Microsystems engineer Brian Goetz at a talk about processors at the JavaOne conference being held this week in San Francisco. Are small changes or big changes needed?
"I'm skeptical of people who say we have to throw everything out about computing and start from scratch. We clearly don't have to do that – it's very expensive to do," Goetz said. "I think there is an incremental path to get there, but I do think we need to change the way we think."
Posted by Joab Jackson on Jun 05, 2009 at 9:39 AM