Multiple-choice test

What do supercomputers and your new desktop computer have in common? More than you'd think these days.

The thing that makes supercomputers super is that they run so many processors concurrently, breaking big and complex problems into small, simple ones that each single processor can handle. Now that approach is hitting home: If you've purchased a new desktop computer in the past year or so, odds are it runs on a central processing unit with two or more cores, which also means the processor can run more than one program thread at the same time.

If only someone could teach it how.

Multicore chips can run programs faster, but those programs must be written in a way that lets them take advantage of the processor's ability to run in parallel.

'Applications will increasingly need to be concurrent if they want to fully exploit CPU throughput gains,' said Herb Sutter, software architect at Microsoft and chairman of the International Organization for Standardization's C++ Standards Committee.

There are considerable challenges to creating and coordinating multiple software threads without tangling their results. Luckily, years of lessons from highly concurrent supercomputers can help solve those problems.

Parallel process

At first, having multiple cores on a chip sounds wonderful. It's like having multiple workers on the same task, right? Sort of.

It all depends on how you split the task. On a given problem, core A might need to wait around for the result of core B's work. Or core A and core B could be trying to work on the same data at the same time. Clearly, the problem with using multiple cores is similar to the problem of using multiple workers: You must handle them wisely.
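The "handle them wisely" part usually comes down to synchronization. A minimal C++ sketch of the idea, using the standard library's mutex to keep two workers from stepping on the same data (the names here are illustrative, not from any product mentioned in the article):

```cpp
#include <mutex>
#include <thread>

// Shared tally that two "workers" (think: two cores) update concurrently.
// Without the mutex, both threads could read-modify-write the same value
// at once and lose updates.
long tally = 0;
std::mutex tally_lock;

void worker(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> guard(tally_lock);  // one worker at a time
        ++tally;
    }
}

// Run two workers to completion and return the final tally.
long run_two_workers(int iterations) {
    tally = 0;
    std::thread a(worker, iterations);
    std::thread b(worker, iterations);
    a.join();
    b.join();
    return tally;  // always 2 * iterations, because updates are serialized
}
```

Because every increment happens under the lock, the result is the same on every run, whatever the cores are doing.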

Software developers and those who manage the development process must learn new skills.

James Reinders, director of marketing and sales at Intel, said certain fundamental concepts are essential to developing software for multicore chips: scalability, correctness and maintainability.

Users have instinctive ideas about how scalability should work. A program should run faster on two cores than it does on just one, faster on four cores than on two, and so forth.

That might not always work out exactly, of course. If a program does just one thing at a time, running it on two cores isn't going to help much. The same can be true of a naturally two-thread program that you try running on, say, four cores. 'You're getting diminishing returns,' said Margaret Lewis, director of commercial solutions at Advanced Micro Devices.
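The diminishing returns Lewis describes can be estimated with Amdahl's law, a classic formula the article doesn't name but which captures the point: if only a fraction p of a program can run in parallel, the best possible speedup on n cores is 1 / ((1 - p) + p / n).

```cpp
// Amdahl's law: the serial fraction (1 - p) caps the speedup no matter
// how many cores you add.
double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}
```

For a program that is 50 percent parallel, two cores give about a 1.33x speedup, four cores only about 1.6x, and even infinite cores could never exceed 2x.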

Some techniques are also more scalable than others. Intel has a library of template building blocks to extend C and C++, making it easier to take advantage of multicore chips.

Intel has made this library open source through the Threading Building Blocks Open Source project.
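The core idea behind a library like Threading Building Blocks can be sketched in plain standard C++: split an index range into chunks and hand each chunk to a thread. This is only a minimal illustration; TBB's real parallel_for adds work stealing and automatic grain sizing on top of this pattern.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// A hand-rolled parallel loop: each thread squares its own chunk of the
// vector. The elements are independent, so no locking is needed.
void parallel_square(std::vector<int>& data, unsigned num_threads) {
    std::vector<std::thread> pool;
    std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(data.size(), begin + chunk);
        if (begin >= end) break;
        pool.emplace_back([&data, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                data[i] *= data[i];  // independent work per element
        });
    }
    for (auto& th : pool) th.join();
}
```

A template library hides exactly this boilerplate, which is why developers can stay in C++ and still exploit the extra cores.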

That brings up the question of what the optimum number of processors is for a given program.

'Determining this is hard to do,' Reinders said. Experts can make an educated guess, and there are tools that try to spot opportunities for concurrency.

One type of concurrency is called data parallelism: routinely doing the same thing with different pieces of data. Examples of data parallelism can be found in graphics, games, physics, video and data mining programs.

Such tasks are what the experts call embarrassingly parallel and are the best opportunities for making performance gains in an application.

Many developers are already experienced in programming for a CPU and a graphics processor: That separation of independent processes is what you're looking for.

There's also task parallelism: performing multiple steps on each data element. An example of task parallelism is handling airline reservations.

One step involves checking user identification, another checks for a seat on the flight, and a third step might involve charging the cost to a credit card.

Even if tasks aren't embarrassingly parallel, the goal is to break processing into pieces that don't depend on one another. 'This is the lowest-hanging fruit, the easiest place to get performance gains,' Lewis said. Of course, there are always some tasks you can't decompose and which won't benefit from concurrency.
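The reservation example can be sketched with C++'s std::async: the independent steps run as concurrent tasks, while the dependent step waits for their results. The function names here are hypothetical stand-ins, not a real reservation API.

```cpp
#include <future>
#include <string>

// Hypothetical reservation steps; the logic is placeholder.
bool check_id(const std::string& user)    { return !user.empty(); }
bool find_seat(const std::string& flight) { return flight == "GCN101"; }
bool charge_card(double fare)             { return fare > 0; }

// Identity and seat checks don't depend on each other, so they run as
// concurrent tasks. Charging the card depends on both, so it waits.
bool book(const std::string& user, const std::string& flight, double fare) {
    auto id_ok   = std::async(std::launch::async, check_id, user);
    auto seat_ok = std::async(std::launch::async, find_seat, flight);
    if (!id_ok.get() || !seat_ok.get())  // join both tasks
        return false;
    return charge_card(fare);            // dependent final step
}
```

Decomposing the job this way is exactly the "pieces that don't depend on one another" Lewis describes.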

Parallel correctness

The second fundamental concept is correctness.

Multicore chips are doing more than one thing at a time, which can result in new kinds of errors called race conditions. The problem is that a program can become nondeterministic: It runs differently each time. This happens when core X depends on a result of core Y's processing. If Y finishes in time, all is well: X gets the information it needs and proceeds. If Y doesn't finish in time, but X thinks it did, X may proceed using incorrect information.
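A race condition is easy to create deliberately. In this sketch, two threads increment a shared counter with no synchronization; each increment is really a read-modify-write, so interleaved updates can be lost and the final count can differ from run to run:

```cpp
#include <thread>

// Deliberately unsynchronized counter -- this is the bug, not the fix.
int unsafe_count = 0;

void racy_worker(int iterations) {
    for (int i = 0; i < iterations; ++i)
        ++unsafe_count;  // no lock: a classic race condition
}

// Returns a nondeterministic result: often less than 2 * iterations,
// and potentially different every time the program runs.
int race(int iterations) {
    unsafe_count = 0;
    std::thread x(racy_worker, iterations);
    std::thread y(racy_worker, iterations);
    x.join();
    y.join();
    return unsafe_count;
}
```

Whether updates are actually lost depends on timing and hardware, which is precisely what makes such bugs hard to reproduce.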

'Such timing-dependent problems are difficult to track down,' Reinders said.

If developers aren't aware of the potential problems with multicore chips, two common results can occur. First, the application might never be shipped because it is obviously unstable. Alternatively, the application might seem stable, get shipped and cause strange problems for users.

Developers must accept that this can happen and work to reduce the likelihood. 'Developers must write code deliberately to prevent such problems and test deliberately to expose them,' Reinders said.

The third fundamental concept is maintainability, which Reinders also calls futureproofing.

The goal is to not be surprised by the chips to come, with four, eight, 16 or however many cores on them. And it's not just the number of cores, Reinders said. 'Future chips might have different types of cores on the same chip.'

How can developers plan to deal with multicore chips of the future? One key is to develop at the right level of abstraction and stay flexible.

Don't divide tasks and assign them irrevocably to specific threads. Instead, let the compiler handle these decisions so that the behavior isn't determined until runtime.
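One concrete form of that flexibility is refusing to hardcode a thread count. A minimal sketch: ask the runtime how much hardware parallelism exists and size the work accordingly, so the same code scales on tomorrow's eight- or 16-core chips.

```cpp
#include <thread>

// Futureproofing: query the hardware at runtime instead of baking in
// "two threads" because today's chip happens to have two cores.
unsigned choose_worker_count() {
    unsigned hw = std::thread::hardware_concurrency();
    return hw == 0 ? 1 : hw;  // 0 means "unknown"; fall back to one worker
}
```

A task library or compiler runtime makes the same decision at a higher level, distributing work across however many cores it finds.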

Concurrent events

Luckily, developers don't have to perform all this complicated analysis on their own. Tools are available to support three major classes of development efforts: programming, debugging and performance.

Programming tools include coding languages, associated libraries and compilers. For example, Intel offers libraries of basic algorithms in categories such as mathematics and multimedia. These libraries support development in C, C++ and Fortran, plus their extensions. By using these libraries, developers can stick with languages they know and still write code suitable for multicore chips.

Many major commercial software vendors make use of these libraries, including Oracle, Adobe, SAS and SAP.

Compilers can do a lot of the heavy lifting.

They can recognize data that's going through similar steps and perform automatic vectorization and parallelization. Well-known compilers include the free GCC from the GNU project and those from Sun Microsystems and the Portland Group.

Debugging tools help developers identify and localize bugs that are peculiar to multithreaded applications. By using these tools, developers can find the strange timing problems that can arise, including race conditions and deadlock, when two or more threads come to a halt, each waiting for the other to proceed.
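Deadlock typically arises when thread 1 locks mutex A and waits for B while thread 2 locks B and waits for A. A minimal sketch of the standard C++17 remedy, which acquires both locks atomically so neither thread can hold one and block on the other:

```cpp
#include <mutex>
#include <thread>

std::mutex lock_a, lock_b;

// std::scoped_lock (C++17) locks both mutexes with a deadlock-avoidance
// algorithm; locking them one at a time in different orders in two
// threads is how the deadlock the article describes occurs.
void transfer_safely(int& from, int& to, int amount) {
    std::scoped_lock both(lock_a, lock_b);
    from -= amount;
    to += amount;
}

// Two threads transfer in opposite directions; with scoped_lock they
// never deadlock, and the total is conserved.
int run_transfers() {
    int a = 100, b = 100;
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer_safely(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer_safely(b, a, 1); });
    t1.join();
    t2.join();
    return a + b;  // always 200
}
```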

The main attraction of multicore chips is the opportunity for performance gains, so it's not surprising that performance monitors, or profilers, are important. Such tools must be nonintrusive so they don't affect the processing that they're monitoring. The most common outcome of monitoring is that developers learn some cores are busily working while others are doing nothing whatsoever. Fixing such obvious problems can yield significant performance gains and justify use of the tool. Only rarely are these tools used to tweak the last few percentage gains in processing.

'Parallel programming is going to require better programming tools to systematically find defects, help debug programs, find performance bottlenecks and aid in testing,' Sutter said. You must look at the big software development picture. You can't develop deliberately parallel applications and expect to test and integrate them the same old way.

Testing must aim to expose problems with correctness, for example.

This could mean testing under unusual conditions of data flow and intentionally messing with timing. Performing tests on different hardware platforms can be valuable because race conditions manifest themselves differently on different systems.
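One simple way to mess with timing intentionally is to inject short random delays into worker threads, so thread interleavings vary between test runs and latent races are more likely to surface. A sketch of such a helper:

```cpp
#include <chrono>
#include <random>
#include <thread>

// Test-only jitter: sleep for a random sliver of time so that each run
// exercises different thread interleavings. Sprinkling calls to this
// inside workers during stress testing can flush out race conditions.
void jitter() {
    static thread_local std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<int> micros(0, 500);
    std::this_thread::sleep_for(std::chrono::microseconds(micros(rng)));
}
```

Dedicated race-detection tools do this far more systematically, but the principle is the same: vary the timing until the bug shows itself.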

Integration can require special planning, too.

Are you integrating parallelization with single-threaded software? Be on the lookout for odd results if single-threaded applications don't get the information they expect, when they expect it.

Balance of power

The similarities between developing for multiprocessor systems and multicore chips mean you can draw on previous work. This can include literature, training, tools and personnel.

Make good use of resources already available for other purposes.

Using multicore chips to speed performance is usually desirable, but that's not the only approach or even necessarily the best one. After all, 'if it ain't broke, don't fix it' applies here, too. Rewriting some existing applications to take advantage of multicore performance gains could cause more trouble than it's worth.

Alternatively, Reinders suggests using concurrency to add new features to existing applications.

For example, you might want to improve a user interface with sound or video elements or better graphics and nicer fonts. Security improvements, such as additional encryption and decryption, are another possibility.

These features might have been beyond the ability of earlier processors, but assigning these enhancing, nonessential tasks to idle threads could make sense now.

Virtualization is another way to take advantage of multiple processing threads without major programming. 'Virtual servers are inherently parallel,' Lewis said. Assigning tasks to virtual servers may be more of an architecture or implementation decision than a development decision, but virtual servers can produce considerable use of multiple threads.

This is especially useful for the older code that government agencies often must deal with. Without significant rewriting, such older applications can run on parallel-savvy virtual servers and reap the advantages of multicore chips. For many applications, this is the safest approach.

Awareness of the possibilities, and the challenges, of multicore chips is essential. As multicore chips become more common, so will the tools and solutions others build around them. Developers can glean ideas for their own development projects from the good examples. Soon, development for multicore chips will be as common as developing for single-core ones used to be.

DeJesus (exdejesus@gmail.com) is a freelance technical writer.
