Best practices for getting Java to work on multicore processors
SAN FRANCISCO — The method Java offers programmers to develop applications that run on multicore processors is tricky and can easily lead to error, said Jonas Bonér of Swedish consulting firm Crisp AB. Fortunately, he offered some alternative ways of writing Java programs to avoid such pitfalls.
Today, most new computers have two or four processor cores, and the number of cores on each chip will continue to increase in the years to come. As a result, there is an emerging crisis in how software is written for such processors, Bonér said during a session at Sun Microsystems’ JavaOne conference in San Francisco.
Java programs can be engineered to run across multiple cores, but the most common method Java programmers have for doing that, namely keeping track of individual threads and data locks, is confusing and not foolproof. A programmer cannot always know the order in which a multicore processor will run threads within a program or across multiple programs, and different orderings can yield different results. This is called indeterminacy.
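To see indeterminacy in miniature, the following sketch has two threads add to a shared counter. The class and method names are illustrative, not from Bonér's talk. The statement `count++` is really a read, an add and a write, so two threads can read the same value and one update is lost; how many are lost depends on timing. The version using the standard library's `AtomicInteger` performs each increment as one indivisible step and always reaches the expected total.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Two threads add to a shared counter. Class and method names are illustrative.
class RaceDemo {
    static final int INCREMENTS = 100_000;

    // Plain field: unsafeCount++ is a separate read, add and write,
    // so concurrent increments can overwrite each other.
    static int unsafeCount = 0;

    // AtomicInteger performs each increment as one indivisible step.
    static final AtomicInteger safeCount = new AtomicInteger();

    static int runUnsafe() throws InterruptedException {
        unsafeCount = 0;
        runOnTwoThreads(() -> {
            for (int i = 0; i < INCREMENTS; i++) {
                unsafeCount++; // updates can be lost here
            }
        });
        return unsafeCount;
    }

    static int runSafe() throws InterruptedException {
        safeCount.set(0);
        runOnTwoThreads(() -> {
            for (int i = 0; i < INCREMENTS; i++) {
                safeCount.incrementAndGet();
            }
        });
        return safeCount.get();
    }

    // Start two threads running the same task and wait for both to finish.
    private static void runOnTwoThreads(Runnable task) throws InterruptedException {
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unsafe: " + runUnsafe()); // often below 200000, varies by run
        System.out.println("safe:   " + runSafe());   // always 200000
    }
}
```

On a multicore machine the unsafe total typically differs from run to run, although no particular run is guaranteed to lose updates.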
A simple example is the software that runs an automated teller machine (ATM). A customer might wish to withdraw some money and then get a receipt that reflects the new account balance. If the thread that checks the balance runs before the one that withdraws the money, the receipt would show the old balance rather than the new one.
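One way to rule out the stale receipt is to make the withdrawal and the balance read a single atomic step. The sketch below uses hypothetical `Account` and `Atm` classes, not real banking code. Java's built-in `synchronized` keyword ensures that only one thread at a time runs a synchronized method on a given account, and the receipt is printed from the value the withdrawal itself returns. Amounts are kept in cents as `long` to avoid floating-point rounding.

```java
// Hypothetical account class, for illustration only.
class Account {
    private long balanceCents;

    Account(long openingBalanceCents) {
        this.balanceCents = openingBalanceCents;
    }

    // Withdraws and returns the resulting balance in one synchronized step,
    // so no other thread can observe or change the balance in between.
    synchronized long withdrawAndGetBalance(long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        if (amountCents > balanceCents) {
            throw new IllegalStateException("insufficient funds");
        }
        balanceCents -= amountCents;
        return balanceCents;
    }

    synchronized long balance() {
        return balanceCents;
    }
}

// Hypothetical ATM front end.
class Atm {
    // The receipt uses the balance returned by the withdrawal itself,
    // not a separate balance check that might run before it.
    static String withdrawWithReceipt(Account account, long amountCents) {
        long newBalance = account.withdrawAndGetBalance(amountCents);
        return String.format("Withdrew %d.%02d; balance %d.%02d",
                amountCents / 100, amountCents % 100,
                newBalance / 100, newBalance % 100);
    }
}
```

For example, withdrawing 2,500 cents from an account holding 10,000 yields the receipt `Withdrew 25.00; balance 75.00`.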
Another potential problem could occur when another party tries to deposit money into the user's account at the exact moment the user is trying to withdraw money. If each operation holds a lock on a resource that the other needs, a deadlock can occur: each waits for the other to release its lock, so neither can complete its action.
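A deadlock needs at least two locks, with each thread holding one and waiting for the other. A textbook Java case is two transfers running in opposite directions between the same pair of accounts. A common remedy, sketched here with hypothetical `BankAccount` and `Transfers` classes and the standard `ReentrantLock`, is to acquire locks in one global order; the sketch assumes every account has a unique `id` to order by.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account with its own lock and a unique id used for lock ordering.
class BankAccount {
    final long id;
    final ReentrantLock lock = new ReentrantLock();
    private long balance;

    BankAccount(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    // Caller must already hold this account's lock.
    void adjust(long delta) {
        balance += delta;
    }
}

class Transfers {
    // Locking "from" and then "to" would let transfer(a, b) and transfer(b, a)
    // each grab one lock and wait forever for the other. Always locking the
    // lower id first gives every thread the same order, so no cycle can form.
    static void transfer(BankAccount from, BankAccount to, long amount) {
        BankAccount first = from.id < to.id ? from : to;
        BankAccount second = from.id < to.id ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.adjust(-amount);
                to.adjust(amount);
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

With this ordering, two threads hammering `transfer(a, b, 1)` and `transfer(b, a, 1)` both finish, and the combined balance is unchanged.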
Of course, banking software has safeguards to avoid such errors. But other programs might not be so tightly written and could result in varying outcomes or unexpected deadlocks.
"Sequential code is deterministic. And when you throw in threads and locks, everything becomes indeterministic," Bonér said. "Threads and locks are extremely hard. I've been bitten so many times by this."
Bonér offered three approaches for grappling with the problem, and all of them seem to be in the early stages of adoption within the Java development community.
One is software transactional memory (STM), which works much like a transactional database. In a database, if two transactions conflict or deadlock, the database rolls back to an earlier state, and the parties try their actions again, perhaps on a staggered schedule. Each operation, such as withdrawing money and then checking the balance, is atomic: if the entire sequence isn't completed, the program returns to the state it was in before the transaction started.
Java itself does not offer STM, but another language that runs on the Java Virtual Machine, a dialect of Lisp called Clojure, does. "Using Clojure, unless the entire set of transactions are completed, [the program] throws an exception," Bonér said. Atomic operations can be constructed within Java, but doing so requires more expertise on the part of the programmer.
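Plain Java has no STM, but its `java.util.concurrent.atomic` package supports the optimistic pattern STM builds on: read a snapshot, compute a new one, and commit only if nobody else committed first, otherwise retry. The sketch below is a compare-and-set loop, not STM. It works only when all the shared state fits behind one `AtomicReference`, whereas STM coordinates many independent references. `Balances` and `Ledger` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot of two balances (hypothetical type).
final class Balances {
    final long checking;
    final long savings;

    Balances(long checking, long savings) {
        this.checking = checking;
        this.savings = savings;
    }
}

// Hypothetical ledger whose entire state sits behind one AtomicReference.
final class Ledger {
    private final AtomicReference<Balances> state;

    Ledger(long checking, long savings) {
        state = new AtomicReference<>(new Balances(checking, savings));
    }

    Balances snapshot() {
        return state.get();
    }

    // Moves money from checking to savings as one all-or-nothing update.
    // Each attempt builds a new snapshot from the one it read. If another
    // thread committed in the meantime, compareAndSet fails, the attempt
    // is discarded without ever being visible, and the loop retries.
    Balances moveToSavings(long amount) {
        while (true) {
            Balances before = state.get();
            if (before.checking < amount) {
                throw new IllegalStateException("insufficient funds");
            }
            Balances after = new Balances(before.checking - amount,
                                          before.savings + amount);
            if (state.compareAndSet(before, after)) {
                return after; // committed
            }
            // Conflict detected: loop and try again with fresh state.
        }
    }
}
```

Because each `Balances` object is immutable, no thread can ever see checking debited without savings credited.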
Another method is message-passing concurrency. In that approach, programs are broken down into individual lightweight processes that communicate with one another through messages. These components, also called actors, perform no concurrent operations internally; each handles its work sequentially.
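The actor idea can be approximated in plain Java with a thread and a `BlockingQueue` serving as its mailbox. In this sketch, the `AccountActor` class and its message types are hypothetical and are not the API of any actor framework. Only the actor's own thread ever touches the balance, so no locks are needed; other threads can only send it messages.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal actor sketch; class and message names are hypothetical.
// One worker thread owns the balance and handles mailbox messages one at a
// time, so the balance needs no locks.
class AccountActor {
    interface Message {}

    static final class Deposit implements Message {
        final long amount;
        Deposit(long amount) { this.amount = amount; }
    }

    static final class Withdraw implements Message {
        final long amount;
        Withdraw(long amount) { this.amount = amount; }
    }

    // Carries a future that the actor completes with the current balance.
    static final class GetBalance implements Message {
        final CompletableFuture<Long> reply = new CompletableFuture<>();
    }

    static final class Stop implements Message {}

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
    private final Thread worker;
    private long balance; // read and written only by the worker thread

    AccountActor(long openingBalance) {
        balance = openingBalance;
        worker = new Thread(this::loop);
        worker.setDaemon(true); // a forgotten actor will not keep the JVM alive
        worker.start();
    }

    // Other threads interact with the actor only by sending messages.
    void send(Message m) {
        mailbox.add(m);
    }

    // Asks the actor for its balance and waits for the reply.
    long balance() {
        GetBalance request = new GetBalance();
        send(request);
        return request.reply.join();
    }

    void stop() throws InterruptedException {
        send(new Stop());
        worker.join();
    }

    private void loop() {
        try {
            while (true) {
                Message m = mailbox.take(); // blocks until a message arrives
                if (m instanceof Deposit) {
                    balance += ((Deposit) m).amount;
                } else if (m instanceof Withdraw) {
                    long amount = ((Withdraw) m).amount;
                    if (amount <= balance) {
                        balance -= amount; // overdrafts are silently ignored in this sketch
                    }
                } else if (m instanceof GetBalance) {
                    ((GetBalance) m).reply.complete(balance);
                } else if (m instanceof Stop) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the mailbox is first-in, first-out, a balance request sent after a deposit always sees that deposit. Production actor frameworks typically schedule many actors over a small pool of threads rather than giving each actor a dedicated thread as this sketch does.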
That messaging approach "raises the abstraction level," Bonér said. "It is easier to reason about. It is easier to see where the problem might be."
As an example, he presented a program written in the Scala programming language that emulates a game of ping-pong. The game consists of two actors, or lightweight processes, and the sole function of each actor is to send the "ball" back to the other. Because the program is broken down into individual components that each handle one message at a time, the game proceeds in an orderly way.
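Bonér's demo was written in Scala; the following is a rough Java rendering of the same idea, not his code. Two threads act as players, each with its own mailbox, and the "ball" is an integer counting the hits left in the rally (the `PingPong` class and its methods are hypothetical). Only one message is in flight at a time, so the hits always alternate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Java sketch of a ping-pong exchange between two actor-like players.
// Each player owns a mailbox and only reacts to the next message in it.
class PingPong {
    // The "ball" carries the number of hits left; a value of 0 ends the rally.
    static List<String> play(int hits) throws InterruptedException {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        BlockingQueue<Integer> pingBox = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> pongBox = new LinkedBlockingQueue<>();

        Thread ping = new Thread(() -> player("ping", pingBox, pongBox, log));
        Thread pong = new Thread(() -> player("pong", pongBox, pingBox, log));
        ping.start();
        pong.start();

        pingBox.put(hits); // serve: hand the ball to "ping"
        ping.join();
        pong.join();
        return log;
    }

    // Take the ball, log a hit, and send it back with one fewer hit remaining.
    private static void player(String name, BlockingQueue<Integer> inbox,
                               BlockingQueue<Integer> outbox, List<String> log) {
        try {
            while (true) {
                int remaining = inbox.take();
                if (remaining == 0) {
                    outbox.put(0); // tell the other player the game is over
                    return;
                }
                log.add(name);
                outbox.put(remaining - 1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

For example, `PingPong.play(4)` returns `[ping, pong, ping, pong]` on every run.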
Java messaging frameworks that support this approach include Kilim, Jetlang and ActorFoundry.
The third approach, which Bonér sketched out but didn't discuss in great detail, is dataflow concurrency. It is completely data-driven, and "threads are blocked until data is available," he said. Rather than having a process plug values into a function as it goes along, the process doesn't start until all the values have been supplied. Answers cannot vary from run to run; a program can still deadlock if a value it waits for is never supplied, but it will then deadlock on every run rather than intermittently. "It is completely deterministic," he said.
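Java's standard `CompletableFuture` can stand in for a dataflow variable in a small sketch: it can be completed only once, and `join()` blocks until a value is available. This is an approximation chosen for illustration, not DataRush or any dataflow library, and `DataflowDemo` is a hypothetical name. Three threads are started in an arbitrary order, yet `run()` returns the same sum every time.

```java
import java.util.concurrent.CompletableFuture;

// Dataflow-style sketch: CompletableFuture used as a single-assignment
// variable. Readers block in join() until a value is bound; a second
// complete() call is ignored, so a variable never changes once set.
class DataflowDemo {
    static int run() {
        CompletableFuture<Integer> x = new CompletableFuture<>();
        CompletableFuture<Integer> y = new CompletableFuture<>();
        CompletableFuture<Integer> z = new CompletableFuture<>();

        // Consumer: blocks until both inputs are bound, then binds z.
        Thread adder = new Thread(() -> z.complete(x.join() + y.join()));
        // Producers may run in either order; z ends up the same either way.
        Thread bindY = new Thread(() -> y.complete(2));
        Thread bindX = new Thread(() -> x.complete(40));

        adder.start();
        bindY.start();
        bindX.start();

        return z.join(); // blocks until the adder has bound z
    }
}
```

The deadlock case is deterministic too: if the thread that binds `x` were removed, `z.join()` would block forever, and it would do so on every run rather than only occasionally.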
Pervasive Software offers a commercial implementation of the concept in its DataRush framework.
Joab Jackson is the senior technology editor for Government Computer News.