Garth Gibson | Faster storage systems through parallelism
- By Joab Jackson
- Jul 25, 2008
Garth Gibson has devoted his career to finding ways to use parallelism for faster and more reliable storage. As a graduate student at the University of California, Berkeley, Gibson was one of the inventors of the now-ubiquitous Redundant Arrays of Inexpensive Disks (RAID), a technique for spreading data over multiple disks so that if one disk fails, the data can be recovered from the other disks. More recently, he has been driving development of the Parallel Network File System protocol (pNFS), an effort to standardize parallel file access. Gibson is on the faculty of Carnegie Mellon University's computer science department and is cofounder and chief technical officer of Panasas, a company that offers clustered storage systems. He is also principal investigator at the Petascale Storage Institute.
GCN: Did you expect RAID to become as widely used as it has been?
Garth Gibson: Oh no. It started in 1986. I was a processor architect working on shared memory multiprocessors. We were projecting that CPUs would eventually get so fast that they really wouldn't matter at all. If we couldn't feed them data, the bottleneck would be in permanent storage.
GCN: So RAID addressed performance issues rather than reliability ones?
Gibson: Absolutely. I've always been primarily focused on how to process huge amounts of data faster. We were doing multicore parallel computing, so we already knew the answer to all performance issues was parallelism. So it was obvious to us to take multiple disks and stripe the data.
People would say, "If you replace one disk with 100 disks, then you would have 100 disks that could fail, and moreover these disks must be poor quality because they must be much less expensive. So you guys have a reliability problem." Trying to figure that out, we described the RAID taxonomy (RAID levels 0, 1, 2, 3, 4, 5) so we could understand how to address the performance and reliability issues in one concise, clearly understood way.
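The idea described above, striping data across disks and adding redundancy so a failed disk can be rebuilt, can be illustrated with a minimal sketch. This is not code from the interview or from the RAID paper: it assumes a single XOR parity block per stripe, in the spirit of RAID levels 4 and 5, and the function names `make_stripe` and `rebuild` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data, n_data_disks):
    """Split data into n_data_disks equal blocks and append one parity block.

    The data is zero-padded so every block has the same length.
    The returned list holds one block per disk; the last one is parity.
    """
    block_len = -(-len(data) // n_data_disks)  # ceiling division
    padded = data.ljust(block_len * n_data_disks, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len]
              for i in range(n_data_disks)]
    return blocks + [xor_blocks(blocks)]


def rebuild(stripe, failed_index):
    """Recompute the block of one failed disk by XOR-ing all surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe(b"parallel storage", 4)  # 4 data disks + 1 parity disk
    print(rebuild(stripe, 2) == stripe[2])
```

Because XOR is its own inverse, the same `rebuild` call recovers either a lost data block or a lost parity block, as long as only one disk in the stripe has failed.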
I thought it was too esoteric for the world. But a bunch of people in the database community read the paper and asked vendors for something like RAID. So by 1994, we had a fairly healthy industry.
GCN: Your work on pNFS is about to be included in Version 4.1 of NFS. How is this different from NFS?
Gibson: A plain old NFS server is on one point of the network. So when you mount a file system on your client, you give it one [network address]. So the [vendors], including Panasas, found various ways to stripe files across many servers across the network. What all these proprietary parallel file systems did was talk to multiple servers at the same time. And almost all of them did this by putting software on the client machine that would know how to talk to different back-end servers. NFS couldn't do this, because it had the concept of having one link into the server.
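As an illustration of what client software that knows how to talk to different back-end servers has to work out, here is a minimal sketch of round-robin file striping. It is not the pNFS layout mechanism or any vendor's implementation: the fixed stripe unit, the round-robin placement, the server names, and the functions `locate` and `plan_read` are all assumptions made for the example.

```python
def locate(offset, stripe_unit, servers):
    """Map a byte offset in a file to (server, byte offset on that server).

    Assumes stripe units are dealt out to the servers in round-robin order.
    """
    stripe_index = offset // stripe_unit
    server = servers[stripe_index % len(servers)]
    # Each server stores every len(servers)-th stripe unit back to back.
    server_offset = (stripe_index // len(servers)) * stripe_unit + offset % stripe_unit
    return server, server_offset


def plan_read(offset, length, stripe_unit, servers):
    """Split one logical read into per-server requests.

    Returns a list of (server, server_offset, n_bytes) tuples that a client
    could issue to the servers at the same time.
    """
    requests = []
    end = offset + length
    while offset < end:
        server, server_offset = locate(offset, stripe_unit, servers)
        # Never read past the end of the current stripe unit or of the request.
        chunk = min(stripe_unit - offset % stripe_unit, end - offset)
        requests.append((server, server_offset, chunk))
        offset += chunk
    return requests


if __name__ == "__main__":
    servers = ["s0", "s1", "s2"]  # hypothetical back-end servers
    for request in plan_read(100, 200, 128, servers):
        print(request)
```

A single 200-byte read that crosses stripe boundaries turns into three requests to three different servers, which is the parallelism a client with one link to one server cannot exploit.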
So around 2003, my biggest customer, Los Alamos National Laboratory, said what they would really like [is for the technology to be available] no matter what happens to any one company. So [the answer we came up with] is to standardize the technology so many companies could provide it. We proposed that the key central idea [underpinning] parallel file systems be put into standard NFS. So the NFS 4.1 working group started looking at how to use [Panasas' PanFS parallel file system] for NFS.
Right off the bat, storage vendors saw the value. The core team that is designing pNFS are [engineers] from NetApp, Panasas, IBM, EMC, and Sun Microsystems. That standard has been bubbling and boiling since mid-2004, and we have passed last call within the working group. When we're done crossing t's and dotting i's, we will forward that to the Internet Engineering Task Force's architecture committee. When it passes, it will be the first open-standard, industry-backed parallel file system.
GCN: How will Panasas work with pNFS?
Gibson: We will have products as soon as we can, and we intend to have the best parallel NFS around.
GCN: Recently Panasas introduced something called tiered parity. What is that?
Gibson: Drives have gotten a lot more dense in the past 10 years; there is a lot of data on them. It is not the catastrophic failure of the whole disk that is now the principal issue. The principal issue has become the failure of the individual sector: unrecoverable read errors. This was always the case, but it's been a fairly rare occurrence; maybe one read in 100 terabytes read would incur an error. And that was a lot of data 20 years ago.
The problem we have now is that disks are getting up to a terabyte in size. And you may have 10 of them in a RAID. When a disk fails, you need to reconstruct its contents and read all the surviving data. If you read 10 terabytes, then the probability of losing a sector starts to become real. If you run into one sector you can't read, the reconstruction fails. That's what's happening.
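The figures quoted above (roughly one unreadable sector per 100 terabytes read, and a rebuild that reads about 10 terabytes) allow a rough back-of-the-envelope estimate. The sketch below is an illustration, not Panasas' model: it assumes read errors are independent and arrive at a constant average rate, so the number of errors in a rebuild follows a Poisson distribution. The function name is hypothetical.

```python
import math


def p_rebuild_hits_unreadable_sector(tb_read, tb_per_error=100.0):
    """Probability of at least one unrecoverable read error during a rebuild.

    Assumes independent errors at an average of one per `tb_per_error`
    terabytes read, so P(at least one) = 1 - exp(-tb_read / tb_per_error).
    """
    return 1.0 - math.exp(-tb_read / tb_per_error)


if __name__ == "__main__":
    for tb in (1, 10, 50):
        p = p_rebuild_hits_unreadable_sector(tb)
        print(f"{tb:>3} TB read: {p:.1%} chance the rebuild hits a bad sector")
```

Under these assumptions, reading 10 terabytes gives roughly a 9.5 percent chance of hitting at least one unreadable sector, which is the sense in which the risk "starts to become real."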
What we at Panasas did is focus on media errors. We added a family of code on each disk that will catch the media errors and fix them before they get up to reconstruction. This is part of our Release 3.2 of the ActiveScale clustered file system, which is in beta.
GCN: What is happening at the Petascale Storage Institute?
Gibson: The Petascale Storage Institute is an Energy Department-funded activity to anticipate the problems that the largest installations will experience as they go from terascale to petascale to exascale[-sized systems]. Our job is to anticipate and help solve those problems.
We're gathering a lot of information about how real installations experience failures, how their file systems work, and what sorts of distributions of files they have on them, and we're projecting that forward.
Generally, scale is hard. When you have to do 10,000 things together, it is much harder [to manage] than when you have to do 10 things together. The speed of supercomputers is doubling every year. You need twice as much memory and disk space every year.
One of the key issues that has come forward is manageability, the total cost of ownership at scale. If the [storage pool] grows twice as wide, the operators do not want to hire twice as many people. They want the same number of people or maybe even fewer. What you need to have is abstractions of systems, ways to display what 10,000 things are doing that you can glance at and know what is going on.
So Panasas has gone pretty far down this path of giving a user interface that aggregates everything. And within the research world, we're trying to do more self-tuning and self-monitoring, looking at ideas of autonomic computing for storage.
Joab Jackson is the senior technology editor for Government Computer News.