Unix veteran keeps an eye on big data

John R. Mashey, chief scientist for graphics supercomputer maker SGI, describes himself as 'an ancient Unix person.' He started working on Unix at Bell Labs in 1973.

A decade later he moved to Convergent Technologies Inc. of San Jose, Calif., and in 1985 he joined Mips Technologies Inc., which SGI later acquired. As a troubleshooter and evangelist for SGI, he deals with engineering and marketing personnel and with customers in areas that cross divisions and disciplines.

Mashey has managed projects in technical and commercial computing, worked on the design of the Mips RISC architecture, and chaired technical conferences on operating systems and CPUs. He was one of the founders of the System Performance Evaluation Cooperative benchmarking group and has given more than 500 public talks on software engineering, RISC design, performance benchmarking and supercomputing.

He has a bachelor's degree in mathematics and a master's and doctorate in computer science from Pennsylvania State University.

GCN chief technology editor Susan M. Menke interviewed Mashey by telephone from his Mountain View, Calif., office.

GCN: What's happening in SGI's recent move to open-source its Irix operating system?

MASHEY: The response has been very good. SGI has a lot of technology that we would like to get to a broader audience, but it's been difficult because of the structure of the industry. The current round of open sourcing, particularly with Linux, has given us an avenue to get things out.

We've done this before in the OpenGL initiative. It's been very good for the industry and has made possible a lot of programming that would not have happened any other way. We have a bunch of other open-source initiatives that are being well received. The most recent was to make the XFS high-performance journal file system available.

We have not thrown Irix itself into open-source. We're taking selected pieces that make sense into the open-source world by the Linux route. Nobody with any sense tries to impose anything on the open-source community; that's like herding cats. What you do is encourage other software to be written to certain interfaces. If you have good technology, it will be of interest.

Lots of our other software will stay proprietary because it's specific to certain hardware we have.

GCN: Do you view the evolution of open-source software and Linux as a help or a hindrance for SGI?

MASHEY: I'm an old Unix guy. In 1973, there were a grand total of 20 Unix machines in the world. To get a new release, you drove to Murray Hill, N.J., and got onto Ken Thompson's machine and made a tape of anything that looked new and interesting. That was in the good old days.

Linux to me is the third round of a particular kind of behavior. In the first round in the 1970s, Bell Labs pretty much gave Unix away to the universities. A lot of smart people did good things in Unix. The second round was in the early 1980s with Berkeley Unix, a big part of it government-related. That died off after a while.

Linux is the current, hot round. There's room for many more smart people to contribute good software. It's like the early days when Unix started to get into the commercial world. People would compete, but the fact that more than one company was doing it lent credibility. It's one of the reasons you see as much Unix in the government as you do. Different versions of Unix and portable software have been available, but there was room for competition.

GCN: How are your new graphics workstations for Microsoft Windows NT doing?

MASHEY: There's a strong bifurcation of customer interest. Some people really want NT because they need the environment or the applications. Others say, 'No, we don't like Microsoft; we want something else.' The workstations have done pretty well and have gotten a spectacular number of best new product awards.

They represent something we haven't done much before, which is to incorporate our own special brand of magic into a standard environment.

Personally, I use an SGI O2 on my desk (we still believe in Irix around here), and at home I have an Indy with Irix. And I have a 320 visual workstation with one of the nice flat-panel displays. When I work in Microsoft Excel and Word and PowerPoint, I use that. A lot of our customers' mechanical and computer-aided design applications have gone into that area, as have digital content creation and other markets where people want a workstation with PC software or a PC that has SGI magic in it.

The Unix side and the NT side sometimes overlap, but often they don't. We're quite happy to be doing both.

GCN: Do you see your NT side and your Unix side continuing to march together?

MASHEY: Yes, I think so, for a good while. We recently announced several new generations of Mips CPUs because there are things that Irix on Mips does that nothing else in the world can do, period. There aren't a lot of 128-processor systems out there, except from us.

We've been 64-bit for a long time. We deal with very large data sets. Our government customers need ultra-high-speed networking. There are government customers who want multigigabyte-per-second interfaces between machines. They ask for tests such as reading 7 gigabytes per second from one file. Put in context, the highest-end Sun Microsystems Inc. machine has half that total input/output bandwidth. You couldn't think of doing 7 gigabytes per second from one file.

Our Mips Irix systems have been tuned for years for high-end visual simulation and image processing. The customers don't want us to change that too fast. They like the idea of the cheaper Intel IA-64 architecture, but they are going to need Mips systems for years. On the other hand, there are customers who don't need them. Between NT, Linux and Irix, we find surprisingly small overlap in what the customers want.

GCN: What is your theory of infrastress?

MASHEY: I get asked a couple of times a week to give my talk called 'Big Data and the Next Wave of Infrastress.' That's stress on the infrastructure of computing. It's what happens when technologies move at different speeds and put stress on the parts that aren't moving so fast.

Disk storage capacity is growing faster than it ever has in the history of computing. By 2003, my guess is that we'll be in the 200G to 500G range. You can put a couple of those disks in a workstation and have a terabyte.

Government customers are unhappy right now that we can't put more than about 100T in a single disk farm. They want petabytes. Whether it's on one machine or spread over a couple of machines, they want it all easily accessible for satellite images, intelligence and economics data.

In the history of disk drives, I claim there's a rule: Disks are either brand new or they're full. What fills them up? Take a look at an average user's PC. There are huge amounts of online help, images, templates for documents, tons of clip art and photos. It burns up space real fast.

What you hope is that the disks get bigger so that you don't have to do the housekeeping. People do something now that was insane a few years ago, which is to keep every e-mail message. They never know when they might need them. If disk space gets cheaper and cheaper, it changes the trade-offs you make. Your time is worth more than buying another disk.

People now expect near-instant access to data all over the world because of the Internet. Caught in the middle are the bandwidths of the computers and the networks. People get angry when they click and it's slow.

When you click on a Web page, it might go halfway around the world and talk to hordes of databases and machines. You don't care. If it's slow, you're mad. The stress falls on the data management in between and the ability to transfer data at a reasonable speed and the ability of software to manage huge numbers of files.

In the next couple of years, the explosion of disk capacity at low price is going to cause a lot of trouble for systems administrators who have to worry about disk backup and about interpreting huge amounts of data.

SGI builds systems that can paw through huge amounts of data and make sense of it. We're used to it. Our government customers have pushed us pretty hard.

GCN: Do you have any year 2000 predictions?

MASHEY: The odd thing about infrastress is that when you plot the stress, the trends peak around 2001. It's just an extra problem that happens around the same time as Y2K. I'd put Y2K in the pecked-to-death-by-ducks category. It's not a knife in the back, it won't kill you, but everyone will endure some irritation. The survivalists are off the deep end.

What's more

  • Last book read: To Say Nothing of the Dog by Connie Willis

  • Leisure activities: Hiking, biking, jogging, travel, fine arts and reading

  • Worst job: Clerk in a department store

  • Best job: 'This one. I get to stick my nose into almost anything and spend a lot of time with the most interesting customers in the world, who are trying to do almost impossible but important things, and all without having to carry a pager.'

