Wu Feng | New frontiers in high-end computing
- By Joab Jackson
- Feb 14, 2006
Dr. Wu Feng is one of the few government employees whose work has been cited in the book of Guinness World Records. In 2003, Wu and his team at Los Alamos National Laboratory engineered the highest-ever throughput for a sustained long-distance network connection: 2.38 Gbps between Los Alamos and Switzerland (the record was broken the following year).
Wu recently left Los Alamos to join the Computer Science Department at Virginia Polytechnic Institute, where he will help manage the school's new Center for High-End Computing Systems and head up his own lab.
GCN: How did you first get involved in high-performance computing?
Wu: My doctoral dissertation was on real-time networking. I specialized in the time-sensitive delivery of information. Back in the early 1990s, such timely delivery over the shared Ethernet was not even pseudo-real-time. If you reached 40 percent utilization, the network was pretty much crippled.
In general, for 'soft' real-time networking such as audio and video delivery, 'as fast as possible' often sufficed. However, military systems tended to [operate in] what is 'hard' real-time. They are hard in the sense that if something doesn't happen by a given time T, then something bad will happen.
Anyway, I came to the realization in the mid-to-late 1990s that if I shifted my research focus to high-performance networking, and in turn high-performance computing, I could make broader contributions to society.
GCN: What networking problems did Los Alamos face when you were hired?
Wu: We were looking at mechanisms and policies for high-performance networking. They had expertise in the hardware. What they needed was the expertise in the low-level systems software, specifically to overcome the bottleneck at the host interface. You could put in the fastest network you want, but if you can't get the information through the host interface and into the machine, then it doesn't matter how fast your network is.
In the early '90s, the bottleneck was the operating system. You can view the OS as the middleman. For example, when you type in a Google search, you are not directly putting that information onto the network; the OS is doing it on your behalf. So how do you dramatically improve performance when you have a middleman? Get rid of the middleman.
So we came up with an OS-bypass protocol. Now, when an application wants to make a high-speed connection between two end points, the network card's processor knows where it is allowed to write. This frees the OS from having to deal with network activities.
GCN: What are the goals for your new SyNeRGy Lab at Virginia Tech?
Wu: I started out with networking, but over time, I've had a lot of offshoot computer projects that were more about enabling science, engineering and even liberal arts. For example, I have been working on things like low-power and power-aware computing, self-adapting software, autonomous computing and even bioinformatics.
Rather than have technology be an impediment for a music student or an arts student or an aerospace engineer, we want to make it much easier for people to use computers to get things done.
GCN: It sounds like you're moving into the field of human-computer interaction.
Wu: In some sense it is, but I'm not going all the way up to that level yet. There are a lot of things that the applications have to worry about that they ought not to worry about, stuff the system software should do automatically. For example, we see large-scale supercomputers with mean time between failures measured in hours. It is projected that by the end of the decade, we'll be talking more about the number of failures per minute on large-scale systems. Currently, the burden of having to deal with these failures falls on the applications rather than on autonomous systems software.
One place where the burden of having to deal with failures is lifted from the application, and hence from the end user, is Google. Google has tens of thousands of processors in its search-engine farm. Even with failures occurring almost every hour, Google is always available. A user doesn't have to be aware of the fact that some part of the Google search-engine farm has failed since the Google software automatically accounts for it. ...
GCN: So you could see this sort of automation being applied to more difficult tasks?
Wu: From the literature that I'm aware of, there is no system software on large-scale scientific computing platforms that automatically does fault tolerance and provides 100 percent availability in light of unreliable components. But it could happen by the end of this decade.
Anyway, a lot of my research is learning how to automate things in the right way, particularly between systems software and applications software. Hence, the name of my laboratory: the SyNeRGy Laboratory. SyNeRGy [stands for] Systems, Network and Renaissance Grokking.
GCN: Grokking?
Wu: When you 'grok' something, you have a fundamental understanding or insight of that thing. 'Renaissance' is borrowed from the name of an institute led by my esteemed colleague, Professor Dan Reed: the Renaissance Computing Institute [a joint venture between Duke University, North Carolina State University and the University of North Carolina at Chapel Hill]. He wanted to evoke a vision of rebirth in computing, about how to use the computer to facilitate a number of endeavors that otherwise would not be well facilitated. ...
When you approach computing this way, you help to raise the level of sophistication and innovation in computer science and engineering work. We as a country have the know-how and knowledge to create and innovate. Of course, in order for that innovation to continue, federal, state and commercial entities have to keep funding research. Relative to the number of people, the funding pie has been getting smaller.
GCN: So you've felt the crunch?
Wu: Oh yes. Over the past four years or so, there have been many talented scientists and engineers who go overseas to work, or stay overseas to work, mainly due to funding issues.
We understand that we need to balance the budget and to funnel money to more short-term pursuits, but are we mortgaging our future for the present? When you don't put money into long-term research, the pipeline of ideas will just dry up.
Joab Jackson is the senior technology editor for Government Computer News.