Eng Lim Goh | SGI's memory of the future

Interview with Eng Lim Goh, Silicon Graphics Inc. chief technology officer

Eng Lim Goh, SGI CTO

Courtesy of SGI inc.

Last spring Eng Lim Goh, chief technology officer at Silicon Graphics Inc. of Mountain View, Calif., surprised many at the annual High Performance Computing and Communications Conference when he focused his talk on issues of computer memory, of all things. The conference, held every year in Newport, R.I., usually doesn't devote much bandwidth to working memory. But SGI believes better use of memory could open the door to new possibilities. The company's NUMAlink architecture allows multiple computers to join together to present one giant memory block, up to 128 terabytes' worth. And despite filing for bankruptcy last May, SGI continues to push the envelope in memory utilization. GCN sat down with Goh to find out more.

GCN: So, SGI has been focusing its efforts on memory management of late.

Goh: It stems from us seeing a rising concern from our government customers with the deluge of data they are getting. For various reasons, they are not able to exploit that data effectively. So what we started on is how we could leverage our current architecture to accelerate knowledge discovery.

What we did was tinker with the idea of putting an entire database in memory. NUMAlink allows multiple nodes to be tied tightly together, so that all the memory pieces are seen as one. Once the processors can see all the memory across all nodes as a single memory, they can load a large database entirely into that memory. So a complex query that would normally take seconds to return a response, because the disk query takes some time to scan the database, could be returned in under a second.
When we went out with the idea, we got enthusiastic responses. We heard how it could fundamentally change the discovery process. When you ask questions with complex queries, you sit and wait for a response. It breaks the thinking process, because you might want to converge on an idea by quickly firing off questions and getting quick responses. You want to have a conversation with the data.
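The idea of loading an entire database into memory can be sketched with Python's sqlite3 module, which supports in-memory databases. This is only a loose stand-in for the NUMAlink-scale setup Goh describes; the file name, table, and data below are all invented for illustration.

```python
import os
import sqlite3
import tempfile

# Illustrative sketch only: build a small on-disk database, then copy it
# wholesale into RAM so complex queries scan memory rather than disk.
path = os.path.join(tempfile.mkdtemp(), "records.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
disk.executemany("INSERT INTO events (kind) VALUES (?)",
                 [("login",), ("logout",), ("login",)])
disk.commit()

# Copy every page of the on-disk database into an in-memory database,
# loosely analogous to loading a database into one large memory block.
mem = sqlite3.connect(":memory:")
disk.backup(mem)
disk.close()

# This query now touches RAM only, so the response comes back quickly.
count, = mem.execute(
    "SELECT COUNT(*) FROM events WHERE kind = 'login'").fetchone()
print(count)  # 2
```

The same pattern, pay once to load everything into memory, then answer arbitrary ad hoc queries without touching disk, is what makes the rapid-fire "conversation with the data" possible.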

GCN: What is the largest block of memory you've shipped so far? What could all this aggregated memory be used for?

Goh: We have unclassified examples of multiterabyte systems. Thirteen terabytes is the largest we've shipped.

In the commercial world, NUMAlink could be used for telephone directories. If you pick up a cell phone, dial a number and press send, sometimes it takes a long time before the other end rings. That is because the system searches for the callee's number on disk. So we are looking at a scheme where we load these terabyte-sized phone books into memory so the call connection becomes instantaneous.
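As a toy illustration of that phone-book scheme (the numbers and switch names here are invented), a directory held entirely in memory turns the routing step in the call path into a constant-time lookup instead of a disk search:

```python
# Hypothetical in-memory subscriber directory: callee number -> switch.
directory = {
    "+1-401-555-0100": "switch-07",
    "+1-401-555-0199": "switch-12",
}

def route_call(callee: str) -> str:
    # O(1) hash lookup in RAM; no disk access in the call-setup path.
    return directory.get(callee, "unknown")

print(route_call("+1-401-555-0199"))  # switch-12
```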

GCN: Running terabytes of memory sounds like it can get expensive. What is SGI doing to help contain costs?

Goh: First, we don't use special memory. We wanted to use cheap PC memory. Although we like the cheap PC memory prices, our customers can't tolerate the cheap PC reliability. If you put two gigabytes of memory together, it will be fairly reliable, but if you put 1,000 times more memory together, the chance of failure is higher. So we put reliability features into our chip sets.

The second way we reduce cost is very fundamental. With most systems vendors, you have to buy expensive systems just to get the memory you want. That is a problem, because these database applications don't really need that many processors. So we came up with a concept of CPU-less nodes. Since our chip sets already have the intelligence to deal with the gluing together of the memory, we can sell nodes that have no CPUs in them. So you just buy a board. Once you plug one in, all the chip sets orchestrate among themselves to recognize all the extra memory, without the need for processors.

We have examples of systems with 32 processors and 4TB of memory. There's no way you can achieve that without memory-only nodes.

GCN: Many high-performance computing companies, such as SGI, seem to be setting their sights on the enterprise market. Do you see the two fields, HPC and enterprise computing, blurring?

Goh: They are starting to. The common problem they now share is data. In the past, data wasn't so much an issue, because the HPC world was more focused on improving simulations. It needed better algorithms and better ways to improve the accuracy and predictability of their simulations. But as they came up with these improvements, the results started to get very difficult to exploit, because they were so huge. It started to become a data problem rather than a simulation problem.

Enterprises have a purely data problem. Organizations collect huge amounts of subscriber data and operational data, and they want to improve operations by doing predictive analysis. Because the two worlds now share the same data problem, the lines between them are starting to blur.

GCN: Recently you talked about how SGI is developing chip sets that can work with both commercial processors [called scalar processors] and special vector processors on one board. Why would your customers need vector processors?

Goh: If you have a scalar processor [tapping into] multiterabytes of memory, it is very efficient for the processor to pull in contiguous data. By contiguous, I mean cache lines. Most accesses are sequential, therefore pulling blocks of data is more efficient than pulling one word at a time. Our cache line length is 128 bytes, so this is 16 words of data. So if you need any one word, the system pulls the [surrounding words]. Hopefully, we will reuse the cache line and get more bang for the buck.

However, there are some cases in database queries where you don't need contiguous, sequential data. You need one word of data here, another word there, another word there. In this case, we are consuming 16 times more bandwidth than we actually need. Instead of pulling three words, you are pulling 48 words. In fact, some of our customers are complaining that caches are becoming a nuisance rather than a help.

Our idea is to put a vector unit in the chip set. The vector unit knows how to pull in the [individual] words, collect them into one cache line and present the packed data to the processor. So you are not restricted by the scalar access pattern.
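A gather of scattered words into one packed block, which is what the proposed vector unit would do in hardware, can be sketched in a few lines of Python (the data and indices below are invented):

```python
# Stand-in for a large in-memory table: element i holds the value 7 * i.
data = list(range(0, 1000, 7))

# Scattered positions a query actually needs, one word each.
indices = [3, 97, 41, 12]

# The "gather" step: collect the scattered words into a contiguous block,
# so the consumer reads one dense run instead of many sparse cache lines.
packed = [data[i] for i in indices]
print(packed)  # [21, 679, 287, 84]
```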

GCN: How will SGI's customer relations and research and development be affected by the bankruptcy?

Goh: We have visited customers in the U.S., Canada, Europe, Japan, China and other parts of Asia. Customers [now] have a better understanding of the financial re-engineering process that we have launched. They now know we are going to be around for some time. Their next concerns were ensuring delivery, support and our R&D road map commitment. None of these areas are impacted.

About the Author

Joab Jackson is the senior technology editor for Government Computer News.
