Data center balances storage and networking of five universities
- By Carolyn Duffy Marsan
- Mar 26, 2015
Many government data centers generate data at a rate that calls for a blend of commercially available storage media -- the right mixture of flash, disk and tape -- to meet the demands of citizen customers nationwide.
For some organizations, that also means moving data stored in different formats between locations across the country -- or across town.
The Massachusetts Green High Performance Computing Center (MGHPCC) is one such organization: a joint venture of five Massachusetts universities working on government-funded research projects such as climate change modeling, genome sequencing and security analysis. The center takes advantage of the entire menu of commercial storage offerings.
The MGHPCC uses flash for temporary storage when its supercomputer is working on a problem. “Flash gets you a lower latency,” said John Goodhue, executive director of the MGHPCC.
“When you’re working with a dataset where you need to grab a large number of very small chunks of data out of a very big dataset, flash is a very good choice,” Goodhue said. “As the size of flash drives goes up and the cost goes down, the affordability of flash is improving over time.”
Even so, Goodhue said, sometimes regular disk drives are better than flash.
“You need to think hard about the cost of flash and where it is really going to benefit you because it is very problem-dependent,” Goodhue said. “Flash isn’t going to lift all boats, but it is going to lift a lot of them. Make sure that the speed really matters. Often, disk is just as good and is less expensive.”
The MGHPCC has also adopted a cloud-based approach for the many petabytes of scientific data it stores in a two-year-old facility in Holyoke, Mass.
MGHPCC has several tiers of storage. Scratch, for short-term data, provides 10 terabytes of temporary storage for computers that are working on a problem. High-performance parallel file systems store petabytes of data after it has been processed, and network-attached storage systems handle the most critical files, including home directories.
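The tiering described above amounts to a routing decision based on how the data will be used. A minimal sketch of that logic follows; the tier names and the selection rules are illustrative assumptions, not MGHPCC's actual policy.

```python
def choose_tier(is_active_job: bool, is_home_directory: bool) -> str:
    """Route a dataset to a storage tier (illustrative rules only).

    Tiers loosely mirror those described in the article:
    scratch for in-flight computation, NAS for critical files,
    and a parallel file system for processed bulk data.
    """
    if is_active_job:
        return "scratch"        # short-term space while a job is running
    if is_home_directory:
        return "nas"            # network-attached storage for critical files
    return "parallel_fs"        # high-performance parallel file system

# A running computation lands on scratch; a home directory lands on NAS.
print(choose_tier(is_active_job=True, is_home_directory=False))   # scratch
print(choose_tier(is_active_job=False, is_home_directory=True))   # nas
```

Real tiering policies would also weigh file size, access frequency and age, but the shape of the decision is the same.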
Goodhue emphasized the need to make sure the network is powerful enough to support cloud storage. “We use a cloud strategy for our storage,” Goodhue said. “That’s why we place a huge emphasis on high bandwidth and very efficient networking.”
“What you’re seeing in a facility like ours is a large amount of data stored right next to the compute resources,” he said. “Instead of moving the data to the scientist’s computer, we’re moving the compute to where the large dataset is stored.”
Goodhue said network speed is critical given that MGHPCC is located in Western Massachusetts while the researchers it supports are in Boston and other parts of the state. To bridge these distances, MGHPCC has 10G links to its university partners and plans to upgrade to 100G links.
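A back-of-the-envelope calculation shows what that upgrade buys. Assuming an ideal link with no protocol overhead (a simplification, since real transfers never hit line rate):

```python
def transfer_hours(size_terabytes: float, link_gbps: float) -> float:
    """Ideal transfer time in hours, ignoring all protocol overhead."""
    bits = size_terabytes * 1e12 * 8        # terabytes -> bits
    seconds = bits / (link_gbps * 1e9)      # divide by link rate in bits/sec
    return seconds / 3600

# Moving a hypothetical 100 TB dataset:
print(round(transfer_hours(100, 10), 1))    # about 22.2 hours on a 10G link
print(round(transfer_hours(100, 100), 1))   # about 2.2 hours on a 100G link
```

At petabyte scale the gap widens from days to hours, which is why a center storing "many petabytes" cares about the jump to 100G.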
“We pride ourselves on looking like a local resource to our users,” Goodhue said. “It’s important from a networking and storage management point of view that it is very fast and very easy to move data from a workstation in Harvard out to Holyoke and back.”
Goodhue pointed out that having a high-speed network connection doesn’t necessarily mean that data will transfer at a fast rate. He recommends data center operators consider the network protocols that they use, too.
“There are protocols that are good at moving data over high-bandwidth links, and protocols that are not good at that,” Goodhue said. “We’ve had several instances where we had to rethink how we connected to locations to keep data in sync between two storage pools, because the protocols that we had been using were either too sensitive to latency or very sensitive to high bit-error rates.”
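The latency sensitivity Goodhue describes often comes down to the bandwidth-delay product: a window-based protocol can only keep one window of data in flight per round trip, so on a fast long-haul link a small window caps throughput far below line rate. A rough illustration, using assumed link and window figures:

```python
def bdp_bytes(link_gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return (link_gbps * 1e9 / 8) * (rtt_ms / 1e3)

def max_throughput_gbps(window_bytes: float, rtt_ms: float) -> float:
    """Throughput ceiling for a window-based protocol: one window per RTT."""
    return window_bytes * 8 / (rtt_ms / 1e3) / 1e9

# A 10G link with a 5 ms round-trip time needs about 6.25 MB in flight:
print(bdp_bytes(10, 5))                          # 6250000.0 bytes

# With only a 64 KB window (TCP without window scaling), the same link
# tops out at a small fraction of its capacity:
print(round(max_throughput_gbps(65536, 5), 3))   # ~0.105 Gbps
```

This is why protocols and tunings built for long fat networks (large scaled windows, parallel streams) matter as much as raw link speed when syncing storage pools across a state.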
Carolyn Duffy Marsan is a writer based in Milwaukee, Wisc., covering enterprise technology.