Balance tape and flash in layered storage mix
- By Carolyn Duffy Marsan
- Mar 25, 2015
For agency data center managers looking to improve the economy of their data centers, the best options for cost savings lie in how their storage mix is managed.
Thanks to new virtualization techniques and emerging flash storage technologies, data center operators now have ways to control how they deploy storage systems to speed backups and reduce costs.
“Data center [operators] should be introspective and look at how their storage is being accessed and find ways to customize storage solutions for each different workload,” said Jason Hick, head of the Storage System Group at the Energy Department’s National Energy Research Scientific Computing Center (NERSC).
“There are a variety of emerging storage technologies. Using all of them for what they do best is key,” Hick added. “You need to understand flash, disk and tape and come up with the best mix based on your internal operations. It pays huge dividends for us when we balance our workloads well.”
GCN talked to operators of several cutting-edge government data centers to find out which new storage technologies they are deploying and what best practices they are using to keep their costs steady and their performance escalating. This is the first of a series.
Flash memory should always be considered as a key part of an agency’s storage repertoire, said Hick, provided it is deployed for the right circumstances and requirements.
Flash memory is more expensive than disk drives or tape, but it can be a good choice for performance-intensive applications because of its speed.
The DOE’s NERSC in Oakland, Calif., deployed flash for its file system metadata in August. Hick said flash is working well in this key application, which affects all 6,000 of the center’s users for services such as logging in to the supercomputer.
“Backing up our critical file system was taking 12 hours. Users noticed because the file system became slow and unusable,” Hick explained. “After deploying flash, the backups are down to three hours … I’m not sure we have any complaints now. I don’t think the users even know we are backing it up.”
Hick encouraged data center operators to conduct a careful analysis of flash memory and determine the trade-offs for each application.
“There are certain cases where it makes a lot of sense to substitute flash even at an increased cost because the performance benefits are there,” Hick said. “If you can get four-times benefit in performance and reduce user complaints down to zero, I would say it’s worth it.”
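The kind of back-of-envelope trade-off analysis Hick describes can be sketched in a few lines. This is a hypothetical rule of thumb, not a NERSC method; the per-gigabyte prices and the acceptable cost-to-speedup threshold below are illustrative assumptions, while the 4x speedup figure comes from the 12-hour-to-3-hour backup improvement cited above.

```python
# Hypothetical flash-vs-disk trade-off sketch. Prices and the
# threshold are illustrative assumptions, not NERSC figures.

def worth_switching(flash_cost_per_gb: float,
                    disk_cost_per_gb: float,
                    speedup: float,
                    max_cost_ratio_per_speedup: float = 2.0) -> bool:
    """Crude rule of thumb: accept flash when its cost premium,
    divided by the performance gain, stays under a threshold."""
    cost_ratio = flash_cost_per_gb / disk_cost_per_gb
    return cost_ratio / speedup <= max_cost_ratio_per_speedup

# Backups dropped from 12 hours to 3 -- a 4x speedup.
speedup = 12 / 3
print(worth_switching(flash_cost_per_gb=0.40,
                      disk_cost_per_gb=0.05,
                      speedup=speedup))
```

With these assumed prices, flash costs 8x more per gigabyte, but a 4x speedup brings the ratio under the threshold; the same premium with no speedup would not clear it.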
NERSC is so happy with how flash is working with its file system metadata that it plans to have a layer of flash technology built inside of its next supercomputer.
“The flash will be on [an] interconnect inside the supercomputer to store data for the duration of a simulation,” said Katie Antypas, deputy for data science at NERSC. “This will be another layer of storage that our users will have access to.”
Antypas explained the center’s different storage tiers: “Now we have scratch, project and archive,” she said. “Scratch data we keep for up to 12 weeks. Project data we keep for a couple years, and our archive goes back 40 years. Flash will store data for hours or days. The trade-off is that it offers really high bandwidth.”
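The tiering Antypas describes can be sketched as a simple retention map. The scratch, project and archive windows below come from her description; the flash “burst” tier’s hours-to-days window is approximated as two days, and the `tier_for` helper is a hypothetical illustration, not NERSC software.

```python
from datetime import timedelta

# Storage tiers as described at NERSC; the flash tier's two-day
# retention is an assumption standing in for "hours or days".
TIERS = {
    "burst (flash)": timedelta(days=2),         # hours to days
    "scratch":       timedelta(weeks=12),       # up to 12 weeks
    "project":       timedelta(days=2 * 365),   # a couple of years
    "archive":       timedelta(days=40 * 365),  # goes back 40 years
}

def tier_for(age: timedelta) -> str:
    """Pick the shortest-retention tier whose window still covers
    data of the given age (dicts preserve insertion order)."""
    for name, retention in TIERS.items():
        if age <= retention:
            return name
    raise ValueError("data older than any tier's retention window")

print(tier_for(timedelta(weeks=4)))
```

Four-week-old data lands in scratch; anything only hours old would qualify for the high-bandwidth flash layer.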
Don't be afraid of tape
NERSC also has a total of 72 petabytes of data stored on tape systems, some for long-term archival purposes and some to support ongoing projects. Although it is an older technology, tape is cost-effective, Hick said.
“Tape is often reported to be dead or about to die,” Hick said. “One of our newest users, the Joint Genome Institute, didn’t use tape at all, only disk storage. They were struggling with how to store all of their data, and their budget was out of control. Yet they were very skeptical about why we would use tape. We have a lot of experience with tape, and we taught them about it.
“Tape is not all great,” said Hick. “But in the end it solved their data growth and budget problem for storage.”
Hick said tape offers significant cost and capacity advantages over disk systems, making it a viable option for government data centers that forgo an archive because they assume one would be too expensive.
“I talk to a lot of government sites that don’t have an archive. They are in compliance for email, but beyond that they don’t understand the value of retaining data,” Hick said.
Next: How the Massachusetts Green High Performance Computing Center balances storage and bandwidth to meet research demands.
Editor's note: This article was changed March 30 to correct the acronym and location of NERSC.
Carolyn Duffy Marsan is a writer based in Milwaukee, Wis., covering enterprise technology.