The life & times of data
Tiered-storage approach looks at the long-term value of data and future needs for access to it
Greg Hilsenrath of Overtone says agencies often don't know the characteristics of their datasets.
The Army Surface Deployment and Distribution Command faced an age-old problem: The amount of data it needed to keep was growing faster than the storage system designed to hold it. But instead of purchasing more expensive hardware, the command devised a tiered system for handling the data, wherein older information could be automatically offloaded to cheaper storage devices.
Tiered storage is not a new concept. Administrators have long moved older material over to tape drives in order to save space on more expensive disk drives. But a confluence of factors (cheaper storage technologies, more sophisticated software and an explosion of data) may be pushing agencies towards making more complex plans for saving data. Vendors call it information lifecycle management.
'A lot of storage software is going that way,' said Andrew Ferguson, who oversees the administrative storage system at the Energy Department's Brookhaven National Laboratory in Upton, N.Y. In an information lifecycle management system, an organization will establish a process for moving data from one storage medium to another, and then automate that process using ILM software.
'Information has value and that value changes over time,' said David Goulden, executive vice president of customer operations at EMC Corp. of Hopkinton, Mass. The trick is to develop a strategy that balances the value of that information against the cost of managing it. That is where ILM software can help.
Analyzing data
Headquartered in Fort Eustis, Va., the Army Surface Deployment and Distribution Command coordinates the movement of personnel and equipment around the globe. When the conflict in Iraq flared up early last year, the command's ranks ballooned with reservists. As a result, its administrative storage (7.5T of data split between Microsoft Windows-based file servers and Network Appliance Filers) bulged with files ranging from e-mail to PowerPoint presentations.
Rose DuBoise, a technical adviser to the command, had been adding new drives to the system every three months even before the larger ranks strained it further. Before making yet more storage purchases, DuBoise took a step back and looked at ways of characterizing her data, that is, describing it by a number of attributes. Using ManageTone 2004 Lifecycle Suite from Overtone Software Inc. of Bethesda, Md., she characterized the command's data by size, age, access history and origin.
What she found was interesting: More than 60 percent of the data she was keeping had not been accessed in more than 90 days.
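The kind of characterization DuBoise describes can be approximated with a short script. The sketch below is not how ManageTone works; it is a minimal illustration, assuming a POSIX-style file system whose last-access times (atime) are actually being recorded. Many servers mount volumes with atime updates reduced or disabled, which would make the numbers unreliable. It walks a directory tree, records each file's size, age and days since last access, and reports what share of the bytes has sat idle for more than 90 days. The names `characterize`, `idle_share` and `FileRecord` are hypothetical.

```python
import os
import time
from dataclasses import dataclass
from typing import List, Optional

SECONDS_PER_DAY = 86400


@dataclass
class FileRecord:
    path: str
    size: int          # bytes
    age_days: float    # days since last modification (mtime)
    idle_days: float   # days since last access (atime)


def characterize(root: str, now: Optional[float] = None) -> List[FileRecord]:
    """Walk a directory tree and record size, age and access history per file."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Reading metadata with stat does not itself update atime.
            st = os.stat(path, follow_symlinks=False)
            records.append(FileRecord(
                path=path,
                size=st.st_size,
                age_days=(now - st.st_mtime) / SECONDS_PER_DAY,
                idle_days=(now - st.st_atime) / SECONDS_PER_DAY,
            ))
    return records


def idle_share(records: List[FileRecord], threshold_days: float = 90) -> float:
    """Fraction of total bytes not accessed for more than threshold_days."""
    total = sum(r.size for r in records)
    if total == 0:
        return 0.0
    idle = sum(r.size for r in records if r.idle_days > threshold_days)
    return idle / total
```

Calling `idle_share(characterize("/srv/admin"))` on a (hypothetical) share returns a fraction such as 0.6. The article does not say whether its 60 percent figure counts bytes or files; this sketch counts bytes, which is the measure that matters for sizing storage.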
With this in mind, DuBoise procured a NearStore R200 from Network Appliance Inc. of Sunnyvale, Calif. NearStore, a disk-based 'nearline' storage system, does not respond as quickly as network-attached storage units, but it costs considerably less. By configuring the Overtone software to sort through the data periodically, DuBoise was able to direct older, infrequently accessed files to the NearStore unit, saving storage costs. The software left pointers to the files' new location in their original directories, so most users weren't even aware the files had moved and could open them just as before.
ILM versus HSM
Information lifecycle management may be a new buzzword, but the concept has been around for a while.
Hierarchical storage management had similar goals, said Jeremy Burton, senior vice president and chief marketing officer of Veritas Software Corp. of Mountain View, Calif. With HSM, administrators could offload little-used data to tape units. The end users could still call up the data, although it might take a few seconds longer to come back to the desktop. That was an acceptable tradeoff for the savings in purchasing tape over more expensive disk-based solutions.
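Both HSM and the setup DuBoise built rest on the same mechanism: move a file that has gone unused to cheaper storage and leave something at the original path that leads to it. The sketch below illustrates that idea under stated assumptions and is not how Overtone's or Veritas' products are implemented. It assumes a POSIX file system and uses an ordinary symbolic link as the "pointer" (commercial products typically rely on their own stub files or file-system hooks instead), a 90-day last-access cutoff, and hypothetical directory names for the primary and nearline tiers. The function name `migrate_idle_files` is also hypothetical.

```python
import os
import shutil
import time
from typing import List, Optional

SECONDS_PER_DAY = 86400


def migrate_idle_files(primary: str, nearline: str, idle_days: float = 90,
                       now: Optional[float] = None) -> List[str]:
    """Move files not accessed within idle_days from primary to nearline,
    leaving a symbolic link at the original path so users can still open them.
    Returns the migrated paths, relative to primary."""
    now = time.time() if now is None else now
    cutoff = now - idle_days * SECONDS_PER_DAY
    moved = []
    for dirpath, _dirnames, filenames in os.walk(primary):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src):
                continue  # already a pointer left by an earlier sweep
            if os.stat(src).st_atime >= cutoff:
                continue  # accessed recently: stays on primary storage
            # Mirror the directory layout on the nearline tier.
            rel = os.path.relpath(src, primary)
            dst = os.path.join(nearline, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)
            # Leave a pointer behind at the original location.
            os.symlink(os.path.abspath(dst), src)
            moved.append(rel)
    return moved
```

Run from a scheduler, a call such as `migrate_idle_files("/vol/primary", "/vol/nearstore")` (both paths hypothetical) mimics a periodic sweep. Because links are skipped on later passes, the sweep can be repeated safely; recalling a file that becomes active again is left out of this sketch.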
The difference between HSM and ILM is that HSM was a technical answer to the problem of managing data, whereas ILM takes a broader, more intelligent view of the entire process. What data needs to be accessed quickly? Which data must be kept for legal reasons but is rarely used? ILM software allows for greater nuance in deciding how and where to store data, and a greater ability to automate the movement of data across all media, not just tape.
'If we don't manage it somehow, it will be useless,' Burton said of most organizations' tremendous growth in data. But, he said, 'you don't want to micromanage files. What you save in disk space, you'll lose in labor costs.'
In addition, ILM software helps users get a handle on what type of data they have.
What type of data?
Greg Hilsenrath, vice president of business development for Overtone, said when he meets with customers, he often finds that they don't know the characteristics of their datasets. Is most of the storage space taken up by e-mail archives? By MP3 music files? ILM software can summarize what types of data reside on storage systems, allowing administrators to make better decisions about how to provision storage and what additional storage might be needed.
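A rough version of the summary Hilsenrath describes can be produced by totaling bytes per file extension. This is a minimal sketch, assuming the extension is a fair proxy for data type, which is often only approximately true. The function name `bytes_by_type` is hypothetical, and symbolic links are skipped so that migrated files are counted where they actually live.

```python
import os
from collections import Counter


def bytes_by_type(root: str) -> Counter:
    """Total bytes per file extension under root (a rough proxy for data type)."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip pointers; count data where it is stored
            # Normalize case so ".PST" and ".pst" land in the same bucket.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext] += os.path.getsize(path)
    return totals
```

On a (hypothetical) share, `bytes_by_type("/srv/share").most_common(10)` lists the ten file types taking up the most space, which is the starting point for deciding what belongs on which tier.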
For the South Carolina Department of Transportation, ILM has reduced the amount of primary storage it needs and allowed the agency to set up an additional backup site for disaster recovery, according to Lee Foster, system manager for the agency.
At present all the agency's data (what Foster calls the 'active set') resides in an 11T fiber-attached EMC Clariion CX600 storage array. It includes everything the agency has created, from e-mail to large bridge design files.
Foster is in the process of configuring Veritas NetBackup software to separate the data into three tiers: current data that will remain in the active set; mid-term data that hasn't been read or changed in 90 days; and older data that will move to an existing 160T tape-based Dell PowerVault. The mid-term data will be automatically offloaded onto a 4T serial ATA-based storage system, which the agency recently purchased. Users can still access data in this tier, though not as quickly as data in the active set.
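The three-tier split Foster describes amounts to a simple placement rule. The sketch below is a hypothetical illustration, not NetBackup policy syntax: the 90-day mid-term cutoff comes from the article, but the article gives no age at which data goes to tape, so the 365-day archive cutoff is an assumption, as are the function and tier names.

```python
from enum import Enum

MIDTERM_AFTER_DAYS = 90    # from the article: not read or changed in 90 days
ARCHIVE_AFTER_DAYS = 365   # hypothetical: the article gives no tape cutoff


class Tier(Enum):
    ACTIVE = "EMC Clariion CX600 (active set)"
    MIDTERM = "4T serial ATA disk"
    ARCHIVE = "160T Dell PowerVault tape"


def assign_tier(days_since_read: float, days_since_change: float) -> Tier:
    """Place a file by how long it has been idle.

    A file counts as idle only if it has been neither read nor changed,
    so the more recent of the two events (the smaller number) decides.
    """
    idle_days = min(days_since_read, days_since_change)
    if idle_days > ARCHIVE_AFTER_DAYS:
        return Tier.ARCHIVE
    if idle_days > MIDTERM_AFTER_DAYS:
        return Tier.MIDTERM
    return Tier.ACTIVE
```

For example, a design file read 120 days ago and last changed 150 days ago would land on the serial ATA tier. A real deployment would also need exceptions, such as records kept for legal reasons regardless of age.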
Setting up these tiers will 'reduce our active set tremendously,' Foster said.
Reducing the active set offers the agency benefits beyond lower storage costs. A smaller data set can be defragmented more easily and backed up more quickly, Foster said. He expects that once the system is up and running, he will be able to do full replications weekly.
Reducing the size of the active set will also allow the agency to set up a nearline storage facility about 100 miles from the main facility for use in disaster recovery. The new facility will employ IP-based storage and RAID cabinets, with Veritas Storage Replicator software backing up everything from the agency's primary fiber-attached storage system. Using IP- and disk-based storage for off-site disaster recovery means the DOT can be back online more quickly in the event of a system disruption.
'We can have a lot faster access to our data than if it were stored on tape,' Foster said.
As continuity of operations becomes more important to agencies of all kinds, ILM will play a prominent role.