The life & times of data

The Market: ILM vendors

EMC Corp., Hopkinton, Mass. EMC has centered its software development and marketing on information lifecycle management, tweaking 30 products to fit this approach. The company has just released Celerra FileMover as a feature for its network attached storage products. Administrators can use FileMover to schedule when data is to be moved to NAS devices. The company has also just introduced a number of enhancements for its EMC Symmetrix DMX Series of networked storage systems, including the ability to mirror data across numerous sites and the ability to toggle between real-time and near-real-time backup of data.

Hewlett-Packard Co., Palo Alto, Calif. HP has a wide range of ILM-based products. Earlier this year, the company introduced its Reference Information Storage System, a storage system based on a building-block architecture of individual storage nodes, each with a dedicated processor, search engine and management tools.

IBM Corp., Armonk, N.Y. Big Blue offers a number of products for data management, including programs for message monitoring and archiving of DB2-stored messages. IBM's Tivoli Storage Manager offers fine-grained policy control over which files are backed up.

Network Appliance Inc., Sunnyvale, Calif. NetApp views the emergence of ILM as a perfect fit for its NearStore storage appliance line, competitively priced backup units that allow files to be accessed by end-users. The company's Data Fabric Manager software, working in conjunction with NetApp's storage operating system Data OnTap, provides the application interfaces for software vendors and organizations to craft custom data management programs. The company itself offers an ILM package.

Overtone Software Inc., Bethesda, Md. Overtone's ManageTone Lifecycle Suite software can move data, residing on either Microsoft Windows-based or Unix systems, to storage media, using such criteria as creation date and the type of content or regulatory requirements. The software can store data on a secondary storage appliance from Network Appliance and keep that data visible to users.

Sand Technology, Boston. Sand has added information lifecycle management capabilities to its data warehouse analysis software. The software allows administrators to compress infrequently used data into read-only files that can be stored on cheaper storage mechanisms, while still remaining available for analysis.

Veritas Software Corp., Mountain View, Calif. Earlier this year, Veritas acquired KVault Software Ltd. of Berkshire, U.K. Using policies set by the administrator, KVault's software indexes and archives data held by Microsoft Exchange and other office-oriented applications and can work with SAN, NAS and other storage architectures. It will eventually replace Veritas' own Data Lifecycle Manager.

Greg Hilsenrath of Overtone says agencies often don't know the characteristics of their datasets.

Dan Gross

Tiered-storage approach looks at the long-term value of data and future needs for access to it

The Army Surface Deployment and Distribution Command faced an age-old problem: The amount of data it needed to keep was growing faster than the storage system designed to hold it. But instead of purchasing more expensive hardware, the command devised a tiered system for handling the data, wherein older information could be automatically offloaded to cheaper storage devices.

Tiered storage is not a new concept. Administrators have long moved older material over to tape drives in order to save space on more expensive disk drives. But a confluence of factors (cheaper storage technologies, more sophisticated software and an explosion of data) may be pushing agencies toward making more complex plans for saving data. Vendors call it information lifecycle management.

'A lot of storage software is going that way,' said Andrew Ferguson, who oversees the administrative storage system at the Energy Department's Brookhaven National Laboratory in Upton, N.Y. In an information lifecycle management system, an organization will establish a process for moving data from one storage medium to another, and then automate that process using ILM software.

'Information has value and that value changes over time,' said David Goulden, executive vice president of customer operations at EMC Corp. of Hopkinton, Mass. The trick is to develop a strategy that balances the value of that information against the cost of managing it. That is where ILM software can help.

Analyzing data

Headquartered in Fort Eustis, Va., the Army Surface Deployment and Distribution Command coordinates the movement of personnel and equipment around the globe. When the conflict in Iraq flared up early last year, the command's ranks ballooned with reservists. As a result, the administrative storage space (7.5T of data split between Microsoft Windows-based file servers and Network Appliance Filers) bulged with files, from e-mails to PowerPoint presentations.

Rose DuBoise, a technical adviser to the command, was already adding new drives to the system every three months before increased ranks stressed the system even further. Prior to making yet more storage purchases, DuBoise took a step back and looked at ways of characterizing her data, or describing it based on a number of attributes. Using ManageTone 2004 Lifecycle Suite, from Overtone Software Inc., of Bethesda, Md., she characterized the command's data by size, age, access history and origin.

What she found was interesting: More than 60 percent of the data she was keeping had not been accessed in over 90 days.
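A figure like that can be approximated on any file system that records access times. The sketch below is not how ManageTone works internally, which the vendor has not described here; it is a minimal illustration that assumes "data" is measured in bytes and that the file system's last-access timestamps are trustworthy. The function name `stale_share` and its parameters are hypothetical.

```python
import os
import time

SECONDS_PER_DAY = 86400


def stale_share(root, max_idle_days=90, now=None):
    """Return the fraction of bytes under root not accessed in max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    total_bytes = 0
    stale_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total_bytes += info.st_size
            # st_atime is the last-access time reported by the file system
            if info.st_atime < cutoff:
                stale_bytes += info.st_size
    return stale_bytes / total_bytes if total_bytes else 0.0
```

One caution: volumes mounted with access-time updates disabled or relaxed will report misleading values, so a survey like this is only as good as the timestamps behind it.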

With this in mind, DuBoise procured a NearStore R200 from Network Appliance Inc. of Sunnyvale, Calif. NearStore, which is a disk-based 'nearline' storage system, doesn't offer the same response speed as network attached storage units, but it costs considerably less. By configuring the Overtone software to periodically sort through data, DuBoise was able to direct older, infrequently accessed data to the NearStore unit, saving storage costs. The software left pointers to the files' new location in their original directories, so most users weren't even aware they had moved and they could open the files as they had before.
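The move-and-leave-a-pointer idea can be illustrated with a symbolic link. This is an assumption for the sake of the example: the article does not say what kind of pointer the Overtone software leaves behind, and the function name `offload` and both directory arguments are hypothetical.

```python
import os
import shutil


def offload(path, primary_root, nearline_root):
    """Move one file to the nearline tier and leave a symlink in its place."""
    # Preserve the file's position in the directory tree on the new tier
    relative = os.path.relpath(path, primary_root)
    target = os.path.abspath(os.path.join(nearline_root, relative))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.move(path, target)
    # The link sits at the original location, so existing paths keep working
    os.symlink(target, path)
    return target
```

After the call, opening the original path returns the same contents, only from the cheaper tier. On Windows, creating symbolic links may require elevated privileges, so a production tool would likely rely on a different stub mechanism.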

ILM versus HSM

Information lifecycle management may be a new buzzword, but the concept has been around for a while.

Hierarchical storage management had similar goals, said Jeremy Burton, senior vice president and chief marketing officer of Veritas Software Corp. of Mountain View, Calif. With HSM, administrators could offload little-used data to tape units. The end users could still call up the data, although it might take a few seconds longer to come back to the desktop. That was an acceptable tradeoff for the savings in purchasing tape over more expensive disk-based solutions.

The difference between HSM and ILM is that HSM was a technical answer to a problem of managing data, whereas ILM takes a larger, intelligent view of the entire process. What data needs to be accessed quickly? Which data do you need to store, for legal reasons, but don't have much use for? The software allows for greater nuance in making decisions about how and where to store data, and a greater ability to automate movement of data across all media, not just tape.

'If we don't manage it somehow, it will be useless,' Burton said of most organizations' tremendous growth in data. But, he said, 'you don't want to micromanage files. What you save in disk space, you'll lose in labor costs.'

In addition, ILM software helps users get a handle on what type of data they have.

What type of data?

Greg Hilsenrath, vice president of business development for Overtone, said when he meets with customers, he often finds that they don't know the characteristics of their datasets. Is most of the storage space taken up by e-mail archives? By MP3 music files? ILM software can summarize what types of data reside on storage systems, allowing administrators to make better decisions about how to provision storage and what sorts of additional storage might be needed.
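A rough version of that summary can be had by totaling bytes per file extension. The sketch below is only an illustration, not a description of any vendor's product; it assumes the extension is a fair proxy for content type, and the name `bytes_by_extension` is hypothetical.

```python
import os
from collections import Counter


def bytes_by_extension(root):
    """Total the bytes stored under root, grouped by lowercased file extension."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Files without an extension are grouped under "(none)"
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                totals[ext] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that cannot be read
    return totals
```

Calling `bytes_by_extension(root).most_common(5)` lists the five extensions that occupy the most space, which is often enough to tell whether mail archives or music files dominate a volume.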

For the South Carolina Department of Transportation, ILM has reduced the amount of primary storage it needs and allowed the agency to set up an additional backup site for disaster recovery, according to Lee Foster, system manager for the agency.

At present all the agency's data, what Foster calls the 'active set,' resides in an 11T fiber-attached EMC Clariion CX600 storage array. It includes everything the agency has created, from e-mail to large design files of bridges.

Foster is in the process of configuring Veritas NetBackup software to separate the data into three levels: data that will remain in the active set; mid-term data that hasn't been read or changed in 90 days; and old data that will move to an existing 160T tape-based Dell PowerVault. The mid-term data will be automatically offloaded onto a 4T serial ATA-based storage system, which the agency recently purchased. Users can still access data in this location, though not as quickly as data stored in the active set.
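The three-level policy amounts to a simple age test. The sketch below is not NetBackup configuration; it only illustrates the decision logic. The 90-day threshold comes from Foster's description, while the one-year cutoff for tape is a hypothetical value, since the article does not say when data counts as old. The tier labels and the function name `choose_tier` are likewise assumptions.

```python
from datetime import datetime, timedelta

MID_TERM_AFTER = timedelta(days=90)   # threshold given in the article
ARCHIVE_AFTER = timedelta(days=365)   # hypothetical: the article gives no tape cutoff


def choose_tier(last_read, last_changed, now):
    """Pick a storage tier from how long a file has gone unread and unchanged."""
    # A file is idle only since the later of its last read and last change
    idle = now - max(last_read, last_changed)
    if idle >= ARCHIVE_AFTER:
        return "tape"
    if idle >= MID_TERM_AFTER:
        return "sata"
    return "active"
```

Taking the later of the two timestamps matches the "hasn't been read or changed" wording: a file edited long ago but read last week stays in the active set.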

Developing these tiers will 'reduce our active set tremendously,' Foster said.

Reducing the active set of data affords the agency other benefits in addition to lower storage costs. A smaller data set means that the storage volume can be defragmented more easily and backed up more quickly, Foster said. He expects that once the system is up and running he will be able to do full replications on a weekly basis.

Reducing the size of the active set will also allow the agency to set up a near-line storage facility, about 100 miles from the main facility, for use in disaster recovery. The new facility will employ IP-based storage and RAID cabinets, with Veritas Storage Replicator software backing everything up from the agency's primary fiber-attached storage system. Using IP and disk-based storage for off-site disaster recovery means the DOT can be back online quicker in the event of a system disruption.

'We can have a lot faster access to our data than if it were stored on tape,' Foster said.

As continuity of operations becomes more important to agencies of all kinds, ILM will play a prominent role.

