Everybody into the pool!
Virtualization software promises better utilization of storage assets and cost savings
The city of Richmond, Va., is doubling its storage capacity without buying any new arrays. How are officials performing this magic? Through storage virtualization software, which lets them make better use of the storage they already have.
'With the virtualization, you can have one single, large pool of storage. We have a single point of control for all our storage and servers,' said Lyle Gleason, a Richmond senior engineer.
As an organization's storage resources grow increasingly unwieldy, its IT administration staff must look for ways to simplify management, not to mention save money, by making better use of what they have. Storage virtualization products can help by acting as a single gateway through which all the resources can be managed and accessed.
'Just on asset utilization alone, you can justify purchase of virtualization equipment,' said Rob Peglar, a member of the Storage Networking Industry Association and vice president of technology marketing for storage area network vendor Xiotech of Wildwood, Mo. 'If you are spinning 300 terabytes and using 150, your chief financial officer is not a happy guy.' Peglar spoke at the Storage World Conference held in Long Beach, Calif., earlier this year.
Although storage virtualization has been around for the past decade, 'the last two or three years, we've seen some real changes,' said Jeff Hornung, vice president and general manager of enterprise file services and storage networking for Network Appliance Inc. of Sunnyvale, Calif.
Once the domain of start-up companies, virtualization software is now offered by most of the established storage players, such as EMC Corp. of Hopkinton, Mass., Hewlett-Packard Co. and Hitachi Ltd. of Tokyo.
Virtualization now for the future
Richmond's data center runs more than 120 servers, both for internal use and to provide information and services to the public. The center is supported by a combined 2T of storage space, spread out over four SANs as well as the hard drives of the servers themselves, according to Libby Mounts, Richmond's director of information technology.
By mid-2004, Richmond was finding storage increasingly troublesome. On average, only 40 percent of available storage was being used, yet individual SANs and hard drives were packed to capacity, incurring the downtime and additional cost of upgrades. Plus, the IT office found mundane storage tasks occupying an ever-growing share of the staff's time.
When Mounts started looking for software to ease management issues, storage virtualization software came immediately to mind. James Smith and Ravi Nair's recently published book, Virtual Machines (Morgan Kaufmann Publishers, San Francisco, 2005), defines virtualization as the act of taking the interface and resources of one system and mapping them onto the interfaces of another. 'The real system is transformed so that it appears to be a different, virtual system,' they write.
Storage virtualization software sits between the servers that need storage and the storage resources themselves.
'A virtualization system looks like an array to the server, but it looks like a big server to back-end arrays. Virtualization translates between the two,' said Robert Sadowski, product marketing manager for EMC. In other words, the server talks to the virtualization engine rather than to the SAN itself.
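To make the translation Sadowski describes more concrete, here is a minimal Python sketch of a block-level mapping table. It assumes the virtualization engine divides storage into fixed-size extents and keeps a table from each virtual extent to a (back-end array, physical extent) pair. The extent size, array names and the VirtualVolume and PhysicalLocation classes are hypothetical illustrations, not any vendor's design.

```python
from dataclasses import dataclass

EXTENT_BLOCKS = 1024  # hypothetical extent size, in blocks


@dataclass(frozen=True)
class PhysicalLocation:
    array: str  # hypothetical back-end array name
    block: int  # block address on that array


class VirtualVolume:
    """A volume the server sees as one contiguous disk."""

    def __init__(self, extent_map):
        # extent_map[i] = (array name, physical extent number) backing virtual extent i
        self.extent_map = extent_map

    def translate(self, virtual_block):
        """Map a server-visible block address to a back-end array address."""
        extent, offset = divmod(virtual_block, EXTENT_BLOCKS)
        if extent >= len(self.extent_map):
            raise ValueError("block beyond end of volume")
        array, physical_extent = self.extent_map[extent]
        return PhysicalLocation(array, physical_extent * EXTENT_BLOCKS + offset)


# A three-extent volume whose pieces live on two different arrays.
vol = VirtualVolume([("array-A", 7), ("array-B", 0), ("array-A", 12)])
print(vol.translate(10))    # PhysicalLocation(array='array-A', block=7178)
print(vol.translate(1500))  # PhysicalLocation(array='array-B', block=476)
```

The server only ever sees contiguous block numbers; which array holds a given block is a detail the virtualization layer can change without the server knowing.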
Instead of partitioning each array separately, the administrator carves up all the storage from a central (usually Web-based) console. This approach can save space. The Storage Networking Industry Association estimates that most organizations use only 30 to 50 percent of their storage, largely because that storage is divided into silos. Each server has a hard drive that may hold only the data for the applications it runs, and a SAN may be dedicated to a particular set of applications, so free space in one silo cannot be used by another.
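A short calculation shows how silos strand capacity. The sketch below assumes three hypothetical silos with made-up sizes in gigabytes. Without pooling, a new request must fit inside a single silo's free space; with pooling, it can draw on all the free space at once.

```python
# Hypothetical silos: name -> (capacity_gb, used_gb). Figures are illustrative only.
silos = {
    "server-1 disk": (500, 150),
    "server-2 disk": (500, 200),
    "SAN-A": (1000, 450),
}


def utilization(silos):
    """Fraction of total capacity that holds data."""
    capacity = sum(c for c, _ in silos.values())
    used = sum(u for _, u in silos.values())
    return used / capacity


def largest_request_siloed(silos):
    # Without pooling, a request must fit inside one silo's free space.
    return max(c - u for c, u in silos.values())


def largest_request_pooled(silos):
    # With pooling, all free space is one allocatable resource.
    return sum(c - u for c, u in silos.values())


print(f"utilization: {utilization(silos):.0%}")                   # 40%
print("largest request, siloed:", largest_request_siloed(silos))  # 550
print("largest request, pooled:", largest_request_pooled(silos))  # 1200
```

With 60 percent of the space free, the siloed layout still cannot place a 600GB request anywhere, which is the kind of situation that drives unnecessary array purchases.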
By offering a single console through which different types of storage systems can be provisioned, virtualization software can ease management burdens. This centralized approach also allows for more efficient management of backup, mirroring and other advanced data management techniques.
Although the concept of storage virtualization may seem simple, vendors offer a number of different approaches, Peglar said. Virtualization can happen in one of three places in the network: at the host, array or switch level. At the host level, virtualization software is installed on the servers themselves. At the array level, a storage array aggregates its own resources.
Much of the virtualization discussion these days, however, revolves around virtualization at the network, or switch, layer. In this case, virtualization software acts as an intermediary between servers and the storage resources. Network-level virtualization happens in either of two ways. One is the in-band approach, in which all requests to the storage resources flow directly through the virtualization software, usually installed on a switch. The out-of-band approach involves a virtualization appliance, which sits as a node on the network. The server gets the network path to the requested storage resource from the appliance.
There are advantages and disadvantages to each approach, Peglar said. In-band does not require software on each server because the switch is already acting as the gateway. The out-of-band approach requires driver installation on all the servers but can be faster than in-band, since the switch is not slowed by the additional virtualization duties.
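The practical difference between the two network-level designs is which device carries the data. In this hypothetical Python sketch, an in-band virtualizer sits on the data path and forwards every read itself, while an out-of-band appliance only answers "where is this block?" and a driver on each host then reads from the array directly. The class names and dictionary-backed arrays are stand-ins, not any product's interface, and the assumption that the host driver caches locations is ours, added to show why the appliance handles far fewer requests.

```python
class Array:
    """Stand-in back-end array: block address -> data."""

    def __init__(self, blocks):
        self.blocks = blocks

    def read(self, block):
        return self.blocks[block]


class InBandVirtualizer:
    """Sits in the data path (for example, on a switch): every I/O passes through it."""

    def __init__(self, mapping, arrays):
        self.mapping = mapping  # virtual block -> (array name, physical block)
        self.arrays = arrays
        self.ios_handled = 0

    def read(self, vblock):
        self.ios_handled += 1  # the virtualizer carries all data traffic
        name, pblock = self.mapping[vblock]
        return self.arrays[name].read(pblock)


class OutOfBandAppliance:
    """A separate network node that hands out locations but never touches data."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.lookups = 0

    def locate(self, vblock):
        self.lookups += 1
        return self.mapping[vblock]


class HostDriver:
    """Driver that must be installed on each server in the out-of-band design."""

    def __init__(self, appliance, arrays):
        self.appliance = appliance
        self.arrays = arrays
        self.cache = {}  # assumed: remembered locations avoid repeat lookups

    def read(self, vblock):
        if vblock not in self.cache:
            self.cache[vblock] = self.appliance.locate(vblock)
        name, pblock = self.cache[vblock]
        return self.arrays[name].read(pblock)  # data goes straight to the array


arrays = {"A": Array({0: b"alpha"}), "B": Array({5: b"beta"})}
mapping = {0: ("A", 0), 1: ("B", 5)}

in_band = InBandVirtualizer(mapping, arrays)
driver = HostDriver(OutOfBandAppliance(mapping), arrays)
for _ in range(3):
    assert in_band.read(1) == driver.read(1) == b"beta"
print("I/Os through in-band virtualizer:", in_band.ios_handled)       # 3
print("lookups at out-of-band appliance:", driver.appliance.lookups)  # 1
```

Both paths return the same data; the trade-off is that the in-band device does more work per I/O, while the out-of-band design needs a driver on every server.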
For the past few years, storage experts have debated whether the in-band or out-of-band approach offers better performance.
But Peglar insists that, these days, the two approaches offer nearly identical performance. He recommends the in-band approach for networks with a relatively high server count and out-of-band products for smaller environments.
Richmond looked at a number of virtualization software packages, including those offered by EMC and Hitachi. The team went with an in-band, network-level virtualization package from IBM Corp.
Officials liked the IBM TotalStorage SAN Volume Controller largely because it sits outside the IBM storage server itself.
This independence lets Richmond buy storage from manufacturers other than IBM, should others offer better deals, Mounts said. When the city buys a new SAN, administrators set its logical volumes as large as possible and then configure the Volume Controller to recognize the new array, said Ron Riffe, storage software strategist for IBM. The controller can also manage the hard-drive space on each of the servers.
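The workflow Riffe describes, in which each new array is handed to the controller and its capacity joins one pool, can be modeled roughly as below. This is a hypothetical Python sketch, not the SAN Volume Controller's actual interface: it assumes the pool tracks free fixed-size extents per array and fills a new virtual disk from whichever arrays have room, regardless of vendor. The extent size and array names are made up.

```python
class StoragePool:
    """Hypothetical pool aggregating fixed-size extents from several arrays."""

    def __init__(self, extent_gb=16):
        self.extent_gb = extent_gb
        self.free = {}    # array name -> list of free extent numbers
        self.vdisks = {}  # virtual disk name -> list of (array, extent)

    def add_array(self, name, capacity_gb):
        # The array presents its capacity as one large volume; all of it joins the pool.
        self.free[name] = list(range(capacity_gb // self.extent_gb))

    def free_gb(self):
        return sum(len(extents) for extents in self.free.values()) * self.extent_gb

    def create_vdisk(self, name, size_gb):
        needed = -(-size_gb // self.extent_gb)  # ceiling division
        if needed * self.extent_gb > self.free_gb():
            raise RuntimeError("not enough free space in the pool")
        extents = []
        # Draw free extents array by array, in the order the arrays were added.
        for array, free in self.free.items():
            while free and len(extents) < needed:
                extents.append((array, free.pop(0)))
        self.vdisks[name] = extents
        return extents


pool = StoragePool()
pool.add_array("ibm-array-1", 160)  # arrays from two vendors; names are hypothetical
pool.add_array("emc-array-1", 320)
disk = pool.create_vdisk("payroll", 200)
print(pool.free_gb())                        # 272
print(sorted({array for array, _ in disk}))  # ['emc-array-1', 'ibm-array-1']
```

The 200GB virtual disk spans both arrays, yet the server that uses it sees a single disk.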
Although Richmond is only partially finished migrating control of its SANs to the Volume Controller, the implementation team is already optimistic about the results. The office aims to fill 85 percent of its storage resources; today it uses only about 40 percent.
'That's much higher than we could have driven through our individual servers,' Gleason said. 'The bottom line is, for the same amount of storage, we can save money because we have a much higher utilization.'
Vendor marketplace
The advantages of pooling storage resources can perhaps best be seen in Los Alamos National Laboratory's Pink supercomputer. When architects laid out this 1,000-node Linux cluster design, they eliminated hard disks on the individual servers altogether. All the storage for this 9.8 TFLOP system is provided by 16 5T ActiveScale Storage Cluster shelves from Panasas Inc. of Fremont, Calif.
'You don't need binaries out on the node,' said Ron Minnich, team leader of cluster research at Los Alamos. Instead, when a node fires up, it fetches an operating-system kernel and application from a master controller.
With 1,000 machines running constantly, disk failure is a real problem, Minnich said. Each time a disk fails, the node has to go offline and an administrator has to replace the disk by hand. With the disks eliminated, Pink's uptime has been considerably higher than that of many other clusters, Minnich said. The design also cuts costs: Pink cost about $6 million, while another, similar cluster cost $15 million, he said. And it makes good use of its storage. Thanks to pooling, the storage arrays are usually 85 to 90 percent full.
While Los Alamos did not use virtualization software, Pink shows the benefits of pooling storage. Other agencies could see similar gains in uptime, cost and utilization.
'Virtualization is relatively mainstream now,' Peglar said. As a result, companies are offering advanced features in their designs. EMC's newly released Fibre Channel virtualization software, Invista, can move a logical volume from one device to another, or from one array to another, without taking the data offline. IBM's Volume Controller features a caching mechanism that speeds data retrieval.
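A common general technique for moving a volume without taking it offline is to copy it in the background while mirroring new writes to both copies, then switch the mapping once the copy is complete. The Python sketch below illustrates that technique under simplifying assumptions (single-threaded, block-granular, hypothetical class names); it is not a description of how Invista or any other product implements migration.

```python
class Array:
    """Stand-in back-end array: block address -> data."""

    def __init__(self):
        self.blocks = {}


class MigratingVolume:
    """Hypothetical virtual volume that can move to another array while in use."""

    def __init__(self, source):
        self.source = source  # array currently backing the volume
        self.target = None    # array being migrated to, if any
        self.copied = set()   # blocks already present on the target

    def start_migration(self, target):
        self.target = target
        self.copied = set()

    def write(self, block, data):
        self.source.blocks[block] = data
        if self.target is not None:
            # Mirror host writes so the target never falls behind the source.
            self.target.blocks[block] = data
            self.copied.add(block)

    def read(self, block):
        # The source stays authoritative until cutover.
        return self.source.blocks.get(block)

    def copy_step(self):
        """Copy one outstanding block in the background; return False when done."""
        remaining = self.source.blocks.keys() - self.copied
        if not remaining:
            return False
        block = min(remaining)
        self.target.blocks[block] = self.source.blocks[block]
        self.copied.add(block)
        return True

    def cut_over(self):
        if self.source.blocks.keys() - self.copied:
            raise RuntimeError("copy not finished")
        # Swap the mapping; the host keeps using the same virtual volume.
        self.source, self.target = self.target, None


old, new = Array(), Array()
vol = MigratingVolume(old)
for b in range(4):
    vol.write(b, f"v{b}")
vol.start_migration(new)
vol.copy_step()          # background copy moves block 0
vol.write(2, "updated")  # the host keeps writing mid-migration
while vol.copy_step():
    pass
vol.cut_over()
print(vol.source is new, vol.read(2))  # True updated
```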
NetApp's FlexVol virtualization software lets administrators oversubscribe disks, allocating more capacity than physically exists. This technique, which NetApp officials call thin provisioning, is handy when not all applications use all of their allocated disk space at the same time.
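Thin provisioning works because physical space is consumed only when data is actually written. In this minimal Python sketch (a hypothetical model, not FlexVol's implementation), three volumes are promised more capacity in total than the shared physical store holds, and physical blocks are handed out only on each block's first write. The sizes are made up for illustration.

```python
class ThinAggregate:
    """Shared physical storage backing several thinly provisioned volumes."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.next_free = 0  # blocks are never freed in this sketch
        self.data = {}      # physical block -> stored bytes
        self.promised = 0   # total logical capacity handed out

    @property
    def used(self):
        return self.next_free

    def create_volume(self, logical_blocks):
        # Creating a volume only records a promise; no space is reserved.
        self.promised += logical_blocks
        return ThinVolume(self, logical_blocks)

    def allocate(self):
        if self.next_free >= self.physical_blocks:
            raise RuntimeError("aggregate full: real demand exceeded physical space")
        block = self.next_free
        self.next_free += 1
        return block


class ThinVolume:
    def __init__(self, aggregate, logical_blocks):
        self.aggregate = aggregate
        self.logical_blocks = logical_blocks
        self.map = {}  # logical block -> physical block

    def write(self, block, data):
        if not 0 <= block < self.logical_blocks:
            raise IndexError("block outside volume")
        if block not in self.map:
            # Physical space is consumed only on the first write to a block.
            self.map[block] = self.aggregate.allocate()
        self.aggregate.data[self.map[block]] = data

    def read(self, block):
        # Never-written blocks read back as empty.
        phys = self.map.get(block)
        return b"" if phys is None else self.aggregate.data[phys]


agg = ThinAggregate(physical_blocks=1000)
volumes = [agg.create_volume(600) for _ in range(3)]  # 1,800 blocks promised
for vol in volumes:
    for b in range(200):  # each application actually writes only 200 blocks
        vol.write(b, b"x")
print(agg.promised / agg.physical_blocks)  # 1.8
print(agg.used)                            # 600
```

The risk is visible in allocate(): if the applications really do fill their allocations at the same time, writes fail once physical space runs out, which is why the technique suits workloads whose peaks do not coincide.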
One of the chief advantages of virtualization software is that administrators can control all storage arrays, not just those offered by the maker of the virtualization software. Here, though, the customer must be very careful. Today, most vendors work with one another by swapping application programming interfaces, or APIs.
While most vendors support their competitors' equipment, not all do. As a result, customers should make sure their own arrays will work with the software they are planning to purchase.
The Richmond office faced compatibility concerns with its implementation. Steve Forstner, Richmond's engineering manager, said the EMC support person questioned whether EMC Clariion arrays would work with IBM software. IBM was so sure its software would work with EMC's arrays that it guaranteed to replace the EMC SAN free of charge should the SAN fail to work with IBM's software. It worked.
In the works is a set of industry standards for interoperable storage that is designed to obviate compatibility problems.
SNIA's Storage Management Initiative Specification (SMI-S) aims to standardize network-based virtualization so 'a developer of applications doesn't have to write device drivers for every piece of storage out there. It eliminates the need for proprietary APIs,' Peglar said.
SMI-S offers standardized interfaces for retrieving operational information, creating logical volumes, managing block virtualization, monitoring operations, checking capacity and other tasks.
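SMI-S is built on the DMTF Common Information Model (CIM), so a management tool can read the same classes and properties from any conforming array. As a rough illustration of a vendor-neutral capacity check, the Python sketch below sums volume sizes from CIM_StorageVolume-style records, computing each volume's capacity as BlockSize times NumberOfBlocks. The records are hard-coded stand-ins with made-up values; a real client would fetch instances from each array's SMI-S provider over WBEM (for example, with a library such as pywbem), which this sketch does not attempt.

```python
# Stand-in CIM_StorageVolume instances as two vendors' SMI-S providers
# might report them. System names and sizes are made up for illustration.
volumes = [
    {"SystemName": "vendor-a-array", "ElementName": "vol1",
     "BlockSize": 512, "NumberOfBlocks": 209_715_200},
    {"SystemName": "vendor-b-array", "ElementName": "lun7",
     "BlockSize": 4096, "NumberOfBlocks": 26_214_400},
]


def capacity_by_system(instances):
    """Sum volume capacity in bytes per array, using only standard CIM properties."""
    totals = {}
    for inst in instances:
        size = inst["BlockSize"] * inst["NumberOfBlocks"]
        totals[inst["SystemName"]] = totals.get(inst["SystemName"], 0) + size
    return totals


for system, size in capacity_by_system(volumes).items():
    print(f"{system}: {size / 2**30:.0f} GiB")  # 100 GiB for each array
```

The two arrays use different block sizes, but because both report the same standard properties, one piece of code handles both without a vendor-specific driver.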
With the adoption of SMI-S, the full promise of virtualization may finally be realized.