What's in store
Main story | As agencies start to grapple with a flood of large files, they may want to consider clustered storage.
Sidebar | What is clustered storage?
Sidebar | iSCSI storage can help keep your costs down
- By Rutrell Yasin
- Oct 21, 2007
A growing demand for digital content (audio, images, video, satellite imagery) is sparking the need for more sophisticated systems to manage and store this digital tsunami.
Take, for instance, NASA's World Wind program, an open-source virtual globe that lets online users explore the Earth and its terrain through high-resolution 3-D imagery.
Information technology administrators handling World Wind at NASA Ames Research Center needed a storage system that would let them consolidate a massive collection of satellite images into a single global name directory, simplifying workflow and giving users immediate access to content.
Or consider the case of Vandenberg Air Force Base, where the media production department films, edits, stores and analyzes footage of a number of spacecraft and missile programs.
The department is moving to a digitized world, as manufacturers stop making 35- and 16-millimeter equipment.
The media production employees currently use a variety of 35- and 16-mm, SD and HD video cameras to capture space flights originating from the base that carry weather satellites and other payloads into orbit, sometimes resulting in feature-length films. A small project can include 12 to 15 terabytes of data, said Stan Bellew, a media specialist with the 30th Space Communications Squadron at Vandenberg. Larger projects can include more than 45 terabytes of data and, in the near future, could even reach petabyte size.
To handle these loads, both NASA and Vandenberg deployed clustered storage systems built for massive amounts of data and digital content.
Typically, clustered storage solutions provide a single logical storage system, so any application can access any piece of data stored on it through any storage controller in the cluster, no matter how large the overall volume of data, according to a report from Enterprise Strategy Group. A true clustered storage system also aggregates all hardware resources into a single system, ESG said.
Data chunks
For World Wind, NASA used Isilon Systems' IQ clustered storage. NASA pulls Landsat 7 satellite imagery and Shuttle Radar Topography Mission data to enable World Wind online users to zoom in on any place on Earth to view its surface.
World Wind works with a company called I-cubed, which performs complex geoprocessing to convert raw satellite data into a high-resolution image.
That data is then sent to a Microsoft .Net-based application for staging and online delivery. World Wind's traditional storage limited I-cubed's processing applications, making it difficult for World Wind to process satellite data and deliver content to users, NASA officials said.
'We have a whole bunch of systems that respond to requests for data, but they need to pull that raw content, which is very large chunks of information, from somewhere,' said Randy Kim, general developer at NASA World Wind. It wasn't practical to store the information on traditional storage devices because the data was too big, he said.
'That's where the Isilon machine comes into play,' Kim said. 'It holds all of our original data. Not only is it able to host a lot of in-and-out traffic, but literally it is a safeguard for our information. We know that the data we keep on that box is not going anywhere.'
Isilon offers storage systems, called nodes, that consist of CPUs, memory, processor power, networking capabilities and hard drives. IQ's cluster architecture allows these nodes to be grouped together and act as one under One File System, Isilon's operating system software, said Sam Grocott, the company's senior director of product management.
The IQ system can scale from 4 terabytes to 1.6 petabytes in a single file system.
The IQ architecture includes software applications that provide data availability management and protection capabilities.
Traditional network-attached storage (NAS) and storage-area network (SAN) devices were built for structured block and file data, basically text-based data, Grocott said. But growing demand for audio, video and graphics requires something built from the ground up to handle this unstructured content, he said.
NASA deployed a four-node cluster of Isilon IQ 3000 units. There are 3 terabytes to each node, Grocott said, so NASA has 12 terabytes within that single file system.
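The capacity figure Grocott cites is simple multiplication, and the same arithmetic gives a rough feel for sizing. The sketch below is only an illustration: the function names are hypothetical, and it assumes raw capacity with no allowance for data protection or file system overhead, which a real cluster would need.

```python
import math

def cluster_capacity_tb(nodes, tb_per_node):
    """Raw capacity of a cluster, assuming identical nodes (no overhead modeled)."""
    return nodes * tb_per_node

def nodes_needed(target_tb, tb_per_node):
    """Smallest whole number of identical nodes whose raw capacity covers target_tb."""
    return math.ceil(target_tb / tb_per_node)

# NASA's deployment as described: four nodes at 3 terabytes each.
nasa_capacity = cluster_capacity_tb(4, 3)

# Hypothetical: raw node count for a 45-terabyte project using the same 3 TB nodes.
project_nodes = nodes_needed(45, 3)

print(nasa_capacity, project_nodes)
```

Usable capacity would be lower than the raw figure once redundancy is accounted for, so a calculation like this is a floor on the node count rather than a sizing recommendation.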
World Wind has consolidated more than 8,000 large Landsat 7 satellite images into a single volume and single file system. This boosts performance and access both for geoprocessing of the data and for World Wind's technicians, who stage and deliver processed images to users.
Kim noted that the system can be used for data exchange as well as for hosting data. 'By the time the satellite image gets to a personal computer, it has to go through a lot of preprocessing,' he said.
Isilon 'can take a lot of bandwidth,' Kim said, 'especially if it is on a local-area network.'
Rocket movies
IT workers at Vandenberg Air Force Base chose SGI's InfiniteStorage Clustered Extended File System (CXFS). Besides filming spacecraft missions, the media facility will record and evaluate the testing of missile-intercept vehicles launched from Vandenberg.
'Before, we would faint at the sight of a gigabyte file,' Bellew said. Now files are in the terabytes, he said. Camera crews are shooting anywhere from 240 to 240,000 frames within minutes. A traditional dedicated disk array storage system just could not handle the data volume, he said.
Now, the media team can download a data set from a camera, process it and have it ready for conformity and review in 20 minutes; before, it would take a whole day. Film is sent to the Defense Department or various other agencies, he said.
CXFS works in conjunction with the SGI InfiniteStorage SAN. The SAN provides high-speed connections between multiple hosts and disk storage. CXFS provides the software infrastructure that allows simultaneous shared access to that storage.
'CXFS allows users to add more storage and then have multiple clients access that same storage pool,' said Raj Das, vice president of storage at SGI. 'But [in addition to accessing] the same storage pool, CXFS allows multiple clients to read and write the same file and we handle all the semantics behind it.'
Vandenberg is a classic user of clustered storage, he said. Users of digital media files like to work on the same file or film because their workflows involve multiple processes.
Without clustered storage, they would have to make multiple copies of the file, which is inefficient because they're spending more money on storage to keep multiple copies of the same data, Das said.
Multiple copies also create a synchronization problem, because different people in the workflow are working on separate copies and then have to combine all of the work after it is done.
'The advantage with a true clustered file system is that you have a single copy of the data and you have multiple parts of the workflow working on the same data,' Das said. One copy of the data also makes management easier, he said.
A slew of other companies offer different implementations of clustered storage, including 3PAR, BlueArc, DataDirect Networks, EMC, EqualLogic, Exanet, Ibrix and LeftHand.
For example, Ibrix Fusion is a scalable, parallel file-serving suite, said Milan Shetti, vice president of marketing at Ibrix. It has an integrated Scalable Volume Manager function that lets users deploy and configure a common storage pool and single namespace using any combination of SAN-attached or direct-attached storage.
Some of the other Ibrix components include Fusion High Availability, a clustering solution that provides failover capabilities; FusionManager, an interface for administering and monitoring Ibrix Fusion clusters and file systems; FusionSnap, a file recovery feature; and FusionFileReplication, which creates replicas of an individual file.
NAS and SAN vendors are also looking to move into this space.
For example, Hewlett-Packard recently bought PolyServe, which has technologies that could help the company move its legacy architecture to a clustered environment in the future.
Meanwhile, Network Appliance's acquisition of Spinnaker three years ago has borne fruit. NetApp now offers the ONTAP GX system.
Demand could pick up in government beyond the research, science and defense sectors, said Ken Terry, senior manager for the public sector at BearingPoint.
As federal agencies increasingly collect massive amounts of unstructured data, there will be a demand for systems that can store, retrieve and manage that information, Terry said.
What is clustered storage?
According to 'The Clustered Storage Revolution,' a white paper from Isilon Systems (GCN.com/865), a clustered-storage system pulls together two or more storage devices to behave as a single entity.
The white paper states that clustered storage can be broken down into three types: failover clustering, namespace aggregation and clustered storage with a distributed file system.
Failover clustering keeps a second, redundant copy of the data, which can be used in case the main source becomes unavailable.
Namespace aggregation means that data across all nodes is indexed in a central location, or namespace. This approach allows data to be kept in a wide variety of storage systems and be viewed as a single entity (GCN.com/866).
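As a rough illustration of namespace aggregation, the sketch below keeps one central index that maps each file path to the storage system holding it, so a client sees a single listing no matter where the data lives. It is a toy model, not Isilon's or any vendor's implementation; the class name, paths and node names are all hypothetical.

```python
class NamespaceIndex:
    """Toy central index: one namespace over files held on separate storage systems."""

    def __init__(self):
        # Maps a path in the unified namespace to the system that stores the file.
        self._locations = {}

    def register(self, path, node):
        """Record that the file at `path` is stored on `node`."""
        self._locations[path] = node

    def locate(self, path):
        """Return the storage system holding `path` (raises KeyError if unknown)."""
        return self._locations[path]

    def listing(self):
        """Single sorted view of every file, regardless of where it is stored."""
        return sorted(self._locations)


index = NamespaceIndex()
# Hypothetical files spread across two different storage systems.
index.register("/imagery/landsat7/tile_0002.tif", "nas-b")
index.register("/imagery/landsat7/tile_0001.tif", "nas-a")
index.register("/video/launch_footage.mov", "san-1")

print(index.listing())
print(index.locate("/video/launch_footage.mov"))
```

The index is the single point every lookup passes through, which is the design difference from the distributed approach, where that knowledge is held by the nodes themselves.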
Clustered storage with distributed file systems also offers users a single view into all the data across all data resources. In this case, however, that data is kept at the node level, rather than at a central location. Nodes can be distributed across the entire network. In Isilon's view, this provides the most robust way to store data. 'Each node in the cluster is a coherent peer, meaning each node knows everything about the other,' according to the white paper.
- Joab Jackson
iSCSI storage can help keep your costs down
Do you have a lot of digital content but want to keep costs low? While clustered storage is one way to go, take a look at another emerging approach based on the Internet Small Computer System Interface, or iSCSI.
Clackamas County, Ore., scrapped a five-year-old IBM Fibre Channel storage-area network for an iSCSI SAN from EqualLogic to tackle the county's application growth.
'The big thing we've had to solve is the explosion in storage requirements and the ability to manage that,' said Chris Fricke, a senior microcomputer specialist with the county.
'E-mail, document imaging and digital media such as movies and maps and all those services that are being offered by the county is growing exponentially and we needed a way to manage all of that,' he said. 'So everything is hosted off our EqualLogic iSCSI SAN.'
Approved by the Internet Engineering Task Force in February 2003 as a standard for communicating with computer components over a network, iSCSI encapsulates SCSI commands in TCP/IP packets so they can go out over an IP-based network. This allows computers to access disks on the network in the same way they access internal disks. iSCSI can cut the costs of storage considerably because commodity networking equipment can be used, eliminating the need for Fibre Channel, which requires specialized cabling, switches and adapter cards. SCSI disks also tend to be cheaper and require no specialized training beyond basic system administration skills to manage.
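To make the encapsulation idea concrete, here is a minimal sketch that builds a SCSI READ(10) command and wraps it in a 48-byte header modeled on the iSCSI basic header segment. The resulting bytes are what would travel as payload on an ordinary TCP connection. Treat it as a simplified illustration under stated assumptions: the field layout follows the published iSCSI specification as best understood here, the function names are hypothetical, a 512-byte block size is assumed, and login, sequencing and digests are left out entirely.

```python
import struct

SCSI_READ_10 = 0x28            # SCSI READ(10) operation code
ISCSI_OP_SCSI_COMMAND = 0x01   # iSCSI opcode for a SCSI Command PDU
BLOCK_SIZE = 512               # assumed block size for this sketch

# Simplified header layout: opcode, flags, 2 reserved bytes, AHS length,
# 3-byte data segment length, LUN, task tag, expected transfer length,
# command sequence number, expected status sequence number, 16-byte CDB.
HEADER_FORMAT = ">BB2xB3sQIIII16s"

def build_read10_cdb(lba, blocks):
    """Pack a 10-byte READ(10) command descriptor block (CDB)."""
    return struct.pack(">BBIBHB", SCSI_READ_10, 0, lba, 0, blocks, 0)

def wrap_in_iscsi_header(cdb, lun, task_tag, cmd_sn):
    """Wrap a SCSI CDB in a 48-byte iSCSI-style header (no data segment)."""
    flags = 0x80 | 0x40                      # final PDU of the command, read requested
    blocks = struct.unpack(">H", cdb[7:9])[0]
    return struct.pack(
        HEADER_FORMAT,
        ISCSI_OP_SCSI_COMMAND, flags, 0, (0).to_bytes(3, "big"),
        lun, task_tag, blocks * BLOCK_SIZE, cmd_sn, 0,
        cdb.ljust(16, b"\x00"),              # CDB field is padded to 16 bytes
    )

def extract_cdb(pdu):
    """What a target would do first: pull the SCSI command back out of the header."""
    fields = struct.unpack(HEADER_FORMAT, pdu)
    return fields[-1][:10]

cdb = build_read10_cdb(lba=2048, blocks=8)
pdu = wrap_in_iscsi_header(cdb, lun=0, task_tag=1, cmd_sn=1)
print(len(pdu), extract_cdb(pdu) == cdb)
```

Because the wrapped command is just bytes in a TCP stream, it can cross standard Ethernet switches and routers, which is where the cost advantage over dedicated Fibre Channel gear comes from.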
Clackamas County, near Mount Hood, is home to more than 338,000 people. The information technology department of 50 employees manages internal and external IT activities for more than 2,000 users.
'The biggest thing we're doing now, storage-wise, is virtualizing our servers,' Fricke said.
The IT department is deploying VMware virtualization software, combining application servers and Microsoft SQL servers as well as Exchange servers. Fricke wants a more virtual environment so systems can quickly fail over to another site in the event of emergencies. He wants to be able to replicate data and host servers where it makes sense without provisioning hardware to do it.
Fricke just completed an upgrade on the three-year-old system with no downtime or interruptions.
'None of the things that you have to deal with Fibre Channel are there: no custom cabling, zoning issues nor vendor-centric hardware,' he said. 'When you want to connect devices to storage, you have multiple options.'