Backup to the future

 

Innovations help close the window on potential data losses.

The volume of data agencies accumulate each day can push backup processes to the wall, expanding nightly backup windows to the point where they could overlap the daytime hours of productive work.

But faster chips, bigger hard drives and speedier networks are spawning strategies for backup and restoration that can reduce your backup window to zero, minimize the size of backups and let you choose practically any point to restore, even continuously.

'Even agencies that aren't 24/7 shops can benefit from strategies to reduce or eliminate this backup window,' said Agnes Lamont, co-chair of the Data Protection Initiative for the Storage Networking Industry Association and vice president of marketing for backup software vendor TimeSpring Software Corp. of Newport Beach, Calif.

Besides the growing backup window, there is the question of whether once-daily backups are sufficient. After all, a lot goes on in a single day. If you're missing all the data people generated since last night, such as e-mail discussions, that gap represents a potentially significant loss.

Also, some administrators are under the impression that backup is solely for large operations that handle heavy transaction loads. But it's just as important for small offices to pay close attention to backup, because they might not realize the value of their data or even have a backup strategy in place.

Backup and recovery are of interest to commercial enterprises, too, but government agencies have special needs, many legally mandated, for information assurance, continuity and accessibility.

For example, preservation of material also could entail establishing a chain of custody (metadata) for that material. Can you prove where information came from and that it's absolutely unchanged?

New backup strategies are becoming possible because of technology improvements in several areas. Disk space is increasing, networks are speedier and chips are faster. For many enterprises, the hardware cost of moving to these new backup strategies is negligible.

Logically, your recovery needs are what influence your backup strategy the most. 'Agencies must decide how long they can afford to wait to recover, and how much data they can afford to lose,' advised Greg Schulz, founder and senior analyst of the StorageIO Group of Stillwater, Minn., and author of Resilient Storage Networks (Digital Press Books).

Some agencies might be able to wait a few hours to recover, while others might need to be up and running again in seconds. This wait is the recovery time objective (RTO): The shorter the RTO, the faster your recovery must be, so the faster your backup media must operate and the more expensive your solution will be.

Similarly, you also must decide how much data you can afford to lose, which determines how frequently to back up. Some agencies might be able to deal with losing a few hours of data, while others can't miss even a few seconds. This length of time is the recovery point objective (RPO): The shorter your RPO, the more complete and granular your backup must be and, again, the more expensive your solution.

Once you estimate your agency's RTO and RPO, you can begin searching for backup and recovery methods to meet your needs. You must decide whether recovery methods, and the backup media that support them, are fast enough, and whether your backup methods are complete and granular enough.
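As a rough illustration of that decision, the sketch below screens candidate backup methods against an RTO and an RPO. The class name, the function name and all of the timing figures are assumptions made up for the example, not measurements of any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupMethod:
    name: str
    restore_minutes: float   # assumed typical time to get data back
    interval_minutes: float  # assumed worst-case gap between recovery points


def meets_objectives(method: BackupMethod, rto_minutes: float, rpo_minutes: float) -> bool:
    """True if the method recovers fast enough (RTO) and loses little enough (RPO)."""
    return (method.restore_minutes <= rto_minutes
            and method.interval_minutes <= rpo_minutes)


# Illustrative figures only; measure your own environment.
methods = [
    BackupMethod("nightly tape", restore_minutes=480, interval_minutes=1440),
    BackupMethod("hourly snapshots", restore_minutes=10, interval_minutes=60),
    BackupMethod("continuous protection", restore_minutes=10, interval_minutes=0),
]

# An agency that can wait one hour and lose at most 30 minutes of data.
candidates = [m.name for m in methods
              if meets_objectives(m, rto_minutes=60, rpo_minutes=30)]
print(candidates)
```

Loosening the RPO to 60 minutes would also admit the hourly snapshots in this made-up list, which is the cost trade-off described above.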

New backup methods

Snapshots are one popular new strategy. A snapshot consists of a collection of time-tagged pointers to data. Some of that data could be ordinary production data in use; some can be saved versions. If you want to recover file X as of noon on Thursday, you simply search through the pointers for the one that indicates the version needed. The result is a near-immediate recovery.
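The pointer lookup can be pictured with a minimal sketch. It assumes each snapshot records, per file, a timestamp and an opaque pointer to where that version lives; `SnapshotIndex` and the pointer strings are hypothetical names, and real products organize this differently.

```python
import bisect
from datetime import datetime


class SnapshotIndex:
    """Per-file list of time-tagged pointers, kept sorted by time."""

    def __init__(self):
        self._times = {}     # path -> sorted list of snapshot times
        self._pointers = {}  # path -> pointers, parallel to self._times[path]

    def record(self, path, taken_at, pointer):
        times = self._times.setdefault(path, [])
        pointers = self._pointers.setdefault(path, [])
        i = bisect.bisect_right(times, taken_at)
        times.insert(i, taken_at)
        pointers.insert(i, pointer)

    def version_as_of(self, path, when):
        """Return the pointer for the latest snapshot at or before `when`."""
        times = self._times.get(path, [])
        i = bisect.bisect_right(times, when)
        return self._pointers[path][i - 1] if i else None


index = SnapshotIndex()
index.record("X", datetime(2006, 3, 16, 9, 0), "blk-17")    # illustrative date
index.record("X", datetime(2006, 3, 16, 11, 45), "blk-42")
index.record("X", datetime(2006, 3, 16, 13, 0), "blk-58")

noon = datetime(2006, 3, 16, 12, 0)
print(index.version_as_of("X", noon))  # the 11:45 version
```

Recovery here is only a search through small metadata, which is why it can feel nearly immediate compared with copying data back from backup media.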

Since the snapshot itself consists mainly of pointers, it often takes much less space than a complete backup. It is also faster to create the snapshot than it is to physically copy data, and with faster chips and speedier networks, there is little performance overhead. Because snapshots are so cheap, you can take them more often, which reduces the time gap between backups and lowers your exposure to missing data.

Snapshots also reduce your nightly backup window because so much saving is already being done during the day. Of course, gaps remain, namely the time between snapshots, but they are smaller than before.

The ultimate logical extension of reducing the time between backups is to make backups continuously, which is known as continuous data protection (CDP). Real or true CDP saves everything (every change to data) with no gaps in the backup data at all, and it has the ability to restore to any time necessary. 'We call this APIT: any point in time,' said Eric Burgener, vice president of marketing for Mendocino Software of Fremont, Calif.

Near-CDP, without the 'real' or 'true' label, may have gaps of minutes or hours. For example, the latest version of Microsoft's Data Protection Manager offers near-CDP at intervals of 15 minutes.

'The advantage of CDP is that there is no backup window,' Lamont said. Also, depending on implementation, there is little impact on production.

How does CDP work? When data changes (a user saves a file, for instance, or an e-mail arrives), that change is automatically made on the backup as well. Faster chips reduce the overhead to nearly nothing, while capacious hard drives provide the room.
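One common way to picture this is as a time-ordered journal of changes that can be replayed up to any chosen moment. The sketch below assumes that model; `ChangeJournal` and its methods are hypothetical names, and it is not a description of how any particular CDP product stores its data.

```python
class ChangeJournal:
    """Append-only log of (timestamp, key, value); value None means 'deleted'."""

    def __init__(self):
        self._entries = []

    def write(self, timestamp, key, value):
        if self._entries and timestamp < self._entries[-1][0]:
            raise ValueError("timestamps must be non-decreasing")
        self._entries.append((timestamp, key, value))

    def restore(self, as_of):
        """Rebuild the state as it stood at time `as_of` by replaying the log."""
        state = {}
        for timestamp, key, value in self._entries:
            if timestamp > as_of:
                break
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state


journal = ChangeJournal()
journal.write(1, "report.doc", "draft 1")
journal.write(2, "mail/001", "meeting at 3")
journal.write(3, "report.doc", "draft 2")
journal.write(4, "mail/001", None)  # the e-mail is deleted

print(journal.restore(as_of=2))
print(journal.restore(as_of=4))
```

Because every change is captured as it happens, there is no separate backup pass, and the restore point can be any timestamp rather than only the moments when a scheduled job ran.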

There are different versions of CDP, each with different implications for your system. For example, block level CDP tracks changes at a low level: what is physically changing on a hard drive, regardless of what file or application the data belongs to. Block level CDP is very fast, grabbing a chunk of data in one place and saving it in another, and is amenable to being implemented as a network appliance using special hardware.

'There's a lot of interest in block level CDP, because many enterprises use large databases that benefit from this method,' said Chris Stakutis, chief technology officer of emerging storage software for IBM Corp.

File level CDP takes a different approach and works at a higher level, tracking changes to files regardless of what blocks of data those files reside on. This lets you keep your data consistent from the point of view of the application that uses the file.

For example, reconstructing a large Microsoft Word document can be time-consuming with block level CDP: Different parts of the document might reside on different blocks, possibly on different machines, each of which would have to be restored to the right version in time. With file level CDP, it would be a matter of tracking the versions of that one file.
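To see why the block level case involves more work, consider this small sketch. It assumes a per-block version history and a known list of which blocks make up the file; both structures and the function name are invented for illustration.

```python
def restore_file_from_blocks(block_history, file_blocks, as_of):
    """Reassemble a file from the newest version of each block at or before `as_of`.

    block_history maps block id -> list of (timestamp, data) in time order.
    file_blocks lists the block ids that make up the file, in order.
    """
    parts = []
    for block_id in file_blocks:
        chosen = None
        for timestamp, data in block_history[block_id]:
            if timestamp > as_of:
                break
            chosen = data
        if chosen is None:
            raise KeyError(f"block {block_id} has no version at or before {as_of}")
        parts.append(chosen)
    return b"".join(parts)


block_history = {
    0: [(1, b"Dear "), (5, b"Hi ")],
    1: [(2, b"team"), (6, b"all")],
}

print(restore_file_from_blocks(block_history, [0, 1], as_of=3))
print(restore_file_from_blocks(block_history, [0, 1], as_of=5))
```

Every block has to be rolled to the right moment before the document is consistent again. A file level system would instead look up one version of one file.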

Naturally, this depends on the kinds of data your agency handles, as well as the volume of data. 'If you're concerned with recovery time, you need to choose a CDP implementation that is application-aware,' said Dan Tanner, founder of consultant ProgresSmart of Westboro, Mass.

'CDP is most useful for valuable data that changes quickly,' Burgener said. You must decide what events will trigger a CDP backup, including saving, deleting, opening, editing, or viewing a file or document.

Hardware or software versions of CDP are available. If opting for a software-based version, be sure it can use existing hardware and fit with your current operating systems and applications.

Some products, such as Mendocino Software's InfiniView, let you switch between true CDP and near-CDP. 'After data is a few days old, you probably don't need every version,' Burgener said. Instead, less frequent, periodic backup could be more appropriate.
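A thinning policy of this kind might look like the following sketch. It is a generic illustration, not InfiniView's behavior: the function name, the time units (hours) and the rule of keeping the last version in each older interval are all assumptions.

```python
def thin_versions(timestamps, now, keep_all_within, older_interval):
    """Keep every recent version; keep one version per interval for older ones.

    All arguments share one time unit (hours in the example below).
    """
    cutoff = now - keep_all_within
    recent = []
    last_in_bucket = {}
    for ts in sorted(timestamps):
        if ts >= cutoff:
            recent.append(ts)
        else:
            # Later versions overwrite earlier ones in the same bucket.
            last_in_bucket[ts // older_interval] = ts
    return sorted(last_in_bucket.values()) + recent


versions = [1, 2, 3, 25, 26, 50, 70, 71, 72]  # hours since tracking began
kept = thin_versions(versions, now=72, keep_all_within=24, older_interval=24)
print(kept)
```

Everything from the most recent 24 hours survives, while each older day is reduced to its final version.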

New hardware appliances also can ease the backup burden. For example, the Active Archive Appliance from PowerFile offers an optical-disk media library as well as a RAID-enabled front-end cache.

'This removes nonchanging data from the backup challenge,' said Jonathan Buckley, vice president of marketing for PowerFile Inc. of Santa Clara, Calif. The smaller amount of current data is easier to manage, while the archived material is still accessible if necessary.

'We find that 85 percent of the data people need is sitting in the cache,' said Jim Sherhart, PowerFile's director of product management. Optical-disk access also is faster than conventional tape archive access, permitting new services such as PDF libraries.

Behind the backup

If keeping track of all the pieces that make up all the versions of all files sounds like a tough job, you're right. That's why some backup solutions use a database to manage everything.

'The Tivoli Storage Manager is built around a relational database,' said Tricia Jiang, technical attaché for IBM's Tivoli Storage Solutions. The database tracks metadata about each item of data. This metadata also is useful for migrating data to long-term storage media, as well as for auditing and using space more efficiently.

Identifying the data to back up and actually doing the physical copying can be two different things. 'You don't necessarily perform the save immediately,' Stakutis said. Instead, the backup system tracks the change, doing the save when it's most expeditious.
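The separation between noting a change and copying it can be sketched as below. The class is hypothetical, and it makes one simplifying assumption worth stating: if a file changes several times before the next flush, only the latest content is saved, which a true CDP product would not do.

```python
class DeferredBackup:
    """Records which items changed now; copies them later in one pass."""

    def __init__(self):
        self._pending = {}  # path -> latest content awaiting backup
        self.saved = {}     # path -> content already copied to backup

    def note_change(self, path, content):
        # Cheap bookkeeping on the write path; no copying happens here.
        self._pending[path] = content

    def flush(self):
        """Copy all pending items to backup and return how many were copied."""
        count = len(self._pending)
        self.saved.update(self._pending)
        self._pending.clear()
        return count


backup = DeferredBackup()
backup.note_change("budget.xls", "v1")
backup.note_change("budget.xls", "v2")
backup.note_change("memo.doc", "v1")

copied = backup.flush()  # run when the system is least busy
print(copied, backup.saved)
```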

A process called deduplication can slash the amount of data you need to back up by storing duplicate data only once. For example, if you receive an e-mail that you share with 10 colleagues, most backup systems keep every copy of that e-mail separately.

Deduplication scans data headed for backup and compares it to data already saved. If it finds a match, it only saves the data once and simply points to that one instance for every copy.

'File-based deduplication is preferable to bit-based or block-based,' said Diamond Lauffin, founder and CEO of the Lauffin Group of Los Angeles. This is because so many frequently used files are fixed-content, such as PDFs or graphics. File-based deduplication even deals with renamed files, since it works on the content of the file.
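A minimal sketch of file-based deduplication follows. It assumes content is identified by a SHA-256 digest, which is one common choice rather than something the vendors quoted here are known to use; `DedupStore` and the file names are hypothetical.

```python
import hashlib


class DedupStore:
    """Stores each distinct file content once; names only point at content."""

    def __init__(self):
        self._blobs = {}    # digest -> content, stored once
        self._catalog = {}  # file name -> digest

    def add(self, name, content):
        """Back up a file; return True only if new content had to be stored."""
        digest = hashlib.sha256(content).hexdigest()
        is_new = digest not in self._blobs
        if is_new:
            self._blobs[digest] = content
        self._catalog[name] = digest
        return is_new

    def get(self, name):
        return self._blobs[self._catalog[name]]

    @property
    def stored_copies(self):
        return len(self._blobs)


store = DedupStore()
message = b"Agenda for Thursday's meeting"
for mailbox in range(10):
    store.add(f"mailbox{mailbox}/agenda.eml", message)
store.add("archive/renamed-agenda.eml", message)  # renamed, same content

print(store.stored_copies)
```

Eleven names end up pointing at a single stored copy. The renamed file is caught because the match is made on content, not on the name.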

Data sprawl

'Each increase in storage only increases the amount of backup necessary,' Schulz noted. Files get bigger. New capabilities, such as images, sound or video, also lead to larger backups. E-mail, too, represents an avalanche of sometimes-vital information flowing into an agency. 'Most CDP is sold to deal with Microsoft Exchange,' Burgener said.

The increasing mobility of the workforce also creates a backup concern. If a notebook PC is lost, how else can the data it contains be recovered? 'While the volume of data on laptops is comparatively smaller, it is usually more creative and may represent more value,' says Stakutis.

SNIA is evolving standards and best practices to cope with rapidly changing backup technology. As backup windows fade into memory and backup gets to be continuous, the process will become transparent to users. 'This is the same philosophy as antivirus software,' Stakutis proposes. 'You'll need it, won't even question it and won't notice it.'

Edmund X. DeJesus is a freelance technical writer in Norwood, Mass. (dejesus@compuserve.com).

The Geographic Information Systems department in Pierce County, Wash., needed the best of both worlds: the speed of disk to quickly retrieve data and the cost-effectiveness of tape backups. They found the best approach to be one that blended both tape and disk-backup systems, according to Linda Gerull, Pierce County's GIS Manager.

Gerull's department maintains the geographic data for the county, which is used by 800 users in 25 other departments. The county includes agricultural, high-tech manufacturing, urban (Tacoma) and tourism (Mount Rainier) features, all of which depend on geographic information for planning and management. The department also offers free GIS applications and services on the Web (see GCN.com/737).

Maintaining such data can take a lot of disk space. GIS-based maps tend to be large data files, ranging into the gigabytes. They could be satellite imagery, or 3-D terrain maps. Complicating the task is the availability of new orthophotography and LIDAR (Light Detection and Ranging) data for the area. While undeniably valuable to users, the 2,400-plus files are large and only add to the task of providing information and applications to the public.

Keeping all these different versions of files on disk would be cost-prohibitive, yet users demanded that at least the latest versions of each file be accessible within a matter of minutes through GIS software. So Pierce County used a combination of a 7TB disk-based unit, the TotalStorage DS4500 storage server from IBM Corp., along with the IBM TotalStorage Ultrium Scalable Tape Library.

With this approach, older copies of a file are backed up to tape, while newer copies (those still accessed) remain on disk, according to William Morman, an information technology specialist for the county.
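The placement rule can be sketched in a few lines. This is only an illustration: the county's actual policy is not described in detail here, so the function name and the idea of keeping a fixed number of the newest versions on disk are assumptions.

```python
def assign_tiers(version_dates, keep_on_disk=1):
    """Put the newest `keep_on_disk` versions on disk and the rest on tape."""
    newest_first = sorted(version_dates, reverse=True)
    return {
        "disk": newest_first[:keep_on_disk],
        "tape": newest_first[keep_on_disk:],
    }


# ISO dates sort correctly as plain strings.
tiers = assign_tiers(["2005-11-02", "2006-01-15", "2005-06-30"], keep_on_disk=1)
print(tiers)
```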

'Many customers are taking a blended approach,' said Peter McCaffrey, program director of TotalStorage marketing for IBM's Systems and Technology Group. 'They may use a disk-to-disk backup environment, where they do an image copy of data from primary disk storage. They may do that nightly, but then they will do a consolidated backup to tape.'

'For the number of copies we need to keep and how often they need to be accessed, tape is still the best way to go,' Morman said.

At some point, most storage administrators have to decide whether to archive data on tape, which is cheap but can be slow to recover, or on disk, which is speedy but more expensive.

For the Marine Expeditionary Forces, however, the choice was easy. They had to use disks, not only because disks offered faster backups, but also because tapes could not withstand the harsh climates in which the forces operate, according to Patrick McLaughlin, a Marine master gunnery sergeant at the time of this interview who has since retired.

With more than 15,000 personnel, the Marine Expeditionary Forces are deployed at 30 locations worldwide, from Iraq to the Horn of Africa. About 70 servers running Microsoft Exchange deliver the mail to the command's headquarters in Camp Lejeune, N.C., as well as the many field units around the globe.

Originally, the command would back up the Exchange .PST mailbox files on tape drives. Each server had a tape drive, a solution that seemed to work well enough in North Carolina but did not perform as well in harsher climates. Some server facilities could heat up to 130 degrees or more, making the tapes pliable. Some would stretch when being read, becoming useless. Dusty climates also wreaked havoc.

If a device failed, forcing recovery of data from tape, nine out of 10 times 'the tapes would fail, the recovery itself would fail, or the best case was the tape recovery would be successful but it would take up to 36 hours to complete the recovery process,' McLaughlin said.

Needless to say, the command couldn't endure a 36-hour hold time for an e-mail account.

A tape-based approach had other problems: Mailboxes would be backed up once a day, which meant that a power outage at the wrong time could lose an entire day's worth of mail. Plus, backing up all the mailboxes could take six to eight hours, a process that could slow network and mailbox use during working hours. Making backups also required personnel to be on hand to swap out tapes.

So the command looked at using low-cost network attached storage devices as an alternative. Though more expensive than tape, NAS servers can be inexpensive enough to use as a feasible alternative. The Tape Technology Council, a consortium of tape vendors, has estimated that a tape-based system can cost about a fourth of a disk-based system.

The command has purchased more than 100 FAS 250 Fabric-attached storage systems from Network Appliance Inc. of Sunnyvale, Calif. The systems offer up to 4TB of storage per unit, and the command eventually will offer its personnel an aggregate total of 100TB.

The NAS approach also lets the command back up more often. The system takes a snapshot of each user's primary mailbox every 15 minutes. Such a snapshot usually takes only a minute or two to complete, and, after initial set-up, requires no administrator intervention.

For McLaughlin, disks provided the optimal solution to the problem of extreme environments and frequent backups. 'Up to that point, we really had no solution other than to keep doing what we were doing: relying on an unreliable tactical solution [of] tape,' he said.
