2 technologies that can make storage a small matter

 

Data deduplication can save considerable space on backup servers by eliminating duplicated data. Thin provisioning can optimize disk space on primary storage. And both can save you money.

As the guardians of enterprise data, storage specialists tend to be conservative when it comes to adopting new approaches. However, many storage specialists at federal agencies are moving to adopt two fairly new technologies: data deduplication and thin provisioning.

“This is a very risk-averse crowd,” said Dave Russell, a vice president at Gartner Research. However, when it comes to deduplication and thin provisioning, Russell said, “these two technologies are getting a lot of attention, and they’re getting a lot of deploying, too, and deservedly so.”

“In my 20 years in storage, I’ve never seen a technology go from talked about to being deployed in major data centers this fast,” Russell said of data deduplication.

Russell and other analysts say the primary factor that drives such rapid adoption of the new storage technologies is simple: dollars. Because both technologies reduce the investment required to move and store data, they result in lower equipment costs and savings in power and cooling.

Mike DiMeglio, product manager at FalconStor Software, a provider of deduplication software, estimated that deduplication could easily save as much as 50 percent of an organization’s wide-area network costs.



In principle, deduplication is simple. Most data on a server contains a lot of duplicate data. Users might save 20 different versions of a PowerPoint presentation that has only a single changed slide. Or an e-mail message might go to 40 people with the same image attached. With deduplication, when that data is backed up or archived, only a single copy of the image or the presentation is saved, along with an index so individual files or presentations can be rebuilt, or rehydrated, if necessary.
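
The idea of keeping a single copy plus an index can be sketched in a few lines of Python. This is only an illustration, not how any product named in this article works: the `DedupStore` class, its method names and the file names are all hypothetical, and it deduplicates whole files by their SHA-256 hash.

```python
import hashlib


class DedupStore:
    """Toy store (hypothetical): identical contents are kept only once."""

    def __init__(self):
        self.blobs = {}  # content hash -> bytes, stored a single time
        self.index = {}  # file name -> content hash

    def backup(self, name, content):
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # keep only the first copy
        self.index[name] = digest

    def rehydrate(self, name):
        # Rebuild the original file from the index and the single stored copy.
        return self.blobs[self.index[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
image = b"\x89PNG" + b"\x00" * 1000  # stand-in for an attached image
for i in range(40):  # the same attachment lands in 40 mailboxes
    store.backup(f"mailbox{i}/logo.png", image)

logical_bytes = 40 * len(image)
print(logical_bytes, store.stored_bytes())
```

In this sketch the 40 logical copies occupy the space of one, and any of the 40 names can still be rehydrated to the full file.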

The potential benefits in terms of required storage space — especially for federal agencies and departments, which are required by law to keep detailed backups for long periods in many cases — are immense.

With typical office data files, deduplication can reduce the size of backups by 90 percent or more. “We get on the order of 95 percent optimization,” DiMeglio said.
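
Vendors often quote deduplication as a ratio rather than a percentage, and the two are easy to convert. The helper names below are made up for illustration, and the 100-terabyte figure is a hypothetical input, not a number from the article.

```python
def reduction_to_ratio(percent_saved):
    """Convert 'percent of space saved' into an N:1 deduplication ratio."""
    return 100.0 / (100.0 - percent_saved)


def stored_size(logical_tb, percent_saved):
    """Space left on disk after deduplication, in the same unit as the input."""
    return logical_tb * (100.0 - percent_saved) / 100.0


print(reduction_to_ratio(90))  # 10.0, i.e. 10:1
print(reduction_to_ratio(95))  # 20.0, i.e. 20:1
print(stored_size(100, 95))    # 5.0 terabytes for a hypothetical 100T of backups
```

Note how the last few percentage points matter: going from 90 percent to 95 percent halves the space that remains.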

Moving smaller amounts of data not only consumes less power and other resources but also can change the way an organization works.

“The next cascading effect is that now we’ve got less data to move, and we can afford to replicate that data from one location to another," Russell said. "Whereas before, we might have only relied on writing out the physical tape and then either using our own people or hiring a service to come and pick up those tapes every day. Now we can transmit them over the wire.”

Although deduplicating can save greatly on storage requirements and related costs, it can also slow performance.

“Dedup is heavy math,” said Steve Foley, director of federal programs at 3Par, a utility storage vendor. “It requires CPU cycles and the system resources within a primary storage array. If it’s deduping while it is maintaining a Web site, say, it will affect the user experience, and the performance of the Web site would be affected because the system is searching for data that it can dedup.”

Potential performance problems also arise when a user needs to retrieve deduplicated data. “Once it’s deduped, the read/write heads have to do a fair bit of thrashing on reads to retrieve the data blocks,” said Dave Swan, an analyst at Knight Point Systems, a systems integration company.

However, vendors and analysts say most users won’t notice the performance lag caused by deduplication when it is used for backups and archiving. That’s partly because most users are moving from tape systems to faster disk-based backup, and disk backup with deduplication is generally much faster than tape backup.

Also, users’ performance expectations are different with backup and archiving than they are with many other applications. “In the backup world, if there’s going to be a performance hit of a second or two, you’d never even blink an eye," Russell said. "The same is true in an archiving situation. Now when we get into highly transactional credit card processing system…even minimal overhead is unacceptable.”

Even when used for backup and archiving, deduplication is more effective with certain kinds of data than others.

“What is kryptonite for deduplication?” Russell asked. “Anything that has been previously compressed. Also, anything that is encrypted.”

Compression algorithms generally remove much of the duplication of data in files. Similarly, encrypted files are encoded in a way that the software cannot recognize duplicated data. Files with large amounts of unique data, such as image files, also are not good candidates for deduplication.
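
A small experiment shows the compression effect. The sketch below assumes a simple fixed-size chunker with SHA-256 fingerprints (the `chunk_stats` helper and the 4,096-byte chunk size are illustrative choices) and uses Python's standard zlib module as the compressor. It covers only the compression case, not encryption.

```python
import hashlib
import random
import zlib

CHUNK = 4096  # illustrative chunk size in bytes


def chunk_stats(data, size=CHUNK):
    """Return (total chunks, unique chunks) for fixed-size chunking."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    unique = {hashlib.sha256(c).digest() for c in chunks}
    return len(chunks), len(unique)


rng = random.Random(0)  # fixed seed so the run is repeatable
block = bytes(rng.getrandbits(8) for _ in range(CHUNK))
raw = block * 32             # the same 4 KB block repeated 32 times
packed = zlib.compress(raw)  # the compressor removes the repetition itself

raw_total, raw_unique = chunk_stats(raw)
packed_total, packed_unique = chunk_stats(packed)
print("raw:   ", raw_total, "chunks,", raw_unique, "unique")
print("packed:", packed_total, "chunks,", packed_unique, "unique")
```

The raw stream is almost entirely duplicate chunks, while the compressed stream is much smaller but has no duplicate chunks left for a deduplication engine to find.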

Analysts caution that vendors approach deduplication in different ways, and agencies and departments that are considering using deduplication should understand which methods are the best fit for their data.

For starters, deduplication software can employ different chunking methods, or methods of defining blocks of data for comparison. Some systems will examine specified sizes of data chunks on drives, and others will compare only complete files, a method known as single instance storage. The most sophisticated and most CPU-intensive method is named sliding block, in which the software examines and analyzes data streams for repeating patterns.
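
To see why the choice of chunking method matters, the sketch below compares fixed-size chunks with boundaries chosen from the data itself, which is one common way to get sliding-block-style behavior. It is an assumption-laden toy: the window size, the mask and the use of CRC32 as the window hash are illustrative, and a production system would use a true rolling hash rather than rehashing each window.

```python
import hashlib
import random
import zlib

WINDOW = 16  # bytes examined at each position (illustrative)
MASK = 0x3F  # cut a chunk when the low 6 bits of the window hash are zero


def fixed_chunks(data, size=64):
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data):
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        # The boundary depends only on the last WINDOW bytes, not on position.
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared(a_chunks, b_chunks):
    """Count distinct chunks that appear in both lists."""
    a = {hashlib.sha256(c).digest() for c in a_chunks}
    b = {hashlib.sha256(c).digest() for c in b_chunks}
    return len(a & b)


rng = random.Random(42)
original = bytes(rng.getrandbits(8) for _ in range(20000))
edited = b"X" + original  # a single byte inserted at the front

fixed_shared = shared(fixed_chunks(original), fixed_chunks(edited))
cdc_shared = shared(content_defined_chunks(original),
                    content_defined_chunks(edited))
print("fixed-size chunks shared:     ", fixed_shared)
print("content-defined chunks shared:", cdc_shared)
```

A one-byte insertion shifts every fixed-size chunk, so nothing matches, while the content-defined boundaries realign after the edit and most chunks are recognized as duplicates. The extra hashing is also why this style of chunking costs more CPU.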

Although some deduplication systems can only handle a single logical volume or disk spindle, others can be applied across an entire storage array.

Another major difference between deduplication systems is whether they work their magic during the backup process or after data has been moved to secondary storage, a process some call cache and crush.

Although cache and crush imposes fewer CPU demands on the primary storage network, it requires significantly more disk space in secondary storage because it will need to accommodate the entire dataset until deduplication takes place.
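
The space trade-off can be put into rough numbers. The function below is a simplified model with a hypothetical name and hypothetical inputs: it looks only at the peak space a single backup occupies on secondary storage and ignores retained history.

```python
def peak_secondary_tb(backup_tb, percent_saved, post_process):
    """Peak space one backup needs on secondary storage, in terabytes."""
    if post_process:
        # Cache and crush: the full copy lands before deduplication runs.
        return float(backup_tb)
    # Inline: data is reduced before it is written.
    return backup_tb * (100 - percent_saved) / 100


# Hypothetical 10T backup with 90 percent reduction.
print(peak_secondary_tb(10, 90, post_process=True))   # 10.0
print(peak_secondary_tb(10, 90, post_process=False))  # 1.0
```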

Deduplication systems also vary in a number of potentially important ways. For example, not all of them support Symantec’s Open Storage Technology. And not all vendors offer a Fibre Channel interface. Most, if not all, vendors support Redundant Array of Independent Disks 6 and employ redundant power supplies, but not all systems offer failover capabilities.

One complicating factor is that most storage vendors now bundle deduplication tools with their products, yet many customers are not aware of the differences among those tools. “It has become a very pervasive technology, but it’s not standardized,” Russell said. He said he advises clients to do a careful examination of requirements and predeployment testing before making a decision on new storage solutions.

Knight Point’s Swan said vendors might not be helpful in that process. His team was called in to help select a backup system for a new data center in the intelligence community that was to host thousands of servers, some virtualized and some not. The datasets included typical office files and specialized sets of encrypted data.

“When I got on the project, the first thing I noticed was that it was grossly undersized,” Swan said. “I believe they were sizing it based on price, not on requirements. We started talking about the amount of floor space necessary, the power and cooling necessary, the number of racks required to back up that much data. It was clear that this thing was going to consume half of the data center, which was not a good fit. So we said they really should look at deduplication as a means of reducing the physical footprint by eliminating redundant data in the backup.”

Swan talked to a number of vendors. “Each of them pretty much wet their finger and stuck it in the air, decided which direction the wind was blowing and said, ‘I think you have X millions of dollars. Here’s your solution.’ And they pushed it across the table,” Swan said. “Never once did any of them do any math. How much data do you have? How fast do you need to shove it in? How fast do you need to get it out? These are big issues, yet none of them did that until we forced them to do it.”
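
The math Swan describes starts with simple throughput arithmetic. The sketch below uses decimal units and entirely hypothetical inputs (50 terabytes, an eight-hour backup window, a 24-hour restore target); none of those figures come from his project.

```python
def required_throughput_mb_s(data_tb, window_hours):
    """Sustained rate needed to move data_tb within window_hours."""
    megabytes = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return megabytes / (window_hours * 3600)


# How fast must data go in? Hypothetical 50T nightly, eight-hour window.
ingest = required_throughput_mb_s(50, 8)
# How fast must it come out? Hypothetical full restore within 24 hours.
restore = required_throughput_mb_s(50, 24)
print(round(ingest), "MB/s in,", round(restore), "MB/s out")
```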

Ultimately, Swan’s team settled on FalconStor, primarily because it allowed them to deduplicate only the unencrypted data, a capability that some vendors didn't offer. Another factor was that FalconStor could deduplicate data across multiple virtual tape libraries, a capability that the other primary candidate lacked, Swan said.

Russell agreed that agencies and departments should be careful about requirements in considering deduplication offerings. “In the grand scheme of things, it’s still incredibly nascent technology,” he said. “It’s going to evolve. That could suggest purchasing tactically or at least trying a solution in a more limited scope as the market continues to evolve.”

Thin Provisioning

Deduplication primarily aims to save disk space on backup servers. Thin provisioning primarily aims to conserve disk space on primary storage.

As with deduplication, thin provisioning's underlying concept is simple. But instead of compressing data, thin provisioning works by telling a simple lie to applications.

Many applications, such as e-mail servers and databases, need to know how much disk space they have to work with. For example, a systems administrator might reserve a terabyte of disk space for a database, and the database might only need 100G at that time. With thin provisioning, the administrator tells the database that 1T of space is available, but only the space needed is delivered, in this case 100G. That frees 900G for other applications to use.
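
The database example can be modeled with a toy pool. Everything here is hypothetical: the `ThinPool` class and its methods are invented for illustration, sizes are in gigabytes, and 1T is treated as 1,000G to match the arithmetic above.

```python
class ThinPool:
    """Toy thin-provisioned pool; sizes are in gigabytes."""

    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.advertised = {}  # volume -> size promised to the application
        self.written = {}     # volume -> space actually backed by disk

    def create_volume(self, name, advertised_gb):
        # No physical space is reserved when the volume is created.
        self.advertised[name] = advertised_gb
        self.written[name] = 0

    def write(self, name, gb):
        if self.written[name] + gb > self.advertised[name]:
            raise ValueError("write exceeds the volume's advertised size")
        if gb > self.free_gb():
            raise RuntimeError("pool is out of physical space")
        self.written[name] += gb

    def free_gb(self):
        return self.physical_gb - sum(self.written.values())


pool = ThinPool(physical_gb=1000)
pool.create_volume("database", advertised_gb=1000)  # the database sees 1T
pool.write("database", 100)                         # but has written only 100G
pool.create_volume("mail", advertised_gb=1000)      # promises now exceed the disks
print(pool.free_gb())  # 900
```

The second volume shows the overcommitment at the heart of the technique: 2T has been promised against 1T of physical disk, which works only as long as actual writes stay within the pool.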

Thin provisioning lets organizations configure applications for the future without needing to come up with the entire capital outlay for hardware. With the cost of storage continually dropping, when the additional hardware is actually needed, it will cost less than if it had been purchased at the outset.

The potential reduction in required storage space is significant. “Some studies in the marketplace say that 75 percent of the storage has been presented but is stranded and is never written to,” 3Par's Foley said. The company is the first storage vendor to offer thin provisioning. “So if you have a 100T system, on average only 25 to 30 terabytes are actually written data.”

When Bryan Gilley, director of management information systems for Skokie, Ill., had to replace some servers, he looked to thin provisioning as a way to stretch his budget. “We were able to create a virtualized environment with VM and a storage-area network at less cost than we had planned to spend on the servers,” Gilley said. “We’re able to grow our storage environment in a more planned way and not have all that cost upfront.”

However, analysts warn that there are some risks involved with thin provisioning. Specifically, to avoid running out of disk space, administrators need to know the real storage requirements of all their applications and provide additional disk space in time.

“It does add the risk that if you don’t properly identify growth as it happens, you could run out of space,” said Andrew Reichman, senior analyst at Forrester Research. “It’s important for any organization that is considering using thin provisioning to be really clear about the reporting tools that you’re going to use to mitigate the risk and the processes that you’re going to use to look at those reports and make decisions and keep it safe.”

Thin provisioning has been adopted by most SAN vendors, and it is generally offered at no extra cost. But vendors’ offerings tend to differ in their reporting and alarm tools, Reichman said.

Swan said procurement cycles might not match up with an administrator’s need for new storage if growth is not adequately projected. When an administrator realizes there’s a crunch, it could take nine months to get new equipment, he said. “Well, it’s a little late,” he said. “My Oracle just crashed.”
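
The check that reporting tools need to support is a comparison of remaining runway against procurement lead time. The sketch below assumes linear growth and uses hypothetical pool figures; the 270-day lead time stands in for the nine months Swan mentions.

```python
def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Days of runway left, assuming growth stays linear."""
    if growth_gb_per_day <= 0:
        return float("inf")
    return (capacity_gb - used_gb) / growth_gb_per_day


def needs_order(capacity_gb, used_gb, growth_gb_per_day, lead_time_days):
    """True when new hardware must be ordered now to arrive in time."""
    runway = days_until_full(capacity_gb, used_gb, growth_gb_per_day)
    return runway <= lead_time_days


# Hypothetical pool: 1,000G physical, 700G written, growing 2G a day.
print(days_until_full(1000, 700, 2))   # 150.0 days of runway
print(needs_order(1000, 700, 2, 270))  # True: a nine-month lead time is too long
print(needs_order(1000, 700, 2, 90))   # False: a 90-day lead time still fits
```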

Although most SAN vendors offer thin provisioning, analysts say there are significant differences among those implementations.

“The quality of an implementation, the functionality of an implementation, the ease of use of an implementation, the breadth of applications that can benefit from an implementation are going to vary by vendor and by model,” said Stanley Zaffos, vice president of Gartner Research.

For example, some thin-provisioning systems require administrators to assign all of the storage space in full, while others allow incremental assignments, Zaffos said. In addition, some solutions can span different types of storage arrays, but others cannot.

Foley said some thin-provisioning tools can’t recognize when a database administrator deletes files in Oracle and reclaim the disk space. That was a capability only recently added to 3Par’s thin provisioning. “Our array is now able to recognize that and reclaim space at the block level,” he said. “That makes tons of difference. In large environments, there are terabytes and terabytes of data that can be reclaimed and reused if it is removed at the database level.”
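
The reclamation problem can be pictured with a toy block map. This is not 3Par's mechanism, which the article does not describe in detail: the `ThinVolume` class and its `unmap` method are hypothetical and show only why an array that is never told about a delete keeps the space allocated.

```python
class ThinVolume:
    """Toy volume that tracks which logical blocks are backed by disk."""

    def __init__(self):
        self.allocated = set()

    def write(self, blocks):
        self.allocated.update(blocks)

    def unmap(self, blocks):
        # Reclaim works only if the array learns which blocks were freed.
        self.allocated.difference_update(blocks)


volume = ThinVolume()
volume.write(range(100))       # the database fills 100 blocks
deleted = range(50)            # then deletes data occupying blocks 0 to 49

before = len(volume.allocated) # the delete alone changes nothing on the array
volume.unmap(deleted)          # with reclamation, the blocks go back to the pool
after = len(volume.allocated)
print(before, after)
```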

Zaffos said one unusual challenge with thin storage is that the answers are rapidly changing. “If you’ve got four or five questions and you’re looking at three or four vendors, this is really not an onerous task,” he said. “The problem is that the answers are changing over time. This is a dynamic landscape. Even if you provide a cheat sheet for your readers today, it’s likely to change.”

Swan said not all applications are good candidates for certain thin-provisioning systems. “A little thin provisioning can be a good idea," he said. "But a lot of thin provisioning is a bad idea. Not all applications are going to be good candidates. And the disk array that is used is going to matter. Does it allocate one block at a time? Or does it allocate a 128M chunklet, like 3Par does? Just be aware of the underlying mechanism and how rapidly it grows. Know how it works under the covers before you deploy it too heavily.”
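
Swan's question about allocation size can be turned into a quick calculation. The sketch below assumes each write that touches a new allocation unit consumes the whole unit; the 128M unit comes from his remark, while the 8K block size and the write pattern are hypothetical.

```python
def allocated_bytes(write_offsets, unit_bytes):
    """Physical space consumed when each touched unit is allocated whole."""
    units = {offset // unit_bytes for offset in write_offsets}
    return len(units) * unit_bytes


KB = 1024
MB = 1024 * KB
# Hypothetical pattern: ten small writes scattered 1 GB apart.
offsets = [i * 1024 * MB for i in range(10)]

small_units = allocated_bytes(offsets, 8 * KB)    # hypothetical 8K blocks
large_units = allocated_bytes(offsets, 128 * MB)  # 128M units
print(small_units // KB, "KB vs.", large_units // MB, "MB")
```

With scattered writes, the coarse unit allocates 1,280 MB against 80 KB for the fine-grained one, which is why how an application lays out its data affects how quickly a thin volume grows.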
