GCN Tech Blog

By GCN Staff

Mirroring is not backup; backup is not archiving

Mirroring is not backup; and backup is not archiving. Keeping these tiers of storage straight can avert many headaches down the road.

Last week, Slashdot posted a cautionary tale about the importance of good backups. Startup company Lagomorphics, which offered a hosted blog space called Journalspace, failed to make regular backups of all of its users' material.

Keep in mind, the company's administrators thought they were making backups. The company mirrored all the users’ data on a second set of drives. The idea was that if one of the drives failed, then a complete copy of all the material would still exist on the mirrored drive.

But what the company did not foresee was that if the original database was corrupted, say by an operating system bug or a <a href=http://thewhir.com/web-hosting-news/010509_Blog_Host_Shuts_Down_Following_Data_Loss>malicious administrator</a>, the material in the other database would be corrupted too, thanks to speedy replication of the originals by the mirroring software.

"There was no hardware failure. Both drives are operating fine. ... The data was simply gone," a company-posted Web page about the data loss explained.

And that is indeed what happened. Something had wiped out six years of journal entries on the primary database, and a short while later, that empty database was copied over in its entirety to the mirrored site. The company tried the data restoration services of DriveSavers, but to no avail.
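
The failure mode is easy to reproduce in miniature. The sketch below is a toy model only, not a description of Journalspace's actual setup: the `MirroredStore` class, the `take_backup` helper and the entry names are all hypothetical, and in-memory dictionaries stand in for the two drives. It shows a mirror copying an emptied primary while an independent point-in-time copy survives.

```python
import copy


class MirroredStore:
    """Toy primary store whose every change is replicated to a mirror at once."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}

    def put(self, key, value):
        self.primary[key] = value
        self._replicate()

    def wipe(self):
        # Simulates a bug or a malicious administrator emptying the primary.
        self.primary.clear()
        self._replicate()

    def _replicate(self):
        # The mirror copies whatever the primary holds, good or bad.
        self.mirror = copy.deepcopy(self.primary)


def take_backup(store):
    """Return an independent point-in-time copy of the primary (hypothetical helper)."""
    return copy.deepcopy(store.primary)


store = MirroredStore()
store.put("entry-1", "first post")
store.put("entry-2", "second post")
backup = take_backup(store)  # taken before the accident

store.wipe()

print(store.mirror)  # the mirror is now empty as well
print(backup)        # the earlier backup still holds both entries
```

The mirror protects against one drive dying; only the copy that was taken earlier and left alone protects against the data itself going bad.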

For Journalspace, already on shaky ground, this data loss was the final blow. The owners shuttered the service. Users were left to scavenge their old blog entries from Google's cached copies.

For the rest of us, the lesson is clear: Mirrored copies may be great for load balancing, for application development testing, or even for quick recovery. But don't treat your mirrored copies as your primary backups.

And just as you shouldn't think of mirrored sites as proper backups, you shouldn't look to backups as an archive, noted EMC director of marketing Sheila Childs during a presentation on information compliance at the New York Interop conference last fall.

Childs was speaking about the proper ways to retain data for compliance reasons, including e-discovery.

"You should not use backups in any way shape or form for [archiving]. It is the wrong technology," she said. Backups are made simply to preserve working data in case it gets chewed up by misbehaving primary systems. Archiving, on the other hand, is the systematic retention of data that may be needed later for legal or historical documentation. Assuming your backups will serve your data retrieval needs during litigation will just lead to lost records and endless hours of futile data searching.

Childs explained in great detail how the two technologies differ: Backups are in place for a recovery process; archives are designed for the retrieval of information. Backups house copies; archives hold the originals. Backup is for short-term storage; archiving is long-term. Backup files are periodically overwritten; archives are retained for long periods of time. Backups tend to be done with proprietary technology; archive technology tends to use more open standards, so material can better be retrieved years or decades from now.
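
The contrast between recycled backups and retained archives can be sketched in a few lines. This is an illustration under stated assumptions rather than how any particular product works: the `BackupRotation` and `Archive` classes, the three-slot rotation and the document names are all invented for the example.

```python
from collections import deque


class BackupRotation:
    """Keeps only the newest `slots` backups; older ones are overwritten."""

    def __init__(self, slots):
        self.copies = deque(maxlen=slots)

    def run(self, label, data):
        # Store a copy of the working data; the oldest copy falls off the end.
        self.copies.append((label, dict(data)))

    def labels(self):
        return [label for label, _ in self.copies]


class Archive:
    """Append-only store: records are moved in once and never overwritten."""

    def __init__(self):
        self._records = {}

    def deposit(self, record_id, record):
        if record_id in self._records:
            raise ValueError(f"{record_id} is already archived")
        self._records[record_id] = record

    def retrieve(self, record_id):
        return self._records[record_id]


rotation = BackupRotation(slots=3)  # assumed three-slot rotation
archive = Archive()
working = {}

for day in range(1, 6):
    working[f"doc-{day}"] = f"contents written on day {day}"
    rotation.run(f"day-{day}", working)

# Move the oldest document out of the working set and into the archive.
archive.deposit("doc-1", working.pop("doc-1"))

print(rotation.labels())          # ['day-3', 'day-4', 'day-5']
print(archive.retrieve("doc-1"))  # contents written on day 1
```

After five daily runs the rotation has already recycled the first two backups, while the archived document stays retrievable by its identifier for as long as the archive is kept.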

"Backup technologies are meant to be used for recovery. They are meant to be recycled. The mindset of archiving is to take that material out of your system and put it some place and leaving it," she said.

Archiving can help with backups. By moving some of the older material off to archives, you can free some of the backup space and cut down backup times.

"Archiving is a technology that is complementary to backup ... but it is very very different," she said.

Posted by Joab Jackson on Jan 06, 2009 at 9:39 AM


Reader Comments

Sun, Jan 25, 2009 Bob S.

The problem with this analysis is that it implies that archiving creates a safe and permanent record that can always be relied upon to be available. The trouble with this logic is that archives need to be stored in some physical location. If that storage facility suffers a disaster your archives can be lost. Also, most archives are done to tape which can be corrupted fairly easily. Archives can also be difficult to locate if the record of where to find individual data is lost. This requires a backup of the database storing archive retrieval information, but suppose that backup gets corrupted? The right solution is far more complex and is rooted in Information Management, not merely archiving.

Mon, Jan 12, 2009 wrb Washington DC

Very interesting that you interviewed Ms. Childs since EMC has been trying to establish the Content Addressable Storage market for several years now with limited success. The lack of success can be attributed in part to this issue of customers not distinguishing between backup and archive. It is further exacerbated, as you point out, by all the mirroring/replication that makes the client feel secure. The latest technology that actually helps defuse some of this confusion is deduplication. Since you can use deduplication for either backup or archiving (with the best dedup rates within the backup apps 20-30x), you can then replicate after the data has been deduped to decrease the bandwidth requirements. You now reduce the dependence on mirrors and hence all the storage these techniques consume. Since EMC was trying to establish a market for their Centera product for single instance storage, they had to educate the consumer on the difference between backup and archive. Needless to say, neither EMC nor NetApp is interested in educating their customers on how to use less mirroring or replicating...

Fri, Jan 9, 2009

You should look at products like Enterprise Vault from Symantec which is the best of the active archiving products around. It's not just email that should be archived, all e-content needs proper protection which is clearly NOT just backups. When you archive, data needs to be searchable, and indexing in-stream while archiving is the smart move. You can then take decisions on tertiary storage and retention categories as well for your next efforts.

Thu, Jan 8, 2009 Steven Moshlak Washington, DC

Can someone say FOOLS!!! This is what happens when you try to go CHEAP, CHEAP, CHEAP... I hope DriveSavers was upfront with them about trying to rebuild the database(s). Dudes, when the CIO says "We need economical performance...", watch out. This was an accident waiting to happen and I'll bet REAL MONEY that someone never did a switch-over to see if the mirrored image would function in lieu of the REAL THING. This is why you use back-up software (I use 2) and test the system on a semi-annual basis to verify that you can perform a disaster recovery. Steve @ www.computerlegalexperts.com

Wed, Jan 7, 2009 Paulette Hatchett Lansing, MI

Addresses an ongoing issue for us.
