Mirroring is not backup; backup is not archiving
Mirroring is not backup, and backup is not archiving. Keeping these tiers of storage straight can avert many headaches down the road.
Last week, Slashdot posted a cautionary tale about the importance of good backups. Startup company Lagomorphics, which offered a hosted blog space called Journalspace, failed to make regular backups of all of its users' material.
Keep in mind, the company's administrators thought they were making backups. The company mirrored all the users’ data on a second set of drives. The idea was that if one of the drives failed, then a complete copy of all the material would still exist on the mirrored drive.
But what the company did not foresee was that if the original database were corrupted, say by an operating system bug or a <a href=http://thewhir.com/web-hosting-news/010509_Blog_Host_Shuts_Down_Following_Data_Loss>malicious administrator</a>, the material in the other database would be corrupted as well, thanks to the mirroring software's speedy replication of the originals.
"There was no hardware failure. Both drives are operating fine. ... The data was simply gone," a company-posted Web page about the data loss explained.
And that is indeed what happened. Something had wiped out six years of journal entries on the primary database, and a short while later, that empty database was copied over in its entirety to the mirrored site. The company tried the data restoration services of DriveSavers, but to no avail.
For Journalspace, already on shaky ground, this data loss was the final blow. The owners shuttered the service. Users were left to scavenge their old blog entries through the cached copies of Google.
For the rest of us, the lesson is clear: Mirrored copies may be great for load balancing, for application development testing, or even for quick recovery. But don't treat your mirrored copies as your primary backups.
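As a rough illustration of why a mirror cannot stand in for a backup, here is a toy Python sketch. The class and method names (MirroredStore, SnapshotBackup, wipe) are invented for this example and do not describe Journalspace's actual setup. It assumes only that the mirror copies every change on the primary immediately, while a backup is a point-in-time copy kept apart from the live data.

```python
import copy


class MirroredStore:
    """Toy model: every change to the primary is replicated to the mirror at once."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}

    def write(self, key, value):
        self.primary[key] = value
        self._replicate()

    def wipe(self):
        # Stands in for a bug or a malicious administrator emptying the database.
        self.primary.clear()
        self._replicate()

    def _replicate(self):
        # The mirror faithfully copies whatever the primary holds, good or bad.
        self.mirror = copy.deepcopy(self.primary)


class SnapshotBackup:
    """Toy model: point-in-time copies kept separately from the live data."""

    def __init__(self):
        self.snapshots = []

    def take(self, data):
        self.snapshots.append(copy.deepcopy(data))

    def restore_latest(self):
        return copy.deepcopy(self.snapshots[-1])


store = MirroredStore()
backup = SnapshotBackup()

store.write("entry-1", "first post")
store.write("entry-2", "second post")
backup.take(store.primary)   # backup taken before the incident

store.wipe()                 # the primary is emptied and the mirror follows

recovered = backup.restore_latest()
print("mirror after wipe:", store.mirror)
print("restored from backup:", recovered)
```

After the wipe, the mirror is as empty as the primary, while the snapshot taken beforehand still holds both entries.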
And just as you shouldn't think of mirrored sites as proper backups, you shouldn't look to backups as an archive, noted EMC director of marketing Sheila Childs during a presentation on information compliance at the New York Interop conference last fall.
Childs was speaking about the proper ways to retain data for compliance reasons, including e-discovery.
"You should not use backups in any way shape or form for [archiving]. It is the wrong technology," she said. Backups are made simply to preserve working data in case it gets chewed up by misbehaving primary systems. Archiving, on the other hand, is the systematic retention of data that may be needed later for legal or historical documentation. Assuming your backups will serve your data retrieval needs during litigation will just lead to lost records and endless hours of futile data searching.
Childs explained in great detail how the two technologies differ: Backups are in place for a recovery process; archives are designed for the retrieval of information. Backups house copies; archives hold the originals. Backup is for short-term storage; archiving is long-term. Backup files are periodically overwritten; archives are retained for long periods of time. Backups tend to be done with proprietary technology; archive technology tends to use more open standards, so material can better be retrieved years or decades from now.
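The contrast between recycled backups and retained archives can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of any vendor's product: RotatingBackup, Archive, and the three-slot rotation are hypothetical names and numbers chosen for illustration.

```python
from collections import deque


class RotatingBackup:
    """Toy backup set: a fixed number of slots, oldest copy overwritten first."""

    def __init__(self, slots):
        self.copies = deque(maxlen=slots)

    def run(self, label, data):
        # Store a copy of the working data; the deque drops the oldest copy
        # automatically once all slots are in use.
        self.copies.append((label, dict(data)))


class Archive:
    """Toy archive: records are deposited once and kept, never overwritten."""

    def __init__(self):
        self._records = {}

    def deposit(self, record_id, record):
        if record_id in self._records:
            raise ValueError("archived records are not overwritten")
        self._records[record_id] = record

    def retrieve(self, record_id):
        return self._records[record_id]


backups = RotatingBackup(slots=3)   # assumed rotation depth, for illustration
archive = Archive()
live = {}

# Five nightly backups of a growing working set.
for night in range(1, 6):
    live[f"doc-{night}"] = f"contents {night}"
    backups.run(f"night-{night}", live)

# Move the oldest document out of the working set and into the archive.
archive.deposit("doc-1", live.pop("doc-1"))

kept = [label for label, _ in backups.copies]
print("backup copies still held:", kept)
print("archived original:", archive.retrieve("doc-1"))
print("working set now:", sorted(live))
```

Only the three most recent backup copies survive the rotation, while the archived record stays retrievable by its identifier and no longer sits in the working set that later backups copy.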
"Backup technologies are meant to be used for recovery. They are meant to be recycled. The mindset of archiving is to take that material out of your system and put it some place and leaving it," she said.
Archiving can help with backups. By moving some of the older material off to archives, you can free some of the backup space and cut down backup times.
"Archiving is a technology that is complementary to backup ... but it is very very different," she said.
Posted by Joab Jackson on Jan 06, 2009 at 9:39 AM