Picking up the pieces

Knowing what data or services must be restored first in any situation is vital to a well-managed recovery effort.

Michael J. Bechetti

To prepare for a disaster, whether natural or man-made, you will need backup and recovery applications, and also a plan.

There are few things more critical to an IT department than having a good, thoroughly tested backup and disaster recovery program in place.

It doesn't matter whether the cause is a terrorist attack, flood, massive software failure, fried squirrel in the nearest power substation or even a bad software patch. Management always wants the same question answered: 'How soon can you get it up and running again?'

Regardless of the cause, the response to virtually any security problem requires you to restore the system or at least be able to verify exactly what was on the affected systems before the problem occurred.

Backup technology continues to improve, but it's really pretty simple to keep up with the hardware and software side of disaster recovery. The big challenge is developing a proper disaster recovery plan.

All the latest technology is worthless if no one knows what to do when disaster strikes.

The first step in developing a plan is determining just what constitutes recovery. That might seem obvious, but there is a difference between resumption of critical services and complete restoration of all computer functions and files.

This triage process must be done in advance because the plan affects the way backups are performed and because the usual 90/10 rule could apply: If you can restore 90 percent of functionality in the first 10 percent of the total time required, that's a wonderful safety net, but only if the right 90 percent is restored first.
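To make the 90/10 idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the service names, the functionality shares and the restore times are invented, and ranking by functionality restored per hour is only one possible heuristic. A real plan would set priorities by judgment, not by formula alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    functionality_share: float  # assumed fraction of total functionality
    restore_hours: float        # assumed time needed to restore it


def restore_order(services):
    """Put the services that return the most functionality per hour first."""
    return sorted(
        services,
        key=lambda s: s.functionality_share / s.restore_hours,
        reverse=True,
    )


def cumulative_progress(ordered):
    """Return (name, elapsed_hours, functionality_restored) after each step."""
    elapsed = 0.0
    restored = 0.0
    progress = []
    for service in ordered:
        elapsed += service.restore_hours
        restored += service.functionality_share
        progress.append((service.name, elapsed, round(restored, 2)))
    return progress


# Hypothetical inventory; the numbers are illustrative only.
SERVICES = [
    Service("file archive", 0.10, 22.5),
    Service("e-mail", 0.40, 1.0),
    Service("payroll database", 0.30, 0.5),
    Service("public web site", 0.20, 1.0),
]

if __name__ == "__main__":
    for name, hours, share in cumulative_progress(restore_order(SERVICES)):
        print(f"{hours:5.1f} h  {share:4.0%}  {name}")
```

With these made-up numbers, 90 percent of functionality is back after 2.5 of the 25 hours. Restoring the file archive first would have consumed 22.5 hours to return 10 percent.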

Creating a recovery plan can be complex. The most critical services could depend on the nature of the disaster. For example, a freak power outage should trigger a business-as-usual recovery process. If a terrorist action or a cyberattack is the cause, then the restoration would likely have a different set of priorities, probably including forensic analysis.
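One way to capture cause-dependent priorities is a simple lookup from the type of disaster to an ordered list of steps. The sketch below is hypothetical throughout: the cause names, the step wording and the `playbook_for` helper are assumptions made for illustration, not a prescribed procedure.

```python
# Assumed baseline sequence for a business-as-usual recovery.
STANDARD_STEPS = [
    "restore critical services",
    "restore remaining services and files",
    "verify restored data",
]

# Assumed extra steps when the cause is hostile and evidence matters.
FORENSIC_STEPS = [
    "isolate affected systems",
    "preserve evidence for forensic analysis",
]

# Hypothetical mapping from cause to playbook.
PLAYBOOKS = {
    "power outage": STANDARD_STEPS,
    "cyberattack": FORENSIC_STEPS + STANDARD_STEPS,
    "terrorist action": FORENSIC_STEPS + STANDARD_STEPS,
}


def playbook_for(cause):
    """Return a copy of the ordered steps for a cause, defaulting to the baseline."""
    return list(PLAYBOOKS.get(cause, STANDARD_STEPS))


if __name__ == "__main__":
    for step in playbook_for("cyberattack"):
        print(step)
```

Writing the sequences down in advance, in whatever form, means the team does not have to decide them during the emergency.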

Consider probabilities

The second step is to consider the most likely disasters. For some installations, such as emergency services located in the usual path of hurricanes or in fire-prone locations, disasters are periodic events. But most IT disasters result from comparatively minor problems: a broken water main, local power outage or local hardware failure.

This, too, could affect the basic recovery plan by altering the sequence in which services are restored. For example, if a hurricane hits, no one will get to the office anyway, but if a virus strikes during the day, everyone will be sitting around waiting to go back to work.

Once restoration priorities are set for various scenarios, it's vital to have a single person designated as the chief recovery officer, with a backup person named in case the chief isn't available.

The recovery team must include people from nearly every department. Don't forget the cleaning crew; they might be among the most important workers in a recovery operation.

Above all, plan for data safety. Bear in mind that your backup might be the only way to restore full functionality. So plan to avoid a second disaster by making certain that the backup data is carefully protected. What happens if the drive eats your backup tape? You should know that answer ahead of time.
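One common safeguard, sketched here in Python, is to record a checksum for each backup file when it is written and to check every stored copy against that record on a schedule. The file names, directory layout and manifest format are assumptions for illustration; the only library calls used are from Python's standard `hashlib` and `pathlib` modules.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(manifest, copy_dirs):
    """Check each backup copy against the recorded digests.

    manifest:  {file name: expected hex digest}, recorded at backup time
    copy_dirs: directories that each hold one copy of the backup set
    Returns {directory: [names that are missing or do not match]}.
    """
    problems = {}
    for directory in copy_dirs:
        bad = []
        for name, expected in manifest.items():
            candidate = Path(directory) / name
            if not candidate.is_file() or sha256_of(candidate) != expected:
                bad.append(name)
        problems[str(directory)] = bad
    return problems
```

A copy with an empty list passed the check. A matching checksum shows only that the file is unchanged since the backup was written; it does not show that the backup can actually be restored, which is what drills are for.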

The other critical parts of a disaster recovery plan are relatively basic:
  • Make sure everyone on the team knows his or her job.

  • Keep the plan up-to-date.

  • Remember to replace departing team members and train the replacements.

  • Run mock disaster drills.

  • Plan to record every step of the restoration process.

  • Conduct a forensic analysis of any drills and actual recovery events.

Classified operations should follow the same guidelines but will have additional concerns, such as making certain that any outsourced recovery services and people are fully cleared.

Finally, consider the worst-case scenario. What if the backups are blank, mislabeled, destroyed simultaneously, corrupted or simply unavailable because all aircraft are grounded, the Internet is down, or the only bridge between the office and the backup site is under water? What if no new hardware can be delivered? What if the backup media can't be read because the new operating system doesn't support the hardware? And so on. As we've recently learned, even some seemingly far-fetched situations can and do occur.

You simply can't recover from some disasters except by starting from scratch, but careful planning can greatly reduce the prospect of such situations.

John McCormick is a free-lance writer and computer consultant. E-mail him at powerusr@yahoo.com.
