Pulse

By GCN Staff


Can the cloud withstand a 'perfect storm'?

This article has been updated to clear up confusion, pointed out by a couple of readers, between Amazon's availability zones and regional centers.

Cloud computing providers such as Amazon Web Services are facing another test of reliability as Hurricane Sandy moves up the East Coast toward a date with two other significant weather systems. Weather forecasters are predicting a strong, wide, slow-moving storm that is likely to hit hard even in areas not in the storm’s direct path.

Even in areas that won’t see the worst of the storms (as of this writing, New England seems ticketed for the heaviest hit), a couple of days of heavy rains and high winds are expected to take down trees and power lines, taking websites and Web services with them in some places. It’s nothing new: AWS, for example, lost power to its Northern Virginia data center during June’s derecho storm in the Mid-Atlantic states, and with it service to some high-profile sites.

AWS, which claims 300 U.S. government agencies and 1,500 educational institutions as customers, has had other hiccups at its Northern Virginia center, including losing service in April 2011 because of a remirroring problem, suffering a partial outage in June, and being hit with another outage earlier this week. Microsoft, another large cloud provider, has had a few problems of its own.

Outages happen. But can agencies that rely on cloud services do anything about it?

One way with AWS is to take advantage of Amazon’s multiple availability zones, distinct locations within a region that are insulated from one another’s failures, or even to spread services across more than one of AWS’ eight geographic regions. That way, if service is lost in one zone or region, it can be switched to another. Such redundancy does come at a price, so for agencies the question is whether the added uptime is worth the extra money.
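In practice, spreading a workload across zones or regions is largely a matter of where instances are launched. The snippet below is a minimal sketch, assuming the boto3 Python library; the AMI IDs, zone names and instance type are placeholders, not anything specific to a given agency's setup.

```python
# Minimal sketch: launch one instance in each of two availability zones in
# US East, plus one in a second geographic region. AMI IDs, zone names and
# the instance type are placeholders, not real values.
import boto3

def launch(region, zone, ami):
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.run_instances(
        ImageId=ami,                      # placeholder AMI for that region
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},
    )
    return resp["Instances"][0]["InstanceId"]

# Two zones inside the same region guard against a single-zone failure ...
primary = launch("us-east-1", "us-east-1a", "ami-primary-placeholder")
standby = launch("us-east-1", "us-east-1b", "ami-primary-placeholder")
# ... and a copy in another region guards against a region-wide outage.
remote = launch("us-west-2", "us-west-2a", "ami-remote-placeholder")
print(primary, standby, remote)
```

Running duplicate instances this way roughly doubles or triples the compute bill, which is the cost-versus-uptime tradeoff described above.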

The Recovery Accountability and Transparency Board has such an agreement with AWS for its Recovery.gov site, which allowed it to avoid the April 2011 outage while other sites went down, sometimes for days.

Agencies also could use multiple cloud providers, available through cloud brokerages, which would give them more flexibility in the event one provider loses service. And they have other options, including their own private clouds, hybrid clouds or a cloud service provided by another agency.
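How the switchover happens depends on the setup; in the simplest case, an agency polls its primary provider and routes traffic to a backup when the primary stops answering. The sketch below is illustrative only: the health-check URLs are hypothetical, and a production deployment would more likely handle failover at the DNS or load-balancer layer.

```python
# Sketch of a provider failover check. The endpoint URLs are hypothetical
# placeholders for health checks exposed by two different cloud providers.
import urllib.request

ENDPOINTS = [
    "https://app.primary-cloud.example.gov/health",    # primary provider
    "https://app.secondary-cloud.example.gov/health",  # backup provider
]

def pick_endpoint(timeout=3):
    """Return the first endpoint that answers with HTTP 200."""
    for url in ENDPOINTS:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except OSError:
            continue  # provider unreachable or timed out; try the next one
    raise RuntimeError("No cloud provider is currently reachable")

print("Routing traffic to:", pick_endpoint())
```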

The string of outages over the past 18 months or so has brought some attention to cloud’s reliability, but it’s not likely to stop agencies from moving to the cloud. The latest storm is just one more example of why, as with any IT system, a backup plan is necessary.

Posted by Kevin McCaney on Oct 26, 2012 at 9:39 AM


Reader Comments

Tue, Nov 13, 2012 Tech Marketer LA

Nice article. Today’s businesses don’t spend all their time thinking about the worst-case scenario. Still, with the number of natural disasters such as hurricanes and wildfires in just the last year alone, many companies are beginning to recognize that living without a disaster recovery plan is truly risky behavior. Here's an interesting article talking about Cloud storage, backup and disaster recovery's importance in case of hurricanes or superstorms: http://www.dincloud.com/blog/cloud-backup-disaster-recovery-vs-hurricane-superstorm-and-more

Wed, Oct 31, 2012

The real question is how a particular cloud provider architects, deploys, and operates its systems. Each is different. You should choose your provider based on having geographic separation as a default, with no scheduled downtime for maintenance, patches, and upgrades. Compare each provider's historical uptime statistics, being careful to compare apples to apples in how uptime is calculated (some providers are more transparent than others and post it publicly), against legacy on-premises systems in terms of total uptime, availability, and mean time to recover. How do you know if your disaster recovery solution is as strong as you need it to be? It's usually measured in two ways: RPO (recovery point objective) and RTO (recovery time objective). The RPO design target should be zero, and the RTO design target is instant failover. That is extremely costly and hard to achieve with traditional on-site hardware, software, and bandwidth constraints. Plus, many companies have found theirs didn't even work under real conditions.

Sun, Oct 28, 2012

This article confuses Amazon Regions and Availability Zones. AZs are close to each other and are not geographically redundant.

Sat, Oct 27, 2012 Jesse

Just felt I should point out that availability zones do *not* provide geographical redundancy. They provide redundancy within a region (i.e., within the Herndon, VA data center) in that they divide the data center into constituent sets of instances.

