Storm cripples Amazon cloud, but do sites have to go down, too?
When a high-powered derecho storm swept through the Mid-Atlantic states June 29, knocking out power for more than a million people in the Washington Metro area and about 4 million overall on the East Coast, it took a piece of the cloud with it.
But did it have to?
Amazon Web Services lost power at its Northern Virginia data center, knocking out cloud services for a host of sites, including high-profile ones such as Netflix, Pinterest and Instagram, which were down for several hours.
AWS also has 187 government customers, according to a report in the New York Times. Although Amazon didn’t give a list of affected sites, it said on the afternoon of June 30 that most of the downed sites had been restored. By Monday morning, AWS’s Service Health Dashboard reported that all of its services were operating normally.
Amazon, like other cloud providers, has had its share of outages. It suffered a partial outage June 14 at its Northern Virginia data center that affected some of its customers. In August 2011, a lightning strike in Dublin took out some European cloud services for both Amazon and Microsoft.
And in April 2011, what the company called a “remirroring storm” took down the Virginia center and with it sites such as Reddit, Quora, HootSuite and at least one Energy Department research site.
The latest outage is likely to fan the debate about the cloud’s reliability. Power outages do happen, sometimes with a vengeance: On Monday, hundreds of thousands in the Washington area were still without power, and power companies said it could take until the end of the week to restore everyone’s service. But sites using cloud services don’t have to go dark when they do.
Amazon offers geographic redundancy to its customers, so that when services at one of its data centers are disrupted, operations can shift to another. The hitch is that redundancy can double the cost of service, according to an article on ZDNet.
The question is whether Amazon and other cloud services should provide redundancy as part of their package, or whether it should be up to customers to decide if 100 percent uptime is important enough for them to spend the extra money.
Intelligent Business Research Services advisor Jorn Bettin told ZDNet that he thinks it should be up to the customers. "They [customers] could operate at a higher level of redundancy, so that these sort of outages would only have a minimal impact on them. It's a matter of cost," Bettin said.
He added that, on a large scale, it would be a “dangerous proposition” if everyone relied on a provider such as Amazon to manage all of their risk.
For customers, it may come down to affordability. Start-ups might not have the money to pay for redundant connections, for instance, though established companies (like Netflix, perhaps?) would likely be able to afford it. For government agencies, it could be a combination of budget and how critical their services are.
After the remirroring crash in April 2011, the Energy site, OpenEI.org, a site that encourages public participation in clean energy research, was down for two days. But the Recovery Accountability and Transparency Board’s Recovery.gov, hosted at the same Virginia site as OpenEI.org, had a redundancy plan in place and stayed in operation.
And of course, for government agencies that have to meet the requirements of the Office of Management and Budget’s “cloud-first” and digital government policies, the mission-critical status of a program could determine whether it moves to the cloud at all.