Is Amazon's cloud crash a cautionary tale for government?
Problem at EC2 data center takes Reddit, Quora and other popular sites off-line for an extended stretch
- By Kevin McCaney
- Apr 21, 2011
Federal agencies are moving some online operations to the cloud, whether they like it or not. After Amazon Web Services crashed April 21, they might like it a little less.
Amazon’s Elastic Compute Cloud, which provides hosting services for a growing number of Web operations, suffered an extended outage at one of its data centers that began in the early morning and lasted at least into the afternoon. It took some popular Web 2.0 sites with it. Several federal sites hosted by EC2 appeared to be unaffected.
The outage hit at 4:41 a.m. Eastern Time at Amazon’s data center in Northern Virginia and brought down EC2 customers such as Reddit, Quora and HootSuite. At 3:10 p.m., Reddit was partially up in an emergency read-only mode, in which some submissions were displayed but users could not log in. Quora and HootSuite were still down.
In a series of updates on AWS’ status dashboard, Amazon said it was experiencing connectivity problems and latencies in the eastern United States.
Shortly before noon, Amazon said a “networking event early this morning triggered a large amount of re-mirroring of [Elastic Block Store] volumes,” which provide storage for the company’s cloud customers. The re-mirroring created a shortage of capacity and made it “difficult to create new EBS volumes,” the update said.
About 1:30 p.m., Amazon reported “significant progress” but could not estimate when all the affected storage volumes would be recovered. As for the cause of the problem, the update noted that “we always know more and understand issues better after we fully recover and dive deep into the postmortem.”
The incident could be of concern to federal agencies, which are under orders from the Office of Management and Budget to move some of their operations to the cloud. OMB’s cloud-first policy, part of its 25-point plan for IT reform, requires agencies to move three applications to the cloud within 18 months.
Commercial cloud hosting services are one option for agencies. Earlier this year, the Treasury Department moved its Treasury.gov and other public-facing websites to EC2. Those sites, including MyMoney.gov and the IRS Oversight Board's site, were working fine during the recent crash.
Any system or website faces the possibility of an outage, whether it’s cloud-based or not. But when a cloud provider goes down, it can take its customers with it. One concern about cloud computing has been that a single provider becomes a single point of failure for multiple sites or services. Amazon’s troubles could provide ammunition for that argument.
Kevin McCaney is a former editor of Defense Systems and GCN.