Is Amazon's cloud crash a cautionary tale for government?

Problem at EC2 data center takes Reddit, Quora and other popular sites off-line for an extended stretch

Federal agencies are moving some online operations to the cloud, whether they like it or not. After Amazon Web Services crashed April 21, they might like it a little less.

Amazon’s Elastic Compute Cloud (EC2), which provides hosting services for a growing number of Web operations, suffered an outage at one of its data centers that began in the early morning and lasted at least into the afternoon. It took some popular Web 2.0 sites with it, although several federal sites hosted by EC2 appeared to be unaffected.

The outage hit at 4:41 a.m. Eastern Time at Amazon’s data center in Northern Virginia and brought down EC2 customers such as Reddit, Quora and HootSuite. At 3:10 p.m., Reddit was partially up in an emergency read-only mode, in which some submissions were displayed but users could not log in. Quora and HootSuite were still down.




In a series of updates on AWS’ status dashboard, Amazon said it was experiencing connectivity problems and latencies in the eastern United States.

Shortly before noon, Amazon said a “networking event early this morning triggered a large amount of re-mirroring of [Elastic Block Store] volumes,” which provide storage for the company’s cloud customers. The re-mirroring created a shortage of capacity and made it “difficult to create new EBS volumes,” the update said.

About 1:30 p.m., Amazon reported “significant progress” but could not estimate when all the affected storage volumes would be recovered. As for the cause of the problem, the update noted that “we always know more and understand issues better after we fully recover and dive deep into the postmortem.”

The incident could be of concern to federal agencies, which are under orders from the Office of Management and Budget to move some of their operations to the cloud. OMB’s cloud-first policy, part of its 25-point plan for IT reform, requires agencies to move three applications to the cloud within 18 months.

Commercial cloud hosting services are one option for agencies. Earlier this year, the Treasury Department moved its Treasury.gov and other public-facing websites to EC2. Those sites, including MyMoney.gov and the IRS Oversight Board's site, were working fine during the recent crash.

Any system or website faces the possibility of an outage, whether it’s cloud-based or not. But when a cloud provider goes down, it can take its customers with it. One of the concerns about cloud computing has been the implications of having a single point of failure for multiple sites or services. Amazon’s troubles could provide ammunition for that argument.

Reader Comments

Mon, Apr 25, 2011 Barbara Duck Los Angeles

It did bring down a cardiac monitoring system that had no backup. You can read more at the link, where I blogged about it. Also, Amazon was nowhere in sight to help. This is the reality today when things go bad en masse: customer service is very difficult in these situations, as Robert Scoble even noted on Twitter. http://ducknetweb.blogspot.com/2011/04/what-happens-when-cloud-server-goes.html

Sun, Apr 24, 2011

Not all cloud services are equal in terms of capabilities, redundancy, COOP/DR, network performance, content distribution, and other germane attributes, including expertise and track record, just as not all government agencies or departments get the same degree of robustness from their onsite managed service providers.
