Google, Microsoft cloud crashes: Is this the new normal?

Purveyors of cloud computing can promise users a lot of advantages, but one thing they can’t brag about is uptime.

In recent days, both Google Docs and Microsoft Office 365 suffered outages, adding to a string of incidents in which public clouds have gone dark.

Google Docs was out of service for about an hour on Sept. 7, the result of a “memory management bug” that was exposed after Google made a change to improve real-time collaboration in Google Docs, according to a blog post by Alan Warren, engineering director for Google Docs.

A day later, Microsoft suffered what it called a Domain Name System issue that knocked out services for several hours worldwide for Office 365, Hotmail and SkyDrive. The company eventually restored the services after updating its DNS configurations, according to a company blog.

The incidents followed disruptions of part of Microsoft’s Business Productivity Online Suite earlier this year and a crash in April of Amazon’s Elastic Compute Cloud service that took some popular social media sites off-line for a day and one Energy Department collaboration site for nearly two days.

And aside from software trouble, cloud data centers in Europe run by Amazon and Microsoft were taken off-line by lightning in August.

Outages of cloud services are becoming “a little bit of par for the course,” IT analyst Paul Burns told Jon Brodkin of Ars Technica. Burns, whose company uses Office 365, acknowledged that IT outages will happen from time to time, whether in the cloud or not, but he told Brodkin that he thought they were happening “a little too often” with Office 365.

The federal government is making a major push into cloud computing as part of former Federal CIO Vivek Kundra’s Federal Cloud Computing Strategy, which calls for about 25 percent of federal IT spending to go to cloud computing.

Kundra, who left government in August and has been replaced by former Federal Communications Commission Managing Director Steven VanRoekel, had put forth a plan for reducing the number of federal data centers and gave agencies an 18-month window to identify applications that could move to the cloud.

Agency IT officials have been debating whether it’s best to build private clouds, use one provided by another agency or go to public cloud providers such as Amazon, Google and Microsoft.

The decision can depend on the applications and the sensitivity of the data, but the public cloud has been a popular choice for e-mail. In July, the General Services Administration completed the transition of 17,000 e-mail users to Google Apps for Government. The National Oceanic and Atmospheric Administration, among other agencies, is also transitioning to Google Apps, while the Army is moving its e-mail to a private cloud hosted by the Defense Information Systems Agency.

In his blog post, Google’s Warren wrote that the company was analyzing what caused the recent outage with the goal of learning to prevent similar outages and learning how to identify, respond to and remediate outages more quickly.

So it might be that cloud providers will get better at this as they go along, and outages will be fewer and shorter-lived. But agencies moving to the cloud might also have to consider that the occasional outage is just something they have to live with.

About the Author

Kevin McCaney is editor of Defense Systems. Follow him on Twitter: @KevinMcCaney.

Reader Comments

Fri, Sep 16, 2011 five media Pakistan

When a cloud service goes down, everyone can hear you scream. But that's an advantage to this type of platform, not a disadvantage. And it means that problems get fixed quickly. Read this article to see how this threat becomes an opportunity: http://cloudtechsite.com/blogposts/analysts%E2%80%99-remark-cloud-is-liable-to-disruption-just-as-any-other-computing-system.html

Thu, Sep 15, 2011

ASA probes Microsoft cloud reliability claims http://www.theregister.co.uk/2011/09/14/asa_microsoft_cloud_compliant/

Wed, Sep 14, 2011

Actually, yes, some cloud services have demonstrated higher average uptime over the course of a month, quarter, half year, and year when compared to costly, traditional client/server on-premise architectures, operations, and maintenance. This article is comparing a one-hour outage of one application that is part of a larger suite that was still online and available to that of all services from another provider being unavailable for hours. This is an apples-to-oranges comparison. What are the actual uptime statistics for these services over the last year? That is a more relevant indicator of reliability than reporting of single-incident outages. With true cloud computing architectures and delivery service models, there is no scheduled maintenance, no desktop configuration dependencies, no patch management, and the service is independent and agnostic when it comes to device, browser, and OS, giving users, companies, and the government more lower-cost choices and options. Using data-driven analysis, you should be able to easily compare a cloud provider's uptime to your current services, including all scheduled and unscheduled downtime. Some cloud providers actually post their dashboard statistics publicly for all to see and evaluate. Others tend to hide this data and only make it available to their customers on restricted portals or under NDA. Demonstrated uptime track record and transparency are good. These should be criteria when selecting any IT service.

Tue, Sep 13, 2011 TM

Yes, outages can occur anywhere in the system. However, there's a huge difference between a switch or a server going down affecting a limited group of people and a "cloud crash" putting a stop to the entire organization. An SLA won't do anything for lost productivity or missed opportunities.

Cloud computing has some good points, but here's one not falling for the hype, hysteria and problems.

Tue, Sep 13, 2011

Six federal data centers should be sufficient to build a "private government cloud". The hardware solution is here: http://opencompute.org/ The software configuration could be this: Cloudstack - Virtual Machine Management; Enstratus - Auto Scaling; Splunk, CAMS & Nagios - Log, VM & System Monitoring; Chef - Systems Automation Framework; Nexenta - Storage Control; Virtual User Images, Applications & Guest OSs; Zero Clients and Linux virtual user images, Open Office, and other Open Source applications! This means: Replacing all "MeTooSoft" technologies to save a lot of taxpayers' money and have everything needed.
