Microsoft admits network outages violated service agreements

Microsoft this week disclosed that its Business Productivity Online Services (BPOS) had three service outages that affected BPOS customers in North America in August and September.

Morgan Cole, a Microsoft employee, admitted in a blog post that BPOS had service outages on Aug. 23, as well as on Sept. 3 and Sept. 7. All of the outages were associated with a Microsoft network upgrade effort, which initially knocked out the service for two hours on Aug. 23. A fix led to additional problems in September, including problems with the "sign-in service and administrative portals," Cole explained.

On Sept. 7, Microsoft had a problem with BPOS that had "more widespread customer impact, although the duration was relatively short," Cole stated, without explaining the nature of the problem. Microsoft is currently monitoring this situation after isolating "suspect traffic," according to the blog post.

A comment on that blog post by "Jim Glynn" indicated that Microsoft has credited some of the customers affected by the Aug. 23 BPOS outage. However, Glynn noted that customers should contact their BPOS representative to request the compensation Microsoft offers for failing to meet its service level agreement (SLA).

Uptime is a prime consideration for organizations using software-as-a-service (SaaS) applications instead of more traditional software installed on customer premises. Microsoft's "all-in" move to the cloud, which depends on selling hosted services to businesses and other organizations, hinges on meeting its SLAs. However, SLAs don't change the fact that businesses using hosted applications depend on external infrastructure that they do not control.

Compensation for not meeting an SLA may not cover the cost of lost business time, but it is common practice among service providers, according to Robert Mahowald, research vice president for SaaS and cloud services at the IDC research and consulting firm.

"Web applications rely on access to the Internet, which of course adds another potential weak link in the chain of getting access to information and functionality," Mahowald explained in a phone interview. "But it's pretty much common practice for SaaS providers to guarantee 'three nines' of uptime ... which is about 28 hours a year in which they will not be accessible. Most of that is supposed to be scheduled downtime. Essentially, it's pretty much common practice for providers to pay service credits in recompense for the lost opportunity and to not pay any monetary fine."
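The downtime budget implied by an availability target is simple arithmetic: the target's complement multiplied by the length of the measurement period. The sketch below is an illustration only; it assumes a 365-day year and a 30-day month, and the function names are hypothetical. Actual SLAs define their own measurement periods and exclusions. For "three nines" (99.9 percent), the calculation yields about 8.76 hours per year, or roughly 43 minutes in a 30-day month.

```python
# Sketch: downtime allowed by an availability target ("nines").
# Assumptions: a 365-day year and a 30-day month; function names are
# hypothetical, and real SLAs may measure periods and exclusions differently.

HOURS_PER_YEAR = 365 * 24          # 8,760 hours
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes in a 30-day month


def allowed_downtime_hours_per_year(availability_pct: float) -> float:
    """Hours per year a service may be down while still meeting the target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def allowed_downtime_minutes_per_month(availability_pct: float) -> float:
    """Minutes per 30-day month a service may be down under the target."""
    return MINUTES_PER_MONTH * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}%: {allowed_downtime_hours_per_year(target):.2f} h/year, "
              f"{allowed_downtime_minutes_per_month(target):.1f} min/month")
```

Each additional "nine" cuts the allowed downtime by a factor of ten.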

The two-hour outage on Aug. 23 appears to have violated Microsoft's SLA guarantee for BPOS applications. BPOS is expected to be available 99.9 percent of the time per month, or as Microsoft's FAQ specifies: "Microsoft provides a 99.9 percent uptime Service Level Agreement for Exchange Online, SharePoint Online, Office Live Meeting and Office Communications Online."

BPOS users are credited based on a calculation of the monthly uptime percentage, according to Microsoft's Exchange Online SLA document. If the service availability dips below 99.9 percent, then the service credit is 25 percent of the monthly service fees. If it dips below 99 percent, Microsoft pays out 50 percent of the monthly service fees. Lastly, Microsoft pays the full monthly service fee if the service availability dips below 95 percent.
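The tiers above can be illustrated with a short calculation applied to the Aug. 23 outage. This is a sketch under stated assumptions and is not Microsoft's published formula. It assumes that monthly uptime is (total minutes minus downtime minutes) divided by total minutes, that scheduled maintenance is not excluded, and that the two-hour outage was the only downtime counted in August. The function names are hypothetical.

```python
# Sketch: monthly uptime percentage and the service-credit tiers described
# in the article. Assumptions (not taken from Microsoft's SLA text): uptime
# is (total minutes - downtime minutes) / total minutes, with no exclusions
# for scheduled maintenance; function names are hypothetical.

def monthly_uptime_pct(days_in_month: int, downtime_minutes: float) -> float:
    """Percentage of the month's minutes during which the service was up."""
    total_minutes = days_in_month * 24 * 60
    return 100 * (total_minutes - downtime_minutes) / total_minutes


def service_credit_pct(uptime_pct: float) -> int:
    """Percent of the monthly fee credited, per the tiers in the article."""
    if uptime_pct < 95.0:
        return 100   # below 95 percent: full monthly fee
    if uptime_pct < 99.0:
        return 50    # below 99 percent: half the monthly fee
    if uptime_pct < 99.9:
        return 25    # below 99.9 percent: a quarter of the monthly fee
    return 0         # SLA met: no credit


if __name__ == "__main__":
    # The Aug. 23 outage: two hours of downtime in 31-day August.
    uptime = monthly_uptime_pct(31, 2 * 60)
    print(f"uptime {uptime:.3f}% -> credit {service_credit_pct(uptime)}%")
```

Under these assumptions, two hours of downtime in a 31-day month gives about 99.73 percent uptime. That falls below the 99.9 percent threshold and lands in the 25 percent credit tier.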

Microsoft informs its BPOS customers and the public about BPOS uptime problems via a "Microsoft Online Service Notifications" RSS feed. According to that feed, Microsoft restored services on Sept. 7 for multiple applications, including Exchange Online, SharePoint Online, Office Live Meeting, Office Communications Online, plus a few others.

Microsoft had planned to conduct maintenance on some of its BPOS services on Sept. 11 in its North American data centers. However, the company has now postponed its network upgrade plans, according to the RSS feed.

Mahowald said he wasn't aware of any disaster scenarios involving SaaS providers, but said one is "bound to happen," especially for educational institutions and governments that may have outsourced important operations to SaaS providers. In such cases, SLAs will become even more important.

"It's an incredibly important issue to understand. It's no longer about simply saying, on a functional basis, 'does your application do what mine does and what's the price,'" Mahowald said. "I think understanding the SLA behind it and actually having some teeth in the SLA is going to become an even more important distinction than it is right now — perhaps more important than price as you go up the chain with mission-critical applications."

About the Author

Kurt Mackie is senior news producer for the 1105 Enterprise Computing Group.
