Perspective on Cloud Uptime and Availability

When I served as Director of IT Systems at Insight, I put a lot of thought into what Service Level Agreements (SLAs) I should offer to internal system users. I finally settled on the terms IBM Global Services (IGS) had offered us when we discussed outsourcing with them. I figured that if I could do as well as IGS, nobody could really argue with it.

IGS offered 99.6% uptime, calculated monthly, for non-high-availability systems and 99.99% for HA systems (clustered or using other HA technology). At 99.6%, that is 2 hours and 53 minutes of allowable downtime in a 30-day month. So the fact that many cloud service providers offer only three nines, 99.9%, should not really concern most people. 99.9% uptime allows for just 43 minutes of downtime per month, and that allowance is used only if an outage actually occurs. In most instances, the expectation is that systems are available 100% of the time; the SLA simply allows for some outage time before credits are given.
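To make the arithmetic behind those figures concrete, here is a minimal Python sketch. It assumes a 30-day month as the measurement period, and the function names are my own, chosen only for illustration.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime an SLA permits over a period of `days` days.

    Assumes the period is `days` full 24-hour days (30 by default).
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


def describe(minutes: float) -> str:
    """Format a duration in minutes as hours, minutes, and seconds."""
    total_seconds = round(minutes * 60)
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hours}h {mins}m {secs}s"


if __name__ == "__main__":
    # The SLA levels discussed in this post.
    for pct in (99.6, 99.9, 99.99, 99.999):
        print(f"{pct}% uptime allows {describe(allowed_downtime_minutes(pct))} per 30-day month")
```

Running it for 99.6% gives about 172.8 minutes, which is the 2 hours and 53 minutes quoted above, and 99.9% gives about 43.2 minutes.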

Five nines, 99.999% uptime, is the gold standard for the absolute most critical systems, and every component has to be engineered for five nines in order to achieve it. It does no good to have servers that are designed for five nines if the network, Internet connection, load balancers, data centers, etc., are not designed to support it. Five nines is really a target that is seldom achieved.
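A rough way to see why every component matters: if a service needs all of its components to be up at once, and their failures are independent (a simplifying assumption), the overall availability is the product of the individual availabilities. The sketch below uses made-up component figures purely for illustration; they are not measurements from any real provider.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up.

    Assumes component failures are independent of one another.
    """
    return prod(availabilities)


# Hypothetical figures: five-nines servers behind four-nines everything else.
components = {
    "servers": 0.99999,
    "network": 0.9999,
    "load_balancers": 0.9999,
    "data_center": 0.9999,
}

if __name__ == "__main__":
    overall = serial_availability(components.values())
    print(f"End-to-end availability: {overall * 100:.4f}%")
```

With these assumed numbers, the end-to-end figure comes out just under 99.97%, well short of five nines even though the servers themselves meet it.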

As more companies try to figure out where “Cloud” fits into their business and IT strategies, SLAs are one of many concerns to be considered along with security, costs, compliance, etc. It is important that SLAs be put in the proper context in the discussion. Many companies won’t even know what their internal SLAs are for comparison. Even if they measure uptime and achieve 99.99%, it does not mean that their systems are engineered to the level of availability they are achieving. You can achieve 100% uptime on a system engineered for 99.6% if you are lucky and do not experience any hardware or software failures during the measurement period.

SLAs are just one of many factors that should go into deciding whether to go to the cloud and which provider to use. The conversation around cloud SLAs needs to be kept in the proper context with potential customers: SLAs should not be the focus of any conversation about cloud offerings unless uptime and availability are absolutely critical to the customer's business operations.

There has been a lot of focus on the uptime and availability of cloud service providers because, when service disruptions occur, they are publicized like plane crashes. The facts tell you that you are safer on a plane than on other modes of travel, but emotions tell you that if there is a problem, it is a long way down.

Resources
Uptime Calculator - http://easyuptimecalc.com