Scattered clouds, with a chance of thunderstorms

Purposeful View: Amazon Cloud Outage (by Gene Dahl, 27 April 2011)


Last week, Amazon suffered a serious interruption of its Elastic Compute Cloud (EC2) service that lasted nearly two days. Not all of Amazon’s subscribers were adversely affected; for those who were, however, it was a time of frustration and helplessness. The event has generated some entirely predictable responses tolling the “death of the Cloud”, or at least predicting a pervasive slowdown in enterprise adoption of Public Cloud services. In our view, however, it has simply underscored certain factors that have been critical from the outset to Cloud adoption in general and Public Cloud adoption in particular.

Cloud Services are widely perceived as dirt-cheap and infinitely scalable. Like anything else that sounds this good, it’s never been that simple. We at Purposeful Clouds hold the conviction that economical, easy-to-use Cloud Services are indeed attainable, provided your reliability needs are not very demanding. Consider the highly publicized basic services as an entry-level model: they provide a functionally correct environment, but few promises are possible where operations are concerned.

As with any other part of a key operation (computers, building power, call center facilities, etc.), it’s incumbent upon the enterprise to develop a clear understanding of how much unplanned downtime is acceptable to the business, and to create a clear and actionable plan for dealing effectively with the inevitable service interruption, within acceptable cost parameters. This is not new to IT; the Public Cloud just adds another “utility” to be handled. In some cases, a premium service or a special arrangement with the service provider may be appropriate; this appears to have been the case for those who subscribed to Amazon’s FailSafe options, who were not severely affected by this event.

In other cases, where the business impact of an extended service outage would be catastrophic, it will be up to the enterprise to formulate alternate-operation scenarios that can be invoked under the complete control of the enterprise itself, or at least without recourse to the original service provider. That provider is presumably dealing with a widespread disaster or even a suspension of operations, and certainly has little incentive to help you shift to another provider. This is as true for Cloud Services as it’s always been for the corporate data center; there are just many more options.

It really is just like other forms of insurance. Sometimes you can do without it – if loss of the underlying asset will not cause too much hardship. Typically, Test and Development in the Cloud might fall into this category; informational or promotional sites also might, but this should be a subject for further analysis.

For those assets or operations that you really can’t do without, it’s important to have an acceptable level of “replacement value” insurance – something that enables you to continue an adequate level of operation after a non-cataclysmic period of service interruption. Service Level Agreements are important as a gauge of the Service Provider’s level of confidence and commitment, but being compensated with a fraction of a month’s usual charge will be scant comfort if you’re unable to generate revenue or ship product.
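To make the gap concrete, here is a minimal back-of-the-envelope sketch comparing an SLA credit with the revenue lost during an outage. Every figure in it (the monthly fee, the credit fraction, the hourly revenue, and the outage length) is a hypothetical assumption chosen for illustration, not a number taken from any provider's actual agreement; substitute your own.

```python
def sla_credit(monthly_fee: float, credit_fraction: float) -> float:
    """Credit returned by the provider: a fraction of one month's usual charge."""
    return monthly_fee * credit_fraction


def outage_loss(revenue_per_hour: float, outage_hours: float) -> float:
    """Revenue the business fails to generate while the service is down."""
    return revenue_per_hour * outage_hours


# Hypothetical inputs: replace with figures from your own contract and business.
monthly_fee = 2_000.0        # assumed monthly service charge
credit_fraction = 0.10       # assumed SLA credit (10% of the monthly charge)
revenue_per_hour = 5_000.0   # assumed revenue that depends on the service
outage_hours = 40.0          # assumed outage of a little under two days

credit = sla_credit(monthly_fee, credit_fraction)
loss = outage_loss(revenue_per_hour, outage_hours)
uncovered = loss - credit    # the portion you are effectively self-insuring

print(f"SLA credit:      {credit:,.0f}")
print(f"Lost revenue:    {loss:,.0f}")
print(f"Uncovered loss:  {uncovered:,.0f}")
```

Under these assumed numbers the credit covers a tiny fraction of the loss; the point of running the arithmetic with your own figures is to see how much "replacement value" coverage the operation really needs.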

For those things that matter most, you might just need to be self-insured.

Comments? Questions? Contrary views? Some event we missed?
We welcome your feedback at talk@purposefulclouds.com

Purposeful Clouds helps companies assess and plan their best options for Cloud technology adoption, with before-the-fact consideration of contingencies, ROI, and further migration strategies. To discuss how we can help you make the best decisions, contact us at info@purposefulclouds.com.
