I had this question from a viewer of my Pluralsight course, Implementing the Reactive Manifesto with Azure and AWS, and thought I’d publish the response.
So why would you dual-run your cloud app by hosting it on Azure and AWS? Sounds like a lot of extra development and management overhead. Well the most compelling reasons are reliability and portability.
In 2012 I was working for a client who was making a big investment in the cloud, and at the end of the year we published their first external API for business partners. It was hosted in Azure and used some really nice features to route back into existing on-premise services.
We were able to publish a clean, simple API to partners, and hide away the underlying complexity of the internal services while still leveraging them to do all the work.
Two days after we went live, we were hit by the Azure SSL certificate expiry outage, and our API was unavailable for the best part of 3 days. Fortunately we had planned a gradual roll-out to partners, so the impact was minimal, but we’d been intending to ramp up quickly, and if the outage had happened a week or two later we would have been in a very bad place. Not least because our app could only run on Azure, we couldn’t package it up for another service without going back and reworking the code.
More recently AWS had an issue with a networking device in one of their data centres which caused an outage that took the best part of a day to resolve.
In both scenarios the SLAs are worthless, as you’ll get back a small percentage of your cloud expenditure, which is going to be negligible compared to your costs in dealing with the outage. And if your app is built specifically for AWS or Azure then if there’s an extended outage you can’t just deploy it onto a new set of kit from a different supplier.
And the chances are pretty good there will be another extended outage, both for Microsoft and for Amazon. But the chances are small that it will happen to both at the same time.
So my basic guidance has been: ignore the SLAs, go for better uptime by using two clouds. As soon as you need to scale beyond a single instance, start by scaling out to another cloud. Then scale out to different data centres in both clouds. Then you’ve got dual-cloud, quadruple-datacentre redundancy, so any more scaling you need can be left to the clouds to auto-scale themselves.
By running in both clouds, you’ve made your app portable, so in the highly unlikely event that both AWS and Azure go down in multiple regions, you’ll have a deployment package which will let you spin up a new stack on yet another cloud, without having to rework your solution.