How to Avoid a Cloud Calamity

By Andrew Duncan

Today we’re relying more and more on cloud platforms, and nothing made that more apparent than the recent AWS outage, which impacted many internet sites and services this past week. Our teams were affected because we rely heavily on services like Slack, Jira Cloud and many more. While they didn’t all go down, we definitely had disruptions with various tools, and it made me wonder: how can this be?

The whole purpose of leveraging the cloud is to improve the management and uptime of your platform, right? Learning that the whole calamity was the result of a user typo got me thinking: how are the services we use configuring their environments to ensure our beloved apps stay online?

Being realistic, I know guaranteeing 100% uptime isn’t possible, but surely we can avoid a major outage when a single AWS region goes south (or, in this case, east). US-EAST-1 appears to be the only region impacted by the user error, and with AWS offering four public regions in the United States, each with its own availability zones, it seems silly and careless for SaaS providers that use AWS as their IaaS to architect without high availability.

So what could they have done to ensure an outage like this didn’t hit those of us paying good money for their services so hard? Easy! Design with high availability in mind and leverage the true power of the cloud. Why use such a flexible and powerful hosting environment if you’re not going to take advantage of its inherent capabilities? Unfortunately for the companies selling us services, there may not be a good answer to that, because it’s likely they wanted to make top dollar and get us hooked and relying on fragile-ware.

If you are going to the cloud and you want to do better than those who may have wronged us during this outage, you’re in luck. Cloud providers offer many capabilities we can leverage to protect our application uptime. AWS, for example, has multiple regions, each made up of several availability zones, and putting a GTM (Global Traffic Manager) in front of ELBs (Elastic Load Balancers) in more than one region can, on its own, save you from such an outage. The beauty of it is you don’t have to spend much more to get high availability if you run the second location in active-passive mode and automate your server stand-up and configuration.
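To make the active-passive idea a little more concrete, here is a minimal Python sketch of the decision a global traffic manager makes: send traffic to the preferred site while it is healthy, and fall back to the standby when it is not. The Endpoint class, the pick_endpoint function, the hostnames and the priorities are all hypothetical names invented for illustration. This is not the API of any AWS service or GTM product, and a real setup would rely on the provider's own health checks and DNS failover rather than hand-rolled code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    """One load-balanced site (hypothetical example values, not real hosts)."""
    name: str
    hostname: str
    priority: int  # lower number = preferred; 0 is the active site


def pick_endpoint(
    endpoints: Sequence[Endpoint],
    is_healthy: Callable[[Endpoint], bool],
) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number.

    Returns None when every endpoint fails its health check.
    """
    for endpoint in sorted(endpoints, key=lambda e: e.priority):
        if is_healthy(endpoint):
            return endpoint
    return None


# Assumed deployment: one active site and one passive standby.
ENDPOINTS = [
    Endpoint("us-east-1", "app-east.example.com", priority=0),
    Endpoint("us-west-2", "app-west.example.com", priority=1),
]

if __name__ == "__main__":
    # Simulate the active site failing its health check.
    down = {"us-east-1"}
    chosen = pick_endpoint(ENDPOINTS, lambda e: e.name not in down)
    print(chosen.hostname if chosen else "no healthy endpoint")
```

The health check is passed in as a function so the same logic can be exercised with a simulated outage, as in the example at the bottom, before it is wired to real monitoring.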

If you found your organization was impacted by the outage and want to assess how your cloud management strategy should evolve, let’s talk to determine how you can avoid a future cloud calamity.

About The Author

Andrew Duncan is the Director of Software for Richmond. He is a driven technologist focused on modern technology stacks and best practices. Andrew believes nothing is more rewarding than making software needs a reality with a focus on flexible, scalable and supportable code.