The thing that wasn’t supposed to happen actually happened last week, when Amazon Simple Storage Service (S3), one of Amazon’s storage services, failed. According to Amazon, the failure was due to human error that had a cascading impact on servers that hadn’t been restarted in a while.
The complexity of S3 and the inability to quickly restart the service raise a couple of questions about Amazon’s offering:
- Does Amazon have technical debt in their own platform that will continue to cause outages as that platform grows?
- Is Amazon using best practices for DevOps? How could a manual change have such a large impact?
The bigger questions loom for the rest of the technology industry. Was this an isolated event? Should this failure change your cloud strategy?
We generally see customers fall into one of two camps regarding cloud:
The All-In Camp: “We are all-in on Amazon and believe the benefits outweigh the risks. We will consume all their services and we aren’t worried about vendor neutrality and interoperability.”
The Vendor Neutral Camp: “We are using Amazon, but we want vendor neutrality and to use best-of-breed services from a variety of software vendors. We want to avoid proprietary services provided by cloud vendors and be able to switch providers with little pain.”
If you fall into the first category, how will your team deal with these issues? Did you have a disaster recovery (DR) strategy for your services? Was your team using load balancing across geographic regions? The All-In approach is faster to market and typically cheaper if services are managed correctly. (Check out our blog post, “How much is the cloud really,” for more on that.)
The risk of price increases and massive system failures is always there. If you are All-In, you must find ways to manage that risk and use automation to move servers into other regions as failures occur. These strategies need to be fleshed out and tested.
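As a rough illustration of what automated regional failover can look like at its simplest, here is a minimal sketch in Python. It assumes a prioritized list of regions and a health check you supply yourself; the function name `pick_active_region`, the priority list, and the simulated outage are all hypothetical and are not part of any Amazon API.

```python
from typing import Callable, Optional, Sequence


def pick_active_region(
    regions: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first region, in priority order, that passes its health check."""
    for region in regions:
        try:
            if is_healthy(region):
                return region
        except Exception:
            # A health check that raises is treated the same as a failed region.
            continue
    # No region passed; the caller has to decide what to do next.
    return None


# Hypothetical priority list, with a simulated outage in the primary region.
REGION_PRIORITY = ["us-east-1", "us-west-2", "eu-west-1"]
simulated_down = {"us-east-1"}

active = pick_active_region(REGION_PRIORITY, lambda r: r not in simulated_down)
print(active)
```

With the primary region marked as down, the sketch falls through to the second region in the list. In practice the health check would probe your own service endpoints in each region, and the result would drive a DNS or load balancer change. Either way, the failover path should be exercised regularly, not only during a real outage.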
If you prefer the Vendor Neutral route for cloud deployment, your investment in planning, architecture, and management for your cloud is critical. Instead of relying on proprietary services, you will have to implement your own components using open source or software from other vendors. Those licenses will need to be managed, and your ability to switch providers must be tested for every component in your technology stack. How can you use containers and container management to abstract the underlying technology, make moves easier, and automate cloud deployments based on rate and service?
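One possible reading of deploying “based on rate and service” is to pick the cheapest provider that still meets a minimum service level. The sketch below assumes that reading. The provider names, prices, and availability figures are made up for illustration, and `ProviderOffer` and `choose_provider` are hypothetical names, not part of any vendor’s tooling.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ProviderOffer:
    name: str
    hourly_rate_usd: float   # hypothetical price per hour
    availability_pct: float  # hypothetical availability figure


def choose_provider(
    offers: Iterable[ProviderOffer],
    min_availability_pct: float,
) -> Optional[ProviderOffer]:
    """Return the cheapest offer that meets the availability floor, if any."""
    eligible = [o for o in offers if o.availability_pct >= min_availability_pct]
    if not eligible:
        return None
    return min(eligible, key=lambda o: o.hourly_rate_usd)


# Illustrative numbers only; these are not real quotes from any provider.
OFFERS = [
    ProviderOffer("provider-a", hourly_rate_usd=0.10, availability_pct=99.9),
    ProviderOffer("provider-b", hourly_rate_usd=0.08, availability_pct=99.5),
    ProviderOffer("provider-c", hourly_rate_usd=0.12, availability_pct=99.95),
]

choice = choose_provider(OFFERS, min_availability_pct=99.9)
print(choice.name if choice else "no provider meets the service level")
```

A rule like this is only useful if the workload can actually run on every provider in the list, which is where containers and tested portability come in.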
Whatever your strategy may be, make sure you have a DR plan and a contingency for your services. As we witnessed with Amazon last week, cloud providers aren’t perfect. Our false sense of security in their service levels was exposed, and the potential impact on our businesses is significant.