Minimize Your Cloud Debt

Technical debt is a real liability. If you have spent any time working at a company that relies on technology, you have most likely felt its impact. It comes in many forms and is often difficult to identify, but one thing is certain: its costs compound over time. As organizations move infrastructure to the cloud, they have the opportunity to eliminate existing debt, but they also risk creating new debt. Many newer companies that have used the cloud since their inception have accrued cloud debt, sometimes unknowingly but often knowingly.

The primary business drivers for adopting the cloud are reducing cost, decreasing time-to-market, and gaining business agility. We should capitalize on every opportunity to maximize those gains, especially when it requires little up-front investment. In fact, moving to the cloud is often part of a large modernization effort intended to eliminate decades of existing technical debt. Why start off on the wrong foot?

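What is cloud debt?

Cloud debt is the technical debt an organization accrues through the way it builds and operates in the cloud. It tends to take three forms: immediate debt from a lack of automation, future debt from an outdated architecture, and potential debt from improper governance.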

Immediate Debt: Lack of Automation 

Failure to automate infrastructure, configuration, or the delivery pipeline creates cloud debt in the form of maintenance costs, lost time, and risk. This deficit isn’t specific to the cloud, but automating in the cloud typically requires a much smaller investment than automating on-premises. Every step that requires manual intervention is costly and risky.

One of the great perks of the cloud is that resources can be provisioned and managed through clean, well-documented APIs. This enables programmable infrastructure, known as Infrastructure-as-Code (IaC), through tools like Terraform, OpenStack Heat, AWS CloudFormation, and Azure Resource Manager. You can define immutable infrastructure declaratively (including databases, load balancers, and networks) and manage that code in your existing version control systems. There is no excuse for not exploiting this convenience, yet too many organizations still don’t.
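
To make the declarative idea concrete, here is a minimal Python sketch of how an IaC tool can compare the desired state described in code with the actual state of a cloud account and derive a plan. It is loosely modeled on the idea behind `terraform plan`, not on Terraform’s implementation. The resource names and attributes are hypothetical, and the sketch always replaces a changed resource rather than updating it in place, to mirror the immutable-infrastructure approach.

```python
from typing import Dict, List, Tuple

# Hypothetical resource model: resource name -> dict of attributes.
Resource = Dict[str, object]


def plan(desired: Dict[str, Resource], actual: Dict[str, Resource]) -> List[Tuple[str, str]]:
    """Compare the desired state (from code) with the actual state (from the
    provider) and return the (action, resource_name) steps needed to converge."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] != actual[name]:
            # Immutable infrastructure: replace a changed resource instead of patching it.
            actions.append(("replace", name))
    for name in sorted(actual):
        if name not in desired:
            # Anything no longer described in code is removed.
            actions.append(("destroy", name))
    return actions


if __name__ == "__main__":
    desired = {
        "web-lb": {"type": "load_balancer", "port": 443},
        "orders-db": {"type": "database", "size": "large"},
    }
    actual = {
        "orders-db": {"type": "database", "size": "small"},
        "legacy-vm": {"type": "vm"},
    }
    for action, name in plan(desired, actual):
        print(action, name)
```

Running it prints `replace orders-db`, `create web-lb`, and `destroy legacy-vm`. Because the code describes only the end state, planning again after those actions were applied would produce no actions at all.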

Tools such as Puppet, Chef, and Ansible have also gained IaC capabilities, but they are best known as configuration management tools designed to install and manage software on existing servers and VMs. Like IaC tools, they let you write and maintain code that describes the configuration of your infrastructure.
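
Configuration management tools are built around idempotent steps: each step describes a desired state and changes the system only if it is not already in that state. The following Python sketch shows the idea for a single configuration line, in the spirit of Ansible’s `lineinfile` module. The `ensure_line` function and the managed file are hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Ensure `line` is present in the file at `path`.

    Returns True if the file was changed and False if it was already compliant,
    so repeated runs converge on the same state (idempotence)."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # Already in the desired state: change nothing.
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "sshd_config"  # Hypothetical managed file.
        config.write_text("Port 22\n")
        print(ensure_line(config, "PermitRootLogin no"))  # True: change applied
        print(ensure_line(config, "PermitRootLogin no"))  # False: already compliant
```

The second call reports no change, which is what makes it safe to run the same configuration code against a server again and again.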

Another opportunity for automation is your delivery pipeline. With IaC and configuration management tools in place, a code change can trigger the provisioning and configuration of your cloud infrastructure. Popular Continuous Integration tools like Jenkins, TeamCity, Travis CI, CircleCI, and GitLab let you provision and configure the required infrastructure based on a code artifact. We’ve worked on projects that built an entire environment for developing and testing a single feature, then tore it down once the feature was merged, without any manual intervention. Environments like these remove bottlenecks from your code delivery process and free engineers to focus on delivering business value.
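
Per-feature environments like these follow a simple lifecycle: derive an environment name from the branch, provision it, run the tests, and always tear it down. Here is a minimal Python sketch of that lifecycle. The `provision` and `destroy` functions are hypothetical placeholders that only update an in-memory set; a real pipeline would call an IaC tool instead (for example, `terraform apply` and `terraform destroy` against a workspace named after the branch).

```python
import re
from contextlib import contextmanager
from typing import Callable, Iterator, Set

# Simulated cloud state for this sketch: names of environments that currently exist.
ACTIVE_ENVIRONMENTS: Set[str] = set()


def env_name(branch: str) -> str:
    """Derive a DNS-safe environment name from a branch like 'feature/Login-Page'."""
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return f"env-{slug}"[:63].rstrip("-")


def provision(name: str) -> None:
    ACTIVE_ENVIRONMENTS.add(name)  # Hypothetical: would create the full stack.


def destroy(name: str) -> None:
    ACTIVE_ENVIRONMENTS.discard(name)  # Hypothetical: would tear the stack down.


@contextmanager
def ephemeral_environment(branch: str) -> Iterator[str]:
    """Provision an environment for `branch` and always destroy it afterward."""
    name = env_name(branch)
    provision(name)
    try:
        yield name
    finally:
        destroy(name)  # Runs even if the tests fail or raise.


def run_pipeline(branch: str, run_tests: Callable[[str], bool]) -> bool:
    with ephemeral_environment(branch) as name:
        return run_tests(name)


if __name__ == "__main__":
    passed = run_pipeline("feature/Login-Page", lambda env: env in ACTIVE_ENVIRONMENTS)
    print(passed, ACTIVE_ENVIRONMENTS)  # True set()
```

The `try`/`finally` in the context manager is the key design choice: the environment is destroyed even when the tests fail or crash, so abandoned environments don’t quietly accumulate cost.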

Have you automated your cloud infrastructure or delivery using any of these popular tools and practices? If not, you are likely hauling around a large amount of debt.  

Future Debt: Old-School Architecture 

Building cloud environments with a dated architectural mindset can also be costly. Assess the migration path for all of your applications and data with a cloud architect. A lift-and-shift migration is sometimes a good starting point, but it is never the ideal end state. To reap the full benefits and avoid future costs, use PaaS offerings, adopt modern architectural patterns, and refactor your applications to enable scaling and redundancy.

Today’s public cloud providers focus largely on PaaS offerings (I include DBaaS, FSaaS, FaaS, MLaaS, etc. in that category), which remove the need for you to manage any part of the underlying infrastructure and supporting software. Unless your business requires complete control over the operating system your applications run on, PaaS is usually much more cost-effective. By using it, you no longer have to manage OS and software patching, scaling, distribution, or resiliency for the platforms that support your applications and data.

Microservices: it’s not just a buzzword (even though it is that, too). Companies like Google, Netflix, Amazon, Uber, and eBay (you get the point) have driven its adoption, and there is a discernible need for a microservices architecture as the cloud becomes a standard platform for businesses. Managing microservices adds some overhead, but the gains in flexibility, speed, and scalability are overwhelming. Along with those benefits come new challenges, since we now have to solve problems in highly distributed, loosely coupled systems. Companies like Netflix paved the way by open-sourcing a wealth of tools to help address these challenges, and many others have since followed. Some of the best tools can be found at the Cloud Native Computing Foundation. Designing a loosely coupled system around business functions also promotes a healthy organizational structure by empowering teams to own very specific functions with less bureaucracy. With that said, I consider a traditional monolithic architecture in the cloud to be a serious form of cloud debt: difficult to measure, but very expensive.
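
One of the distributed-systems problems those tools address is cascading failure: when one service slows down or fails, its callers should fail fast instead of piling up requests. The circuit breaker pattern, popularized for microservices by Netflix’s Hystrix library, handles this. The Python class below is a minimal, single-threaded sketch of the pattern, not Hystrix’s API; the threshold and timeout defaults are arbitrary assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Single-threaded circuit breaker: closed -> open after repeated failures,
    open -> half-open after a cool-down, half-open -> closed on a successful trial call."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # Cool-down elapsed: let a trial call through.
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency unavailable; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # Trip, or re-trip after a failed trial.
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Callers wrap each remote call, for example `breaker.call(lambda: fetch_inventory(item_id))`, where `fetch_inventory` stands in for any hypothetical client call. The injectable `clock` parameter exists to make the timing behavior easy to test.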

Lastly, you could be missing key components of your architecture entirely, simply due to a lack of education. The cloud offers many architectural improvements that are available on demand. Messaging services suited to different purposes, load balancers in the right places, and global distribution of static or cacheable assets are just a few of the components that are often overlooked. Using these services can drastically improve resiliency and performance. Before (re)architecting your systems in a cloud environment, educate your architects or bring in outside consultants.
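
As an illustration of what one of those components does, here is a minimal Python sketch of round-robin load balancing that skips backends marked unhealthy by a health check. The backend addresses are hypothetical; a managed cloud load balancer provides this behavior, including the health probes themselves, on demand, so you would rarely write it yourself.

```python
from itertools import cycle
from typing import Dict, List


class RoundRobinBalancer:
    """Distribute requests across backends, skipping those that failed health checks."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)
        self.healthy: Dict[str, bool] = {b: True for b in self.backends}
        self._ring = cycle(self.backends)

    def mark(self, backend: str, healthy: bool) -> None:
        """Record the result of a health check for a known backend."""
        if backend not in self.healthy:
            raise KeyError(backend)
        self.healthy[backend] = healthy

    def next_backend(self) -> str:
        """Return the next healthy backend in round-robin order."""
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if self.healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["10.0.1.10", "10.0.2.10", "10.0.3.10"])
    lb.mark("10.0.2.10", False)  # Failed health check: skipped until it recovers.
    print([lb.next_backend() for _ in range(4)])
    # ['10.0.1.10', '10.0.3.10', '10.0.1.10', '10.0.3.10']
```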

Do people in your organization scoff at the mention of “Microservices”, “DevOps” or “Containerization”? If so, you’re likely accruing cloud debt with the potential for expensive failures.  

Potential Debt: Improper Governance 

One of the main benefits of the cloud is that it is self-service. This is a double-edged sword. Historically, we gated our environments heavily and minimized operational overhead by constraining the platform. For instance, standardizing solely on Oracle or IBM WebSphere limits the number of tools that need compliance, security, and risk management. We now live in a wonderful world where we can easily use the best tool for each specific job. This is a blessing, but it can become a curse without proper governance. I refer to this type of debt as “potential debt” because it represents potential risk to your business and cannot be quantified in general terms.

To enforce a cloud governance strategy, you need policies that ensure consistent methods for monitoring, logging, and tracing (a.k.a. observability). Without a consistent approach to performance metrics, process telemetry, and logging, it is impossible to centralize the results and take action when necessary. There are many ways to achieve this with modern, distributed architectural patterns and tools. They are too many to list here, but adopting Docker and Kubernetes opens up a wide range of options.
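
As a concrete example of such a policy, a governance standard might require every service to emit structured JSON logs with the same core fields, including a trace ID that follows a request across services, so a central platform can aggregate and correlate them. The Python sketch below uses only the standard `logging` module; the field names and the `get_logger` helper are assumptions for illustration, not an established standard.

```python
import io
import json
import logging
import uuid
from typing import Optional, TextIO


class JsonFormatter(logging.Formatter):
    """Render every log record as one JSON object with a fixed set of fields."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "service": self.service,
            # trace_id arrives via `extra=`; it is logged as null when missing.
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def get_logger(service: str, stream: Optional[TextIO] = None) -> logging.Logger:
    """Return a logger for `service` that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]  # Replace any existing handlers.
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = get_logger("orders-service", buffer)
    trace_id = uuid.uuid4().hex  # Normally propagated from the incoming request.
    log.info("order created", extra={"trace_id": trace_id})
    print(buffer.getvalue().strip())
```

In a Kubernetes deployment, writing these lines to stdout or stderr lets a cluster-level log collector ship them without any per-service configuration.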

When it comes to customer satisfaction and continuous operations, some of the most important challenges are resiliency and scalability. They directly affect performance and uptime, which are not optional concerns in this day and age. A proper governance policy should require standards for redundancy and disaster recovery from the very beginning, such as data retention and replication requirements. Applications should also define their own scaling requirements, and the governance policy should enforce them.
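
Requirements like these are easiest to enforce when they are written as code. The following Python sketch checks a hypothetical application manifest against redundancy, replication, retention, and scaling rules. The `AppManifest` fields and the thresholds are assumptions chosen for illustration, not recommendations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppManifest:
    """Hypothetical per-application manifest that the governance policy inspects."""
    name: str
    replicas: int
    regions: List[str]
    backup_retention_days: int
    min_replicas: Optional[int] = None  # Scaling bounds each application must declare.
    max_replicas: Optional[int] = None


# Hypothetical organization-wide thresholds.
MIN_REPLICAS = 2
MIN_REGIONS = 2
MIN_RETENTION_DAYS = 30


def violations(app: AppManifest) -> List[str]:
    """Return a human-readable list of policy violations (empty means compliant)."""
    problems: List[str] = []
    if app.replicas < MIN_REPLICAS:
        problems.append(f"{app.name}: needs at least {MIN_REPLICAS} replicas for redundancy")
    if len(set(app.regions)) < MIN_REGIONS:
        problems.append(f"{app.name}: must be replicated across {MIN_REGIONS}+ regions")
    if app.backup_retention_days < MIN_RETENTION_DAYS:
        problems.append(f"{app.name}: backups must be retained {MIN_RETENTION_DAYS}+ days")
    if app.min_replicas is None or app.max_replicas is None:
        problems.append(f"{app.name}: scaling bounds must be declared")
    elif not MIN_REPLICAS <= app.min_replicas <= app.max_replicas:
        problems.append(f"{app.name}: invalid scaling bounds")
    return problems


if __name__ == "__main__":
    app = AppManifest("orders", replicas=1, regions=["us-east-1"], backup_retention_days=7)
    for problem in violations(app):
        print(problem)
```

Running it against the under-provisioned example reports all four violations; a compliant manifest returns an empty list, which a pipeline can treat as a pass.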

Security is usually top-of-mind when it comes to governance, or to the cloud in general. I truly believe that the cloud is more secure than your on-premises environment, unless you are a cybersecurity company. To avoid that debate, I want to clarify that cloud security is a shared responsibility between the provider and the customer, and the vulnerabilities almost always stem from the customer’s processes and lack of governance. There are many ways to leverage the cloud, but any interaction with it must be secured, whether you use IaaS or PaaS. A proper cloud security governance model should be proactive and iteratively improved to meet platform requirements.

Creating a governance policy isn’t enough. It must be enforced through automation and tested. Combining IaC with cloud governance policies lets you thoroughly test that new changes do not introduce vulnerabilities. It lets you validate your backup strategy and confirm that your disaster recovery plan works. It also lets you verify that your systems can scale up to meet business demand and scale back down to save money when that demand isn’t there.
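
Here is a minimal Python sketch of that kind of automated enforcement: a policy check a CI pipeline could run against resource definitions (for example, parsed from an IaC plan) before any change is applied, failing the build on a violation. The resource dictionaries and rules are hypothetical; in practice, dedicated policy-as-code tools such as Open Policy Agent serve this role.

```python
from typing import Dict, List

# Hypothetical rule: SSH (22) and RDP (3389) must not be open to the internet.
SENSITIVE_PORTS = {22, 3389}


def check_resources(resources: List[Dict]) -> List[str]:
    """Return a finding for every planned resource that breaks a security rule."""
    findings: List[str] = []
    for res in resources:
        name = res.get("name", "<unnamed>")
        if res.get("type") == "firewall_rule":
            if res.get("source") == "0.0.0.0/0" and res.get("port") in SENSITIVE_PORTS:
                findings.append(f"{name}: port {res['port']} open to the internet")
        if res.get("type") == "storage_bucket":
            if res.get("public", False):
                findings.append(f"{name}: bucket is publicly readable")
            if not res.get("encrypted", False):
                findings.append(f"{name}: bucket is not encrypted at rest")
    return findings


def main(resources: List[Dict]) -> int:
    """Print findings and return a process exit code (non-zero blocks the change)."""
    findings = check_resources(resources)
    for finding in findings:
        print(f"POLICY VIOLATION: {finding}")
    return 1 if findings else 0


if __name__ == "__main__":
    planned = [
        {"type": "firewall_rule", "name": "allow-ssh", "source": "0.0.0.0/0", "port": 22},
        {"type": "storage_bucket", "name": "backups", "public": False, "encrypted": True},
    ]
    print("exit code:", main(planned))
```

Returning an exit code instead of calling `sys.exit` inside the check keeps the rules easy to unit-test; the pipeline’s entry point would pass the result to `sys.exit`.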

Do you have a cloud governance policy in place with policy automation and integrated testing? If not, there is a large potential for future debt or severe damage to your company’s reputation. 

What next? 

The technology landscape has changed drastically over the past few years, and we need to adapt to it to remain competitive in today’s market. Take a “cloud-first” or “cloud-native” architectural approach to future-proof your business. The cloud landscape is colossal, and those who don’t embrace it will fall behind. Build cross-functional teams with mixed expertise in cloud technology and software engineering to maximize your output and minimize cloud debt. Get outside help from industry experts if your organization needs it; the ROI can be immense. Build a governance strategy from the beginning, along with a plan for enforcing it.

All in all, the combination of automation, modern architecture, and an enforceable cloud governance strategy can curtail your cloud debt, reducing costs and risk to your business.