As a DevOps and cloud engineering professional, and a human being, I make eight mistakes for every 100 words I type. This means I make hundreds, if not thousands, of mistakes each week.
So how do I catch my mistakes? I would like to say I write good unit and integration tests for my infrastructure-related code and have over 90 percent code coverage, but this would be a lie.
In fact, if you're like most DevOps and cloud engineering professionals, you are not expected to write unit and integration tests, and you rely on external tools to catch infrastructure-related errors.
So, why aren’t the same unit and integration testing procedures, which are applied to application code, being applied to infrastructure code?
While the infrastructure team can use tools like Terraform, LocalStack and terraform-compliance to mock and test resources, they cannot mock the platform and services that will live within the infrastructure. As a result, infrastructure teams do actual deployments to the development environment in order to test their infrastructure.
Unfortunately, from a developer's perspective, the development environment is "production": it is expected to be stable and always available. Developers do not want downtime because the infrastructure team deployed an infrastructure change to test it and broke something.
So, how do we resolve this conflict in the simplest way possible, assuming the development environment is used 24 hours per day?
I’ve had good results applying the same software testing strategy used for applications to the infrastructure code base.
By writing infrastructure-related unit and integration tests and running them against the infrastructure code before it is deployed to a development environment, you can greatly reduce the risk that an infrastructure change breaks the development environment.
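To make the idea of a pre-deployment unit test concrete, here is a minimal sketch in Python. It assumes the infrastructure is managed with Terraform and that the plan has been exported as JSON (for example with `terraform show -json`); the sample plan below is hand-written and heavily trimmed, and the function names `find_open_ssh` and `assert_no_world_open_ssh` are hypothetical names chosen for illustration. The check itself, flagging security groups that open SSH to the whole internet, is just one example of a rule a team might care about.

```python
import json

# Hand-written, trimmed stand-in for the JSON form of a Terraform plan.
# A real plan contains many more fields; only the ones the check reads are kept.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {
      "address": "aws_security_group.web",
      "type": "aws_security_group",
      "change": {
        "actions": ["create"],
        "after": {
          "ingress": [
            {"from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]},
            {"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]}
          ]
        }
      }
    },
    {
      "address": "aws_security_group.db",
      "type": "aws_security_group",
      "change": {
        "actions": ["create"],
        "after": {
          "ingress": [
            {"from_port": 5432, "to_port": 5432, "cidr_blocks": ["10.0.0.0/16"]}
          ]
        }
      }
    }
  ]
}
""")


def find_open_ssh(plan):
    """Return addresses of security groups whose planned state exposes port 22 to 0.0.0.0/0.

    Simplified on purpose: it ignores IPv6 ranges and "all ports" rules.
    """
    offenders = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_security_group":
            continue
        after = (rc.get("change") or {}).get("after") or {}
        for rule in after.get("ingress") or []:
            low = rule.get("from_port") or 0
            high = rule.get("to_port") or 0
            world_open = "0.0.0.0/0" in (rule.get("cidr_blocks") or [])
            if low <= 22 <= high and world_open:
                offenders.append(rc["address"])
                break  # one offending rule is enough to flag the group
    return offenders


def assert_no_world_open_ssh(plan):
    """Fail the test run if any security group in the plan opens SSH to the world."""
    offenders = find_open_ssh(plan)
    assert not offenders, f"SSH open to 0.0.0.0/0 in: {offenders}"
```

Run against the sample plan, the check flags `aws_security_group.web` and leaves `aws_security_group.db` alone. Because it only reads the plan, it can run before anything touches the development environment.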
While development software testers could write application, platform and service tests, they may not have the infrastructure and architectural knowledge needed to write good infrastructure tests. Instead, a DevOps Software Tester team should be responsible for coordinating with all development software testers on infrastructure-related integration tests.
The infrastructure-related integration tests would then become part of the infrastructure deployment pipeline.
For example, before any infrastructure-related changes are deployed to the ‘development’ environment, the infrastructure should be deployed to a new environment and validated. Once all tests pass, the change is deployed to the development environment. In addition, as with application code, infrastructure code should have at least 90 percent code coverage for all infrastructure resources, contain good infrastructure-related integration tests, and have 90 percent coverage for application-related integration tests.
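The gate described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a real pipeline: `run_gate`, `resource_coverage` and `GateResult` are hypothetical names, the deploy and destroy steps are passed in as plain callables standing in for whatever tooling the team actually uses, and "coverage" is simplified to the fraction of infrastructure resources that have at least one test.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class GateResult:
    promoted: bool            # True if the change was deployed to development
    failed_tests: List[str]   # names of tests that failed in the new environment
    coverage: float           # fraction of resources with at least one test


def resource_coverage(resources: Iterable[str], tested: Set[str]) -> float:
    """Fraction of resources that appear in the set of tested resources."""
    resources = list(resources)
    if not resources:
        return 1.0
    return sum(1 for r in resources if r in tested) / len(resources)


def run_gate(
    resources: Iterable[str],
    tested_resources: Set[str],
    tests: Dict[str, Callable[[], bool]],
    deploy_ephemeral: Callable[[], None],
    destroy_ephemeral: Callable[[], None],
    deploy_development: Callable[[], None],
    min_coverage: float = 0.90,
) -> GateResult:
    """Deploy to a throwaway environment, run the tests there, and promote only on success."""
    coverage = resource_coverage(resources, tested_resources)
    deploy_ephemeral()
    try:
        failed = [name for name, check in tests.items() if not check()]
    finally:
        destroy_ephemeral()  # always clean up the throwaway environment
    promoted = not failed and coverage >= min_coverage
    if promoted:
        deploy_development()
    return GateResult(promoted=promoted, failed_tests=failed, coverage=coverage)
```

The design choice worth noting is that the development environment is only touched in the last step, after both the tests and the coverage threshold have passed; a failing test or thin coverage stops the change while it is still in the throwaway environment.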
While this solution does not guarantee that your development environment will never suffer an outage, it applies a consistent, organization-wide testing strategy for all code, and should help to catch many of the infrastructure-related coding mistakes.
It also provides an additional career path for software testers, letting them enhance their knowledge and skills and teach DevOps and cloud engineers how to do proper testing.
Or, you can continue to deploy infrastructure-as-code and not write any unit or integration tests.
Part I: Does DevOps Need Dedicated Testers?
Part II: 2019 Cloud Breaches Prove DevOps Needs Dedicated Testers