One of the foundational infrastructure architectural patterns is to split the infrastructure into multiple logical environments, most commonly Dev, QA, and Production. Each environment uses the same types of resources (load balancers, compute, storage, etc.), but they differ in size and public availability. While Dev will use a small number of small instances and be hidden from the Internet, Production will use larger instances and be accessible to anyone.
To apply DevOps best practices, these types of infrastructures are deployed using Infrastructure as Code (IaC). This gives us the ability to easily reproduce the infrastructure and version the code via Git or any other version control system, and a popular tool of choice for that is Terraform.
I was recently brought into an organization that was struggling to implement IaC with the open source version of Terraform to support their AWS environment. With the right module structure and repeatable templates, we were able to set them up for success and fix the issues they had been facing.
While Terraform is a great tool and a great time-saver, when it gets to more complex architectures like the ones described below, things can get unwieldy.
The organization had four environments (Prod, Dev, Testing, and QA), and their existing directory structure for this kind of infrastructure looked as shown above. As you can see, there were single giant environment files for their base foundation (global and regional configuration). The problem was that teams were copying Terraform files between environments, which led to redundancy, inconsistency, and inefficiency.
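Since the original diagram is not reproduced here, a copy-per-environment layout of this kind typically looks something like the sketch below (directory and file names are illustrative, not the organization's actual layout):

```
terraform/
├── prod/
│   ├── global.tf      # one giant file for the global foundation
│   ├── us-east-1.tf   # one giant file per region
│   └── apps.tf
├── dev/               # near-identical copies of the prod files
├── testing/
└── qa/
```

Any fix made in one environment had to be manually copied into the other three, which is exactly where the inconsistency crept in.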
Solution: Create Reusable Terraform Modules
We were able to resolve the above issue by templating Terraform files and storing them in a central repo (see below, gray box).
When spinning up environments (Prod, Dev, QA, and Testing), each environment repo referred to a specific template directory within the central template repo for deploying a specific stack.
Below, I have conceptualized the environment and directory structure:
There are four repos in the production environment. Each repo has a list of parameters and a Git source pointing to a specific directory in the template repo. The templates are made flexible so that we can pass in values for size, count, and resource configuration. For example, an EC2 parameter in Production can specify a large instance type while other environments stay small; we can use the same EC2 template to stand up multiple instances of varying sizes in multiple environments without modifying the template file. The same goes for the VPC: Production can have a larger subnet size, while other environments have a relatively small CIDR range.
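As a sketch of what this looks like in practice (the repo URL, module subdirectory, and variable names below are hypothetical, not the organization's actual code), an environment repo consumes the EC2 template by pointing its module source at the central template repo and passing environment-specific parameters:

```hcl
# environments/prod/ec2.tf (illustrative)
module "app_ec2" {
  source = "git::https://git.example.com/org/terraform-templates.git//ec2?ref=v1.2.0"

  instance_type  = "m5.large"   # production-sized
  instance_count = 6
  environment    = "prod"
}
```

```hcl
# environments/dev/ec2.tf — same template, smaller values
module "app_ec2" {
  source = "git::https://git.example.com/org/terraform-templates.git//ec2?ref=v1.2.0"

  instance_type  = "t3.small"
  instance_count = 1
  environment    = "dev"
}
```

Pinning the `ref` to a tag lets each environment upgrade to a new template version independently.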
This solution helped the organization move from an unwieldy setup that relied on copying and pasting Terraform files to centralized templates, with a repo for each team per environment. It also gave teams the autonomy to manage their own repos and state files.
Today when we want to create a new AWS account, it goes through an automation process applying multiple layers in the order described below:
- Account Bootstrap: Configures the basic foundation at the global level, such as logging, monitoring, security controls, and basic IAM roles.
- Network layer: We chose two regions, one as primary and the other for disaster recovery (DR). The network team maintains its own repo, which sets up network components including the VPC, subnets, routes, NACLs, and basic security groups.
- Application layer: After the base foundation was set up, each application was deployed on top of the network within its own environment. If an application required EC2, ECS, or any other service, it referred to a specific directory within the template repo and deployed the resources. Meanwhile, each app team maintained its own Terraform state.
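Each layer is just another root module consuming the central template repo. For example, the network layer for the primary region might look like the following sketch (repo URL, module path, and variable names are hypothetical):

```hcl
# network/prod/us-east-1/vpc.tf (illustrative)
module "vpc" {
  source = "git::https://git.example.com/org/terraform-templates.git//vpc?ref=v1.2.0"

  cidr_block  = "10.0.0.0/16"  # production gets a larger CIDR range
  environment = "prod"
  region      = "us-east-1"    # the DR region has its own root module and state
}
```

Because the layers have their own root modules and state, they can be applied in order (bootstrap, then network, then applications) without any single apply touching the whole account.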
In the above scenario, there were other layers corresponding to governance, compliance, and security, which are not discussed here.
Managing State Files:
This is a blog topic in and of itself; however, I will briefly describe how we managed state files. With the new approach, we now have multiple state files, which can be cumbersome to manage but does not hurt performance. With a single state file that is queried for every deployment, loading modules whenever Terraform runs becomes slow; this is not the case with multiple state files.
Each team manages one or more state files depending on its environment. For example, the production network team has multiple state files, one per region. State outputs are then shared with other teams so they can dynamically query dependent resources. For example, when an application is deployed on ECS/EC2, the instance requires a subnet_id, which it queries from the network state file.
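In Terraform this cross-team lookup is done with the `terraform_remote_state` data source. The sketch below assumes an S3 backend and assumes the network root module declares an output named `private_subnet_ids` (the bucket, key, and output names are hypothetical):

```hcl
# app layer: read the network team's state to get subnet IDs (illustrative)
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "org-terraform-state"
    key    = "network/prod/us-east-1/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = var.instance_type
  subnet_id     = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
}
```

The application team only reads the network state; it never modifies it, so ownership boundaries stay clean.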
Although there are multiple ways to structure state files, we chose to split them per team. Each team can have one or more state files. For this solution, we ended up with 20 state files across 4 environments and 5 teams.
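Splitting state per team comes down to giving each root module its own backend key. A minimal sketch, assuming an S3 backend with DynamoDB locking (bucket, key, and table names are hypothetical):

```hcl
# each team/environment root module points at its own state key (illustrative)
terraform {
  backend "s3" {
    bucket         = "org-terraform-state"
    key            = "app-team-x/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"   # state locking to prevent concurrent applies
  }
}
```

With 5 teams and 4 environments, varying only the `key` per root module yields the 20 isolated state files mentioned above.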
This is where Terraform Enterprise can be very beneficial for managing multiple state files. You can read more about workspaces here.
By centralizing templates, Terraform code can be structured from the beginning so that growing the code base, while not without its growing pains, is much easier to scale and repeat across environments. There are many strategies a team can employ when building modules, and this solution is just one approach in a universe of possibilities.
This article originally appeared on the Slalom technology blog.