Infrastructure as Code with Terraform: From Zero to Production
Infrastructure as Code is not a trend. It is a fundamental shift in how production infrastructure is managed, and Terraform has emerged as the dominant tool for multi-cloud IaC. But there is a long road between writing your first `aws_instance` resource block and running Terraform safely in production with a team of engineers.
This guide takes you through that journey. We will skip the "what is Terraform" preamble and focus on the patterns, practices, and hard-won lessons that separate toy Terraform from production Terraform. Our team at Zindagi Technologies uses Terraform daily to manage infrastructure across AWS, Azure, OpenStack, and VMware for clients in government, defense, and enterprise. Every recommendation here comes from production experience.
First Principles: What Terraform Actually Does
Terraform maintains a model of your desired infrastructure state in code (`.tf` files) and a record of the current real-world state (the state file). When you run `terraform apply`, it calculates the difference between desired and actual state and makes API calls to reconcile them.
This model has three critical implications:
- The state file is sacred. It is the source of truth for what Terraform manages. Lose it or corrupt it, and Terraform loses track of your infrastructure.
- Terraform is declarative, not imperative. You describe what you want, not the steps to get there. Terraform figures out the order of operations through its dependency graph.
- Plan before apply. Always run `terraform plan` and review the output before applying. A careless apply can destroy production resources in seconds.
Project Structure That Scales
How you organize Terraform code determines whether it remains maintainable as your infrastructure grows.
Environment Separation
Never manage dev, staging, and production from the same Terraform state. Use separate directories or workspaces per environment:
```
infrastructure/
  modules/
    vpc/
    compute/
    database/
    security-groups/
  environments/
    dev/
      main.tf
      variables.tf
      terraform.tfvars
      backend.tf
    staging/
      main.tf
      variables.tf
      terraform.tfvars
      backend.tf
    production/
      main.tf
      variables.tf
      terraform.tfvars
      backend.tf
```
Each environment directory has its own state file and variable values. Modules contain reusable, parameterized infrastructure definitions.
Module Design
Good modules are the backbone of scalable Terraform. Follow these principles:
- Single responsibility: Each module should manage one logical component (a VPC, a database cluster, a Kubernetes node pool). Not everything in one mega-module.
- Input validation: Use `validation` blocks on variables to catch configuration errors before apply.
- Sensible defaults: Provide default values for optional variables. A module should work with minimal configuration for common use cases.
- Output everything useful: Expose IDs, ARNs, endpoints, and other attributes that downstream modules might need.
- Version your modules: Use Git tags for module versions. Pin module versions in your environment configurations.
```hcl
module "vpc" {
  source = "git::https://github.com/your-org/terraform-modules.git//vpc?ref=v2.1.0"

  environment      = var.environment
  cidr_block       = var.vpc_cidr
  az_count         = var.availability_zone_count
  enable_nat       = true
  enable_flow_logs = true
}
```
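To make "output everything useful" concrete, a module like this would expose its key attributes in an `outputs.tf`. The sketch below is illustrative; the internal resource names (`aws_vpc.main`, `aws_subnet.private`) are assumptions about how the module might be written:

```hcl
# modules/vpc/outputs.tf -- illustrative sketch; resource names are assumed
output "vpc_id" {
  description = "ID of the managed VPC"
  value       = aws_vpc.main.id
}

output "private_subnet_ids" {
  description = "IDs of the private subnets, for downstream compute modules"
  value       = aws_subnet.private[*].id
}
```

Downstream configurations then consume these as `module.vpc.vpc_id` and `module.vpc.private_subnet_ids` instead of hardcoding IDs.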
State Management: The Make-or-Break Decision
Local state files work for learning. They do not work for teams or production.
Remote State with Locking
Use a remote backend with state locking. For AWS, the standard pattern is S3 for storage and DynamoDB for locking:
```hcl
terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "production/infrastructure.tfstate"
    region         = "ap-south-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
```
For Azure, use Azure Storage with blob leasing for locking. For OpenStack or on-premises environments, consider Terraform Cloud, GitLab-managed state, or a self-hosted backend like Consul.
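For Azure, the equivalent backend block looks like the sketch below; the `azurerm` backend handles locking automatically via blob leases. The resource group, storage account, and container names are placeholders:

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"      # placeholder names --
    storage_account_name = "myorgtfstate"    # substitute your own
    container_name       = "tfstate"
    key                  = "production.terraform.tfstate"
  }
}
```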
State File Security
The state file contains sensitive data -- resource IDs, IP addresses, and sometimes passwords or API keys in plain text. Protect it:
- Enable server-side encryption on your state storage
- Restrict access to the state bucket/container to authorized CI/CD pipelines and senior engineers
- Never commit state files to Git
- Use `sensitive = true` on variables and outputs that contain secrets
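As a minimal sketch, marking a variable and output as sensitive redacts them from plan and CLI output. Note that the values are still stored in plain text inside the state file, which is why backend encryption matters:

```hcl
variable "db_password" {
  type      = string
  sensitive = true
}

output "db_password" {
  value     = var.db_password
  sensitive = true # redacted in CLI output, but still present in state
}
```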
State File Hygiene
Over time, state files accumulate drift and orphaned resources. Maintain them:
- Run `terraform plan` regularly (even without applying) to detect drift
- Use `terraform state rm` to remove resources that are now managed outside Terraform
- Use `terraform import` to bring existing resources under Terraform management
- Never edit state files manually unless you are fixing a specific known issue and have a backup
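On Terraform 1.5 and later, imports can also be expressed declaratively with an `import` block, which makes the operation reviewable in a pull request rather than a one-off CLI mutation of state. The bucket name below is hypothetical:

```hcl
# Declarative import (Terraform >= 1.5); the bucket name is hypothetical
import {
  to = aws_s3_bucket.assets
  id = "myorg-assets-bucket"
}

resource "aws_s3_bucket" "assets" {
  bucket = "myorg-assets-bucket"
}
```

A subsequent `terraform plan` then shows the import as part of the plan output, and the apply records the resource in state.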
Writing Production-Grade Configuration
Use Variables and Locals Deliberately
Variables are for values that change between environments. Locals are for computed values and repeated expressions:
```hcl
variable "environment" {
  type        = string
  description = "Deployment environment (dev, staging, production)"

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Project     = var.project_name
  }

  is_production = var.environment == "production"
}
```
Resource Naming and Tagging
Consistent naming and tagging are essential for cost tracking, access control, and operational clarity. Define a naming convention and enforce it through locals:
```hcl
locals {
  name_prefix = "${var.project_name}-${var.environment}"
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = local.is_production ? "m5.xlarge" : "t3.medium"

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-web"
    Role = "web-server"
  })
}
```
Data Sources Over Hardcoding
Never hardcode AMI IDs, VPC IDs, or subnet IDs. Use data sources to look them up:
```hcl
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}
```
This ensures your configuration works across accounts and regions without modification.
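The looked-up AMI is then referenced by attribute rather than by a hardcoded ID, for example:

```hcl
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id # resolved at plan time
  instance_type = "t3.medium"
}
```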
CI/CD Integration
Running Terraform manually is a liability in production. Integrate it into your CI/CD pipeline.
The Standard Pipeline
A production Terraform pipeline has these stages:
- Validate: `terraform validate` checks syntax; `terraform fmt -check` enforces formatting. Run on every pull request.
- Plan: `terraform plan -out=tfplan` generates an execution plan. Save the plan file -- this is what gets applied.
- Review: A human reviews the plan output. For production changes, require approval from at least two engineers.
- Apply: `terraform apply tfplan` executes the saved plan. Only run after review and approval.
- Verify: Post-apply verification -- health checks, smoke tests, monitoring validation.
GitOps Workflow
Infrastructure changes follow the same workflow as code changes:
- Engineer creates a branch and makes Terraform changes
- Pull request triggers validate and plan stages
- Plan output is posted as a comment on the PR for review
- After approval and merge, apply runs automatically against the target environment
- Deployment status is tracked and rolled back if verification fails
Protecting Production
Add guardrails for production applies:
- Require manual approval for any production plan that includes resource destruction
- Set `prevent_destroy = true` on critical resources (databases, state buckets)
- Use Sentinel or OPA policies to enforce organizational rules (no public S3 buckets, no unencrypted volumes, required tags)
- Limit who can trigger production applies to senior engineers or a dedicated platform team
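The `prevent_destroy` guardrail is a one-line lifecycle rule; any plan that would destroy the resource errors out instead of proceeding. A sketch, applied to the state bucket itself:

```hcl
resource "aws_s3_bucket" "terraform_state" {
  bucket = "myorg-terraform-state"

  lifecycle {
    prevent_destroy = true # a plan that destroys this bucket fails with an error
  }
}
```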
Advanced Patterns
Dynamic Blocks for Repetitive Configuration
When you need to generate multiple similar blocks (security group rules, DNS records), use dynamic blocks:
```hcl
resource "aws_security_group" "web" {
  name   = "${local.name_prefix}-web-sg"
  vpc_id = module.vpc.vpc_id

  dynamic "ingress" {
    for_each = var.web_ingress_rules

    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = "tcp"
      cidr_blocks = ingress.value.cidr_blocks
      description = ingress.value.description
    }
  }
}
```
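A dynamic block like this assumes `var.web_ingress_rules` is a list of objects with matching attributes. A corresponding variable definition might look like the following; the default rule is illustrative:

```hcl
variable "web_ingress_rules" {
  type = list(object({
    port        = number
    cidr_blocks = list(string)
    description = string
  }))

  default = [
    {
      port        = 443
      cidr_blocks = ["0.0.0.0/0"]
      description = "HTTPS from anywhere"
    }
  ]
}
```

Typing the variable as `list(object(...))` means a malformed rule fails at plan time rather than producing a broken security group.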
Terraform Workspaces (With Caution)
Workspaces allow multiple state files to be managed from a single configuration. They work well for simple environment separation but become confusing when environments differ significantly. We recommend directory-based separation for production, and workspaces only for temporary environments (feature branches, developer sandboxes).
Handling Secrets
Never put secrets in Terraform code or variable files. Options:
- Use AWS Secrets Manager / Azure Key Vault / HashiCorp Vault and reference them via data sources
- Pass secrets via environment variables (`TF_VAR_db_password`)
- Use SOPS or similar encrypted file tools for secrets in Git
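As a sketch of the first option, an AWS Secrets Manager secret can be read at plan time via a data source. The secret path is a placeholder, and note that the value still lands in the state file, so backend encryption remains essential:

```hcl
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/rds/master-password" # placeholder path
}

resource "aws_db_instance" "main" {
  # ... engine, instance_class, and other settings omitted
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```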
Common Terraform Disasters (and How to Prevent Them)
The Accidental Destroy
An engineer runs `terraform destroy` against the wrong environment. Prevention:
- Use separate backends per environment (so the wrong state file is never loaded)
- Add `prevent_destroy` lifecycle rules to critical resources
- Require environment confirmation in CI/CD pipelines
- Back up state files before every apply
The State File Conflict
Two engineers apply simultaneously, causing state corruption. Prevention:
- Always use remote state with locking enabled
- Never run Terraform locally against production -- use CI/CD exclusively
- Configure DynamoDB/blob lease locking correctly
The Drift Disaster
Manual changes in the console create drift between state and reality. When Terraform runs next, it either reverts the manual change or fails. Prevention:
- Prohibit manual console changes to Terraform-managed resources
- Run periodic drift detection (`terraform plan` in CI on a schedule)
- When manual changes are necessary, immediately update Terraform code and state
The Module Version Conflict
Unpinned modules pull breaking changes. Prevention:
- Always pin module versions
- Test module upgrades in dev before promoting to production
- Use a module registry (Terraform Cloud, GitLab, or custom) for version management
What Terraform Cannot Do
Terraform excels at provisioning and configuring infrastructure resources. It is not designed for:
- Configuration management: Use Ansible, Chef, or cloud-init for OS-level configuration
- Application deployment: Use CI/CD tools (ArgoCD, Flux, Jenkins) for application deployment
- Secrets management: Use Vault, AWS Secrets Manager, or similar dedicated tools
- Monitoring and alerting: Terraform can create monitoring resources but should not be the monitoring system
Use Terraform for what it does best and integrate with specialized tools for everything else.
Getting Started Checklist
If you are adopting Terraform for your organization, follow this sequence:
- Set up remote state backend with locking and encryption
- Define your project structure and naming conventions
- Write your first module (start with networking -- VPC/VNET)
- Build a CI/CD pipeline with plan-review-apply stages
- Import existing infrastructure into Terraform management
- Establish team workflows (branching, review, approval)
- Document module interfaces and usage patterns
At Zindagi Technologies, our DevOps engineering team helps organizations adopt Terraform for multi-cloud infrastructure management. From initial setup through production operations, we bring the expertise to build infrastructure automation that is reliable, secure, and maintainable. Reach out to start your IaC journey.