Infrastructure as Code with Terraform: From Zero to Production
Infrastructure as Code is not a trend. It is a fundamental shift in how production infrastructure is managed, and Terraform has emerged as the dominant tool for multi-cloud IaC. But there is a long road between writing your first `aws_instance` resource block and running Terraform safely in production with a team of engineers.
This guide takes you through that journey. We will skip the "what is Terraform" preamble and focus on the patterns, practices, and hard-won lessons that separate toy Terraform from production Terraform. Our team at Zindagi Technologies uses Terraform daily to manage infrastructure across AWS, Azure, OpenStack, and VMware for clients in government, defense, and enterprise. Every recommendation here comes from production experience.
First Principles: What Terraform Actually Does
Terraform maintains a model of your desired infrastructure state in code (`.tf` files) and a record of the current real-world state (the state file). When you run `terraform apply`, it calculates the difference between desired and actual state and makes API calls to reconcile them.
This model has three critical implications:
- The state file is sacred. It is the source of truth for what Terraform manages. Lose it or corrupt it, and Terraform loses track of your infrastructure.
- Terraform is declarative, not imperative. You describe what you want, not the steps to get there. Terraform figures out the order of operations through its dependency graph.
- Plan before apply. Always run `terraform plan` and review the output before applying. A careless apply can destroy production resources in seconds.
Project Structure That Scales
How you organize Terraform code determines whether it remains maintainable as your infrastructure grows.
Environment Separation
Never manage dev, staging, and production from the same Terraform state. Use separate directories or workspaces per environment:
```
infrastructure/
  modules/
    vpc/
    compute/
    database/
    security-groups/
  environments/
    dev/
      main.tf
      variables.tf
      terraform.tfvars
      backend.tf
    staging/
      main.tf
      variables.tf
      terraform.tfvars
      backend.tf
    production/
      main.tf
      variables.tf
      terraform.tfvars
      backend.tf
```
Each environment directory has its own state file and variable values. Modules contain reusable, parameterized infrastructure definitions.
Module Design
Good modules are the backbone of scalable Terraform. Follow these principles:
- Single responsibility: Each module should manage one logical component (a VPC, a database cluster, a Kubernetes node pool). Not everything in one mega-module.
- Input validation: Use `validation` blocks on variables to catch configuration errors before apply.
- Sensible defaults: Provide default values for optional variables. A module should work with minimal configuration for common use cases.
- Output everything useful: Expose IDs, ARNs, endpoints, and other attributes that downstream modules might need.
- Version your modules: Use Git tags for module versions. Pin module versions in your environment configurations.
```hcl
module "vpc" {
  source = "git::https://github.com/your-org/terraform-modules.git//vpc?ref=v2.1.0"

  environment      = var.environment
  cidr_block       = var.vpc_cidr
  az_count         = var.availability_zone_count
  enable_nat       = true
  enable_flow_logs = true
}
```
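To make "output everything useful" concrete, a module like this would expose its key attributes in an `outputs.tf`. The sketch below is illustrative; the internal resource names (`aws_vpc.main`, `aws_subnet.private`) are assumptions about how the module might be written:

```hcl
# modules/vpc/outputs.tf -- illustrative sketch; resource names are assumed
output "vpc_id" {
  description = "ID of the managed VPC"
  value       = aws_vpc.main.id
}

output "private_subnet_ids" {
  description = "IDs of the private subnets, for downstream compute modules"
  value       = aws_subnet.private[*].id
}
```

Downstream configurations then consume these as `module.vpc.vpc_id` and `module.vpc.private_subnet_ids` instead of hardcoding IDs.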
State Management: The Make-or-Break Decision
Local state files work for learning. They do not work for teams or production.
Remote State with Locking
Use a remote backend with state locking. For AWS, the standard pattern is S3 for storage and DynamoDB for locking:
```hcl
terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "production/infrastructure.tfstate"
    region         = "ap-south-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
```
For Azure, use Azure Storage with blob leasing for locking. For OpenStack or on-premises environments, consider Terraform Cloud, GitLab-managed state, or a self-hosted backend like Consul.
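For Azure, the equivalent backend block looks like the sketch below; the `azurerm` backend handles locking automatically via blob leases. The resource group, storage account, and container names are placeholders:

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"      # placeholder names --
    storage_account_name = "myorgtfstate"    # substitute your own
    container_name       = "tfstate"
    key                  = "production.terraform.tfstate"
  }
}
```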
State File Security
The state file contains sensitive data -- resource IDs, IP addresses, and sometimes passwords or API keys in plain text. Protect it:
- Enable server-side encryption on your state storage
- Restrict access to the state bucket/container to authorized CI/CD pipelines and senior engineers
- Never commit state files to Git
- Use `sensitive = true` on variables and outputs that contain secrets
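As a minimal sketch, marking a variable and output as sensitive redacts them from plan and CLI output. Note that the values are still stored in plain text inside the state file, which is why backend encryption matters:

```hcl
variable "db_password" {
  type      = string
  sensitive = true
}

output "db_password" {
  value     = var.db_password
  sensitive = true # redacted in CLI output, but still present in state
}
```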
State File Hygiene
Over time, state files accumulate drift and orphaned resources. Maintain them:
- Run `terraform plan` regularly (even without applying) to detect drift
- Use `terraform state rm` to remove resources that are now managed outside Terraform
- Use `terraform import` to bring existing resources under Terraform management
- Never edit state files manually unless you are fixing a specific known issue and have a backup
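On Terraform 1.5 and later, imports can also be expressed declaratively with an `import` block, which makes the operation reviewable in a pull request rather than a one-off CLI mutation of state. The bucket name below is hypothetical:

```hcl
# Declarative import (Terraform >= 1.5); the bucket name is hypothetical
import {
  to = aws_s3_bucket.assets
  id = "myorg-assets-bucket"
}

resource "aws_s3_bucket" "assets" {
  bucket = "myorg-assets-bucket"
}
```

A subsequent `terraform plan` then shows the import as part of the plan output, and the apply records the resource in state.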
Writing Production-Grade Configuration
Use Variables and Locals Deliberately
Variables are for values that change between environments. Locals are for computed values and repeated expressions:
```hcl
variable "environment" {
  type        = string
  description = "Deployment environment (dev, staging, production)"

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "Environment must be dev, staging, or production."
  }
}

locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Project     = var.project_name
  }

  is_production = var.environment == "production"
}
```
Resource Naming and Tagging
Consistent naming and tagging are essential for cost tracking, access control, and operational clarity. Define a naming convention and enforce it through locals:
```hcl
locals {
  name_prefix = "${var.project_name}-${var.environment}"
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = local.is_production ? "m5.xlarge" : "t3.medium"

  tags = merge(local.common_tags, {
    Name = "${local.name_prefix}-web"
    Role = "web-server"
  })
}
```
Data Sources Over Hardcoding
Never hardcode AMI IDs, VPC IDs, or subnet IDs. Use data sources to look them up:
```hcl
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}
```
This ensures your configuration works across accounts and regions without modification.
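The looked-up AMI is then referenced by attribute rather than by a hardcoded ID, for example:

```hcl
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id # resolved at plan time
  instance_type = "t3.medium"
}
```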
CI/CD Integration
Running Terraform manually is a liability in production. Integrate it into your CI/CD pipeline.
The Standard Pipeline
A production Terraform pipeline has these stages:
- Validate: `terraform validate` checks syntax; `terraform fmt -check` enforces formatting. Run on every pull request.
- Plan: `terraform plan -out=tfplan` generates an execution plan. Save the plan file -- this is what gets applied.
- Review: A human reviews the plan output. For production changes, require approval from at least two engineers.
- Apply: `terraform apply tfplan` executes the saved plan. Only run after review and approval.
- Verify: Post-apply verification -- health checks, smoke tests, monitoring validation.
GitOps Workflow
Infrastructure changes follow the same workflow as code changes:
- Engineer creates a branch and makes Terraform changes
- Pull request triggers validate and plan stages
- Plan output is posted as a comment on the PR for review
- After approval and merge, apply runs automatically against the target environment
- Deployment status is tracked and rolled back if verification fails
Protecting Production
Add guardrails for production applies:
- Require manual approval for any production plan that includes resource destruction
- Set `prevent_destroy = true` on critical resources (databases, state buckets)
- Use Sentinel or OPA policies to enforce organizational rules (no public S3 buckets, no unencrypted volumes, required tags)
- Limit who can trigger production applies to senior engineers or a dedicated platform team
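The `prevent_destroy` guardrail is a one-line lifecycle rule; any plan that would destroy the resource errors out instead of proceeding. A sketch, applied to the state bucket itself:

```hcl
resource "aws_s3_bucket" "terraform_state" {
  bucket = "myorg-terraform-state"

  lifecycle {
    prevent_destroy = true # a plan that destroys this bucket fails with an error
  }
}
```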
Advanced Patterns
Dynamic Blocks for Repetitive Configuration
When you need to generate multiple similar blocks (security group rules, DNS records), use dynamic blocks:
```hcl
resource "aws_security_group" "web" {
  name   = "${local.name_prefix}-web-sg"
  vpc_id = module.vpc.vpc_id

  dynamic "ingress" {
    for_each = var.web_ingress_rules

    content {
      from_port   = ingress.value.port
      to_port     = ingress.value.port
      protocol    = "tcp"
      cidr_blocks = ingress.value.cidr_blocks
      description = ingress.value.description
    }
  }
}
```
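A dynamic block like this assumes `var.web_ingress_rules` is a list of objects with matching attributes. A corresponding variable definition might look like the following; the default rule is illustrative:

```hcl
variable "web_ingress_rules" {
  type = list(object({
    port        = number
    cidr_blocks = list(string)
    description = string
  }))

  default = [
    {
      port        = 443
      cidr_blocks = ["0.0.0.0/0"]
      description = "HTTPS from anywhere"
    }
  ]
}
```

Typing the variable as `list(object(...))` means a malformed rule fails at plan time rather than producing a broken security group.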
Terraform Workspaces (With Caution)
Workspaces allow multiple state files to be managed from a single configuration. They work well for simple environment separation but become confusing when environments differ significantly. We recommend directory-based separation for production, and workspaces only for temporary environments (feature branches, developer sandboxes).
Handling Secrets
Never put secrets in Terraform code or variable files. Options:
- Use AWS Secrets Manager / Azure Key Vault / HashiCorp Vault and reference them via data sources
- Pass secrets via environment variables (`TF_VAR_db_password`)
- Use SOPS or similar encrypted file tools for secrets in Git
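As a sketch of the first option, an AWS Secrets Manager secret can be read at plan time via a data source. The secret path is a placeholder, and note that the value still lands in the state file, so backend encryption remains essential:

```hcl
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/rds/master-password" # placeholder path
}

resource "aws_db_instance" "main" {
  # ... engine, instance_class, and other settings omitted
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```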
Common Terraform Disasters (and How to Prevent Them)
The Accidental Destroy
An engineer runs `terraform destroy` against the wrong environment. Prevention:
- Use separate backends per environment (so the wrong state file is never loaded)
- Add `prevent_destroy` lifecycle rules to critical resources
- Require environment confirmation in CI/CD pipelines
- Back up state files before every apply
The State File Conflict
Two engineers apply simultaneously, causing state corruption. Prevention:
- Always use remote state with locking enabled
- Never run Terraform locally against production -- use CI/CD exclusively
- Configure DynamoDB/blob lease locking correctly
The Drift Disaster
Manual changes in the console create drift between state and reality. When Terraform runs next, it either reverts the manual change or fails. Prevention:
- Prohibit manual console changes to Terraform-managed resources
- Run periodic drift detection (`terraform plan` in CI on a schedule)
- When manual changes are necessary, immediately update Terraform code and state
The Module Version Conflict
Unpinned modules pull breaking changes. Prevention:
- Always pin module versions
- Test module upgrades in dev before promoting to production
- Use a module registry (Terraform Cloud, GitLab, or custom) for version management
What Terraform Cannot Do
Terraform excels at provisioning and configuring infrastructure resources. It is not designed for:
- Configuration management: Use Ansible, Chef, or cloud-init for OS-level configuration
- Application deployment: Use CI/CD tools (ArgoCD, Flux, Jenkins) for application deployment
- Secrets management: Use Vault, AWS Secrets Manager, or similar dedicated tools
- Monitoring and alerting: Terraform can create monitoring resources but should not be the monitoring system
Use Terraform for what it does best and integrate with specialized tools for everything else.
Getting Started Checklist
If you are adopting Terraform for your organization, follow this sequence:
- Set up remote state backend with locking and encryption
- Define your project structure and naming conventions
- Write your first module (start with networking -- VPC/VNET)
- Build a CI/CD pipeline with plan-review-apply stages
- Import existing infrastructure into Terraform management
- Establish team workflows (branching, review, approval)
- Document module interfaces and usage patterns
At Zindagi Technologies, our DevOps engineering team helps organizations adopt Terraform for multi-cloud infrastructure management. From initial setup through production operations, we bring the expertise to build infrastructure automation that is reliable, secure, and maintainable. Reach out to start your IaC journey.