Design Modular Cloud Infrastructure with Terraform

This post from Ian Ferguson, Associate Engineering Manager at The Movement Cooperative (TMC), was originally published on Medium and is reposted here with permission. We’ve lightly edited the language for clarity and consistency with our blog.

Across the movement, teams are building and managing their own systems alongside shared tools. At TMC, a core part of our work is supporting organizations as they connect their data and systems—so teams spend less time dealing with fragmented tools and more time focusing on their programs. The Terraform-based approach outlined below shows how teams can create flexible, scalable foundations that support that work over time.


Imagine you are the sole infrastructure engineer on a team with multiple stakeholders. Each of these stakeholders may have (see: definitely have) similar but discrete requirements—separate storage buckets, service accounts, varying degrees of API and IAM access, etc.

Pointing and clicking in the UI is a recipe for failure in this scenario—we’re going to quickly lose track of which pieces of infrastructure have been provisioned and which ones haven’t.

Enter: Terraform.

About Terraform

HashiCorp Terraform is the premier Infrastructure-as-Code (IaC) tool—meaning it allows you to define and manage your cloud infrastructure in code instead of configuring it manually. It’s open-source, widely used, and well-documented.

You can think of Terraform as a wrapper for all the API calls you might otherwise write. Instead of clicking “Create Instance” ad infinitum in the UI or running aws or gcloud commands in the terminal, you can simply run terraform apply, review your changes, and let Terraform do its work.
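Before terraform apply can make any of those API calls, Terraform needs to know which cloud provider it is talking to. A minimal Google provider configuration might look like the sketch below (the project ID and region are placeholder values, not from the example repo):

```hcl
// Minimal provider setup (hypothetical project/region values)
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
  }
}

provider "google" {
  project = "my-gcp-project" // placeholder
  region  = "us-east1"       // placeholder
}
```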

Example Project

Let’s use Terraform to manage infrastructure for multiple fictitious clients. By the end of this article, we’ll provision the following:

  • A production BigQuery dataset
  • A scratch BigQuery dataset (with default lifecycle rules)
  • Optionally, a Cloud Storage bucket (if the client requests it)
  • A Google service account with access to the client’s resources (and ONLY their resources)

You can follow along with the code in this repository: https://github.com/IanRFerguson/modular-infrastructure

Designing a Terraform Module

Typically, a Terraform module will include the following:

  • main.tf — This is the main interface that will create our infrastructure. We’ll define all of our resources here, and they’ll be created based on the inputs in variables.tf.
  • variables.tf — Think of these like Python function arguments. We’ll supply them to our module, and the relevant infrastructure defined in main.tf will be created accordingly.
  • output.tf — If we wanted our module instance to be accessible to other components of our infrastructure, we would define those attributes here. This isn’t included in this example project, but HashiCorp’s docs on this topic are excellent.

Because these three files are bundled together in a subdirectory—a module—they are accessible to one another (i.e., main.tf can access the inputs defined in variables.tf, etc.).
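As a hypothetical illustration (the example repo doesn’t include an output.tf), an output that exposes the client’s service account email to the rest of the configuration might look like:

```hcl
// In ./modules/client/output.tf (not part of the example repo)
output "service_account_email" {
  description = "Email of the client-specific service account"
  value       = google_service_account.service_account.email
}
```

A root-level configuration could then reference it as module.client_bar.service_account_email.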

Our example project is structured such that ./main.tf instantiates client modules, which are defined in our ./modules/client/main.tf file.

Ian @ infra $ tree .
.
├── main.tf
└── modules
    └── client
        ├── main.tf
        └── variables.tf

2 directories, 3 files

// In ./main.tf
module "client_bar" {
  source           = "./modules/client"
  client_name      = "bar"
  provision_bucket = true
}

The variables used here—client_name and provision_bucket—are defined in ./modules/client/variables.tf like so:

variable "client_name" {
  type = string
}

variable "provision_bucket" {
  type    = bool
  default = false
}
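The module’s main.tf also references a few variables that aren’t shown above. Declared in the same file, they might look something like this (the defaults are illustrative, not copied from the repo):

```hcl
variable "bigquery-region" {
  type    = string
  default = "us-east1" // illustrative default
}

variable "gcs-region" {
  type    = string
  default = "us-east1" // illustrative default
}

variable "default_table_expiration" {
  type    = number
  default = 604800000 // 7 days in milliseconds; illustrative
}
```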

These variables are fed into the main.tf file in the module, where they’ll be used to uniquely name and permission the resources we’ll need for our hypothetical clients:

// Create client-specific service account
resource "google_service_account" "service_account" {
  account_id   = "sa-${var.client_name}"
  display_name = "Client service account for ${var.client_name}"
}

// BigQuery datasets
resource "google_bigquery_dataset" "prod" {
  dataset_id    = "${var.client_name}__prod"
  friendly_name = "${var.client_name}__prod"
  description   = "Production BigQuery dataset for ${var.client_name}"
  location      = var.bigquery-region
}

resource "google_bigquery_dataset" "scratch" {
  dataset_id                  = "${var.client_name}__scratch"
  friendly_name               = "${var.client_name}__scratch"
  description                 = "Scratch BigQuery dataset for ${var.client_name}"
  location                    = var.bigquery-region
  default_table_expiration_ms = var.default_table_expiration
}

// BigQuery IAM
resource "google_bigquery_dataset_iam_member" "prod-editor" {
  dataset_id = google_bigquery_dataset.prod.dataset_id
  role       = "roles/bigquery.dataEditor"
  member     = "serviceAccount:${google_service_account.service_account.email}"
}

resource "google_bigquery_dataset_iam_member" "scratch-editor" {
  dataset_id = google_bigquery_dataset.scratch.dataset_id
  role       = "roles/bigquery.dataEditor"
  member     = "serviceAccount:${google_service_account.service_account.email}"
}

// Cloud storage
resource "google_storage_bucket" "default" {
  count    = var.provision_bucket ? 1 : 0
  name     = "bkt-${var.client_name}"
  location = var.gcs-region
}

resource "google_storage_bucket_iam_member" "member" {
  count  = var.provision_bucket ? 1 : 0
  bucket = google_storage_bucket.default[count.index].name
  role   = "roles/storage.admin"
  member = "serviceAccount:${google_service_account.service_account.email}"
}

Leveraging Modules

As a refresher: in our hypothetical scenario, we set out to provision BigQuery and Cloud Storage resources for multiple clients in the same project.

We’ve defined two client modules in this project, like so:

module "client_foo" {
  source      = "./modules/client"
  client_name = "foo"
}

module "client_bar" {
  source           = "./modules/client"
  client_name      = "bar"
  provision_bucket = true
}

After running terraform apply, each client will have:

  • A production dataset
  • A scratch dataset with lifecycle rules
  • Data editor access for the client-specific service account
  • A bucket provisioned for the bar client (but NOT the foo client)

Extending These Ideas

This is a fairly high-level overview of modular Terraform, but it hopefully illustrates what’s possible. In addition to BigQuery and Cloud Storage, you could use Terraform to:

  • Create and manage Postgres users in a CloudSQL instance
  • Manage a VPC with custom ingress rules
  • Set custom IAM roles for different groups
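To take one of those ideas a step further, a VPC ingress rule could be added to the client module in the same style. The sketch below is hypothetical (the network name and source range are made up, and the rule isn’t part of the example repo):

```hcl
// Hypothetical sketch: allow HTTPS ingress for a client's workloads
resource "google_compute_firewall" "allow_https" {
  name    = "allow-https-${var.client_name}"
  network = "default" // made-up network; substitute your client's VPC

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }

  direction     = "INGRESS"
  source_ranges = ["0.0.0.0/0"] // made-up range; restrict in practice
}
```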

The goal of this article is to get the wheels turning as you start thinking about your next infrastructure project. Thanks so much for reading!

About the author


Ian Ferguson

Associate Engineering Manager, The Movement Cooperative

Ian has been with TMC since 2023 and currently serves as the Associate Engineering Manager on the Data Engineering team. When he’s not building pipelines for the Cooperative, Ian can be found roaming the streets of Brooklyn with a camera in hand or cheering for the Knicks.
