
Context

The eigenoid org needs to manage GCP infrastructure in a reproducible, versioned, and auditable way. Resources include: Workload Identity Federation (pools, providers, service accounts), GCP APIs, Artifact Registry, Cloud Run, Secret Manager, and potentially networking and compute in the future.

Key requirements:

  • Multiple stacks: the infrastructure is not monolithic. There are stacks with different lifecycles (foundation, platform, distribution) that must be independent but share the same pipeline.
  • Multiple environments: each stack is deployed across 3 GCP projects (dev, qa, prd) with the same code but different values.
  • Automated CI/CD: changes must go through PR → automatic plan → controlled apply, with no manual Terraform runs outside the pipeline.
  • Governance: changes to critical infrastructure (prd) must require explicit approval.
  • Reuse: CI/CD logic (auth, init, plan, apply, promote) must not be duplicated in every infrastructure repo.

Decision

We adopt Terraform as the IaC tool for all GCP infrastructure, with a producer/consumer model that centralizes CI/CD logic in a single repo and leaves each infrastructure repo responsible only for resource definitions.

Architecture

| Component | Repo | Role |
| --- | --- | --- |
| Producer | eigenoid/platform-actions | Reusable workflows, centralized environment config |
| Consumer | eigenoid/iac-* (e.g., iac-foundation, iac-platform) | Terraform resource definitions |
| Spec | terraflow.yaml | Declarative contract between consumer and producer |
| Bot | eigenoid-terraflow-bot | GitHub App for cross-repo tokens and PR comments |

The terraflow.yaml contract

Each consumer declares its stack in a terraflow.yaml file at the repo root:

stack:
  name: foundation
  owner: platform
  tier: foundation
  cloud: gcp

terraform:
  min_version: "1.14.9"

environments:
  - name: dev
    is_default: true
    auto_deploy: true
    project: eigenoid-dev
  - name: qa
    approval_required: true
    depends_on: [dev]
    project: eigenoid-qa
  - name: prd
    approval_required: true
    depends_on: [qa]
    project: eigenoid-prd

layers:
  - name: platform-wif
    path: platform-wif
  - name: service-wif
    path: service-wif
    depends_on: [platform-wif]

The producer parses this spec and orchestrates the plan/apply of each layer, respecting dependencies (waves) and environments.
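
As an illustration of how depends_on could be turned into waves, here is a minimal sketch. It is not the producer's actual implementation: the function name compute_waves and the plain-dict input are assumptions, and the data is copied from the example terraflow.yaml. Items in the same wave have no dependencies on each other, so they could be planned or applied together.

```python
from typing import Dict, List


def compute_waves(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group items into waves; each wave depends only on earlier waves."""
    remaining = {name: set(d) for name, d in deps.items()}
    unknown = {d for ds in remaining.values() for d in ds} - set(remaining)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    waves: List[List[str]] = []
    done: set = set()
    while remaining:
        # An item is ready once all of its dependencies are already done.
        ready = sorted(n for n, ds in remaining.items() if ds <= done)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


# Data taken from the example spec (name -> depends_on).
layers = {"platform-wif": [], "service-wif": ["platform-wif"]}
environments = {"dev": [], "qa": ["dev"], "prd": ["qa"]}

layer_waves = compute_waves(layers)
env_waves = compute_waves(environments)
print(layer_waves)
print(env_waves)
```

The same function covers both depends_on lists in the spec: for layers it gives the apply order within one environment, and for environments it gives the promotion chain dev, qa, prd.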

Workflow

Developer opens PR on consumer
→ terraform.yml invokes producer@v1
→ Orchestrator parses terraflow.yaml
→ Automatic plan on dev (layer by layer, wave by wave)
→ Bot comments result on PR
→ Reviewer comments "/terraflow apply"
→ Apply on dev
→ Merge → manual promote to qa → manual promote to prd
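
The "/terraflow apply" step implies that something reads PR comments and decides whether they are a command. The sketch below shows one way that check could look. The function name parse_command, the regular expression, and the rule that the command must sit alone on its own line are assumptions for illustration; only the apply action is named in this ADR, so it is the only one accepted here.

```python
import re
from typing import Optional

# Assumed grammar: the command must be the only content on its line.
COMMAND_RE = re.compile(r"^/terraflow\s+(?P<action>[a-z]+)\s*$")

# Only "apply" appears in this ADR; other actions would be hypothetical.
ALLOWED_ACTIONS = {"apply"}


def parse_command(comment_body: str) -> Optional[str]:
    """Return the requested action, or None if the comment is not a command."""
    for line in comment_body.splitlines():
        match = COMMAND_RE.match(line.strip())
        if match and match.group("action") in ALLOWED_ACTIONS:
            return match.group("action")
    return None


print(parse_command("Looks good.\n/terraflow apply"))
print(parse_command("please run /terraflow apply"))
```

Requiring the command on its own line avoids triggering an apply when someone merely mentions the command in a sentence.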

Authentication

  • GitHub to GCP: Workload Identity Federation (ephemeral OIDC tokens, no stored secrets)
  • GitHub to GitHub: GitHub App (eigenoid-terraflow-bot) for cross-repo access

Centralized config

Environment configuration (project IDs, WIF providers, SA emails, bucket prefixes) lives in the producer (platform-actions/config/environments.yaml), not in each consumer. This guarantees consistency and a single point of update.
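
To make the lookup concrete, here is a minimal sketch of resolving an environment against a centralized table. The field names, the resolve function, and every value except the three project IDs are placeholders invented for illustration; the real schema of environments.yaml is not shown in this ADR. The consistency check between the consumer's declared project and the central entry is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EnvConfig:
    # Hypothetical field names mirroring the items listed in the ADR.
    project_id: str
    wif_provider: str
    service_account: str
    bucket_prefix: str


def _placeholder_env(name: str) -> EnvConfig:
    # Placeholder values only; none of these identifiers are real.
    project = f"eigenoid-{name}"
    return EnvConfig(
        project_id=project,
        wif_provider=(
            "projects/000000000000/locations/global/"
            "workloadIdentityPools/example-pool/providers/example-provider"
        ),
        service_account=f"example-sa@{project}.iam.gserviceaccount.com",
        bucket_prefix=f"example-tfstate-{name}",
    )


CENTRAL_CONFIG: Dict[str, EnvConfig] = {
    name: _placeholder_env(name) for name in ("dev", "qa", "prd")
}


def resolve(env_name: str, declared_project: str) -> EnvConfig:
    """Look up an environment and check it against the consumer's spec."""
    if env_name not in CENTRAL_CONFIG:
        raise ValueError(f"unknown environment: {env_name!r}")
    config = CENTRAL_CONFIG[env_name]
    if config.project_id != declared_project:
        raise ValueError(
            f"{env_name}: spec declares {declared_project!r}, "
            f"central config has {config.project_id!r}"
        )
    return config


dev = resolve("dev", "eigenoid-dev")
print(dev.project_id, dev.bucket_prefix)
```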

Producer versioning

  • Each merge to main generates a release via release-please (vX.Y.Z).
  • The floating v1 tag is automatically moved to the latest v1.x.y release.
  • Consumers invoke @v1 and receive fixes without changing anything.
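
The tag-moving step can be pictured as "pick the highest vX.Y.Z within major version 1". The sketch below shows that selection only; it assumes the tag stays within its major version, and the function name latest_in_major and the sample tag list are made up for illustration. It does not show how the producer actually moves the tag.

```python
import re
from typing import Iterable, Optional, Tuple

SEMVER_TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def latest_in_major(tags: Iterable[str], major: int) -> Optional[str]:
    """Return the highest vX.Y.Z tag whose major version equals `major`."""
    best: Optional[Tuple[Tuple[int, ...], str]] = None
    for tag in tags:
        match = SEMVER_TAG_RE.match(tag)
        if not match:
            continue  # skips floating tags such as "v1"
        version = tuple(int(part) for part in match.groups())
        if version[0] != major:
            continue
        # Compare numerically so that 1.10.1 ranks above 1.2.0.
        if best is None or version > best[0]:
            best = (version, tag)
    return best[1] if best else None


# Hypothetical tag list.
sample_tags = ["v1", "v1.0.0", "v1.2.0", "v1.10.1", "v2.0.0"]
target = latest_in_major(sample_tags, 1)
print(target)
```

Comparing versions as integer tuples rather than strings matters here: a string comparison would rank v1.2.0 above v1.10.1.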

Consequences

  • One pipeline for all stacks: any iac-* repo with a terraflow.yaml and the consumer workflow works automatically. No per-repo CI configuration required.
  • Clear separation of concerns: the producer evolves the pipeline without touching consumers. Consumers define only resources.
  • Isolated state per stack and environment: each stack+env combination has its own GCS bucket. No risk of cross-state corruption.
  • Per-environment governance: approval_required: true activates GitHub Environment protection rules. Promotions to qa/prd require explicit approval.
  • Scope limited to GCP: this ADR covers only GCP infrastructure. GitHub configuration (repos, labels, rulesets) is managed with safe-settings (ADR-0006).
  • Dependency on the producer: if platform-actions is broken, no consumer can plan/apply. Mitigated by: semantic versioning, CI tests on the producer, and the ability to temporarily pin to a specific SHA.
  • Learning curve: the producer/consumer model and terraflow.yaml require familiarization. Mitigated by: documentation, template repo (iac-template), and the fact that consumers are standard Terraform.
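
The state isolation mentioned above can be illustrated with a small sketch that derives one bucket name per stack and environment and checks that no two combinations collide. The naming scheme prefix-stack-env, the prefix eigenoid-tfstate, and the function state_bucket are hypothetical; the ADR does not state the real convention. The validation covers only a subset of the GCS bucket naming rules.

```python
import re
from itertools import product

# Subset of GCS bucket naming rules: 3 to 63 characters, lowercase letters,
# digits, dashes and underscores, starting and ending with a letter or digit.
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$")


def state_bucket(prefix: str, stack: str, env: str) -> str:
    """Build a per-stack, per-environment bucket name (hypothetical scheme)."""
    name = f"{prefix}-{stack}-{env}"
    if not BUCKET_NAME_RE.match(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    return name


stacks = ["foundation", "platform", "distribution"]
envs = ["dev", "qa", "prd"]

buckets = {
    (stack, env): state_bucket("eigenoid-tfstate", stack, env)
    for stack, env in product(stacks, envs)
}

# Every stack+env pair maps to a distinct bucket, so states never share storage.
all_distinct = len(set(buckets.values())) == len(buckets)
print(len(buckets), all_distinct)
```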

Alternatives considered

  • Terraform without centralization (each repo with its own workflow): simple but duplicates auth, init, plan, apply logic in every repo. Pipeline changes require PRs in N repos. Does not scale.
  • Terragrunt: solves DRY for configuration, but adds an abstraction layer over Terraform with its own learning curve and limitations. The producer/consumer model with reusable workflows is lighter and GitHub-native.
  • Pulumi / CDK for Terraform: more expressive for complex logic, but eigenoid's infrastructure is declarative and does not need imperative logic. Terraform is the de facto standard with the best provider support and community.
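  • OpenTofu: MPL-licensed fork of Terraform. Viable, but it offers no practical advantage today: the provider ecosystem is effectively the same, and HashiCorp Terraform's stability is proven. If the BSL license becomes a problem, migrating to OpenTofu should be straightforward, since consumers are standard Terraform.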

References