Context

The eigenoid IaC model (ADR-0008) requires each stack to have a GCS Terraform state bucket per environment. Before this decision, buckets were created manually by an operator, who:

  1. Followed the bootstrap runbook.
  2. Ran gcloud storage buckets create for each environment (dev, qa, prd).
  3. Configured versioning and public access prevention.
  4. Registered the bucket data in the producer configuration.

This manual process was error-prone (incorrect naming, forgotten versioning, missed environments) and created a barrier to entry for new IaC stacks. Additionally, there was no centralized registry of which buckets existed for which stacks.

Meanwhile, repository governance (ADR-0007) already handles the repo lifecycle via issue templates and workflows. IaC repo creation goes through the same flow (issue → approval → safe-settings creates the repo), but state buckets were created separately, outside that flow.

Decision

We integrate state bucket creation into the repository governance workflow (approve-repo.yml). When the creation of a repo with category iac is approved, the workflow automatically creates 3 buckets (one per environment) before pushing the config YAML.

Flow

```
Issue "Request new repository"
  → category: iac
  → label: approve-repo
  → approve-repo.yml:
      1. Generate config YAML
      2. Auth GCP (dev) → create dev bucket
      3. Auth GCP (qa) → create qa bucket
      4. Auth GCP (prd) → create prd bucket
      5. Update state-buckets.json (registry)
      6. Push config + registry to main
      7. Comment on issue with table of created buckets
  → safe-settings creates the repo
  → Issue closes automatically
```

Bucket details

```bash
gcloud storage buckets create gs://eigenoid-2cea55-{STACK}-tfstate-{ENV} \
  --project=eigenoid-{ENV} \
  --location=europe-west1 \
  --uniform-bucket-level-access \
  --pap

gcloud storage buckets update gs://eigenoid-2cea55-{STACK}-tfstate-{ENV} \
  --versioning
```

Where:

  • {STACK} is derived from the repo name (without the iac- prefix): e.g., repo iac-distribution → stack distribution
  • {ENV} is one of dev, qa, or prd
  • --pap enables public access prevention
  • --versioning enables object versioning, so an overwritten or corrupted state file can be restored from an earlier version
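The naming rule above can be sketched as two small shell helpers. This is a minimal illustration, not code taken from approve-repo.yml: the function names stack_from_repo and bucket_name are hypothetical, and the input validation is an added assumption.

```bash
# Hypothetical helpers; the names are illustrative, not from approve-repo.yml.

# Derive the stack name by stripping the leading "iac-" prefix from the repo name.
stack_from_repo() {
  case "$1" in
    iac-?*) printf '%s\n' "${1#iac-}" ;;
    *) echo "error: '$1' does not start with iac-" >&2; return 1 ;;
  esac
}

# Build the state bucket name for a stack ($1) and environment ($2).
bucket_name() {
  case "$2" in
    dev|qa|prd) printf 'eigenoid-2cea55-%s-tfstate-%s\n' "$1" "$2" ;;
    *) echo "error: unknown environment '$2'" >&2; return 1 ;;
  esac
}
```

For example, bucket_name "$(stack_from_repo iac-distribution)" prd prints eigenoid-2cea55-distribution-tfstate-prd.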

Authentication

The workflow uses Workload Identity Federation (WIF) with dedicated service accounts (platform-bootstrap@eigenoid-{env}.iam.gserviceaccount.com), not the Terraform CI SAs. This allows the governance workflows in platform-settings to create buckets without being able to run Terraform plan or apply.

Registry

The .github/state-buckets.json file in platform-settings maintains a registry of all created buckets:

```json
{
  "iac-foundation": {
    "stack": "foundation",
    "buckets": {
      "dev": "eigenoid-2cea55-foundation-tfstate-dev",
      "qa": "eigenoid-2cea55-foundation-tfstate-qa",
      "prd": "eigenoid-2cea55-foundation-tfstate-prd"
    },
    "created_at": "2026-04-25T...",
    "created_by": "issue#64"
  }
}
```
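As a rough sketch of how the "Update state-buckets.json" step could write such an entry, the helper below uses jq. It rests on several assumptions: that jq is available on the runner, that the helper name register_buckets and its arguments are hypothetical, and that created_at is a UTC timestamp in the format shown. The actual workflow may build the entry differently.

```bash
# Hypothetical helper: adds or replaces the registry entry for one repo.
# Usage: register_buckets REGISTRY_FILE REPO STACK ISSUE_REF
register_buckets() {
  registry=$1 repo=$2 stack=$3 issue=$4
  tmp=$(mktemp) || return 1
  # Build the dev/qa/prd bucket map from the naming convention, then
  # write the result to a temp file and swap it in only if jq succeeded.
  jq --arg repo "$repo" --arg stack "$stack" --arg by "$issue" \
     --arg at "$(date -u +%Y-%m-%dT%H:%M:%SZ)" '
    .[$repo] = {
      stack: $stack,
      buckets: (["dev", "qa", "prd"]
                | map({key: ., value: "eigenoid-2cea55-\($stack)-tfstate-\(.)"})
                | from_entries),
      created_at: $at,
      created_by: $by
    }' "$registry" > "$tmp" && mv "$tmp" "$registry"
}
```

Because the entry is assigned by key, running the helper again for the same repo overwrites the entry rather than duplicating it, which fits the retry behaviour of the workflow.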

Cleanup on deletion

When an IaC repo is deleted:

  1. Phase 1 (archival): the notify-lifecycle-approvers.yml workflow detects whether the repo has buckets in the registry and lists them in the issue warning.
  2. Phase 2 (permanent deletion, 30 days later): the delete-archived-repos.yml workflow deletes the GCS buckets and the registry entry.
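The two phases could look roughly like the sketch below. It assumes jq is available and that the registry has the shape shown in the Registry section. The function names are hypothetical, and the real workflows may delete buckets differently. The sketch relies on gcloud storage rm --recursive, which removes a bucket together with its contents.

```bash
# Phase 1 (hypothetical helper): print one gs:// URL per registered bucket.
# Prints nothing if the repo has no registry entry.
# Usage: buckets_for_repo REGISTRY_FILE REPO
buckets_for_repo() {
  jq -r --arg repo "$2" '(.[$repo].buckets // {})[] | "gs://\(.)"' "$1"
}

# Phase 2 (hypothetical helper): delete every registered bucket, then drop
# the registry entry. Stops at the first failed deletion so the entry is
# kept while buckets may still exist.
# Usage: delete_repo_buckets REGISTRY_FILE REPO
delete_repo_buckets() {
  for url in $(buckets_for_repo "$1" "$2"); do
    gcloud storage rm --recursive "$url" || return 1
  done
  tmp=$(mktemp) || return 1
  jq --arg repo "$2" 'del(.[$repo])' "$1" > "$tmp" && mv "$tmp" "$1"
}
```

Keeping the listing separate from the deletion lets the archival warning and the permanent deletion read the same source of truth.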

Pre-configured template

The IaC repo template (iac-template) comes pre-configured with:

  • terraflow.yaml with all 3 environments and correct project IDs.
  • Per-environment backend configs that reference the bucket naming convention.
  • No manual bootstrap needed.

Consequences

  • Zero manual bootstrap: a new IaC stack is CI-ready from the moment the repo is created. No manual steps between "repo created" and "first PR with plan".
  • Consistent naming: buckets follow the eigenoid-2cea55-{stack}-tfstate-{env} convention without exception. No ad-hoc bucket names.
  • Auditable registry: state-buckets.json documents which buckets exist, for which stack, and when they were created.
  • Automatic cleanup: buckets are deleted when the archived repo is permanently deleted, leaving no orphaned resources.
  • Coupling with governance: bucket creation depends on the repo workflow. If the workflow fails, buckets are not created. Mitigated by: retry (remove and re-add the approve-repo label) and manual fallback via gcloud.
  • 3 auth steps per creation: the workflow authenticates 3 times (once per GCP project). Adds ~30 seconds to the approval flow.
  • Idempotency: if the workflow is re-run (retry), already-existing buckets are not recreated — gcloud storage buckets create fails with "already exists" and the step continues.
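One way to make the retry path explicit is sketched below. It is a variation on the behaviour described above, not the workflow's actual step: instead of tolerating a failed create, it checks for the bucket with gcloud storage buckets describe first, and it always reapplies versioning so that a retry also repairs a bucket that was created but never configured. The function name ensure_state_bucket is hypothetical.

```bash
# Hypothetical helper: create the state bucket only if it is missing, then
# (re)apply versioning so a retry also repairs a half-configured bucket.
# Usage: ensure_state_bucket STACK ENV
ensure_state_bucket() {
  bucket="gs://eigenoid-2cea55-$1-tfstate-$2"
  if gcloud storage buckets describe "$bucket" >/dev/null 2>&1; then
    echo "exists, skipping create: $bucket"
  else
    # describe also fails on permission errors; in that case create fails
    # too and the function returns non-zero instead of continuing.
    gcloud storage buckets create "$bucket" \
      --project="eigenoid-$2" \
      --location=europe-west1 \
      --uniform-bucket-level-access \
      --pap || return 1
  fi
  gcloud storage buckets update "$bucket" --versioning
}
```

Calling ensure_state_bucket foundation dev twice in a row issues a create only on the first call. The second call finds the bucket and just reapplies versioning.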

Alternatives considered

  • Manual bootstrap (status quo): works but does not scale. Each new stack requires an operator to follow a ~15-step runbook. Prone to naming and configuration errors.
  • Terraform for the buckets (chicken-and-egg): Terraform needs a bucket to store its state, but cannot create that bucket without prior state. A local apply solves this but requires user credentials and is manual.
  • Script in the template repo: a Makefile or bootstrap script in the template. Requires the developer to run it locally. Shifts the responsibility to the human.
  • Single bucket with per-stack prefixes: one large bucket with prefixes (foundation/dev/, platform/prd/). Works but:
    • Per-stack IAM is harder to apply: GCS IAM is granted per bucket by default, and per-prefix restrictions need extra machinery such as IAM Conditions or managed folders.
    • Deleting one stack's state risks deleting another's.
    • A permissions error affects all stacks.

References