
Overview

Environment promotion moves infrastructure changes through dev → qa → prd using the Terraflow CI pipeline. All iac-* repos use Terraflow: comment /terraflow promote <env> on a PR to trigger a plan, review it, then approve to apply.

Each environment must be healthy before promoting to the next. Order within an environment matters — dependencies between repos mean that applying in the wrong sequence causes permission errors and plan failures.


Pre-flight checklist

Before promoting any environment, confirm all of the following:

  • dev is healthy — services responding, no errors in logs

  • All PRs are merged to main for every repo you're about to promote

  • GitHub environment variables are configured for the target environment in each repo (Settings → Environments → <env>):

    | Variable | Example |
    | --- | --- |
    | GCP_PROJECT_ID | eigenoid-qa |
    | GCP_SA_EMAIL | terraform-ci@eigenoid-qa.iam.gserviceaccount.com |
    | GCP_WIF_PROVIDER | projects/…/providers/github |
    | TF_BACKEND_BUCKET | eigenoid-2cea55-foundation-tfstate-qa |

  • For first-time QA/PRD setup: the GCP project exists and billing is enabled
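The variable check in the list above can be scripted. This is a minimal sketch, assuming the four GitHub environment variables are exported into the shell under the same names (an assumption about how your workflow exposes them). require_vars is a hypothetical helper, not part of Terraflow.

```shell
# require_vars NAME... : succeed only if every named variable is set and non-empty.
# Hypothetical helper; the variable names come from the table above.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call (commented out so that sourcing this file has no side effects):
# require_vars GCP_PROJECT_ID GCP_SA_EMAIL GCP_WIF_PROVIDER TF_BACKEND_BUCKET
```

The helper only proves the values are present. It does not validate that they point at the right project or bucket.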


Promotion order

Follow this order exactly

Dependencies between repos cause failures if applied out of sequence. Two critical ordering rules:

  • iac-access must come after iac-foundation/platform-wif — the CI service account needs roles/iam.roleAdmin before any custom role creation.
  • iac-api-gateway must come after iac-studio/cloud-run — the gateway plan performs a live lookup of the svc-studio Cloud Run service and fails immediately if it doesn't exist yet. See gotcha #7.

Phase 1 — Foundation

No external dependencies. Must complete entirely before any other repo.

| Step | Repo | Layer | What it creates |
| --- | --- | --- | --- |
| 1.1 | iac-foundation | platform-wif | WIF pool, deploy CI SA, IAM role bindings (incl. iam.roleAdmin) |
| 1.2 | iac-foundation | network | VPC, subnets, firewall rules, NAT |
| 1.3 | iac-foundation | service-wif | Per-service WIF bindings (depends on platform-wif) |
| 1.4 | iac-foundation | governance-sa | Governance service accounts (depends on platform-wif) |

Gate: do not proceed until Phase 1 is complete

platform-wif must finish before anything else. It creates the CI service account and grants it the permissions every downstream repo needs to deploy.

Phase 2 — Studio

| Step | Repo | Layer | What it creates |
| --- | --- | --- | --- |
| 2.1 | iac-studio | project-base | Base project config |
| 2.2 | iac-studio | artifact-registry, secret-manager | AR repositories, secrets (depend on project-base) |
| 2.3 | iac-studio | cloud-sql | Database (depends on project-base) |
| 2.4 | iac-studio | cloud-run | Cloud Run service (depends on cloud-sql, artifact-registry) |
| 2.5 | iac-studio | dns | DNS records (depends on cloud-run) |
| 2.6 | iac-studio | cloudflare | Cloudflare config |

CF Worker: deploy code before cloudflare layer

The cloudflare_workers_custom_domain resource requires at least one Worker deployment to exist. If this is a new environment, deploy the app-studio-frontend-{env} Worker via wrangler deploy or the app-studio-frontend CI pipeline before applying the cloudflare layer. Skipping this step produces:

100124: Cannot attach custom domain: Worker has no deployments

Gate: do not proceed to Phase 3 until the cloud-run layer is complete

iac-api-gateway performs a live API lookup of the svc-studio Cloud Run service at plan time. If the service does not exist in the target project, the gateway plan fails immediately. See gotcha #7.

Phase 3 — API Gateway

| Step | Repo | Layer | What it creates |
| --- | --- | --- | --- |
| 3.1 | iac-api-gateway | gateway | cloudflared VM, CF tunnel, CF DNS records, CF Access apps |

Gate: do not proceed until Phase 3 is complete

Access services route through the gateway tunnel. iac-access/service will fail without a healthy gateway.

Phase 4 — Access Platform

| Step | Repo | Layer | What it creates |
| --- | --- | --- | --- |
| 4.1 | iac-access | foundation | Secret Manager secrets, Cloud SQL, IAM custom roles, service accounts |
| 4.2 | iac-access | service | Cloud Run services (admin + public), env vars, secret bindings (depends on foundation) |
| 4.3 | iac-access | public-portal | Public portal CF Worker + custom domain (depends on service) |

CF Worker custom domain: deploy code first

The cloudflare_workers_custom_domain resource requires at least one Worker deployment to exist. If this is a new environment, deploy the app-access-admin-{env} and app-access-public-{env} Workers via wrangler deploy or the app-access-public CI pipeline before applying public-portal. Skipping this step produces:

100124: Cannot attach custom domain: Worker has no deployments

Phase 5 — Distribution

| Step | Repo | Layer | What it creates |
| --- | --- | --- | --- |
| 5.1 | iac-distribution | artifact-registry | Distribution AR |
| 5.2 | iac-distribution | sample-artifact-registry | Sample AR |

Phase 6 — Platform (PRD only)

| Step | Repo | Layer | Notes |
| --- | --- | --- | --- |
| 6.1 | iac-platform | safe-settings | Only deployed to prd; no dev or qa environment |
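
The phase tables above can be encoded as a single ordered list so a script can confirm that one layer is promoted before another. This is only a sketch: promotion_order, position, and comes_before are hypothetical helper names, and the repo/layer spelling is an assumed convention built from the table columns.

```shell
# Ordered list of repo/layer pairs, copied from the phase tables (Phases 1 to 6).
promotion_order() {
  cat <<'EOF'
iac-foundation/platform-wif
iac-foundation/network
iac-foundation/service-wif
iac-foundation/governance-sa
iac-studio/project-base
iac-studio/artifact-registry
iac-studio/secret-manager
iac-studio/cloud-sql
iac-studio/cloud-run
iac-studio/dns
iac-studio/cloudflare
iac-api-gateway/gateway
iac-access/foundation
iac-access/service
iac-access/public-portal
iac-distribution/artifact-registry
iac-distribution/sample-artifact-registry
iac-platform/safe-settings
EOF
}

# position REPO/LAYER : print the 1-based position in the list, or nothing if unknown.
position() {
  promotion_order | grep -nxF -- "$1" | cut -d: -f1
}

# comes_before A B : succeed only if both are known and A is promoted before B.
comes_before() {
  a=$(position "$1")
  b=$(position "$2")
  [ -n "$a" ] && [ -n "$b" ] && [ "$a" -lt "$b" ]
}
```

For example, `comes_before iac-studio/cloud-run iac-api-gateway/gateway` succeeds, which matches the second ordering rule. Remember that iac-platform/safe-settings applies to prd only.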

Application deployments

Not handled by Terraflow

Application code (Docker images, Cloudflare Workers) is deployed through separate CI/CD pipelines — not by Terraflow. Several IaC layers have hard dependencies on app code existing before they can plan or apply successfully. This section documents those dependencies.

When app deployments are required

| Before this IaC layer | You must first deploy | How |
| --- | --- | --- |
| iac-studio/cloud-run | svc-studio Docker image (must exist in Artifact Registry) | Trigger the svc-studio CI pipeline to build and push the image |
| iac-studio/cloudflare | app-studio-frontend-{env} CF Worker | wrangler deploy or app-studio-frontend CI |
| iac-access/public-portal | app-access-admin-{env} and app-access-public-{env} CF Workers | wrangler deploy or app-access-public CI |

The iac-api-gateway/gateway dependency on svc-studio is resolved by the IaC phase reordering (Phase 2 creates the Cloud Run service before Phase 3 plans the gateway).

Greenfield interleaved order

For a brand-new environment (QA or PRD), IaC and app deployments must be interleaved. Follow this exact sequence:

| Step | Action | Type |
| --- | --- | --- |
| 1 | iac-foundation: all layers | IaC |
| 2 | iac-studio: project-base, artifact-registry, secret-manager | IaC |
| 3 | Build and push svc-studio Docker image to Artifact Registry; deploy Cloud Run service | App CI |
| 4 | iac-studio: cloud-sql, cloud-run | IaC |
| 5 | Deploy app-studio-frontend-{env} Worker | App CI |
| 6 | iac-studio: cloudflare | IaC |
| 7 | iac-api-gateway: gateway | IaC |
| 8 | iac-access: foundation, service | IaC |
| 9 | Deploy app-access-admin-{env} and app-access-public-{env} Workers | App CI |
| 10 | iac-access: public-portal | IaC |
| 11 | iac-distribution: all layers | IaC |

How to promote

Via slash command (standard)

# On the PR, post a comment:
/terraflow promote qa # or: prd

# Terraflow will:
# 1. Run terraform plan against the target environment
# 2. Post the plan output as a PR comment
# 3. Apply automatically once you approve the plan

Via workflow dispatch (manual fallback)

GitHub → Actions → Terraform → Run workflow
Action: promote
Target environment: qa (or prd)
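
If you post the slash command from a terminal, a small wrapper can prevent a mistyped environment name. This is a sketch that assumes the GitHub CLI (gh) is installed and authenticated. promote_comment is a hypothetical helper, and it accepts only the two targets shown in the example above.

```shell
# promote_comment ENV : print the Terraflow slash command for a valid target.
# Hypothetical helper; rejects anything other than qa or prd.
promote_comment() {
  case "$1" in
    qa|prd) printf '/terraflow promote %s\n' "$1" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Example call (commented out; replace 123 with your PR number):
# gh pr comment 123 --body "$(promote_comment qa)"
```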

Verification

Run these after each phase before moving on.

Phase 1 — Foundation

gcloud iam service-accounts list --project=eigenoid-{env}
# Expect: terraform-ci@, governance-*, service-wif-* accounts listed

Phase 2 — Studio

gcloud run services list \
--project=eigenoid-{env} \
--region=europe-west1
# Expect: svc-studio listed and READY

Phase 3 — API Gateway

curl -s -o /dev/null -w '%{http_code}\n' https://api-{env}.eigenoid.services/health
# Expect: 200

Phase 4 — Access Platform

gcloud run services list \
--project=eigenoid-{env} \
--region=europe-west1
# Expect: svc-access-admin and svc-access-public listed and READY

Phase 5 — Distribution

gcloud artifacts repositories list --project=eigenoid-{env}
# Expect: distribution and sample-distribution repos listed
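
The "Expect:" comments above can be turned into a pass/fail check by piping a listing into a small filter. This is a sketch; expect_listed is a hypothetical helper, and it does a plain substring match on whatever the gcloud command prints.

```shell
# expect_listed NAME... : read a listing on stdin and fail if any NAME is absent.
# Substring match, so "svc-access" would also match "svc-access-admin".
expect_listed() {
  listing=$(cat)
  missing=0
  for name in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qF -- "$name"; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example calls (commented out; they need gcloud and real credentials):
# gcloud run services list --project=eigenoid-qa --region=europe-west1 \
#   | expect_listed svc-studio
# gcloud run services list --project=eigenoid-qa --region=europe-west1 \
#   | expect_listed svc-access-admin svc-access-public
```

This only confirms the names appear in the output. Checking that each service is READY still needs a look at the listing itself.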

Known gotchas

1. iam.roles.create permission denied

Symptom: Terraform apply fails creating a custom IAM role in any repo other than iac-foundation.

Cause: iac-foundation/platform-wif hasn't been applied yet. The CI service account needs roles/iam.roleAdmin to create custom project roles downstream.

Fix: Apply iac-foundation/platform-wif first. Always.


2. CF Worker custom domain — error 100124

Symptom: iac-access/public-portal or iac-studio/cloudflare apply fails with:

100124: Cannot attach custom domain: Worker has no deployments

Fix: Deploy the Worker code before applying the IaC layer that creates the custom domain binding. Use wrangler deploy, or trigger the matching app CI pipeline (app-access-public for iac-access/public-portal, app-studio-frontend for iac-studio/cloudflare).


3. Subnet CIDR conflict

Symptom: Terraform fails adding a subnet because the CIDR overlaps with an existing one in the project.

Cause: GCP does not allow subnets in the same VPC network to have overlapping CIDR ranges, even when they are in different regions. Terraform's destroy/create cycle during a region change can leave the old subnet in place long enough to conflict.

Fix: Change the CIDR in tfvars before applying. Pick a range that doesn't overlap with any existing subnet in the project.
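
Before picking a new range, you can test a candidate against the ranges reported by `gcloud compute networks subnets list`. The sketch below is a plain IPv4 overlap check. The helper names are hypothetical, and it assumes the shell's arithmetic is at least 64-bit, which is typical.

```shell
# ip_to_int A.B.C.D : print the address as an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the dotted quad into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_range A.B.C.D/N : print "first last" addresses of the block as integers.
cidr_range() {
  ip=${1%/*}
  bits=${1#*/}
  base=$(ip_to_int "$ip")
  size=$(( 1 << (32 - bits) ))
  start=$(( base / size * size ))   # round down to the block boundary
  echo "$start $(( start + size - 1 ))"
}

# cidrs_overlap CIDR_A CIDR_B : succeed if the two blocks share any address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  # $1/$2 = first/last of A, $3/$4 = first/last of B
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For example, `cidrs_overlap 10.0.0.0/24 10.0.0.128/25` succeeds (conflict), while `cidrs_overlap 10.0.0.0/24 10.0.1.0/24` fails (safe to use).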


4. CF Secrets Store 400 — secret_name_already_exists

Symptom: Terraform apply fails on a Cloudflare secret with HTTP 400 secret_name_already_exists.

Cause: The Cloudflare Secrets Store POST /secrets endpoint is not idempotent — it creates, it does not upsert.

Fix: Delete the secret in the Cloudflare dashboard (or via API), then re-apply.


5. Cloud Run min CPU error

Symptom: Terraform apply fails with a Cloud Run error about insufficient CPU when concurrency > 1.

Cause: Cloud Run requires total CPU ≥ 1 vCPU when concurrency is greater than 1. Removing a sidecar can push the total below 1 vCPU.

Fix: Increase the main container's CPU allocation to compensate before applying.
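
The rule above is simple enough to check before applying. This sketch sums container CPU in millicores and applies the constraint as stated in this section. cpu_allocation_ok is a hypothetical helper, and you supply the values by hand from your tfvars.

```shell
# cpu_allocation_ok CONCURRENCY MILLICPU... : succeed if the allocation satisfies
# "total CPU >= 1 vCPU when concurrency > 1" (1 vCPU = 1000 millicores).
cpu_allocation_ok() {
  concurrency=$1
  shift
  total=0
  for millicpu in "$@"; do
    total=$(( total + millicpu ))
  done
  [ "$concurrency" -le 1 ] || [ "$total" -ge 1000 ]
}

# Example: main container at 500m plus a 500m sidecar, concurrency 80 -> passes.
# cpu_allocation_ok 80 500 500
# After removing the sidecar the same call with only 500 fails.
```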


6. Terraflow tfvars path

Symptom: Terraflow cannot find variables for the target environment.

Cause: Wrong path. Terraflow expects:

<layer>/tfvars/<env>.tfvars ✅ correct
<layer>/envs/<env>.tfvars ❌ wrong

Fix: Ensure all .tfvars files are under <layer>/tfvars/.
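
A quick way to catch this before promoting is to compute the path Terraflow expects and to search for files left in the wrong directory. This is a sketch; both helper names are hypothetical, and the search assumes you run it from the repo root.

```shell
# tfvars_path LAYER ENV : print the path Terraflow expects for that layer and env.
tfvars_path() {
  printf '%s/tfvars/%s.tfvars\n' "$1" "$2"
}

# find_misplaced_tfvars [DIR] : list .tfvars files sitting under an envs/ directory.
find_misplaced_tfvars() {
  find "${1:-.}" -path '*/envs/*.tfvars' -type f
}

# Example calls (commented out):
# [ -f "$(tfvars_path cloud-run qa)" ] || echo "qa.tfvars missing for cloud-run"
# find_misplaced_tfvars .
```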


7. svc-studio data source not found in gateway plan

Symptom: iac-api-gateway terraform plan fails with:

Error: projects/eigenoid-{env}/locations/europe-west1/services/svc-studio not found

Cause: iac-api-gateway/gateway/main.tf declares data "google_cloud_run_v2_service" "studio" — a live API lookup that runs at plan time. If svc-studio has not been deployed to the target environment, the plan fails before any resources are evaluated.

Fix: Apply iac-studio through the cloud-run layer to the target environment before promoting iac-api-gateway. This is why Studio is Phase 2 and Gateway is Phase 3 in the promotion order.
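
You can run the same lookup by hand before promoting the gateway. The sketch below wraps `gcloud run services describe`. It assumes the eigenoid-{env} project naming and the europe-west1 region used elsewhere on this page, and studio_exists is a hypothetical helper.

```shell
# studio_exists ENV : succeed if svc-studio exists in the target project.
# Any gcloud failure (missing service, missing credentials) counts as "not found".
studio_exists() {
  gcloud run services describe svc-studio \
    --project="eigenoid-$1" \
    --region=europe-west1 >/dev/null 2>&1
}

# Example call (commented out; needs gcloud and real credentials):
# studio_exists qa || echo "apply iac-studio through cloud-run first"
```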


8. svc-access crash-loops after Turnstile widget change

Symptom: Cloud Run service access-public enters crash-loop immediately on start after a deploy to a new environment or after a Turnstile widget recreate.

Root cause: Since Plan 163, TURNSTILE_SECRET_KEY is required at startup unconditionally in all environments (the previous if env != production bypass was removed). If iac-access/foundation apply has not run for the target environment, the secret access-turnstile-secret-key in GCP Secret Manager has no ENABLED version, Cloud Run reads an empty value, and svc-access/cmd/public/main.go exits with os.Exit(1).

Fix:

  1. Run terraform apply in iac-access/foundation for the target environment
  2. Verify the secret has an ENABLED version: gcloud secrets versions list access-turnstile-secret-key --project=<env-project>
  3. Re-deploy svc-access (it will start cleanly)

Prevention: Always sequence Turnstile-related deploys as: iac-access/foundation → svc-access → app-access-public. See ADR-0018 and ADR-0020.
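
Step 2 of the fix can be scripted as a guard before redeploying svc-access. This sketch greps the default output of `gcloud secrets versions list` for an enabled version. It assumes the state column contains the word "enabled" in some letter case, and turnstile_secret_ready is a hypothetical helper.

```shell
# turnstile_secret_ready PROJECT : succeed if access-turnstile-secret-key has
# at least one version whose listing line mentions "enabled" (case-insensitive).
turnstile_secret_ready() {
  gcloud secrets versions list access-turnstile-secret-key \
    --project="$1" 2>/dev/null | grep -qi 'enabled'
}

# Example call (commented out; needs gcloud and real credentials):
# turnstile_secret_ready eigenoid-qa || echo "run iac-access/foundation apply first"
```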


Environment matrix

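| Repo | dev | qa | prd |
| --- | --- | --- | --- |
| iac-foundation | ✅ | ✅ | ✅ |
| iac-api-gateway | ✅ | ✅ | ✅ |
| iac-access | ✅ | ✅ | ✅ |
| iac-studio | ✅ | ✅ | ✅ |
| iac-distribution | ✅ | ✅ | ✅ |
| iac-platform | ❌ | ❌ | ✅ |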

Conventions

  • All iac-* repos use main as the default branch — not dev
  • PRs always target main
  • Terraflow resolves dev.tfvars for dev, qa.tfvars for qa, prd.tfvars for prd
  • GPG commit signing is via 1Password — if a commit hangs, re-authorize the 1Password SSH/GPG agent

Escalation

If a phase fails and the troubleshooting steps above don't resolve it, escalate to @shoootyou.

