Overview
Environment promotion moves infrastructure changes through dev → qa → prd using the Terraflow CI pipeline. All iac-* repos use Terraflow: comment /terraflow promote <env> on a PR to trigger a plan, review it, then approve to apply.
Each environment must be healthy before promoting to the next. Order within an environment matters — dependencies between repos mean that applying in the wrong sequence causes permission errors and plan failures.
Pre-flight checklist
Before promoting any environment, confirm all of the following:
- dev is healthy: services responding, no errors in logs
- All PRs are merged to main for every repo you're about to promote
- GitHub environment variables are configured for the target environment in each repo (Settings → Environments → <env>):
| Variable | Example |
|---|---|
| GCP_PROJECT_ID | eigenoid-qa |
| GCP_SA_EMAIL | terraform-ci@eigenoid-qa.iam.gserviceaccount.com |
| GCP_WIF_PROVIDER | projects/…/providers/github |
| TF_BACKEND_BUCKET | eigenoid-2cea55-foundation-tfstate-qa |
- For first-time QA/PRD setup: the GCP project exists and billing is enabled
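The variable check in the list above can be scripted. The following is a minimal sketch, not part of Terraflow: the function name missing_vars is hypothetical, and it only compares a list of names on stdin against the four variables named in the checklist.

```shell
#!/bin/sh
# Variable names taken from the pre-flight checklist.
REQUIRED_VARS="GCP_PROJECT_ID GCP_SA_EMAIL GCP_WIF_PROVIDER TF_BACKEND_BUCKET"

# missing_vars (hypothetical helper): reads configured variable names from
# stdin, one per line, and prints every required name that is absent.
# Returns 1 if anything is missing, 0 otherwise.
missing_vars() {
  present=$(cat)
  status=0
  for name in $REQUIRED_VARS; do
    if ! printf '%s\n' "$present" | grep -Fqx "$name"; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  return "$status"
}
```

One possible way to feed it, assuming the GitHub CLI is installed and prints tab-separated rows when piped: gh variable list --env qa | cut -f1 | missing_vars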
Promotion order
Dependencies between repos cause failures if applied out of sequence. Two critical ordering rules:
- iac-access must come after iac-foundation/platform-wif: the CI service account needs roles/iam.roleAdmin before any custom role creation.
- iac-api-gateway must come after iac-studio/cloud-run: the gateway plan performs a live lookup of the svc-studio Cloud Run service and fails immediately if it doesn't exist yet. See gotcha #7.
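These ordering rules are dependency edges, so a valid sequence can be derived with a topological sort. The sketch below uses the standard tsort utility; the edge list is an illustration assembled from this page (the two rules above plus the gateway and foundation dependencies of iac-access/service described in Phases 3 and 4), and the function names are hypothetical. tsort may print any valid order, so the helper checks a constraint rather than an exact sequence.

```shell
#!/bin/sh
# promotion_order (hypothetical): each input line is "prerequisite dependent",
# meaning the left item must be applied before the right one.
promotion_order() {
  tsort <<'EOF'
iac-foundation/platform-wif iac-access/foundation
iac-studio/cloud-run iac-api-gateway/gateway
iac-api-gateway/gateway iac-access/service
iac-access/foundation iac-access/service
EOF
}

# comes_before A B (hypothetical): reads an order from stdin and succeeds
# only if both items are present and A is listed before B.
comes_before() {
  awk -v a="$1" -v b="$2" '
    $0 == a { ia = NR }
    $0 == b { ib = NR }
    END { exit !(ia && ib && ia < ib) }
  '
}
```

For example, promotion_order | comes_before iac-studio/cloud-run iac-api-gateway/gateway exits 0, which confirms the derived order respects the gateway rule.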
Phase 1 — Foundation
No external dependencies. Must complete entirely before any other repo.
| Step | Repo | Layer | What it creates |
|---|---|---|---|
| 1.1 | iac-foundation | platform-wif | WIF pool, deploy CI SA, IAM role bindings (incl. iam.roleAdmin) |
| 1.2 | iac-foundation | network | VPC, subnets, firewall rules, NAT |
| 1.3 | iac-foundation | service-wif | Per-service WIF bindings (depends on platform-wif) |
| 1.4 | iac-foundation | governance-sa | Governance service accounts (depends on platform-wif) |
platform-wif must finish before anything else. It creates the CI service account and grants it the permissions every downstream repo needs to deploy.
Phase 2 — Studio
| Step | Repo | Layer | What it creates |
|---|---|---|---|
| 2.1 | iac-studio | project-base | Base project config |
| 2.2 | iac-studio | artifact-registry, secret-manager | AR repositories, secrets (depend on project-base) |
| 2.3 | iac-studio | cloud-sql | Database (depends on project-base) |
| 2.4 | iac-studio | cloud-run | Cloud Run service (depends on cloud-sql, artifact-registry) |
| 2.5 | iac-studio | dns | DNS records (depends on cloud-run) |
| 2.6 | iac-studio | cloudflare | Cloudflare config |
Note (cloudflare layer): The cloudflare_workers_custom_domain resource requires at least one Worker deployment to exist. If this is a new environment, deploy the app-studio-frontend-{env} Worker via wrangler deploy or the app-studio-frontend CI pipeline before applying the cloudflare layer. Skipping this step produces:
100124: Cannot attach custom domain: Worker has no deployments
Wait until the cloud-run layer is complete before starting Phase 3. iac-api-gateway performs a live API lookup of the svc-studio Cloud Run service at plan time. If the service does not exist in the target project, the gateway plan fails immediately. See gotcha #7.
Phase 3 — API Gateway
| Step | Repo | Layer | What it creates |
|---|---|---|---|
| 3.1 | iac-api-gateway | gateway | cloudflared VM, CF tunnel, CF DNS records, CF Access apps |
Access services route through the gateway tunnel. iac-access/service will fail without a healthy gateway.
Phase 4 — Access Platform
| Step | Repo | Layer | What it creates |
|---|---|---|---|
| 4.1 | iac-access | foundation | Secret Manager secrets, Cloud SQL, IAM custom roles, service accounts |
| 4.2 | iac-access | service | Cloud Run services (admin + public), env vars, secret bindings (depends on foundation) |
| 4.3 | iac-access | public-portal | Public portal CF Worker + custom domain (depends on service) |
The cloudflare_workers_custom_domain resource requires at least one Worker deployment to exist. If this is a new environment, deploy the app-access-admin-{env} and app-access-public-{env} Workers via wrangler deploy or the app-access-public CI pipeline before applying public-portal. Skipping this step produces:
100124: Cannot attach custom domain: Worker has no deployments
Phase 5 — Distribution
| Step | Repo | Layer | What it creates |
|---|---|---|---|
| 5.1 | iac-distribution | artifact-registry | Distribution AR |
| 5.2 | iac-distribution | sample-artifact-registry | Sample AR |
Phase 6 — Platform (PRD only)
| Step | Repo | Layer | Notes |
|---|---|---|---|
| 6.1 | iac-platform | safe-settings | Only deployed to prd — no dev or qa environment |
Application deployments
Application code (Docker images, Cloudflare Workers) is deployed through separate CI/CD pipelines — not by Terraflow. Several IaC layers have hard dependencies on app code existing before they can plan or apply successfully. This section documents those dependencies.
When app deployments are required
| Before this IaC layer | You must first deploy | How |
|---|---|---|
| iac-studio/cloud-run | svc-studio Docker image (must exist in Artifact Registry) | Trigger the svc-studio CI pipeline to build and push the image |
| iac-studio/cloudflare | app-studio-frontend-{env} CF Worker | wrangler deploy or app-studio-frontend CI |
| iac-access/public-portal | app-access-admin-{env} and app-access-public-{env} CF Workers | wrangler deploy or app-access-public CI |
The iac-api-gateway/gateway dependency on svc-studio is resolved by the IaC phase reordering (Phase 2 creates the Cloud Run service before Phase 3 plans the gateway).
Greenfield interleaved order
For a brand-new environment (QA or PRD), IaC and app deployments must be interleaved. Follow this exact sequence:
| Step | Action | Type |
|---|---|---|
| 1 | iac-foundation — all layers | IaC |
| 2 | iac-studio — project-base, artifact-registry, secret-manager | IaC |
| 3 | Build and push svc-studio Docker image to Artifact Registry; deploy Cloud Run service | App CI |
| 4 | iac-studio — cloud-sql, cloud-run | IaC |
| 5 | Deploy app-studio-frontend-{env} Worker | App CI |
| 6 | iac-studio — cloudflare | IaC |
| 7 | iac-api-gateway — gateway | IaC |
| 8 | iac-access — foundation, service | IaC |
| 9 | Deploy app-access-admin-{env} and app-access-public-{env} Workers | App CI |
| 10 | iac-access — public-portal | IaC |
| 11 | iac-distribution — all layers | IaC |
How to promote
Via slash command (standard)
# On the PR, post a comment:
/terraflow promote qa # or: prd
# Terraflow will:
# 1. Run terraform plan against the target environment
# 2. Post the plan output as a PR comment
# 3. Apply automatically once you approve the plan
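The comment can also be posted from a terminal. This is a sketch under assumptions: it relies on the GitHub CLI (gh pr comment with --body), the function names are hypothetical, and it accepts only the two targets shown above (qa and prd).

```shell
#!/bin/sh
# promote_comment ENV (hypothetical): prints the slash command for ENV and
# rejects anything other than qa or prd.
promote_comment() {
  case "$1" in
    qa|prd) printf '/terraflow promote %s\n' "$1" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# post_promote PR ENV (hypothetical): posts the command as a PR comment.
# Assumes gh is installed and authenticated for the repository.
post_promote() {
  body=$(promote_comment "$2") || return 1
  gh pr comment "$1" --body "$body"
}
```

Usage, with a placeholder PR number: post_promote 123 qa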
Via workflow dispatch (manual fallback)
GitHub → Actions → Terraform → Run workflow
Action: promote
Target environment: qa (or prd)
Verification
Run these after each phase before moving on.
Phase 1 — Foundation
gcloud iam service-accounts list --project=eigenoid-{env}
# Expect: terraform-ci@, governance-*, service-wif-* accounts listed
Phase 2 — Studio
gcloud run services list \
--project=eigenoid-{env} \
--region=europe-west1
# Expect: svc-studio listed and READY
Phase 3 — API Gateway
curl -s https://api-{env}.eigenoid.services/health
# Expect: 200 OK
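Because curl -s prints only the response body, the status code is easier to assert on when it is requested explicitly. A minimal sketch, assuming the health URL pattern shown above; the function names are hypothetical.

```shell
#!/bin/sh
# health_url ENV (hypothetical): builds the gateway health URL for ENV.
health_url() {
  printf 'https://api-%s.eigenoid.services/health\n' "$1"
}

# http_status URL: prints only the HTTP status code of a GET request.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# check_gateway ENV (hypothetical): succeeds only when the endpoint answers 200.
check_gateway() {
  code=$(http_status "$(health_url "$1")")
  [ "$code" = "200" ]
}
```

Usage: check_gateway qa && echo "gateway healthy"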
Phase 4 — Access Platform
gcloud run services list \
--project=eigenoid-{env} \
--region=europe-west1
# Expect: svc-access-admin and svc-access-public listed and READY
Phase 5 — Distribution
gcloud artifacts repositories list --project=eigenoid-{env}
# Expect: distribution and sample-distribution repos listed
Known gotchas
1. iam.roles.create permission denied
Symptom: Terraform apply fails creating a custom IAM role in any repo other than iac-foundation.
Cause: iac-foundation/platform-wif hasn't been applied yet. The CI service account needs roles/iam.roleAdmin to create custom project roles downstream.
Fix: Apply iac-foundation/platform-wif first. Always.
2. CF Worker custom domain — error 100124
Symptom: iac-access/public-portal apply fails with:
100124: Cannot attach custom domain: Worker has no deployments
Fix: Deploy the Worker code before the IaC layer that creates the custom domain binding. Use wrangler deploy or trigger the app-access-public CI pipeline.
3. Subnet CIDR conflict
Symptom: Terraform fails adding a subnet because the CIDR overlaps with an existing one in the project.
Cause: GCP does not allow overlapping subnet CIDR ranges within a VPC network, even when the subnets are in different regions. Terraform's destroy/create cycle during region changes can leave the old subnet in place long enough to conflict.
Fix: Change the CIDR in tfvars before applying. Pick a range that doesn't overlap with any existing subnet in the project.
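A candidate range can be checked against existing ones before editing tfvars. The following is an illustrative sketch for IPv4 only, using plain shell arithmetic; the function names are hypothetical and it does not validate its input.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: prints the IPv4 address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_range CIDR: prints "first last" (as integers) for the block.
cidr_range() {
  addr=${1%/*}
  bits=${1#*/}
  size=$(( 1 << (32 - bits) ))
  base=$(ip_to_int "$addr")
  first=$(( base / size * size ))   # round down to the network address
  echo "$first $(( first + size - 1 ))"
}

# cidrs_overlap A B: succeeds if the two blocks share at least one address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

Existing ranges can be listed with gcloud compute networks subnets list --project=eigenoid-{env} --format='value(ipCidrRange)' and each one passed to cidrs_overlap together with the proposed range.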
4. CF Secrets Store 400 — secret_name_already_exists
Symptom: Terraform apply fails on a Cloudflare secret with HTTP 400 secret_name_already_exists.
Cause: The Cloudflare Secrets Store POST /secrets endpoint is not idempotent — it creates, it does not upsert.
Fix: Delete the secret in the Cloudflare dashboard (or via API), then re-apply.
5. Cloud Run min CPU error
Symptom: Terraform apply fails with a Cloud Run error about insufficient CPU when concurrency > 1.
Cause: Cloud Run requires total CPU ≥ 1 vCPU when concurrency is greater than 1. Removing a sidecar can push the total below 1 vCPU.
Fix: Increase the main container's CPU allocation to compensate before applying.
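The rule as stated above can be checked before applying. This sketch simply encodes that rule; the function name is hypothetical and CPU values are passed in millicpu (for example, 500 for 0.5 vCPU).

```shell
#!/bin/sh
# cpu_ok CONCURRENCY MILLICPU... (hypothetical): succeeds when concurrency is
# at most 1, or when the container CPU values sum to at least 1000 millicpu.
cpu_ok() {
  concurrency=$1; shift
  total=0
  for m in "$@"; do
    total=$(( total + m ))
  done
  [ "$concurrency" -le 1 ] || [ "$total" -ge 1000 ]
}
```

For example, cpu_ok 80 500 500 succeeds, while cpu_ok 80 500 fails, which mirrors the case where a 0.5 vCPU sidecar has been removed.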
6. Terraflow tfvars path
Symptom: Terraflow cannot find variables for the target environment.
Cause: Wrong path. Terraflow expects:
<layer>/tfvars/<env>.tfvars ✅ correct
<layer>/envs/<env>.tfvars ❌ wrong
Fix: Ensure all .tfvars files are under <layer>/tfvars/.
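Misplaced files can be located with find. A minimal sketch, assuming it is run from a repo checkout; the function name is hypothetical.

```shell
#!/bin/sh
# misplaced_tfvars ROOT (hypothetical): prints every .tfvars file under ROOT
# that is not located under a directory named "tfvars", sorted for stable output.
misplaced_tfvars() {
  find "$1" -type f -name '*.tfvars' ! -path '*/tfvars/*' | sort
}
```

Usage: misplaced_tfvars . prints nothing when every file is under <layer>/tfvars/.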
7. svc-studio data source not found in gateway plan
Symptom: iac-api-gateway terraform plan fails with:
Error: projects/eigenoid-{env}/locations/europe-west1/services/svc-studio not found
Cause: iac-api-gateway/gateway/main.tf declares data "google_cloud_run_v2_service" "studio" — a live API lookup that runs at plan time. If svc-studio has not been deployed to the target environment, the plan fails before any resources are evaluated.
Fix: Apply iac-studio through the cloud-run layer to the target environment before promoting iac-api-gateway. This is why Studio is Phase 2 and Gateway is Phase 3 in the promotion order.
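The same lookup can be made by hand before promoting the gateway. This is a sketch with hypothetical function names; it assumes gcloud is authenticated against the target project and uses the project naming and region shown elsewhere on this page.

```shell
#!/bin/sh
# studio_exists ENV (hypothetical): succeeds if svc-studio can be described
# in the target project.
studio_exists() {
  gcloud run services describe svc-studio \
    --project="eigenoid-$1" \
    --region=europe-west1 >/dev/null 2>&1
}

# gate_gateway ENV (hypothetical): reports whether the gateway plan's data
# source lookup is expected to succeed.
gate_gateway() {
  if studio_exists "$1"; then
    echo "svc-studio found in eigenoid-$1: safe to promote iac-api-gateway"
  else
    echo "svc-studio missing in eigenoid-$1: apply iac-studio through cloud-run first" >&2
    return 1
  fi
}
```

Usage: gate_gateway qa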
8. svc-access crash-loops after Turnstile widget change
Symptom: The Cloud Run service access-public crash-loops immediately on start after a deploy to a new environment or after a Turnstile widget is recreated.
Root cause: Since Plan 163, TURNSTILE_SECRET_KEY is required at startup unconditionally in all environments (the previous if env != production bypass was removed). If iac-access/foundation apply has not run for the target environment, the secret access-turnstile-secret-key in GCP Secret Manager has no ENABLED version, Cloud Run reads an empty value, and svc-access/cmd/public/main.go exits with os.Exit(1).
Fix:
- Run terraform apply in iac-access/foundation for the target environment
- Verify the secret has an ENABLED version:
gcloud secrets versions list access-turnstile-secret-key --project=<env-project>
- Re-deploy svc-access (it will start cleanly)
Prevention: Always sequence Turnstile-related deploys as: iac-access/foundation → svc-access → app-access-public. See ADR-0018 and ADR-0020.
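The secret check in the fix above can be turned into a gate that runs before svc-access is deployed. A sketch with hypothetical function names; it assumes the --filter expression state=ENABLED matches the version state as gcloud reports it, which is worth confirming against your gcloud version.

```shell
#!/bin/sh
# enabled_versions PROJECT (hypothetical): prints the names of ENABLED
# versions of the Turnstile secret, one per line.
enabled_versions() {
  gcloud secrets versions list access-turnstile-secret-key \
    --project="$1" \
    --filter='state=ENABLED' \
    --format='value(name)'
}

# turnstile_secret_ready PROJECT (hypothetical): succeeds if at least one
# ENABLED version exists.
turnstile_secret_ready() {
  [ -n "$(enabled_versions "$1")" ]
}
```

Usage: turnstile_secret_ready <env-project> || echo "apply iac-access/foundation first"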
Environment matrix
| Repo | dev | qa | prd |
|---|---|---|---|
| iac-foundation | ✅ | ✅ | ✅ |
| iac-api-gateway | ✅ | ✅ | — |
| iac-access | ✅ | ✅ | ✅ |
| iac-studio | ✅ | ✅ | — |
| iac-distribution | ✅ | ✅ | ✅ |
| iac-platform | — | — | ✅ |
Conventions
- All iac-* repos use main as the default branch, not dev
- PRs always target main
- Terraflow resolves dev.tfvars for dev, qa.tfvars for qa, prd.tfvars for prd
- GPG commit signing is via 1Password; if a commit hangs, re-authorize the 1Password SSH/GPG agent
Escalation
If a phase fails and the troubleshooting steps above don't resolve it, escalate to @shoootyou.