Manage safe-settings deployment as IaC on Cloud Run with automated image mirroring

Context

ADR-0006 established safe-settings on Cloud Run as the source of truth for GitHub configuration. The initial deployment was manual: pull the image from GHCR, tag it for Artifact Registry, push it, and run gcloud run services update from the terminal.

This manual process had several problems:

Not reproducible: the service state depended on who ran which command and when. There was no declarative state.
No version tracking: the image on Cloud Run could be any version. There was no source of truth for which version was running.
No update pipeline: updating the image required about five manual commands (pull, tag, push, update service, verify), and any operator could forget a step.
Manual secrets: the secrets (APP_ID, PRIVATE_KEY, WEBHOOK_SECRET) were managed ad-hoc in GCP Secret Manager. No Terraform declared them.

Decision

We manage the full safe-settings deployment as Terraform on Cloud Run, in the iac-platform repo, with an automated image mirroring pattern from GHCR to Artifact Registry.

Architecture

iac-platform/safe-settings/
├── main.tf              ← Cloud Run service, AR repo, secrets, IAM
├── mirror.tf            ← Image mirroring workflow (GHCR → AR)
├── IMAGE_TAG            ← Source of truth for the version (e.g., "2.1.20-rc.3")
├── variables.tf
├── outputs.tf
├── providers.tf
├── backend.tf
├── versions.tf
└── tfvars/
    └── prd.tfvars       ← deploy_service=true, project=eigenoid-prd

2-phase pattern

Phase 1 — Infrastructure (always active):

Artifact Registry (Docker repo)
Secret Manager (3 secrets: app-id, private-key, webhook-secret)
Service Account for Cloud Run
IAM bindings

Phase 2 — Service (activated by deploy_service = true):

Cloud Run service with the AR image
Environment variables (GH_ORG, ADMIN_REPO, LOG_LEVEL, CRON)
Secret references (APP_ID, PRIVATE_KEY, WEBHOOK_SECRET via Secret Manager)

The phase separation allows creating the infrastructure first (Phase 1), populating secrets manually, and then activating the service (Phase 2) without a chicken-and-egg problem.
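
The switch between phases can be expressed with a count guard on the Phase 2 resources. The following is a minimal sketch under assumptions, not the contents of main.tf: the resource labels (webhook_secret, safe_settings), the default of false, and the hard-coded image reference are illustrative.

```hcl
# Hypothetical sketch: labels and defaults are assumptions, not the repo's code.
variable "project" {
  description = "GCP project ID, e.g. eigenoid-prd."
  type        = string
}

variable "deploy_service" {
  description = "Phase 2 switch: create the Cloud Run service only when true."
  type        = bool
  default     = false
}

# Phase 1: always created, so the secret container exists before the service.
# Only the secret itself is declared; its value is added manually afterwards.
resource "google_secret_manager_secret" "webhook_secret" {
  project   = var.project
  secret_id = "webhook-secret"

  replication {
    auto {}
  }
}

# Phase 2: count is 0 until deploy_service is set to true in the tfvars file.
resource "google_cloud_run_v2_service" "safe_settings" {
  count    = var.deploy_service ? 1 : 0
  project  = var.project
  name     = "eigenoid-safe-settings"
  location = "europe-west1"

  template {
    containers {
      image = "europe-west1-docker.pkg.dev/eigenoid-prd/safe-settings-docker/safe-settings:2.1.20-rc.3"
    }
  }
}
```

With this shape, a first apply with deploy_service = false creates only the Phase 1 resources. Once the secret values are in place, flipping the flag adds the service in a second apply.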

`IMAGE_TAG` as source of truth

The IMAGE_TAG file contains the exact safe-settings version:

2.1.20-rc.3

To update the version:

Edit IMAGE_TAG with the new version.
Open a PR → Terraform plan shows the image change.
Merge → mirroring workflow copies the image from GHCR to AR.
Terraform apply updates the Cloud Run revision.
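
One way Terraform could pick up the version is to read the file at plan time. This is a sketch only: the document does not show how main.tf consumes IMAGE_TAG, and the local names image_tag and image are hypothetical.

```hcl
# Hypothetical sketch: how main.tf actually reads IMAGE_TAG is not documented here.
locals {
  # trimspace drops the trailing newline that most editors append to the file.
  image_tag = trimspace(file("${path.module}/IMAGE_TAG"))

  # Full Artifact Registry reference, built from the tag.
  image = "europe-west1-docker.pkg.dev/eigenoid-prd/safe-settings-docker/safe-settings:${local.image_tag}"
}

# Exposing the resolved reference makes the image change easy to spot in a plan.
output "image" {
  value = local.image
}
```

Because file() is evaluated during planning, a PR that edits only IMAGE_TAG would show up as a change to the image reference.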

Image mirroring

Cloud Run cannot pull images directly from GHCR, so they are mirrored into Artifact Registry first. The mirroring workflow:

Reads IMAGE_TAG from the repo.
Pulls the image from ghcr.io/github/safe-settings:{tag} (using digest to force amd64).
Tags and pushes to europe-west1-docker.pkg.dev/eigenoid-prd/safe-settings-docker/safe-settings:{tag}.
Terraform references the AR image.
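
The document does not show how mirror.tf implements the workflow. The pull, tag, and push sequence can nevertheless be illustrated with a substitute mechanism: a terraform_data resource with a local-exec provisioner. This is not the repo's implementation. It assumes Terraform 1.4 or later, a Docker client already authenticated to both registries, and it selects amd64 with --platform rather than by digest as the workflow does.

```hcl
# Hypothetical sketch: a stand-in for the real workflow, which is not shown in this ADR.
locals {
  image_tag    = trimspace(file("${path.module}/IMAGE_TAG"))
  source_image = "ghcr.io/github/safe-settings:${local.image_tag}"
  target_image = "europe-west1-docker.pkg.dev/eigenoid-prd/safe-settings-docker/safe-settings:${local.image_tag}"
}

resource "terraform_data" "mirror_image" {
  # Re-run the copy whenever the tag in IMAGE_TAG changes.
  triggers_replace = [local.image_tag]

  provisioner "local-exec" {
    # Assumes docker is installed and logged in to GHCR and Artifact Registry.
    command = <<-EOT
      set -eu
      docker pull --platform linux/amd64 ${local.source_image}
      docker tag ${local.source_image} ${local.target_image}
      docker push ${local.target_image}
    EOT
  }
}
```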

Current service data

Field	Value
GCP Project	`eigenoid-prd`
Region	`europe-west1`
Service name	`eigenoid-safe-settings`
Port	3000
Min instances	1 (GitHub webhook timeout is 10s)
Max instances	3
IMAGE_TAG	`2.1.20-rc.3`
CRON	`0 */6 * * *` (sync every 6h)
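
For orientation, the table values map onto a Cloud Run v2 service roughly as follows. This is a sketch, not the resource in main.tf: the resource label is hypothetical, and the GH_ORG value is an assumption inferred from the eigenoid/iac-platform repo name.

```hcl
# Hypothetical sketch: illustrates where the table values would appear.
resource "google_cloud_run_v2_service" "safe_settings" {
  project  = "eigenoid-prd"
  name     = "eigenoid-safe-settings"
  location = "europe-west1"

  template {
    scaling {
      # One warm instance avoids cold starts within GitHub's 10s webhook timeout.
      min_instance_count = 1
      max_instance_count = 3
    }

    containers {
      image = "europe-west1-docker.pkg.dev/eigenoid-prd/safe-settings-docker/safe-settings:2.1.20-rc.3"

      ports {
        container_port = 3000
      }

      env {
        name  = "GH_ORG"
        value = "eigenoid" # assumed value, not stated in this ADR
      }
    }
  }
}
```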

Consequences

Fully declarative state: all safe-settings infrastructure is Terraform. terraform plan shows exactly what would change.
Trivial rollback: revert IMAGE_TAG to a previous version, merge, and Terraform deploys the old version. Cloud Run keeps the latest revisions accessible.
Integrated secret management: secrets are declared in Terraform (structure), populated once manually (value), and referenced by the service automatically. Rotation is: new version in Secret Manager → Terraform apply → new revision.
Audit trail: every version or config change is a PR with a visible plan. No ad-hoc gcloud run services update.
Mirroring complexity: the image must be copied from GHCR to AR, an extra step compared to direct deployment. The automated workflow mitigates this.
Production only: safe-settings has no staging environments. It runs only in eigenoid-prd with deploy_service = true. This is intentional — safe-settings manages the GitHub org, which is a single entity.
invoker_iam_disabled: the GCP org policy blocks the allUsers IAM binding on Cloud Run, so the service disables the invoker IAM check instead. safe-settings validates webhooks via HMAC (WEBHOOK_SECRET), so IAM-based invocation control is not required.
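
The secret handling and the invoker setting described above could look like the sketch below. It rests on several assumptions: the service account name, the resource labels, and the webhook_secret_version variable are hypothetical. The ADR does not say whether secret versions are pinned. Pinning is used here because bumping the version then produces a plan diff and a new revision, which matches the rotation flow described above.

```hcl
# Hypothetical sketch: names and the version-pinning approach are assumptions.
variable "webhook_secret_version" {
  description = "Secret Manager version to expose; bump after rotating the value."
  type        = string
  default     = "1"
}

resource "google_service_account" "run" {
  project    = "eigenoid-prd"
  account_id = "safe-settings-run" # hypothetical name
}

# Terraform declares the secret; its value is populated manually.
resource "google_secret_manager_secret" "webhook_secret" {
  project   = "eigenoid-prd"
  secret_id = "webhook-secret"

  replication {
    auto {}
  }
}

# Let the Cloud Run service account read the secret.
resource "google_secret_manager_secret_iam_member" "run_reads_webhook_secret" {
  project   = "eigenoid-prd"
  secret_id = google_secret_manager_secret.webhook_secret.secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.run.email}"
}

resource "google_cloud_run_v2_service" "safe_settings" {
  project  = "eigenoid-prd"
  name     = "eigenoid-safe-settings"
  location = "europe-west1"

  # Skip the IAM invoker check; requests are authenticated by the webhook HMAC.
  invoker_iam_disabled = true

  template {
    service_account = google_service_account.run.email

    containers {
      image = "europe-west1-docker.pkg.dev/eigenoid-prd/safe-settings-docker/safe-settings:2.1.20-rc.3"

      env {
        name = "WEBHOOK_SECRET"
        value_source {
          secret_key_ref {
            secret  = google_secret_manager_secret.webhook_secret.secret_id
            version = var.webhook_secret_version
          }
        }
      }
    }
  }

  depends_on = [google_secret_manager_secret_iam_member.run_reads_webhook_secret]
}
```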

Alternatives considered

Manual deploy (status quo): works for a single service but does not scale. No declarative state, no audit trail, no trivial rollback.
Cloud Run source deploy: builds the service with Google Cloud Buildpacks, which requires source code rather than a pre-built image. safe-settings is distributed as an image, not as code.
GKE / Cloud Run Jobs: overkill for a stateless service that receives webhooks. Cloud Run with min-instances=1 is sufficient and costs ~$5/month.
Self-hosted runner with Docker Compose: possible but adds a server to maintain. Cloud Run is serverless.

References

ADR-0006 — safe-settings for GitHub governance — decision to adopt safe-settings
ADR-0008 — Shared IaC model with Terraform — producer/consumer model under which iac-platform operates
Runbook: safe-settings operations — operational procedures
Settings Bot — GitHub App technical sheet
GitHub governance — Operational guide — full operational guide
eigenoid/iac-platform — Terraform repo
github/safe-settings — upstream project
