Context

ADR-0006 established safe-settings on Cloud Run as the source of truth for GitHub configuration. The initial deployment was manual: pull the image from GHCR, tag it for Artifact Registry, push it, and run gcloud run services update from the terminal.

This manual process had several problems:

  • Not reproducible: the service state depended on who ran which command and when. There was no declarative state.
  • No version tracking: the image on Cloud Run could be any version. There was no source of truth for which version was running.
  • No update pipeline: updating the image required ~5 manual commands (pull, tag, push, update service, verify). A different operator could forget a step.
  • Manual secrets: the secrets (APP_ID, PRIVATE_KEY, WEBHOOK_SECRET) were managed ad hoc in GCP Secret Manager and were not declared in Terraform.

Decision

We manage the full safe-settings deployment as Terraform on Cloud Run, in the iac-platform repo, with an automated image mirroring pattern from GHCR to Artifact Registry.

Architecture

iac-platform/safe-settings/
├── main.tf ← Cloud Run service, AR repo, secrets, IAM
├── mirror.tf ← Image mirroring workflow (GHCR → AR)
├── IMAGE_TAG ← Source of truth for the version (e.g., "2.1.20-rc.3")
├── variables.tf
├── outputs.tf
├── providers.tf
├── backend.tf
├── versions.tf
└── tfvars/
    └── prd.tfvars ← deploy_service=true, project=eigenoid-prd

2-phase pattern

Phase 1 — Infrastructure (always active):

  • Artifact Registry (Docker repo)
  • Secret Manager (3 secrets: app-id, private-key, webhook-secret)
  • Service Account for Cloud Run
  • IAM bindings

Phase 2 — Service (activated by deploy_service = true):

  • Cloud Run service with the AR image
  • Environment variables (GH_ORG, ADMIN_REPO, LOG_LEVEL, CRON)
  • Secret references (APP_ID, PRIVATE_KEY, WEBHOOK_SECRET via Secret Manager)

The phase separation avoids a chicken-and-egg problem: the service references secrets that must already hold a value when it starts. The infrastructure is therefore created first (Phase 1), the secrets are populated manually, and only then is the service activated (Phase 2).

IMAGE_TAG as source of truth

The IMAGE_TAG file contains the exact safe-settings version:

2.1.20-rc.3

To update the version:

  1. Edit IMAGE_TAG with the new version.
  2. Open a PR → Terraform plan shows the image change.
  3. Merge → mirroring workflow copies the image from GHCR to AR.
  4. Terraform apply updates the Cloud Run revision.
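
The role of IMAGE_TAG can be illustrated with a minimal Python sketch. It is not part of the repository: the function names and the tag validation rule are assumptions made for illustration, while the registry paths are the ones named in this ADR.

```python
import re
from pathlib import Path

# Registry paths as named in this ADR.
GHCR_IMAGE = "ghcr.io/github/safe-settings"
AR_IMAGE = "europe-west1-docker.pkg.dev/eigenoid-prd/safe-settings-docker/safe-settings"

# Assumption: accept only strings that fit the Docker tag grammar
# (at most 128 characters, not starting with '.' or '-').
TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def read_image_tag(path: Path) -> str:
    """Return the tag stored in the IMAGE_TAG file, without surrounding whitespace."""
    tag = path.read_text(encoding="utf-8").strip()
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"invalid image tag: {tag!r}")
    return tag


def image_refs(tag: str) -> tuple[str, str]:
    """Return (source image on GHCR, target image on Artifact Registry) for a tag."""
    return f"{GHCR_IMAGE}:{tag}", f"{AR_IMAGE}:{tag}"
```

Because both references derive from the same file, a one-line change to IMAGE_TAG is enough to move the mirrored image and the deployed image together.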

Image mirroring

Cloud Run cannot pull images directly from GHCR, so the image is first copied to Artifact Registry. The mirroring workflow:

  1. Reads IMAGE_TAG from the repo.
  2. Pulls the image from ghcr.io/github/safe-settings:{tag} (by digest, to force the amd64 variant).
  3. Tags and pushes to europe-west1-docker.pkg.dev/eigenoid-prd/safe-settings-docker/safe-settings:{tag}.
  4. Terraform references the AR image.
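
The steps above can be sketched in Python as follows. This is an illustrative outline, not the actual workflow in mirror.tf: the function names are hypothetical, and it assumes the image index JSON has already been fetched (for example with `docker buildx imagetools inspect --raw`) and that the commands are executed by a separate step.

```python
import json


def amd64_digest(index_json: str) -> str:
    """Pick the linux/amd64 manifest digest from an OCI image index or Docker manifest list."""
    index = json.loads(index_json)
    for manifest in index.get("manifests", []):
        platform = manifest.get("platform", {})
        if platform.get("os") == "linux" and platform.get("architecture") == "amd64":
            return manifest["digest"]
    raise LookupError("no linux/amd64 manifest in image index")


def mirror_commands(source_repo: str, target_repo: str, tag: str, digest: str) -> list[list[str]]:
    """Build the pull, tag, and push commands that copy one image variant to the target repo."""
    pinned = f"{source_repo}@{digest}"  # pulling by digest pins the amd64 variant
    target = f"{target_repo}:{tag}"
    return [
        ["docker", "pull", pinned],
        ["docker", "tag", pinned, target],
        ["docker", "push", target],
    ]
```

Returning the commands as argument lists keeps the sketch side-effect free; a caller could pass each list to `subprocess.run(cmd, check=True)` to execute it.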

Current service data

Field          Value
GCP Project    eigenoid-prd
Region         europe-west1
Service name   eigenoid-safe-settings
Port           3000
Min instances  1 (GitHub webhook timeout is 10s)
Max instances  3
IMAGE_TAG      2.1.20-rc.3
CRON           0 0 */6 * * * (sync every 6h)

Consequences

  • Fully declarative state: all safe-settings infrastructure is Terraform. terraform plan shows exactly what would change.
  • Trivial rollback: revert IMAGE_TAG to a previous version, merge, and Terraform deploys the old version. Cloud Run also keeps recent revisions available.
  • Integrated secret management: secrets are declared in Terraform (structure), populated once manually (value), and referenced by the service automatically. Rotation is: new version in Secret Manager → Terraform apply → new revision.
  • Audit trail: every version or config change is a PR with a visible plan. No ad-hoc gcloud run services update.
  • Mirroring complexity: the image must be copied from GHCR to AR, an extra step compared to direct deployment. The automated workflow mitigates this.
  • Production only: safe-settings has no staging environments. It runs only in eigenoid-prd with deploy_service = true. This is intentional — safe-settings manages the GitHub org, which is a single entity.
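  • invoker_iam_disabled: the GCP org policy blocks allUsers IAM bindings on Cloud Run, so the service disables the invoker IAM check instead of granting public access through IAM. safe-settings validates webhooks via HMAC (WEBHOOK_SECRET), so IAM-based invocation control is not required.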
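
For context on the HMAC validation mentioned in the last point, here is a minimal Python sketch of the check GitHub documents for its X-Hub-Signature-256 header. It is illustrative only: safe-settings performs this validation internally, and the function name and parameters below are hypothetical.

```python
import hashlib
import hmac


def verify_signature(secret: str, payload: bytes, signature_header: str) -> bool:
    """Return True if signature_header matches the HMAC-SHA256 of the raw request body."""
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))
```

A request whose signature does not verify can be rejected before any processing, which is why an unauthenticated Cloud Run endpoint is acceptable here.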

Alternatives considered

  • Manual deploy (status quo): works for a single service but does not scale. No declarative state, no audit trail, no trivial rollback.
  • Cloud Run source deploy: builds with Google Cloud Buildpacks and therefore requires source code, not a pre-built image. safe-settings is distributed as an image, not as code.
  • GKE / Cloud Run Jobs: overkill for a stateless service that receives webhooks. Cloud Run with min-instances=1 is sufficient and costs ~$5/month.
  • Self-hosted runner with Docker Compose: possible but adds a server to maintain. Cloud Run is serverless.

References