When to use

Use this runbook when you need to operate, update, troubleshoot, or recover the safe-settings service on Cloud Run.

Preconditions

Overview

safe-settings runs as a container on Cloud Run in the eigenoid-prd project. All infrastructure is managed with Terraform in the iac-platform repository (see ADR-0013).

| Field | Value |
| --- | --- |
| Service | eigenoid-safe-settings |
| Project | eigenoid-prd |
| Region | europe-west1 |
| Port | 3000 |
| Min instances | 1 |
| Max instances | 3 |
| IMAGE_TAG | 2.1.20-rc.3 |
| CRON | 0 0 */6 * * * |
| Cost | ~$5-10/month |

Procedure: Image updates

To update the safe-settings version:

1. Check for a new version

gh api repos/github/safe-settings/releases/latest --jq '.tag_name'

2. Update IMAGE_TAG

cd iac-platform
git checkout -b chore/update-safe-settings-image
echo "NEW_VERSION" > safe-settings/IMAGE_TAG
git add safe-settings/IMAGE_TAG
git commit -m "chore: update safe-settings to NEW_VERSION"
git push origin chore/update-safe-settings-image

3. Open a PR

The Terraform plan will show the image change in Cloud Run. Verify that:

  • Only the image changes (google_cloud_run_v2_service.safe_settings[0])
  • There are no unexpected changes to secrets or config
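
The two checks above can be partly scripted. The following is a minimal sketch, not part of the existing pipeline: it assumes the plan was saved to a file with `terraform plan -out=tfplan`, that `jq` is installed, and that the resource address has no module prefix. The helper name `unexpected_changes` is hypothetical.

```bash
# Hypothetical helper: reads changed resource addresses (one per line) on stdin
# and prints every address that differs from the single expected one.
# Exit status: 0 if nothing unexpected was found, 1 otherwise.
unexpected_changes() {
  local expected="$1"
  local addr
  local found=0
  while IFS= read -r addr; do
    [ -z "$addr" ] && continue
    if [ "$addr" != "$expected" ]; then
      printf '%s\n' "$addr"
      found=1
    fi
  done
  return "$found"
}

# Example (assumes a saved plan file named tfplan):
#   terraform show -json tfplan \
#     | jq -r '.resource_changes[] | select(.change.actions != ["no-op"]) | .address' \
#     | unexpected_changes 'google_cloud_run_v2_service.safe_settings[0]'
```

A non-zero exit status only means that something besides the Cloud Run service would change. The plan still needs a human review for secret and config changes inside that resource.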

4. Merge

The merge triggers:

  1. Image mirroring: copies the image from GHCR to Artifact Registry.
  2. Terraform apply: updates the Cloud Run revision with the new image.

5. Verify

# Verify active revision
gcloud run services describe eigenoid-safe-settings \
--region europe-west1 --project eigenoid-prd \
--format="value(status.latestReadyRevisionName)"

# Verify logs from the new container
gcloud run services logs read eigenoid-safe-settings \
--region europe-west1 --project eigenoid-prd \
--limit 10

Procedure: Secret rotation

Rotate PRIVATE_KEY (most frequent)

  1. Generate a new key on the App page → Private keys → Generate a private key.

  2. Create a new version in Secret Manager:

    gcloud secrets versions add safe-settings-private-key \
    --data-file=/path/to/new.pem \
    --project=eigenoid-prd
  3. Re-deploy Cloud Run to pick up the new version:

    gcloud run services update eigenoid-safe-settings \
    --region europe-west1 --project eigenoid-prd \
    --update-secrets "PRIVATE_KEY=safe-settings-private-key:latest"
  4. Verify: push a change to the admin repo and confirm the sync works.

  5. Delete the old key on the App page → Private keys → Delete.

  Note (raw PEM): The private key must be raw PEM (it starts with -----BEGIN RSA PRIVATE KEY-----). Probot rejects base64.

  6. Update the secret in platform-settings (used by governance workflows):

    gh secret set SETTINGS_BOT_PRIVATE_KEY \
    --repo eigenoid/platform-settings \
    < /path/to/new.pem
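
Because the key must be raw PEM, it can help to check the file before uploading it to Secret Manager or to the repository secret. This is a minimal sketch under that one assumption. The function name `is_raw_pem` is hypothetical, and it inspects only the first line of the file, not the validity of the key itself.

```bash
# Hypothetical pre-flight check: succeeds only if the first line of the file
# is exactly the raw PEM header that this runbook requires.
is_raw_pem() {
  local first_line=""
  # The second test keeps a one-line file without a trailing newline readable.
  IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
  case "$first_line" in
    "-----BEGIN RSA PRIVATE KEY-----") return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   is_raw_pem /path/to/new.pem || echo "not raw PEM, do not upload" >&2
```

A file with Windows line endings also fails this check, because the carriage return becomes part of the first line.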

Rotate WEBHOOK_SECRET

  1. Generate a new secret: openssl rand -hex 32

  2. Update in GitHub App webhook config: App settings → Webhook → Webhook secret

  3. Update in Secret Manager:

    echo -n "NEW_SECRET" | gcloud secrets versions add safe-settings-webhook-secret \
    --data-file=- --project=eigenoid-prd
  4. Re-deploy Cloud Run with the same gcloud run services update command used in the PRIVATE_KEY procedure, substituting the webhook secret in --update-secrets.
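
The re-deploy step could be wrapped so that the same command serves both secrets. This is a sketch, assuming the container reads the webhook secret from an environment variable named WEBHOOK_SECRET (matching the heading above). The function name `redeploy_with_secret` and the `DRY_RUN` switch are hypothetical.

```bash
# Hypothetical wrapper around the re-deploy command from the PRIVATE_KEY procedure.
#   $1 = environment variable name, $2 = Secret Manager secret name
# With DRY_RUN=1 the command is printed instead of executed.
redeploy_with_secret() {
  local env_name="$1"
  local secret_name="$2"
  set -- gcloud run services update eigenoid-safe-settings \
    --region europe-west1 --project eigenoid-prd \
    --update-secrets "${env_name}=${secret_name}:latest"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Example:
#   DRY_RUN=1 redeploy_with_secret WEBHOOK_SECRET safe-settings-webhook-secret
```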

Rotate APP_ID

Only needed if the GitHub App is recreated. Update in Secret Manager and re-deploy.

Procedure: Webhook configuration

The GitHub App webhook must point to:

| Field | Value |
| --- | --- |
| URL | https://eigenoid-safe-settings-{hash}.europe-west1.run.app/api/github/webhooks |
| Content type | application/json |
| Secret | (value from Secret Manager) |
| Active | Yes |

Note (path): The path must be /api/github/webhooks, not the root path /. If the path is wrong, safe-settings will not receive events.
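
The path rule can be checked mechanically before the URL is saved in the App settings. This is a minimal sketch with a hypothetical function name, `is_valid_webhook_url`. It tests only the scheme and the path suffix, not whether the host is the real service.

```bash
# Hypothetical check: succeeds only if the URL starts with https:// and
# ends with exactly /api/github/webhooks (no trailing slash, no root path).
is_valid_webhook_url() {
  case "$1" in
    https://?*/api/github/webhooks) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: build the expected URL from the live service URL and check it.
#   base="$(gcloud run services describe eigenoid-safe-settings \
#     --region europe-west1 --project eigenoid-prd --format='value(status.url)')"
#   is_valid_webhook_url "${base}/api/github/webhooks" && echo "${base}/api/github/webhooks"
```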

Enabled events (9)

push, repository, repository_ruleset, branch_protection_rule, pull_request, check_run, check_suite, member, team
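
When auditing the App, the subscribed events can be compared against this list. This is a sketch that assumes the configured event names have already been collected, for example by copying them from the App settings page. The names `REQUIRED_EVENTS` and `missing_events` are hypothetical.

```bash
# The nine events this runbook requires.
REQUIRED_EVENTS="push repository repository_ruleset branch_protection_rule pull_request check_run check_suite member team"

# Hypothetical helper: takes the configured event names as arguments and
# prints each required event that is absent. Exit status 1 if any are missing.
missing_events() {
  local configured=" $* "
  local event
  local missing=0
  for event in $REQUIRED_EVENTS; do
    case "$configured" in
      *" $event "*) ;;
      *) printf '%s\n' "$event"; missing=1 ;;
    esac
  done
  return "$missing"
}

# Example:
#   missing_events push repository pull_request check_run check_suite member team
#   prints repository_ruleset and branch_protection_rule
```

Padding the argument list with spaces makes the match whole-word, so `repository` does not satisfy `repository_ruleset`.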

Troubleshooting

Cloud Run is not responding

  1. Check logs:
    gcloud run services logs read eigenoid-safe-settings \
    --region europe-west1 --project eigenoid-prd --limit 50
  2. Check instances: min instances must be 1. If it is set to 0 (a misconfiguration), webhooks fail because a cold start can exceed GitHub's 10-second delivery timeout.
  3. Check IAM: invoker_iam_disabled must be true, because the org policy blocks the allUsers invoker binding.

Webhook is not arriving

  1. Check delivery logs: App settings → Advanced → Recent Deliveries.
  2. Check URL: must end in /api/github/webhooks.
  3. Check active: the webhook must be enabled.
  4. Check events: all 9 listed events must be selected.

Settings are not being applied

  1. Check CRON: automatic sync runs every 6 hours. To force a sync, push a change to a file in .github/ of the admin repo.
  2. Check config syntax: a YAML error in settings.yml or repos/*.yml can silently prevent application.
  3. Check repo permissions: safe-settings needs Administration: Read & Write on each repo.

Image mirroring fails

  1. Check the mirror SA: the mirroring SA needs roles/artifactregistry.writer.
  2. Check the WIF binding: the SA must be able to authenticate via WIF.
  3. Check the digest: on Apple Silicon, docker pull --platform linux/amd64 may still pull the arm64 image. Pull by an explicit sha256 digest instead.
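
Pulling by digest means using a reference of the form IMAGE@sha256:..., which removes platform selection from the equation. This is a sketch with a hypothetical helper, `pinned_ref`, that builds such a reference and rejects malformed digests. The per-platform digests are assumed to come from a tool such as `docker buildx imagetools inspect`, and the image path in the example is a placeholder.

```bash
# Hypothetical helper: prints IMAGE@DIGEST if DIGEST is "sha256:" followed by
# exactly 64 lowercase hex characters; otherwise fails without output.
pinned_ref() {
  local image="$1"
  local digest="$2"
  local hex="${digest#sha256:}"
  # If the prefix was absent, hex equals digest and the reference is rejected.
  if [ "$hex" = "$digest" ] || [ "${#hex}" -ne 64 ]; then
    return 1
  fi
  case "$hex" in
    *[!0-9a-f]*) return 1 ;;
  esac
  printf '%s@%s\n' "$image" "$digest"
}

# Example (IMAGE and AMD64_DIGEST are placeholders):
#   docker pull "$(pinned_ref "$IMAGE" "$AMD64_DIGEST")"
```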

Error: "A JSON web token could not be decoded" (401)

The private key is invalid or expired. Follow the PRIVATE_KEY rotation procedure above.

Emergency procedures

Image rollback

# List available revisions
gcloud run revisions list --service=eigenoid-safe-settings \
--region europe-west1 --project eigenoid-prd

# Route traffic to a previous revision
gcloud run services update-traffic eigenoid-safe-settings \
--region europe-west1 --project eigenoid-prd \
--to-revisions=REVISION_NAME=100
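
After a rollback with --to-revisions, traffic stays pinned to the chosen revision, so a later revision may not receive traffic until routing is restored. Whether a Terraform apply resets this depends on how iac-platform defines the traffic block, which this runbook does not state. The sketch below wraps both directions. The function name `route_traffic` and the `DRY_RUN` switch are hypothetical, while `--to-latest` is a standard flag of `gcloud run services update-traffic`.

```bash
# Hypothetical wrapper around `gcloud run services update-traffic`.
#   route_traffic REVISION_NAME  -> send 100% of traffic to that revision
#   route_traffic latest         -> follow the latest ready revision again
# With DRY_RUN=1 the command is printed instead of executed.
route_traffic() {
  local target="$1"
  local flag
  if [ "$target" = "latest" ]; then
    flag="--to-latest"
  else
    flag="--to-revisions=${target}=100"
  fi
  set -- gcloud run services update-traffic eigenoid-safe-settings \
    --region europe-west1 --project eigenoid-prd "$flag"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Example:
#   DRY_RUN=1 route_traffic latest
```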

Disable webhook (emergency)

On the App page → Webhook → uncheck "Active". This stops all events, so safe-settings will not apply config until the webhook is reactivated.

Force a manual sync

cd platform-settings
echo "# sync trigger $(date)" >> .github/settings.yml
git commit -am "fix: force config sync" && git push

Verification

  • Logs show a successful sync after a push to the admin repo.
  • A manual change to a ruleset is reverted within seconds (drift prevention).
  • CRON sync runs every 6 hours (verify in logs).
  • Dry-run check appears on PRs to the admin repo.

Escalation

| Situation | Action |
| --- | --- |
| Cloud Run not serving | Check logs, attempt rollback. Contact @shoootyou. |
| Webhook broken | Check URL, secret, events. If unresolved, temporarily disable the webhook. |
| Config corruption | Revert the commit in platform-settings and force a sync. |

References