Gateway VM (cf-proxy + cloudflared)

Repo: iac-api-gateway
Files: gateway/vm.tf, gateway/cf_proxy.py, gateway/network.tf

A single e2-micro VM in europe-west1-b, managed by a Managed Instance Group (MIG).

cf-proxy (cf_proxy.py, port 8080)

A Python HTTP server that sits between cloudflared and Cloud Run.

On startup:

  1. Reads INTERNAL_AUTH_TOKEN from the environment (injected by the systemd unit; sourced from GCP Secret Manager at VM boot).
  2. Loads /etc/cf-proxy/routes.json — a hostname→Cloud Run URL map written by the Terraform startup script.
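The two startup steps can be sketched in Python as follows. This is a minimal illustration, not the contents of cf_proxy.py. The function name `load_config` and the fail-fast behavior when the token is missing are assumptions.

```python
import json
import os
from pathlib import Path


def load_config(routes_path="/etc/cf-proxy/routes.json"):
    """Return (auth_token, routes) as described by the startup steps."""
    # Step 1: the systemd unit is expected to inject the secret as an env var.
    token = os.environ.get("INTERNAL_AUTH_TOKEN")
    if not token:
        # ASSUMPTION: refuse to start rather than run without auth.
        raise RuntimeError("INTERNAL_AUTH_TOKEN is not set")

    # Step 2: hostname -> Cloud Run URL map written by the startup script.
    routes = json.loads(Path(routes_path).read_text(encoding="utf-8"))
    if not isinstance(routes, dict):
        raise ValueError(f"{routes_path} must contain a JSON object")
    return token, routes
```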

On each request:

  1. Checks the X-Internal-Auth-Token header against INTERNAL_AUTH_TOKEN. Returns 401 on mismatch.
  2. Looks up the Host header in routes.json. Returns 404 if no route exists.
  3. Fetches a GCP identity token from the instance metadata server, with the target Cloud Run URL as the audience. Tokens are cached for 55 minutes, so each one is refreshed 5 minutes before its one-hour expiry.
  4. Drops hop-by-hop headers and the incoming Authorization header. Sets Authorization: Bearer <gcp-identity-token>.
  5. Forwards the request to the Cloud Run service.
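The caching in step 3 can be illustrated with a small per-audience cache. The sketch uses the documented GCE metadata identity endpoint (`/computeMetadata/v1/instance/service-accounts/default/identity?audience=...`, which requires the `Metadata-Flavor: Google` header). `TokenCache` and `fetch_identity_token` are hypothetical names. Measuring the 55-minute TTL from the time of the fetch is an assumption about how cf-proxy tracks token age.

```python
import threading
import time
import urllib.parse
import urllib.request

# GCE metadata server endpoint for Google-signed identity tokens.
IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def fetch_identity_token(audience):
    """Request an identity token whose audience is the Cloud Run URL."""
    url = IDENTITY_URL + "?" + urllib.parse.urlencode({"audience": audience})
    req = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("ascii").strip()


class TokenCache:
    """Reuse each audience's token for 55 minutes, i.e. 5 minutes before
    a one-hour identity token expires."""

    def __init__(self, fetch=fetch_identity_token, ttl_seconds=55 * 60,
                 clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}  # audience -> (token, fetched_at)

    def get(self, audience):
        # One lock keeps concurrent handler threads from racing on a refresh.
        with self._lock:
            now = self._clock()
            cached = self._entries.get(audience)
            if cached is not None and now - cached[1] < self._ttl:
                return cached[0]
            token = self._fetch(audience)
            self._entries[audience] = (token, now)
            return token
```

Injecting `fetch` and `clock` lets the cache be tested off GCP. A variant could read the token's JWT `exp` claim instead, which avoids assuming a fixed one-hour lifetime.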

Health endpoint: GET /healthz returns {"status":"ok"}. It is used for internal readiness checks.
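Steps 1, 2 and 4, together with the health endpoint, can be condensed into a pure decision function that a request handler calls before doing any network I/O. This is a hypothetical sketch: `plan_request` does not come from cf_proxy.py, and two behaviors are assumptions. It answers `/healthz` before the token check, and it drops the incoming `Host` header so that the HTTP client sets the `run.app` host, which Cloud Run presumably routes on. `get_token` stands in for the cached token lookup from step 3.

```python
import hmac
import json

# Standard hop-by-hop headers; any header named in Connection is also
# hop-by-hop and must not be forwarded (RFC 7230, section 6.1).
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
})


def plan_request(method, path, headers, routes, expected_token, get_token):
    """Decide how to handle one request.

    Returns ("respond", status, body) for requests answered locally, or
    ("forward", url, upstream_headers) for requests proxied to Cloud Run.
    """
    # ASSUMPTION: the health endpoint is served before the token check.
    if method == "GET" and path == "/healthz":
        body = json.dumps({"status": "ok"}, separators=(",", ":")).encode()
        return ("respond", 200, body)

    lower = {name.lower(): value for name, value in headers.items()}

    # Step 1: constant-time comparison against INTERNAL_AUTH_TOKEN.
    supplied = lower.get("x-internal-auth-token", "")
    if not hmac.compare_digest(supplied.encode(), expected_token.encode()):
        return ("respond", 401, b"unauthorized")

    # Step 2: route on the Host header, ignoring any :port suffix.
    host = lower.get("host", "").split(":", 1)[0].lower()
    base_url = routes.get(host)
    if base_url is None:
        return ("respond", 404, b"no route")

    # Step 4: drop hop-by-hop headers, headers named in Connection, the
    # caller's Authorization and (ASSUMPTION) Host, then attach the ID token.
    named = {t.strip().lower() for t in lower.get("connection", "").split(",") if t.strip()}
    dropped = HOP_BY_HOP | named | {"authorization", "host"}
    upstream = {k: v for k, v in headers.items() if k.lower() not in dropped}
    upstream["Authorization"] = f"Bearer {get_token(base_url)}"

    # Step 5 (left to the caller): send the request to base URL + original path.
    return ("forward", base_url.rstrip("/") + path, upstream)
```

Keeping the decision logic free of I/O means the auth, routing and header rules can be unit-tested without a network or a metadata server.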

routes.json format (written at VM boot from vm.tf):

```json
{
  "api-admin-internal-dev.eigenoid.services": "https://svc-access-<hash>-ew.a.run.app",
  "api-public-internal-dev.eigenoid.services": "https://svc-access-public-<hash>-ew.a.run.app"
}
```

routes.json is templated by Terraform from data.google_cloud_run_v2_service.*.uri lookups. Because routes_json is part of the startup script in vm.tf, changing it produces a new instance template, and the new template triggers a PROACTIVE MIG replacement.
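Because routes.json is generated rather than hand-edited, a quick shape check can catch a bad template before it reaches a VM, for example in CI or at cf-proxy startup. The sketch below is hypothetical. `validate_routes` and its rules (lowercase hostname keys without a port, bare `https://` origins as values) are assumptions inferred from the example above, not checks the repo is known to perform.

```python
import json
from urllib.parse import urlsplit


def validate_routes(text):
    """Parse routes.json text and check it is a {hostname: https-origin} map."""
    routes = json.loads(text)  # JSONDecodeError is a ValueError subclass
    if not isinstance(routes, dict):
        raise ValueError("routes.json must be a JSON object")
    for host, url in routes.items():
        # ASSUMPTION: keys are compared with a lowercased Host header without a port.
        if not host or host != host.lower() or ":" in host or "/" in host:
            raise ValueError(f"bad hostname key: {host!r}")
        if not isinstance(url, str):
            raise ValueError(f"URL for {host!r} must be a string")
        parts = urlsplit(url)
        # Values should be bare service origins such as https://svc-...-ew.a.run.app
        if parts.scheme != "https" or not parts.netloc or parts.path not in ("", "/"):
            raise ValueError(f"bad Cloud Run URL for {host!r}: {url!r}")
    return routes
```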

cloudflared (port 2000 for metrics)

The Cloudflare tunnel daemon. It connects to the Cloudflare edge via QUIC (UDP 7844) and falls back to HTTP/2 over TCP 7844. Its systemd unit depends on cf-proxy.service, so cloudflared is started only after cf-proxy has started.

Health check: GET :2000/ready returns 200 only when the tunnel is connected. The MIG autohealing policy polls this endpoint. Once the 300-second initial delay has passed, a VM that fails the check is automatically replaced.
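To see what the autohealer sees, you can reproduce the check on the VM itself, for example over IAP SSH. The following minimal Python sketch assumes the metrics server is reachable on 127.0.0.1:2000. `tunnel_ready` is a hypothetical helper that, like a GCP HTTP health check, treats only a 200 response as healthy.

```python
import urllib.request


def tunnel_ready(base_url="http://127.0.0.1:2000", timeout=2.0):
    """Return True only if GET <base_url>/ready answers HTTP 200."""
    # Bypass any HTTP(S)_PROXY settings: the probe must hit the local port.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url + "/ready", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Non-2xx statuses raise HTTPError (a URLError, hence OSError);
        # refused connections and timeouts raise OSError subclasses too.
        return False
```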

MIG configuration

| Setting | Value |
|---|---|
| Zone | europe-west1-b |
| Machine type | e2-micro |
| Update policy | PROACTIVE / REPLACE |
| Max surge | 1 |
| Max unavailable | 0 |
| Autohealing initial delay | 300 s |
| Health check | GET :2000/ready |

PROACTIVE update means that any terraform apply that changes the startup script (e.g. routes_json or a new cloudflared version) automatically replaces the running VM without a gap in capacity. Because max surge is 1 and max unavailable is 0, the new VM is created before the old one is removed.
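A toy model of a rolling REPLACE shows why max surge 1 with max unavailable 0 avoids a capacity gap. This is a simplification, not Compute Engine's actual updater. It assumes every new VM is healthy as soon as it is created and ignores timing. `rolling_replace` is a hypothetical name.

```python
def rolling_replace(target, max_surge, max_unavailable):
    """Simplified model of a MIG rolling REPLACE update.

    Returns the sequence of (old_vms, new_vms) states. Assumes each new VM
    becomes healthy immediately and ignores timing and zonal details.
    """
    old, new = target, 0
    states = [(old, new)]
    while old > 0 or new < target:
        # Create new VMs, never exceeding target + max_surge in total.
        add = max(0, min(target - new, target + max_surge - (old + new)))
        if add:
            new += add
            states.append((old, new))
        # Delete old VMs, never dropping below target - max_unavailable in total.
        remove = max(0, min(old, old + new - (target - max_unavailable)))
        if remove:
            old -= remove
            states.append((old, new))
        if not add and not remove:
            raise ValueError("max_surge and max_unavailable cannot both be 0")
    return states
```

With the gateway's settings, `rolling_replace(1, 1, 0)` yields `[(1, 0), (1, 1), (0, 1)]`, so at least one VM is running at every step. The opposite setting, `rolling_replace(1, 0, 1)`, passes through `(0, 0)`. For a single-VM gateway, that state would mean no connected tunnel during the swap.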

VM network tags (firewall rule selectors)

| Tag | Purpose |
|---|---|
| allow-cf-tunnel | Egress TCP 443 + TCP/UDP 7844 to Cloudflare |
| allow-pga | Egress TCP 443 to Google restricted VIPs (Cloud Run, Secret Manager) |
| allow-github-download | Egress TCP 443 to GitHub (cloudflared binary download at boot) |
| allow-health-check | Ingress from GCP health check probers |
| allow-iap-ssh | Ingress SSH via Identity-Aware Proxy (for debugging) |
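Expressing the tag-to-traffic mapping as data makes it easy to check whether a given flow is covered. This is a hypothetical model. Peers are symbolic labels rather than CIDR ranges. The health-check port 2000 and the SSH port 22 are inferred, not stated in the table. The egress rules only restrict traffic if the network also has a lower-priority deny-all egress rule, because VPC firewalls otherwise allow all egress by default.

```python
# Hypothetical model: (direction, protocol, port, peer) tuples per network tag.
# Peers are symbolic labels, not the real IP ranges the rules use.
RULES = {
    "allow-cf-tunnel": {
        ("egress", "tcp", 443, "cloudflare"),
        ("egress", "tcp", 7844, "cloudflare"),
        ("egress", "udp", 7844, "cloudflare"),
    },
    "allow-pga": {("egress", "tcp", 443, "google-restricted-vips")},
    "allow-github-download": {("egress", "tcp", 443, "github")},
    # ASSUMPTION: probers target the cloudflared metrics port.
    "allow-health-check": {("ingress", "tcp", 2000, "gcp-health-check-probers")},
    # ASSUMPTION: standard SSH port via IAP TCP forwarding.
    "allow-iap-ssh": {("ingress", "tcp", 22, "iap")},
}


def allowed(vm_tags, direction, protocol, port, peer):
    """True if any rule selected by the VM's tags permits this flow.

    Anything not matched is treated as denied (ASSUMPTION: deny-all egress exists).
    """
    flow = (direction, protocol, port, peer)
    return any(flow in RULES.get(tag, set()) for tag in vm_tags)


GATEWAY_TAGS = set(RULES)  # the gateway VM carries all five tags
```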