Gateway VM (cf-proxy + cloudflared)
Repo: iac-api-gateway
Files: gateway/vm.tf, gateway/cf_proxy.py, gateway/network.tf
A single e2-micro VM in europe-west1-b, managed by a Managed Instance Group (MIG).
cf-proxy (cf_proxy.py, port 8080)
A Python HTTP server that sits between cloudflared and Cloud Run.
On startup:
- Reads `INTERNAL_AUTH_TOKEN` from the environment (injected by the systemd unit; sourced from GCP Secret Manager at VM boot).
- Loads `/etc/cf-proxy/routes.json`, a hostname→Cloud Run URL map written by the Terraform startup script.
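The two startup steps can be sketched roughly as follows. This is an illustration, not the code in cf_proxy.py: the function name `load_config`, the fail-fast behaviour on a missing token, and the lower-casing of hostnames are all assumptions.

```python
import json
import os

ROUTES_PATH = "/etc/cf-proxy/routes.json"


def load_config(environ=os.environ, routes_path=ROUTES_PATH):
    """Return (token, routes); raise if either piece of startup config is unusable.

    Hypothetical helper: the real cf_proxy.py may structure this differently.
    """
    token = environ.get("INTERNAL_AUTH_TOKEN", "")
    if not token:
        # Assumption: refuse to start rather than run without an auth token.
        raise RuntimeError("INTERNAL_AUTH_TOKEN is not set")

    with open(routes_path, encoding="utf-8") as fh:
        routes = json.load(fh)
    if not isinstance(routes, dict):
        raise ValueError("routes.json must be a JSON object of hostname -> URL")

    # Assumption: hostnames are matched case-insensitively, so store them lower-cased.
    return token, {host.lower(): url for host, url in routes.items()}
```

Passing `environ` and `routes_path` as parameters keeps the function easy to exercise off the VM, for example with a plain dict and a temporary file.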
On each request:
- Checks the `X-Internal-Auth-Token` header against `INTERNAL_AUTH_TOKEN`. Returns `401` on mismatch.
- Looks up the `Host` header in `routes.json`. Returns `404` if no route exists.
- Fetches a GCP identity token from the instance metadata server for the target Cloud Run URL (audience = the Cloud Run URL). Tokens are cached for 55 minutes with a 5-minute early-refresh buffer.
- Drops hop-by-hop headers and the incoming `Authorization` header. Sets `Authorization: Bearer <gcp-identity-token>`.
- Forwards the request to the Cloud Run service.
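The per-request steps above, minus the final forwarding call, might look like the sketch below. It is a hedged illustration rather than the actual cf_proxy.py: `TokenCache`, `route_request`, and `fetch_identity_token` are hypothetical names, the 55-minute TTL is read as a one-hour token lifetime minus the 5-minute buffer, and dropping the incoming `Host` header (so the HTTP client sets it for the Cloud Run URL) is an assumption. The metadata server path and the `Metadata-Flavor: Google` header are the standard GCP ones.

```python
import hmac
import time
import urllib.parse
import urllib.request

METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/service-accounts/default/identity")

# Hop-by-hop header names as listed in RFC 2616 section 13.5.1.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}


def fetch_identity_token(audience):
    """Ask the instance metadata server for an identity token (only works on GCP)."""
    url = METADATA_URL + "?" + urllib.parse.urlencode({"audience": audience})
    req = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")


class TokenCache:
    """Cache one token per audience for `ttl` seconds (assumed: 60 min - 5 min buffer)."""

    def __init__(self, fetch=fetch_identity_token, ttl=55 * 60, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # audience -> (token, expires_at)

    def get(self, audience):
        now = self._clock()
        entry = self._entries.get(audience)
        if entry and now < entry[1]:
            return entry[0]
        token = self._fetch(audience)
        self._entries[audience] = (token, now + self._ttl)
        return token


def route_request(headers, routes, expected_token, tokens):
    """Return (error_status, target_url, upstream_headers).

    error_status is 401 or 404 when the request must be rejected, else None.
    """
    lowered = {k.lower(): v for k, v in headers.items()}

    # Constant-time comparison avoids leaking the token through timing.
    supplied = lowered.get("x-internal-auth-token", "")
    if not hmac.compare_digest(supplied.encode(), expected_token.encode()):
        return 401, None, None

    host = lowered.get("host", "").split(":")[0].lower()
    target = routes.get(host)
    if target is None:
        return 404, None, None

    # Drop hop-by-hop headers and the caller's Authorization; dropping Host
    # is an assumption so the client sets it for the Cloud Run URL.
    upstream = {k: v for k, v in lowered.items()
                if k not in HOP_BY_HOP and k not in ("authorization", "host")}
    upstream["authorization"] = "Bearer " + tokens.get(target)
    return None, target, upstream
```

Injecting `fetch` and `clock` into `TokenCache` lets the caching behaviour be checked without a metadata server or real waiting.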
Health endpoint: GET /healthz → {"status":"ok"} — used for internal readiness checks.
routes.json format (written at VM boot from vm.tf):
{
"api-admin-internal-dev.eigenoid.services": "https://svc-access-<hash>-ew.a.run.app",
"api-public-internal-dev.eigenoid.services": "https://svc-access-public-<hash>-ew.a.run.app"
}
`routes.json` is templated by Terraform from `data.google_cloud_run_v2_service.*.uri` lookups. Changing `routes_json` in `vm.tf` changes the startup script, which produces a new instance template, which triggers a PROACTIVE MIG replacement.
cloudflared (port 2000 for metrics)
The Cloudflare tunnel daemon. Connects to the Cloudflare edge via QUIC (UDP 7844) with HTTP/2 fallback (TCP 7844); TCP 443 is used for Cloudflare API traffic. It depends on cf-proxy.service in systemd, so cloudflared only starts after cf-proxy is ready.
Health check: GET :2000/ready — returns 200 only when the tunnel is connected. The MIG autohealing policy polls this endpoint; if the VM fails the check after a 300-second initial delay, it is automatically replaced.
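When debugging on the VM, the same readiness signal can be probed by hand. The snippet below is a minimal sketch of such a probe; `tunnel_ready` is a hypothetical helper, not part of the repo, and it only mirrors the "200 means connected" rule described above.

```python
import urllib.error
import urllib.request


def tunnel_ready(base_url="http://127.0.0.1:2000", timeout=2.0):
    """Return True only if GET <base_url>/ready answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers non-2xx replies (HTTPError), refused connections, and timeouts.
        return False


if __name__ == "__main__":
    print("ready" if tunnel_ready() else "not ready")
```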
MIG configuration
| Setting | Value |
|---|---|
| Zone | europe-west1-b |
| Machine type | e2-micro |
| Update policy | PROACTIVE / REPLACE |
| Max surge | 1 |
| Max unavailable | 0 |
| Autohealing delay | 300 s |
| Health check | GET :2000/ready |
PROACTIVE update means any terraform apply that changes the startup script (e.g. routes_json or a new cloudflared version) will automatically replace the running VM with zero downtime — one new VM is created before the old one is removed.
VM network tags (firewall rule selectors)
| Tag | Purpose |
|---|---|
| allow-cf-tunnel | Egress TCP 443 + TCP/UDP 7844 to Cloudflare |
| allow-pga | Egress TCP 443 to Google restricted VIPs (Cloud Run, Secret Manager) |
| allow-github-download | Egress TCP 443 to GitHub (cloudflared binary download at boot) |
| allow-health-check | Ingress from GCP health check probers |
| allow-iap-ssh | Ingress SSH via Identity-Aware Proxy (for debugging) |
Related pages
- Networking — full firewall rule breakdown and subnet details
- Operations — how to force-replace a VM
- Adding a new API — how to add routes to cf-proxy