Context
ADR-0010 split the original monorepo into eigenoid/core and eigenoid/studio (renamed to app-studio per ADR-0011). Studio today runs as a single-user local orchestration: start.sh boots a Vite frontend + a FastAPI backend that spawns local subprocesses (eigenoid up --isolate, Docker CLI, docker exec) against a shared .studio-workspace directory, with secrets in process env vars and no auth.
Studio's strategic direction is multi-tenant SaaS as the primary delivery model (single-tenant managed and self-hosted are deferred Enterprise SKUs that reuse the same artifacts). To make that real, Studio needs to decide:
- Auth: how do users sign in, and where do tenant membership and roles live?
- Tenant data isolation: at what layer is tenant_id enforced?
- Per-tenant cloud identities: who provisions GCP service accounts for tenant runtime workloads?
- Hosting: where does the frontend live, where does the backend live, and how do they reach each other?
While Studio was being scoped, @shoootyou shipped a substantial set of platform decisions and services that constrain the answers:
- ADR-0009: 3-project GCP architecture (eigenoid-dev/-qa/-prd), WIF for CI auth, europe-west1, no static SA keys.
- ADR-0014: CF Workers as the API Gateway in front of svc-access, with mandatory admin/public domain separation (*.eigenoid.services for admin behind CF Access; *.eigenoid.com for public/clients with magic-link auth), header sanitization, strict CORS, and origin-isolated tunnels (Cloud Run not publicly exposed).
- eigenoid/svc-access (dev branch, Go + Chi + GORM): production access-control service with Cloudflare Access JWT for admins, magic-link sessions for members/clients (one-time tokens hashed in Postgres, exchanged for HttpOnly session cookies), per-tenant GCP service accounts auto-provisioned with key rotation (key_rotation_days, default 90) and reconciled WIF bindings, resource catalog + grants model with an async Pub/Sub-driven provisioner, and application-layer tenant isolation (no Postgres RLS: handlers run in an authenticated context that carries client_id, the repository layer requires it on every query, cross-tenant tests are CI gates).
- eigenoid/app-access-admin: React 19 + Vite + Tailwind + shadcn admin SPA on Cloudflare Pages, with a Pages Function that proxies Cf-Access-Jwt-Assertion → X-Forwarded-Cf-Access-Jwt to svc-access.
If Studio answers these questions independently (e.g., picks Firebase Auth, picks Postgres RLS, builds its own SA provisioner), the org ends up with two parallel identity models, two isolation patterns, and two IAM control planes to keep in sync. For a small team, this is gratuitous debt.
Decision
Studio adopts multi-tenant SaaS as its primary delivery model (Strategy A), and aligns its stack with svc-access and the existing platform decisions. Concretely:
1. Hosting
- Frontend: Cloudflare Pages, studio.eigenoid.com (public/client surface). Built with Vite/React/TS/Tailwind, matching app-access-admin.
- Backend: Cloud Run in eigenoid-prd (ADR-0009) for production, eigenoid-dev for development. Region europe-west1. Backend not publicly exposed: it is reached via Cloudflare Tunnel + a CF Workers gateway that follows the ADR-0014 pattern (api-dev.eigenoid.com for the public/client routes, api-dev.eigenoid.services for any admin routes). Frontend reads VITE_BACKEND_URL at build time and calls the public gateway; CORS is strict-origin (studio.eigenoid.com only in MVP).
- Persistence: Cloud SQL Postgres in eigenoid-prd, IAM auth via the GCP Cloud SQL connector, no static DB passwords. Pool sized to match svc-access's shape (max=25, idle=5, lifetime=5min) per Cloud Run instance.
- Artifacts: Artifact Registry in eigenoid-prd for Studio backend image and tenant agent images. Workspace artifacts > 256 KB and build contexts go to GCS with per-tenant prefixes.
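The strict-origin CORS rule above can be illustrated with a minimal sketch. It is written in Python for readability only; under ADR-0014 the check would presumably live in the CF Workers gateway. The function name, the allowlist constant, and the choice to send credentials are assumptions, not part of this ADR.

```python
from typing import Dict, Optional

# Assumption: the MVP allowlist holds only the Studio frontend origin.
ALLOWED_ORIGINS = frozenset({"https://studio.eigenoid.com"})


def cors_headers(origin: Optional[str]) -> Dict[str, str]:
    """Return CORS response headers for an allowlisted origin, else no headers.

    Matching is exact string equality: no wildcards, no suffix matching.
    """
    if origin is None or origin not in ALLOWED_ORIGINS:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        # Assumption: session cookies are sent cross-origin to the gateway.
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",
    }
```

Exact matching is the point of the sketch: a look-alike origin such as https://studio.eigenoid.com.evil.example receives no CORS headers at all.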
2. Auth (defer to the same model as svc-access and ADR-0014)
- Public/tenant users sign in via magic links (email-based one-time tokens, hashed in DB, exchanged for HttpOnly session cookies scoped to .eigenoid.com). No Firebase, no third-party IdP. Studio backend issues magic links during MVP (Phase 4 of the implementation plan); a follow-up milestone moves issuance into svc-access so the org converges on a single emitter.
- Internal admin surfaces of Studio (operations panel, support tools) sit behind Cloudflare Access on *.eigenoid.services and follow the same Pages Function proxy pattern as app-access-admin (Cf-Access-Jwt-Assertion → X-Forwarded-Cf-Access-Jwt).
- Tenant membership and roles are authoritative in Postgres (Studio's own DB), never derived from token claims alone. Roles in MVP: admin, super_admin for tenant-staff; member roles separate.
- Multi-IdP federation (SAML, OIDC) is deferred; if a customer demands it, a follow-up ADR extends this one without replacing magic links as the default.
3. Tenant data isolation: application layer (not RLS)
- Every tenant-scoped table carries tenant_id UUID NOT NULL, indexed.
- Every handler runs inside an authenticated request context that carries tenant_id, derived server-side from the session cookie or admin JWT, never from a request body, query parameter, header, or unverified token claim.
- Every query that touches tenant-scoped data goes through a repository layer that is the only place writing SQL/ORM. The repository rejects (by type contract or runtime assertion) any query to a tenant-scoped table that omits the tenant_id filter.
- Cross-tenant integration tests are mandatory CI gates: for every tenant-scoped table, automated tests authenticated as tenant A attempt to read/write tenant B's data and assert empty results / 404.
- Background jobs run with an explicit tenant context (in the job payload) or with a clearly-named "system" context used only by a small allowlist of operational queries.
- Postgres RLS is explicitly deferred as a defense-in-depth enhancement. It is additive (USING (tenant_id = ...) policies do not break queries that already filter), so we can layer it on later once a per-request transaction wrapper exists and once svc-access and Studio agree on a common pattern. Until then, the repository chokepoint + cross-tenant tests are the load-bearing safeguard. Accepted trade-off for stack consistency with svc-access.
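As an illustration of the repository chokepoint and the cross-tenant test it enables, here is a minimal sketch. It uses sqlite3 as a stand-in for Cloud SQL Postgres, and the pipelines table, the RequestContext type, and the PipelineRepository class are hypothetical names, not Studio's actual schema or code.

```python
import sqlite3
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestContext:
    """Built server-side from the session cookie or admin JWT, never from request input."""
    tenant_id: str


class PipelineRepository:
    """The only place that writes SQL for the (hypothetical) pipelines table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    @staticmethod
    def _require_tenant(ctx: object) -> None:
        # Runtime assertion: no tenant context, no query.
        if not isinstance(ctx, RequestContext) or not ctx.tenant_id:
            raise PermissionError("tenant-scoped query without tenant context")

    def create(self, ctx: RequestContext, name: str) -> str:
        self._require_tenant(ctx)
        pipeline_id = str(uuid.uuid4())
        self._conn.execute(
            "INSERT INTO pipelines (id, tenant_id, name) VALUES (?, ?, ?)",
            (pipeline_id, ctx.tenant_id, name),
        )
        return pipeline_id

    def get(self, ctx: RequestContext, pipeline_id: str) -> Optional[str]:
        self._require_tenant(ctx)
        row = self._conn.execute(
            # tenant_id is always part of the predicate, not an optional filter.
            "SELECT name FROM pipelines WHERE tenant_id = ? AND id = ?",
            (ctx.tenant_id, pipeline_id),
        ).fetchone()
        return row[0] if row else None


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pipelines ("
        "id TEXT PRIMARY KEY, tenant_id TEXT NOT NULL, name TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_pipelines_tenant ON pipelines (tenant_id)")
    return conn
```

A cross-tenant CI test under this sketch would create a row as tenant B, call get with tenant A's context and the same id, and assert that the result is None, which a handler would translate to a 404.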
4. Per-tenant GCP service accounts: delegate to svc-access
svc-access already provisions per-client and per-member SAs, manages key rotation, and reconciles WIF bindings. Studio does not build a parallel provisioner.
- When Studio deploys a tenant agent, the backend calls a svc-access endpoint (contract to be finalized with @shoootyou) requesting a runtime SA scoped to that tenant. svc-access returns the SA email and binding ack; Studio caches it and reuses it for that tenant's subsequent agent deploys until rotation.
- Studio retains ownership of: egress controls (VPC connector + Cloud NAT allowlist) per agent service, deploy/runtime quotas, the agents/ namespace in Artifact Registry, the per-tenant kill switch, the Cloud Run Jobs that build agent images, the Cloud Run Services that run agents, and the binding between an agent and the specific Secret Manager entries declared by its pipeline (within the scope svc-access permitted).
- Fallback: if svc-access cannot be the authoritative agent-SA provisioner before the first hosted agent ships, Studio falls back to one Terraform-provisioned SA per tenant (created at onboarding), shared across that tenant's agents. This ADR would be revised in that case.
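The caching behaviour described above (call svc-access on first deploy, reuse the SA email until rotation) could be sketched as follows. Because the svc-access contract is not finalized, the provisioning call is injected as a plain callable; RuntimeSaCache, request_sa, and invalidate are hypothetical names.

```python
from typing import Callable, Dict


class RuntimeSaCache:
    """Caches the runtime SA email per tenant so only first-deploy and
    rotation paths need to reach svc-access."""

    def __init__(self, request_sa: Callable[[str], str]) -> None:
        # request_sa is a placeholder for the svc-access client call whose
        # contract is still to be finalized; it takes a tenant id and
        # returns the SA email.
        self._request_sa = request_sa
        self._by_tenant: Dict[str, str] = {}

    def get(self, tenant_id: str) -> str:
        if tenant_id not in self._by_tenant:
            self._by_tenant[tenant_id] = self._request_sa(tenant_id)
        return self._by_tenant[tenant_id]

    def invalidate(self, tenant_id: str) -> None:
        """Drop the cached entry, for example when a rotation is signalled."""
        self._by_tenant.pop(tenant_id, None)
```

Injecting the call keeps the sketch independent of the eventual endpoint and would let the Terraform fallback be swapped in as a different callable. A real backend running several Cloud Run instances would likely persist this mapping in Postgres rather than in process memory.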
5. Generated-agent runtime isolation
- One Cloud Run service per agent, named a-{tenantSlug}-{agentSlug} (≤ 49 chars, the Cloud Run service-name limit; UUIDs would overflow).
- VPC connector + Cloud NAT with an outbound allowlist (LLM endpoints + tenant-declared destinations).
- Per-tenant runtime SA from §4, scoped to that tenant's secrets only.
- Hard per-tenant quotas: max agents, max requests/min, max LLM tokens/day, max build jobs/day. A kill switch per tenant and per agent.
- Audit every tool call (arguments redacted by policy), every build, every deploy, every chat invocation (action only, never message body in general logs).
- Cloud Run per-region quota (default 1000 services) is documented; cleanup script for stale agents is required before opening to non-internal users; "shared tenant runner" architecture documented as the fallback when a tenant exceeds limits.
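The naming rule for agent services can be checked mechanically. The sketch below builds a-{tenantSlug}-{agentSlug} and enforces the 49-character limit stated in this ADR; the slug pattern (lowercase letters, digits, single hyphens) and the function name are assumptions made for illustration.

```python
import re

MAX_SERVICE_NAME_LEN = 49  # Cloud Run service-name limit cited in this ADR

# Assumed slug shape: lowercase alphanumeric groups joined by single hyphens.
_SLUG = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def agent_service_name(tenant_slug: str, agent_slug: str) -> str:
    """Build the Cloud Run service name for an agent, or raise ValueError."""
    for slug in (tenant_slug, agent_slug):
        if not _SLUG.fullmatch(slug):
            raise ValueError(f"invalid slug: {slug!r}")
    name = f"a-{tenant_slug}-{agent_slug}"
    if len(name) > MAX_SERVICE_NAME_LEN:
        raise ValueError(
            f"service name is {len(name)} chars, limit is {MAX_SERVICE_NAME_LEN}"
        )
    return name
```

Two 36-character UUIDs would produce a 75-character name (2 + 36 + 1 + 36), which is why the rule uses short slugs instead of raw identifiers.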
6. License and branch model
- Studio repos adopt BSL 1.1 (matching the rest of the org).
- Branch model is dev → main (matching svc-access and app-access-admin). Conventional Commits enforced on PR titles.
The implementation plan that turns this ADR into code lives in eigenoid/app-studio SAAS_PLAN.md (v3.2 and later).
Consequences
- One auth model in the org (CF Access JWT for admin + magic-link sessions for public). Studio reuses revocation, audit, and operational practices that svc-access already runs.
- One tenant-isolation pattern (application layer, repository chokepoint, cross-tenant CI tests). One way to reason about isolation across the org.
- One source of truth for per-tenant GCP identities (svc-access). Studio is no longer a parallel IAM admin.
- One hosting topology: Pages frontend + Cloud Run backend behind a CF Workers gateway, fitting cleanly into ADR-0014 and ADR-0009.
- Trade-off accepted: without Postgres RLS, a handler that forgets the tenant_id filter leaks data silently. The repository chokepoint + mandatory cross-tenant CI tests are the mitigation. This is the highest-attention-required risk in the data layer until RLS is layered on as defense-in-depth.
- Trade-off accepted: Studio depends on svc-access being available to deploy new agents. Existing agents keep running. Mitigation: cache SA email per tenant; only first-deploy and rotation paths require the call.
- Trade-off accepted: during MVP, Studio emits its own magic links (temporary duplication with svc-access). A follow-up milestone moves issuance into svc-access.
- Eliminates the dependency on Firebase (and any other third-party IdP) and the operational cost / latency of custom-claim propagation.
- Federation with corporate IdPs (SAML, OIDC) is not available out of the box. Acceptable for the demo audience and early-customer profile; revisited if a customer demands it.
- Studio's runtime model changes radically: today's "API process spawns local subprocesses on a shared workspace" becomes "API is stateless, builds run as Cloud Run Jobs, agents are Cloud Run services proxied via CF Workers". The implementation plan stages this as spikes + a vertical slice rather than a big-bang migration to keep the local demo path intact during the transition.
- Cost at hosted-demo scale (≤ 50 internal users): $50–175/month before LLM tokens. The surprise line item is VPC + Cloud NAT ($30–50/month); accepted as the cost of egress allowlisting.
Alternatives considered
- Firebase Auth for Studio (the original draft): rejected. Introduces a second identity model in an org that already has a working one via svc-access; the supposed "GCP-native, fast, cheap" advantage does not compensate for fragmentation, custom-claim propagation latency, and partial-revocation semantics.
- Postgres RLS as the primary isolation layer (the original draft): rejected for v1. Diverges from svc-access. Documented as a deferred defense-in-depth enhancement, not discarded.
- Studio builds its own per-tenant SA provisioner: rejected. Duplicates svc-access capabilities (key rotation, WIF reconciliation, audit). Would make Studio a parallel IAM admin in GCP.
- Schema-per-tenant or database-per-tenant isolation: rejected for the operational cost (per-tenant migrations, per-tenant connections, Cloud SQL cost) at the volume Studio targets in the next year.
- Direct browser-to-agent connections (skip the chat proxy): rejected. Loses central audit, central rate-limiting, and the security boundary; revisited only if proxy latency becomes a measured problem.
- Single shared agent runtime across tenants: rejected for blast-radius and IAM-scoping reasons. Documented as a fallback only when a tenant approaches the Cloud Run per-region service quota.
- Skip CF Workers gateway and expose Cloud Run directly: rejected. Violates ADR-0014's origin-isolation model and loses header sanitization + strict CORS at the edge.
References
- ADR-0009 — GCP multi-project architecture
- ADR-0010 — Split monorepo into core + studio
- ADR-0011 — Repository naming convention
- ADR-0014 — Cloudflare Workers as API Gateway
- ADR-0016 — Studio repo split (companion to this ADR)
- eigenoid/svc-access: reference implementation of the session model, app-layer isolation, and per-tenant SA provisioning
- eigenoid/app-access-admin: reference implementation of the Pages Function CF-Access proxy pattern
- eigenoid/app-studio SAAS_PLAN.md: implementation plan that turns this ADR into code