Context
Prior to RFC-007 (Plan 142), Eigenoid Access authenticated users with a single factor: a magic-link token redeemed from an email. No second factor existed. The threat model for the access portal — invitation-only, high-privilege users managing infrastructure and tenant data — required a stronger auth posture.
RFC-007 added a multi-factor authentication layer with three methods:
- TOTP (Time-based One-Time Passwords, RFC 6238) — authenticator app codes
- WebAuthn passkeys — platform biometrics or hardware security keys
- Email OTP — a 6-digit code sent by email as a recovery/backup method
Several non-trivial architectural decisions were made during implementation. This ADR documents each one and its rationale.
The full implementation landed across svc-access, app-access-public, and app-access-admin. RFC-007 is superseded by RFC-008 (auth flow redesign), which expanded the auth model; the decisions below remain the technical foundation.
Decision
Seven independent but related decisions were made as part of RFC-007. They are documented together because they form a coherent design.
1. TOTP algorithm and parameters: RFC 6238 defaults (SHA-1, 160-bit secret, 6 digits, 30 s)
TOTP tokens are generated with HMAC-SHA-1, a 160-bit (20-byte) randomly generated secret, a 6-digit output, and a 30-second period. These are the defaults defined in RFC 6238.
The primary constraint is authenticator app interoperability. Google Authenticator, Authy, 1Password, Bitwarden, and Microsoft Authenticator all implement the RFC 6238 defaults. SHA-256 and SHA-512 variants exist in the spec but are not universally supported: several widely used apps silently fall back to SHA-1 or reject the enrollment QR code entirely.
SHA-1 is deprecated for signatures and certificates; it is not deprecated for HMAC. The collision attacks that motivated SHA-1's deprecation do not carry over to HMAC-SHA-1 in a TOTP context. HMAC is a keyed MAC whose security does not depend on the collision resistance of the underlying hash, and an attacker who lacks the secret cannot compute valid codes. The risk is accepted.
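A minimal sketch of these parameters using only the Python standard library. It applies RFC 6238's time-step counter to the RFC 4226 HMAC-plus-dynamic-truncation step. The ±1-step clock-skew window in `verify_totp` and all function names are illustrative assumptions, not taken from svc-access.

```python
import base64
import hashlib
import hmac
import secrets
import struct
import time
from typing import Optional

PERIOD_SECONDS = 30  # RFC 6238 default time step
DIGITS = 6           # RFC 6238 default code length
SECRET_BYTES = 20    # 160-bit secret (matches the SHA-1 output size)


def new_totp_secret() -> bytes:
    """Generate a fresh random 160-bit TOTP secret."""
    return secrets.token_bytes(SECRET_BYTES)


def otpauth_secret(secret: bytes) -> str:
    """Base32 encoding used in otpauth:// enrollment URIs, padding stripped."""
    return base64.b32encode(secret).decode("ascii").rstrip("=")


def totp_at(secret: bytes, unix_time: int, digits: int = DIGITS) -> str:
    """Compute the TOTP code for a Unix timestamp (HMAC-SHA-1, RFC 6238)."""
    counter = unix_time // PERIOD_SECONDS
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # RFC 4226 dynamic truncation
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**digits).zfill(digits)


def verify_totp(secret: bytes, code: str, unix_time: Optional[int] = None,
                window: int = 1) -> bool:
    """Accept a code from the current step or +/- `window` steps (clock skew)."""
    now = int(time.time()) if unix_time is None else unix_time
    for step in range(-window, window + 1):
        t = now + step * PERIOD_SECONDS
        if t < 0:
            continue  # no time steps before the Unix epoch
        # Constant-time comparison on bytes avoids leaking how many digits matched.
        if hmac.compare_digest(totp_at(secret, t).encode(), code.encode()):
            return True
    return False
```

With the RFC 6238 Appendix B SHA-1 test secret (ASCII `12345678901234567890`), `totp_at(secret, 59, digits=8)` yields `94287082`, matching the published vector; the 6-digit code is its last six digits, `287082`.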
2. TOTP secrets: AES-GCM encryption at rest, key in GCP Secret Manager
TOTP secrets are encrypted with AES-GCM before being stored in the totp_devices table. The encryption key lives in GCP Secret Manager and is loaded at service startup.
This is defense-in-depth: if the database is compromised without simultaneous Secret Manager compromise, TOTP seeds remain opaque. Key rotation is identified as future work — the current implementation does not automatically re-encrypt existing secrets on key rotation.
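A sketch of the encrypt/decrypt round trip, assuming the third-party `cryptography` package and a 256-bit key already loaded from Secret Manager at startup (the fetch itself is omitted). Storing a random 96-bit nonce in front of the ciphertext and binding the device ID as associated data are illustrative choices, not documented svc-access behavior.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_BYTES = 12  # 96-bit nonce, the recommended size for AES-GCM


def encrypt_totp_secret(key: bytes, secret: bytes, device_id: str) -> bytes:
    """Encrypt a TOTP seed for storage; returns nonce || ciphertext || tag."""
    nonce = os.urandom(NONCE_BYTES)  # must never repeat under the same key
    # The device ID is authenticated (not encrypted) as associated data, so a
    # blob copied into another device's row fails to decrypt.
    ciphertext = AESGCM(key).encrypt(nonce, secret, device_id.encode("utf-8"))
    return nonce + ciphertext


def decrypt_totp_secret(key: bytes, blob: bytes, device_id: str) -> bytes:
    """Recover the seed; raises InvalidTag on a wrong key, wrong row, or tampering."""
    nonce, ciphertext = blob[:NONCE_BYTES], blob[NONCE_BYTES:]
    return AESGCM(key).decrypt(nonce, ciphertext, device_id.encode("utf-8"))


if __name__ == "__main__":
    key = AESGCM.generate_key(bit_length=256)  # stands in for the Secret Manager key
    blob = encrypt_totp_secret(key, os.urandom(20), "device-123")
    try:
        decrypt_totp_secret(key, blob, "device-456")
    except InvalidTag:
        print("ciphertext is bound to its device row")
```

A database dump alone yields only these opaque blobs; decrypting them also requires the key held in Secret Manager, which is the defense-in-depth property described above.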
3. WebAuthn attestation policy: "none"
Registration requests are sent with attestation: "none". The server does not request or verify authenticator attestation.
Attestation verification would require maintaining the FIDO Metadata Service (FIDO MDS) allowlist, which adds operational complexity (MDS sync, stale entry handling, user lockout on unlisted devices) with no meaningful security return for this system. Eigenoid Access is invitation-only — trust is already established at the invitation step. There is no need to verify the hardware provenance of a user's authenticator.
Accepted authenticators: platform biometrics (Touch ID, Face ID, Windows Hello), hardware security keys (YubiKey, etc.), and password managers with WebAuthn support.
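For illustration, the registration options a `/begin` handler might return, written as a Python dict that mirrors the WebAuthn `PublicKeyCredentialCreationOptions` JSON shape. The RP ID, timeout, algorithm list, and `authenticatorSelection` values are assumptions; only `attestation: "none"` comes from this ADR.

```python
import base64
import secrets
from typing import Any, Dict, Tuple


def b64url(data: bytes) -> str:
    """Unpadded base64url, the encoding WebAuthn JSON uses for binary fields."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def registration_options(user_handle: bytes, email: str,
                         display_name: str) -> Tuple[Dict[str, Any], bytes]:
    """Build creation options for /begin; also returns the raw challenge to store."""
    challenge = secrets.token_bytes(32)
    options = {
        "rp": {"id": "access.example.com", "name": "Eigenoid Access"},  # hypothetical RP ID
        "user": {"id": b64url(user_handle), "name": email, "displayName": display_name},
        "challenge": b64url(challenge),
        "pubKeyCredParams": [
            {"type": "public-key", "alg": -7},    # ES256
            {"type": "public-key", "alg": -257},  # RS256
        ],
        "timeout": 300_000,  # milliseconds; assumed to match the ~5 min challenge TTL
        "attestation": "none",  # decision 3: no attestation requested or verified
        "authenticatorSelection": {
            "residentKey": "preferred",
            "userVerification": "preferred",
        },
    }
    return options, challenge
```

Because attestation is `"none"`, the `/finish` handler has no attestation statement to validate and no FIDO MDS lookup to perform; it still verifies the challenge, origin, and RP ID hash before storing the credential public key.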
4. WebAuthn challenge storage: database (sessions table), not in-memory
WebAuthn registration and login flows require a challenge to be generated at /begin and verified at /finish. The challenge must be accessible across both requests.
Cloud Run auto-scales horizontally and restarts instances without notice. In-memory storage would cause verification failures when the /finish request lands on a different instance than the one that generated the challenge. The sessions table (the same store used for session tokens) is used instead.
Each challenge is stored with a short TTL (~5 minutes). A challenge record is deleted immediately upon successful verification. Failed attempts let the record expire naturally.
The cost is an extra DB round-trip on every login and registration attempt. This is accepted — the alternative (sticky sessions or a shared cache) adds operational complexity that is not warranted at current scale.
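A minimal sketch of the `/begin` and `/finish` storage steps. It uses `sqlite3` as a stand-in for the production database so it runs anywhere; the `sessions` columns and the `kind` discriminator are hypothetical. Deleting with a conditional `DELETE` and checking `rowcount` means that if two `/finish` requests race on the same challenge, only one can claim it.

```python
import secrets
import sqlite3
import time
from typing import Optional, Tuple

CHALLENGE_TTL_SECONDS = 5 * 60  # "~5 minutes"

SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id         TEXT PRIMARY KEY,
    kind       TEXT NOT NULL,  -- hypothetical discriminator column
    user_id    TEXT NOT NULL,
    challenge  BLOB,
    expires_at REAL NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def store_challenge(db: sqlite3.Connection, user_id: str,
                    now: Optional[float] = None) -> Tuple[str, bytes]:
    """/begin: persist a fresh challenge so any instance can verify /finish."""
    now = time.time() if now is None else now
    challenge_id, challenge = secrets.token_urlsafe(16), secrets.token_bytes(32)
    with db:
        db.execute(
            "INSERT INTO sessions (id, kind, user_id, challenge, expires_at) "
            "VALUES (?, 'webauthn_challenge', ?, ?, ?)",
            (challenge_id, user_id, challenge, now + CHALLENGE_TTL_SECONDS),
        )
    return challenge_id, challenge


def load_challenge(db: sqlite3.Connection, challenge_id: str,
                   now: Optional[float] = None) -> Optional[bytes]:
    """/finish: fetch an unexpired challenge; failed verifications leave the row to expire."""
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT challenge FROM sessions "
        "WHERE id = ? AND kind = 'webauthn_challenge' AND expires_at > ?",
        (challenge_id, now),
    ).fetchone()
    return row[0] if row else None


def consume_challenge(db: sqlite3.Connection, challenge_id: str) -> bool:
    """After successful verification: delete the row; False if another request got it first."""
    with db:
        cur = db.execute(
            "DELETE FROM sessions WHERE id = ? AND kind = 'webauthn_challenge'",
            (challenge_id,),
        )
    return cur.rowcount == 1
```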
5. Session state machine: immutable transitions, new session per state change
Authentication sessions move through a defined set of states:
| State | Meaning |
|---|---|
| active | Fully authenticated; all required factors verified |
| pending_method_auth | Magic link redeemed; second factor not yet verified |
| pending_enrollment | Admin has required a second factor; user must enroll before proceeding |
Valid transitions:
- magic_link_redeemed → pending_method_auth (if MFA is required)
- pending_method_auth → active (second factor verified)
- active → pending_enrollment (admin forces method enrollment)
These states were introduced in RFC-007 as pending_2fa and pending_totp_enrollment. RFC-008 (Plan 147) renamed them to pending_method_auth and pending_enrollment because they are not TOTP-specific: the same transitions apply to the passkey and email OTP enrollment flows.
Each state transition creates a new session row rather than mutating the existing one. The old session is invalidated; a new session with the target state is issued.
Immutable session records provide a complete audit trail without a separate audit log, because the sessions table is logically append-only. They also eliminate race conditions: a concurrent request holding the old session ID receives a 401 after the transition instead of observing a mid-flight state mutation.
The cost is more rows in the sessions table. Periodic cleanup via a background job or a scheduled Cloud Run Job is identified as future work.
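The transition rule can be sketched as one transaction that invalidates the old row and inserts its successor. Again `sqlite3` stands in for the real database, and the `parent_id` and `invalidated` columns are hypothetical. The conditional `UPDATE ... WHERE invalidated = 0` is what turns a concurrent second request into a failure (a 401 at the HTTP layer) rather than a double upgrade.

```python
import secrets
import sqlite3
from typing import List, Optional

# State-to-state transitions from the list above. Redeeming a magic link
# creates the first row of a chain rather than transitioning an existing one.
ALLOWED = {
    ("pending_method_auth", "active"),  # second factor verified
    ("active", "pending_enrollment"),   # admin forces method enrollment
}

SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id          TEXT PRIMARY KEY,
    user_id     TEXT NOT NULL,
    state       TEXT NOT NULL,
    parent_id   TEXT,                      -- predecessor in the chain (hypothetical)
    invalidated INTEGER NOT NULL DEFAULT 0 -- hypothetical flag
)
"""


def open_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db


def start_session(db: sqlite3.Connection, user_id: str, state: str) -> str:
    """Issue the first session of a chain, e.g. on magic-link redemption."""
    session_id = secrets.token_urlsafe(32)
    with db:
        db.execute("INSERT INTO sessions (id, user_id, state) VALUES (?, ?, ?)",
                   (session_id, user_id, state))
    return session_id


def transition(db: sqlite3.Connection, session_id: str, target: str) -> Optional[str]:
    """Invalidate `session_id` and issue a new row in `target`.

    Returns None if the old session is already invalidated (for example,
    a concurrent request won) or if the transition is not allowed.
    """
    row = db.execute("SELECT user_id, state FROM sessions WHERE id = ? AND invalidated = 0",
                     (session_id,)).fetchone()
    if row is None or (row[1], target) not in ALLOWED:
        return None
    with db:  # invalidate-old and insert-new commit together
        cur = db.execute("UPDATE sessions SET invalidated = 1 WHERE id = ? AND invalidated = 0",
                         (session_id,))
        if cur.rowcount != 1:
            return None  # lost the race; nothing was written
        new_id = secrets.token_urlsafe(32)
        db.execute("INSERT INTO sessions (id, user_id, state, parent_id) VALUES (?, ?, ?, ?)",
                   (new_id, row[0], target, session_id))
    return new_id


def history(db: sqlite3.Connection, session_id: str) -> List[str]:
    """Walk parent links to reconstruct the state sequence (oldest first)."""
    states: List[str] = []
    current: Optional[str] = session_id
    while current is not None:
        found = db.execute("SELECT state, parent_id FROM sessions WHERE id = ?",
                           (current,)).fetchone()
        if found is None:
            break
        states.append(found[0])
        current = found[1]
    return states[::-1]
```

Walking `parent_id` back from any session reconstructs the exact state sequence it passed through, which is the audit trail described above.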
6. require_totp per-user boolean, configurable by admins
Each user row carries a require_totp boolean (default false). When an admin sets this flag via app-access-admin, the user's next login forces TOTP enrollment before the session transitions to active.
This allows targeted enforcement (e.g., requiring TOTP for privileged accounts) without a global rollout that would lock out users who have not yet enrolled. Admins can also reset a user's TOTP device and re-trigger enrollment from the same admin UI.
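How require_totp might feed into the state chosen when a magic link is redeemed. This is a hypothetical mapping: the `User` fields and the rule that any enrolled second factor triggers `pending_method_auth` are assumptions; the ADR only specifies that a flagged user must enroll TOTP before the session reaches `active`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class User:
    """Hypothetical projection of a user row plus enrolled-method state."""
    require_totp: bool = False  # the admin-controlled flag from this ADR
    has_totp_device: bool = False
    passkey_count: int = 0


def state_after_magic_link(user: User) -> str:
    """Choose the state of the session issued when a magic link is redeemed."""
    if user.require_totp and not user.has_totp_device:
        return "pending_enrollment"   # flagged by an admin: enroll TOTP first
    if user.has_totp_device or user.passkey_count > 0:
        return "pending_method_auth"  # assumption: any enrolled factor must be verified
    return "active"                   # no second factor required or enrolled


def admin_reset_totp(user: User) -> User:
    """Admin resets the TOTP device; with require_totp set, the next login re-enrolls."""
    return replace(user, has_totp_device=False)
```

Note that a flagged user with passkeys but no TOTP device still lands in `pending_enrollment`, because the flag specifically requires TOTP.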
7. Email OTP as backup method: always available, short TTL
When a user has lost access to their TOTP device and has no registered passkeys, they can request a 6-digit OTP sent to their verified email address. The code is valid for a configurable window (default: 10 minutes) and is rate-limited by IP.
Email is a weaker factor than TOTP or WebAuthn. It is treated as a recovery method, not a primary MFA method — it exists solely to prevent permanent account lockout in a self-service way. Users who prefer not to use email OTP are encouraged to register multiple passkeys or maintain TOTP backup codes.
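A sketch of the email OTP lifecycle: a uniformly random 6-digit code, the 10-minute default TTL, single use, and per-IP sliding-window rate limiting on both issuing and verifying. The limiter budgets are hypothetical, and the in-memory dictionaries stand in for database storage; per-instance memory would not survive Cloud Run scale-out, for the same reason given in decision 4.

```python
import hmac
import secrets
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

OTP_TTL_SECONDS = 10 * 60  # ADR default (configurable)


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key (client IP)."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit, self.window = limit, window
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        events = self._events[key]
        while events and events[0] <= now - self.window:
            events.popleft()  # drop events that fell out of the window
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


class EmailOtpService:
    def __init__(self) -> None:
        self._codes: Dict[str, Tuple[str, float]] = {}  # user_id -> (code, expires_at)
        self._issue_limit = SlidingWindowLimiter(limit=5, window=15 * 60)    # hypothetical
        self._verify_limit = SlidingWindowLimiter(limit=10, window=15 * 60)  # hypothetical

    def issue(self, user_id: str, ip: str, now: Optional[float] = None) -> Optional[str]:
        """Create a code to email; None if this IP is rate-limited."""
        now = time.time() if now is None else now
        if not self._issue_limit.allow(ip, now):
            return None
        code = f"{secrets.randbelow(10**6):06d}"  # uniform over 000000-999999
        self._codes[user_id] = (code, now + OTP_TTL_SECONDS)  # replaces any older code
        return code

    def verify(self, user_id: str, ip: str, code: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if not self._verify_limit.allow(ip, now):
            return False
        entry = self._codes.get(user_id)
        if entry is None:
            return False
        expected, expires_at = entry
        if now >= expires_at:
            del self._codes[user_id]  # expired: discard
            return False
        if not hmac.compare_digest(expected.encode(), code.encode()):
            return False
        del self._codes[user_id]  # single use
        return True
```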
Consequences
- Broad authenticator compatibility: the TOTP SHA-1 / 30 s / 6-digit choice means any common authenticator app works out of the box. Users are not blocked by authenticator choice.
- Stateless backend compatible with Cloud Run scale-out: DB-backed challenge storage removes the need for sticky sessions or a shared in-memory cache.
- Full auth audit trail at low cost: immutable session rows make it trivial to reconstruct the exact state sequence a session passed through.
- Operational complexity increased: three MFA methods mean three enrollment and recovery paths, three sets of API endpoints, and three categories of user support ticket. This complexity was accepted as a necessary cost of strong MFA.
- TOTP key rotation not yet implemented: AES-GCM encryption of TOTP secrets is in place, but automated re-encryption on key rotation is future work. Until rotation is implemented, a compromised encryption key compromises all stored TOTP secrets.
- Email OTP weakens the MFA posture as a recovery path: a sophisticated attacker who controls a user's email can bypass TOTP/passkey. This is a known, accepted trade-off — the alternative (admin-only reset) would generate unacceptable support load.
- Unbounded sessions table growth: the sessions table grows without bound until a cleanup mechanism is implemented. Accepted as future work.
Alternatives considered
- SHA-256 / SHA-512 TOTP: both are in the RFC 6238 spec. Rejected: several widely used authenticator apps do not support them, and the security improvement over HMAC-SHA-1 in a TOTP context is marginal. Maximizing interoperability was the deciding factor.
- WebAuthn attestation "direct" or "indirect": would allow filtering authenticators by manufacturer certificate. Rejected: the threat model (invitation-only system, trust established at invite) does not require device provenance verification, and the operational cost of maintaining FIDO MDS is not justified.
- In-memory challenge storage: simplest implementation. Rejected: incompatible with Cloud Run's stateless, multi-instance execution model. Challenges would be silently lost on instance restart or cross-instance request routing.
- Mutable session state (in-place update): fewer rows in the sessions table. Rejected: it loses the audit trail and introduces potential race conditions on concurrent state-upgrade requests (e.g., two parallel /verify calls both reading pending_method_auth and both trying to write active).
- Admin-only recovery (no self-service email OTP): higher security posture. Rejected: support load was deemed unacceptable for a small team operating an invitation-only system.
References
- Plan 142 / RFC-007: .plans/done/142-rfc-007-auth-hardening/
- RFC 6238 — TOTP: Time-Based One-Time Password Algorithm
- Web Authentication (WebAuthn) Level 2 — W3C Recommendation
- ADR-0008 — Terraform shared infra model — producer/consumer layer pattern used in the IaC changes of this RFC
- Environment Promotion Runbook — apply sequencing required after session schema migrations
Plan 158 (RFC-008 E4f) migrated app-access-public to @simplewebauthn/browser v9 but did not update PasskeyLoginForm to use the v9 startAuthentication call signature, silently breaking passkey login end-to-end. The regression was not caught until integration testing in Plan 166.
Plan 166 fixed the regression by migrating PasskeyLoginForm to @simplewebauthn/browser v13, using the correct startAuthentication({ optionsJSON }) API. Passkey login was restored to a working state.