From 493f2bfdcacbcb934858b519556073c8b8ef9951 Mon Sep 17 00:00:00 2001
From: Greg Shear <greg@estuary.dev>
Date: Wed, 15 Apr 2026 17:47:28 -0400
Subject: [PATCH] RFCs

---
 plans/api-deprecation.md  |  71 +++++
 plans/orthogonal-authz.md |  59 ++++
 plans/service-accounts.md | 556 ++++++++++++++++++++++++++++++++++++++
 plans/support-role.md     | 293 ++++++++++++++++++++
 plans/user-management.md  | 201 ++++++++++++++
 5 files changed, 1180 insertions(+)
 create mode 100644 plans/api-deprecation.md
 create mode 100644 plans/orthogonal-authz.md
 create mode 100644 plans/service-accounts.md
 create mode 100644 plans/support-role.md
 create mode 100644 plans/user-management.md

diff --git a/plans/api-deprecation.md b/plans/api-deprecation.md
new file mode 100644
index 00000000000..05bf92e7d1a
--- /dev/null
+++ b/plans/api-deprecation.md
@@ -0,0 +1,71 @@
+# API Deprecation Lifecycle
+
+## Executive Summary
+
+Estuary maintains an evolving product API, but today we have no mechanism to retire an endpoint once it's in use. The immediate motivator is the user-management migration from PostgREST to GraphQL — flowctl users are still hitting the old PostgREST endpoints, and we have no systematic way to detect that or steer them to the replacement.
+
+Supabase logs show who's calling what, but only with seven days of retention, which is not long enough to track adoption of a replacement endpoint. Communicating a deprecation to customers means either sending a mass email or relying on institutional knowledge of which customers happen to be using which APIs.
+
+This plan establishes a general-purpose deprecation lifecycle for the control-plane API. The challenge: while we control the dashboard UI and can migrate it to new endpoints on our own schedule, flowctl is installed on customer machines and older versions will continue to call deprecated endpoints indefinitely unless we give ourselves a way to see them and reach their operators.
+
+- Engineering gets visibility into which tenants (and which flowctl versions) are calling a given endpoint. Once request volume drops below an acceptable threshold, we can remove the endpoint.
+- Deprecated endpoints announce themselves via standard `Deprecation`/`Sunset` response headers.
+- flowctl surfaces those headers noisily, printing a stderr warning on every response from a deprecated endpoint so the signal reaches the operator running the command or reading CI logs.
+- Affected customers get targeted outreach: automated, periodic email alerts, sent with increasing frequency as the sunset date approaches, go directly to the specific tenants still calling a deprecated endpoint.
+
+At current scale, flowctl adoption is small enough that watching call volume in Loki and reaching out to affected customers is the primary enforcement mechanism. We'll hold off on Sunset headers and actual endpoint removal until the warning (P1) and alerting (P3) machinery is live and broadly adopted. Until then, deprecation headers plus human support follow-up is sufficient.
+
+## Technical Notes
+
+### Signaling deprecation to API consumers
+
+Both PostgREST and GraphQL endpoints return standard `Deprecation` and `Sunset` headers. GraphQL additionally marks deprecated operations and fields in the schema itself, so schema-aware clients get the signal through introspection as well. Successor information (e.g. "use the `listConnectors` GraphQL operation instead") is stored in the deprecation table and surfaced in flowctl warnings and alert emails rather than via a `Link` header — GraphQL operations don't have their own URLs, so a link isn't meaningful.
+
+### PostgREST deprecation headers set via pre-request function
+
+PostgREST supports a `db-pre-request` configuration — a Postgres function that runs before every request and can set response headers via `set_config('response.headers', ...)`. We use this to inject deprecation headers.
+
+The deprecation metadata lives in a `deprecated_endpoints` table — endpoint path, deprecation date, optional sunset date, and a human-readable successor description (e.g. "use the `listConnectors` GraphQL operation"). The pre-request function looks up `current_setting('request.path')` against this table and sets `Deprecation` and (if present) `Sunset` headers. The same table serves as the source of truth for alert emails to communicate successor info to users.
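+
+To make the lookup concrete, here is a minimal Python sketch of the logic (the real pre-request function would be written in SQL). The in-memory `DEPRECATED` mapping, its field names, the exact-match lookup, and the `deprecation_headers` helper are all assumptions for illustration. The `@<unix-seconds>` value follows the `Deprecation` header syntax in RFC 9745, `Sunset` carries an HTTP-date per RFC 8594, and the list of single-key objects is the shape PostgREST reads from `response.headers`.
+
+```python
+import json
+from datetime import datetime, timezone
+from email.utils import format_datetime
+
+# Hypothetical stand-in for rows of the `deprecated_endpoints` table.
+DEPRECATED = {
+    "/rest/v1/user_grants": {
+        "deprecated_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
+        "sunset_at": datetime(2026, 7, 1, tzinfo=timezone.utc),  # may be None
+    },
+}
+
+def deprecation_headers(path):
+    """Return headers as a list of single-key objects, or [] if not deprecated."""
+    row = DEPRECATED.get(path)
+    if row is None:
+        return []
+    headers = [{"Deprecation": "@%d" % int(row["deprecated_at"].timestamp())}]
+    if row["sunset_at"] is not None:
+        headers.append({"Sunset": format_datetime(row["sunset_at"], usegmt=True)})
+    return headers
+
+# This JSON text is what a pre-request function would hand to set_config.
+print(json.dumps(deprecation_headers("/rest/v1/user_grants")))
+```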
+
+## Open Questions
+
+- **Pre-request table lookup performance.** The `deprecated_endpoints` table is the single source of truth for deprecation metadata — used by the pre-request header injection, alert emails, and potentially a GraphQL query for flowctl to enrich deprecation warnings. But the pre-request function runs on every PostgREST request, so we need to verify the per-request cost of the table lookup is negligible (the table will be tiny and should stay in the buffer cache, but we should confirm this).
+
+## Phases
+
+### P1: flowctl deprecation warnings
+
+flowctl learns to inspect responses from the control-plane API for `Deprecation` and `Sunset` headers and prints a human-readable warning on stderr, once per invocation, including the sunset date and successor information when present. We aren't setting either of these headers yet.
+
+The warning message distinguishes between two contexts. When the deprecated call originates from a built-in flowctl subcommand, the warning tells the user to update flowctl — the newer version already uses the successor endpoint. When it originates from a user-defined raw API call, the warning names the deprecated endpoint and its sunset date if known. Successor information (which endpoint or operation to use instead) becomes available once the deprecation table exists in P2 — flowctl can query it to enrich the warning.
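+
+The two warning contexts could be handled along these lines. This is a Python sketch of the decision logic only (flowctl itself is not written in Python); the function names, the message wording, and the choice to de-duplicate per endpoint within one process are assumptions, and header names are assumed to arrive in canonical case.
+
+```python
+import sys
+
+_warned = set()  # endpoints already reported during this invocation
+
+def deprecation_warning(endpoint, headers, builtin, successor=None):
+    """Build the warning text, or return None if there is nothing new to report."""
+    if "Deprecation" not in headers or endpoint in _warned:
+        return None
+    _warned.add(endpoint)
+    if builtin:
+        # Built-in subcommand: a newer flowctl already uses the successor.
+        msg = "this flowctl version uses a deprecated API; please update flowctl"
+    else:
+        # User-defined raw API call: name the endpoint and, if known, the successor.
+        msg = "endpoint %s is deprecated" % endpoint
+        if successor:
+            msg += "; %s" % successor
+    sunset = headers.get("Sunset")
+    if sunset:
+        msg += " (sunset: %s)" % sunset
+    return "warning: " + msg
+
+def warn(endpoint, headers, builtin, successor=None):
+    text = deprecation_warning(endpoint, headers, builtin, successor)
+    if text is not None:
+        print(text, file=sys.stderr)
+```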
+
+This phase also fixes a bug: flowctl already constructs a `flowctl-<version>` User-Agent and applies it to its agent-API HTTP client, but the PostgREST client never receives the header. As a result, every PostgREST call from flowctl currently arrives at the server with an empty UA.
+
+### P2: PostgREST deprecation signaling
+
+Build the `deprecated_endpoints` table and the PostgREST pre-request function that injects `Deprecation` (and eventually `Sunset`) headers based on it. Then use it to deprecate our first endpoints — likely `user_grants` and `role_grants` once the GraphQL operations that replace them ship as part of the user-management migration. An endpoint must not be marked deprecated until flowctl's own subcommands have migrated to the successor — otherwise the "update flowctl" advice in the deprecation warning would be wrong. After this phase we can actually begin deprecating PostgREST endpoints: customers running an updated flowctl see warnings (from P1), engineering uses Loki to see who's still calling a given endpoint, and we do manual customer outreach based on that visibility.
+
+This LogQL query shows who's calling specific endpoints, filtering out dashboard and Supabase JS traffic to isolate programmatic callers. Once the P1 UA fix has propagated, we can filter on `user_agent` directly instead of excluding known non-flowctl callers by referer and client info.
+
+```logql
+{service="edge_logs"}
+  | metadata_request_path =~ "/rest/v1/(user_grants|role_grants).*"
+  | metadata_request_method != "OPTIONS"
+  | metadata_request_headers_x_client_info !~ "supabase-js-web/.*"
+  | metadata_request_headers_referer !~ "https://dashboard\\.estuary\\.dev.*"
+  | line_format "{{.metadata_request_method}} {{.metadata_request_path}} {{.metadata_response_status_code}} sub={{.metadata_request_sb_jwt_authorization_payload_subject}} ua={{.metadata_request_headers_user_agent}}"
+```
+
+### P3: Automated customer email alerts
+
+_Speculative: details will firm up once P2 is in use and we have enough customers using flowctl to justify it._ A new alert type on the existing alerting infrastructure sends periodic email alerts to tenants still calling deprecated endpoints. Alerts only fire once a sunset date is set: no sunset, no emails. As the sunset date approaches, alert frequency increases: roughly weekly at first, then every few days, then daily as the deadline nears.
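+
+One possible shape for the escalation schedule, sketched in Python. The 30-day and 7-day thresholds and the `alert_interval` name are placeholders chosen for illustration; the plan only commits to "weekly, then every few days, then daily".
+
+```python
+from datetime import date, timedelta
+
+def alert_interval(today, sunset):
+    """Return how long to wait between alert emails, or None if no alerts fire."""
+    if sunset is None:
+        return None  # no sunset date, no emails
+    days_left = (sunset - today).days
+    if days_left > 30:  # assumed threshold
+        return timedelta(days=7)
+    if days_left > 7:  # assumed threshold
+        return timedelta(days=3)
+    return timedelta(days=1)
+```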
+
+## Phase Dependencies
+
+```mermaid
+graph TD
+  P1[flowctl deprecation warnings, fix missing UA header]
+  P2[Send PostgREST deprecation headers]
+  P3[Automated customer email alerts]
+  P1 --> P2 --> P3
+```
diff --git a/plans/orthogonal-authz.md b/plans/orthogonal-authz.md
new file mode 100644
index 00000000000..c06f13affad
--- /dev/null
+++ b/plans/orthogonal-authz.md
@@ -0,0 +1,59 @@
+# Orthogonal Authz
+
+## Executive Summary
+
+Estuary's access control today is a tiered role model with only two tiers in practice: `read` for looking at data, and `admin` for everything else. That makes `admin` badly overloaded: platform engineers receive billing email alerts meant for finance, and the finance team has access to take down a production system.
+
+This plan refactors the role hierarchy into fine-grained, independent capabilities — most immediately to support dedicated **billing** and **user management** capabilities, so customers can delegate those responsibilities without handing out platform admin.
+
+## Technical Notes
+
+- **Capabilities are a flat set, not a hierarchy.** The five capabilities — `read`, `write`, `admin`, `billing`, `user_management` — don't imply each other. An admin grant does not grant `billing`, and (once the migration completes) does not grant `write` either; each capability is listed explicitly. This is the whole point of the refactor, and has downstream consequences — most notably for publish-target checks, see Phases below.
+
+  Once capabilities are orthogonal, the names `write` and `admin` start to feel vague: they were meaningful as tiers but don't describe a specific power on their own. A later migration phase renames and/or splits them (e.g. `write → publish`, or separating task control from catalog edits) once the capability model has settled and PostgREST has been retired.
+
+- **Capabilities inherit down the prefix tree.** A grant at `acmeCo/` applies to every descendant prefix — `acmeCo/sales/`, `acmeCo/sales/leads/`, and so on. A user's effective capabilities on a given prefix are the union of every grant at that prefix or any ancestor. This is how scoping already works for `read`/`write`/`admin`, and the new capabilities inherit the same way.
+
+  > The `billing` capability only really makes sense at the root prefix; a grant on any subprefix will be inert. The UI can handle this as a special case.
+
+- **Role grants narrow capabilities, never widen them.** When a user reaches a prefix through a role grant, their effective capabilities are the intersection of what the user has and what the role grant allows. Neither side can escalate past the other:
+
+  | Alice's user grant on `acmeCo/` | `acmeCo/` role grant on `partner/shared/` | Alice's effective capabilities on `partner/shared/` |
+  | ------------------------------- | ----------------------------------------- | --------------------------------------------------- |
+  | `{read, write, billing}`        | `{read, write}`                           | `{read, write}` — `billing` is filtered out         |
+  | `{read}`                        | `{read, write}`                           | `{read}` — the role grant can't add `write`         |
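+
+The inheritance and narrowing rules above reduce to a set union followed by a set intersection. A minimal Python sketch, assuming grants are represented as `(prefix, set)` pairs and that prefixes always end in `/` (the helper names are hypothetical):
+
+```python
+def grant_applies(grant_prefix, target_prefix):
+    # Prefixes end in "/", so "acmeCo/" cannot accidentally match "acmeCo2/".
+    return target_prefix.startswith(grant_prefix)
+
+def capabilities_on(grants, prefix):
+    """Union of every grant at `prefix` or at any ancestor of it."""
+    result = set()
+    for grant_prefix, caps in grants:
+        if grant_applies(grant_prefix, prefix):
+            result |= caps
+    return result
+
+def through_role_grant(user_caps, role_grant_caps):
+    """A role grant narrows: the result is the intersection of both sides."""
+    return user_caps & role_grant_caps
+
+alice = [("acmeCo/", {"read", "write", "billing"})]
+on_acme = capabilities_on(alice, "acmeCo/")
+print(sorted(through_role_grant(on_acme, {"read", "write"})))  # ['read', 'write']
+```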
+
+## Open Questions
+
+1. **Do we need a `traverse` capability to gate role-grant traversal?**
+
+   Today, only users with the `admin` role can traverse role grants at all. A read-only user on `acmeCo/` cannot follow a role grant from `acmeCo/` → `partner/shared/`.
+
+   The role grant rule as stated in Technical Notes would change this. Once capabilities are orthogonal and we drop the `admin`-required gate, any user whose capabilities intersect with a role grant's capabilities can traverse it. That means every existing read-only user would suddenly gain read access to every prefix reachable through existing role grants — a potentially large, silent expansion of access.
+
+   Should we add an explicit `traverse` capability to prevent this? With `traverse`, a user can only follow a role-grant edge if `traverse` appears on their user grant. `traverse` is a gate — it controls whether the user can enter the role grant at all, but it doesn't carry through to the effective capability set:
+
+   | User grant on `acmeCo/` | Role grant `acmeCo/` → `partner/shared/` | Effective capabilities on `partner/shared/` |
+   |---|---|---|
+   | `{read, write}` | `{read, write}` | none — no `traverse` on user grant |
+   | `{read, traverse}` | `{read, write}` | `{read}` — `traverse` lets her in, but `write` is filtered out because it wasn't on the user grant |
+
+   We could backfill and add `traverse` wherever there is already an `admin` grant so as not to change anyone's existing level of access.
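+
+If `traverse` were adopted, the gate could sit in front of the intersection rule. A small Python sketch under that assumption (the function name is hypothetical, and capabilities are modeled as plain sets of strings):
+
+```python
+def through_role_grant_gated(user_caps, role_grant_caps):
+    """Intersection rule with a `traverse` gate on the user grant."""
+    if "traverse" not in user_caps:
+        return set()  # the user cannot enter the role grant at all
+    # `traverse` only opens the gate; it is not carried into the result.
+    return (user_caps & role_grant_caps) - {"traverse"}
+```
+
+Both rows of the table above fall out of this: `{read, write}` without `traverse` yields nothing, and `{read, traverse}` against `{read, write}` yields `{read}`.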
+
+## Phases (still in progress)
+
+We will interleave these phases with other changes (service accounts, better user management, billing features) as needed.
+
+**Phase 1 — add the array, orthogonal capabilities only.** Introduce `capabilities capability[] NOT NULL DEFAULT '{}'` on `user_grants`. The existing `capability` enum stays authoritative for `read`/`write`/`admin`; the array only carries the new orthogonal capabilities (`billing`, `user_management`). Only the GraphQL/Rust path reads the array. This lets us gate `billing` and `user_management` features immediately without touching existing authz code paths.
+
+**Phase 2 — dual-write the tiered capabilities into the array.** The array becomes authoritative for the Rust/GraphQL authz layer for all five capabilities; the enum stays authoritative for RLS. A sync trigger keeps them coherent during the PostgREST sunset:
+
+- _New-path writes_ (GraphQL/Rust) set the array directly and project to the enum: `admin` if the array contains it, else `write`, else `read`. Orthogonal-only grants (e.g. `{billing}`) project to enum `read`, accepting a PostgREST read-leak within the prefix as PostgREST is sunsetting.
+- _Legacy-path writes_ (PostgREST/direct SQL) trigger a DB function that expands the enum to its tier capabilities (`admin → {read, write, admin}`, `write → {read, write}`, `read → {read}`) and merges them with any existing orthogonal capabilities on the row. A PostgREST write re-expresses only the tier portion; capabilities like `billing` are left untouched. PostgREST can't remove orthogonal capabilities, which is fine because they're only managed through the new path.
+- Add a `capabilities capability[]` column to `role_grants` (same as `user_grants`), backfill from the existing enum, and update role-grant traversal logic to compute intersections against the new array.
+- A one-shot backfill populates tier capabilities into the array for all existing rows using the same expansion.
+- If we decide to add the `traverse` capability, this backfill should also add `traverse` to every existing admin user and role grant, preserving today's behavior where admins can follow role-grant edges. Going forward, `traverse` is auto-bundled whenever an `admin` grant is created — the grant-expansion rule becomes `admin → {read, write, admin, traverse}`. A later phase of the user-management RFC will unbundle `traverse` from `admin` when the UI supports assigning capabilities individually.
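+
+The two directions of the sync described in the bullets above can be sketched as follows. This is illustrative Python, not the trigger itself; `project_to_enum`, `expand_from_enum`, and the set-of-strings representation are assumptions, and the optional `traverse` bundling is left out.
+
+```python
+TIERS = {
+    "read": {"read"},
+    "write": {"read", "write"},
+    "admin": {"read", "write", "admin"},
+}
+TIER_CAPS = {"read", "write", "admin"}
+
+def project_to_enum(capabilities):
+    """New-path write: derive the legacy enum value from the array."""
+    if "admin" in capabilities:
+        return "admin"
+    if "write" in capabilities:
+        return "write"
+    return "read"  # also covers orthogonal-only grants such as {billing}
+
+def expand_from_enum(enum_value, existing):
+    """Legacy-path write: replace the tier portion, keep orthogonal capabilities."""
+    orthogonal = existing - TIER_CAPS
+    return TIERS[enum_value] | orthogonal
+```
+
+For example, a legacy write that downgrades an `admin` row carrying `billing` to `write` leaves `{read, write, billing}` in the array.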
+
+**Phase 3 — cutover.** Once PostgREST retires, drop the enum column on both tables, remove the sync trigger, and remove the projection logic. `CapabilitySet` becomes the only representation. The publish-target check becomes a plain flag-containment test for `write`; admin grants continue to satisfy it because the grant-expansion rule always stores `{read, write, admin}` on admin grants.
+
+**Phase 4 — rename and split the legacy tier names.** With PostgREST gone and `CapabilitySet` as the sole representation, the `write` and `admin` names can be replaced with capabilities that describe specific powers (e.g. `publish`, `manage`, or finer splits between task control and catalog edits). This is a pure rename/split inside the new model: a migration on `grant_capability` values, updates to the Rust `CapabilitySet` variants, and a sweep of the call sites. It is sequenced last because the rename is disruptive and has no forcing function, and it only makes sense once nothing outside the new model speaks the old names.
diff --git a/plans/service-accounts.md b/plans/service-accounts.md
new file mode 100644
index 00000000000..46fdfba2c81
--- /dev/null
+++ b/plans/service-accounts.md
@@ -0,0 +1,556 @@
+# Service Accounts
+
+## Executive Summary
+
+CI/CD pipelines, AI agents, and other programmatic integrations need stable credentials
+that aren't tied to a human's `user_grants`.
+
+A **service account** is a non-human identity that is authorized the same way as a human
+user — same `user_grants`, same resource access, same `user_roles()` resolution.
+
+A service account can authenticate in one of two ways:
+
+- **API key** — a long-lived credential we mint with a user-configurable lifetime
+  that counts down from creation and does NOT reset on each use. Like a refresh
+  token, it is exchanged via `generate_access_token` for a short-lived JWT; the
+  JWT is the bearer token used against PostgREST and the rest of the stack.
+- **OIDC** — the service account is configured to trust an external identity
+  provider (GitHub Actions, for example). The IdP signs a short-lived token
+  with its private key; we validate it against the issuer's JWKS and exchange
+  it for a short-lived Estuary access token. No long-lived secrets to manage.
+
+A service account can have multiple active API keys and OIDC configurations
+simultaneously, enabling zero-downtime rotation — mint a new key, update
+consumers, then revoke the old one.
+
+Any admin can create a service account, but can only grant it access to their own
+prefix or something more specific — an `acmeCo/` admin can create a service account
+scoped to `acmeCo/staging/`, but an `acmeCo/staging/` admin cannot grant access to
+`acmeCo/` or `acmeCo/prod/`.
+
+Both service accounts and their credentials are fully manageable from the dashboard UI
+and from `flowctl`, so admins can script provisioning (e.g., bootstrap a CI environment)
+or drive everything interactively from the browser.
+
+## Technical notes
+
+- **`auth.users` rows.** Service accounts are real Supabase users. All existing RLS policies,
+  PostgREST authorization, `user_roles()` resolution, and `role_grants` traversal work
+  unchanged. _This avoids putting the PostgREST-to-GraphQL migration on the critical
+  path to releasing the service account feature._
+
+- **`internal.service_accounts` table.** A new table keyed by `auth.users.id` that
+  holds service-account-specific metadata (owning prefix, display name, created-by,
+  disabled state). Its presence is also what distinguishes a non-human `auth.users`
+  row from a real person — code paths that assume "auth.users means a human"
+  (member lists, onboarding) filter by joining against this table.
+
+- **API keys get their own `internal.api_keys` table.**
+  API keys behave much like refresh tokens: long-lived credentials exchanged via
+  `generate_access_token` for a short-lived access token. An API key is reusable
+  until it's revoked or expires at a fixed date (no sliding window like a refresh token has).
+
+- **One `user_grants` row per service account.** Each SA has exactly one grant.
+  This makes ownership unambiguous: whoever admins that prefix, manages
+  the SA. Disabling the SA deletes the `user_grants` row and revokes all API keys and
+  OIDC trust policies, so every avenue of access is cut at once. `prefix` and
+  `capability` columns on `internal.service_accounts` mirror the original grant so this
+  information survives a deletion (and could be used to re-enable the SA); the UI
+  can still show the disabled SA with its intended scope. Access to multiple prefixes,
+  when needed, is modeled through `role_grants` rather than adding `user_grants`.
+
+## Phase dependency graph
+
+```mermaid
+flowchart TD
+    P1[Backend CRUD & Tokens]
+    P2[Management UI]
+    P5[flowctl Auth via API Key]
+    P3[OIDC Token Exchange]
+    P4[OIDC Configuration UI]
+    P6[flowctl Service Account & API Key Commands]
+    P7[flowctl OIDC Trust Policy Commands]
+
+    P1 --> P2
+    P1 --> P3
+    P1 --> P5
+    P1 --> P6
+    P3 --> P4
+    P3 --> P7
+    P6 --> P7
+
+    click P1 "https://github.com/estuary/flow/issues/2857"
+    click P2 "https://github.com/estuary/flow/issues/2861"
+    click P3 "https://github.com/estuary/flow/issues/2859"
+    click P4 "https://github.com/estuary/flow/issues/2863"
+    click P5 "https://github.com/estuary/flow/issues/2858"
+    click P6 "https://github.com/estuary/flow/issues/2860"
+    click P7 "https://github.com/estuary/flow/issues/2862"
+```
+
+Two branches run off the root: the API key branch (P2, P5, P6) and the OIDC
+branch (P3, P4, P7). They are independent except at P7, the only phase with two
+upstream dependencies (P3 for the backend it wraps, P6 for the subcommand tree it extends).
+
+## Phase 1 — Service Account CRUD & Tokens (Backend)
+
+Delivers the complete service account lifecycle via GraphQL. The mutations can
+be driven by any authenticated admin (GraphQL playground, curl, or flowctl using
+their existing human credentials). Once minted, an API key becomes a usable
+credential by POSTing it to `generate_access_token` to receive a short-lived JWT
+— that JWT is what actually goes in the `Authorization: Bearer` header. The raw
+`flow_sa_...` string is not itself a valid Bearer token; flowctl only handles the
+exchange transparently once Phase 5 lands.
+
+### Data Model
+
+**`internal.service_accounts`**:
+
+| Column         | Type                        | Notes                                                                                                                         |
+| -------------- | --------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
+| `user_id`      | UUID PK, FK → `auth.users`  | The service account's identity                                                                                                |
+| `prefix`       | `catalog_prefix` NOT NULL   | Owning prefix — set at creation, immutable. Used for authorization on all management operations (disable, key rotation, etc.) |
+| `capability`   | `grant_capability` NOT NULL |                                                                                                                               |
+| `display_name` | TEXT NOT NULL               | Human-readable label (e.g., "CI deploy bot")                                                                                  |
+| `created_by`   | UUID FK → `auth.users`      | Audit trail — which human created this                                                                                        |
+| `last_used_at` | TIMESTAMPTZ                 | Updated on each `generate_access_token()` call via any of this account's API keys                                             |
+| `disabled_at`  | TIMESTAMPTZ                 | Set by `disableServiceAccount` — account cannot authenticate and is shown as disabled in the UI                               |
+| `created_at`   | TIMESTAMPTZ                 |                                                                                                                               |
+| `updated_at`   | TIMESTAMPTZ                 |                                                                                                                               |
+
+The referenced row in `auth.users` has no password and no OAuth identity. Application-layer
+queries that need to distinguish service accounts from humans (member lists, onboarding)
+join against `internal.service_accounts`.
+
+**`internal.api_keys`** — credentials for service accounts (separate from human `refresh_tokens`):
+
+| Column               | Type                   | Notes                                                         |
+| -------------------- | ---------------------- | ------------------------------------------------------------- |
+| `id`                 | UUID PK                |                                                               |
+| `service_account_id` | UUID FK → `auth.users` | Which service account this key authenticates as               |
+| `secret_hash`        | TEXT NOT NULL          | Hashed secret (plaintext returned once at creation)           |
+| `label`              | TEXT NOT NULL          | Human-readable label (e.g., "GitHub Actions")                 |
+| `expires_at`         | TIMESTAMPTZ NOT NULL   | Hard cutoff — no sliding window                               |
+| `created_by`         | UUID FK → `auth.users` | Which human created this key                                  |
+| `last_used_at`       | TIMESTAMPTZ            | Updated on each `generate_access_token()` call using this key |
+| `created_at`         | TIMESTAMPTZ            |                                                               |
+
+**Token format:** API keys are prefixed `flow_sa_` followed by base64-encoded `{id}:{secret}`.
+The prefix routes server-side lookups to `internal.api_keys`, and helps users distinguish
+service account keys from personal refresh tokens.
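+
+A minimal Python sketch of the token format, for illustration only. The text above does
+not say which base64 alphabet is used, so URL-safe base64 with padding is an assumption
+here, as are the `encode_api_key` and `decode_api_key` names.
+
+```python
+import base64
+
+PREFIX = "flow_sa_"
+
+def encode_api_key(key_id, secret):
+    payload = ("%s:%s" % (key_id, secret)).encode("utf-8")
+    return PREFIX + base64.urlsafe_b64encode(payload).decode("ascii")
+
+def decode_api_key(token):
+    """Return (key_id, secret); raise ValueError for anything malformed."""
+    if not token.startswith(PREFIX):
+        raise ValueError("not a service account API key")
+    try:
+        payload = base64.urlsafe_b64decode(token[len(PREFIX):]).decode("utf-8")
+    except Exception as err:
+        raise ValueError("malformed API key payload") from err
+    # Split on the first ":" only, so the secret itself may contain colons.
+    key_id, sep, secret = payload.partition(":")
+    if not sep or not key_id or not secret:
+        raise ValueError("malformed API key payload")
+    return key_id, secret
+```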
+
+Update `generate_access_token()` to accept an API key input alongside the existing
+`{refresh_token_id, secret}` input. The two shapes are mutually exclusive and
+routed by which field is present:
+
+- **New input** — `{api_key: "flow_sa_..."}`: decode the base64 `{id}:{secret}`
+  payload, look up in `internal.api_keys`, verify `secret_hash`, check `expires_at`,
+  update `last_used_at`, mint a JWT with `service_account_id` as `sub`.
+- **Existing input** — `{refresh_token_id, secret}`: unchanged, routes through
+  `public.refresh_tokens` as today.
+
+Response shape is unchanged (`{access_token, refresh_token?}`). The optional
+`refresh_token` is never set for the api-key branch — API keys don't rotate;
+the same `flow_sa_...` string is reused until its `expires_at`.
+
+The additive input shape keeps the RPC fully backward compatible with existing
+flowctl clients.
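+
+The routing and validation steps above might look like the following Python sketch. It
+is not the actual `generate_access_token()` implementation: the row is a plain dict,
+JWT minting is omitted, and SHA-256 is only a stand-in because the hash behind
+`secret_hash` is not specified here. All function names are hypothetical.
+
+```python
+import hashlib
+import hmac
+
+def hash_secret(secret):
+    # Stand-in only: the real hash function for `secret_hash` is not specified.
+    return hashlib.sha256(secret.encode("utf-8")).hexdigest()
+
+def route(request):
+    """Pick the credential branch from which input fields are present."""
+    has_key = "api_key" in request
+    has_refresh = "refresh_token_id" in request and "secret" in request
+    if has_key == has_refresh:
+        raise ValueError("provide exactly one of api_key or refresh_token_id+secret")
+    return "api_key" if has_key else "refresh_token"
+
+def check_api_key(row, secret, now):
+    """Validate an `api_keys` row; return the service account id used as `sub`."""
+    if not hmac.compare_digest(row["secret_hash"], hash_secret(secret)):
+        raise PermissionError("invalid secret")
+    if now >= row["expires_at"]:
+        raise PermissionError("API key expired")  # hard cutoff, no sliding window
+    row["last_used_at"] = now
+    return row["service_account_id"]
+```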
+
+### GraphQL API
+
+**Mutations:** (requires admin capability on the prefix)
+
+- `createServiceAccount(prefix, capability, displayName)` → `{ userId, prefix, capability, displayName, createdAt }`
+  - Creates `auth.users` row + `internal.service_accounts` row + `user_grants` row
+    for `(user_id, prefix, capability)` as the only grant
+
+- `disableServiceAccount(userId)` → `Boolean`
+  - Deletes all `user_grants` (removes catalog access) and all `api_keys` (invalidates credentials)
+  - Sets `internal.service_accounts.disabled_at = now()`
+  - The `auth.users` row is intentionally preserved — several tables (`drafts`, `publications`,
+    `publication_specs`) reference it with default `NO ACTION` FK constraints, so
+    a hard delete would require cascading cleanup of audit history. Keeping the row is simpler and
+    preserves attribution.
+  - A disabled service account cannot authenticate (no api_keys, no grants) but remains
+    visible in the UI as disabled for auditability
+
+- `enableServiceAccount(userId)` → `Boolean`
+  - Clears `internal.service_accounts.disabled_at` and re-creates the `user_grants` row
+    for `(user_id, prefix, capability)` using the service account's original `prefix` and `capability`
+  - Does NOT restore previously revoked `api_keys` — the admin must mint new ones via
+    `createApiKey`
+
+- `createApiKey(userId, label, validFor)` → `{ keyId, secret }`
+  - Creates an `internal.api_keys` row for the target service account with `expires_at = now() + validFor`
+  - `validFor`: ISO 8601 duration (e.g., `P90D`, `P1Y`)
+  - Returns the `flow_sa_`-prefixed secret exactly once — never retrievable again
+
+- `revokeApiKey(keyId)` → `Boolean`
+  - Deletes the `api_keys` row
+
+**Queries:**
+
+- `serviceAccounts(after, first)` → paginated list
+  - Includes: `userId`, `displayName`, `prefix`, `capability`, `createdBy`, `createdAt`,
+    `apiKeys[]` (each with `keyId`, `label`, `createdAt`, `expiresAt` — secrets are
+    not retrievable since only `secret_hash` is stored)
+  - **Requires:** caller is admin (or read?) on `prefix`
+
+### Verification
+
+- [ ] Create service account with grant to `acmeCo/` → create API key → exchange for
+      access token via `generate_access_token()` → use JWT with PostgREST and GraphQL
+      → grants resolve normally through `user_roles()`
+- [ ] API key with elapsed `expires_at` → `generate_access_token()` rejects
+- [ ] Disable service account → all API keys revoked, grant removed, account marked disabled
+- [ ] Disabled service account can't create new API keys
+- [ ] Re-enabled service account has same access as before
+- [ ] Multiple active API keys on same service account → all work (rotation scenario)
+- [ ] `acmeCo/staging/` admin cannot manage a service account with prefix `acmeCo/prod/`
+- [ ] Service account does NOT appear in tenant member lists
+
+---
+
+## Phase 2 — Service Account Management UI (Frontend)
+
+Uses Phase 1 APIs with no backend changes.
+
+### Views
+
+**Service Accounts list** (under tenant admin area):
+
+- Table: display name, prefix, created by, created date, API key count
+- Scoped to the admin's current tenant prefix
+- "Create Service Account" action
+
+**Service Account detail:**
+
+- Display name (editable)
+- Prefix (read-only, set at creation)
+- API keys list: label, created date, expires date, status (active / expired)
+- "Create API Key" action with copy-once secret display + configurable lifetime
+- "Revoke API Key" action per key
+- "Disable Service Account" action with confirmation
+
+### API Key Creation UX
+
+1. Admin clicks "Create API Key"
+2. Enters label (e.g., "GitHub Actions") and lifetime (dropdown: 90 days, 180 days, 1 year, custom)
+3. System returns the `flow_sa_`-prefixed secret
+4. Secret is displayed once in a copy-able field with a warning that it won't be shown again
+5. Admin copies the secret and configures it in their CI/CD system (or flowctl,
+   per Phase 5), which exchanges it for a short-lived access token on each
+   run
+
+### Verification
+
+- Create service account, create API key, copy secret → use in API call → works
+- Secret display disappears on navigation — not retrievable
+- Expired API keys show visual indicator
+- Disable service account → shown as disabled in list
+
+---
+
+## Phase 3 — OIDC Token Exchange (Backend)
+
+Enables external identity providers (GitHub Actions, GitLab CI, cloud provider
+workload identity) to exchange their OIDC tokens for short-lived Estuary access
+tokens. Eliminates long-lived secrets in CI/CD.
+
+### Design
+
+Implements a token exchange endpoint inspired by RFC 8693. An external OIDC token
+is exchanged for a short-lived JWT scoped to a specific service account's grants.
+
+**`internal.oidc_trust_policies`** — maps external identities to service accounts:
+
+| Column               | Type                   | Notes                                                                 |
+| -------------------- | ---------------------- | --------------------------------------------------------------------- |
+| `id`                 | flowid PK              |                                                                       |
+| `service_account_id` | UUID FK → `auth.users` | Which service account to act as                                       |
+| `issuer`             | TEXT NOT NULL          | OIDC issuer URL (e.g., `https://token.actions.githubusercontent.com`) |
+| `subject_pattern`    | TEXT NOT NULL          | Regex or exact match on `sub` claim                                   |
+| `claims_filter`      | JSONB                  | Additional claim constraints (e.g., `{"repository": "estuary/flow"}`) |
+| `max_token_lifetime` | INTERVAL               | Upper bound on issued token lifetime                                  |
+| `created_by`         | UUID FK → `auth.users` |                                                                       |
+| `created_at`         | TIMESTAMPTZ            |                                                                       |
+
+**Endpoint:** `POST /auth/v1/token-exchange` (or GraphQL mutation)
+
+```json
+{
+  "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
+  "subject_token": "<external OIDC JWT>",
+  "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
+  "requested_token_type": "urn:ietf:params:oauth:token-type:access_token"
+}
+```
+
+**Flow:**
+
+1. Decode external JWT, extract `iss` and `sub`
+2. Fetch issuer's JWKS (cached), verify the signature, and reject the token if its `exp` has passed
+3. Match against `oidc_trust_policies` — `issuer` + `subject_pattern` + `claims_filter`
+4. Mint a short-lived JWT with the matched service account's `user_id` as `sub`
+5. Return access token (no refresh token — short-lived only)
+
+**Authorization for trust policy management:** caller must be admin on the
+linked service account's `prefix`.
+
+### Verification
+
+- Configure GitHub Actions trust policy → run workflow → token exchange succeeds
+  → API calls work with service account's grants
+- Mismatched subject or claims → rejected
+- Expired external token → rejected
+- Token lifetime respects `max_token_lifetime`
+
+---
+
+## Phase 4 — OIDC Configuration UI (Frontend)
+
+Extends service account detail view with trust policy management, using Phase 3 APIs.
+
+### UX
+
+- "OIDC Trust Policies" section on service account detail
+- "Add Trust Policy" with guided setup per provider:
+  - GitHub Actions: repo name → auto-fills issuer + subject pattern
+  - GitLab CI: project path → auto-fills
+  - Custom: manual issuer URL + subject pattern + optional claims
+- Display active policies with issuer, subject pattern, and last-used timestamp
+
+### Verification
+
+- Create trust policy through UI → GitHub Actions workflow authenticates successfully
+- Delete trust policy → subsequent workflow runs fail authentication
+- Provider-specific templates pre-fill correct values
+
+---
+
+## Phase 5 — flowctl Authenticates as a Service Account
+
+Lets CI jobs run flowctl using a service account API key instead of a
+human's refresh token. Depends only on the `generate_access_token` RPC
+change from Phase 1. Ships independently of Phase 6.
+
+Today flowctl picks up a long-lived credential from the `FLOW_AUTH_TOKEN`
+env var ([config.rs:162-173](crates/flowctl/src/config.rs#L162-L173)),
+expecting a base64-encoded JSON refresh token `{id, secret}`. CI jobs set
+this to a refresh token tied to a human user — awkward because the
+credential is personal, uses a sliding expiry (never forced to rotate),
+and breaks if the human's account is deleted.
+
+After Phase 5, the same env var also accepts a `flow_sa_...` API key:
+
+```yaml
+- run: flowctl catalog publish --specs ./catalog.yaml
+  env:
+    FLOW_AUTH_TOKEN: ${{ secrets.ESTUARY_API_KEY }} # flow_sa_... string
+```
+
+**Detection is by prefix.** In [config.rs](crates/flowctl/src/config.rs),
+the `FLOW_AUTH_TOKEN` handler branches on whether the value starts with
+`flow_sa_`. The existing base64-JSON path is untouched: the standard base64
+alphabet (`A-Z`, `a-z`, `0-9`, `+`, `/`, `=`) has no `_`, so the two formats can't
+collide, and existing CI jobs that feed a refresh token through this env var keep working unchanged.
+
+### Config changes
+
+- Add `user_api_key: Option<String>` to `Config` alongside `user_refresh_token`.
+  Mutually exclusive in practice — whichever is set drives credential refresh.
+- Load path: if `FLOW_AUTH_TOKEN` starts with `flow_sa_`, populate `user_api_key`;
+  otherwise decode as today into `user_refresh_token`.
+- No config-file migration needed. Existing `~/.config/flowctl/*.json` files
+  that contain `user_refresh_token` are unaffected.
+
+### `refresh_authorizations` changes
+
+[flow-client/src/client.rs:462](crates/flow-client/src/client.rs#L462):
+
+- New branch: if `user_api_key` is set, call `generate_access_token` with the
+  new `{api_key}` input (Phase 1), get back an access token, return the api
+  key unchanged. Skip the "auto-create a refresh token" branch — the API key
+  IS the long-lived credential.
+- Existing refresh-token branches are unchanged.
+
+### New auth command
+
+`flowctl auth api-key --token flow_sa_...` stores the key into the config
+file for interactive workstation use. The existing `flowctl auth token`
+remains for short-lived access tokens and for the interactive login flow.
+
+### Verification
+
+- [ ] Existing CI job using a base64-JSON refresh token in `FLOW_AUTH_TOKEN`
+      continues to work with no changes
+- [ ] `FLOW_AUTH_TOKEN=flow_sa_...` set in env → flowctl runs as the service
+      account, grants resolve through `user_roles()` to the service account's
+      prefix
+- [ ] Expired API key in `FLOW_AUTH_TOKEN` → flowctl fails with a clear error
+      (no attempt to refresh, since API keys don't rotate)
+- [ ] Access token nearing expiry is re-minted from the API key on the next
+      request; the API key itself is not rotated or re-written anywhere
+- [ ] `flowctl auth api-key --token flow_sa_...` persists to config and
+      subsequent invocations authenticate correctly
+
+---
+
+## Phase 6 — flowctl Service Account & API Key Commands
+
+Adds a `flowctl service-accounts` subcommand tree covering service account
+lifecycle and API key management. Depends only on Phase 1 and ships
+independently of Phase 5. Until this phase lands, admins drive provisioning from
+the dashboard (Phase 2). Until Phase 5 lands, the secret returned by `api-keys create`
+can be exchanged via `generate_access_token` (the same flow a refresh token uses
+today) to obtain a short-lived JWT for ad-hoc `Authorization: Bearer` use.
+
+Lives at [crates/flowctl/src/service_accounts/](crates/flowctl/src/service_accounts/)
+alongside existing top-level subcommands like `catalog`, `draft`, and `auth`.
+
+**Pattern to follow:** [crates/flowctl/src/alert_subscriptions/](crates/flowctl/src/alert_subscriptions/)
+is the closest shape — full CRUD (`list-query.graphql`, `create-mutation.graphql`,
+`update-mutation.graphql`, `delete-mutation.graphql`) wired through
+`graphql_client::GraphQLQuery` derives and the shared `post_graphql<Q>()` helper
+in [crates/flowctl/src/graphql.rs](crates/flowctl/src/graphql.rs). The module
+header comment in `graphql.rs` documents the derive attributes (`extern_enums`,
+`response_derives`, scalar mapping) that these queries will need.
+
+### Command surface
+
+```
+flowctl service-accounts list [--prefix <prefix>]
+flowctl service-accounts create --prefix <prefix> --capability <read|write|admin> --name <display-name>
+flowctl service-accounts disable <user-id>
+flowctl service-accounts enable <user-id>
+
+flowctl service-accounts api-keys list <user-id>
+flowctl service-accounts api-keys create <user-id> --label <label> --valid-for <duration>
+flowctl service-accounts api-keys revoke <key-id>
+```
+
+`--valid-for` accepts an ISO 8601 duration (`P90D`, `P1Y`) to match the
+GraphQL input type.
+
+### Output ergonomics
+
+- Default output follows the existing `--output` convention (`table` | `yaml` |
+  `json`) from [crates/flowctl/src/output.rs](crates/flowctl/src/output.rs) so
+  `list` commands are human-readable by default and scriptable with `-o json`.
+- `api-keys create` prints the `flow_sa_`-prefixed secret to stdout **once**, with
+  the rest of the row metadata on stderr. This lets CI scripts do:
+  ```bash
+  SECRET=$(flowctl service-accounts api-keys create "$ID" --label ci --valid-for P90D -o json | jq -r .secret)
+  ```
+  without the secret landing in captured stderr logs. Note that `set -x` still traces the expanded assignment, so keep xtrace off around this line.
+- A prominent warning reminds the caller that the secret is not retrievable.
+
+### Scripting example
+
+Documented in the command's long help:
+
+```bash
+# Bootstrap a CI service account end-to-end
+USER_ID=$(flowctl service-accounts create \
+  --prefix acmeCo/ci/ --capability admin --name "GitHub Actions" \
+  -o json | jq -r .userId)
+
+flowctl service-accounts api-keys create "$USER_ID" \
+  --label "github-actions-main" --valid-for P90D \
+  -o json | jq -r .secret > ci-token.txt
+```
+
+### Verification
+
+- [ ] `create` → `api-keys create` → use returned secret as `FLOW_AUTH_TOKEN`
+      → `flowctl catalog list` succeeds (end-to-end CLI round trip; requires
+      Phase 5 for the `FLOW_AUTH_TOKEN` consumption side, or alternatively
+      exchange the secret via `generate_access_token` and use the resulting
+      JWT as the `Authorization: Bearer` header)
+- [ ] `disable` removes grants and API keys; subsequent `api-keys create` fails
+- [ ] `list` respects the caller's admin scope: an `acmeCo/staging/` admin does
+      not see `acmeCo/prod/` service accounts
+- [ ] `-o json` output is stable and parseable for all list/create commands
+- [ ] `api-keys create` secret goes to stdout only; stderr contains no secret
+      material even under `flowctl --log-level debug`
+
+---
+
+## Phase 7 — flowctl OIDC Trust Policy Commands
+
+Extends the `flowctl service-accounts` tree with an `oidc` subcommand group
+for managing trust policies from the CLI. Depends on Phase 3 (backend) and
+Phase 6 (subcommand structure to hang off of). Ships independently of
+Phase 4; until this phase lands, admins manage trust policies via Phase 3's
+GraphQL API directly, or via the Phase 4 UI if it has already shipped.
+
+### Command surface
+
+```
+flowctl service-accounts oidc list <user-id>
+flowctl service-accounts oidc create <user-id> --issuer <url> --subject <pattern> [--claim key=value]... [--max-lifetime <duration>]
+flowctl service-accounts oidc delete <policy-id>
+```
+
+`--max-lifetime` accepts an ISO 8601 duration to match the GraphQL input type.
+
+Output ergonomics follow Phase 6's conventions (`--output` flag, `-o json`
+scriptability).
+
+### Verification
+
+- [ ] `oidc create` with a GitHub Actions issuer → workflow token-exchange
+      succeeds (same check as Phase 3, driven from the CLI)
+- [ ] `oidc list` shows the configured policies with issuer, subject pattern,
+      and last-used timestamp
+- [ ] `oidc delete` removes the policy; subsequent token-exchange attempts
+      matching that policy are rejected
+- [ ] Attempting to manage a trust policy for a service account the caller
+      doesn't have admin rights on is rejected
+
+---
+
+## Migration & Compatibility Notes
+
+### auth.users integration
+
+Service accounts are `auth.users` rows. GoTrue won't interact with them (no
+password, no OAuth), but they occupy the same ID space. Code paths that assume
+`auth.users` means "a human" — tenant member lists, onboarding flows — need to
+filter service accounts out by joining against `internal.service_accounts`.
+
+### PostgREST compatibility
+
+Because service accounts are `auth.users` with standard `user_grants`, the existing
+`generate_access_token()` → JWT → PostgREST flow works unchanged. The JWT `sub`
+is the service account's UUID, and `user_roles()` resolves its grants normally.
+
+### flowctl compatibility
+
+flowctl changes are split across Phases 5, 6, and 7, supported by one backend change in Phase 1, and are fully backward compatible:
+
+- **Phase 1** (backend): the `generate_access_token` RPC grows a new input
+  shape (`{api_key}`) alongside the existing `{refresh_token_id, secret}`, so
+  older flowctl versions using the old shape continue to work unchanged.
+- **Phase 5** (auth): existing config files (`~/.config/flowctl/*.json` with
+  `user_refresh_token`) are unaffected, because the new api-key path adds a sibling
+  `user_api_key` field without touching the refresh-token one. `FLOW_AUTH_TOKEN`
+  gains prefix-based dispatch: `flow_sa_...` → api key, anything else → existing
+  base64-JSON refresh-token path. The standard base64 alphabet excludes `_`, so the two
+  formats can't collide and existing CI jobs are undisturbed.
+- **Phase 6** (management): adds a new `flowctl service-accounts` subcommand
+  tree; no changes to existing commands.
+- **Phase 7** (OIDC CLI): adds an `oidc` subcommand group under
+  `flowctl service-accounts`; no changes to existing commands.
+
+### Data plane auth
+
+Data plane HMAC auth is completely separate. Service accounts that need data plane
+access will get it through the normal task activation flow (their grants authorize
+catalog operations, which produce data plane tokens as a side effect).
+
+### Credential expiry semantics
+
+Human refresh tokens (`public.refresh_tokens`): sliding window (`updated_at + valid_for`, resets on use).
+Service account API keys (`internal.api_keys`): fixed window (`expires_at`, set once at creation, never resets).
+These live in separate tables — no overloaded semantics.
diff --git a/plans/support-role.md b/plans/support-role.md
new file mode 100644
index 00000000000..4cbfce67126
--- /dev/null
+++ b/plans/support-role.md
@@ -0,0 +1,293 @@
+# Customer-Controlled Support Access
+
+## Executive Summary
+
+Today every tenant is automatically granted `estuary_support/` at creation via a
+Postgres trigger. Every Estuary engineer who inherits that role has standing
+admin on every tenant, always. This predates our compliance posture and doesn't
+hold up against SOC 2 audit expectations, the "access under direct control of
+Customer" language in our MSA, or the GDPR requirement that support processing
+be purpose- and time-bounded.
+
+This plan replaces the standing grant with a **customer-controlled support
+session**: each tenant has a hidden support service account scoped to its own
+prefix, and the customer opens a time-bounded window during which Estuary
+engineers can authenticate as that SA via OIDC federation to Google. When the
+window closes (by expiry or explicit revocation), access ends. Estuary-side
+membership in the support group is managed entirely through Google Workspace,
+so rotations and offboarding propagate across every tenant, bounded only by the
+group-membership cache and outstanding token lifetimes, with no Estuary-side change.
+
+This is primarily composition of existing work: orthogonal-authz supplies the
+narrow `support` capability, the service-accounts plan supplies the identity
+and OIDC trust-policy machinery, and Google Workspace supplies the engineer
+identity and group gate. The new surface added here is small — the tenant-scoped
+hidden SA, the customer-facing Support tab, and the `flowctl support` subcommand.
+
+## Desired Outcome
+
+- **Customers hold the lever.** Opening and closing support windows is a
+  customer action in their admin UI. We cannot self-grant access to a tenant.
+- **The grant model stays simple.** No `expires_at` column on `user_grants` and
+  no new "expiring grant" primitive. Time-boundedness is enforced at the OIDC
+  trust policy and at the issued access token — the grant itself is a permanent
+  fact about the hidden SA, gated by whether a valid credential can be obtained.
+- **Support SA is never usable without a customer-opened window.** By default,
+  the trust policy is disabled; no token exchange succeeds.
+- **Access revocation at Estuary is a Google Workspace operation.** Removing
+  an engineer from the support group cuts their access across every tenant once
+  cached membership and any already-issued token expire, with no per-tenant cleanup.
+- **Audit trail is two-sided.** The customer sees "support SA performed X" in
+  their publication history. Estuary's internal logs record which Google
+  identity exchanged a token and for which tenant, so per-engineer attribution
+  is recoverable internally.
+
+## Technical Notes
+
+**The support SA lives under the customer's prefix, not under `estuary_support/`.**
+Every tenant gets a hidden SA at creation, named something like `acmeCo/.support`
+(or tracked via a flag on `internal.service_accounts` rather than a naming
+convention — see Open Questions). Its `user_grants` row carries the eventual
+`support` capability (see "Dependency on orthogonal-authz" below) on the tenant
+prefix. The customer is the admin of their own tenant, which means the customer
+— not Estuary — owns the SA and its trust policies. This is what makes the
+MSA language ("access under direct control of Customer") structurally true
+rather than just a claim.
+
+**The `estuary_support/` role stops being how engineers access customer data.**
+Once this lands, the current blanket trigger and the standing per-tenant grants
+go away. `estuary_support/` remains as a role for internal Estuary operations
+that don't involve customer tenants (billing tooling, internal ops views), but
+it no longer has transitive admin over every tenant. The migration path is
+covered in Phase 5.
+
+**OIDC trust policy uses Google Workspace as the issuer.** The policy matches
+`iss = https://accounts.google.com` (or the workspace-specific issuer we end up
+using) plus a `claims_filter` requiring membership in the `support@estuary.dev`
+Google group. Google's standard ID tokens don't include group membership by
+default; we resolve this server-side at token-exchange time by calling the
+Cloud Identity Groups API for the authenticated `sub`, rather than requiring
+Google to emit groups in the ID token itself. From the customer's point of
+view the trust policy is still "Google, support group" — the lookup is an
+implementation detail of token exchange.
+
+**Windows are expressed as trust policy state, not grant state.** Opening a
+support window = enabling the trust policy and setting its `max_token_lifetime`
+and overall expiry. Closing the window = disabling or deleting the policy.
+Tokens already issued are short-lived (configurable, default ≤1h) and die on
+their own. The longest any engineer can retain access after a window closes
+is the remaining lifetime of their last-issued access token.
+
+**Engineer UX is `flowctl` first, dashboard later.** `flowctl support begin
+<tenant>` performs the Google OIDC → token exchange dance and writes a
+short-lived access token into a session-scoped slot (not the normal refresh
+token slot). `flowctl support end` clears it. The dashboard mode-switch story
+is deferred to a follow-up — most support work already happens via flowctl,
+and shipping flowctl-only keeps P4 small.
+
+## Dependency on orthogonal-authz
+
+This plan assumes `support` (or equivalent narrow capability — `diagnose`, or
+a split of `admin` that excludes `publish`, `billing`, `user_management`)
+exists as a capability in the `user_grants.capabilities` array introduced by
+orthogonal-authz Phase 1. Until that lands, the hidden SA can be created with
+`capability = admin` as a bootstrap, with the understanding that narrowing it
+is a follow-up migration once orthogonal-authz is further along.
+
+This is the right sequencing because the compliance case for customer-opened
+windows is strongest when the capability is actually narrow. "We granted you
+read and restart access for 24 hours" is defensible; "we granted you full
+admin for 24 hours" is a weaker story even if the window is short.
+
+## Dependency on service-accounts
+
+This plan consumes, unchanged:
+
+- **Phase 1 (service account CRUD & tokens)** for the SA data model and
+  lifecycle mutations. The hidden support SA is created via the same
+  `createServiceAccount` path, with an additional flag.
+- **Phase 3 (OIDC token exchange)** for the trust policy data model, the
+  `/auth/v1/token-exchange` endpoint, issuer verification, and claims
+  filtering. The only new shape is the Google-groups-via-Cloud-Identity-API
+  resolution, which is a token-exchange-time implementation detail.
+
+Phases 2, 4, 5, 6, 7 of the service-accounts plan are unrelated to this
+feature and can ship independently.
+
+## Open Questions
+
+1. **How do we hide the support SA from the normal list?** Two candidates:
+   (a) a `kind: support` or `hidden: true` flag on `internal.service_accounts`
+   that the default `serviceAccounts` query filters out; or (b) a reserved
+   naming convention (`<tenant>/.support`). A flag is cleaner and doesn't
+   bake assumptions into the prefix namespace.
+
+2. **Default support window durations and maximum.** 24h / 7d / 30d as the
+   user-facing options, with what hard ceiling? Mike's June 2025 writeup
+   floated 30 days as an "introductory support package" framing; worth
+   confirming that matches current compliance thinking.
+
+3. **How is consent language captured?** Per the MSA and GDPR Art. 5/9
+   discussions, opening a window needs a recorded acknowledgement that the
+   customer is authorizing Estuary access for a stated purpose. Is this a
+   free-text "reason" field on the window, a structured dropdown, or just
+   implicit in the click? A free-text reason stored on the trust policy
+   (or on a separate `support_sessions` row — see Q5) is probably the right
+   compliance default.
+
+4. **Per-session approval above the group gate.** Google group membership
+   answers "who at Estuary _could_ enter a support session." Some customers
+   may want "who at Estuary _did_, and was it approved for this specific
+   case." A future ticket-integration could write a session ID into the
+   trust policy's `claims_filter` so only a specific engineer can redeem it
+   for a specific window, but this is explicitly out of scope for v1.
+   Confirm with Mike that v1's "group + window" shape is sufficient.
+
+5. **Support session history as a first-class record.** Do we need a
+   `support_sessions` table that logs each window (who opened it when, when
+   it closed, which engineers authenticated against it, what reason was
+   given), or is reconstructing this from trust-policy lifecycle events +
+   token-exchange logs good enough? A dedicated table is more auditable and
+   gives the customer a clean "Support History" view; it's also more to
+   build. Probably worth deferring to a follow-up once the base feature is
+   in customers' hands.
+
+6. **Google Workspace OIDC specifics.** We need to confirm the exact issuer
+   URL, the Cloud Identity Groups API auth model for our token-exchange
+   service, and whether we want to cache group membership (and for how
+   long — too long and revocation is delayed; too short and we hammer the
+   API on every exchange). A 5-minute cache is probably a reasonable
+   default but worth picking deliberately.
+
+7. **Existing `estuary_support/` grants — do they go away all at once, or
+   gradually?** See Phase 5; the migration is the riskiest part of this
+   plan and deserves its own sequencing decision.
+
+## Phases
+
+### P1: Hidden per-tenant support SA
+
+Extend the tenant-creation path to also create a hidden service account
+scoped to the new tenant's prefix, carrying the (eventual) `support`
+capability. Add the flag to `internal.service_accounts` that marks it as a
+support SA and hides it from the default `serviceAccounts` query (resolving
+Open Question 1). Backfill existing tenants by creating a flagged hidden SA
+for each one.
+
+On its own this phase is inert — the SA exists but has no trust policies,
+so no one can authenticate as it. The `estuary_support/` standing grant is
+untouched. Nothing visible to customers or engineers yet.
+
+This phase depends on service-accounts P1 having landed, since it reuses
+the SA creation mutation.
+
+### P2: Google OIDC trust-policy resolution
+
+Extend the token-exchange endpoint from service-accounts P3 to resolve
+Google group membership via the Cloud Identity Groups API at exchange time,
+rather than relying on group claims being present in the ID token. The
+incoming token supplies `iss` and `sub`; the backend calls the Groups API
+to check whether that `sub` is a member of any groups named in the trust
+policy's `claims_filter`.
+
+A short in-memory cache (5m default) keyed on `(sub, group)` avoids
+hammering the Groups API on every exchange. Cache TTL is configurable and
+short enough that offboarding (removal from the Google group) propagates
+quickly — the longest stale-access window is the cache TTL plus the issued
+access token's remaining lifetime.
+
+After this phase, trust policies with Google-group `claims_filter`s work
+end-to-end, but no customer has one yet.
+
+### P3: Customer-facing Support tab
+
+Add a "Support" tab to the customer admin UI. It's presented as a single
+toggle — "Open support window" — plus a duration dropdown and a required
+free-text reason field (resolving Open Question 3, pending confirmation
+on its exact shape).
+
+Opening a window creates or enables the trust policy on the hidden SA,
+with `claims_filter = {"groups": ["support@estuary.dev"]}`, a short
+`max_token_lifetime` (≤1h by default), and the selected duration as the
+policy's overall expiry. Closing the window disables or deletes the policy.
+The tab also shows the currently-active window with its expiry, and a
+history of past windows (read from trust-policy lifecycle events; a
+dedicated `support_sessions` table is deferred per Open Question 5).
+
+After this phase, a customer can open a window and the hidden SA's trust
+policy activates — but engineers don't have a tool that drives the OIDC
+exchange yet, so the window is real but unused.
+
+### P4: `flowctl support` subcommands
+
+Add `flowctl support begin <tenant>` and `flowctl support end`. `begin`
+obtains a Google ID token for the authenticated user, posts it to the
+token-exchange endpoint along with the target tenant, and on success
+writes the resulting short-lived access token into a session-scoped slot
+that `refresh_authorizations` prefers over the normal refresh token while
+active. `end` clears that slot.
+
+Error handling is explicit: if the caller is not in `support@estuary.dev`,
+the exchange returns a clear "not authorized for support access" error;
+if the tenant has no open window, it returns "no support window open for
+`<tenant>`." Neither falls back to other credentials silently.
+
+After this phase, the end-to-end feature works: customer opens a window,
+engineer runs `flowctl support begin acmeCo/`, operates for the duration
+of the access token, runs `flowctl support end` (or lets the token
+expire), and customer closes the window.
+
+### P5: Retire the `estuary_support/` standing grant
+
+The migration phase. Steps, in order:
+
+1. Announce the change internally with a cutover date. Surface the new
+   `flowctl support` flow in on-call runbooks.
+2. Remove the Postgres trigger that auto-grants `estuary_support/` on
+   tenant creation. New tenants stop getting the standing grant
+   immediately.
+3. For existing tenants, remove the `estuary_support/` role grant in
+   batches, starting with tenants where we've already seen at least one
+   support session go through the new flow successfully (proving the
+   new path works for that customer's shape). End with a sweep of the
+   remainder.
+4. Leave `estuary_support/` as a role for internal operations that don't
+   touch customer tenants.
+
+Before each batch removal, confirm there's no pending support work on
+those tenants that would be disrupted by losing standing access. Expect
+some operational friction during this transition — runbooks that assumed
+"just look at the customer's tenant" need to become "open a window first."
+
+This is the phase where most of the risk of this plan lives; every prior
+phase is additive and reversible.
+
+## Phase Dependencies
+
+```mermaid
+graph TD
+  SA_P1[Service Accounts P1]
+  SA_P3[Service Accounts P3]
+  OAZ[Orthogonal-Authz P1]
+
+  P1[P1: Hidden per-tenant support SA]
+  P2[P2: Google OIDC trust-policy resolution]
+  P3[P3: Customer-facing Support tab]
+  P4[P4: flowctl support subcommands]
+  P5[P5: Retire estuary_support/ standing grant]
+
+  SA_P1 --> P1
+  OAZ --> P1
+  SA_P3 --> P2
+  P1 --> P3
+  P2 --> P3
+  P3 --> P4
+  P4 --> P5
+```
+
+P1 and P2 are independent and can land in either order. P3 depends on both
+(the SA to attach policies to, and the resolution path for Google groups).
+P4 depends on P3 because there's nothing to authenticate against until a
+customer can open a window. P5 is last because it removes the fallback —
+until it lands, `estuary_support/` still works and any issue with the new
+flow is recoverable by reverting to standing access.
diff --git a/plans/user-management.md b/plans/user-management.md
new file mode 100644
index 00000000000..d507cdf22e4
--- /dev/null
+++ b/plans/user-management.md
@@ -0,0 +1,201 @@
+# User Management
+
+consider these before finalizing
+https://github.com/estuary/flow/issues/1928
+https://github.com/estuary/ui/issues/1457
+
+## Executive Summary
+
+Our existing user-management UX is non-standard and sometimes confusing for new users. The organization membership page shows raw grants rather than people — a single user appears multiple times if they have multiple grants (wait, what's a grant?). Invites are anonymous copy-paste links with no expiration and can be used by anyone.
+
+When we're finished with this work, admins will invite users by email — each invite is locked to its recipient, has a limited lifetime, and an observable status (pending, accepted, expired). Admins will be able to refresh and resend invites. A tenant-members view will show every person with access in one place: name, email, login method, last sign-in, and the grants they hold. Admins will add and remove grants directly on existing users from that view (instead of using invite links as they do today). flowctl's `auth roles` commands will call the same GraphQL mutations the dashboard uses, so CLI and UI enforce identical rules.
+
+> Note that what it means to be an "admin" will change with a separate effort around auth capabilities. Assume "admin" is simply someone authorized to make changes related to users and what access/capabilities they have.
+
+## Technical Notes
+
+**No upsert on grant changes.** `user_grants` has a unique constraint on `(user_id, object_role)`, so changing a capability requires removing the old grant then adding the new one. flowctl preserves upsert behavior during a deprecation window by detecting the conflict and issuing a revoke-then-add, with a warning that future releases will require the explicit two-step.
+
+## Open Questions
+
+- **Snapshot delay on invite redemption:** The auth snapshot refreshes on a 20s–5min cadence, so a newly inserted `user_grants` row isn't visible to the invitee for up to 30s after they redeem. Is this acceptable, or do we need a mechanism to force a snapshot refresh on redemption?
+
+## Phase Dependencies
+
+```mermaid
+graph TD
+  P1[Email-Based Invites with Expiration]
+  P2[Retire Anonymous Invite Links]
+  P3[Tenant Members View]
+  P4[Grant Management]
+
+  P1 --> P2
+  P3 --> P4
+```
+
+Two independent branches: the invite branch (P1 → P2) and the members branch (P3 → P4). P1 and P3 can start in parallel.
+
+---
+
+## Phase 1 — Email-Based Invites with Expiration
+
+New invites are email-addressed with a limited lifetime. Existing anonymous and multi-use invites get a two-week expiration and remain visible in the UI so admins can revoke them early if they want, but the "add" button for anonymous invites is disabled.
+
+### Pre-work: Extract Email Infrastructure
+
+Before adding invite emails, extract the generic layers from alert-specific modules:
+
+1. Move `Sender`/`ResendSender`/`EmailSender` out of `agent::alerts::notifier` into `agent::email`. No logic changes.
+2. Extract the HTML wrapper template from `crates/notifications/` so non-alert emails can share branding.
+3. Pass `Sender` to both `AlertNotifications` and the new invite email code in `main.rs`.
+
+### Schema Changes
+
+Alter `internal.invite_links`:
+
+| Column         | Type                   | Notes                                      |
+| -------------- | ---------------------- | ------------------------------------------ |
+| `email`        | TEXT                   | Nullable — existing anonymous invites kept |
+| `last_sent_at` | TIMESTAMPTZ            | Updated on resend                          |
+| `redeemed_by`  | UUID FK → `auth.users` | Who accepted the invite                    |
+| `redeemed_at`  | TIMESTAMPTZ            | When it was accepted                       |
+| `expires_at`   | TIMESTAMPTZ NOT NULL   | Default `now() + interval '7 days'`        |
+
+Existing anonymous invites are backfilled with `expires_at = now() + interval '14 days'`. New email-based invites default to 7 days.
+
+### GraphQL Changes
+
+- `createInviteLink` gains required `email` and optional `expiresInDays` (default 7). The invite is locked to that recipient and the email is sent via Resend. Rejects if email already has `user_grants` on the target prefix.
+- `redeemInviteLink` rejects expired and already-redeemed invites. When `email` is set, validates the authenticated user's email matches (case-insensitive). Sets `redeemed_by`/`redeemed_at` instead of deleting the row.
+- New `resendInvite(token)` mutation re-sends email, resets `expires_at`, and bumps `last_sent_at`.
+- `InviteLink` type gains: `email`, `redeemedBy`, `redeemedAt`, `expiresAt`, `lastSentAt`, `status` (PENDING | ACCEPTED | EXPIRED — computed from the columns)
+- `inviteLinks` query gains `status` filter
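+
+The computed `status` and the `redeemInviteLink` checks could be sketched roughly as below. This is a minimal illustration, not the agent's actual code: the `Invite`, `InviteStatus`, and `RedeemError` names are hypothetical, timestamps are simplified to Unix seconds, and the precedence (a redeemed invite stays ACCEPTED even after `expires_at` passes) is an assumption. Legacy multi-use links are not modeled.
+
+```rust
+// Hypothetical types for illustration only; not the real schema bindings.
+#[derive(Debug, PartialEq, Eq, Clone, Copy)]
+pub enum InviteStatus {
+    Pending,
+    Accepted,
+    Expired,
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub enum RedeemError {
+    AlreadyRedeemed,
+    Expired,
+    EmailMismatch,
+}
+
+/// Subset of the `internal.invite_links` columns, timestamps as Unix seconds.
+pub struct Invite {
+    pub email: Option<String>,
+    pub redeemed_at: Option<i64>,
+    pub expires_at: i64,
+}
+
+impl Invite {
+    /// Assumed precedence: redeemed wins over expired.
+    pub fn status(&self, now: i64) -> InviteStatus {
+        if self.redeemed_at.is_some() {
+            InviteStatus::Accepted
+        } else if now >= self.expires_at {
+            InviteStatus::Expired
+        } else {
+            InviteStatus::Pending
+        }
+    }
+
+    /// Checks to run before inserting the `user_grants` row.
+    pub fn check_redeemable(&self, user_email: &str, now: i64) -> Result<(), RedeemError> {
+        match self.status(now) {
+            InviteStatus::Accepted => return Err(RedeemError::AlreadyRedeemed),
+            InviteStatus::Expired => return Err(RedeemError::Expired),
+            InviteStatus::Pending => {}
+        }
+        // Anonymous legacy invites (no email) skip the recipient check.
+        if let Some(expected) = &self.email {
+            if expected.to_lowercase() != user_email.to_lowercase() {
+                return Err(RedeemError::EmailMismatch);
+            }
+        }
+        Ok(())
+    }
+}
+```
+
+Deriving the status from `redeemed_at` and `expires_at` at read time means no background job is needed to flip rows to EXPIRED.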
+
+### Frontend
+
+- Create invite dialog requires email address and has "expires in X days" field
+- Existing anonymous/multi-use invites remain visible with status badges, expiration countdown, and a revoke button — but the "add" button for anonymous invites is disabled
+- Invite list shows recipient email, status badge, expiration date, redeemed-by user
+- "Resend" button on pending email invites
+- Surfaces the "already a member" error from the pre-flight check
+
+### Verification
+
+- [ ] Create invite → email sent, invite locked to recipient, status is PENDING, `expires_at` is 7 days out
+- [ ] Redeem invite → `user_grants` row inserted, status is ACCEPTED, `redeemed_by`/`redeemed_at` set, row not deleted
+- [ ] Wrong email redeems invite → rejected
+- [ ] Let invite expire → status is EXPIRED, redemption rejected
+- [ ] Redeem already-accepted invite → rejected
+- [ ] Existing anonymous/multi-use link still redeemable until its expiration
+- [ ] Existing anonymous invites show `expires_at` ~14 days from migration
+- [ ] Admin revokes anonymous invite before expiration → invite no longer redeemable
+- [ ] Invite for email that already has access to the prefix → rejected with clear error
+- [ ] Resend invite → new email sent, `expires_at` reset, `last_sent_at` updated
+- [ ] Filter `inviteLinks` by status → correct results
+- [ ] UI "add" button for anonymous invites is disabled
+
+---
+
+## Phase 2 — Retire Anonymous Invite Links
+
+Once all legacy anonymous invites have expired (~2 weeks after Phase 1 deploys), remove the anonymous invite UI entirely and enforce email on all invites at the schema level.
+
+- `email` becomes NOT NULL on `internal.invite_links` (all remaining anonymous rows are expired — delete or keep as historical)
+- Remove anonymous invite list from the UI
+- `createInviteLink` already requires email — no GraphQL changes needed
+
+### Verification
+
+- [ ] No anonymous invite rows with NULL email remain active
+- [ ] Anonymous invite UI elements fully removed
+- [ ] Schema enforces NOT NULL on `email`
+
+---
+
+## Phase 3 — Tenant Members View
+
+Replaces the raw grants list with a people-centric view. Admins see who has access, how they got in, and when they last showed up.
+
+### GraphQL API
+
+New `TenantMember` type:
+
+| Field          | Source                          | Notes                                                         |
+| -------------- | ------------------------------- | ------------------------------------------------------------- |
+| `userId`       | `auth.users`                    |                                                               |
+| `email`        | `auth.users`                    |                                                               |
+| `fullName`     | `auth.users.raw_user_meta_data` |                                                               |
+| `avatarUrl`    | `auth.users.raw_user_meta_data` |                                                               |
+| `loginMethod`  | `auth.identities`               | password / Google / SSO                                       |
+| `lastSignInAt` | `auth.users`                    |                                                               |
+| `grants`       | `user_grants`                   | Scoped to the queried tenant's prefix                         |
+| `inviteStatus` | `invite_links`                  | ACTIVE or PENDING_INVITE (invite exists but not yet redeemed) |
+
+New `Tenant.members(after, first, search)` query — paginated, admin-only. Joins `auth.users` for profile info, `auth.identities` for login method, `invite_links` to distinguish pending invitees from active members.
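+
+As a rough sketch of the "one row per person" shape, grant rows can be folded into members keyed by user. The `GrantRow` and `Member` types here are hypothetical and heavily simplified (a `u32` stands in for the UUID user id, and profile fields other than email are omitted); the real resolver would presumably do this join in SQL.
+
+```rust
+use std::collections::BTreeMap;
+
+// Hypothetical row shape: one entry per `user_grants` row under the tenant prefix.
+pub struct GrantRow {
+    pub user_id: u32, // stand-in for the UUID in `auth.users`
+    pub email: String,
+    pub prefix: String,
+    pub capability: String,
+}
+
+#[derive(Debug, PartialEq)]
+pub struct Member {
+    pub user_id: u32,
+    pub email: String,
+    pub grants: Vec<(String, String)>, // (prefix, capability)
+}
+
+/// Collapse grant rows so each user appears once with all grants attached.
+/// Output is ordered by `user_id`, which keeps pagination cursors stable.
+pub fn aggregate_members(rows: Vec<GrantRow>) -> Vec<Member> {
+    let mut by_user: BTreeMap<u32, Member> = BTreeMap::new();
+    for row in rows {
+        let member = by_user.entry(row.user_id).or_insert_with(|| Member {
+            user_id: row.user_id,
+            email: row.email.clone(),
+            grants: Vec::new(),
+        });
+        member.grants.push((row.prefix, row.capability));
+    }
+    by_user.into_values().collect()
+}
+```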
+
+### Frontend
+
+- New "Members" tab in tenant admin area
+- Table: avatar, name, email, login method, last sign-in, grants
+- Search bar, click-through to grant details
+- Pending invitees visually distinguished from active members
+
+### Verification
+
+- [ ] Members list shows each user once, with all their grants aggregated
+- [ ] Pending invitee (invite sent, not yet redeemed) shows PENDING_INVITE status
+- [ ] Search by name or email returns correct results
+- [ ] Pagination works with > 20 members
+- [ ] Non-admin cannot access the members query
+
+---
+
+## Phase 4 — Grant Management
+
+Admins add and remove grants directly on existing users from the members view — no invite required. This is also where flowctl migrates off PostgREST for grant writes, aligning CLI and UI on one authorization path.
+
+### GraphQL Mutations
+
+- `addUserGrant(userId, catalogPrefix, capability, detail?)` — requires admin on prefix, errors on duplicate (no upsert). `detail` is a free-form audit string, preserved from flowctl's existing surface.
+- `removeUserGrant(grantId)` — requires admin on prefix
+- `addRoleGrant` / `removeRoleGrant` — mirrored for role-to-role grants, also accepting `detail`
+- No update mutation — remove then add for capability changes
+
+### flowctl Migration
+
+Reimplement `flowctl auth roles grant` and `revoke` against the new GraphQL mutations instead of direct PostgREST writes on `user_grants` / `role_grants`. The `--detail` flag is preserved end-to-end via the mutation argument.
+
+**Preserve upsert behavior during a deprecation window.** When `grant` is called against an existing `(subject, object)` pair, flowctl detects the conflict and issues a revoke-then-add to keep current scripts working. Each upsert prints a deprecation warning explaining that a future release will require an explicit revoke-then-grant. Removal of the auto-revoke fallback is a follow-up effort, not part of this plan.
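+
+The fallback described above might look something like the following. It is a sketch under stated assumptions: `GrantApi` is a hypothetical trait standing in for the GraphQL mutations, `remove` is keyed by `(subject, object)` for brevity rather than by grant id, and `MemoryApi` is an in-memory stand-in used only to exercise the flow.
+
+```rust
+use std::collections::HashMap;
+
+#[derive(Debug, PartialEq)]
+pub enum GrantError {
+    Duplicate,
+    NotFound,
+}
+
+/// Hypothetical client surface; not flowctl's real API.
+pub trait GrantApi {
+    fn add(&mut self, subject: &str, object: &str, capability: &str) -> Result<(), GrantError>;
+    fn remove(&mut self, subject: &str, object: &str) -> Result<(), GrantError>;
+}
+
+/// Adds a grant; on a duplicate, revokes and re-adds with a warning.
+/// Returns true when the deprecated upsert fallback was used.
+pub fn grant_with_fallback<A: GrantApi>(
+    api: &mut A,
+    subject: &str,
+    object: &str,
+    capability: &str,
+) -> Result<bool, GrantError> {
+    match api.add(subject, object, capability) {
+        Ok(()) => Ok(false),
+        Err(GrantError::Duplicate) => {
+            eprintln!("warning: implicit upsert is deprecated; revoke before granting");
+            api.remove(subject, object)?;
+            api.add(subject, object, capability)?;
+            Ok(true)
+        }
+        Err(other) => Err(other),
+    }
+}
+
+/// In-memory stand-in keyed by (subject, object).
+#[derive(Default)]
+pub struct MemoryApi {
+    pub grants: HashMap<(String, String), String>,
+}
+
+impl GrantApi for MemoryApi {
+    fn add(&mut self, subject: &str, object: &str, capability: &str) -> Result<(), GrantError> {
+        let key = (subject.to_string(), object.to_string());
+        if self.grants.contains_key(&key) {
+            return Err(GrantError::Duplicate);
+        }
+        self.grants.insert(key, capability.to_string());
+        Ok(())
+    }
+
+    fn remove(&mut self, subject: &str, object: &str) -> Result<(), GrantError> {
+        let key = (subject.to_string(), object.to_string());
+        self.grants.remove(&key).map(|_| ()).ok_or(GrantError::NotFound)
+    }
+}
+```
+
+Because the revoke and the add are two separate calls, the grant is briefly absent between them, so scripts that depend on uninterrupted access may want the explicit two-step form regardless.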
+
+`flowctl auth roles list` can stay on `combined_grants_ext` for now — read paths aren't on the critical path for this migration.
+
+### Frontend
+
+- "Add grant" action in members view
+- "Remove" button per grant
+- UI streamlines "change capability" as remove + add
+
+### Verification
+
+- [ ] Add grant via dashboard → user sees new prefix once the auth snapshot refreshes
+- [ ] Add duplicate grant → clear error, no upsert
+- [ ] Remove grant → user loses access
+- [ ] `flowctl auth roles grant` against new GraphQL → works
+- [ ] `flowctl auth roles grant` with existing `(subject, object)` → auto revoke-then-add + deprecation warning
+- [ ] `flowctl auth roles revoke` → works
+- [ ] Non-admin on prefix → mutations rejected
+
+---
+
+## Phase 5 — Retire Anonymous Invite Links
+
+Email becomes required on all new invites.
+
+- `email` becomes a required argument on `createInviteLink`
+- Transition window: existing anonymous links remain redeemable
+- Dashboard: email field becomes required, copyable link kept as secondary action
+
+### Verification
+
+- [ ] Create invite without email → rejected
+- [ ] Existing anonymous link still redeemable during transition