Skip to content

fix(storage): ServerCredentialStore orphan-reason discriminator + metadata-first delete#541

Open
sroussey wants to merge 2 commits into
mainfrom
claude/beautiful-mayer-j2Nma
Open

fix(storage): ServerCredentialStore orphan-reason discriminator + metadata-first delete#541
sroussey wants to merge 2 commits into
mainfrom
claude/beautiful-mayer-j2Nma

Conversation

@sroussey
Copy link
Copy Markdown
Collaborator

Summary

Two high-priority correctness fixes for ServerCredentialStore in packages/storage/src/credentials/.

Fix 1 — orphanReason discriminator on orphan markers

CredentialMetadataRow.orphanedAt previously left operators guessing which failure branch produced a sticky orphan marker. This change adds an optional orphanReason field with one of three string-literal values:

  • "vault-write-failed"put() new-entry rollback path: metadata row written, vault.setSecret threw, and the follow-up metadata.delete also threw, so the pending row was rewritten as a sticky marker.
  • "metadata-commit-failed"put() new-entry commit path: vault.setSecret succeeded but the pending: true -> false flip threw, so the vault holds bytes the metadata can no longer expose.
  • "vault-delete-failed"delete() path (introduced by Fix 2 below): metadata.delete succeeded but vault.deleteSecret threw, so the vault may still hold bytes for a key no metadata row exposes.

A OrphanReason type alias is exported (NonNullable<CredentialMetadataRow["orphanReason"]>) for downstream consumers (reconciliation tooling, dashboards). The JSDoc on orphanedAt now points at orphanReason for branch semantics.

Fix 2 — delete() is metadata-first; deleteAll() aggregates per-id failures

Previously, delete() called vault.deleteSecret first and metadata.delete second. If the vault call threw, the row remained visible to readers but its secret was already gone — a torn state. The new ordering is:

  1. Read the existing row; return false if absent.
  2. await this.metadata.delete(id) — failure bubbles, vault untouched.
  3. await this.vault.deleteSecret(id) — failure triggers a best-effort orphan marker (orphanReason: "vault-delete-failed") and throws a wrapped Error whose cause is the vault error.

delete() itself is now a thin wrapper around a private deleteById(id, key), and deleteAll() iterates scopedIds() calling the same path. Per-id errors are collected and surfaced as an AggregateError once every id has been attempted, so one bad row no longer aborts cleanup of the rest. The failing ids retain sticky orphan markers for operator reconciliation; succeeding ids are fully removed.

get() and has() self-eviction calls (await this.delete(key) on expiry) are now wrapped in try { ... } catch {} so the new throwy delete() cannot turn an expired-read miss into a thrown error for callers — preserving the prior swallow-on-eviction behaviour.

Backward compatibility

  • orphanReason is optional. Legacy persisted rows with orphanedAt but no orphanReason continue to be treated as orphan markers (they are pending: true, so they were already invisible to readers). A regression test covers this case.
  • Behaviour change: deleteAll() now throws AggregateError on per-id failure instead of throwing the first vault error and aborting the rest of the cleanup. Callers that previously only await-ed deleteAll() continue to see a throw on failure; callers that depended on early-abort semantics will need to update.
  • Behaviour change: delete() now throws a wrapped Error (with cause) when vault.deleteSecret fails after metadata removal, instead of leaving a visible-but-empty row. The previous behaviour was a silent torn state; callers must now be prepared to handle this throw.

Files modified

  • packages/storage/src/credentials/ServerCredentialStore.ts — orphanReason field + OrphanReason export, both put() branches tagged, new private deleteById(), rewritten delete() / deleteAll(), swallow-wrapped self-eviction in get() / has().
  • packages/storage/src/credentials/ServerCredentialStore.test.ts — extended two existing orphan-path tests to assert orphanReason, added four new tests (delete() vault-failure, delete() metadata-failure, deleteAll() AggregateError, legacy orphan compatibility), updated the existing delete removes both test to verify metadata-before-vault ordering via spy call-order.

Test plan

  • bun test packages/storage/src/credentials/ServerCredentialStore.test.ts passes locally
  • Type-check (tsc --noEmit) clean across the package
  • Lint clean on the two touched files
  • Spot-check downstream consumers of CredentialMetadataRow for compile compatibility with the new optional field
  • Confirm no caller in the repo depends on deleteAll() throwing the first vault error rather than AggregateError

Generated by Claude Code

sroussey added 2 commits May 30, 2026 01:16
…in ServerCredentialStore

- Extend CredentialMetadataRow with optional orphanReason discriminator
  ("vault-write-failed" | "metadata-commit-failed" | "vault-delete-failed")
  and export OrphanReason type alias.
- Tag orphan markers in both put() failure branches with the appropriate
  reason so operators can distinguish vault-write rollback failures from
  commit-step failures.
- Rewrite delete() to remove metadata BEFORE the vault, so a vault delete
  failure can't leave a row visible-but-secret-gone. If vault delete fails
  after metadata is removed, a best-effort sticky orphan marker
  (orphanReason: "vault-delete-failed") is written and a wrapped error is
  thrown.
- Rewrite deleteAll() to use the same per-id path and throw AggregateError
  when one or more ids fail, leaving orphan markers for the failing ids
  while still completing the rest.
- Wrap self-eviction calls in get()/has() in try/catch to preserve the
  prior swallow-on-eviction behaviour despite delete() now being throwy.
…lete

- Assert orphanReason on both put() orphan-marker branches
  ("vault-write-failed" and "metadata-commit-failed").
- Add delete() coverage: vault-failure leaves a sticky orphan marker with
  orphanReason "vault-delete-failed" while the row stays invisible to
  readers, error wraps the vault cause.
- Add delete() coverage: metadata-failure propagates without touching the
  vault (vault.deleteSecret spy must not be called).
- Add deleteAll() coverage: one failing id yields an AggregateError but
  other ids still complete and the failing id keeps an orphan marker.
- Add legacy compatibility test: an orphan row without orphanReason
  (pre-discriminator persisted state) remains invisible to readers.
- Update the existing "delete removes both" test to assert metadata is
  deleted BEFORE the vault via a spy call-order check.
@github-actions
Copy link
Copy Markdown

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 61.95% 24614 / 39730
🔵 Statements 61.81% 25468 / 41200
🔵 Functions 62.92% 4662 / 7409
🔵 Branches 50.66% 12066 / 23813
File CoverageNo changed files found.
Generated in workflow #2454 for commit 1193ddd by the Vitest Coverage Report Action

sroussey added a commit that referenced this pull request Jun 2, 2026
…es, orphan reasons)

Three classes of issue addressed:

1. Key-injection scope escape.
   userId, projectId, and every credential key are now matched against a
   strict SAFE_SEGMENT grammar (^[A-Za-z0-9_-]{1,128}$). Without this, a key
   like "../../other-user/other-project/leaked" could be smuggled through
   vaultId() and collide with another scope's vault id. Failures are surfaced
   as TypeError without echoing the invalid value back.

2. Delete races and orphan-overwrite.
   - delete() is refactored onto an internal deleteById() helper that
     deletes metadata FIRST, then vault. A throw on metadata.delete leaves
     the entry fully intact and readable; a throw on vault.deleteSecret
     reinstates a pending orphan marker (with orphanReason
     "vault-delete-failed") so readers see absence and operators can
     reconcile, AND surfaces a wrapped error whose message distinguishes
     "orphan marker persisted" from "orphan marker also failed to persist".
   - deleteById skips rows whose metadata is pending: true. This closes
     both the orphan-overwrite path (delete on an existing orphan marker
     used to nuke the vault entry it was tracking) and the concurrent
     put(new)+delete race (delete during the pending window of a new-entry
     put would wipe the vault entry mid-write).
   - get()/has() wrap expiry self-eviction in try/catch so a transient
     KV/vault failure during eviction does not surface as a thrown read.
   - deleteAll() collects per-row failures and throws AggregateError so a
     single bad row no longer silently masks deletion of the rest.

3. Orphan reason discriminator.
   CredentialMetadataRow gains an optional orphanReason field
   ("vault-write-failed" | "metadata-commit-failed" | "vault-delete-failed")
   and a corresponding exported OrphanReason type. Both put() orphan
   paths and the new delete() orphan path stamp the reason so operator
   tooling can pick a remediation strategy without re-running the original
   write. Legacy rows without the discriminator stay invisible to readers
   (covered by an explicit test).

14 new tests added; existing delete-removes-both test upgraded to assert
metadata-first ordering via spy call-order; existing put orphan tests
upgraded to assert orphanReason.

Supersedes #541.

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc
sroussey added a commit that referenced this pull request Jun 3, 2026
…es, orphan reasons)

Three classes of issue addressed:

1. Key-injection scope escape.
   userId, projectId, and every credential key are now matched against a
   strict SAFE_SEGMENT grammar (^[A-Za-z0-9_-]{1,128}$). Without this, a key
   like "../../other-user/other-project/leaked" could be smuggled through
   vaultId() and collide with another scope's vault id. Failures are surfaced
   as TypeError without echoing the invalid value back.

2. Delete races and orphan-overwrite.
   - delete() is refactored onto an internal deleteById() helper that
     deletes metadata FIRST, then vault. A throw on metadata.delete leaves
     the entry fully intact and readable; a throw on vault.deleteSecret
     reinstates a pending orphan marker (with orphanReason
     "vault-delete-failed") so readers see absence and operators can
     reconcile, AND surfaces a wrapped error whose message distinguishes
     "orphan marker persisted" from "orphan marker also failed to persist".
   - deleteById skips rows whose metadata is pending: true. This closes
     both the orphan-overwrite path (delete on an existing orphan marker
     used to nuke the vault entry it was tracking) and the concurrent
     put(new)+delete race (delete during the pending window of a new-entry
     put would wipe the vault entry mid-write).
   - get()/has() wrap expiry self-eviction in try/catch so a transient
     KV/vault failure during eviction does not surface as a thrown read.
   - deleteAll() collects per-row failures and throws AggregateError so a
     single bad row no longer silently masks deletion of the rest.

3. Orphan reason discriminator.
   CredentialMetadataRow gains an optional orphanReason field
   ("vault-write-failed" | "metadata-commit-failed" | "vault-delete-failed")
   and a corresponding exported OrphanReason type. Both put() orphan
   paths and the new delete() orphan path stamp the reason so operator
   tooling can pick a remediation strategy without re-running the original
   write. Legacy rows without the discriminator stay invisible to readers
   (covered by an explicit test).

14 new tests added; existing delete-removes-both test upgraded to assert
metadata-first ordering via spy call-order; existing put orphan tests
upgraded to assert orphanReason.

Supersedes #541.

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc
sroussey added a commit that referenced this pull request Jun 4, 2026
…ore (sec) (#544)

* fix(storage): harden ServerCredentialStore (key injection, delete races, orphan reasons)

Three classes of issue addressed:

1. Key-injection scope escape.
   userId, projectId, and every credential key are now matched against a
   strict SAFE_SEGMENT grammar (^[A-Za-z0-9_-]{1,128}$). Without this, a key
   like "../../other-user/other-project/leaked" could be smuggled through
   vaultId() and collide with another scope's vault id. Failures are surfaced
   as TypeError without echoing the invalid value back.

2. Delete races and orphan-overwrite.
   - delete() is refactored onto an internal deleteById() helper that
     deletes metadata FIRST, then vault. A throw on metadata.delete leaves
     the entry fully intact and readable; a throw on vault.deleteSecret
     reinstates a pending orphan marker (with orphanReason
     "vault-delete-failed") so readers see absence and operators can
     reconcile, AND surfaces a wrapped error whose message distinguishes
     "orphan marker persisted" from "orphan marker also failed to persist".
   - deleteById skips rows whose metadata is pending: true. This closes
     both the orphan-overwrite path (delete on an existing orphan marker
     used to nuke the vault entry it was tracking) and the concurrent
     put(new)+delete race (delete during the pending window of a new-entry
     put would wipe the vault entry mid-write).
   - get()/has() wrap expiry self-eviction in try/catch so a transient
     KV/vault failure during eviction does not surface as a thrown read.
   - deleteAll() collects per-row failures and throws AggregateError so a
     single bad row no longer silently masks deletion of the rest.

3. Orphan reason discriminator.
   CredentialMetadataRow gains an optional orphanReason field
   ("vault-write-failed" | "metadata-commit-failed" | "vault-delete-failed")
   and a corresponding exported OrphanReason type. Both put() orphan
   paths and the new delete() orphan path stamp the reason so operator
   tooling can pick a remediation strategy without re-running the original
   write. Legacy rows without the discriminator stay invisible to readers
   (covered by an explicit test).

14 new tests added; existing delete-removes-both test upgraded to assert
metadata-first ordering via spy call-order; existing put orphan tests
upgraded to assert orphanReason.

Supersedes #541.

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc

* fix(storage): restore ICredentialStore contract on ServerCredentialStore (sec)

Follow-up to #543. Two contract regressions discovered in code review:

H1: deleteAll() silently skipped pending/orphan rows because it now
    delegates to deleteById (added in #543) which short-circuits when
    pending=true. Result: no public API cleared orphan markers; a
    "wipe scope" call silently leaked the very rows operators need
    to clean up. Adds forceDeleteById that bypasses the pending-skip;
    deleteAll uses it. deleteById (single-key) unchanged.

H2: get/has/delete now threw TypeError for invalid keys because they
    all routed through vaultId(). ICredentialStore documents these as
    returning undefined/false on missing keys; throwing breaks
    substitutability with InMemoryCredentialStore / EncryptedKvCredentialStore.
    Adds isSafeKey() predicate; readers short-circuit; put() still
    throws via vaultId (only path that persists a colliding id).

M1: SAFE_SEGMENT widened from [A-Za-z0-9_-] to [A-Za-z0-9._:-] so
    natural credential names like "openai.prod" and "scope:billing"
    are accepted. Still rejects path separators and whitespace.

5 new tests + 1 rewritten test. Existing tests pass.

* docs(storage): clarify deleteById JSDoc + fix deleteByid typo (review)

Copilot review feedback on #544:

- deleteById's JSDoc claimed both delete() and deleteAll() share its
  code path, but deleteAll() deliberately routes through
  forceDeleteById() to bypass the pending-row short-circuit and
  reach sticky orphan markers. Updated to call out the asymmetry
  with a forward reference to forceDeleteById.
- Typo: 'deleteByid' -> 'deleteById' so the @link symbol resolves.

* refactor(storage): extract commitDelete to dedupe ServerCredentialStore delete paths

`deleteById` and `forceDeleteById` had nearly-identical bodies after the
pending-row check — the metadata-first ordering, orphan-marker rewrite,
and wrapped-error throw were duplicated, so a fix in one could silently
drift from the other. Lift the shared tail into `commitDelete(id, key,
existing, op)` so both callers route through one source of truth and
the only meaningful divergence (pending-row policy) stays at the call
site. Also corrects the stale `vaultId` JSDoc that still listed the
reader methods as callers — they were moved to `isSafeKey` in the same
PR for the ICredentialStore-contract substitutability fix.

https://claude.ai/code/session_011KMd9sERp2rguyekAi8a3u

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant