Configurable custom jina url by uhercik · Pull Request #10 · danny-avila/agents

uhercik · 2025-08-05T13:54:24Z

Made jina url configurable using JINA_URL env variable

davidjrh · 2025-08-11T15:48:36Z

This is quite useful to allow hosting the JINA reranker in Azure Foundry, since it is available through the Azure Marketplace. https://jina.ai/news/jina-embeddings-and-reranker-on-azure-scalable-business-ready-ai-solutions/

BTW, would be awesome to have the same for the COHERE_URL (I'm going to do another similar PR)

…t, cache patterns Resolves the comprehensive review on #87: - Rename `HookMatcher.matcher` -> `HookMatcher.pattern` to fix the self-referential naming (`m.matcher` reads as "the matcher's matcher"). - Wire `PostToolUseHookOutput.updatedOutput` through `AggregatedHookResult` and `fold()`. The type was previously a promise the executor didn't fulfill — a hook returning `{ updatedOutput: ... }` was silently dropped. - Correct the JSDoc on `updatedInput`/`updatedOutput`. The previous copy claimed non-determinism "in Promise.all resolution order" — but `Promise.all` preserves input order, so the fold is deterministic in registration order (outer loop over matchers, inner loop over each matcher's hooks). Updated the `executeHooks` function docstring to match. - Add regex compilation cache and length bound to `matchers.ts`. Patterns compile at most once per unique string and are reused across calls; patterns over MAX_PATTERN_LENGTH (512) are rejected outright as a cheap ReDoS mitigation. Documented the trust model (patterns are trusted input; hosts must validate user-supplied patterns upstream). - Document the wildcard-only semantics on query-less events: a non-empty pattern on an event that doesn't supply a matchQuery (RunStart, Stop, etc.) never matches. - Document the `once: true` concurrent-dispatch limitation: two parallel `executeHooks` calls each snapshot the matcher list, so `once` means "once per call", not "once globally". Matches Claude Code's semantics. Added a test that pins this behavior. - Merge `matchers.filter(...)` + task-build loop into a single pass per AGENTS.md "consolidate sequential O(n) operations." - Scope `eslint-disable no-console` to the single `console.warn` call in `reportErrors` instead of disabling it file-wide. - Fix the timeout-ignoring-hook test: clear the dangling `setTimeout` and assert the error surfaces an abort-shaped label. - Add tests: multi-hook `preventContinuation` (first-writer-wins on stopReason, flag without reason), `updatedOutput` flow + registration order, registration-order determinism for `updatedInput`, pattern length bound, compilation cache hit/miss/clear, concurrent `once` dispatch. Total: 47 -> 59 tests. Findings audited and resolved: - #1 ReDoS: cache + length bound + trust model doc (full fix blocked on host-side validation) - #2 Wrong updatedInput ordering JSDoc: fixed - #3 Dead `updatedOutput` type: wired through - #4 Concurrent `once` race: documented + test - #5 `HookMatcher.matcher` naming: renamed to `pattern` - #6 `WideCallback` dead code: rejected as inaccurate (used in `runHook`) - #7 Eslint-disable scope: line-scoped - #8 File-path comments: rejected; matches existing project convention - #9 Two-pass filter: single pass - #10 Regex recompilation: cached - CL-4 Undocumented wildcard semantics: documented - RT-3 Timeout test dangling timer: fixed + error content verified - RT-4 No multi-hook preventContinuation test: added

* feat: add inert hook registry and executor scaffolding (phase 1/1) Introduces `src/hooks/` with types, registry, and executor for a 12-event hook lifecycle. Purely additive — not exported from `src/index.ts` and not yet wired into `Run`, `Graph`, or `ToolNode`. Integration lands in the next PR so this change is shippable independently with zero behavioral impact on hosts that don't opt in. - Discriminated-union input/output per event; generic `HookCallback<E>` and `HookMatcher<E>` for type-safe registration. - `HookRegistry` uses `Map<sessionId, bucket>` (not `Record`) to avoid O(n^2) churn under parallel registration in multi-tenant hosts. - `executeHooks` fans out in parallel, races hooks against a combined parent + timeout `AbortSignal` so non-listening hooks still time out, folds decisions with `deny > ask > allow` precedence, accumulates `additionalContext`, and self-removes `once: true` matchers after any successful hook. Errors are non-fatal: swallowed into the aggregated result and routed through an optional winston logger (falling back to `console.warn`), with internal matcher errors suppressed. 47 tests under `src/hooks/__tests__/` cover registry scoping, regex matching, precedence folding, once-self-removal (success/failure/mixed), timeout enforcement (including non-listening hooks), error non-fatality, and parent signal combination. * fix(hooks): address review findings — rename field, wire updatedOutput, cache patterns Resolves the comprehensive review on #87: - Rename `HookMatcher.matcher` -> `HookMatcher.pattern` to fix the self-referential naming (`m.matcher` reads as "the matcher's matcher"). - Wire `PostToolUseHookOutput.updatedOutput` through `AggregatedHookResult` and `fold()`. The type was previously a promise the executor didn't fulfill — a hook returning `{ updatedOutput: ... }` was silently dropped. - Correct the JSDoc on `updatedInput`/`updatedOutput`. The previous copy claimed non-determinism "in Promise.all resolution order" — but `Promise.all` preserves input order, so the fold is deterministic in registration order (outer loop over matchers, inner loop over each matcher's hooks). Updated the `executeHooks` function docstring to match. - Add regex compilation cache and length bound to `matchers.ts`. Patterns compile at most once per unique string and are reused across calls; patterns over MAX_PATTERN_LENGTH (512) are rejected outright as a cheap ReDoS mitigation. Documented the trust model (patterns are trusted input; hosts must validate user-supplied patterns upstream). - Document the wildcard-only semantics on query-less events: a non-empty pattern on an event that doesn't supply a matchQuery (RunStart, Stop, etc.) never matches. - Document the `once: true` concurrent-dispatch limitation: two parallel `executeHooks` calls each snapshot the matcher list, so `once` means "once per call", not "once globally". Matches Claude Code's semantics. Added a test that pins this behavior. - Merge `matchers.filter(...)` + task-build loop into a single pass per AGENTS.md "consolidate sequential O(n) operations." - Scope `eslint-disable no-console` to the single `console.warn` call in `reportErrors` instead of disabling it file-wide. - Fix the timeout-ignoring-hook test: clear the dangling `setTimeout` and assert the error surfaces an abort-shaped label. - Add tests: multi-hook `preventContinuation` (first-writer-wins on stopReason, flag without reason), `updatedOutput` flow + registration order, registration-order determinism for `updatedInput`, pattern length bound, compilation cache hit/miss/clear, concurrent `once` dispatch. Total: 47 -> 59 tests. Findings audited and resolved: - #1 ReDoS: cache + length bound + trust model doc (full fix blocked on host-side validation) - #2 Wrong updatedInput ordering JSDoc: fixed - #3 Dead `updatedOutput` type: wired through - #4 Concurrent `once` race: documented + test - #5 `HookMatcher.matcher` naming: renamed to `pattern` - #6 `WideCallback` dead code: rejected as inaccurate (used in `runHook`) - #7 Eslint-disable scope: line-scoped - #8 File-path comments: rejected; matches existing project convention - #9 Two-pass filter: single pass - #10 Regex recompilation: cached - CL-4 Undocumented wildcard semantics: documented - RT-3 Timeout test dangling timer: fixed + error content verified - RT-4 No multi-hook preventContinuation test: added * perf(hooks): atomic once, bounded LRU, ReDoS heuristic, shared signal Rework the phase-1 hooks scaffolding for the multi-tenant scale we actually operate at. These are behavior changes (semantics of `once` tightened, new pattern-rejection path) but none of them are wired into the runtime yet — this PR is still inert — so the blast radius is the hooks directory only. ## Atomic `once: true` Removal now happens synchronously inside `executeHooks`, between `registry.getMatchers` and the first `await`. Node's event loop serialises sync work, so two concurrent `executeHooks` calls can never both observe the same `once` matcher — whichever call runs its sync prefix first consumes it, and the loser sees an empty bucket. Semantic change: `once` is now at-most-one-dispatch, not at-most-one-successful-execution. If every hook in a `once` matcher throws, the matcher is still gone. The old retry-on-failure behavior was Claude Code's model — fine for a single-user CLI, but in a multi-tenant server it opens a window where concurrent bursts can re-dispatch a "once" hook non-idempotently. "At most once, ever" is the correct semantic for a shared registry. Hosts that need retry should register a normal matcher and self-unregister via the callback returned from `registry.register`. Dropped `collectOnceMatchersForRemoval` + its Map/array allocations — the post-hoc collection step is no longer needed. Tests: replaced "keeps the matcher when every hook throws" with "removes the matcher even when every hook in it throws". Added "fires exactly once across 3 concurrent calls" and "fires exactly once across 8-way concurrent dispatch when hooks are slow" (the slow test deliberately uses a 10ms hook to force overlap). ## Bounded LRU regex cache `matchers.ts` was unbounded. Under multi-tenant use where different tenants register different patterns, the cache would leak. Capped at `MAX_CACHE_SIZE = 256` entries with LRU eviction: hits refresh position (delete-then-set), misses at capacity evict the oldest key. Failed compiles are cached the same way so a tenant spamming bad patterns doesn't burn CPU on every call. Exposed `getMatcherCacheSize` and `MAX_CACHE_SIZE` for test observability. Tests verify eviction order (cold patterns recompiled after overflow) and that hot patterns refreshed mid-stream survive eviction pressure. ## ReDoS heuristic: nested-quantifier detector `hasNestedQuantifier` walks a pattern linearly, tracks paren depth, and rejects any pattern where a quantified group contains another quantifier — the `(a+)+`, `(.*)*`, `(\w+)+` shape that is the #1 catastrophic-backtracking source. Character classes (`[*+?]`) and escaped quantifiers (`\+`) are correctly ignored. Rejection happens before `new RegExp`, so adversarial input never reaches the compiler. This is a floor, not a ceiling — it doesn't catch ambiguous-alternation ReDoS like `(a|a)+`. Hosts that accept user-supplied patterns must still validate upstream. Documented in the `HookMatcher.pattern` and `matchesQuery` JSDoc. Test verifies that `(a+)+$` on a 35-char adversarial input resolves in under 50ms (would be seconds without the heuristic). ## Shared `AbortSignal` per matcher Was: one `AbortSignal.timeout(ms)` + one `AbortSignal.any([parent, timeout])` per hook. For a matcher with N hooks and the same timeout, that's N timers + N composite signals. Now: one signal per matcher, shared across all its hooks. Since hooks in a matcher already share `matcher.timeout`, they may as well share the timer. Behavior is identical (all hooks fire and race the same deadline) but allocation drops from N to 1 per matcher. ## Tests 72 passing (was 59, +13): - Atomic once: 2 concurrent tests + 1 failure-removal test updated - LRU: cache size cap, eviction order, hit refresh - Nested quantifier: classic shapes, deep nesting, escaped chars, character classes, false positives on legitimate patterns - ReDoS mitigation: rejection path, <50ms response on adversarial input * fix(hooks): fix ReDoS heuristic false-positives on (?:...) groups Resolves the follow-up review on #87. Four findings, all valid. ## #1 MAJOR — hasNestedQuantifier false-positives on group syntax The previous scanner treated `?` as a quantifier at any depth > 0, but `?` immediately after `(` is group syntax, not a quantifier: `(?:...)`, `(?=...)`, `(?!...)`, `(?<name>...)`, `(?<=...)`, `(?<!...)`. Patterns like `(?:pre_)?tool_name` were silently rejected and the registering hook never fired — with no error surfaced to the caller. Fix: explicit group-syntax prefix skip inside the `(` handler. When the scanner enters a group it peeks for `?` and advances past the modifier characters (`:`, `=`, `!`, `<=`, `<!`, or a named-group label `<name>`) before processing the group body. The `?` is therefore never considered a quantifier at the start of a group. While in there I also fixed a second correctness bug the reviewer didn't catch: the old depth-indexed array didn't propagate a child group's "contains quantifier" signal up to its parent, so `(?:(a+))+` (outer quantifier over a wrapped quantified group, equivalent to `(a+)+`) escaped detection. The scanner now uses an explicit stack of `QuantifierFrame` records, and when a child group closes it propagates `hasBacktrackRisk` to the parent frame — whether the child itself was quantified or not. This catches `(?:(a+))+` correctly while still allowing non-risky wrappers like `(?:(ab))+`. Added 9 tests covering non-capturing groups, lookaheads, lookbehinds, named captures, `(?:(a+))+` risk propagation, and `((ab)+)+` deep wrapping. Verified existing ReDoS-detection tests still pass. ## #2 MINOR — executeHooks.test.ts did not clear the pattern cache Matcher cache is module-level. `matchers.test.ts` clears it in `beforeEach`; `executeHooks.test.ts` did not, so patterns compiled during one test persisted across subsequent tests in the same file. Added `clearMatcherCache()` to `executeHooks`'s `beforeEach` — no test failures today, but restores test independence. ## #3 MINOR — Test utilities leaked into production barrel `clearMatcherCache` and `getMatcherCacheSize` are test-only helpers (their JSDoc says so) but were exported from `src/hooks/index.ts`. When the integration PR eventually re-exports `src/hooks` from `src/index.ts` they would become public API. Removed both from the barrel. Tests already import them directly from `../matchers`, so no test changes needed. `hasNestedQuantifier`, `MAX_PATTERN_LENGTH`, and `MAX_CACHE_SIZE` remain exported — hosts can legitimately use them for upstream validation. ## #4 NIT — LRU refresh overhead at low cache pressure Every `compile()` cache hit was doing `delete + set` to refresh LRU position, even when the cache was nowhere near capacity. With `MAX_CACHE_SIZE = 256` and typical working sets of tens of patterns, eviction pressure is near-zero and the refresh is pure overhead. Added a `LRU_REFRESH_THRESHOLD = 75%` gate: refresh only runs when the cache is above 192 entries. Below that the code fast-paths straight back from `compile()`. Above it the LRU semantics kick in so hot patterns still survive evictions — the existing "refreshes LRU position on hit" test still passes because by the time it runs the check, the cache is at capacity. Tests: 72 -> 81 (+9 for group-syntax and risk-propagation coverage).

Resolves all findings from comprehensive review: MAJOR fixes: - #1: convertInjectedMessages now passes MessageContentComplex[] content through directly instead of JSON.stringify (preserves structured content) - #2: Injected messages placed AFTER ToolMessages to respect provider message ordering contracts (AIMessage tool_calls -> ToolMessage adjacency). Added JSDoc documenting host formatter contract. - #3: Removed duplicate Zod/JSON Schema. SkillToolSchema is now the single JSON Schema source of truth (used by both SkillToolDefinition and createSkillTool). Internal Zod schema derives descriptions from it. MINOR fixes: - #4: Budget calculation now accounts for (n-1) newline chars between entries. Added final length guard after proportional truncation. - #5: Moved InjectedMessage type from skill.ts to tools.ts alongside ToolExecuteResult that references it. - #6: Removed dead SkillExecutionMeta type (YAGNI). - #7: Added 4 new tests: MessageContentComplex[] passthrough, multi-tool aggregation ordering, mixed mode (direct + event-driven), and MessageContentComplex type-check. NIT fixes: - #8: Removed formatNamesOnly duplicate; formatEntries handles empty descriptions natively. - #9: Replaced em dash with period in SkillToolDescription. - #10: Tests import events module at top level instead of fragile jest.spyOn(await import(...)) pattern.

…ests Resolves all 10 findings from the review on #90. ## Structural rewrite of dispatchToolEvents The post-processing loop was rewritten to fix #1, #2, #6, #8, #10: - #1 MAJOR: Denied calls now dispatch ON_RUN_STEP_COMPLETED via a shared `dispatchStepCompleted` helper (extracted from the duplicate logic in the executed-results loop). Without this, the frontend would show stuck spinners for denied tool calls. - #2 MAJOR: Output messages are now collected in a Map keyed by tool_call_id, then returned in the original `toolCalls` input order via `toolCalls.map(call => messageByCallId.get(call.id))`. Before, denied messages were prepended to executed messages, breaking the implicit ordering contract. - #6 MINOR: `toolUsageCount` is now incremented only for approved calls (moved from the initial `preToolCalls.map` into the `approvedEntries.map` that builds ToolCallRequests). Denied calls no longer inflate the turn counter. - #8 MINOR: Eliminated the double `result.status === 'error'` check. ToolMessage is now constructed inside each status branch directly after hook dispatch. - #10 NIT: `requests.find()` replaced with a pre-built `Map<id, ToolCallRequest>` for O(1) lookup per result, per AGENTS.md guidance on Map/Set over Array.find. ## Error protection - #3 MAJOR: PreToolUse `Promise.all` now wraps each `executeHooks` call in `.catch(() => emptyResult)` so an infrastructure error in the hooks subsystem cannot crash the entire tool batch. Matches the stated "hook errors are non-fatal" contract and is consistent with PostToolUse/PostToolUseFailure which already had `.catch()`. ## New tests - #4 MAJOR: Added `updatedOutput replaces the ToolMessage content` — registers a PostToolUse hook returning `{updatedOutput: 'REDACTED'}`, verifies the ToolMessage content in the graph's run messages is `'REDACTED'` (not the original output). - #7 MINOR: PermissionDenied test replaced 50ms `setTimeout` with a `Promise` resolved by the hook callback. No more flaky sleep. ## Cleanup - #9 NIT: Updated stale "once the tool-hook PR lands" comment in `src/hooks/index.ts`. Tests: 103 -> 104 (+1 for updatedOutput).

* feat: wire tool hooks into event-driven ToolNode (phase 1/3) Fires PreToolUse, PostToolUse, PostToolUseFailure, and PermissionDenied hooks inside `ToolNode.dispatchToolEvents` — the single chokepoint for all event-driven tool execution in LibreChat. ## Hook lifecycle in dispatchToolEvents 1. **PreToolUse** fires per call in parallel before ON_TOOL_EXECUTE dispatch. Denied calls (deny or ask) produce error ToolMessages and fire PermissionDenied; surviving calls proceed with optional updatedInput applied to tool args. 2. Only approved calls are batched into ON_TOOL_EXECUTE and sent to the host. If all calls are denied, the batch dispatch is skipped entirely. 3. **PostToolUse** fires per successful result. If a hook returns updatedOutput, the ToolMessage content is replaced before the message is appended to the graph. 4. **PostToolUseFailure** fires per error result. Observational only — cannot modify the error message. Post hooks are awaited (not fire-and-forget) so updatedOutput takes effect before the ToolMessage is consumed. Hook errors are swallowed via .catch() to avoid disrupting the tool result flow. PermissionDenied is fire-and-forget since it's purely observational. ## Plumbing - `ToolNodeOptions.hookRegistry?: HookRegistry` — new optional field - `ToolNode` constructor stores `this.hookRegistry` - `Graph.createToolNode()` passes `this.hookRegistry` at both construction sites (event-driven and traditional) - `hookRegistry` is set on Graph BEFORE `createWorkflow()` is called (moved from the post-create assignment in Run constructor into `createLegacyGraph` and `createMultiAgentGraph`) so ToolNode receives a non-undefined registry at construction time ## hasHookFor guards All four hook fire points use `hasHookFor(event, runId)` before calling `executeHooks`, so when no hooks are registered for the event the overhead is a single Map lookup + array length check — no Promise allocations, no function calls. ## Tests 9 new integration tests in `src/hooks/__tests__/toolHooks.test.ts` using event-driven mode (toolDefinitions + ON_TOOL_EXECUTE handler): - PreToolUse fires with correct fields - PreToolUse deny blocks execution (tool handler never called) - PreToolUse ask blocks execution (v1) - PreToolUse updatedInput rewrites args before dispatch to host - PreToolUse hook errors are non-fatal (tool still executes) - PermissionDenied fires after deny with reason - PostToolUse fires with tool output - PostToolUseFailure fires on error result - No-hooks baseline works identically Total: 94 existing + 9 new = 103 passing. * fix(hooks): address PR 3 review — step completion, ordering, catch, tests Resolves all 10 findings from the review on #90. ## Structural rewrite of dispatchToolEvents The post-processing loop was rewritten to fix #1, #2, #6, #8, #10: - #1 MAJOR: Denied calls now dispatch ON_RUN_STEP_COMPLETED via a shared `dispatchStepCompleted` helper (extracted from the duplicate logic in the executed-results loop). Without this, the frontend would show stuck spinners for denied tool calls. - #2 MAJOR: Output messages are now collected in a Map keyed by tool_call_id, then returned in the original `toolCalls` input order via `toolCalls.map(call => messageByCallId.get(call.id))`. Before, denied messages were prepended to executed messages, breaking the implicit ordering contract. - #6 MINOR: `toolUsageCount` is now incremented only for approved calls (moved from the initial `preToolCalls.map` into the `approvedEntries.map` that builds ToolCallRequests). Denied calls no longer inflate the turn counter. - #8 MINOR: Eliminated the double `result.status === 'error'` check. ToolMessage is now constructed inside each status branch directly after hook dispatch. - #10 NIT: `requests.find()` replaced with a pre-built `Map<id, ToolCallRequest>` for O(1) lookup per result, per AGENTS.md guidance on Map/Set over Array.find. ## Error protection - #3 MAJOR: PreToolUse `Promise.all` now wraps each `executeHooks` call in `.catch(() => emptyResult)` so an infrastructure error in the hooks subsystem cannot crash the entire tool batch. Matches the stated "hook errors are non-fatal" contract and is consistent with PostToolUse/PostToolUseFailure which already had `.catch()`. ## New tests - #4 MAJOR: Added `updatedOutput replaces the ToolMessage content` — registers a PostToolUse hook returning `{updatedOutput: 'REDACTED'}`, verifies the ToolMessage content in the graph's run messages is `'REDACTED'` (not the original output). - #7 MINOR: PermissionDenied test replaced 50ms `setTimeout` with a `Promise` resolved by the hook callback. No more flaky sleep. ## Cleanup - #9 NIT: Updated stale "once the tool-hook PR lands" comment in `src/hooks/index.ts`. Tests: 103 -> 104 (+1 for updatedOutput). * fix(hooks): off-by-one in dispatchStepCompleted index, test guard, turn fallback - #1 MAJOR: dispatchStepCompleted read post-increment toolUsageCount for the `index` field. Added `turn?: number` parameter; approved-call site passes `request?.turn` (the pre-increment value), denied-call site omits it (fallback is correct since count was never bumped). - #2 NIT: updatedOutput test now asserts `expect(toolMsg).toBeDefined()` before content extraction so a missing tool message produces a clear failure rather than "Expected undefined to be 'REDACTED'". - #3 NIT: PreToolUse input `turn` field restored `?? 0` fallback so first-use tools see `turn: 0` instead of `turn: undefined`. * fix(hooks): batch tests, turn JSDoc, freeze sentinel, remove unused hookRegistry Addresses follow-up review findings on #90. - #1 MAJOR: Add multi-call batch tests — partial deny (echo denied, math approved, order preserved) and all-denied (ON_TOOL_EXECUTE handler never called). Uses counter-based IDs (#7 fix) to avoid Date.now() collisions across calls in the same test. - #2 MINOR: Document PreToolUseHookInput.turn semantics — within a single batch, all calls to the same tool share the same turn value. Per-call discrimination within a batch is not supported in v1. - #3 MINOR: Remove hookRegistry from the traditional (non-event-driven) ToolNode construction site in Graph.ts. Hooks only fire in dispatchToolEvents; passing the registry to a ToolNode that never uses it created a false impression of hook support. - #4 MINOR: Add PostToolUse error non-fatality test — throwing hook does not crash the batch, original content is preserved. - #5+#8 MINOR: Freeze the hook-error fallback sentinel via Object.freeze and rename from emptyResult to HOOK_FALLBACK. Prevents future mutation bugs if someone pushes to the shared arrays. - #6 MINOR: Document in ToolNodeOptions.hookRegistry JSDoc that hooks only fire for event-driven calls — directToolNames bypass dispatch. Tests: 104 -> 107 (+3 for batch partial-deny, all-denied, and PostToolUse error resilience). * nit: simplify test assertions, keep shallow freeze (deep conflicts with mutable type)

* feat: add inert hook registry and executor scaffolding (phase 1/1) Introduces `src/hooks/` with types, registry, and executor for a 12-event hook lifecycle. Purely additive — not exported from `src/index.ts` and not yet wired into `Run`, `Graph`, or `ToolNode`. Integration lands in the next PR so this change is shippable independently with zero behavioral impact on hosts that don't opt in. - Discriminated-union input/output per event; generic `HookCallback<E>` and `HookMatcher<E>` for type-safe registration. - `HookRegistry` uses `Map<sessionId, bucket>` (not `Record`) to avoid O(n^2) churn under parallel registration in multi-tenant hosts. - `executeHooks` fans out in parallel, races hooks against a combined parent + timeout `AbortSignal` so non-listening hooks still time out, folds decisions with `deny > ask > allow` precedence, accumulates `additionalContext`, and self-removes `once: true` matchers after any successful hook. Errors are non-fatal: swallowed into the aggregated result and routed through an optional winston logger (falling back to `console.warn`), with internal matcher errors suppressed. 47 tests under `src/hooks/__tests__/` cover registry scoping, regex matching, precedence folding, once-self-removal (success/failure/mixed), timeout enforcement (including non-listening hooks), error non-fatality, and parent signal combination. * fix(hooks): address review findings — rename field, wire updatedOutput, cache patterns Resolves the comprehensive review on #87: - Rename `HookMatcher.matcher` -> `HookMatcher.pattern` to fix the self-referential naming (`m.matcher` reads as "the matcher's matcher"). - Wire `PostToolUseHookOutput.updatedOutput` through `AggregatedHookResult` and `fold()`. The type was previously a promise the executor didn't fulfill — a hook returning `{ updatedOutput: ... }` was silently dropped. - Correct the JSDoc on `updatedInput`/`updatedOutput`. The previous copy claimed non-determinism "in Promise.all resolution order" — but `Promise.all` preserves input order, so the fold is deterministic in registration order (outer loop over matchers, inner loop over each matcher's hooks). Updated the `executeHooks` function docstring to match. - Add regex compilation cache and length bound to `matchers.ts`. Patterns compile at most once per unique string and are reused across calls; patterns over MAX_PATTERN_LENGTH (512) are rejected outright as a cheap ReDoS mitigation. Documented the trust model (patterns are trusted input; hosts must validate user-supplied patterns upstream). - Document the wildcard-only semantics on query-less events: a non-empty pattern on an event that doesn't supply a matchQuery (RunStart, Stop, etc.) never matches. - Document the `once: true` concurrent-dispatch limitation: two parallel `executeHooks` calls each snapshot the matcher list, so `once` means "once per call", not "once globally". Matches Claude Code's semantics. Added a test that pins this behavior. - Merge `matchers.filter(...)` + task-build loop into a single pass per AGENTS.md "consolidate sequential O(n) operations." - Scope `eslint-disable no-console` to the single `console.warn` call in `reportErrors` instead of disabling it file-wide. - Fix the timeout-ignoring-hook test: clear the dangling `setTimeout` and assert the error surfaces an abort-shaped label. - Add tests: multi-hook `preventContinuation` (first-writer-wins on stopReason, flag without reason), `updatedOutput` flow + registration order, registration-order determinism for `updatedInput`, pattern length bound, compilation cache hit/miss/clear, concurrent `once` dispatch. Total: 47 -> 59 tests. Findings audited and resolved: - #1 ReDoS: cache + length bound + trust model doc (full fix blocked on host-side validation) - #2 Wrong updatedInput ordering JSDoc: fixed - #3 Dead `updatedOutput` type: wired through - #4 Concurrent `once` race: documented + test - #5 `HookMatcher.matcher` naming: renamed to `pattern` - #6 `WideCallback` dead code: rejected as inaccurate (used in `runHook`) - #7 Eslint-disable scope: line-scoped - #8 File-path comments: rejected; matches existing project convention - #9 Two-pass filter: single pass - #10 Regex recompilation: cached - CL-4 Undocumented wildcard semantics: documented - RT-3 Timeout test dangling timer: fixed + error content verified - RT-4 No multi-hook preventContinuation test: added * perf(hooks): atomic once, bounded LRU, ReDoS heuristic, shared signal Rework the phase-1 hooks scaffolding for the multi-tenant scale we actually operate at. These are behavior changes (semantics of `once` tightened, new pattern-rejection path) but none of them are wired into the runtime yet — this PR is still inert — so the blast radius is the hooks directory only. ## Atomic `once: true` Removal now happens synchronously inside `executeHooks`, between `registry.getMatchers` and the first `await`. Node's event loop serialises sync work, so two concurrent `executeHooks` calls can never both observe the same `once` matcher — whichever call runs its sync prefix first consumes it, and the loser sees an empty bucket. Semantic change: `once` is now at-most-one-dispatch, not at-most-one-successful-execution. If every hook in a `once` matcher throws, the matcher is still gone. The old retry-on-failure behavior was Claude Code's model — fine for a single-user CLI, but in a multi-tenant server it opens a window where concurrent bursts can re-dispatch a "once" hook non-idempotently. "At most once, ever" is the correct semantic for a shared registry. Hosts that need retry should register a normal matcher and self-unregister via the callback returned from `registry.register`. Dropped `collectOnceMatchersForRemoval` + its Map/array allocations — the post-hoc collection step is no longer needed. Tests: replaced "keeps the matcher when every hook throws" with "removes the matcher even when every hook in it throws". Added "fires exactly once across 3 concurrent calls" and "fires exactly once across 8-way concurrent dispatch when hooks are slow" (the slow test deliberately uses a 10ms hook to force overlap). ## Bounded LRU regex cache `matchers.ts` was unbounded. Under multi-tenant use where different tenants register different patterns, the cache would leak. Capped at `MAX_CACHE_SIZE = 256` entries with LRU eviction: hits refresh position (delete-then-set), misses at capacity evict the oldest key. Failed compiles are cached the same way so a tenant spamming bad patterns doesn't burn CPU on every call. Exposed `getMatcherCacheSize` and `MAX_CACHE_SIZE` for test observability. Tests verify eviction order (cold patterns recompiled after overflow) and that hot patterns refreshed mid-stream survive eviction pressure. ## ReDoS heuristic: nested-quantifier detector `hasNestedQuantifier` walks a pattern linearly, tracks paren depth, and rejects any pattern where a quantified group contains another quantifier — the `(a+)+`, `(.*)*`, `(\w+)+` shape that is the #1 catastrophic-backtracking source. Character classes (`[*+?]`) and escaped quantifiers (`\+`) are correctly ignored. Rejection happens before `new RegExp`, so adversarial input never reaches the compiler. This is a floor, not a ceiling — it doesn't catch ambiguous-alternation ReDoS like `(a|a)+`. Hosts that accept user-supplied patterns must still validate upstream. Documented in the `HookMatcher.pattern` and `matchesQuery` JSDoc. Test verifies that `(a+)+$` on a 35-char adversarial input resolves in under 50ms (would be seconds without the heuristic). ## Shared `AbortSignal` per matcher Was: one `AbortSignal.timeout(ms)` + one `AbortSignal.any([parent, timeout])` per hook. For a matcher with N hooks and the same timeout, that's N timers + N composite signals. Now: one signal per matcher, shared across all its hooks. Since hooks in a matcher already share `matcher.timeout`, they may as well share the timer. Behavior is identical (all hooks fire and race the same deadline) but allocation drops from N to 1 per matcher. ## Tests 72 passing (was 59, +13): - Atomic once: 2 concurrent tests + 1 failure-removal test updated - LRU: cache size cap, eviction order, hit refresh - Nested quantifier: classic shapes, deep nesting, escaped chars, character classes, false positives on legitimate patterns - ReDoS mitigation: rejection path, <50ms response on adversarial input * fix(hooks): fix ReDoS heuristic false-positives on (?:...) groups Resolves the follow-up review on #87. Four findings, all valid. ## #1 MAJOR — hasNestedQuantifier false-positives on group syntax The previous scanner treated `?` as a quantifier at any depth > 0, but `?` immediately after `(` is group syntax, not a quantifier: `(?:...)`, `(?=...)`, `(?!...)`, `(?<name>...)`, `(?<=...)`, `(?<!...)`. Patterns like `(?:pre_)?tool_name` were silently rejected and the registering hook never fired — with no error surfaced to the caller. Fix: explicit group-syntax prefix skip inside the `(` handler. When the scanner enters a group it peeks for `?` and advances past the modifier characters (`:`, `=`, `!`, `<=`, `<!`, or a named-group label `<name>`) before processing the group body. The `?` is therefore never considered a quantifier at the start of a group. While in there I also fixed a second correctness bug the reviewer didn't catch: the old depth-indexed array didn't propagate a child group's "contains quantifier" signal up to its parent, so `(?:(a+))+` (outer quantifier over a wrapped quantified group, equivalent to `(a+)+`) escaped detection. The scanner now uses an explicit stack of `QuantifierFrame` records, and when a child group closes it propagates `hasBacktrackRisk` to the parent frame — whether the child itself was quantified or not. This catches `(?:(a+))+` correctly while still allowing non-risky wrappers like `(?:(ab))+`. Added 9 tests covering non-capturing groups, lookaheads, lookbehinds, named captures, `(?:(a+))+` risk propagation, and `((ab)+)+` deep wrapping. Verified existing ReDoS-detection tests still pass. ## #2 MINOR — executeHooks.test.ts did not clear the pattern cache Matcher cache is module-level. `matchers.test.ts` clears it in `beforeEach`; `executeHooks.test.ts` did not, so patterns compiled during one test persisted across subsequent tests in the same file. Added `clearMatcherCache()` to `executeHooks`'s `beforeEach` — no test failures today, but restores test independence. ## #3 MINOR — Test utilities leaked into production barrel `clearMatcherCache` and `getMatcherCacheSize` are test-only helpers (their JSDoc says so) but were exported from `src/hooks/index.ts`. When the integration PR eventually re-exports `src/hooks` from `src/index.ts` they would become public API. Removed both from the barrel. Tests already import them directly from `../matchers`, so no test changes needed. `hasNestedQuantifier`, `MAX_PATTERN_LENGTH`, and `MAX_CACHE_SIZE` remain exported — hosts can legitimately use them for upstream validation. ## #4 NIT — LRU refresh overhead at low cache pressure Every `compile()` cache hit was doing `delete + set` to refresh LRU position, even when the cache was nowhere near capacity. With `MAX_CACHE_SIZE = 256` and typical working sets of tens of patterns, eviction pressure is near-zero and the refresh is pure overhead. Added a `LRU_REFRESH_THRESHOLD = 75%` gate: refresh only runs when the cache is above 192 entries. Below that the code fast-paths straight back from `compile()`. Above it the LRU semantics kick in so hot patterns still survive evictions — the existing "refreshes LRU position on hit" test still passes because by the time it runs the check, the cache is at capacity. Tests: 72 -> 81 (+9 for group-syntax and risk-propagation coverage).

* feat: wire tool hooks into event-driven ToolNode (phase 1/3) Fires PreToolUse, PostToolUse, PostToolUseFailure, and PermissionDenied hooks inside `ToolNode.dispatchToolEvents` — the single chokepoint for all event-driven tool execution in LibreChat. ## Hook lifecycle in dispatchToolEvents 1. **PreToolUse** fires per call in parallel before ON_TOOL_EXECUTE dispatch. Denied calls (deny or ask) produce error ToolMessages and fire PermissionDenied; surviving calls proceed with optional updatedInput applied to tool args. 2. Only approved calls are batched into ON_TOOL_EXECUTE and sent to the host. If all calls are denied, the batch dispatch is skipped entirely. 3. **PostToolUse** fires per successful result. If a hook returns updatedOutput, the ToolMessage content is replaced before the message is appended to the graph. 4. **PostToolUseFailure** fires per error result. Observational only — cannot modify the error message. Post hooks are awaited (not fire-and-forget) so updatedOutput takes effect before the ToolMessage is consumed. Hook errors are swallowed via .catch() to avoid disrupting the tool result flow. PermissionDenied is fire-and-forget since it's purely observational. ## Plumbing - `ToolNodeOptions.hookRegistry?: HookRegistry` — new optional field - `ToolNode` constructor stores `this.hookRegistry` - `Graph.createToolNode()` passes `this.hookRegistry` at both construction sites (event-driven and traditional) - `hookRegistry` is set on Graph BEFORE `createWorkflow()` is called (moved from the post-create assignment in Run constructor into `createLegacyGraph` and `createMultiAgentGraph`) so ToolNode receives a non-undefined registry at construction time ## hasHookFor guards All four hook fire points use `hasHookFor(event, runId)` before calling `executeHooks`, so when no hooks are registered for the event the overhead is a single Map lookup + array length check — no Promise allocations, no function calls. ## Tests 9 new integration tests in `src/hooks/__tests__/toolHooks.test.ts` using event-driven mode (toolDefinitions + ON_TOOL_EXECUTE handler): - PreToolUse fires with correct fields - PreToolUse deny blocks execution (tool handler never called) - PreToolUse ask blocks execution (v1) - PreToolUse updatedInput rewrites args before dispatch to host - PreToolUse hook errors are non-fatal (tool still executes) - PermissionDenied fires after deny with reason - PostToolUse fires with tool output - PostToolUseFailure fires on error result - No-hooks baseline works identically Total: 94 existing + 9 new = 103 passing. * fix(hooks): address PR 3 review — step completion, ordering, catch, tests Resolves all 10 findings from the review on #90. ## Structural rewrite of dispatchToolEvents The post-processing loop was rewritten to fix #1, #2, #6, #8, #10: - #1 MAJOR: Denied calls now dispatch ON_RUN_STEP_COMPLETED via a shared `dispatchStepCompleted` helper (extracted from the duplicate logic in the executed-results loop). Without this, the frontend would show stuck spinners for denied tool calls. - #2 MAJOR: Output messages are now collected in a Map keyed by tool_call_id, then returned in the original `toolCalls` input order via `toolCalls.map(call => messageByCallId.get(call.id))`. Before, denied messages were prepended to executed messages, breaking the implicit ordering contract. - #6 MINOR: `toolUsageCount` is now incremented only for approved calls (moved from the initial `preToolCalls.map` into the `approvedEntries.map` that builds ToolCallRequests). Denied calls no longer inflate the turn counter. - #8 MINOR: Eliminated the double `result.status === 'error'` check. ToolMessage is now constructed inside each status branch directly after hook dispatch. - #10 NIT: `requests.find()` replaced with a pre-built `Map<id, ToolCallRequest>` for O(1) lookup per result, per AGENTS.md guidance on Map/Set over Array.find. ## Error protection - #3 MAJOR: PreToolUse `Promise.all` now wraps each `executeHooks` call in `.catch(() => emptyResult)` so an infrastructure error in the hooks subsystem cannot crash the entire tool batch. Matches the stated "hook errors are non-fatal" contract and is consistent with PostToolUse/PostToolUseFailure which already had `.catch()`. ## New tests - #4 MAJOR: Added `updatedOutput replaces the ToolMessage content` — registers a PostToolUse hook returning `{updatedOutput: 'REDACTED'}`, verifies the ToolMessage content in the graph's run messages is `'REDACTED'` (not the original output). - #7 MINOR: PermissionDenied test replaced 50ms `setTimeout` with a `Promise` resolved by the hook callback. No more flaky sleep. ## Cleanup - #9 NIT: Updated stale "once the tool-hook PR lands" comment in `src/hooks/index.ts`. Tests: 103 -> 104 (+1 for updatedOutput). * fix(hooks): off-by-one in dispatchStepCompleted index, test guard, turn fallback - #1 MAJOR: dispatchStepCompleted read post-increment toolUsageCount for the `index` field. Added `turn?: number` parameter; approved-call site passes `request?.turn` (the pre-increment value), denied-call site omits it (fallback is correct since count was never bumped). - #2 NIT: updatedOutput test now asserts `expect(toolMsg).toBeDefined()` before content extraction so a missing tool message produces a clear failure rather than "Expected undefined to be 'REDACTED'". - #3 NIT: PreToolUse input `turn` field restored `?? 0` fallback so first-use tools see `turn: 0` instead of `turn: undefined`. * fix(hooks): batch tests, turn JSDoc, freeze sentinel, remove unused hookRegistry Addresses follow-up review findings on #90. - #1 MAJOR: Add multi-call batch tests — partial deny (echo denied, math approved, order preserved) and all-denied (ON_TOOL_EXECUTE handler never called). Uses counter-based IDs (#7 fix) to avoid Date.now() collisions across calls in the same test. - #2 MINOR: Document PreToolUseHookInput.turn semantics — within a single batch, all calls to the same tool share the same turn value. Per-call discrimination within a batch is not supported in v1. - #3 MINOR: Remove hookRegistry from the traditional (non-event-driven) ToolNode construction site in Graph.ts. Hooks only fire in dispatchToolEvents; passing the registry to a ToolNode that never uses it created a false impression of hook support. - #4 MINOR: Add PostToolUse error non-fatality test — throwing hook does not crash the batch, original content is preserved. - #5+#8 MINOR: Freeze the hook-error fallback sentinel via Object.freeze and rename from emptyResult to HOOK_FALLBACK. Prevents future mutation bugs if someone pushes to the shared arrays. - #6 MINOR: Document in ToolNodeOptions.hookRegistry JSDoc that hooks only fire for event-driven calls — directToolNames bypass dispatch. Tests: 104 -> 107 (+3 for batch partial-deny, all-denied, and PostToolUse error resilience). * nit: simplify test assertions, keep shallow freeze (deep conflicts with mutable type)

* feat: add inert hook registry and executor scaffolding (phase 1/1) Introduces `src/hooks/` with types, registry, and executor for a 12-event hook lifecycle. Purely additive — not exported from `src/index.ts` and not yet wired into `Run`, `Graph`, or `ToolNode`. Integration lands in the next PR so this change is shippable independently with zero behavioral impact on hosts that don't opt in. - Discriminated-union input/output per event; generic `HookCallback<E>` and `HookMatcher<E>` for type-safe registration. - `HookRegistry` uses `Map<sessionId, bucket>` (not `Record`) to avoid O(n^2) churn under parallel registration in multi-tenant hosts. - `executeHooks` fans out in parallel, races hooks against a combined parent + timeout `AbortSignal` so non-listening hooks still time out, folds decisions with `deny > ask > allow` precedence, accumulates `additionalContext`, and self-removes `once: true` matchers after any successful hook. Errors are non-fatal: swallowed into the aggregated result and routed through an optional winston logger (falling back to `console.warn`), with internal matcher errors suppressed. 47 tests under `src/hooks/__tests__/` cover registry scoping, regex matching, precedence folding, once-self-removal (success/failure/mixed), timeout enforcement (including non-listening hooks), error non-fatality, and parent signal combination. * fix(hooks): address review findings — rename field, wire updatedOutput, cache patterns Resolves the comprehensive review on #87: - Rename `HookMatcher.matcher` -> `HookMatcher.pattern` to fix the self-referential naming (`m.matcher` reads as "the matcher's matcher"). - Wire `PostToolUseHookOutput.updatedOutput` through `AggregatedHookResult` and `fold()`. The type was previously a promise the executor didn't fulfill — a hook returning `{ updatedOutput: ... }` was silently dropped. - Correct the JSDoc on `updatedInput`/`updatedOutput`. The previous copy claimed non-determinism "in Promise.all resolution order" — but `Promise.all` preserves input order, so the fold is deterministic in registration order (outer loop over matchers, inner loop over each matcher's hooks). Updated the `executeHooks` function docstring to match. - Add regex compilation cache and length bound to `matchers.ts`. Patterns compile at most once per unique string and are reused across calls; patterns over MAX_PATTERN_LENGTH (512) are rejected outright as a cheap ReDoS mitigation. Documented the trust model (patterns are trusted input; hosts must validate user-supplied patterns upstream). - Document the wildcard-only semantics on query-less events: a non-empty pattern on an event that doesn't supply a matchQuery (RunStart, Stop, etc.) never matches. - Document the `once: true` concurrent-dispatch limitation: two parallel `executeHooks` calls each snapshot the matcher list, so `once` means "once per call", not "once globally". Matches Claude Code's semantics. Added a test that pins this behavior. - Merge `matchers.filter(...)` + task-build loop into a single pass per AGENTS.md "consolidate sequential O(n) operations." - Scope `eslint-disable no-console` to the single `console.warn` call in `reportErrors` instead of disabling it file-wide. - Fix the timeout-ignoring-hook test: clear the dangling `setTimeout` and assert the error surfaces an abort-shaped label. - Add tests: multi-hook `preventContinuation` (first-writer-wins on stopReason, flag without reason), `updatedOutput` flow + registration order, registration-order determinism for `updatedInput`, pattern length bound, compilation cache hit/miss/clear, concurrent `once` dispatch. Total: 47 -> 59 tests. Findings audited and resolved: - #1 ReDoS: cache + length bound + trust model doc (full fix blocked on host-side validation) - #2 Wrong updatedInput ordering JSDoc: fixed - #3 Dead `updatedOutput` type: wired through - #4 Concurrent `once` race: documented + test - #5 `HookMatcher.matcher` naming: renamed to `pattern` - #6 `WideCallback` dead code: rejected as inaccurate (used in `runHook`) - #7 Eslint-disable scope: line-scoped - #8 File-path comments: rejected; matches existing project convention - #9 Two-pass filter: single pass - #10 Regex recompilation: cached - CL-4 Undocumented wildcard semantics: documented - RT-3 Timeout test dangling timer: fixed + error content verified - RT-4 No multi-hook preventContinuation test: added * perf(hooks): atomic once, bounded LRU, ReDoS heuristic, shared signal Rework the phase-1 hooks scaffolding for the multi-tenant scale we actually operate at. These are behavior changes (semantics of `once` tightened, new pattern-rejection path) but none of them are wired into the runtime yet — this PR is still inert — so the blast radius is the hooks directory only. ## Atomic `once: true` Removal now happens synchronously inside `executeHooks`, between `registry.getMatchers` and the first `await`. Node's event loop serialises sync work, so two concurrent `executeHooks` calls can never both observe the same `once` matcher — whichever call runs its sync prefix first consumes it, and the loser sees an empty bucket. Semantic change: `once` is now at-most-one-dispatch, not at-most-one-successful-execution. If every hook in a `once` matcher throws, the matcher is still gone. The old retry-on-failure behavior was Claude Code's model — fine for a single-user CLI, but in a multi-tenant server it opens a window where concurrent bursts can re-dispatch a "once" hook non-idempotently. "At most once, ever" is the correct semantic for a shared registry. Hosts that need retry should register a normal matcher and self-unregister via the callback returned from `registry.register`. Dropped `collectOnceMatchersForRemoval` + its Map/array allocations — the post-hoc collection step is no longer needed. Tests: replaced "keeps the matcher when every hook throws" with "removes the matcher even when every hook in it throws". Added "fires exactly once across 3 concurrent calls" and "fires exactly once across 8-way concurrent dispatch when hooks are slow" (the slow test deliberately uses a 10ms hook to force overlap). ## Bounded LRU regex cache `matchers.ts` was unbounded. Under multi-tenant use where different tenants register different patterns, the cache would leak. Capped at `MAX_CACHE_SIZE = 256` entries with LRU eviction: hits refresh position (delete-then-set), misses at capacity evict the oldest key. Failed compiles are cached the same way so a tenant spamming bad patterns doesn't burn CPU on every call. Exposed `getMatcherCacheSize` and `MAX_CACHE_SIZE` for test observability. Tests verify eviction order (cold patterns recompiled after overflow) and that hot patterns refreshed mid-stream survive eviction pressure. ## ReDoS heuristic: nested-quantifier detector `hasNestedQuantifier` walks a pattern linearly, tracks paren depth, and rejects any pattern where a quantified group contains another quantifier — the `(a+)+`, `(.*)*`, `(\w+)+` shape that is the #1 catastrophic-backtracking source. Character classes (`[*+?]`) and escaped quantifiers (`\+`) are correctly ignored. Rejection happens before `new RegExp`, so adversarial input never reaches the compiler. This is a floor, not a ceiling — it doesn't catch ambiguous-alternation ReDoS like `(a|a)+`. Hosts that accept user-supplied patterns must still validate upstream. Documented in the `HookMatcher.pattern` and `matchesQuery` JSDoc. Test verifies that `(a+)+$` on a 35-char adversarial input resolves in under 50ms (would be seconds without the heuristic). ## Shared `AbortSignal` per matcher Was: one `AbortSignal.timeout(ms)` + one `AbortSignal.any([parent, timeout])` per hook. For a matcher with N hooks and the same timeout, that's N timers + N composite signals. Now: one signal per matcher, shared across all its hooks. Since hooks in a matcher already share `matcher.timeout`, they may as well share the timer. Behavior is identical (all hooks fire and race the same deadline) but allocation drops from N to 1 per matcher. ## Tests 72 passing (was 59, +13): - Atomic once: 2 concurrent tests + 1 failure-removal test updated - LRU: cache size cap, eviction order, hit refresh - Nested quantifier: classic shapes, deep nesting, escaped chars, character classes, false positives on legitimate patterns - ReDoS mitigation: rejection path, <50ms response on adversarial input * fix(hooks): fix ReDoS heuristic false-positives on (?:...) groups Resolves the follow-up review on #87. Four findings, all valid. ## #1 MAJOR — hasNestedQuantifier false-positives on group syntax The previous scanner treated `?` as a quantifier at any depth > 0, but `?` immediately after `(` is group syntax, not a quantifier: `(?:...)`, `(?=...)`, `(?!...)`, `(?<name>...)`, `(?<=...)`, `(?<!...)`. Patterns like `(?:pre_)?tool_name` were silently rejected and the registering hook never fired — with no error surfaced to the caller. Fix: explicit group-syntax prefix skip inside the `(` handler. When the scanner enters a group it peeks for `?` and advances past the modifier characters (`:`, `=`, `!`, `<=`, `<!`, or a named-group label `<name>`) before processing the group body. The `?` is therefore never considered a quantifier at the start of a group. While in there I also fixed a second correctness bug the reviewer didn't catch: the old depth-indexed array didn't propagate a child group's "contains quantifier" signal up to its parent, so `(?:(a+))+` (outer quantifier over a wrapped quantified group, equivalent to `(a+)+`) escaped detection. The scanner now uses an explicit stack of `QuantifierFrame` records, and when a child group closes it propagates `hasBacktrackRisk` to the parent frame — whether the child itself was quantified or not. This catches `(?:(a+))+` correctly while still allowing non-risky wrappers like `(?:(ab))+`. Added 9 tests covering non-capturing groups, lookaheads, lookbehinds, named captures, `(?:(a+))+` risk propagation, and `((ab)+)+` deep wrapping. Verified existing ReDoS-detection tests still pass. ## #2 MINOR — executeHooks.test.ts did not clear the pattern cache Matcher cache is module-level. `matchers.test.ts` clears it in `beforeEach`; `executeHooks.test.ts` did not, so patterns compiled during one test persisted across subsequent tests in the same file. Added `clearMatcherCache()` to `executeHooks`'s `beforeEach` — no test failures today, but restores test independence. ## #3 MINOR — Test utilities leaked into production barrel `clearMatcherCache` and `getMatcherCacheSize` are test-only helpers (their JSDoc says so) but were exported from `src/hooks/index.ts`. When the integration PR eventually re-exports `src/hooks` from `src/index.ts` they would become public API. Removed both from the barrel. Tests already import them directly from `../matchers`, so no test changes needed. `hasNestedQuantifier`, `MAX_PATTERN_LENGTH`, and `MAX_CACHE_SIZE` remain exported — hosts can legitimately use them for upstream validation. ## #4 NIT — LRU refresh overhead at low cache pressure Every `compile()` cache hit was doing `delete + set` to refresh LRU position, even when the cache was nowhere near capacity. With `MAX_CACHE_SIZE = 256` and typical working sets of tens of patterns, eviction pressure is near-zero and the refresh is pure overhead. Added a `LRU_REFRESH_THRESHOLD = 75%` gate: refresh only runs when the cache is above 192 entries. Below that the code fast-paths straight back from `compile()`. Above it the LRU semantics kick in so hot patterns still survive evictions — the existing "refreshes LRU position on hit" test still passes because by the time it runs the check, the cache is at capacity. Tests: 72 -> 81 (+9 for group-syntax and risk-propagation coverage).

* feat: wire tool hooks into event-driven ToolNode (phase 1/3) Fires PreToolUse, PostToolUse, PostToolUseFailure, and PermissionDenied hooks inside `ToolNode.dispatchToolEvents` — the single chokepoint for all event-driven tool execution in LibreChat. ## Hook lifecycle in dispatchToolEvents 1. **PreToolUse** fires per call in parallel before ON_TOOL_EXECUTE dispatch. Denied calls (deny or ask) produce error ToolMessages and fire PermissionDenied; surviving calls proceed with optional updatedInput applied to tool args. 2. Only approved calls are batched into ON_TOOL_EXECUTE and sent to the host. If all calls are denied, the batch dispatch is skipped entirely. 3. **PostToolUse** fires per successful result. If a hook returns updatedOutput, the ToolMessage content is replaced before the message is appended to the graph. 4. **PostToolUseFailure** fires per error result. Observational only — cannot modify the error message. Post hooks are awaited (not fire-and-forget) so updatedOutput takes effect before the ToolMessage is consumed. Hook errors are swallowed via .catch() to avoid disrupting the tool result flow. PermissionDenied is fire-and-forget since it's purely observational. ## Plumbing - `ToolNodeOptions.hookRegistry?: HookRegistry` — new optional field - `ToolNode` constructor stores `this.hookRegistry` - `Graph.createToolNode()` passes `this.hookRegistry` at both construction sites (event-driven and traditional) - `hookRegistry` is set on Graph BEFORE `createWorkflow()` is called (moved from the post-create assignment in Run constructor into `createLegacyGraph` and `createMultiAgentGraph`) so ToolNode receives a non-undefined registry at construction time ## hasHookFor guards All four hook fire points use `hasHookFor(event, runId)` before calling `executeHooks`, so when no hooks are registered for the event the overhead is a single Map lookup + array length check — no Promise allocations, no function calls. ## Tests 9 new integration tests in `src/hooks/__tests__/toolHooks.test.ts` using event-driven mode (toolDefinitions + ON_TOOL_EXECUTE handler): - PreToolUse fires with correct fields - PreToolUse deny blocks execution (tool handler never called) - PreToolUse ask blocks execution (v1) - PreToolUse updatedInput rewrites args before dispatch to host - PreToolUse hook errors are non-fatal (tool still executes) - PermissionDenied fires after deny with reason - PostToolUse fires with tool output - PostToolUseFailure fires on error result - No-hooks baseline works identically Total: 94 existing + 9 new = 103 passing. * fix(hooks): address PR 3 review — step completion, ordering, catch, tests Resolves all 10 findings from the review on #90. ## Structural rewrite of dispatchToolEvents The post-processing loop was rewritten to fix #1, #2, #6, #8, #10: - #1 MAJOR: Denied calls now dispatch ON_RUN_STEP_COMPLETED via a shared `dispatchStepCompleted` helper (extracted from the duplicate logic in the executed-results loop). Without this, the frontend would show stuck spinners for denied tool calls. - #2 MAJOR: Output messages are now collected in a Map keyed by tool_call_id, then returned in the original `toolCalls` input order via `toolCalls.map(call => messageByCallId.get(call.id))`. Before, denied messages were prepended to executed messages, breaking the implicit ordering contract. - #6 MINOR: `toolUsageCount` is now incremented only for approved calls (moved from the initial `preToolCalls.map` into the `approvedEntries.map` that builds ToolCallRequests). Denied calls no longer inflate the turn counter. - #8 MINOR: Eliminated the double `result.status === 'error'` check. ToolMessage is now constructed inside each status branch directly after hook dispatch. - #10 NIT: `requests.find()` replaced with a pre-built `Map<id, ToolCallRequest>` for O(1) lookup per result, per AGENTS.md guidance on Map/Set over Array.find. ## Error protection - #3 MAJOR: PreToolUse `Promise.all` now wraps each `executeHooks` call in `.catch(() => emptyResult)` so an infrastructure error in the hooks subsystem cannot crash the entire tool batch. Matches the stated "hook errors are non-fatal" contract and is consistent with PostToolUse/PostToolUseFailure which already had `.catch()`. ## New tests - #4 MAJOR: Added `updatedOutput replaces the ToolMessage content` — registers a PostToolUse hook returning `{updatedOutput: 'REDACTED'}`, verifies the ToolMessage content in the graph's run messages is `'REDACTED'` (not the original output). - #7 MINOR: PermissionDenied test replaced 50ms `setTimeout` with a `Promise` resolved by the hook callback. No more flaky sleep. ## Cleanup - #9 NIT: Updated stale "once the tool-hook PR lands" comment in `src/hooks/index.ts`. Tests: 103 -> 104 (+1 for updatedOutput). * fix(hooks): off-by-one in dispatchStepCompleted index, test guard, turn fallback - #1 MAJOR: dispatchStepCompleted read post-increment toolUsageCount for the `index` field. Added `turn?: number` parameter; approved-call site passes `request?.turn` (the pre-increment value), denied-call site omits it (fallback is correct since count was never bumped). - #2 NIT: updatedOutput test now asserts `expect(toolMsg).toBeDefined()` before content extraction so a missing tool message produces a clear failure rather than "Expected undefined to be 'REDACTED'". - #3 NIT: PreToolUse input `turn` field restored `?? 0` fallback so first-use tools see `turn: 0` instead of `turn: undefined`. * fix(hooks): batch tests, turn JSDoc, freeze sentinel, remove unused hookRegistry Addresses follow-up review findings on #90. - #1 MAJOR: Add multi-call batch tests — partial deny (echo denied, math approved, order preserved) and all-denied (ON_TOOL_EXECUTE handler never called). Uses counter-based IDs (#7 fix) to avoid Date.now() collisions across calls in the same test. - #2 MINOR: Document PreToolUseHookInput.turn semantics — within a single batch, all calls to the same tool share the same turn value. Per-call discrimination within a batch is not supported in v1. - #3 MINOR: Remove hookRegistry from the traditional (non-event-driven) ToolNode construction site in Graph.ts. Hooks only fire in dispatchToolEvents; passing the registry to a ToolNode that never uses it created a false impression of hook support. - #4 MINOR: Add PostToolUse error non-fatality test — throwing hook does not crash the batch, original content is preserved. - #5+#8 MINOR: Freeze the hook-error fallback sentinel via Object.freeze and rename from emptyResult to HOOK_FALLBACK. Prevents future mutation bugs if someone pushes to the shared arrays. - #6 MINOR: Document in ToolNodeOptions.hookRegistry JSDoc that hooks only fire for event-driven calls — directToolNames bypass dispatch. Tests: 104 -> 107 (+3 for batch partial-deny, all-denied, and PostToolUse error resilience). * nit: simplify test assertions, keep shallow freeze (deep conflicts with mutable type)

Address 12 valid findings from the audit — 3 MAJOR, 6 MINOR, 3 NIT: MAJOR fixes: 1. Depth tracking now works across graph boundaries. Previously the `depth`/`maxDepth` check only worked within a single executor; when `allowNested: true`, the child graph spawned its own executor at depth=0, making `maxSubagentDepth` non-functional. Fix: decrement `maxSubagentDepth` in `buildChildInputs`, so the child's own executor inherits a smaller budget and naturally blocks further nesting when it reaches 0. Removed the now-vestigial `depth` field. 2. Detect duplicate `SubagentConfig.type` values. `Map` construction silently dropped duplicate keys. Now `resolveSubagentConfigs` throws with a clear error. 3. Add test coverage for `allowNested: true`: verifies depth is decremented across boundaries, child inherits subagentConfigs, and budget is clamped to >= 0. MINOR fixes: 4. Runtime guard for empty `description` in the tool callback — the LLM can send undefined/empty values; now we default to a safe fallback string. 5. Updated `src/hooks/index.ts` barrel comment to list SubagentExecutor as the fourth hook consumer (SubagentStart/SubagentStop). 6. Extracted property description strings as constants in SubagentTool.ts; `SubagentToolSchema` and `buildSubagentToolParams` now share a single source of truth. 7. Changed SubagentStop from fire-and-forget to awaited, matching the PostCompact pattern. Removes fragile `setTimeout` synchronization in tests. Errors still swallowed (observational). 8. Truncate error messages from child graph to 200 chars to prevent leaking internal details (stack traces, API errors) to the parent LLM. 9. Document that `ask` decision is treated identically to `deny` in the non-interactive subagent context. NIT fixes: 11. Import order in SubagentExecutor.ts — `@/types` (multi-line, longest) now comes first in the local types sub-group. 12. Removed unused `contentParts` destructuring in verification script. 13. Fixed unsafe `msg.content as string` cast in verification script; now handles array content correctly. Skipped #10 (NIT): `_sourceInputs` underscore prefix is an accepted TS convention for cross-file-accessible-but-semantically-internal fields. Test count: 1074 passed (up from 1065, +9 new tests), 0 regressions.

@remarks

#104) * feat: add subagent-as-tool primitive for hierarchical agent delegation Adds a SubagentTool that lets a parent agent delegate tasks to child agents with isolated context windows. Verbose tool outputs stay in the child's context; only a filtered text summary returns to the parent. Key components: - SubagentConfig type on AgentInputs for declaring child agent types - SubagentExecutor: creates child StandardGraph, invokes with isolated state, filters result (strips tool_use/thinking blocks), fires SubagentStart/SubagentStop hooks - SubagentTool: LCTool definition with dynamic enum from configs - Graph integration: tool created in createAgentNode() via graphTools, automatically becomes a direct tool (bypasses event-driven dispatch) - Self-spawn support: reuse parent's AgentInputs for context isolation without separate agent config Wires the existing SubagentStart (with deny support) and SubagentStop hook types that were defined but had zero fire points. * fix: resolve review findings for subagent primitive Address all 6 issues from PR review: 1. Add subagentHooks.test.ts — end-to-end hook integration test through real Run pipeline (SubagentStart payload, SubagentStop messages, deny blocks execution). 3 new tests. 2. Remove `as unknown as` cast — follow MultiAgentGraph pattern (init graphTools as empty array, push directly). 3. Deduplicate schema — extract buildSubagentToolParams() in SubagentTool.ts, use it in Graph.ts. createSubagentToolDefinition() now delegates to it. Single source of truth for schema/description. 4. Propagate threadId — tool callback extracts thread_id from LangGraph config.configurable, passes to executor.execute(). Hooks now receive correct threadId. 5. Pass tokenCounter to child graph — added to SubagentExecutorOptions, threaded from agentContext.tokenCounter through to child StandardGraph constructor. 6. Pass signal in invoke() config — workflow.invoke() now receives signal for more responsive abort between graph steps. * fix: resolve second-pass audit findings for subagent primitive Address 12 valid findings from the audit — 3 MAJOR, 6 MINOR, 3 NIT: MAJOR fixes: 1. Depth tracking now works across graph boundaries. Previously the `depth`/`maxDepth` check only worked within a single executor; when `allowNested: true`, the child graph spawned its own executor at depth=0, making `maxSubagentDepth` non-functional. Fix: decrement `maxSubagentDepth` in `buildChildInputs`, so the child's own executor inherits a smaller budget and naturally blocks further nesting when it reaches 0. Removed the now-vestigial `depth` field. 2. Detect duplicate `SubagentConfig.type` values. `Map` construction silently dropped duplicate keys. Now `resolveSubagentConfigs` throws with a clear error. 3. Add test coverage for `allowNested: true`: verifies depth is decremented across boundaries, child inherits subagentConfigs, and budget is clamped to >= 0. MINOR fixes: 4. Runtime guard for empty `description` in the tool callback — the LLM can send undefined/empty values; now we default to a safe fallback string. 5. Updated `src/hooks/index.ts` barrel comment to list SubagentExecutor as the fourth hook consumer (SubagentStart/SubagentStop). 6. Extracted property description strings as constants in SubagentTool.ts; `SubagentToolSchema` and `buildSubagentToolParams` now share a single source of truth. 7. Changed SubagentStop from fire-and-forget to awaited, matching the PostCompact pattern. Removes fragile `setTimeout` synchronization in tests. Errors still swallowed (observational). 8. Truncate error messages from child graph to 200 chars to prevent leaking internal details (stack traces, API errors) to the parent LLM. 9. Document that `ask` decision is treated identically to `deny` in the non-interactive subagent context. NIT fixes: 11. Import order in SubagentExecutor.ts — `@/types` (multi-line, longest) now comes first in the local types sub-group. 12. Removed unused `contentParts` destructuring in verification script. 13. Fixed unsafe `msg.content as string` cast in verification script; now handles array content correctly. Skipped #10 (NIT): `_sourceInputs` underscore prefix is an accepted TS convention for cross-file-accessible-but-semantically-internal fields. Test count: 1074 passed (up from 1065, +9 new tests), 0 regressions. * fix: address third-pass audit findings for subagent primitive Resolve 3 NITs from the second review pass plus 2 new codex comments: NIT #1 — Tighten truncation test: previously asserted only `result.content.length < 500`. Now asserts exact envelope of 219 chars ("Subagent error: " prefix + 200 body + "..." suffix) so that regressions in ERROR_MESSAGE_MAX_CHARS are caught. Added companion test verifying short messages are not truncated. NIT #2 — Skip tool creation when maxSubagentDepth: 0. When a host set `maxSubagentDepth: 0` alongside `subagentConfigs`, the tool was still bound to the LLM, and every invocation returned the depth- exceeded error. The LLM wasted a turn. Now the `createAgentNode` guard includes `effectiveSubagentDepth > 0`, so the tool is not created at the leaf level. Added integration test. NIT #3 — Document buildChildInputs audience. Added @remarks block explaining this is an advanced utility exported primarily for testing and internal use; hosts configuring subagents should not call it directly. Codex P1 (false positive, clarified): maxSubagentDepth propagation across graph boundaries was already correct via the countdown in buildChildInputs, but it wasn't obvious from the call site. Added a multi-line JSDoc comment above the executor construction in createAgentNode explaining how depth consumes across boundaries through the decremented AgentInputs.maxSubagentDepth. Codex P2 (real bug, fixed): toolSchemaTokens did not include the subagent tool's schema. `calculateInstructionTokens()` was called in `fromConfig()` before graphTools was populated, so the later- injected subagent tool's schema was missing from the budget. Fixes: 1. Include `graphTools` in the iteration in `calculateInstructionTokens` (also helps handoff tools, a pre-existing issue with the same root cause). 2. Re-trigger the calculation at the end of `createAgentNode` after the subagent tool is pushed, so the token count reflects the final graphTools state before `createCallModel` awaits `tokenCalculationPromise`. Added test asserting `toolSchemaTokens` is larger for agents with subagents vs without, confirming the recalculation runs. Test count: 1077 passed (up from 1074, +3 new tests), 0 regressions. * fix: break circular dep between SubagentExecutor and StandardGraph Rollup warned on build: Graph.ts → subagent/index.ts → SubagentExecutor.ts → Graph.ts SubagentExecutor needed `StandardGraph` at runtime to construct the child graph; Graph.ts needs SubagentExecutor to attach the tool. With preserveModules, these land in separate chunks producing circular chunk imports with undefined execution order. Fix: dependency injection. SubagentExecutor now takes a required `createChildGraph: ChildGraphFactory` option and only imports `StandardGraph` as `import type` (erased at build). Graph.ts passes `(input) => new StandardGraph(input)` when constructing the executor. Runtime module graph is now acyclic; `import type` does not appear in emitted JS. Test updates: replaced `jest.spyOn(GraphModule, 'StandardGraph')` (which relied on the runtime import we just removed) with direct mock factories passed through the new option. Cleaner — tests no longer need to reach into module internals. Same coverage. Build: `npx rollup -c` now emits zero warnings. Test count: 1077 passed, 0 regressions. * chore: add multi-agent-subagent script to package.json Introduced a new script entry for multi-agent-subagent, enabling execution of the corresponding TypeScript file with necessary configurations. This addition enhances the multi-agent functionality by allowing for more granular control over subagent operations. * fix: isolate child subagent from parent's callback chain The manual verification script (`npm run multi-agent-subagent`) crashed with: Error: No agent context found for agent ID researcher at StandardGraph.getAgentContext at ChatModelStreamHandler.handle Root cause: `Run.processStream` drives the parent via `streamEvents`, which captures events from every nested runnable in the call tree — including the child graph's LLM calls. When the child's "researcher" agent streamed `on_chat_model_stream` events, the parent's `ChatModelStreamHandler` received them and tried to resolve the child's agent ID in the parent's `agentContexts` map, which only contains the supervisor. Error thrown. The Phase 1 plan was for child events to stay fully isolated. Unit tests missed this because the child graph was always mocked (no real `streamEvents` plumbing); only running the full pipeline end-to-end surfaced it. Fix: pass `callbacks: []` in the child `workflow.invoke()` config. This overrides the inherited callbacks for that invocation, breaking the event propagation chain. Also set `runName: subagent:<type>` and a child-scoped `configurable.thread_id` so the child has a distinct trace root rather than polluting the parent's trace. Manual verification now works: supervisor delegates to researcher, receives filtered text, synthesizes the final answer. Test suite: 1079 passed, 0 subagent-suite regressions. The two newly failing suites (ProgrammaticToolCalling / ToolSearch Live API tests) are pre-existing — they require `LIBRECHAT_CODE_BASEURL` to be configured and fail identically on the prior commit. They were skipped before because OPENAI_API_KEY was absent. * feat: align filterSubagentResult fallback with Claude Code When a subagent's last AIMessage is pure tool_use (e.g. the model hit maxTurns mid-tool-call), the previous implementation returned the "Task completed" sentinel, discarding any textual progress from earlier turns. Claude Code's agentToolUtils.finalizeAgentTool walks backwards from the last message, and on messages without text content continues scanning earlier AIMessages, surfacing the most recent textual thought instead of the sentinel. This change matches that behavior. The three `return` sites inside the loop that previously exited with the sentinel now `continue` when the AIMessage yields no text; the loop exits with "Task completed" only when no AIMessage in the history contains any text. Concrete example: [0] Human: "What's the capital of France?" [1] AI: "Let me search." + tool_use(search, ...) [2] Tool: "Paris." [3] AI: tool_use(search, ...) <-- maxTurns, no text Before: "Task completed" After: "Let me search." Added two tests — the turn-limit scenario above, and the analogous empty-string-content edge case. All nine existing filterSubagentResult tests still pass unchanged. * fix: update model name for non-Anthropic API to gpt-5.4 Changed the model name used in the multi-agent subagent script from 'gpt-4.1' to 'gpt-5.4' for improved performance and capabilities. This update ensures compatibility with the latest API offerings.

* feat: add inert hook registry and executor scaffolding (phase 1/1) Introduces `src/hooks/` with types, registry, and executor for a 12-event hook lifecycle. Purely additive — not exported from `src/index.ts` and not yet wired into `Run`, `Graph`, or `ToolNode`. Integration lands in the next PR so this change is shippable independently with zero behavioral impact on hosts that don't opt in. - Discriminated-union input/output per event; generic `HookCallback<E>` and `HookMatcher<E>` for type-safe registration. - `HookRegistry` uses `Map<sessionId, bucket>` (not `Record`) to avoid O(n^2) churn under parallel registration in multi-tenant hosts. - `executeHooks` fans out in parallel, races hooks against a combined parent + timeout `AbortSignal` so non-listening hooks still time out, folds decisions with `deny > ask > allow` precedence, accumulates `additionalContext`, and self-removes `once: true` matchers after any successful hook. Errors are non-fatal: swallowed into the aggregated result and routed through an optional winston logger (falling back to `console.warn`), with internal matcher errors suppressed. 47 tests under `src/hooks/__tests__/` cover registry scoping, regex matching, precedence folding, once-self-removal (success/failure/mixed), timeout enforcement (including non-listening hooks), error non-fatality, and parent signal combination. * fix(hooks): address review findings — rename field, wire updatedOutput, cache patterns Resolves the comprehensive review on #87: - Rename `HookMatcher.matcher` -> `HookMatcher.pattern` to fix the self-referential naming (`m.matcher` reads as "the matcher's matcher"). - Wire `PostToolUseHookOutput.updatedOutput` through `AggregatedHookResult` and `fold()`. The type was previously a promise the executor didn't fulfill — a hook returning `{ updatedOutput: ... }` was silently dropped. - Correct the JSDoc on `updatedInput`/`updatedOutput`. The previous copy claimed non-determinism "in Promise.all resolution order" — but `Promise.all` preserves input order, so the fold is deterministic in registration order (outer loop over matchers, inner loop over each matcher's hooks). Updated the `executeHooks` function docstring to match. - Add regex compilation cache and length bound to `matchers.ts`. Patterns compile at most once per unique string and are reused across calls; patterns over MAX_PATTERN_LENGTH (512) are rejected outright as a cheap ReDoS mitigation. Documented the trust model (patterns are trusted input; hosts must validate user-supplied patterns upstream). - Document the wildcard-only semantics on query-less events: a non-empty pattern on an event that doesn't supply a matchQuery (RunStart, Stop, etc.) never matches. - Document the `once: true` concurrent-dispatch limitation: two parallel `executeHooks` calls each snapshot the matcher list, so `once` means "once per call", not "once globally". Matches Claude Code's semantics. Added a test that pins this behavior. - Merge `matchers.filter(...)` + task-build loop into a single pass per AGENTS.md "consolidate sequential O(n) operations." - Scope `eslint-disable no-console` to the single `console.warn` call in `reportErrors` instead of disabling it file-wide. - Fix the timeout-ignoring-hook test: clear the dangling `setTimeout` and assert the error surfaces an abort-shaped label. - Add tests: multi-hook `preventContinuation` (first-writer-wins on stopReason, flag without reason), `updatedOutput` flow + registration order, registration-order determinism for `updatedInput`, pattern length bound, compilation cache hit/miss/clear, concurrent `once` dispatch. Total: 47 -> 59 tests. Findings audited and resolved: - #1 ReDoS: cache + length bound + trust model doc (full fix blocked on host-side validation) - #2 Wrong updatedInput ordering JSDoc: fixed - #3 Dead `updatedOutput` type: wired through - #4 Concurrent `once` race: documented + test - #5 `HookMatcher.matcher` naming: renamed to `pattern` - #6 `WideCallback` dead code: rejected as inaccurate (used in `runHook`) - #7 Eslint-disable scope: line-scoped - #8 File-path comments: rejected; matches existing project convention - #9 Two-pass filter: single pass - #10 Regex recompilation: cached - CL-4 Undocumented wildcard semantics: documented - RT-3 Timeout test dangling timer: fixed + error content verified - RT-4 No multi-hook preventContinuation test: added * perf(hooks): atomic once, bounded LRU, ReDoS heuristic, shared signal Rework the phase-1 hooks scaffolding for the multi-tenant scale we actually operate at. These are behavior changes (semantics of `once` tightened, new pattern-rejection path) but none of them are wired into the runtime yet — this PR is still inert — so the blast radius is the hooks directory only. ## Atomic `once: true` Removal now happens synchronously inside `executeHooks`, between `registry.getMatchers` and the first `await`. Node's event loop serialises sync work, so two concurrent `executeHooks` calls can never both observe the same `once` matcher — whichever call runs its sync prefix first consumes it, and the loser sees an empty bucket. Semantic change: `once` is now at-most-one-dispatch, not at-most-one-successful-execution. If every hook in a `once` matcher throws, the matcher is still gone. The old retry-on-failure behavior was Claude Code's model — fine for a single-user CLI, but in a multi-tenant server it opens a window where concurrent bursts can re-dispatch a "once" hook non-idempotently. "At most once, ever" is the correct semantic for a shared registry. Hosts that need retry should register a normal matcher and self-unregister via the callback returned from `registry.register`. Dropped `collectOnceMatchersForRemoval` + its Map/array allocations — the post-hoc collection step is no longer needed. Tests: replaced "keeps the matcher when every hook throws" with "removes the matcher even when every hook in it throws". Added "fires exactly once across 3 concurrent calls" and "fires exactly once across 8-way concurrent dispatch when hooks are slow" (the slow test deliberately uses a 10ms hook to force overlap). ## Bounded LRU regex cache `matchers.ts` was unbounded. Under multi-tenant use where different tenants register different patterns, the cache would leak. Capped at `MAX_CACHE_SIZE = 256` entries with LRU eviction: hits refresh position (delete-then-set), misses at capacity evict the oldest key. Failed compiles are cached the same way so a tenant spamming bad patterns doesn't burn CPU on every call. Exposed `getMatcherCacheSize` and `MAX_CACHE_SIZE` for test observability. Tests verify eviction order (cold patterns recompiled after overflow) and that hot patterns refreshed mid-stream survive eviction pressure. ## ReDoS heuristic: nested-quantifier detector `hasNestedQuantifier` walks a pattern linearly, tracks paren depth, and rejects any pattern where a quantified group contains another quantifier — the `(a+)+`, `(.*)*`, `(\w+)+` shape that is the #1 catastrophic-backtracking source. Character classes (`[*+?]`) and escaped quantifiers (`\+`) are correctly ignored. Rejection happens before `new RegExp`, so adversarial input never reaches the compiler. This is a floor, not a ceiling — it doesn't catch ambiguous-alternation ReDoS like `(a|a)+`. Hosts that accept user-supplied patterns must still validate upstream. Documented in the `HookMatcher.pattern` and `matchesQuery` JSDoc. Test verifies that `(a+)+$` on a 35-char adversarial input resolves in under 50ms (would be seconds without the heuristic). ## Shared `AbortSignal` per matcher Was: one `AbortSignal.timeout(ms)` + one `AbortSignal.any([parent, timeout])` per hook. For a matcher with N hooks and the same timeout, that's N timers + N composite signals. Now: one signal per matcher, shared across all its hooks. Since hooks in a matcher already share `matcher.timeout`, they may as well share the timer. Behavior is identical (all hooks fire and race the same deadline) but allocation drops from N to 1 per matcher. ## Tests 72 passing (was 59, +13): - Atomic once: 2 concurrent tests + 1 failure-removal test updated - LRU: cache size cap, eviction order, hit refresh - Nested quantifier: classic shapes, deep nesting, escaped chars, character classes, false positives on legitimate patterns - ReDoS mitigation: rejection path, <50ms response on adversarial input * fix(hooks): fix ReDoS heuristic false-positives on (?:...) groups Resolves the follow-up review on #87. Four findings, all valid. ## #1 MAJOR — hasNestedQuantifier false-positives on group syntax The previous scanner treated `?` as a quantifier at any depth > 0, but `?` immediately after `(` is group syntax, not a quantifier: `(?:...)`, `(?=...)`, `(?!...)`, `(?<name>...)`, `(?<=...)`, `(?<!...)`. Patterns like `(?:pre_)?tool_name` were silently rejected and the registering hook never fired — with no error surfaced to the caller. Fix: explicit group-syntax prefix skip inside the `(` handler. When the scanner enters a group it peeks for `?` and advances past the modifier characters (`:`, `=`, `!`, `<=`, `<!`, or a named-group label `<name>`) before processing the group body. The `?` is therefore never considered a quantifier at the start of a group. While in there I also fixed a second correctness bug the reviewer didn't catch: the old depth-indexed array didn't propagate a child group's "contains quantifier" signal up to its parent, so `(?:(a+))+` (outer quantifier over a wrapped quantified group, equivalent to `(a+)+`) escaped detection. The scanner now uses an explicit stack of `QuantifierFrame` records, and when a child group closes it propagates `hasBacktrackRisk` to the parent frame — whether the child itself was quantified or not. This catches `(?:(a+))+` correctly while still allowing non-risky wrappers like `(?:(ab))+`. Added 9 tests covering non-capturing groups, lookaheads, lookbehinds, named captures, `(?:(a+))+` risk propagation, and `((ab)+)+` deep wrapping. Verified existing ReDoS-detection tests still pass. ## #2 MINOR — executeHooks.test.ts did not clear the pattern cache Matcher cache is module-level. `matchers.test.ts` clears it in `beforeEach`; `executeHooks.test.ts` did not, so patterns compiled during one test persisted across subsequent tests in the same file. Added `clearMatcherCache()` to `executeHooks`'s `beforeEach` — no test failures today, but restores test independence. ## #3 MINOR — Test utilities leaked into production barrel `clearMatcherCache` and `getMatcherCacheSize` are test-only helpers (their JSDoc says so) but were exported from `src/hooks/index.ts`. When the integration PR eventually re-exports `src/hooks` from `src/index.ts` they would become public API. Removed both from the barrel. Tests already import them directly from `../matchers`, so no test changes needed. `hasNestedQuantifier`, `MAX_PATTERN_LENGTH`, and `MAX_CACHE_SIZE` remain exported — hosts can legitimately use them for upstream validation. ## #4 NIT — LRU refresh overhead at low cache pressure Every `compile()` cache hit was doing `delete + set` to refresh LRU position, even when the cache was nowhere near capacity. With `MAX_CACHE_SIZE = 256` and typical working sets of tens of patterns, eviction pressure is near-zero and the refresh is pure overhead. Added a `LRU_REFRESH_THRESHOLD = 75%` gate: refresh only runs when the cache is above 192 entries. Below that the code fast-paths straight back from `compile()`. Above it the LRU semantics kick in so hot patterns still survive evictions — the existing "refreshes LRU position on hit" test still passes because by the time it runs the check, the cache is at capacity. Tests: 72 -> 81 (+9 for group-syntax and risk-propagation coverage).

* feat: wire tool hooks into event-driven ToolNode (phase 1/3) Fires PreToolUse, PostToolUse, PostToolUseFailure, and PermissionDenied hooks inside `ToolNode.dispatchToolEvents` — the single chokepoint for all event-driven tool execution in LibreChat. ## Hook lifecycle in dispatchToolEvents 1. **PreToolUse** fires per call in parallel before ON_TOOL_EXECUTE dispatch. Denied calls (deny or ask) produce error ToolMessages and fire PermissionDenied; surviving calls proceed with optional updatedInput applied to tool args. 2. Only approved calls are batched into ON_TOOL_EXECUTE and sent to the host. If all calls are denied, the batch dispatch is skipped entirely. 3. **PostToolUse** fires per successful result. If a hook returns updatedOutput, the ToolMessage content is replaced before the message is appended to the graph. 4. **PostToolUseFailure** fires per error result. Observational only — cannot modify the error message. Post hooks are awaited (not fire-and-forget) so updatedOutput takes effect before the ToolMessage is consumed. Hook errors are swallowed via .catch() to avoid disrupting the tool result flow. PermissionDenied is fire-and-forget since it's purely observational. ## Plumbing - `ToolNodeOptions.hookRegistry?: HookRegistry` — new optional field - `ToolNode` constructor stores `this.hookRegistry` - `Graph.createToolNode()` passes `this.hookRegistry` at both construction sites (event-driven and traditional) - `hookRegistry` is set on Graph BEFORE `createWorkflow()` is called (moved from the post-create assignment in Run constructor into `createLegacyGraph` and `createMultiAgentGraph`) so ToolNode receives a non-undefined registry at construction time ## hasHookFor guards All four hook fire points use `hasHookFor(event, runId)` before calling `executeHooks`, so when no hooks are registered for the event the overhead is a single Map lookup + array length check — no Promise allocations, no function calls. ## Tests 9 new integration tests in `src/hooks/__tests__/toolHooks.test.ts` using event-driven mode (toolDefinitions + ON_TOOL_EXECUTE handler): - PreToolUse fires with correct fields - PreToolUse deny blocks execution (tool handler never called) - PreToolUse ask blocks execution (v1) - PreToolUse updatedInput rewrites args before dispatch to host - PreToolUse hook errors are non-fatal (tool still executes) - PermissionDenied fires after deny with reason - PostToolUse fires with tool output - PostToolUseFailure fires on error result - No-hooks baseline works identically Total: 94 existing + 9 new = 103 passing. * fix(hooks): address PR 3 review — step completion, ordering, catch, tests Resolves all 10 findings from the review on #90. ## Structural rewrite of dispatchToolEvents The post-processing loop was rewritten to fix #1, #2, #6, #8, #10: - #1 MAJOR: Denied calls now dispatch ON_RUN_STEP_COMPLETED via a shared `dispatchStepCompleted` helper (extracted from the duplicate logic in the executed-results loop). Without this, the frontend would show stuck spinners for denied tool calls. - #2 MAJOR: Output messages are now collected in a Map keyed by tool_call_id, then returned in the original `toolCalls` input order via `toolCalls.map(call => messageByCallId.get(call.id))`. Before, denied messages were prepended to executed messages, breaking the implicit ordering contract. - #6 MINOR: `toolUsageCount` is now incremented only for approved calls (moved from the initial `preToolCalls.map` into the `approvedEntries.map` that builds ToolCallRequests). Denied calls no longer inflate the turn counter. - #8 MINOR: Eliminated the double `result.status === 'error'` check. ToolMessage is now constructed inside each status branch directly after hook dispatch. - #10 NIT: `requests.find()` replaced with a pre-built `Map<id, ToolCallRequest>` for O(1) lookup per result, per AGENTS.md guidance on Map/Set over Array.find. ## Error protection - #3 MAJOR: PreToolUse `Promise.all` now wraps each `executeHooks` call in `.catch(() => emptyResult)` so an infrastructure error in the hooks subsystem cannot crash the entire tool batch. Matches the stated "hook errors are non-fatal" contract and is consistent with PostToolUse/PostToolUseFailure which already had `.catch()`. ## New tests - #4 MAJOR: Added `updatedOutput replaces the ToolMessage content` — registers a PostToolUse hook returning `{updatedOutput: 'REDACTED'}`, verifies the ToolMessage content in the graph's run messages is `'REDACTED'` (not the original output). - #7 MINOR: PermissionDenied test replaced 50ms `setTimeout` with a `Promise` resolved by the hook callback. No more flaky sleep. ## Cleanup - #9 NIT: Updated stale "once the tool-hook PR lands" comment in `src/hooks/index.ts`. Tests: 103 -> 104 (+1 for updatedOutput). * fix(hooks): off-by-one in dispatchStepCompleted index, test guard, turn fallback - #1 MAJOR: dispatchStepCompleted read post-increment toolUsageCount for the `index` field. Added `turn?: number` parameter; approved-call site passes `request?.turn` (the pre-increment value), denied-call site omits it (fallback is correct since count was never bumped). - #2 NIT: updatedOutput test now asserts `expect(toolMsg).toBeDefined()` before content extraction so a missing tool message produces a clear failure rather than "Expected undefined to be 'REDACTED'". - #3 NIT: PreToolUse input `turn` field restored `?? 0` fallback so first-use tools see `turn: 0` instead of `turn: undefined`. * fix(hooks): batch tests, turn JSDoc, freeze sentinel, remove unused hookRegistry Addresses follow-up review findings on #90. - #1 MAJOR: Add multi-call batch tests — partial deny (echo denied, math approved, order preserved) and all-denied (ON_TOOL_EXECUTE handler never called). Uses counter-based IDs (#7 fix) to avoid Date.now() collisions across calls in the same test. - #2 MINOR: Document PreToolUseHookInput.turn semantics — within a single batch, all calls to the same tool share the same turn value. Per-call discrimination within a batch is not supported in v1. - #3 MINOR: Remove hookRegistry from the traditional (non-event-driven) ToolNode construction site in Graph.ts. Hooks only fire in dispatchToolEvents; passing the registry to a ToolNode that never uses it created a false impression of hook support. - #4 MINOR: Add PostToolUse error non-fatality test — throwing hook does not crash the batch, original content is preserved. - #5+#8 MINOR: Freeze the hook-error fallback sentinel via Object.freeze and rename from emptyResult to HOOK_FALLBACK. Prevents future mutation bugs if someone pushes to the shared arrays. - #6 MINOR: Document in ToolNodeOptions.hookRegistry JSDoc that hooks only fire for event-driven calls — directToolNames bypass dispatch. Tests: 104 -> 107 (+3 for batch partial-deny, all-denied, and PostToolUse error resilience). * nit: simplify test assertions, keep shallow freeze (deep conflicts with mutable type)

@remarks

#104) * feat: add subagent-as-tool primitive for hierarchical agent delegation Adds a SubagentTool that lets a parent agent delegate tasks to child agents with isolated context windows. Verbose tool outputs stay in the child's context; only a filtered text summary returns to the parent. Key components: - SubagentConfig type on AgentInputs for declaring child agent types - SubagentExecutor: creates child StandardGraph, invokes with isolated state, filters result (strips tool_use/thinking blocks), fires SubagentStart/SubagentStop hooks - SubagentTool: LCTool definition with dynamic enum from configs - Graph integration: tool created in createAgentNode() via graphTools, automatically becomes a direct tool (bypasses event-driven dispatch) - Self-spawn support: reuse parent's AgentInputs for context isolation without separate agent config Wires the existing SubagentStart (with deny support) and SubagentStop hook types that were defined but had zero fire points. * fix: resolve review findings for subagent primitive Address all 6 issues from PR review: 1. Add subagentHooks.test.ts — end-to-end hook integration test through real Run pipeline (SubagentStart payload, SubagentStop messages, deny blocks execution). 3 new tests. 2. Remove `as unknown as` cast — follow MultiAgentGraph pattern (init graphTools as empty array, push directly). 3. Deduplicate schema — extract buildSubagentToolParams() in SubagentTool.ts, use it in Graph.ts. createSubagentToolDefinition() now delegates to it. Single source of truth for schema/description. 4. Propagate threadId — tool callback extracts thread_id from LangGraph config.configurable, passes to executor.execute(). Hooks now receive correct threadId. 5. Pass tokenCounter to child graph — added to SubagentExecutorOptions, threaded from agentContext.tokenCounter through to child StandardGraph constructor. 6. Pass signal in invoke() config — workflow.invoke() now receives signal for more responsive abort between graph steps. * fix: resolve second-pass audit findings for subagent primitive Address 12 valid findings from the audit — 3 MAJOR, 6 MINOR, 3 NIT: MAJOR fixes: 1. Depth tracking now works across graph boundaries. Previously the `depth`/`maxDepth` check only worked within a single executor; when `allowNested: true`, the child graph spawned its own executor at depth=0, making `maxSubagentDepth` non-functional. Fix: decrement `maxSubagentDepth` in `buildChildInputs`, so the child's own executor inherits a smaller budget and naturally blocks further nesting when it reaches 0. Removed the now-vestigial `depth` field. 2. Detect duplicate `SubagentConfig.type` values. `Map` construction silently dropped duplicate keys. Now `resolveSubagentConfigs` throws with a clear error. 3. Add test coverage for `allowNested: true`: verifies depth is decremented across boundaries, child inherits subagentConfigs, and budget is clamped to >= 0. MINOR fixes: 4. Runtime guard for empty `description` in the tool callback — the LLM can send undefined/empty values; now we default to a safe fallback string. 5. Updated `src/hooks/index.ts` barrel comment to list SubagentExecutor as the fourth hook consumer (SubagentStart/SubagentStop). 6. Extracted property description strings as constants in SubagentTool.ts; `SubagentToolSchema` and `buildSubagentToolParams` now share a single source of truth. 7. Changed SubagentStop from fire-and-forget to awaited, matching the PostCompact pattern. Removes fragile `setTimeout` synchronization in tests. Errors still swallowed (observational). 8. Truncate error messages from child graph to 200 chars to prevent leaking internal details (stack traces, API errors) to the parent LLM. 9. Document that `ask` decision is treated identically to `deny` in the non-interactive subagent context. NIT fixes: 11. Import order in SubagentExecutor.ts — `@/types` (multi-line, longest) now comes first in the local types sub-group. 12. Removed unused `contentParts` destructuring in verification script. 13. Fixed unsafe `msg.content as string` cast in verification script; now handles array content correctly. Skipped #10 (NIT): `_sourceInputs` underscore prefix is an accepted TS convention for cross-file-accessible-but-semantically-internal fields. Test count: 1074 passed (up from 1065, +9 new tests), 0 regressions. * fix: address third-pass audit findings for subagent primitive Resolve 3 NITs from the second review pass plus 2 new codex comments: NIT #1 — Tighten truncation test: previously asserted only `result.content.length < 500`. Now asserts exact envelope of 219 chars ("Subagent error: " prefix + 200 body + "..." suffix) so that regressions in ERROR_MESSAGE_MAX_CHARS are caught. Added companion test verifying short messages are not truncated. NIT #2 — Skip tool creation when maxSubagentDepth: 0. When a host set `maxSubagentDepth: 0` alongside `subagentConfigs`, the tool was still bound to the LLM, and every invocation returned the depth- exceeded error. The LLM wasted a turn. Now the `createAgentNode` guard includes `effectiveSubagentDepth > 0`, so the tool is not created at the leaf level. Added integration test. NIT #3 — Document buildChildInputs audience. Added @remarks block explaining this is an advanced utility exported primarily for testing and internal use; hosts configuring subagents should not call it directly. Codex P1 (false positive, clarified): maxSubagentDepth propagation across graph boundaries was already correct via the countdown in buildChildInputs, but it wasn't obvious from the call site. Added a multi-line JSDoc comment above the executor construction in createAgentNode explaining how depth consumes across boundaries through the decremented AgentInputs.maxSubagentDepth. Codex P2 (real bug, fixed): toolSchemaTokens did not include the subagent tool's schema. `calculateInstructionTokens()` was called in `fromConfig()` before graphTools was populated, so the later- injected subagent tool's schema was missing from the budget. Fixes: 1. Include `graphTools` in the iteration in `calculateInstructionTokens` (also helps handoff tools, a pre-existing issue with the same root cause). 2. Re-trigger the calculation at the end of `createAgentNode` after the subagent tool is pushed, so the token count reflects the final graphTools state before `createCallModel` awaits `tokenCalculationPromise`. Added test asserting `toolSchemaTokens` is larger for agents with subagents vs without, confirming the recalculation runs. Test count: 1077 passed (up from 1074, +3 new tests), 0 regressions. * fix: break circular dep between SubagentExecutor and StandardGraph Rollup warned on build: Graph.ts → subagent/index.ts → SubagentExecutor.ts → Graph.ts SubagentExecutor needed `StandardGraph` at runtime to construct the child graph; Graph.ts needs SubagentExecutor to attach the tool. With preserveModules, these land in separate chunks producing circular chunk imports with undefined execution order. Fix: dependency injection. SubagentExecutor now takes a required `createChildGraph: ChildGraphFactory` option and only imports `StandardGraph` as `import type` (erased at build). Graph.ts passes `(input) => new StandardGraph(input)` when constructing the executor. Runtime module graph is now acyclic; `import type` does not appear in emitted JS. Test updates: replaced `jest.spyOn(GraphModule, 'StandardGraph')` (which relied on the runtime import we just removed) with direct mock factories passed through the new option. Cleaner — tests no longer need to reach into module internals. Same coverage. Build: `npx rollup -c` now emits zero warnings. Test count: 1077 passed, 0 regressions. * chore: add multi-agent-subagent script to package.json Introduced a new script entry for multi-agent-subagent, enabling execution of the corresponding TypeScript file with necessary configurations. This addition enhances the multi-agent functionality by allowing for more granular control over subagent operations. * fix: isolate child subagent from parent's callback chain The manual verification script (`npm run multi-agent-subagent`) crashed with: Error: No agent context found for agent ID researcher at StandardGraph.getAgentContext at ChatModelStreamHandler.handle Root cause: `Run.processStream` drives the parent via `streamEvents`, which captures events from every nested runnable in the call tree — including the child graph's LLM calls. When the child's "researcher" agent streamed `on_chat_model_stream` events, the parent's `ChatModelStreamHandler` received them and tried to resolve the child's agent ID in the parent's `agentContexts` map, which only contains the supervisor. Error thrown. The Phase 1 plan was for child events to stay fully isolated. Unit tests missed this because the child graph was always mocked (no real `streamEvents` plumbing); only running the full pipeline end-to-end surfaced it. Fix: pass `callbacks: []` in the child `workflow.invoke()` config. This overrides the inherited callbacks for that invocation, breaking the event propagation chain. Also set `runName: subagent:<type>` and a child-scoped `configurable.thread_id` so the child has a distinct trace root rather than polluting the parent's trace. Manual verification now works: supervisor delegates to researcher, receives filtered text, synthesizes the final answer. Test suite: 1079 passed, 0 subagent-suite regressions. The two newly failing suites (ProgrammaticToolCalling / ToolSearch Live API tests) are pre-existing — they require `LIBRECHAT_CODE_BASEURL` to be configured and fail identically on the prior commit. They were skipped before because OPENAI_API_KEY was absent. * feat: align filterSubagentResult fallback with Claude Code When a subagent's last AIMessage is pure tool_use (e.g. the model hit maxTurns mid-tool-call), the previous implementation returned the "Task completed" sentinel, discarding any textual progress from earlier turns. Claude Code's agentToolUtils.finalizeAgentTool walks backwards from the last message, and on messages without text content continues scanning earlier AIMessages, surfacing the most recent textual thought instead of the sentinel. This change matches that behavior. The three `return` sites inside the loop that previously exited with the sentinel now `continue` when the AIMessage yields no text; the loop exits with "Task completed" only when no AIMessage in the history contains any text. Concrete example: [0] Human: "What's the capital of France?" [1] AI: "Let me search." + tool_use(search, ...) [2] Tool: "Paris." [3] AI: tool_use(search, ...) <-- maxTurns, no text Before: "Task completed" After: "Let me search." Added two tests — the turn-limit scenario above, and the analogous empty-string-content edge case. All nine existing filterSubagentResult tests still pass unchanged. * fix: update model name for non-Anthropic API to gpt-5.4 Changed the model name used in the multi-agent subagent script from 'gpt-4.1' to 'gpt-5.4' for improved performance and capabilities. This update ensures compatibility with the latest API offerings.

…tests Addresses the comprehensive audit of #107: ### MAJOR #1 — ON_TOOL_EXECUTE forwarding coverage Two new tests in the `event forwarding` suite drive the forwarder callback directly with a synthesized `ToolExecuteBatchRequest`: - `routes child ON_TOOL_EXECUTE dispatches through the parent registry` asserts the parent's `ON_TOOL_EXECUTE` handler is invoked, the batch's `resolve` fires with canned results, and the child receives them. - `does NOT forward ON_TOOL_EXECUTE when the parent registry has no handler (safe fallback)` documents the companion case: without a handler the `resolve` never fires, which is why the executor gates `keepToolDefinitions` on handler presence in the first place. ### MINOR #2 — unused `childGraph` parameter Removed from `createForwarderCallback`'s arg shape, destructuring, and call site. It was accepted as `_childGraph` and never read. ### MINOR #3 — `data as never` `EventHandler.handle` didn't include `ToolExecuteBatchRequest` in its accepted data union, which is why the forwarder used `as never` to shut the compiler up. Extended the union and switched the cast to the concrete `ToolExecuteBatchRequest` type — the forwarder is now fully type-checked on the hot path. ### MINOR #4 — error-phase envelope test `emits an 'error' phase envelope when the child graph throws` uses the throwing factory helper already in the file, registers a real `ON_SUBAGENT_UPDATE` handler, and asserts start/error ordering plus the `data.message` payload and `parentToolCallId` passthrough. ### MINOR #6 — dynamic imports in tests Replaced `await import('@/events')` / `await import('@/common')` with static top-level imports. Consistent with the rest of the file. ### MINOR #7 — `summarizeEvent` unit tests Exported the pure function from `@/tools/subagent` and added 10 focused tests covering every branch: tool_calls with one/many names, empty tool_calls, message_creation, ON_TOOL_EXECUTE with/without names, completed steps with/without a tool name, message_delta, and the unknown-event fallback. Every user-visible label is now regression protected. ### MINOR #8 — `config.toolCall` comment Expanded the JSDoc to note the specific LangChain source (`ToolRunnableConfig` in `@langchain/core/tools`, stable since 0.3.x) and that the defensive read degrades to `undefined` on upstream changes. ### NIT #9 — unused import Removed `createContentAggregator` from `subagent-tools-debug.ts`. ### NIT #10 — `awaitHandlers = true` doc Added a JSDoc block explaining the trade-off: required for `ON_TOOL_EXECUTE` to actually block on the host's resolve, but it also serializes observational events through any slow host handlers. Refinement path noted for later if host-side latency becomes a problem. Version bumped to `3.1.67-dev.3`. Tests: 51 pass (40 SubagentExecutor + 10 new summarizeEvent + 1 new error-phase + existing coverage retained).

* feat: forward subagent child events and enable event-driven tools Adds an opt-in event forwarding path for `SubagentExecutor` so host apps that rely on event-driven tool dispatch (`toolDefinitions` + `ON_TOOL_EXECUTE` handler) can use subagents — including self-spawn. Before: a subagent in event-driven mode got `toolDefinitions` stripped in `buildChildInputs` and ran with `callbacks: []`, so the LLM had no tool schemas and ON_TOOL_EXECUTE never reached the host. The subagent replied in prose without invoking any tools. Now: - `SubagentExecutor` accepts `parentHandlerRegistry` (a direct registry or a lazy getter — `Run` wires the registry onto the graph AFTER `createWorkflow`, so `createAgentNode` must capture lazily). - When provided, `buildChildInputs` keeps `toolDefinitions`, and the child's invoke attaches a `BaseCallbackHandler` forwarder that: - routes ON_TOOL_EXECUTE to the parent's handler (event-driven tools work), and - wraps other custom events (run_step, run_step_delta, run_step_completed, message_delta, reasoning_delta) in a new `ON_SUBAGENT_UPDATE` envelope so hosts can render child progress in a separate UI surface. - New `SubagentUpdateEvent` type and `ON_SUBAGENT_UPDATE` GraphEvents entry; start/stop/error envelopes fire around the child invoke. - Legacy isolation is preserved when no registry is passed. Test plan: - `npx jest SubagentExecutor SubagentTool` — 36 pass (4 new covering forwarding, toolDefinition retention, legacy strip, lazy getter). - Integration test against OpenAI (`subagent-event-driven-debug.ts`) confirms the child uses `calculator` via the parent's ON_TOOL_EXECUTE path and the host sees start/run_step/run_step_completed/stop ON_SUBAGENT_UPDATE envelopes. * fix: gate child toolDefinitions on ON_TOOL_EXECUTE handler + propagate tool_call_id Two follow-ups from review on #107: 1. Codex P1: `SubagentExecutor` was treating any non-null parent registry as "forwarding enabled" and always kept `toolDefinitions` on the child. Because `Run.create` always constructs a `HandlerRegistry`, this meant configurations with tools declared but no `ON_TOOL_EXECUTE` handler would hang forever — the child's `ToolNode` would dispatch a batch request and the forwarder would find no handler to resolve or reject it, leaving the promise pending. Now `keepToolDefinitions` is gated on `parentRegistry.getHandler(ON_TOOL_EXECUTE) != null`. A registry without the handler falls back to the legacy "strip toolDefinitions" path, which is a recoverable no-tools state instead of a hang. 2. `SubagentUpdateEvent` now carries `parentToolCallId`. `Graph.ts` pulls it from `config.toolCall.id` (LangChain threads the originating `ToolCall` onto `RunnableConfig`) when the subagent tool is invoked. This lets hosts correlate subagent updates to the parent tool call deterministically instead of inferring by event ordering — a concern raised on the LibreChat PR using this event stream. Also bumps to `3.1.67-dev.2`. Tests: `npx jest SubagentExecutor` — 38 pass (2 new covering the handler gate and `parentToolCallId` propagation). Integration script verified against real OpenAI: every ON_SUBAGENT_UPDATE envelope now carries the parent's `tool_call_id`. * chore: address audit findings — drop unused param, tighten cast, add tests Addresses the comprehensive audit of #107: ### MAJOR #1 — ON_TOOL_EXECUTE forwarding coverage Two new tests in the `event forwarding` suite drive the forwarder callback directly with a synthesized `ToolExecuteBatchRequest`: - `routes child ON_TOOL_EXECUTE dispatches through the parent registry` asserts the parent's `ON_TOOL_EXECUTE` handler is invoked, the batch's `resolve` fires with canned results, and the child receives them. - `does NOT forward ON_TOOL_EXECUTE when the parent registry has no handler (safe fallback)` documents the companion case: without a handler the `resolve` never fires, which is why the executor gates `keepToolDefinitions` on handler presence in the first place. ### MINOR #2 — unused `childGraph` parameter Removed from `createForwarderCallback`'s arg shape, destructuring, and call site. It was accepted as `_childGraph` and never read. ### MINOR #3 — `data as never` `EventHandler.handle` didn't include `ToolExecuteBatchRequest` in its accepted data union, which is why the forwarder used `as never` to shut the compiler up. Extended the union and switched the cast to the concrete `ToolExecuteBatchRequest` type — the forwarder is now fully type-checked on the hot path. ### MINOR #4 — error-phase envelope test `emits an 'error' phase envelope when the child graph throws` uses the throwing factory helper already in the file, registers a real `ON_SUBAGENT_UPDATE` handler, and asserts start/error ordering plus the `data.message` payload and `parentToolCallId` passthrough. ### MINOR #6 — dynamic imports in tests Replaced `await import('@/events')` / `await import('@/common')` with static top-level imports. Consistent with the rest of the file. ### MINOR #7 — `summarizeEvent` unit tests Exported the pure function from `@/tools/subagent` and added 10 focused tests covering every branch: tool_calls with one/many names, empty tool_calls, message_creation, ON_TOOL_EXECUTE with/without names, completed steps with/without a tool name, message_delta, and the unknown-event fallback. Every user-visible label is now regression protected. ### MINOR #8 — `config.toolCall` comment Expanded the JSDoc to note the specific LangChain source (`ToolRunnableConfig` in `@langchain/core/tools`, stable since 0.3.x) and that the defensive read degrades to `undefined` on upstream changes. ### NIT #9 — unused import Removed `createContentAggregator` from `subagent-tools-debug.ts`. ### NIT #10 — `awaitHandlers = true` doc Added a JSDoc block explaining the trade-off: required for `ON_TOOL_EXECUTE` to actually block on the host's resolve, but it also serializes observational events through any slow host handlers. Refinement path noted for later if host-side latency becomes a problem. Version bumped to `3.1.67-dev.3`. Tests: 51 pass (40 SubagentExecutor + 10 new summarizeEvent + 1 new error-phase + existing coverage retained). * fix: clear parent run summary and discoveredTools from child inputs Paired with Codex P1 on the LibreChat PR (danny-avila/LibreChat#12725) that uses this event stream. `buildChildInputs` shallow-spreads the parent's `AgentInputs`, which carries two run-scoped fields that leak into the isolated child by default: - `initialSummary` — cross-run conversation summary. On a conversation that has already been summarized, every child would inherit it and reason against unrelated prior chat. - `discoveredTools` — tool names the parent's LLM searched for in earlier turns via `tool_search`. The child gets primed to use whatever the parent happened to find. Both are cleared on the returned child inputs. Configuration fields (summarization policy, provider options, context pruning config) are preserved — those are policy, not context. Affects both paths: - Self-spawn: `resolveSubagentConfigs` clones parent `_sourceInputs`, then `buildChildInputs` strips. - Explicit children: host-built `AgentInputs` goes through the same `buildChildInputs` before the child graph runs. Version bumped to `3.1.67-dev.4`. Regression test: `strips parent-run-scoped initialSummary and discoveredTools from child inputs` — seeds parent inputs with `initialSummary: { text, tokenCount }` and `discoveredTools: [...]`, asserts both are `undefined` on the returned child. 52 pass. * test: cover summarizeEvent step.type fallback branches Addresses reviewer F2.1 (NIT) on #107. `summarizeEvent` resolves run-step detail type via `step.stepDetails?.type ?? step.type ?? 'step'`. The existing tests all passed `{ stepDetails: { type } }`, leaving the second and third clauses of the chain uncovered. Two new tests: - top-level `step.type` (no `stepDetails` wrapper) for both `tool_calls` and `message_creation` — exercises the second clause - empty `{}` payload — exercises the `'step'` default and the generic `Step: <detailType>` branch Carried MINOR #5 (`SubagentUpdateEvent.data: unknown`) left as a documented deferral; the JSDoc already notes "shape depends on phase" and a discriminated union is a larger refactor. 54/54 pass. * chore: add npm run scripts for subagent debug scripts - `npm run subagent` — supervisor + researcher/coder subagent demo (no tool use; from #104) - `npm run subagent:events` — event-driven flow matching LibreChat's architecture (toolDefinitions + ON_TOOL_EXECUTE handler + self-spawn with calculator). This is the script that demonstrates the #107 fix. - `npm run subagent:tools` — traditional LangChain tool instances with self-spawn; pre-fix parity case. All three accept `OPENAI_API_KEY` (and `--provider anthropic` for the first).

* feat: add hook registry and executor scaffolding (phase 1/1) (#87) * feat: add inert hook registry and executor scaffolding (phase 1/1) Introduces `src/hooks/` with types, registry, and executor for a 12-event hook lifecycle. Purely additive — not exported from `src/index.ts` and not yet wired into `Run`, `Graph`, or `ToolNode`. Integration lands in the next PR so this change is shippable independently with zero behavioral impact on hosts that don't opt in. - Discriminated-union input/output per event; generic `HookCallback<E>` and `HookMatcher<E>` for type-safe registration. - `HookRegistry` uses `Map<sessionId, bucket>` (not `Record`) to avoid O(n^2) churn under parallel registration in multi-tenant hosts. - `executeHooks` fans out in parallel, races hooks against a combined parent + timeout `AbortSignal` so non-listening hooks still time out, folds decisions with `deny > ask > allow` precedence, accumulates `additionalContext`, and self-removes `once: true` matchers after any successful hook. Errors are non-fatal: swallowed into the aggregated result and routed through an optional winston logger (falling back to `console.warn`), with internal matcher errors suppressed. 47 tests under `src/hooks/__tests__/` cover registry scoping, regex matching, precedence folding, once-self-removal (success/failure/mixed), timeout enforcement (including non-listening hooks), error non-fatality, and parent signal combination. * fix(hooks): address review findings — rename field, wire updatedOutput, cache patterns Resolves the comprehensive review on #87: - Rename `HookMatcher.matcher` -> `HookMatcher.pattern` to fix the self-referential naming (`m.matcher` reads as "the matcher's matcher"). - Wire `PostToolUseHookOutput.updatedOutput` through `AggregatedHookResult` and `fold()`. The type was previously a promise the executor didn't fulfill — a hook returning `{ updatedOutput: ... }` was silently dropped. - Correct the JSDoc on `updatedInput`/`updatedOutput`. The previous copy claimed non-determinism "in Promise.all resolution order" — but `Promise.all` preserves input order, so the fold is deterministic in registration order (outer loop over matchers, inner loop over each matcher's hooks). Updated the `executeHooks` function docstring to match. - Add regex compilation cache and length bound to `matchers.ts`. Patterns compile at most once per unique string and are reused across calls; patterns over MAX_PATTERN_LENGTH (512) are rejected outright as a cheap ReDoS mitigation. Documented the trust model (patterns are trusted input; hosts must validate user-supplied patterns upstream). - Document the wildcard-only semantics on query-less events: a non-empty pattern on an event that doesn't supply a matchQuery (RunStart, Stop, etc.) never matches. - Document the `once: true` concurrent-dispatch limitation: two parallel `executeHooks` calls each snapshot the matcher list, so `once` means "once per call", not "once globally". Matches Claude Code's semantics. Added a test that pins this behavior. - Merge `matchers.filter(...)` + task-build loop into a single pass per AGENTS.md "consolidate sequential O(n) operations." - Scope `eslint-disable no-console` to the single `console.warn` call in `reportErrors` instead of disabling it file-wide. - Fix the timeout-ignoring-hook test: clear the dangling `setTimeout` and assert the error surfaces an abort-shaped label. - Add tests: multi-hook `preventContinuation` (first-writer-wins on stopReason, flag without reason), `updatedOutput` flow + registration order, registration-order determinism for `updatedInput`, pattern length bound, compilation cache hit/miss/clear, concurrent `once` dispatch. Total: 47 -> 59 tests. Findings audited and resolved: - #1 ReDoS: cache + length bound + trust model doc (full fix blocked on host-side validation) - #2 Wrong updatedInput ordering JSDoc: fixed - #3 Dead `updatedOutput` type: wired through - #4 Concurrent `once` race: documented + test - #5 `HookMatcher.matcher` naming: renamed to `pattern` - #6 `WideCallback` dead code: rejected as inaccurate (used in `runHook`) - #7 Eslint-disable scope: line-scoped - #8 File-path comments: rejected; matches existing project convention - #9 Two-pass filter: single pass - #10 Regex recompilation: cached - CL-4 Undocumented wildcard semantics: documented - RT-3 Timeout test dangling timer: fixed + error content verified - RT-4 No multi-hook preventContinuation test: added * perf(hooks): atomic once, bounded LRU, ReDoS heuristic, shared signal Rework the phase-1 hooks scaffolding for the multi-tenant scale we actually operate at. These are behavior changes (semantics of `once` tightened, new pattern-rejection path) but none of them are wired into the runtime yet — this PR is still inert — so the blast radius is the hooks directory only. ## Atomic `once: true` Removal now happens synchronously inside `executeHooks`, between `registry.getMatchers` and the first `await`. Node's event loop serialises sync work, so two concurrent `executeHooks` calls can never both observe the same `once` matcher — whichever call runs its sync prefix first consumes it, and the loser sees an empty bucket. Semantic change: `once` is now at-most-one-dispatch, not at-most-one-successful-execution. If every hook in a `once` matcher throws, the matcher is still gone. The old retry-on-failure behavior was Claude Code's model — fine for a single-user CLI, but in a multi-tenant server it opens a window where concurrent bursts can re-dispatch a "once" hook non-idempotently. "At most once, ever" is the correct semantic for a shared registry. Hosts that need retry should register a normal matcher and self-unregister via the callback returned from `registry.register`. Dropped `collectOnceMatchersForRemoval` + its Map/array allocations — the post-hoc collection step is no longer needed. Tests: replaced "keeps the matcher when every hook throws" with "removes the matcher even when every hook in it throws". Added "fires exactly once across 3 concurrent calls" and "fires exactly once across 8-way concurrent dispatch when hooks are slow" (the slow test deliberately uses a 10ms hook to force overlap). ## Bounded LRU regex cache `matchers.ts` was unbounded. Under multi-tenant use where different tenants register different patterns, the cache would leak. Capped at `MAX_CACHE_SIZE = 256` entries with LRU eviction: hits refresh position (delete-then-set), misses at capacity evict the oldest key. Failed compiles are cached the same way so a tenant spamming bad patterns doesn't burn CPU on every call. Exposed `getMatcherCacheSize` and `MAX_CACHE_SIZE` for test observability. Tests verify eviction order (cold patterns recompiled after overflow) and that hot patterns refreshed mid-stream survive eviction pressure. ## ReDoS heuristic: nested-quantifier detector `hasNestedQuantifier` walks a pattern linearly, tracks paren depth, and rejects any pattern where a quantified group contains another quantifier — the `(a+)+`, `(.*)*`, `(\w+)+` shape that is the #1 catastrophic-backtracking source. Character classes (`[*+?]`) and escaped quantifiers (`\+`) are correctly ignored. Rejection happens before `new RegExp`, so adversarial input never reaches the compiler. This is a floor, not a ceiling — it doesn't catch ambiguous-alternation ReDoS like `(a|a)+`. Hosts that accept user-supplied patterns must still validate upstream. Documented in the `HookMatcher.pattern` and `matchesQuery` JSDoc. Test verifies that `(a+)+$` on a 35-char adversarial input resolves in under 50ms (would be seconds without the heuristic). ## Shared `AbortSignal` per matcher Was: one `AbortSignal.timeout(ms)` + one `AbortSignal.any([parent, timeout])` per hook. For a matcher with N hooks and the same timeout, that's N timers + N composite signals. Now: one signal per matcher, shared across all its hooks. Since hooks in a matcher already share `matcher.timeout`, they may as well share the timer. Behavior is identical (all hooks fire and race the same deadline) but allocation drops from N to 1 per matcher. ## Tests 72 passing (was 59, +13): - Atomic once: 2 concurrent tests + 1 failure-removal test updated - LRU: cache size cap, eviction order, hit refresh - Nested quantifier: classic shapes, deep nesting, escaped chars, character classes, false positives on legitimate patterns - ReDoS mitigation: rejection path, <50ms response on adversarial input * fix(hooks): fix ReDoS heuristic false-positives on (?:...) groups Resolves the follow-up review on #87. Four findings, all valid. ## #1 MAJOR — hasNestedQuantifier false-positives on group syntax The previous scanner treated `?` as a quantifier at any depth > 0, but `?` immediately after `(` is group syntax, not a quantifier: `(?:...)`, `(?=...)`, `(?!...)`, `(?<name>...)`, `(?<=...)`, `(?<!...)`. Patterns like `(?:pre_)?tool_name` were silently rejected and the registering hook never fired — with no error surfaced to the caller. Fix: explicit group-syntax prefix skip inside the `(` handler. When the scanner enters a group it peeks for `?` and advances past the modifier characters (`:`, `=`, `!`, `<=`, `<!`, or a named-group label `<name>`) before processing the group body. The `?` is therefore never considered a quantifier at the start of a group. While in there I also fixed a second correctness bug the reviewer didn't catch: the old depth-indexed array didn't propagate a child group's "contains quantifier" signal up to its parent, so `(?:(a+))+` (outer quantifier over a wrapped quantified group, equivalent to `(a+)+`) escaped detection. The scanner now uses an explicit stack of `QuantifierFrame` records, and when a child group closes it propagates `hasBacktrackRisk` to the parent frame — whether the child itself was quantified or not. This catches `(?:(a+))+` correctly while still allowing non-risky wrappers like `(?:(ab))+`. Added 9 tests covering non-capturing groups, lookaheads, lookbehinds, named captures, `(?:(a+))+` risk propagation, and `((ab)+)+` deep wrapping. Verified existing ReDoS-detection tests still pass. ## #2 MINOR — executeHooks.test.ts did not clear the pattern cache Matcher cache is module-level. `matchers.test.ts` clears it in `beforeEach`; `executeHooks.test.ts` did not, so patterns compiled during one test persisted across subsequent tests in the same file. Added `clearMatcherCache()` to `executeHooks`'s `beforeEach` — no test failures today, but restores test independence. ## #3 MINOR — Test utilities leaked into production barrel `clearMatcherCache` and `getMatcherCacheSize` are test-only helpers (their JSDoc says so) but were exported from `src/hooks/index.ts`. When the integration PR eventually re-exports `src/hooks` from `src/index.ts` they would become public API. Removed both from the barrel. Tests already import them directly from `../matchers`, so no test changes needed. `hasNestedQuantifier`, `MAX_PATTERN_LENGTH`, and `MAX_CACHE_SIZE` remain exported — hosts can legitimately use them for upstream validation. ## #4 NIT — LRU refresh overhead at low cache pressure Every `compile()` cache hit was doing `delete + set` to refresh LRU position, even when the cache was nowhere near capacity. With `MAX_CACHE_SIZE = 256` and typical working sets of tens of patterns, eviction pressure is near-zero and the refresh is pure overhead. Added a `LRU_REFRESH_THRESHOLD = 75%` gate: refresh only runs when the cache is above 192 entries. Below that the code fast-paths straight back from `compile()`. Above it the LRU semantics kick in so hot patterns still survive evictions — the existing "refreshes LRU position on hit" test still passes because by the time it runs the check, the cache is at capacity. Tests: 72 -> 81 (+9 for group-syntax and risk-propagation coverage). * feat: wire run-level hooks into processStream (#88) * feat: wire run-level hooks into processStream (phase 1/2) Fires four hook lifecycle events from `Run.processStream`, making the inert hooks module from PR #87 live. Hosts pass a pre-constructed `HookRegistry` via `RunConfig.hooks`; when absent, all hook dispatch is skipped (zero overhead for non-adopters). ## Integration points - `RunConfig.hooks?: HookRegistry` — new optional field, mirrors the existing `customHandlers` pattern - `Graph.hookRegistry` — field alongside `handlerRegistry`, cleared in `clearHeavyState()` - `Run` constructor wires the registry onto both `this.hookRegistry` and `this.Graph.hookRegistry` ## Hook fire points in `processStream` 1. **RunStart** — after config setup, before stream creation. Receives `inputs.messages`, `runId`, `threadId`, `agentId`. 2. **UserPromptSubmit** — immediately after RunStart. Extracts prompt text from the last human message. If the hook returns `{decision: 'deny'}` or `{decision: 'ask'}`, the run aborts early and returns `undefined`. The `ask` case is an abort in v1 — the SDK returns the decision and the host decides how to implement interactive approval (see Phase 2 plan in the skills design report). 3. **Stop** — at the end of the try block, after the for-await loop. Receives the accumulated messages from `Graph.getRunMessages()`. Only fires on successful completion (not on error). 4. **StopFailure** — in a new catch block between try and finally. Receives the error message and last assistant message. Hook errors are swallowed (`.catch(() => {})`) so the original error propagates. ## Session teardown `hookRegistry.clearSession(runId)` is called in the finally block, guaranteeing cleanup even on error. Fulfills §3.9 rule #5 from the design report. ## Public API `src/hooks/` is now re-exported from `src/index.ts`. Hosts can import `HookRegistry`, `executeHooks`, and all hook types directly from `@librechat/agents`. ## Helper functions Three module-level functions added to `run.ts`: - `findLastHumanMessage` — backward scan for `getType() === 'human'` - `findLastAssistantMessage` — backward scan for `getType() === 'ai'` - `extractPromptText` — handles string and complex content arrays ## Tests 10 new integration tests using `Run.create` + `overrideTestModel` + `processStream` (same pattern as `custom-event-await.test.ts`): - RunStart fires with correct fields - UserPromptSubmit extracts prompt, deny aborts, ask aborts - Stop fires on success, does not fire on error - StopFailure fires on error, preserves original exception - Session cleared in finally (both success and error paths) - No-hooks baseline works identically Total: 81 existing + 10 integration = 91 passing. * fix(hooks): address PR 2 review — DRY helpers, test coverage, guards Resolves all findings from the comprehensive review on #88. - #1 MAJOR: Add 3 tests for `extractPromptText` — multi-part content (text + image blocks yields 'hello\nworld'), image-only content (yields ''), and empty content array (yields ''). - #2 MINOR: Merge `findLastHumanMessage`/`findLastAssistantMessage` into parameterized `findLastMessageOfType(messages, type)`. - #3 MINOR: Rewrite `extractPromptText` array path from `.filter().map().join()` triple-pass to single `for...of` loop accumulating into `parts[]` then `.join('\n')`. - #4 MINOR: Add `config.callbacks = undefined` on deny/ask early return path so Langfuse handler reference is cleaned up even though no stream ran. - #5 MINOR: Add comment on `UserPromptSubmit` fire point noting `attachments` is not yet wired (Phase 2). - #6 MINOR: Strengthen `StopFailure` test assertion from `toBeTruthy()` to `typeof === 'string'` + `length > 0`. - #7 MINOR: Add test for empty-content human message — verifies `UserPromptSubmit` fires with `prompt: ''`. - #8 NIT: Add inline comment on `stopHookActive: false` explaining it will be `true` in Phase 2 when stop is triggered by a hook. - #9 NIT: Use `hasHookFor('Stop', sessionId)` and `hasHookFor('StopFailure', sessionId)` guards before `getRunMessages()` to avoid the array copy when no hooks are registered for the event. - CL-2: Wrap Stop hook dispatch in its own `.catch()` so an infrastructure error in `executeHooks` (theoretical — it catches internally) cannot masquerade as a stream failure in the catch block. Tests: 91 -> 94 (+3 for extractPromptText coverage). * feat: wire tool hooks into event-driven ToolNode (phase 1/3) (#90) * feat: wire tool hooks into event-driven ToolNode (phase 1/3) Fires PreToolUse, PostToolUse, PostToolUseFailure, and PermissionDenied hooks inside `ToolNode.dispatchToolEvents` — the single chokepoint for all event-driven tool execution in LibreChat. ## Hook lifecycle in dispatchToolEvents 1. **PreToolUse** fires per call in parallel before ON_TOOL_EXECUTE dispatch. Denied calls (deny or ask) produce error ToolMessages and fire PermissionDenied; surviving calls proceed with optional updatedInput applied to tool args. 2. Only approved calls are batched into ON_TOOL_EXECUTE and sent to the host. If all calls are denied, the batch dispatch is skipped entirely. 3. **PostToolUse** fires per successful result. If a hook returns updatedOutput, the ToolMessage content is replaced before the message is appended to the graph. 4. **PostToolUseFailure** fires per error result. Observational only — cannot modify the error message. Post hooks are awaited (not fire-and-forget) so updatedOutput takes effect before the ToolMessage is consumed. Hook errors are swallowed via .catch() to avoid disrupting the tool result flow. PermissionDenied is fire-and-forget since it's purely observational. ## Plumbing - `ToolNodeOptions.hookRegistry?: HookRegistry` — new optional field - `ToolNode` constructor stores `this.hookRegistry` - `Graph.createToolNode()` passes `this.hookRegistry` at both construction sites (event-driven and traditional) - `hookRegistry` is set on Graph BEFORE `createWorkflow()` is called (moved from the post-create assignment in Run constructor into `createLegacyGraph` and `createMultiAgentGraph`) so ToolNode receives a non-undefined registry at construction time ## hasHookFor guards All four hook fire points use `hasHookFor(event, runId)` before calling `executeHooks`, so when no hooks are registered for the event the overhead is a single Map lookup + array length check — no Promise allocations, no function calls. ## Tests 9 new integration tests in `src/hooks/__tests__/toolHooks.test.ts` using event-driven mode (toolDefinitions + ON_TOOL_EXECUTE handler): - PreToolUse fires with correct fields - PreToolUse deny blocks execution (tool handler never called) - PreToolUse ask blocks execution (v1) - PreToolUse updatedInput rewrites args before dispatch to host - PreToolUse hook errors are non-fatal (tool still executes) - PermissionDenied fires after deny with reason - PostToolUse fires with tool output - PostToolUseFailure fires on error result - No-hooks baseline works identically Total: 94 existing + 9 new = 103 passing. * fix(hooks): address PR 3 review — step completion, ordering, catch, tests Resolves all 10 findings from the review on #90. ## Structural rewrite of dispatchToolEvents The post-processing loop was rewritten to fix #1, #2, #6, #8, #10: - #1 MAJOR: Denied calls now dispatch ON_RUN_STEP_COMPLETED via a shared `dispatchStepCompleted` helper (extracted from the duplicate logic in the executed-results loop). Without this, the frontend would show stuck spinners for denied tool calls. - #2 MAJOR: Output messages are now collected in a Map keyed by tool_call_id, then returned in the original `toolCalls` input order via `toolCalls.map(call => messageByCallId.get(call.id))`. Before, denied messages were prepended to executed messages, breaking the implicit ordering contract. - #6 MINOR: `toolUsageCount` is now incremented only for approved calls (moved from the initial `preToolCalls.map` into the `approvedEntries.map` that builds ToolCallRequests). Denied calls no longer inflate the turn counter. - #8 MINOR: Eliminated the double `result.status === 'error'` check. ToolMessage is now constructed inside each status branch directly after hook dispatch. - #10 NIT: `requests.find()` replaced with a pre-built `Map<id, ToolCallRequest>` for O(1) lookup per result, per AGENTS.md guidance on Map/Set over Array.find. ## Error protection - #3 MAJOR: PreToolUse `Promise.all` now wraps each `executeHooks` call in `.catch(() => emptyResult)` so an infrastructure error in the hooks subsystem cannot crash the entire tool batch. Matches the stated "hook errors are non-fatal" contract and is consistent with PostToolUse/PostToolUseFailure which already had `.catch()`. ## New tests - #4 MAJOR: Added `updatedOutput replaces the ToolMessage content` — registers a PostToolUse hook returning `{updatedOutput: 'REDACTED'}`, verifies the ToolMessage content in the graph's run messages is `'REDACTED'` (not the original output). - #7 MINOR: PermissionDenied test replaced 50ms `setTimeout` with a `Promise` resolved by the hook callback. No more flaky sleep. ## Cleanup - #9 NIT: Updated stale "once the tool-hook PR lands" comment in `src/hooks/index.ts`. Tests: 103 -> 104 (+1 for updatedOutput). * fix(hooks): off-by-one in dispatchStepCompleted index, test guard, turn fallback - #1 MAJOR: dispatchStepCompleted read post-increment toolUsageCount for the `index` field. Added `turn?: number` parameter; approved-call site passes `request?.turn` (the pre-increment value), denied-call site omits it (fallback is correct since count was never bumped). - #2 NIT: updatedOutput test now asserts `expect(toolMsg).toBeDefined()` before content extraction so a missing tool message produces a clear failure rather than "Expected undefined to be 'REDACTED'". - #3 NIT: PreToolUse input `turn` field restored `?? 0` fallback so first-use tools see `turn: 0` instead of `turn: undefined`. * fix(hooks): batch tests, turn JSDoc, freeze sentinel, remove unused hookRegistry Addresses follow-up review findings on #90. - #1 MAJOR: Add multi-call batch tests — partial deny (echo denied, math approved, order preserved) and all-denied (ON_TOOL_EXECUTE handler never called). Uses counter-based IDs (#7 fix) to avoid Date.now() collisions across calls in the same test. - #2 MINOR: Document PreToolUseHookInput.turn semantics — within a single batch, all calls to the same tool share the same turn value. Per-call discrimination within a batch is not supported in v1. - #3 MINOR: Remove hookRegistry from the traditional (non-event-driven) ToolNode construction site in Graph.ts. Hooks only fire in dispatchToolEvents; passing the registry to a ToolNode that never uses it created a false impression of hook support. - #4 MINOR: Add PostToolUse error non-fatality test — throwing hook does not crash the batch, original content is preserved. - #5+#8 MINOR: Freeze the hook-error fallback sentinel via Object.freeze and rename from emptyResult to HOOK_FALLBACK. Prevents future mutation bugs if someone pushes to the shared arrays. - #6 MINOR: Document in ToolNodeOptions.hookRegistry JSDoc that hooks only fire for event-driven calls — directToolNames bypass dispatch. Tests: 104 -> 107 (+3 for batch partial-deny, all-denied, and PostToolUse error resilience). * nit: simplify test assertions, keep shallow freeze (deep conflicts with mutable type) * feat: SkillTool primitive, injectedMessages, and catalog formatter (#91) Phases 1+2 of the Skills system, rebased on hooks PRs (#88, #90): Types: - InjectedMessage type in tools.ts (generic, any tool can use it) - SkillCatalogEntry type in skill.ts - ToolExecuteResult.injectedMessages field - Constants.SKILL_TOOL = 'skill' SkillTool (src/tools/SkillTool.ts): - JSON Schema (single source of truth), LCTool definition, createSkillTool() - Runtime-agnostic description (files "may" become accessible) - Requires event-driven execution mode ToolNode integration (src/tools/ToolNode.ts): - convertInjectedMessages() converts to HumanMessage (both roles, avoids Anthropic/Google SystemMessage rejection). Original role in additional_kwargs. - dispatchToolEvents returns { toolMessages, injected } tuple - Injected messages placed AFTER ToolMessages (provider ordering) - try/catch isolation around injection conversion - Works with hooks-based PreToolUse/PostToolUse lifecycle Catalog formatter (src/tools/skillCatalog.ts): - formatSkillCatalog() with truncation ladder and budget enforcement - fitNamesOnly drops trailing entries when names-only exceeds budget Tests: 30 total (18 SkillTool + 12 skillCatalog) * refactor: remove unused Zod schema and createSkillTool function from SkillTool.ts * feat: add bash execution and bash programmatic tool calling tools (#93) * feat: add local bash execution and bash programmatic tool calling tools Implement BashExecutor for local bash command execution via child_process and BashProgrammaticToolCalling for multi-tool orchestration using bash scripts with a local HTTP server for tool IPC via curl. * fix: use remote Code API for bash tools instead of local child_process BashExecutor now sends `lang: 'bash'` to the same /exec endpoint as CodeExecutor. BashProgrammaticToolCalling sends bash code to /exec/programmatic with the same round-trip loop as PTC. Both tools share session/file handling with their counterparts. ToolNode session injection and storage updated for all four code execution tool names. * refactor: rename EXECUTE_BASH constant to BASH_TOOL for consistency Updated the constant name from EXECUTE_BASH to BASH_TOOL in the Constants enum and adjusted references in BashExecutor and ToolNode to reflect this change. This improves clarity and aligns with naming conventions. * feat: add SKILL_TOOL to session storage and context in ToolNode (#94) When the skill handler uploads files to the code env and returns artifact: { session_id, files }, ToolNode now stores that session context (via storeCodeSessionFromResults). This makes skill-primed files available to subsequent bash/code tool calls in the same run. Also adds SKILL_TOOL to the codeSessionContext request building in dispatchToolEvents, so the handler receives the current session context and can check if files are already uploaded. * feat: initialSessions on RunConfig + CODE_EXECUTION_TOOLS DRY helper (#95) * feat: add initialSessions to RunConfig for seeding Graph session state Allows hosts to seed the Graph's ToolSessionMap at run creation time. Used by skill file re-priming: after uploading skill files to the code env at run start, the host passes the session info so ToolNode can inject session_id + files into subsequent bash/code tool calls. * refactor: add CODE_EXECUTION_TOOLS set, DRY ToolNode checks Add CODE_EXECUTION_TOOLS ReadonlySet to common/enum.ts containing all four code execution tool constants (EXECUTE_CODE, BASH_TOOL, PROGRAMMATIC_TOOL_CALLING, BASH_PROGRAMMATIC_TOOL_CALLING). Replace 5 repeated 4-constant inline checks in ToolNode with CODE_EXECUTION_TOOLS.has(). SKILL_TOOL remains a separate check where needed since it's not a code execution tool itself. * chore: update version to 3.1.66-dev.0 and refresh package-lock.json * feat: reconstruct skill body HumanMessages in formatAgentMessages (#96) When a skill is invoked, the body is injected as a HumanMessage into LangGraph state but NOT persisted to conversation history. On follow-up runs the skill body is lost. formatAgentMessages now accepts an optional invokedSkillBodies Map (skillName → body) and reconstructs HumanMessages at the correct position in the message sequence — after the skill's ToolMessage, matching where ToolNode originally injected them. Detection happens inside the existing tool_call processing loop (both with and without tools filtering), so there are zero extra message iteration passes. * fix: skill body reconstruction token distribution + review fixes (#97) Critical fix: - Capture endMessageIndex BEFORE skill body HumanMessage injection so injected messages are excluded from the assistant message's token distribution range. Prevents indexTokenCountMap corruption. Review fixes: - Remove dead else-if(!discoveredTools) branch inside if(discoveredTools) block and revert redundant optional chaining - Extract extractSkillName() helper to eliminate DRY violation (identical args parsing IIFE was duplicated in two locations) - Use Set<string> instead of string[] for pendingSkillNames (dedup) - Lazy allocation via ??= (only allocated when skills are present) * chore: eslint * chore: remove `createSkillTool` * fix: resolve review findings for hooks & skills PR (#100) * fix: resolve review findings for hooks & skills PR - Fix stale JSDoc referencing non-existent future PR (run.ts) - Use requestMap (Map.get) instead of Array.find in storeCodeSessionFromResults - Extract shared updateCodeSession helper to eliminate duplicated session storage logic - Add sync critical section markers around once-matcher removal in executeHooks - Optimize fitNamesOnly from O(n²) to O(n) with running character sum - Widen ReDoS timing assertion from 50ms to 200ms for CI stability - Add test verifying ON_RUN_STEP_COMPLETED dispatch for denied tool calls * test: assert event shape in deny step completion test * feat: add ReadFile tool definition for general-purpose file reading (#99) New READ_FILE constant and ReadFileToolDefinition for a general-purpose file reader. Schema: { file_path: string }. Skill files use the path format {skillName}/{path} (e.g. "pdf-analyzer/src/utils.py"). The tool is event-driven only (host handles via ON_TOOL_EXECUTE). Works without code execution for skill files (DB/storage reads). With code env, also supports code execution output files. * chore: update ToolNode session tests for requestMap signature change (#101) * chore: add --tag dev for prerelease npm publish (#102) * feat: wire PreCompact/PostCompact hooks around summarization node (#103) * feat: wire PreCompact/PostCompact hooks around summarization node Fires PreCompact and PostCompact hooks inside `createSummarizeNode`, the last two SDK-side hook events. These are auditing hooks — hosts can archive the full transcript before compaction, observe the summary produced, or log compaction events for telemetry. ## Fire points - **PreCompact** fires after ON_SUMMARIZE_START, before the LLM summarization call. Receives `messagesBeforeCount` and the trigger type from the agent's summarization config. - **PostCompact** fires after ON_SUMMARIZE_COMPLETE, after the summary is generated and enriched. Receives the summary text and `messagesAfterCount: 0` (the node removes all messages; the summary is injected into the system prompt, not as a message). Both hooks are observational and non-blocking — errors are swallowed via `.catch()`. Results are not consumed. ## Type fix `PreCompactHookInput.trigger` was defined as `'threshold' | 'manual' | 'error'` (from the initial PR #87) but those don't match the real summarization trigger types. Updated to: `'token_ratio' | 'remaining_tokens' | 'messages_to_refine' | 'default'` with `(string & {})` for forward compatibility. `'default'` covers the case where no trigger is configured and compaction fires on any pruning. ## Plumbing `hookRegistry` added to `CreateSummarizeNodeParams.graph` interface. `Graph.ts` passes `this.hookRegistry` at the `createSummarizeNode` call site, same pattern as ToolNode. ## Tests 4 new integration tests using `FakeListChatModel` for the summarization provider (no API keys required): - PreCompact fires with correct messagesBeforeCount and trigger - PostCompact fires with summary text - Hook errors don't crash compaction - No-hooks baseline works identically Total: 108 existing + 4 new = 112 passing. * fix(hooks): add threadId to compaction hooks, strengthen tests, fix imports Resolves review findings on #103. - #1 MAJOR: Add threadId to both PreCompact and PostCompact hook inputs. PreCompact extracts from the node's config parameter, PostCompact from runnableConfig. Restores consistency with all other hook sites in run.ts. - #2 MINOR: Assert trigger field in PreCompact test — verifies the 'default' fallback when no trigger is configured. - #3 MINOR: Strengthen PostCompact summary assertion from length > 0 to toContain('Summary') against the known FakeListChatModel response. - #4 MINOR: Add PostCompact error resilience test — throwing hook in dispatchCompletionEvents does not crash compaction. - #5 MINOR: Fix import order in node.ts per AGENTS.md (type imports longest→shortest, value imports longest→shortest). - #6 NIT: Update hooks/index.ts barrel comment to list createSummarizeNode as third consumer. - #8 NIT: Assert agentId in both PreCompact and PostCompact tests. Tests: 112 -> 113 (+1 PostCompact error resilience). * test: assert threadId propagation and exact summary match in compaction hooks * fix: use runnableConfig for PreCompact threadId extraction (matches PostCompact) * feat: add subagent-as-tool primitive for hierarchical agent delegation (#104) * feat: add subagent-as-tool primitive for hierarchical agent delegation Adds a SubagentTool that lets a parent agent delegate tasks to child agents with isolated context windows. Verbose tool outputs stay in the child's context; only a filtered text summary returns to the parent. Key components: - SubagentConfig type on AgentInputs for declaring child agent types - SubagentExecutor: creates child StandardGraph, invokes with isolated state, filters result (strips tool_use/thinking blocks), fires SubagentStart/SubagentStop hooks - SubagentTool: LCTool definition with dynamic enum from configs - Graph integration: tool created in createAgentNode() via graphTools, automatically becomes a direct tool (bypasses event-driven dispatch) - Self-spawn support: reuse parent's AgentInputs for context isolation without separate agent config Wires the existing SubagentStart (with deny support) and SubagentStop hook types that were defined but had zero fire points. * fix: resolve review findings for subagent primitive Address all 6 issues from PR review: 1. Add subagentHooks.test.ts — end-to-end hook integration test through real Run pipeline (SubagentStart payload, SubagentStop messages, deny blocks execution). 3 new tests. 2. Remove `as unknown as` cast — follow MultiAgentGraph pattern (init graphTools as empty array, push directly). 3. Deduplicate schema — extract buildSubagentToolParams() in SubagentTool.ts, use it in Graph.ts. createSubagentToolDefinition() now delegates to it. Single source of truth for schema/description. 4. Propagate threadId — tool callback extracts thread_id from LangGraph config.configurable, passes to executor.execute(). Hooks now receive correct threadId. 5. Pass tokenCounter to child graph — added to SubagentExecutorOptions, threaded from agentContext.tokenCounter through to child StandardGraph constructor. 6. Pass signal in invoke() config — workflow.invoke() now receives signal for more responsive abort between graph steps. * fix: resolve second-pass audit findings for subagent primitive Address 12 valid findings from the audit — 3 MAJOR, 6 MINOR, 3 NIT: MAJOR fixes: 1. Depth tracking now works across graph boundaries. Previously the `depth`/`maxDepth` check only worked within a single executor; when `allowNested: true`, the child graph spawned its own executor at depth=0, making `maxSubagentDepth` non-functional. Fix: decrement `maxSubagentDepth` in `buildChildInputs`, so the child's own executor inherits a smaller budget and naturally blocks further nesting when it reaches 0. Removed the now-vestigial `depth` field. 2. Detect duplicate `SubagentConfig.type` values. `Map` construction silently dropped duplicate keys. Now `resolveSubagentConfigs` throws with a clear error. 3. Add test coverage for `allowNested: true`: verifies depth is decremented across boundaries, child inherits subagentConfigs, and budget is clamped to >= 0. MINOR fixes: 4. Runtime guard for empty `description` in the tool callback — the LLM can send undefined/empty values; now we default to a safe fallback string. 5. Updated `src/hooks/index.ts` barrel comment to list SubagentExecutor as the fourth hook consumer (SubagentStart/SubagentStop). 6. Extracted property description strings as constants in SubagentTool.ts; `SubagentToolSchema` and `buildSubagentToolParams` now share a single source of truth. 7. Changed SubagentStop from fire-and-forget to awaited, matching the PostCompact pattern. Removes fragile `setTimeout` synchronization in tests. Errors still swallowed (observational). 8. Truncate error messages from child graph to 200 chars to prevent leaking internal details (stack traces, API errors) to the parent LLM. 9. Document that `ask` decision is treated identically to `deny` in the non-interactive subagent context. NIT fixes: 11. Import order in SubagentExecutor.ts — `@/types` (multi-line, longest) now comes first in the local types sub-group. 12. Removed unused `contentParts` destructuring in verification script. 13. Fixed unsafe `msg.content as string` cast in verification script; now handles array content correctly. Skipped #10 (NIT): `_sourceInputs` underscore prefix is an accepted TS convention for cross-file-accessible-but-semantically-internal fields. Test count: 1074 passed (up from 1065, +9 new tests), 0 regressions. * fix: address third-pass audit findings for subagent primitive Resolve 3 NITs from the second review pass plus 2 new codex comments: NIT #1 — Tighten truncation test: previously asserted only `result.content.length < 500`. Now asserts exact envelope of 219 chars ("Subagent error: " prefix + 200 body + "..." suffix) so that regressions in ERROR_MESSAGE_MAX_CHARS are caught. Added companion test verifying short messages are not truncated. NIT #2 — Skip tool creation when maxSubagentDepth: 0. When a host set `maxSubagentDepth: 0` alongside `subagentConfigs`, the tool was still bound to the LLM, and every invocation returned the depth- exceeded error. The LLM wasted a turn. Now the `createAgentNode` guard includes `effectiveSubagentDepth > 0`, so the tool is not created at the leaf level. Added integration test. NIT #3 — Document buildChildInputs audience. Added @remarks block explaining this is an advanced utility exported primarily for testing and internal use; hosts configuring subagents should not call it directly. Codex P1 (false positive, clarified): maxSubagentDepth propagation across graph boundaries was already correct via the countdown in buildChildInputs, but it wasn't obvious from the call site. Added a multi-line JSDoc comment above the executor construction in createAgentNode explaining how depth consumes across boundaries through the decremented AgentInputs.maxSubagentDepth. Codex P2 (real bug, fixed): toolSchemaTokens did not include the subagent tool's schema. `calculateInstructionTokens()` was called in `fromConfig()` before graphTools was populated, so the later- injected subagent tool's schema was missing from the budget. Fixes: 1. Include `graphTools` in the iteration in `calculateInstructionTokens` (also helps handoff tools, a pre-existing issue with the same root cause). 2. Re-trigger the calculation at the end of `createAgentNode` after the subagent tool is pushed, so the token count reflects the final graphTools state before `createCallModel` awaits `tokenCalculationPromise`. Added test asserting `toolSchemaTokens` is larger for agents with subagents vs without, confirming the recalculation runs. Test count: 1077 passed (up from 1074, +3 new tests), 0 regressions. * fix: break circular dep between SubagentExecutor and StandardGraph Rollup warned on build: Graph.ts → subagent/index.ts → SubagentExecutor.ts → Graph.ts SubagentExecutor needed `StandardGraph` at runtime to construct the child graph; Graph.ts needs SubagentExecutor to attach the tool. With preserveModules, these land in separate chunks producing circular chunk imports with undefined execution order. Fix: dependency injection. SubagentExecutor now takes a required `createChildGraph: ChildGraphFactory` option and only imports `StandardGraph` as `import type` (erased at build). Graph.ts passes `(input) => new StandardGraph(input)` when constructing the executor. Runtime module graph is now acyclic; `import type` does not appear in emitted JS. Test updates: replaced `jest.spyOn(GraphModule, 'StandardGraph')` (which relied on the runtime import we just removed) with direct mock factories passed through the new option. Cleaner — tests no longer need to reach into module internals. Same coverage. Build: `npx rollup -c` now emits zero warnings. Test count: 1077 passed, 0 regressions. * chore: add multi-agent-subagent script to package.json Introduced a new script entry for multi-agent-subagent, enabling execution of the corresponding TypeScript file with necessary configurations. This addition enhances the multi-agent functionality by allowing for more granular control over subagent operations. * fix: isolate child subagent from parent's callback chain The manual verification script (`npm run multi-agent-subagent`) crashed with: Error: No agent context found for agent ID researcher at StandardGraph.getAgentContext at ChatModelStreamHandler.handle Root cause: `Run.processStream` drives the parent via `streamEvents`, which captures events from every nested runnable in the call tree — including the child graph's LLM calls. When the child's "researcher" agent streamed `on_chat_model_stream` events, the parent's `ChatModelStreamHandler` received them and tried to resolve the child's agent ID in the parent's `agentContexts` map, which only contains the supervisor. Error thrown. The Phase 1 plan was for child events to stay fully isolated. Unit tests missed this because the child graph was always mocked (no real `streamEvents` plumbing); only running the full pipeline end-to-end surfaced it. Fix: pass `callbacks: []` in the child `workflow.invoke()` config. This overrides the inherited callbacks for that invocation, breaking the event propagation chain. Also set `runName: subagent:<type>` and a child-scoped `configurable.thread_id` so the child has a distinct trace root rather than polluting the parent's trace. Manual verification now works: supervisor delegates to researcher, receives filtered text, synthesizes the final answer. Test suite: 1079 passed, 0 subagent-suite regressions. The two newly failing suites (ProgrammaticToolCalling / ToolSearch Live API tests) are pre-existing — they require `LIBRECHAT_CODE_BASEURL` to be configured and fail identically on the prior commit. They were skipped before because OPENAI_API_KEY was absent. * feat: align filterSubagentResult fallback with Claude Code When a subagent's last AIMessage is pure tool_use (e.g. the model hit maxTurns mid-tool-call), the previous implementation returned the "Task completed" sentinel, discarding any textual progress from earlier turns. Claude Code's agentToolUtils.finalizeAgentTool walks backwards from the last message, and on messages without text content continues scanning earlier AIMessages, surfacing the most recent textual thought instead of the sentinel. This change matches that behavior. The three `return` sites inside the loop that previously exited with the sentinel now `continue` when the AIMessage yields no text; the loop exits with "Task completed" only when no AIMessage in the history contains any text. Concrete example: [0] Human: "What's the capital of France?" [1] AI: "Let me search." + tool_use(search, ...) [2] Tool: "Paris." [3] AI: tool_use(search, ...) <-- maxTurns, no text Before: "Task completed" After: "Let me search." Added two tests — the turn-limit scenario above, and the analogous empty-string-content edge case. All nine existing filterSubagentResult tests still pass unchanged. * fix: update model name for non-Anthropic API to gpt-5.4 Changed the model name used in the multi-agent subagent script from 'gpt-4.1' to 'gpt-5.4' for improved performance and capabilities. This update ensures compatibility with the latest API offerings. * v3.1.67-dev.0 * feat: forward subagent child events and enable event-driven tools (#107) * feat: forward subagent child events and enable event-driven tools Adds an opt-in event forwarding path for `SubagentExecutor` so host apps that rely on event-driven tool dispatch (`toolDefinitions` + `ON_TOOL_EXECUTE` handler) can use subagents — including self-spawn. Before: a subagent in event-driven mode got `toolDefinitions` stripped in `buildChildInputs` and ran with `callbacks: []`, so the LLM had no tool schemas and ON_TOOL_EXECUTE never reached the host. The subagent replied in prose without invoking any tools. Now: - `SubagentExecutor` accepts `parentHandlerRegistry` (a direct registry or a lazy getter — `Run` wires the registry onto the graph AFTER `createWorkflow`, so `createAgentNode` must capture lazily). - When provided, `buildChildInputs` keeps `toolDefinitions`, and the child's invoke attaches a `BaseCallbackHandler` forwarder that: - routes ON_TOOL_EXECUTE to the parent's handler (event-driven tools work), and - wraps other custom events (run_step, run_step_delta, run_step_completed, message_delta, reasoning_delta) in a new `ON_SUBAGENT_UPDATE` envelope so hosts can render child progress in a separate UI surface. - New `SubagentUpdateEvent` type and `ON_SUBAGENT_UPDATE` GraphEvents entry; start/stop/error envelopes fire around the child invoke. - Legacy isolation is preserved when no registry is passed. Test plan: - `npx jest SubagentExecutor SubagentTool` — 36 pass (4 new covering forwarding, toolDefinition retention, legacy strip, lazy getter). - Integration test against OpenAI (`subagent-event-driven-debug.ts`) confirms the child uses `calculator` via the parent's ON_TOOL_EXECUTE path and the host sees start/run_step/run_step_completed/stop ON_SUBAGENT_UPDATE envelopes. * fix: gate child toolDefinitions on ON_TOOL_EXECUTE handler + propagate tool_call_id Two follow-ups from review on #107: 1. Codex P1: `SubagentExecutor` was treating any non-null parent registry as "forwarding enabled" and always kept `toolDefinitions` on the child. Because `Run.create` always constructs a `HandlerRegistry`, this meant configurations with tools declared but no `ON_TOOL_EXECUTE` handler would hang forever — the child's `ToolNode` would dispatch a batch request and the forwarder would find no handler to resolve or reject it, leaving the promise pending. Now `keepToolDefinitions` is gated on `parentRegistry.getHandler(ON_TOOL_EXECUTE) != null`. A registry without the handler falls back to the legacy "strip toolDefinitions" path, which is a recoverable no-tools state instead of a hang. 2. `SubagentUpdateEvent` now carries `parentToolCallId`. `Graph.ts` pulls it from `config.toolCall.id` (LangChain threads the originating `ToolCall` onto `RunnableConfig`) when the subagent tool is invoked. This lets hosts correlate subagent updates to the parent tool call deterministically instead of inferring by event ordering — a concern raised on the LibreChat PR using this event stream. Also bumps to `3.1.67-dev.2`. Tests: `npx jest SubagentExecutor` — 38 pass (2 new covering the handler gate and `parentToolCallId` propagation). Integration script verified against real OpenAI: every ON_SUBAGENT_UPDATE envelope now carries the parent's `tool_call_id`. * chore: address audit findings — drop unused param, tighten cast, add tests Addresses the comprehensive audit of #107: ### MAJOR #1 — ON_TOOL_EXECUTE forwarding coverage Two new tests in the `event forwarding` suite drive the forwarder callback directly with a synthesized `ToolExecuteBatchRequest`: - `routes child ON_TOOL_EXECUTE dispatches through the parent registry` asserts the parent's `ON_TOOL_EXECUTE` handler is invoked, the batch's `resolve` fires with canned results, and the child receives them. - `does NOT forward ON_TOOL_EXECUTE when the parent registry has no handler (safe fallback)` documents the companion case: without a handler the `resolve` never fires, which is why the executor gates `keepToolDefinitions` on handler presence in the first place. ### MINOR #2 — unused `childGraph` parameter Removed from `createForwarderCallback`'s arg shape, destructuring, and call site. It was accepted as `_childGraph` and never read. ### MINOR #3 — `data as never` `EventHandler.handle` didn't include `ToolExecuteBatchRequest` in its accepted data union, which is why the forwarder used `as never` to shut the compiler up. Extended the union and switched the cast to the concrete `ToolExecuteBatchRequest` type — the forwarder is now fully type-checked on the hot path. ### MINOR #4 — error-phase envelope test `emits an 'error' phase envelope when the child graph throws` uses the throwing factory helper already in the file, registers a real `ON_SUBAGENT_UPDATE` handler, and asserts start/error ordering plus the `data.message` payload and `parentToolCallId` passthrough. ### MINOR #6 — dynamic imports in tests Replaced `await import('@/events')` / `await import('@/common')` with static top-level imports. Consistent with the rest of the file. ### MINOR #7 — `summarizeEvent` unit tests Exported the pure function from `@/tools/subagent` and added 10 focused tests covering every branch: tool_calls with one/many names, empty tool_calls, message_creation, ON_TOOL_EXECUTE with/without names, completed steps with/without a tool name, message_delta, and the unknown-event fallback. Every user-visible label is now regression protected. ### MINOR #8 — `config.toolCall` comment Expanded the JSDoc to note the specific LangChain source (`ToolRunnableConfig` in `@langchain/core/tools`, stable since 0.3.x) and that the defensive read degrades to `undefined` on upstream changes. ### NIT #9 — unused import Removed `createContentAggregator` from `subagent-tools-debug.ts`. ### NIT #10 — `awaitHandlers = true` doc Added a JSDoc block explaining the trade-off: required for `ON_TOOL_EXECUTE` to actually block on the host's resolve, but it also serializes observational events through any slow host handlers. Refinement path noted for later if host-side latency becomes a problem. Version bumped to `3.1.67-dev.3`. Tests: 51 pass (40 SubagentExecutor + 10 new summarizeEvent + 1 new error-phase + existing coverage retained). * fix: clear parent run summary and discoveredTools from child inputs Paired with Codex P1 on the LibreChat PR (danny-avila/LibreChat#12725) that uses this event stream. `buildChildInputs` shallow-spreads the parent's `AgentInputs`, which carries two run-scoped fields that leak into the isolated child by default: - `initialSummary` — cross-run conversation summary. On a conversation that has already been summarized, every child would inherit it and reason against unrelated prior chat. - `discoveredTools` — tool names the parent's LLM searched for in earlier turns via `tool_search`. The child gets primed to use whatever the parent happened to find. Both are cleared on the returned child inputs. Configuration fields (summarization policy, provider options, context pruning config) are preserved — those are policy, not context. Affects both paths: - Self-spawn: `resolveSubagentConfigs` clones parent `_sourceInputs`, then `buildChildInputs` strips. - Explicit children: host-built `AgentInputs` goes through the same `buildChildInputs` before the child graph runs. Version bumped to `3.1.67-dev.4`. Regression test: `strips parent-run-scoped initialSummary and discoveredTools from child inputs` — seeds parent inputs with `initialSummary: { text, tokenCount }` and `discoveredTools: [...]`, asserts both are `undefined` on the returned child. 52 pass. * test: cover summarizeEvent step.type fallback branches Addresses reviewer F2.1 (NIT) on #107. `summarizeEvent` resolves run-step detail type via `step.stepDetails?.type ?? step.type ?? 'step'`. The existing tests all passed `{ stepDetails: { type } }`, leaving the second and third clauses of the chain uncovered. Two new tests: - top-level `step.type` (no `stepDetails` wrapper) for both `tool_calls` and `message_creation` — exercises the second clause - empty `{}` payload — exercises the `'step'` default and the generic `Step: <detailType>` branch Carried MINOR #5 (`SubagentUpdateEvent.data: unknown`) left as a documented deferral; the JSDoc already notes "shape depends on phase" and a discriminated union is a larger refactor. 54/54 pass. * chore: add npm run scripts for subagent debug scripts - `npm run subagent` — supervisor + researcher/coder subagent demo (no tool use; from #104) - `npm run subagent:events` — event-driven flow matching LibreChat's architecture (toolDefinitions + ON_TOOL_EXECUTE handler + self-spawn with calculator). This is the script that demonstrates the #107 fix. - `npm run subagent:tools` — traditional LangChain tool instances with self-spawn; pre-fix parity case. All three accept `OPENAI_API_KEY` (and `--provider anthropic` for the first). * fix(summarization): catch model init errors and log them Move `initializeModel` inside the try/catch in `executeSummarizationWithFallback` so that errors like `Unsupported LLM provider: <name>` are surfaced through the `log('error', ...)` path and fall through to the metadata-stub fallback, instead of bubbling up silently. This is the defense-in-depth half of the fix for the LibreChat Discussion #12614 report: when a caller forwards an unrecognized summarization provider name (e.g. a custom-endpoint label), the error is now observable in operator logs rather than only the client UI.

* feat: add inert hook registry and executor scaffolding (phase 1/1) Introduces `src/hooks/` with types, registry, and executor for a 12-event hook lifecycle. Purely additive — not exported from `src/index.ts` and not yet wired into `Run`, `Graph`, or `ToolNode`. Integration lands in the next PR so this change is shippable independently with zero behavioral impact on hosts that don't opt in. - Discriminated-union input/output per event; generic `HookCallback<E>` and `HookMatcher<E>` for type-safe registration. - `HookRegistry` uses `Map<sessionId, bucket>` (not `Record`) to avoid O(n^2) churn under parallel registration in multi-tenant hosts. - `executeHooks` fans out in parallel, races hooks against a combined parent + timeout `AbortSignal` so non-listening hooks still time out, folds decisions with `deny > ask > allow` precedence, accumulates `additionalContext`, and self-removes `once: true` matchers after any successful hook. Errors are non-fatal: swallowed into the aggregated result and routed through an optional winston logger (falling back to `console.warn`), with internal matcher errors suppressed. 47 tests under `src/hooks/__tests__/` cover registry scoping, regex matching, precedence folding, once-self-removal (success/failure/mixed), timeout enforcement (including non-listening hooks), error non-fatality, and parent signal combination. * fix(hooks): address review findings — rename field, wire updatedOutput, cache patterns Resolves the comprehensive review on #87: - Rename `HookMatcher.matcher` -> `HookMatcher.pattern` to fix the self-referential naming (`m.matcher` reads as "the matcher's matcher"). - Wire `PostToolUseHookOutput.updatedOutput` through `AggregatedHookResult` and `fold()`. The type was previously a promise the executor didn't fulfill — a hook returning `{ updatedOutput: ... }` was silently dropped. - Correct the JSDoc on `updatedInput`/`updatedOutput`. The previous copy claimed non-determinism "in Promise.all resolution order" — but `Promise.all` preserves input order, so the fold is deterministic in registration order (outer loop over matchers, inner loop over each matcher's hooks). Updated the `executeHooks` function docstring to match. - Add regex compilation cache and length bound to `matchers.ts`. Patterns compile at most once per unique string and are reused across calls; patterns over MAX_PATTERN_LENGTH (512) are rejected outright as a cheap ReDoS mitigation. Documented the trust model (patterns are trusted input; hosts must validate user-supplied patterns upstream). - Document the wildcard-only semantics on query-less events: a non-empty pattern on an event that doesn't supply a matchQuery (RunStart, Stop, etc.) never matches. - Document the `once: true` concurrent-dispatch limitation: two parallel `executeHooks` calls each snapshot the matcher list, so `once` means "once per call", not "once globally". Matches Claude Code's semantics. Added a test that pins this behavior. - Merge `matchers.filter(...)` + task-build loop into a single pass per AGENTS.md "consolidate sequential O(n) operations." - Scope `eslint-disable no-console` to the single `console.warn` call in `reportErrors` instead of disabling it file-wide. - Fix the timeout-ignoring-hook test: clear the dangling `setTimeout` and assert the error surfaces an abort-shaped label. - Add tests: multi-hook `preventContinuation` (first-writer-wins on stopReason, flag without reason), `updatedOutput` flow + registration order, registration-order determinism for `updatedInput`, pattern length bound, compilation cache hit/miss/clear, concurrent `once` dispatch. Total: 47 -> 59 tests. Findings audited and resolved: - #1 ReDoS: cache + length bound + trust model doc (full fix blocked on host-side validation) - #2 Wrong updatedInput ordering JSDoc: fixed - #3 Dead `updatedOutput` type: wired through - #4 Concurrent `once` race: documented + test - #5 `HookMatcher.matcher` naming: renamed to `pattern` - #6 `WideCallback` dead code: rejected as inaccurate (used in `runHook`) - #7 Eslint-disable scope: line-scoped - #8 File-path comments: rejected; matches existing project convention - #9 Two-pass filter: single pass - #10 Regex recompilation: cached - CL-4 Undocumented wildcard semantics: documented - RT-3 Timeout test dangling timer: fixed + error content verified - RT-4 No multi-hook preventContinuation test: added * perf(hooks): atomic once, bounded LRU, ReDoS heuristic, shared signal Rework the phase-1 hooks scaffolding for the multi-tenant scale we actually operate at. These are behavior changes (semantics of `once` tightened, new pattern-rejection path) but none of them are wired into the runtime yet — this PR is still inert — so the blast radius is the hooks directory only. ## Atomic `once: true` Removal now happens synchronously inside `executeHooks`, between `registry.getMatchers` and the first `await`. Node's event loop serialises sync work, so two concurrent `executeHooks` calls can never both observe the same `once` matcher — whichever call runs its sync prefix first consumes it, and the loser sees an empty bucket. Semantic change: `once` is now at-most-one-dispatch, not at-most-one-successful-execution. If every hook in a `once` matcher throws, the matcher is still gone. The old retry-on-failure behavior was Claude Code's model — fine for a single-user CLI, but in a multi-tenant server it opens a window where concurrent bursts can re-dispatch a "once" hook non-idempotently. "At most once, ever" is the correct semantic for a shared registry. Hosts that need retry should register a normal matcher and self-unregister via the callback returned from `registry.register`. Dropped `collectOnceMatchersForRemoval` + its Map/array allocations — the post-hoc collection step is no longer needed. Tests: replaced "keeps the matcher when every hook throws" with "removes the matcher even when every hook in it throws". Added "fires exactly once across 3 concurrent calls" and "fires exactly once across 8-way concurrent dispatch when hooks are slow" (the slow test deliberately uses a 10ms hook to force overlap). ## Bounded LRU regex cache `matchers.ts` was unbounded. Under multi-tenant use where different tenants register different patterns, the cache would leak. Capped at `MAX_CACHE_SIZE = 256` entries with LRU eviction: hits refresh position (delete-then-set), misses at capacity evict the oldest key. Failed compiles are cached the same way so a tenant spamming bad patterns doesn't burn CPU on every call. Exposed `getMatcherCacheSize` and `MAX_CACHE_SIZE` for test observability. Tests verify eviction order (cold patterns recompiled after overflow) and that hot patterns refreshed mid-stream survive eviction pressure. ## ReDoS heuristic: nested-quantifier detector `hasNestedQuantifier` walks a pattern linearly, tracks paren depth, and rejects any pattern where a quantified group contains another quantifier — the `(a+)+`, `(.*)*`, `(\w+)+` shape that is the #1 catastrophic-backtracking source. Character classes (`[*+?]`) and escaped quantifiers (`\+`) are correctly ignored. Rejection happens before `new RegExp`, so adversarial input never reaches the compiler. This is a floor, not a ceiling — it doesn't catch ambiguous-alternation ReDoS like `(a|a)+`. Hosts that accept user-supplied patterns must still validate upstream. Documented in the `HookMatcher.pattern` and `matchesQuery` JSDoc. Test verifies that `(a+)+$` on a 35-char adversarial input resolves in under 50ms (would be seconds without the heuristic). ## Shared `AbortSignal` per matcher Was: one `AbortSignal.timeout(ms)` + one `AbortSignal.any([parent, timeout])` per hook. For a matcher with N hooks and the same timeout, that's N timers + N composite signals. Now: one signal per matcher, shared across all its hooks. Since hooks in a matcher already share `matcher.timeout`, they may as well share the timer. Behavior is identical (all hooks fire and race the same deadline) but allocation drops from N to 1 per matcher. ## Tests 72 passing (was 59, +13): - Atomic once: 2 concurrent tests + 1 failure-removal test updated - LRU: cache size cap, eviction order, hit refresh - Nested quantifier: classic shapes, deep nesting, escaped chars, character classes, false positives on legitimate patterns - ReDoS mitigation: rejection path, <50ms response on adversarial input * fix(hooks): fix ReDoS heuristic false-positives on (?:...) groups Resolves the follow-up review on #87. Four findings, all valid. ## #1 MAJOR — hasNestedQuantifier false-positives on group syntax The previous scanner treated `?` as a quantifier at any depth > 0, but `?` immediately after `(` is group syntax, not a quantifier: `(?:...)`, `(?=...)`, `(?!...)`, `(?<name>...)`, `(?<=...)`, `(?<!...)`. Patterns like `(?:pre_)?tool_name` were silently rejected and the registering hook never fired — with no error surfaced to the caller. Fix: explicit group-syntax prefix skip inside the `(` handler. When the scanner enters a group it peeks for `?` and advances past the modifier characters (`:`, `=`, `!`, `<=`, `<!`, or a named-group label `<name>`) before processing the group body. The `?` is therefore never considered a quantifier at the start of a group. While in there I also fixed a second correctness bug the reviewer didn't catch: the old depth-indexed array didn't propagate a child group's "contains quantifier" signal up to its parent, so `(?:(a+))+` (outer quantifier over a wrapped quantified group, equivalent to `(a+)+`) escaped detection. The scanner now uses an explicit stack of `QuantifierFrame` records, and when a child group closes it propagates `hasBacktrackRisk` to the parent frame — whether the child itself was quantified or not. This catches `(?:(a+))+` correctly while still allowing non-risky wrappers like `(?:(ab))+`. Added 9 tests covering non-capturing groups, lookaheads, lookbehinds, named captures, `(?:(a+))+` risk propagation, and `((ab)+)+` deep wrapping. Verified existing ReDoS-detection tests still pass. ## #2 MINOR — executeHooks.test.ts did not clear the pattern cache Matcher cache is module-level. `matchers.test.ts` clears it in `beforeEach`; `executeHooks.test.ts` did not, so patterns compiled during one test persisted across subsequent tests in the same file. Added `clearMatcherCache()` to `executeHooks`'s `beforeEach` — no test failures today, but restores test independence. ## #3 MINOR — Test utilities leaked into production barrel `clearMatcherCache` and `getMatcherCacheSize` are test-only helpers (their JSDoc says so) but were exported from `src/hooks/index.ts`. When the integration PR eventually re-exports `src/hooks` from `src/index.ts` they would become public API. Removed both from the barrel. Tests already import them directly from `../matchers`, so no test changes needed. `hasNestedQuantifier`, `MAX_PATTERN_LENGTH`, and `MAX_CACHE_SIZE` remain exported — hosts can legitimately use them for upstream validation. ## #4 NIT — LRU refresh overhead at low cache pressure Every `compile()` cache hit was doing `delete + set` to refresh LRU position, even when the cache was nowhere near capacity. With `MAX_CACHE_SIZE = 256` and typical working sets of tens of patterns, eviction pressure is near-zero and the refresh is pure overhead. Added a `LRU_REFRESH_THRESHOLD = 75%` gate: refresh only runs when the cache is above 192 entries. Below that the code fast-paths straight back from `compile()`. Above it the LRU semantics kick in so hot patterns still survive evictions — the existing "refreshes LRU position on hit" test still passes because by the time it runs the check, the cache is at capacity. Tests: 72 -> 81 (+9 for group-syntax and risk-propagation coverage).

* feat: wire tool hooks into event-driven ToolNode (phase 1/3) Fires PreToolUse, PostToolUse, PostToolUseFailure, and PermissionDenied hooks inside `ToolNode.dispatchToolEvents` — the single chokepoint for all event-driven tool execution in LibreChat. ## Hook lifecycle in dispatchToolEvents 1. **PreToolUse** fires per call in parallel before ON_TOOL_EXECUTE dispatch. Denied calls (deny or ask) produce error ToolMessages and fire PermissionDenied; surviving calls proceed with optional updatedInput applied to tool args. 2. Only approved calls are batched into ON_TOOL_EXECUTE and sent to the host. If all calls are denied, the batch dispatch is skipped entirely. 3. **PostToolUse** fires per successful result. If a hook returns updatedOutput, the ToolMessage content is replaced before the message is appended to the graph. 4. **PostToolUseFailure** fires per error result. Observational only — cannot modify the error message. Post hooks are awaited (not fire-and-forget) so updatedOutput takes effect before the ToolMessage is consumed. Hook errors are swallowed via .catch() to avoid disrupting the tool result flow. PermissionDenied is fire-and-forget since it's purely observational. ## Plumbing - `ToolNodeOptions.hookRegistry?: HookRegistry` — new optional field - `ToolNode` constructor stores `this.hookRegistry` - `Graph.createToolNode()` passes `this.hookRegistry` at both construction sites (event-driven and traditional) - `hookRegistry` is set on Graph BEFORE `createWorkflow()` is called (moved from the post-create assignment in Run constructor into `createLegacyGraph` and `createMultiAgentGraph`) so ToolNode receives a non-undefined registry at construction time ## hasHookFor guards All four hook fire points use `hasHookFor(event, runId)` before calling `executeHooks`, so when no hooks are registered for the event the overhead is a single Map lookup + array length check — no Promise allocations, no function calls. ## Tests 9 new integration tests in `src/hooks/__tests__/toolHooks.test.ts` using event-driven mode (toolDefinitions + ON_TOOL_EXECUTE handler): - PreToolUse fires with correct fields - PreToolUse deny blocks execution (tool handler never called) - PreToolUse ask blocks execution (v1) - PreToolUse updatedInput rewrites args before dispatch to host - PreToolUse hook errors are non-fatal (tool still executes) - PermissionDenied fires after deny with reason - PostToolUse fires with tool output - PostToolUseFailure fires on error result - No-hooks baseline works identically Total: 94 existing + 9 new = 103 passing. * fix(hooks): address PR 3 review — step completion, ordering, catch, tests Resolves all 10 findings from the review on #90. ## Structural rewrite of dispatchToolEvents The post-processing loop was rewritten to fix #1, #2, #6, #8, #10: - #1 MAJOR: Denied calls now dispatch ON_RUN_STEP_COMPLETED via a shared `dispatchStepCompleted` helper (extracted from the duplicate logic in the executed-results loop). Without this, the frontend would show stuck spinners for denied tool calls. - #2 MAJOR: Output messages are now collected in a Map keyed by tool_call_id, then returned in the original `toolCalls` input order via `toolCalls.map(call => messageByCallId.get(call.id))`. Before, denied messages were prepended to executed messages, breaking the implicit ordering contract. - #6 MINOR: `toolUsageCount` is now incremented only for approved calls (moved from the initial `preToolCalls.map` into the `approvedEntries.map` that builds ToolCallRequests). Denied calls no longer inflate the turn counter. - #8 MINOR: Eliminated the double `result.status === 'error'` check. ToolMessage is now constructed inside each status branch directly after hook dispatch. - #10 NIT: `requests.find()` replaced with a pre-built `Map<id, ToolCallRequest>` for O(1) lookup per result, per AGENTS.md guidance on Map/Set over Array.find. ## Error protection - #3 MAJOR: PreToolUse `Promise.all` now wraps each `executeHooks` call in `.catch(() => emptyResult)` so an infrastructure error in the hooks subsystem cannot crash the entire tool batch. Matches the stated "hook errors are non-fatal" contract and is consistent with PostToolUse/PostToolUseFailure which already had `.catch()`. ## New tests - #4 MAJOR: Added `updatedOutput replaces the ToolMessage content` — registers a PostToolUse hook returning `{updatedOutput: 'REDACTED'}`, verifies the ToolMessage content in the graph's run messages is `'REDACTED'` (not the original output). - #7 MINOR: PermissionDenied test replaced 50ms `setTimeout` with a `Promise` resolved by the hook callback. No more flaky sleep. ## Cleanup - #9 NIT: Updated stale "once the tool-hook PR lands" comment in `src/hooks/index.ts`. Tests: 103 -> 104 (+1 for updatedOutput). * fix(hooks): off-by-one in dispatchStepCompleted index, test guard, turn fallback - #1 MAJOR: dispatchStepCompleted read post-increment toolUsageCount for the `index` field. Added `turn?: number` parameter; approved-call site passes `request?.turn` (the pre-increment value), denied-call site omits it (fallback is correct since count was never bumped). - #2 NIT: updatedOutput test now asserts `expect(toolMsg).toBeDefined()` before content extraction so a missing tool message produces a clear failure rather than "Expected undefined to be 'REDACTED'". - #3 NIT: PreToolUse input `turn` field restored `?? 0` fallback so first-use tools see `turn: 0` instead of `turn: undefined`. * fix(hooks): batch tests, turn JSDoc, freeze sentinel, remove unused hookRegistry Addresses follow-up review findings on #90. - #1 MAJOR: Add multi-call batch tests — partial deny (echo denied, math approved, order preserved) and all-denied (ON_TOOL_EXECUTE handler never called). Uses counter-based IDs (#7 fix) to avoid Date.now() collisions across calls in the same test. - #2 MINOR: Document PreToolUseHookInput.turn semantics — within a single batch, all calls to the same tool share the same turn value. Per-call discrimination within a batch is not supported in v1. - #3 MINOR: Remove hookRegistry from the traditional (non-event-driven) ToolNode construction site in Graph.ts. Hooks only fire in dispatchToolEvents; passing the registry to a ToolNode that never uses it created a false impression of hook support. - #4 MINOR: Add PostToolUse error non-fatality test — throwing hook does not crash the batch, original content is preserved. - #5+#8 MINOR: Freeze the hook-error fallback sentinel via Object.freeze and rename from emptyResult to HOOK_FALLBACK. Prevents future mutation bugs if someone pushes to the shared arrays. - #6 MINOR: Document in ToolNodeOptions.hookRegistry JSDoc that hooks only fire for event-driven calls — directToolNames bypass dispatch. Tests: 104 -> 107 (+3 for batch partial-deny, all-denied, and PostToolUse error resilience). * nit: simplify test assertions, keep shallow freeze (deep conflicts with mutable type)

@remarks

#104) * feat: add subagent-as-tool primitive for hierarchical agent delegation Adds a SubagentTool that lets a parent agent delegate tasks to child agents with isolated context windows. Verbose tool outputs stay in the child's context; only a filtered text summary returns to the parent. Key components: - SubagentConfig type on AgentInputs for declaring child agent types - SubagentExecutor: creates child StandardGraph, invokes with isolated state, filters result (strips tool_use/thinking blocks), fires SubagentStart/SubagentStop hooks - SubagentTool: LCTool definition with dynamic enum from configs - Graph integration: tool created in createAgentNode() via graphTools, automatically becomes a direct tool (bypasses event-driven dispatch) - Self-spawn support: reuse parent's AgentInputs for context isolation without separate agent config Wires the existing SubagentStart (with deny support) and SubagentStop hook types that were defined but had zero fire points. * fix: resolve review findings for subagent primitive Address all 6 issues from PR review: 1. Add subagentHooks.test.ts — end-to-end hook integration test through real Run pipeline (SubagentStart payload, SubagentStop messages, deny blocks execution). 3 new tests. 2. Remove `as unknown as` cast — follow MultiAgentGraph pattern (init graphTools as empty array, push directly). 3. Deduplicate schema — extract buildSubagentToolParams() in SubagentTool.ts, use it in Graph.ts. createSubagentToolDefinition() now delegates to it. Single source of truth for schema/description. 4. Propagate threadId — tool callback extracts thread_id from LangGraph config.configurable, passes to executor.execute(). Hooks now receive correct threadId. 5. Pass tokenCounter to child graph — added to SubagentExecutorOptions, threaded from agentContext.tokenCounter through to child StandardGraph constructor. 6. Pass signal in invoke() config — workflow.invoke() now receives signal for more responsive abort between graph steps. * fix: resolve second-pass audit findings for subagent primitive Address 12 valid findings from the audit — 3 MAJOR, 6 MINOR, 3 NIT: MAJOR fixes: 1. Depth tracking now works across graph boundaries. Previously the `depth`/`maxDepth` check only worked within a single executor; when `allowNested: true`, the child graph spawned its own executor at depth=0, making `maxSubagentDepth` non-functional. Fix: decrement `maxSubagentDepth` in `buildChildInputs`, so the child's own executor inherits a smaller budget and naturally blocks further nesting when it reaches 0. Removed the now-vestigial `depth` field. 2. Detect duplicate `SubagentConfig.type` values. `Map` construction silently dropped duplicate keys. Now `resolveSubagentConfigs` throws with a clear error. 3. Add test coverage for `allowNested: true`: verifies depth is decremented across boundaries, child inherits subagentConfigs, and budget is clamped to >= 0. MINOR fixes: 4. Runtime guard for empty `description` in the tool callback — the LLM can send undefined/empty values; now we default to a safe fallback string. 5. Updated `src/hooks/index.ts` barrel comment to list SubagentExecutor as the fourth hook consumer (SubagentStart/SubagentStop). 6. Extracted property description strings as constants in SubagentTool.ts; `SubagentToolSchema` and `buildSubagentToolParams` now share a single source of truth. 7. Changed SubagentStop from fire-and-forget to awaited, matching the PostCompact pattern. Removes fragile `setTimeout` synchronization in tests. Errors still swallowed (observational). 8. Truncate error messages from child graph to 200 chars to prevent leaking internal details (stack traces, API errors) to the parent LLM. 9. Document that `ask` decision is treated identically to `deny` in the non-interactive subagent context. NIT fixes: 11. Import order in SubagentExecutor.ts — `@/types` (multi-line, longest) now comes first in the local types sub-group. 12. Removed unused `contentParts` destructuring in verification script. 13. Fixed unsafe `msg.content as string` cast in verification script; now handles array content correctly. Skipped #10 (NIT): `_sourceInputs` underscore prefix is an accepted TS convention for cross-file-accessible-but-semantically-internal fields. Test count: 1074 passed (up from 1065, +9 new tests), 0 regressions. * fix: address third-pass audit findings for subagent primitive Resolve 3 NITs from the second review pass plus 2 new codex comments: NIT #1 — Tighten truncation test: previously asserted only `result.content.length < 500`. Now asserts exact envelope of 219 chars ("Subagent error: " prefix + 200 body + "..." suffix) so that regressions in ERROR_MESSAGE_MAX_CHARS are caught. Added companion test verifying short messages are not truncated. NIT #2 — Skip tool creation when maxSubagentDepth: 0. When a host set `maxSubagentDepth: 0` alongside `subagentConfigs`, the tool was still bound to the LLM, and every invocation returned the depth- exceeded error. The LLM wasted a turn. Now the `createAgentNode` guard includes `effectiveSubagentDepth > 0`, so the tool is not created at the leaf level. Added integration test. NIT #3 — Document buildChildInputs audience. Added @remarks block explaining this is an advanced utility exported primarily for testing and internal use; hosts configuring subagents should not call it directly. Codex P1 (false positive, clarified): maxSubagentDepth propagation across graph boundaries was already correct via the countdown in buildChildInputs, but it wasn't obvious from the call site. Added a multi-line JSDoc comment above the executor construction in createAgentNode explaining how depth consumes across boundaries through the decremented AgentInputs.maxSubagentDepth. Codex P2 (real bug, fixed): toolSchemaTokens did not include the subagent tool's schema. `calculateInstructionTokens()` was called in `fromConfig()` before graphTools was populated, so the later- injected subagent tool's schema was missing from the budget. Fixes: 1. Include `graphTools` in the iteration in `calculateInstructionTokens` (also helps handoff tools, a pre-existing issue with the same root cause). 2. Re-trigger the calculation at the end of `createAgentNode` after the subagent tool is pushed, so the token count reflects the final graphTools state before `createCallModel` awaits `tokenCalculationPromise`. Added test asserting `toolSchemaTokens` is larger for agents with subagents vs without, confirming the recalculation runs. Test count: 1077 passed (up from 1074, +3 new tests), 0 regressions. * fix: break circular dep between SubagentExecutor and StandardGraph Rollup warned on build: Graph.ts → subagent/index.ts → SubagentExecutor.ts → Graph.ts SubagentExecutor needed `StandardGraph` at runtime to construct the child graph; Graph.ts needs SubagentExecutor to attach the tool. With preserveModules, these land in separate chunks producing circular chunk imports with undefined execution order. Fix: dependency injection. SubagentExecutor now takes a required `createChildGraph: ChildGraphFactory` option and only imports `StandardGraph` as `import type` (erased at build). Graph.ts passes `(input) => new StandardGraph(input)` when constructing the executor. Runtime module graph is now acyclic; `import type` does not appear in emitted JS. Test updates: replaced `jest.spyOn(GraphModule, 'StandardGraph')` (which relied on the runtime import we just removed) with direct mock factories passed through the new option. Cleaner — tests no longer need to reach into module internals. Same coverage. Build: `npx rollup -c` now emits zero warnings. Test count: 1077 passed, 0 regressions. * chore: add multi-agent-subagent script to package.json Introduced a new script entry for multi-agent-subagent, enabling execution of the corresponding TypeScript file with necessary configurations. This addition enhances the multi-agent functionality by allowing for more granular control over subagent operations. * fix: isolate child subagent from parent's callback chain The manual verification script (`npm run multi-agent-subagent`) crashed with: Error: No agent context found for agent ID researcher at StandardGraph.getAgentContext at ChatModelStreamHandler.handle Root cause: `Run.processStream` drives the parent via `streamEvents`, which captures events from every nested runnable in the call tree — including the child graph's LLM calls. When the child's "researcher" agent streamed `on_chat_model_stream` events, the parent's `ChatModelStreamHandler` received them and tried to resolve the child's agent ID in the parent's `agentContexts` map, which only contains the supervisor. Error thrown. The Phase 1 plan was for child events to stay fully isolated. Unit tests missed this because the child graph was always mocked (no real `streamEvents` plumbing); only running the full pipeline end-to-end surfaced it. Fix: pass `callbacks: []` in the child `workflow.invoke()` config. This overrides the inherited callbacks for that invocation, breaking the event propagation chain. Also set `runName: subagent:<type>` and a child-scoped `configurable.thread_id` so the child has a distinct trace root rather than polluting the parent's trace. Manual verification now works: supervisor delegates to researcher, receives filtered text, synthesizes the final answer. Test suite: 1079 passed, 0 subagent-suite regressions. The two newly failing suites (ProgrammaticToolCalling / ToolSearch Live API tests) are pre-existing — they require `LIBRECHAT_CODE_BASEURL` to be configured and fail identically on the prior commit. They were skipped before because OPENAI_API_KEY was absent. * feat: align filterSubagentResult fallback with Claude Code When a subagent's last AIMessage is pure tool_use (e.g. the model hit maxTurns mid-tool-call), the previous implementation returned the "Task completed" sentinel, discarding any textual progress from earlier turns. Claude Code's agentToolUtils.finalizeAgentTool walks backwards from the last message, and on messages without text content continues scanning earlier AIMessages, surfacing the most recent textual thought instead of the sentinel. This change matches that behavior. The three `return` sites inside the loop that previously exited with the sentinel now `continue` when the AIMessage yields no text; the loop exits with "Task completed" only when no AIMessage in the history contains any text. Concrete example: [0] Human: "What's the capital of France?" [1] AI: "Let me search." + tool_use(search, ...) [2] Tool: "Paris." [3] AI: tool_use(search, ...) <-- maxTurns, no text Before: "Task completed" After: "Let me search." Added two tests — the turn-limit scenario above, and the analogous empty-string-content edge case. All nine existing filterSubagentResult tests still pass unchanged. * fix: update model name for non-Anthropic API to gpt-5.4 Changed the model name used in the multi-agent subagent script from 'gpt-4.1' to 'gpt-5.4' for improved performance and capabilities. This update ensures compatibility with the latest API offerings.

* feat: forward subagent child events and enable event-driven tools Adds an opt-in event forwarding path for `SubagentExecutor` so host apps that rely on event-driven tool dispatch (`toolDefinitions` + `ON_TOOL_EXECUTE` handler) can use subagents — including self-spawn. Before: a subagent in event-driven mode got `toolDefinitions` stripped in `buildChildInputs` and ran with `callbacks: []`, so the LLM had no tool schemas and ON_TOOL_EXECUTE never reached the host. The subagent replied in prose without invoking any tools. Now: - `SubagentExecutor` accepts `parentHandlerRegistry` (a direct registry or a lazy getter — `Run` wires the registry onto the graph AFTER `createWorkflow`, so `createAgentNode` must capture lazily). - When provided, `buildChildInputs` keeps `toolDefinitions`, and the child's invoke attaches a `BaseCallbackHandler` forwarder that: - routes ON_TOOL_EXECUTE to the parent's handler (event-driven tools work), and - wraps other custom events (run_step, run_step_delta, run_step_completed, message_delta, reasoning_delta) in a new `ON_SUBAGENT_UPDATE` envelope so hosts can render child progress in a separate UI surface. - New `SubagentUpdateEvent` type and `ON_SUBAGENT_UPDATE` GraphEvents entry; start/stop/error envelopes fire around the child invoke. - Legacy isolation is preserved when no registry is passed. Test plan: - `npx jest SubagentExecutor SubagentTool` — 36 pass (4 new covering forwarding, toolDefinition retention, legacy strip, lazy getter). - Integration test against OpenAI (`subagent-event-driven-debug.ts`) confirms the child uses `calculator` via the parent's ON_TOOL_EXECUTE path and the host sees start/run_step/run_step_completed/stop ON_SUBAGENT_UPDATE envelopes. * fix: gate child toolDefinitions on ON_TOOL_EXECUTE handler + propagate tool_call_id Two follow-ups from review on #107: 1. Codex P1: `SubagentExecutor` was treating any non-null parent registry as "forwarding enabled" and always kept `toolDefinitions` on the child. Because `Run.create` always constructs a `HandlerRegistry`, this meant configurations with tools declared but no `ON_TOOL_EXECUTE` handler would hang forever — the child's `ToolNode` would dispatch a batch request and the forwarder would find no handler to resolve or reject it, leaving the promise pending. Now `keepToolDefinitions` is gated on `parentRegistry.getHandler(ON_TOOL_EXECUTE) != null`. A registry without the handler falls back to the legacy "strip toolDefinitions" path, which is a recoverable no-tools state instead of a hang. 2. `SubagentUpdateEvent` now carries `parentToolCallId`. `Graph.ts` pulls it from `config.toolCall.id` (LangChain threads the originating `ToolCall` onto `RunnableConfig`) when the subagent tool is invoked. This lets hosts correlate subagent updates to the parent tool call deterministically instead of inferring by event ordering — a concern raised on the LibreChat PR using this event stream. Also bumps to `3.1.67-dev.2`. Tests: `npx jest SubagentExecutor` — 38 pass (2 new covering the handler gate and `parentToolCallId` propagation). Integration script verified against real OpenAI: every ON_SUBAGENT_UPDATE envelope now carries the parent's `tool_call_id`. * chore: address audit findings — drop unused param, tighten cast, add tests Addresses the comprehensive audit of #107: ### MAJOR #1 — ON_TOOL_EXECUTE forwarding coverage Two new tests in the `event forwarding` suite drive the forwarder callback directly with a synthesized `ToolExecuteBatchRequest`: - `routes child ON_TOOL_EXECUTE dispatches through the parent registry` asserts the parent's `ON_TOOL_EXECUTE` handler is invoked, the batch's `resolve` fires with canned results, and the child receives them. - `does NOT forward ON_TOOL_EXECUTE when the parent registry has no handler (safe fallback)` documents the companion case: without a handler the `resolve` never fires, which is why the executor gates `keepToolDefinitions` on handler presence in the first place. ### MINOR #2 — unused `childGraph` parameter Removed from `createForwarderCallback`'s arg shape, destructuring, and call site. It was accepted as `_childGraph` and never read. ### MINOR #3 — `data as never` `EventHandler.handle` didn't include `ToolExecuteBatchRequest` in its accepted data union, which is why the forwarder used `as never` to shut the compiler up. Extended the union and switched the cast to the concrete `ToolExecuteBatchRequest` type — the forwarder is now fully type-checked on the hot path. ### MINOR #4 — error-phase envelope test `emits an 'error' phase envelope when the child graph throws` uses the throwing factory helper already in the file, registers a real `ON_SUBAGENT_UPDATE` handler, and asserts start/error ordering plus the `data.message` payload and `parentToolCallId` passthrough. ### MINOR #6 — dynamic imports in tests Replaced `await import('@/events')` / `await import('@/common')` with static top-level imports. Consistent with the rest of the file. ### MINOR #7 — `summarizeEvent` unit tests Exported the pure function from `@/tools/subagent` and added 10 focused tests covering every branch: tool_calls with one/many names, empty tool_calls, message_creation, ON_TOOL_EXECUTE with/without names, completed steps with/without a tool name, message_delta, and the unknown-event fallback. Every user-visible label is now regression protected. ### MINOR #8 — `config.toolCall` comment Expanded the JSDoc to note the specific LangChain source (`ToolRunnableConfig` in `@langchain/core/tools`, stable since 0.3.x) and that the defensive read degrades to `undefined` on upstream changes. ### NIT #9 — unused import Removed `createContentAggregator` from `subagent-tools-debug.ts`. ### NIT #10 — `awaitHandlers = true` doc Added a JSDoc block explaining the trade-off: required for `ON_TOOL_EXECUTE` to actually block on the host's resolve, but it also serializes observational events through any slow host handlers. Refinement path noted for later if host-side latency becomes a problem. Version bumped to `3.1.67-dev.3`. Tests: 51 pass (40 SubagentExecutor + 10 new summarizeEvent + 1 new error-phase + existing coverage retained). * fix: clear parent run summary and discoveredTools from child inputs Paired with Codex P1 on the LibreChat PR (danny-avila/LibreChat#12725) that uses this event stream. `buildChildInputs` shallow-spreads the parent's `AgentInputs`, which carries two run-scoped fields that leak into the isolated child by default: - `initialSummary` — cross-run conversation summary. On a conversation that has already been summarized, every child would inherit it and reason against unrelated prior chat. - `discoveredTools` — tool names the parent's LLM searched for in earlier turns via `tool_search`. The child gets primed to use whatever the parent happened to find. Both are cleared on the returned child inputs. Configuration fields (summarization policy, provider options, context pruning config) are preserved — those are policy, not context. Affects both paths: - Self-spawn: `resolveSubagentConfigs` clones parent `_sourceInputs`, then `buildChildInputs` strips. - Explicit children: host-built `AgentInputs` goes through the same `buildChildInputs` before the child graph runs. Version bumped to `3.1.67-dev.4`. Regression test: `strips parent-run-scoped initialSummary and discoveredTools from child inputs` — seeds parent inputs with `initialSummary: { text, tokenCount }` and `discoveredTools: [...]`, asserts both are `undefined` on the returned child. 52 pass. * test: cover summarizeEvent step.type fallback branches Addresses reviewer F2.1 (NIT) on #107. `summarizeEvent` resolves run-step detail type via `step.stepDetails?.type ?? step.type ?? 'step'`. The existing tests all passed `{ stepDetails: { type } }`, leaving the second and third clauses of the chain uncovered. Two new tests: - top-level `step.type` (no `stepDetails` wrapper) for both `tool_calls` and `message_creation` — exercises the second clause - empty `{}` payload — exercises the `'step'` default and the generic `Step: <detailType>` branch Carried MINOR #5 (`SubagentUpdateEvent.data: unknown`) left as a documented deferral; the JSDoc already notes "shape depends on phase" and a discriminated union is a larger refactor. 54/54 pass. * chore: add npm run scripts for subagent debug scripts - `npm run subagent` — supervisor + researcher/coder subagent demo (no tool use; from #104) - `npm run subagent:events` — event-driven flow matching LibreChat's architecture (toolDefinitions + ON_TOOL_EXECUTE handler + self-spawn with calculator). This is the script that demonstrates the #107 fix. - `npm run subagent:tools` — traditional LangChain tool instances with self-spawn; pre-fix parity case. All three accept `OPENAI_API_KEY` (and `--provider anthropic` for the first).

Codex P1 (preserve session hooks across HITL resumes): - `Run.processStream` finally block was unconditionally calling `hookRegistry.clearSession(this.id)` — including after an interrupt fired. The very next call is `Run.resume()`, which needs the same policy hooks (e.g. the `PreToolUse` matcher that produced the interrupt) to fire on the re-executed node and uphold the approval flow. Clearing leaked the gate on resume. Now gated on `this._interrupt == null` so the session survives until either natural completion, error, or hook-driven halt. - Regression test covers the full lifecycle: interrupt → session matcher still present, resume → hook fires again (preCallCount === 2, proving policy actually applied), then natural completion → matcher cleared. Comprehensive review (audited 10 findings, addressed 7 valid): #2 — Mixed deny/ask/allow batch test: previously missing. Added a test with three tools where the policy denies one, asks one, and allows one. Verifies that on first pass only the ask appears in the interrupt and NO tools dispatched (LangGraph rolls back the node's effects on `interrupt()` throw — partial execution while a human is being asked would leak side effects ahead of approval). On resume, all three tools land in state with the correct outcomes. #4 — Documented in `ToolApprovalDecision` JSDoc that `respond` does NOT fire per-tool `PostToolUse` (no real execution happened) but DOES appear in the `PostToolBatch` entry array, so batch-level audit / convention hooks see the full set of outcomes. #5 — Added clarifying JSDoc at both hook-context injection sites (`Run.runPreStreamHooks`, `ToolNode.dispatchToolEvents`) explaining why we use `HumanMessage` with `additional_kwargs.role: 'system'`: mid-conversation `SystemMessage`s are rejected by Anthropic and Google providers; the `role` field is metadata only for hosts inspecting state. Mirrors the existing `convertInjectedMessages` convention. #6 — Decomposed `processStream`: extracted the pre-stream hook block (RunStart + UserPromptSubmit + context injection + halt short-circuit) into a private `runPreStreamHooks()` helper. Returns a boolean: `true` to halt, `false` to proceed. Cuts the method's body and matches AGENTS.md "Break complex operations into well-named helpers." #8 — Moved `HookHaltSignal` interface above the `HookRegistry` class JSDoc block so it doesn't read like a continuation of the Map-mutation rationale. #9 — `blockEntry` now records `turn` on its `PostToolBatch` entry push, matching the executed-path push. Same applies to the `respond` branch. Batch hooks now see uniform entry shapes regardless of outcome. Not addressed (intentionally, per project direction): - #1 default-on HITL: deliberate — most apps expect this. JSDoc on `RunConfig.humanInTheLoop` already calls out the contract. - #3 mixed-resume test: already covered by the existing "bundles multiple ask decisions into a single interrupt and resolves per call" test (resumes with `[approve, reject]`). - #7 unconditional MemorySaver allocation: HITL-on-by-default contract; opt out with `{ enabled: false }` for the latency- sensitive path. Per-Run MemorySaver overhead is small. - #10 `HumanInterruptType` exported but unused: kept as public API surface; LibreChat's wire types reference it. Validation: 32/32 HITL tests pass (2 new under "Codex review fixes" describe), full hooks/tools/summarization suite (621 tests) green, typecheck clean, lint clean on touched files (one pre-existing warning at `src/tools/ToolNode.ts:751` from `df6b739`, untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* 🛡️ feat: First-class HITL & permissions surface Adds the SDK side of human-in-the-loop and tool-permission support so hosts can build approval flows, audit policies, and ask-user prompts against a stable surface that mirrors Claude Code's vocabulary. What's in: - HITL on by default. `RunConfig.humanInTheLoop` is opt-out via `{ enabled: false }`. When enabled and the host did not provide a checkpointer, `Run.create` installs a process-local `MemorySaver` fallback so `interrupt()` / `Command({ resume })` work out of the box. - ToolNode bundles every `ask`-decision tool call from one batch into a single `interrupt()` carrying a `tool_approval` payload, anchored to the node config via `AsyncLocalStorageProviderSingleton.runWithConfig` (works around `ToolNode.trace = false`). Resume decisions support approve / reject / edit / respond, accepted as either an array or a record keyed by `tool_call_id`. - AskUserQuestion as a separate `HumanInterruptType`. New `askUserQuestion()` helper for use inside any custom node; `Run.resume<T>` is generic so the same method handles both interrupt categories. Type guards `isToolApprovalInterrupt` / `isAskUserQuestionInterrupt` for narrowing the payload union. - `createToolPolicyHook` declarative permission hook with allow / deny / ask lists + `mode: 'default' | 'dontAsk' | 'bypass'`. Glob `*` patterns. Evaluation order matches Claude Code's: deny → bypass → allow → ask → dontAsk → fallthrough(ask). - New `PostToolBatch` hook event with full `PostToolBatchEntry[]` per dispatch. Per-hook `allowedDecisions` override flows into the interrupt's `review_configs`. `additionalContext` from any hook now injects a single `HumanMessage` into the conversation (previously collected but discarded). `preventContinuation` honored both pre-stream (early return) and mid-flight (`HookRegistry.haltRun()` signal polled by `Run.processStream`, skips the `Stop` hook). Async fire-and-forget hooks via `{ async: true }` — output is ignored, background work continues. - `Run.getInterrupt()` and `Run.getHaltReason()` for hosts to inspect why the run paused or stopped. Re-exports `Command`, `MemorySaver`, `BaseCheckpointSaver`, `INTERRUPT`, `interrupt`, `isInterrupted`, and the `Interrupt` type from `@langchain/langgraph` so hosts that build durable checkpointers extend the same instance the SDK was compiled against (no version skew). 49 new tests across `src/hooks/__tests__/createToolPolicyHook.test.ts` and `src/tools/__tests__/hitl.test.ts`; full hooks/tools/summarization suite (616 tests) green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🔖 chore: bump version to 3.1.75-dev.2 Carries the HITL & permissions surface to the next dev pre-release so LibreChat can pin it during Slice B integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛠️ fix: address Codex review on HITL ToolNode Three issues from the PR #134 review: 1. P1: Unknown HITL resume decision types now fail closed instead of silently flowing into `approvedEntries`. Hosts deserialize resume payloads from untyped JSON, so a typo (`'aproved'`) or schema drift would have bypassed the approval gate entirely. Restructured to explicit approve/edit/reject/respond branches with a typed-widened final check that routes anything else through `blockEntry` with a diagnostic reason. 2. P2: `PostToolBatch` entries now carry the post-`PostToolUse`-hook output. Previously the batch entry recorded `result.content` even when a hook had replaced it via `updatedOutput`, so audit / redaction / convention-injecting batch hooks observed stale or unredacted data. Added a loop-scoped `finalToolOutput` that the PostToolUse-hook branch updates and the batch entry reads. 3. P2: `PostToolUseFailure` `additionalContext` is now injected into the next model turn. The hook result was awaited but discarded, so recovery guidance returned on tool errors (e.g. "if this errors, suggest the user retry with X") was silently dropped despite the API surface advertising the field. Now collected into the same batch additional-contexts accumulator. Three new tests in `src/tools/__tests__/hitl.test.ts` under the new "Codex review fixes" describe block: - host sends `{ type: 'aproved' }` → blocked, tool never dispatched - PostToolUse rewrite → batch entry sees the redacted output - tool errors with PostToolUseFailure additionalContext → injected 31/31 HITL tests pass; full hooks/tools/summarization suite (619 tests) still green; typecheck clean; build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 📝 docs: correct HITL default-on contract on RunConfig.humanInTheLoop Codex review on PR #134 caught that the JSDoc still described the opt-in semantics from before the default-on flip. Integrators reading the type docs would omit the field expecting no interrupts, then hit unexpected pauses + a checkpointer fallback in production. Aligns the contract with runtime behavior: HITL is on by default (omit or `{ enabled: true }` engages everything); opt out with `{ enabled: false }` to restore the pre-HITL fail-closed path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛠️ fix: address comprehensive review on HITL surface Codex P1 (preserve session hooks across HITL resumes): - `Run.processStream` finally block was unconditionally calling `hookRegistry.clearSession(this.id)` — including after an interrupt fired. The very next call is `Run.resume()`, which needs the same policy hooks (e.g. the `PreToolUse` matcher that produced the interrupt) to fire on the re-executed node and uphold the approval flow. Clearing leaked the gate on resume. Now gated on `this._interrupt == null` so the session survives until either natural completion, error, or hook-driven halt. - Regression test covers the full lifecycle: interrupt → session matcher still present, resume → hook fires again (preCallCount === 2, proving policy actually applied), then natural completion → matcher cleared. Comprehensive review (audited 10 findings, addressed 7 valid): #2 — Mixed deny/ask/allow batch test: previously missing. Added a test with three tools where the policy denies one, asks one, and allows one. Verifies that on first pass only the ask appears in the interrupt and NO tools dispatched (LangGraph rolls back the node's effects on `interrupt()` throw — partial execution while a human is being asked would leak side effects ahead of approval). On resume, all three tools land in state with the correct outcomes. #4 — Documented in `ToolApprovalDecision` JSDoc that `respond` does NOT fire per-tool `PostToolUse` (no real execution happened) but DOES appear in the `PostToolBatch` entry array, so batch-level audit / convention hooks see the full set of outcomes. #5 — Added clarifying JSDoc at both hook-context injection sites (`Run.runPreStreamHooks`, `ToolNode.dispatchToolEvents`) explaining why we use `HumanMessage` with `additional_kwargs.role: 'system'`: mid-conversation `SystemMessage`s are rejected by Anthropic and Google providers; the `role` field is metadata only for hosts inspecting state. Mirrors the existing `convertInjectedMessages` convention. #6 — Decomposed `processStream`: extracted the pre-stream hook block (RunStart + UserPromptSubmit + context injection + halt short-circuit) into a private `runPreStreamHooks()` helper. Returns a boolean: `true` to halt, `false` to proceed. Cuts the method's body and matches AGENTS.md "Break complex operations into well-named helpers." #8 — Moved `HookHaltSignal` interface above the `HookRegistry` class JSDoc block so it doesn't read like a continuation of the Map-mutation rationale. #9 — `blockEntry` now records `turn` on its `PostToolBatch` entry push, matching the executed-path push. Same applies to the `respond` branch. Batch hooks now see uniform entry shapes regardless of outcome. Not addressed (intentionally, per project direction): - #1 default-on HITL: deliberate — most apps expect this. JSDoc on `RunConfig.humanInTheLoop` already calls out the contract. - #3 mixed-resume test: already covered by the existing "bundles multiple ask decisions into a single interrupt and resolves per call" test (resumes with `[approve, reject]`). - #7 unconditional MemorySaver allocation: HITL-on-by-default contract; opt out with `{ enabled: false }` for the latency- sensitive path. Per-Run MemorySaver overhead is small. - #10 `HumanInterruptType` exported but unused: kept as public API surface; LibreChat's wire types reference it. Validation: 32/32 HITL tests pass (2 new under "Codex review fixes" describe), full hooks/tools/summarization suite (621 tests) green, typecheck clean, lint clean on touched files (one pre-existing warning at `src/tools/ToolNode.ts:751` from `df6b739`, untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛠️ fix: address follow-up review on HITL extraction Four findings from the follow-up review on commit 389cc88: #1 (MAJOR) — runPreStreamHooks side-effect documentation. The extracted helper had three side effects (mutating `stateInputs.messages`, mutating `config.callbacks`, calling `registry.clearSession`) but the JSDoc only mentioned the first. Added a "Side effects" section that calls out each one and explains why `config.callbacks` is dropped inside the helper rather than by the caller (mirrors processStream's natural-completion cleanup). Also added a one-line comment on the defensive `this.Graph == null` guard explaining why it's there even though `processStream` already validates Graph (TS narrowing doesn't propagate across method boundaries; the guard keeps the body free of `!` assertions). #2 (MINOR) — stream errors after interrupt detection no longer preserve stale session hooks. Previously the `finally` guard checked only `_interrupt == null`, so a downstream handler throwing after the interrupt was stashed would re-throw, hit the finally with `_interrupt != null`, and leak the session matchers into the next run. Added a `streamThrew` flag set in the `catch` block and widened the guard to `_interrupt == null || streamThrew` so a captured-but-stale interrupt also triggers cleanup. New regression test covers this exactly: `customHandler` throws once it detects `run.getInterrupt() != null`, asserts the run still exposed the interrupt, AND that session hooks were cleared anyway. #3 (MINOR) — mixed deny/ask/allow batch test now subscribes a `PostToolBatch` hook and verifies the batch entry shape after resume reflects the FINAL outcomes (tool_a → error with policy reason, tool_b → success with ran:tool_b output, tool_c → success with ran:tool_c output). Closes the validation loop on the PostToolBatch entry data that batch-level audit / convention hooks would observe. #4 (NIT) — blockEntry's `turn` JSDoc no longer overpromises "mirrors the executed-path turn." Now correctly describes it as the pre-invocation count of prior successful invocations of the same tool name, with the same shape the executed path captures before incrementing. Not changed: - #5 (NIT) defensive Graph-null guard kept; reviewer agreed "no change needed." Added a one-line comment for clarity. Validation: 34/34 HITL tests (2 new under "Codex review fixes" describe), 622 across hooks/tools/summarization, typecheck clean, lint clean on touched files (only the pre-existing src/tools/ToolNode.ts:751 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛠️ fix: address two more Codex P2s on ToolNode HITL flow Two findings on the HITL deny path; both were silent correctness bugs in mixed batches: 1. Duplicate ON_RUN_STEP_COMPLETED on resume (deny + ask batch) — `blockEntry` was dispatching `ON_RUN_STEP_COMPLETED` and the `PermissionDenied` hook synchronously the moment a deny decision landed. When the same batch also held an `ask` decision, the subsequent `interrupt()` threw and LangGraph rolled back the node; on resume the node re-executed from scratch, `blockEntry` fired again, and the host saw two completion events for one logical denial. Fix: queue both side effects in `deferredBlockedSideEffects` instead of firing immediately, then flush exactly once at the end of the hook-handling block. The first-pass throw skips the flush; resume / no-interrupt passes reach it once. New test counts the dispatched `ON_RUN_STEP_COMPLETED` events for the denied tool's call_id and asserts exactly one across the interrupt + resume lifecycle. 2. PostToolBatch entry order non-deterministic in mixed batches — entries used to be `push`'d into a flat array at three different call sites (deny synchronous, approved post-execution, respond on resume), so a batch with `[allowed, denied]` could surface `[denied, allowed]` in the entry list whenever the deny path resolved first. The API contract documents these entries as reflecting the original `toolCalls` order, and hooks correlating outcomes by position would misattribute. Fix: track entries in a `Map<callId, PostToolBatchEntry>` and materialize the array in `toolCalls` order at dispatch time. Two new tests pin the order under both `[deny, allow]` and `[allow, deny]` orderings; the existing mixed-batch test now also asserts the full `[call_a, call_b, call_c]` sequence on the snapshot. Validation: 36/36 HITL tests (3 new), 624 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:751 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🧹 chore: tighten mixed-batch test assertion + correct stale comment Round 3 review #2 (NIT). The mixed deny/ask/allow batch test comment claimed PostToolBatch fires on the interrupted pass with only tool_a, then again on resume with all three entries. That was incorrect: PostToolBatch is dispatched at the bottom of `dispatchToolEvents`, after tool execution, so `interrupt()` throws before the dispatch ever runs. Only the resume pass yields a snapshot. Tightened the assertion from `toBeGreaterThanOrEqual(1)` to `toHaveLength(1)` so the test pins the actual behavior, and reworded the comment to match. Future contributors reading this test now build the right mental model of when PostToolBatch fires relative to interrupts. The snapshot's per-entry assertions and order assertion (added in commit 6a9b6fd) are unchanged. Round 3 review #1 (the duplicate denial side-effect events) was already addressed in commit 6a9b6fd via the `deferredBlockedSideEffects` queue + post-interrupt flush. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🧹 chore: handle two NITs from Round 4 review #1: Add explanatory comment on the `respond` decision branch's immediate `dispatchStepCompleted` call. Without context, the asymmetry with `blockEntry`'s deferred dispatch reads as inconsistent — a future contributor might either "fix" respond to also defer (unnecessary, wastes effort) or wonder if it's a latent bug. The new comment explains: respond only executes inside the decision- processing loop, which is reachable only AFTER `interrupt()` has returned, so there's no rollback risk and no risk of duplicate ON_RUN_STEP_COMPLETED dispatch. #2: Add a test for mixed respond + reject in the same resume batch. Both decisions land on the same resume pass: respond dispatches ON_RUN_STEP_COMPLETED immediately via the resume branch; reject queues into deferredBlockedSideEffects and dispatches via the flush. The test asserts each tool dispatches exactly once, the PostToolBatch snapshot fires once with entries in the original toolCalls order, and the ToolMessages match the resume decisions. Pins the timing interaction between the immediate-dispatch and deferred-flush paths. Validation: 37/37 HITL tests pass, 625 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:751 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: enforce allowedDecisions on resume + widen interrupt payload typing Two new Codex findings, both real correctness gaps in the HITL surface: P1 — `allowedDecisions` not enforced on resume. A `PreToolUse` hook returning `{ decision: 'ask', allowedDecisions: ['approve', 'reject'] }` advertises the restricted set in the interrupt's `review_configs`, but the SDK's resume handler never validated incoming `decision.type` against it. A buggy or hostile host UI could submit `{ type: 'edit', updatedInput: {...} }` and bypass the policy — mutating arguments or substituting a synthetic tool result that the hook explicitly forbade. Fix: per-entry check that reads `decision.type` through a widened view (handles untyped JSON / typo / missing field) and routes anything not in the per-call `allowedDecisions` allowlist through `blockEntry` with a diagnostic reason. The check fires before the existing approve/edit/reject/respond branches, so policy enforcement happens regardless of which decision type was offered. Two new tests pin both directions: an `edit` submitted against an allowlist of `['approve', 'reject']` is blocked and the host event never dispatches; an `approve` submitted against the same allowlist passes through and the tool runs with original args. P2 — `RunInterruptResult.payload` was typed as `HumanInterruptPayload` (the SDK's `tool_approval` / `ask_user_question` discriminated union), but `Run.resume` documents support for custom interrupt payloads from custom graph nodes. Hosts raising arbitrary shapes got back a mistyped value and downstream code could falsely assume the discriminator-based fields existed. Fix: - `RunInterruptResult<TPayload = HumanInterruptPayload>` is now generic. Default preserves the existing ergonomics for HITL-only consumers; custom hosts pass their own type. - `Run.getInterrupt<T = HumanInterruptPayload>(): RunInterruptResult<T>` is generic. Callers assert what shape they expect. - Internal storage typed as `RunInterruptResult<unknown>`, reflecting that the SDK does not validate the runtime payload. - `isToolApprovalInterrupt` / `isAskUserQuestionInterrupt` widened to accept `unknown` and narrow defensively (`typeof === 'object'`, null check, then discriminator). Lets host code call them on raw values from `getInterrupt()` without first asserting a type. Two new tests pin both: a custom-payload interrupt is captured and returned through `getInterrupt<MyCustom>()` with the right shape; the type guards safely return `false` for `null`, `undefined`, primitives, empty objects, and wrong-discriminator objects. Validation: 41/41 HITL tests pass (4 new), 629 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:751 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🧹 chore: hoist single declaredType read in HITL decision loop Round 5 review NIT. The `allowedDecisions` enforcement guard (commit 21692f2) introduced `declaredDecisionType`, which read the exact same widened value as the existing `declaredType` further down. Two reads from the same object under different names with no functional difference — pure cosmetic redundancy. Hoisted one `declaredType` constant before both checks, removed the duplicate. The two callers (allowlist enforcement, unknown-type fallthrough) now share the single source so any future change to how the wire shape is widened lands in one place. Behavior unchanged. 41/41 HITL tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: apply updatedInput before ask + capture nullish interrupt payloads Two new Codex findings, both real correctness bugs: P1 — Hook returning `decision: 'ask'` + `updatedInput` silently dropped the rewrite. The `ask` branch in `dispatchToolEvents`'s PreToolUse loop pushed to `askEntries` and immediately `continue`d, skipping the `applyInputOverride` block below. Result: - The interrupt payload's `action_requests[i].arguments` surfaced the ORIGINAL args to the reviewer (e.g. unredacted secrets a sister matcher had rewritten). - On approve, the host event dispatched with the ORIGINAL args too — the policy rewrite was lost entirely. Fix: call `applyInputOverride` before queuing into `askEntries` when both signals are present. Real-world pattern: one matcher redacts/sanitizes, another matcher requires approval; both must take effect. New test seeds a hook returning `{decision: 'ask', updatedInput: {command: 'redacted-command'}}` against an original arg of `'original-secret'`, then asserts (a) the interrupt payload shows the redacted value, and (b) the host event dispatches with the redacted value after approve. P2 — Nullish interrupt payloads (`interrupt(null)` / `interrupt(undefined)`) silently downgraded paused runs to "completed." The detection block in `processStream` only stashed the interrupt when `first.value != null`; for nullish payloads, `_interrupt` stayed undefined, `getInterrupt()` returned undefined, the run looked like a natural completion, and the `Stop` hook fired. A custom node that pauses without metadata (the pause itself is the signal) was effectively invisible to the host's resume handling. Fix: capture unconditionally on any `__interrupt__` chunk; preserve the payload as-is (may be `null` / `undefined`). New test pins: a custom node calling `interrupt(null)` produces a defined `getInterrupt()` result with `payload === null`, and the Stop hook does NOT fire. Validation: 43/43 HITL tests pass (2 new), 631 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:751 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: scope halt signals per session + add tool_call_id to review_configs Two findings from a Round 6 review (the colleague review aligned with the latest Codex P1): P1 — Cross-run halt-signal bleed. `HookRegistry._haltSignal` was a single `HookHaltSignal | undefined` field. Hosts that share one registry across concurrent runs (a normal pattern — a global policy hook lives on a shared registry) would see `preventContinuation` from run A trip run B's stream-loop poll on the next iteration, silently terminating an unrelated run. Real correctness bug for multi-tenant / parallel-run hosts. Fix: - Replace `_haltSignal` with `haltSignals: Map<sessionId, HookHaltSignal>`. - `haltRun(sessionId, reason, source)` — first-write-wins per session. - `getHaltSignal(sessionId)` / `clearHaltSignal(sessionId)` — scoped reads and clears. - `executeHooks` passes its `sessionId` (= the run id every in-tree callsite already uses) into `haltRun`. Hooks fired without a sessionId can't raise a halt — there's no run for the loop to poll under, which is the correct semantic. - `Run.processStream` polls / clears with `this.id` everywhere (resetValues, both pre-stream early-return paths, the mid-stream loop poll, and the `finally` cleanup). New regression test runs two Run instances on a shared registry: a RunStart hook halts run A; run B's signal stays undefined throughout, B's processStream completes naturally, and A's session entry is cleaned up before B even starts. Optional nit also addressed: add `tool_call_id` to `ToolApprovalReviewConfig`. The previous shape only carried `action_name`, so a UI mapping `review_configs[i]` → `action_requests[j]` had to assume positional pairing — fragile when a batch contains the same tool called twice (different `tool_call_id`s, same `name`). Carrying `tool_call_id` lets the UI key by id directly, matching the same field already on the action_requests side. New test pins both sides for a duplicate-tool batch. Validation: 45/45 HITL tests pass (2 new), 633 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:751 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: address Round 7 comprehensive review (8 findings, 5 new tests) Three MAJOR fixes (real correctness gaps), three MINOR (refactor + coverage + observability), two NIT (docs + style). One MINOR (MemorySaver warning) deferred — see below. F1 (MAJOR) — `respond` decision bypassed truncation. Every other tool-output path runs through `truncateToolResultContent` to enforce `maxToolResultChars`, but the `respond` branch wrote `decision.responseText` raw into the ToolMessage. A user pasting a large document as a manual response could blow past the model context window. Fix: truncate up front and use the truncated value in the ToolMessage, the PostToolBatch entry, and the `dispatchStepCompleted` event so all three see the same final output. Test pins: 200-char response with `maxToolResultChars: 50` → ToolMessage and batch entry both carry the truncated value. F3 (MAJOR) — Session hooks leaked when interrupt + halt coexisted. A hook returning BOTH `decision: 'ask'` AND `preventContinuation: true` set both signals (interrupt captured, halt raised). The `finally` block's session-clear guard checked only `_interrupt == null || streamThrew`, so it preserved sessions even though the run was halted (no resume expected). Fix: widen the guard to `_interrupt == null || _haltedReason != null || streamThrew`. Test pins: hook returns ask + preventContinuation, both signals land on the Run, and session matchers are cleared. F2 (MAJOR) — `async: true` hook docs over-promised. The current implementation awaits the hook callback fully (subject to timeout) and only ignores its OUTPUT in the fold. The previous JSDoc said "the agent already moved on" implying detachment, which is false. Fix: rewrote the JSDoc to be honest about the semantic — the return value is ignored for influence, but the callback promise is still awaited. The fire-and-forget pattern requires the hook body to detach via `void promise.catch()` and return immediately (the SDK can't speculatively detach based on output shape it hasn't seen yet). Added a WRONG-pattern example showing the common mistake (`await` inside the body undoes the intent). F5 (MINOR) — Extracted two helpers from `dispatchToolEvents`: - `buildToolApprovalInterruptPayload(askEntries)` — pure module- scope function, takes the askEntries list, returns the `tool_approval` payload. Exposed `AskEntry` as a module-scope type so it can move out of the function body. - `dispatchPostToolBatchAndInjectContext({...})` — private async method on ToolNode, materializes batch entries in `toolCalls` order, fires `PostToolBatch`, and pushes the consolidated context HumanMessage. Cuts ~70 lines from the main function. Both extractions preserve behavior; existing tests cover them. F6 (MINOR) — Added direct tests for `Run.resume()` and `Run.getHaltReason()`. Previously the resume codepath was only exercised through `graph.invoke(new Command(...))` and getHaltReason had only the pre-stream `preventContinuation` test. New tests: - `Run.resume([{type:'approve'}], config)` end-to-end through the wrapper API after an interrupt. - `Run.getHaltReason()` returns `'PII detected'` when `UserPromptSubmit` denies the prompt with a reason. - `Run.getHaltReason()` returns `'prompt_requires_approval'` when `UserPromptSubmit` returns `decision: 'ask'` without a custom reason. F7 (MINOR) — `UserPromptSubmit` deny/ask now sets `_haltedReason`. Previously the only signal that landed in `_haltedReason` was `preventContinuation`; deny/ask returned `undefined` from processStream with no way for hosts to distinguish "blocked" from "completed naturally with no output." Fix: set canonical reasons (`prompt_denied`, `prompt_requires_approval`) when the hook doesn't supply one, surface the hook's `reason` when it does. F8 (NIT) — Added JSDoc on `HumanInterruptType` documenting the downstream consumer (LibreChat's wire types in `librechat-data-provider`) so the export isn't read as dead code. F9 (NIT) — Removed `// src/path/to/file.ts` header comments from the four new files (askUserQuestion, hitl/index, createToolPolicyHook, types/hitl). Per AGENTS.md "Avoid standalone `//` comments unless absolutely necessary." The descriptive block comments below them stay. Deferred: F4 (MemorySaver warning). The `humanInTheLoop` JSDoc already documents "production hosts should always provide a durable checkpointer." A console.warn per Run.create would push hosts to opt out, defeating the deliberate default-on contract. If observability becomes a real issue, a debug-gated log via `emitAgentLog` is the right shape — but that needs a logger plumb-through and is out of scope for this PR. Validation: 50/50 HITL tests pass (5 new), 638 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:808 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: guard malformed respond decisions + cover prompt_denied canonical default Codex P1: malformed respond payloads (no responseText, or non-string responseText) crashed `truncateToolResultContent` on `content.length` — turning a fail-closed approval path into a hard run failure for one bad wire payload. Hosts deserialize resume payloads from untyped JSON, so the SDK can't trust the shape. Fix: validate `responseText` is a string BEFORE passing to truncation. If missing or non-string, route through `blockEntry` with a diagnostic that includes the actual offered shape (`<missing>` or the typeof) so the host can debug. Two regression tests pin both cases: - `{ type: 'respond' }` (no field) → blocked, error mentions `'<missing>'`, host event never dispatched - `{ type: 'respond', responseText: 42 }` → blocked, error mentions `'number'` Also addresses Round 7 follow-up review N2 (NIT, marked "not worth filing" by the reviewer): added an explicit test for the `prompt_denied` canonical default — the existing test only covered the custom-reason path. Now both the custom and canonical paths are explicitly pinned for both `decision: 'deny'` and `decision: 'ask'`. Skipped follow-up review N1 (pre-existing `// src/run.ts` header) — F9 was scoped to new files only; the reviewer agreed the scope is intentional. Validation: 53/53 HITL tests pass (3 new), 641 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: validate edit payloads + correct HITL default doc + scope direct-tool gap Three findings from Round 8 review (Codex's 5; 1 stale, 2 architectural deferred per the reviewer's own option, 2 fixed here + 1 doc): P2 (Codex new) — Malformed `edit` decisions weren't validated. Same trust boundary as the `respond` fix in 6a93e55: hosts deserialize resume payloads from untyped JSON, and `{ type: 'edit' }` (no `updatedInput`), `{ type: 'edit', updatedInput: 'string' }`, or `{ type: 'edit', updatedInput: [...] }` would feed garbage into `applyInputOverride` and silently approve a tool with the wrong-shape args. Fix: typeof + null + Array.isArray check before approving; fail-closed via `blockEntry` with a diagnostic that includes the offered shape (`<missing>` / `null` / `array` / typeof). Three regression tests pin all three malformed cases. Refactored both `respond` and `edit` diagnostic-formatting code into a single `describeOfferedShape(value)` module-scope helper to drop the nested-ternary lint warnings the new check introduced and keep the diagnostic strings consistent across both decision branches. P3 (Codex new) — `ToolNodeOptions.humanInTheLoop` JSDoc still described the pre-default-on contract ("undefined keeps fail-closed"). Updated to mirror `RunConfig.humanInTheLoop`'s default-on language and explicitly note the event-driven-only scope (the same caveat already documented on the parallel `hookRegistry` field). Architectural P2s deferred — Codex's reviewer explicitly offered this path: "If this PR is only meant to unblock LibreChat's event-driven tool approval path, they can be documented as out of scope." - Direct tools (those routed through `directToolNames` like handoffs/subagents) bypass HITL entirely. - Mixed direct+event batches re-execute direct tools on resume because LangGraph rolls back the entire ToolNode on `interrupt()`. Both are real but require restructuring `ToolNode.run` to defer direct execution until after HITL approval — non-trivial and better done as a focused follow-up. Documented explicitly in `HumanInTheLoopConfig`'s JSDoc with practical implications for hosts: LibreChat-style event-driven hosts get the full surface; direct-tool hosts need to switch to event-driven mode or accept that direct tools aren't approval-gated; mixed batches with side-effecting direct tools should not be combined with approval-gated event tools. Skipped: P1 (malformed respond) — already fixed in 6a93e55. Codex just hadn't re-evaluated yet. Validation: 56/56 HITL tests pass (3 new), 644 across hooks/tools/ summarization, typecheck clean, lint clean (only pre-existing ToolNode.ts:826 warning untouched), build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🛡️ fix: preserve Graph sidecars across HITL interrupt + resume Codex P1: after a clean HITL interrupt, `processStream` was wiping the sidecars `Run.resume()` needs: - `Graph.resetValues()` on the next invocation cleared `toolCallStepIds`, `stepKeyIds`, `messageStepHasToolCalls`, accumulated `messages`, etc. - `Graph.clearHeavyState()` in the prior `finally` cleared `toolCallStepIds`, `_toolOutputRegistry`, `sessions`, `hookRegistry`, `humanInTheLoop`. Result: when LangGraph re-entered ToolNode on resume, the lookup in `dispatchStepCompleted`'s `toolCallStepIds.get(toolCallId) ?? ''` returned `''`. The host then dispatched `ON_RUN_STEP_COMPLETED` with an empty step id, and `stream.ts` dropped the completed tool result. Specifically broken for LibreChat's stream-resume story where the resumed approval path needs to complete the already-rendered tool step. The existing HITL tests didn't catch this because they replace `run.graphRunnable` with a custom graph whose ToolNode owns its own prefilled `toolCallStepIds` Map — disjoint from the SDK Graph's, so the SDK Graph's clear didn't observably affect the test ToolNode's lookups. Two gates added: 1. `processStream` finally: skip `Graph.clearHeavyState()` when `_interrupt != null && _haltedReason == null && !streamThrew` — i.e., a clean interrupt awaiting resume. Halt-driven and error-driven paths still clean up because no resume is expected. 2. `processStream` entry: skip `Graph.resetValues()` when entering via `Command` (resume). We're continuing an in-flight run, not starting fresh — the sidecars must survive the boundary. Cross-process resume (host rebuilds Run from scratch) is a separate concern handled host-side; documented in `HumanInTheLoopConfig` JSDoc earlier. This fix covers the same-Run-instance pause-and-resume path that's the SDK's primary contract. Two regression tests, paired: - "preserves Graph sidecars across HITL interrupt + resume" wires the test ToolNode to share `run.Graph.toolCallStepIds` by reference (mirroring how `StandardGraph` builds its inner ToolNode at Graph.ts:587). Pre-populates the map inside the agent node (mirroring `attemptInvoke`'s timing). Asserts the entry survives both the interrupt finally AND the resume entry, AND that the resumed dispatch fires `ON_RUN_STEP_COMPLETED` with the real step id (not `''`). - "clears Graph sidecars on natural completion" pins the negative case: a non-interrupt run still triggers `clearHeavyState`, so this gate doesn't accidentally leak memory across runs. Verified the regression tests catch the bug pre-fix by stashing `run.ts` and re-running: the preservation test fails on `expect(toolCallStepIds.has('call_1')).toBe(true)` exactly as the reviewer described. Validation: 58/58 HITL tests pass (2 new), 646 across hooks/tools/ summarization, typecheck/lint/build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * 🚦 chore: flip HITL default to OFF until host UI ships Hosts (notably LibreChat) don't yet render `tool_approval` interrupts, so a default-on HITL surface would let `ask` decisions pause the run with no resolver — surfaces to end users as a hung tool-call card. Flipping the default to OFF keeps existing hosts on the pre-HITL fail-closed path until they're ready to wire the resume UI. Hosts opt in explicitly with `humanInTheLoop: { enabled: true }`. The plan of record (documented on `HumanInTheLoopConfig`) is to flip the default back to ON in a future minor once the consumer ecosystem is ready end-to-end. - `Run.applyHITLCheckpointerFallback`: only attach the in-memory `MemorySaver` when `enabled === true`. Omitted/false leaves `compileOptions.checkpointer` untouched. - `ToolNode` ask-decision branch: collapse `ask` into a synchronous block (the pre-HITL behavior) unless `enabled === true`. - JSDoc on `HumanInTheLoopConfig`, `RunConfig.humanInTheLoop`, and `ToolNodeOptions.humanInTheLoop` rewritten to lead with default-off semantics + the migration plan. - Tests: rename "default-on" assertion to "default-off blocks", and flip the host-checkpointer-preservation test to thread `enabled: true` explicitly (since default-off no longer engages the fallback path). Bump to 3.1.75-dev.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#9 P1 — dangerous-command gate missed quoted targets ---------------------------------------------------- `validateBashCommand` checks `dangerousCommandPatterns` against `normalized` — the post-`stripQuotedContent` form. That pass blanks the contents of every quoted span, so destructive targets that the LLM (accidentally or not) wraps in quotes — `rm -rf "/"`, `rm -rf "$HOME"`, `chmod -R 777 "/"` — slip past the regex even though they execute identically to the bare form. Real safety regression on the default-on guard. Fix: split the destructive check into two passes. 1. The existing `dangerousCommandPatterns` continue to run against `normalized`. Same false-positive defence (`echo "rm -rf /"` stays valid because the destructive text is wrapped by the OUTER quote pair, not by quotes around the path argument). 2. New `quotedDestructivePatterns` run against the ORIGINAL command string. Each pattern explicitly REQUIRES a matching quote pair around the destructive path (`"/"`, `"$HOME"`, `'/'`, etc.). `echo "rm -rf /"` doesn't match — the `/` isn't surrounded by quotes there, only the whole `rm -rf /` string is. Tests: `codex review fixes (round 4) › quoted destructive targets` covers `"/"`, `"$HOME"`, `'/'`, `chmod -R 777 "/"`, no-regression on bare forms, and the `echo "rm -rf /"` no-false-positive case. #10 P1 — workspace policy hook compared lexical paths only ---------------------------------------------------------- `createWorkspacePolicyHook` did `isAbsolute(p) ? resolve(p) : resolve(root, p)` and called it done. A symlink inside the workspace pointing outside (e.g. `workspace/escape → /etc/secret`) lexically appears under `workspace/` and got auto-allowed — even though the agent reading through it touches a path the policy meant to gate. Critical when the hook is the primary gate: the documented "ask the user" setup turns `allowReadOutside: true` so the file tools' own clamp is off, leaving the hook as the sole defence. Fix: the hook now realpaths both the candidate and the workspace roots before comparing. Roots are pre-realpath'd once at construction (memoized via a Promise so the per-call cost is one realpath) and candidate paths get the same `realpathOfPathOrAncestor` walk `resolveWorkspacePathSafe` already uses — so paths that don't yet exist (e.g. `write_file` to a brand-new path) still get checked against their nearest existing ancestor's realpath. Two test cases pin both halves: - `workspace/escape → extra/secret.txt` — hook DENIES read of `escape` even though it's lexically inside. - `extra/alt-mount → workspace/` — hook ALLOWS read of `extra/alt-mount/file.ts` because the realpath lands inside the workspace root (covers the alternate-mount case so we don't over-correct). Tests: 1272 passing (was 1264; +8 across both fixes).

@deprecated

, #5, #7, #9, #10, #11) Comprehensive-review audit findings, fixed in one pass: #1 (attachments full-file MIME sniff): switch classifyAttachment to open() + 12-byte header read for sniffing; only readFile() the full buffer when we're about to embed. A 9 MB image now allocates 12 bytes for sniffing instead of 9 MB. #3 (fallback walker SKIP_DIRS too small): expanded the Node-fallback walker's skip set to cover dist/build/.next/.cache/__pycache__/ .venv/venv/target/vendor/coverage/.tox/.mypy_cache and friends. ripgrep itself respects .gitignore so this only affects the fallback. #5 (dynamic fs/promises import): replaced two `await import('fs/ promises').then(...)` calls in CompileCheckTool with a static `import { readFile, stat } from 'fs/promises'` per AGENTS.md. #7 (test-only resets in public API): added @internal JSDoc to all three `_reset*ForTests` functions so IDE autocomplete signals they aren't part of the SDK surface. (The leading underscore was convention-only; @internal is the standard tag.) #9 (legacy fields missing @deprecated): added @deprecated tags to `LocalExecutionConfig.cwd` and `allowOutsideWorkspace` pointing to their workspace.* replacements. #10 (dead lexicallyInside compute in workspace policy hook): dropped the unused variable. Realpath is the source of truth for both the symlink-escape and alternate-mount cases; the lexical check provides no information so we don't pay for it. #11 (voided unused imports in pi comparison script): removed the `void readdir; void cp;` workaround and the dead imports they were silencing. Bonus: also addressed the comprehensive review's #2 (bitwise OR) by verifying it was a misread — the code uses `||`, not `|`. Manual review's reading was wrong; no fix needed. Larger findings addressed: #4 (no editStrategies tests): added `src/tools/local/__tests__/editStrategies.test.ts` with 12 cases covering exact / line-trimmed / whitespace-normalized / indentation-flexible match boundaries, start/end-of-file matches, unicode content, multi-line needles spanning blank lines, ambiguous-match rejection, and applyEdit semantics. #12 (no FileCheckpointer tests): added `src/tools/local/__tests__/FileCheckpointer.test.ts` with 6 cases covering capture+rewind happy path, idempotent capture, absent-file rewind (deletion), multi-file rewind, oversize-file tracking-but-not-snapshotting, and rewind across a deleted parent directory. Performance: #8 (lineWindow split-whole-file): replaced `content.split('\\n')` in `lineWindow` with a newline-walk that's O(start + limit) instead of O(file). For a 10 MB file with `offset: 1, limit: 10` this drops from "split the whole file into millions of strings" to ~10 indexOf calls. Falls back to the simple split when no limit is supplied. Deferred: #6 (spill files never cleaned) needs a Run-lifecycle hook to run cleanup; tracked separately rather than fixed inline.

Follow-up audit on commit ebf8b6a flagged 4 nits/minors. All valid: F1 (NIT): the lineWindow perf fix introduced exactly the same `void <var>` pattern that finding #10 had me remove from createWorkspacePolicyHook. Embarrassing — dropped the dead `line` variable + `line++` + `void line`. F2 (MINOR): the editStrategies test commit message claimed coverage of "ambiguous-match rejection" but no test actually exercised the multi-match → null path. Added the missing case for the exact strategy. Noted in a comment that the looser strategies in the chain may still resolve duplicates unambiguously by falling through; whether that's correct is a separate design call. F3 (NIT): the new FileCheckpointer.test.ts had a `import('fs/ promises')` inside a catch block when `mkdir` was already available via the static import at the top. Static-imported `mkdir` and restructured the test to mkdir-then-writeFile unconditionally (the catch-and-retry pattern was unnecessary defensive plumbing). F4 (NIT): the unused-imports cleanup missed `sum`/`void sum` in the comparison harness. Removed the function and its void.

* feat: add local execution engine * feat(local-engine): hardening + ergonomics follow-ups Adds the follow-up items flagged in the design comparison so the local engine ships with a defensible default posture. Security & correctness - Bridge bearer token: per-Run 256-bit token; programmatic Python and bash callers send `x-librechat-bridge-token`; server uses `timingSafeEqual`. Stops cross-process listeners on the same loopback port from invoking SDK tools. - Symlink-aware workspace path: new `resolveWorkspacePathSafe()` walks up to the nearest existing ancestor, realpaths it, and rejects containment escapes — covers `write_file` to a brand-new path without false positives on `/tmp -> /private/tmp` style ancestors. - Read tool guards: stat-cap (default 10 MiB, configurable via `local.maxReadBytes`), NUL-byte binary detection on the first 8 KiB before reading the rest, returns descriptive stubs. - Local engine warns once per process when it runs without `@anthropic-ai/sandbox-runtime` wrapping so the no-sandbox default stays loud in CI/server logs. Validation - New `bashAst` config: `'off' | 'auto' | 'strict'` heuristic categorical-hazard pass over the quote-stripped command (command substitution, zsh privileged builtins, `/proc/<pid>/environ` reads, `IFS=` injection, hex-escape obfuscation, `source $VAR`, plus `eval`/`exec` denied in strict). Forward-compatible name for a future tree-sitter-bash AST drop-in. Ergonomics - File checkpointing: `LocalFileCheckpointerImpl` snapshots pre-write contents for `write_file`/`edit_file` and rewinds them on demand. `createLocalCodingToolBundle` returns the checkpointer alongside the tools when `local.fileCheckpointing: true`. - Pluggable spawn seam: `LocalExecutionConfig.spawn?: LocalSpawn` lets callers route every command through SSH/containers/remote runners without forking the engine. - Ripgrep fallback: `grep_search` and `glob_search` now probe for `rg` once per process and fall back to a Node-side walker + JS regex when ripgrep isn't on PATH; the old hard requirement is gone. Tests - 20 new unit tests covering AST findings, sandbox-off warning latch, checkpointer round-trip and file deletion, binary-file refusal, oversize stub, symlink escape, ripgrep fallback, and bridge auth rejection. Existing 109 tests still pass. * feat(toolnode): fire PreToolUse/PostToolUse/PostToolUseFailure on direct path Before this change, in-process tools added to `directToolNames` (every graphTool with a real implementation, including the new local-engine tools) skipped the hook lifecycle entirely. Only schema-only tools that round-tripped through the host event-dispatch path actually fired PreToolUse/PostToolUse/PostToolUseFailure/PermissionDenied. That meant `createToolPolicyHook`, the new HITL `'ask'` flow, and PostToolUse output rewriting all silently no-op'd on direct tools — including every `bash`/`code`/`edit_file`/`write_file` from the local engine. This adds `runDirectToolWithLifecycleHooks(call, config, batchContext)` which wraps `runTool` with the same hook semantics `dispatchToolEvents` uses for the single-call case, and routes the three direct-call sites in `run()` through it. Behavior: - Fast path: when the registry has none of PreToolUse / PostToolUse / PostToolUseFailure registered for this run, falls through to `runTool` with zero overhead. No regression for callers without hooks. - PreToolUse: pre-resolves args (matching the event path), fires the hook with resolved args. - `decision: 'deny'` → synthesizes a Blocked error ToolMessage and fires PermissionDenied (observational). - `decision: 'ask'` → fail-closed deny regardless of `humanInTheLoop.enabled`. The event path handles 'ask' via LangGraph's interrupt(), which requires restructuring the direct invoke path; until that lands, returning 'ask' on a direct tool blocks the call (strict improvement vs. ignoring the hook entirely). A one-time warning fires when HITL is enabled, so hosts notice the gap. - `updatedInput` → applied to the call before runTool; runTool's placeholder resolution is idempotent on already-resolved args. - PostToolUse: when runTool returns a non-error ToolMessage, fires PostToolUse and applies `updatedOutput` (preserving id/name/status). - PostToolUseFailure: when runTool returns a status='error' ToolMessage, fires the failure hook (observational); the original error continues to propagate. PostToolBatch participation across direct + dispatched outcomes is intentionally out of scope: the batch hook accumulates inside dispatchToolEvents and is fired at the end of its scope, and merging direct-call outcomes into that aggregation crosses the two paths' result-shaping logic. Left as a follow-up. 8 new unit tests in directToolHooks.test.ts cover deny+PermissionDenied, ask fail-closed, allow, updatedInput, updatedOutput, PostToolUseFailure, no-hooks fast path, and a mixed batch where one direct call denies and the sibling runs. Total non-API test count: 1163 -> 1171 passing. * chore(scripts): add live-API smoke scripts for the local engine Four scripts that exercise the local execution engine end-to-end with a real LLM provider (matching the convention in src/scripts/*). Each spins up a temporary workspace, runs the graph, then dumps observable state so the behavior under test is visible in the console. - local_engine.ts — baseline e2e: bash + write_file + read_file + edit_file + list_directory in a single multi-step task. Sets `bashAst: 'auto'` so the categorical-hazard pass is exercised. - local_engine_ptc.ts — `run_tools_with_code` (Python) calling write_file / read_file / edit_file through the localhost bridge in a single call. The bridge token auth (timingSafeEqual) is on by default for every Run, so this also exercises that path. - local_engine_hooks.ts — direct-path hook integration: registers a `createToolPolicyHook` that denies write_file / edit_file plus an explicit PostToolUse that prefixes every successful tool result with "[reviewed]". Asks the model to write then read; the write must be blocked, the read must come back tagged. Also asserts blocked.txt does not land on disk. - local_engine_checkpointer.ts — builds the local coding tools via `createLocalCodingToolBundle({ fileCheckpointing: true })` so the host keeps a handle on the checkpointer, lets the model edit two seeded files, then calls `rewind()` and verifies originals come back. Exits non-zero if rewind didn't restore both files. npm scripts added for each: `local`, `local:ptc`, `local:hooks`, `local:checkpointer`. All four typecheck clean (`tsc --noEmit`). * fix(graph): pass hookRegistry on the non-event-driven path Live integration test (`npm run local:hooks`) surfaced that hooks weren't actually firing on local-engine tools when used with a plain `graphConfig: { type: 'standard' }` (no `agentContext.toolDefinitions`). Reason: `Graph.ts` only forwarded `hookRegistry` and `humanInTheLoop` to ToolNode in the event-driven branch — the legacy non-event-driven branch dropped both, so the policy hook silently no-op'd and a `write_file` that the script's `createToolPolicyHook` was supposed to deny landed on disk. Two changes: 1. `StandardGraph.createToolNode` now passes `hookRegistry` and `humanInTheLoop` in BOTH branches. Hooks are conceptually orthogonal to whether tools dispatch as events or invoke in process; only the event path was historically wired. 2. `ToolNode.run`'s non-event-driven `else` branch now invokes `runDirectToolWithLifecycleHooks` instead of `runTool` directly, so PreToolUse / PostToolUse / PostToolUseFailure / PermissionDenied fire the same way they do on the event-driven mixed-batch path. The helper still falls through to `runTool` on the no-hooks fast path, so callers without hooks pay zero overhead. Verified by re-running `npm run local:hooks`: - PermissionDenied fired 1 time(s) for write_file with the policy reason - PostToolUse saw read_file (the allowed read got the [reviewed] tag) - blocked.txt landed on disk? false (expected: false) Also re-ran the full non-API jest suite — 1171 passing, no regressions. * feat(local-engine): coding-tool parity push (fuzzy edit, diff, BOM/EOL, overflow) Live cross-comparison against claude-code, pi-mono, and opencode (sst) surfaced four meaningful gaps in the foundation tools. This commit closes them; LSP integration and apply_patch (multi-file) are larger and stay as documented follow-ups. Edit fuzzy matching ------------------- Our edit_file required exact `oldText` match, so a single mistyped space would force the LLM to re-read and retry. New `editStrategies.ts` walks a four-stage chain — exact, line-trimmed, whitespace-normalized, indentation-flexible — stopping at the first strategy that locates EXACTLY ONE match. The matched on-disk slice is then literally replaced with `newText`; we never modify newText. The strategy that fired is surfaced in the tool result's `strategies` metadata so callers can see when fuzzy fallback kicked in. Inspired by opencode's nine-strategy chain; kept to the four highest-yield strategies for a first cut, room for more (block-anchor + Levenshtein etc.) as needed. Unified-diff output on edit/write --------------------------------- `edit_file` and `write_file` now embed a unified diff in the ToolMessage so the model sees what actually changed instead of the prior `Applied 1 edit(s)` blob. Uses npm `diff`'s `createTwoFilesPatch` with 3 lines of context, capped at 4 KB so large rewrites don't blow context. BOM + line-ending preservation ------------------------------ New `textEncoding.ts`: - `decodeFile` strips and remembers UTF-8 BOM and CRLF/LF newline. - `encodeFile` reapplies them on write. `edit_file` / `write_file` now read the file, decode to LF + remember, operate, and re-encode on write. Verified by tests: editing a CRLF file keeps CRLF; overwriting a BOM file keeps the BOM. Bash output overflow → temp file -------------------------------- `spawnLocalProcess` previously truncated overflowing output in place. Now: when total bytes exceed 2× `maxOutputChars`, the FULL stdout/stderr is written to a temp file (`/tmp/lc-local-output-*.txt`) and the path is appended to the formatted output as `full_output_path: …`. The model can then `bash tail -n N <path>` (or `grep`) to inspect specifics it actually needs. Mirrors pi-mono's and opencode's overflow handling. New deps -------- - `diff@9` and `@types/diff@7` — the canonical TS diff lib opencode uses too, MIT-licensed. Tests ----- Added unit tests for each new behavior — 5 new cases for fuzzy matching, diff embedding, CRLF preservation, BOM preservation. Ran the full non-API jest suite: 1176 passing (was 1171). Ran all four live integration scripts (`local`, `local:hooks`, `local:checkpointer`, `local:ptc`) end-to-end against the real Anthropic API; all four green and the diff payload now shows up in the model-visible tool results. * feat(local-engine): inline image/PDF attachments in read_file Closes the last "Close now" parity gap: read_file can now return image and PDF files as inline `MessageContentComplex[]` content blocks so vision-capable models actually see the bytes, instead of getting our old "binary file" stub. New module ---------- src/tools/local/attachments.ts - Magic-byte sniff for the five formats we care about (PNG, JPEG, GIF, WebP, PDF). Inlined the signatures rather than pulling in `file-type` (ESM-only, awkward under ts-jest); LibreChat's api/server/utils/files.js takes the same approach but uses the package since it's a CJS-OK Node server. - Returns a typed Attachment union that the read tool branches on: image -> image_url block, pdf -> image_url block (for Anthropic's PDF-as-data-url path), oversize -> stub, binary -> stub, text-or-unknown -> falls through to the existing line-window read. - Configurable via two new fields on `LocalExecutionConfig`: - `attachReadAttachments`: 'off' (default) | 'images-only' | 'images-and-pdf'. Defaults `'off'` to preserve current behavior; hosts opt in when their model is vision-capable. - `maxAttachmentBytes`: pre-encoding size cap (default 5 MiB) that bounds the post-base64 token cost. Plumbing -------- - read_file branches on attachment.kind. For an image, it returns `[ [textBlock, imageUrlBlock], { artifact } ]`. LangChain's Anthropic adapter (verified at node_modules/@langchain/anthropic/dist/utils/message_inputs.js) preserves image_url blocks inside `tool_result` content arrays; OpenAI Chat Completions accepts them on vision models too. The Responses API flattens to text on the wire (function_call_output is string-only) which is the safe degrade. - The textBlock that goes alongside the image_url carries the filename/mime/byte count so non-vision models still get useful signal. Tests ----- 4 new unit cases: - default behavior: still returns the binary stub - images-only: returns array content with `image_url` data URL + artifact { mime: 'image/png', attachment: 'image' } - oversize: caps embedding past `maxAttachmentBytes` -> "Refusing to embed" stub - text files unaffected when embedding is enabled 29 LocalExecutionTools tests pass; 1180 non-API tests overall. Live verification ----------------- New `npm run local:image` script: 1. Copies a real PNG (the macOS Certificate Assistant droppedImage) into a temp workspace. 2. Asks the agent to `read_file` it and describe what's in it. 3. Inspects the run's ToolMessage to confirm the content is `[textBlock, imageBlock]` with a `data:image/png;base64,...` URL. Live run against real Anthropic Sonnet: ToolMessage(name=read_file, content=[text,image_url] url=data:image/png;base64,iVBORw0KGgo…) Image attachment landed in tool result: true ✔ ASSISTANT: "The image shows a blank certificate template with a light blue border. The word 'Certificate' appears at the top in an elegant, cursive script font. There's a decorative flourish or underline beneath the text, and a gold/yellow seal or badge decoration in the lower right portion..." That's the actual content of the source PNG — vision is wired through end-to-end. * feat(local-engine): post-edit syntax check + compile_check tool The two cheap "tell-the-agent-when-it-broke-the-file" alternatives I called out as the pragmatic substitute for full LSP integration. Both are opt-in. 1. Post-edit syntax check ------------------------- New `local.postEditSyntaxCheck: 'off' | 'auto' | 'strict'` (default `'off'`). After every successful `edit_file` / `write_file`, runs a fast per-file syntax checker keyed on extension and appends any error to the tool result so the model self-corrects on the next turn without a separate read round-trip: - `.js` / `.mjs` / `.cjs` / `.jsx` -> `node --check <file>` - `.py` / `.pyw` -> `python3 -m py_compile` - `.json` -> `JSON.parse` (in process) - `.sh` / `.bash` -> `bash -n <file>` `'auto'` appends `[syntax-check warning ...]` to the tool result and returns success. `'strict'` throws the error so the next turn must react. Each checker probes once for tool availability and silently skips if missing. TypeScript per-file syntax check is intentionally not in this list: per-file `tsc` is slow and only catches a small fraction of TS issues without type information. Use `compile_check` (below) for that. 2. compile_check tool --------------------- New tool auto-bound when `engine: 'local'` and `includeCodingTools` is on. Lets the agent ask "did my change break anything?" without us shipping a real LSP client. Auto-detection (first hit wins): - tsconfig.json or package.json[typescript] -> `npx --no-install tsc --noEmit` - Cargo.toml -> `cargo check --message-format=short` - go.mod -> `go vet ./...` - pyproject.toml/setup.py with mypy -> `python3 -m mypy .` - pyproject.toml/setup.py without mypy -> `compileall .` - none of the above -> tells the agent "no toolchain" Honours `local.compileCheck.command` and `compile_check.command` arg overrides; honours `local.compileCheck.timeoutMs` (default 120s). Output is truncated through the standard local-engine output cap and spills overflow to a temp file like the bash tool does. Returns exit-code-aware `PASSED`/`FAILED` headline + body. Tests ----- 8 new unit cases: - JS/JSON broken/valid round-trip - returns null for unknown extensions - write_file appends `[syntax-check warning ...]` in `auto` - write_file in `strict` throws - compile_check reports "no recognised project marker" in an empty workspace - compile_check honours an explicit command override and reports exit codes 37 LocalExecutionTools tests pass; 1188 non-API tests overall. Live integration ---------------- New `npm run local:compile` script that: 1. Seeds a tiny TS workspace (tsconfig.json + index.ts). 2. Asks Anthropic Sonnet to write a broken JS file (post-edit syntax-check warns) -> fix it -> write a broken TS file with a real type error -> call compile_check (must FAIL with `error TS2322`) -> fix it -> call compile_check again (must PASS). Live run, end-to-end: - `[syntax-check warning via node --check]` appended to write_file - First compile_check: FAILED via `npx --no-install tsc --noEmit` (exit=2): `broken.ts(1,14): error TS2322: Type 'string' is not assignable to type 'number'.` - Second compile_check (after edit_file fix): PASSED - Both assertions ✔. Other live scripts (local, local:hooks, local:checkpointer, local:ptc, local:image) still green. * chore(scripts): side-by-side comparison harness vs pi-mono `npm run compare:pi` runs four identical tasks against pi-mono's CLI and our local engine in parallel temp workspaces using the same model (claude-sonnet-4-5 by default). Captures: tool-call shape, wall time, input/output/cache tokens, and verifies the workspace ended in the expected state. Tasks ----- - T1 simple-edit : single literal substitution in greet.py - T2 fuzzy-edit : edit a file with trailing whitespace + tabs; exercises both agents' fuzzy-match recovery - T3 syntax-error-fix : pre-seeded broken JS; ours has post-edit syntax check, pi has to discover via bash - T4 type-error-fix : pre-seeded TS file with a real type error in a tiny tsconfig project; ours has compile_check, pi has bash + tsc Pi runner spawns `node $PI_BIN --print --mode json --no-session --provider anthropic --model <model>` and parses its JSON event stream. Ours uses Run.create with `engine: 'local'` and walks the final conversation. PI_BIN defaults to ~/Projects/pi-mono/packages/coding-agent/dist/cli.js. Live results (real Anthropic, Sonnet 4.5) ----------------------------------------- task metric pi ours T1 simple-edit verify ✔ ✔ wall 8.4s 11.0s tool calls 2 2 output tok 313 281 T2 fuzzy-edit verify ✔ ✔ wall 16.1s 7.2s tool calls 2 2 output tok 384 246 T4 type-error-fix-loop verify ✔ ✔ wall 23.5s 13.6s tool calls 6 4 output tok 805 397 T3 syntax-error-fix verify ✔ ✔ wall 16.2s 11.7s tool calls 3 4 (note: ours used 1 bash, pi used 2) output tok 371 503 Both agents 4/4 verify ✔. Ours wins wall-time on 3/4 tasks, output tokens on every task, and tool-call count on T4 (compile_check short-circuits the bash+read+bash loop pi did). Open follow-up surfaced by the harness: pi reports thousands of cacheRead tokens per turn (Anthropic prompt cache rebate), ours reports zero. We're paying full rate for the system+tools prefix on every turn. Not a regression introduced here, but worth its own PR to wire prompt caching on the Anthropic adapter. * chore(scripts): compare:pi — caching enabled, +2 tasks, +variance Updates to the side-by-side harness based on first-run findings. 1. Anthropic prompt caching wired on our side ----------------------------------------------- The first comparison reported ours' cache_read=0 — pre-existing PR clarification: caching IS supported (`addCacheControl` + agent context's `hasAnthropicPromptCache()`), but it's gated on `clientOptions.promptCache === true`. The harness's first cut set that on `graphConfig.clientOptions`, but `Run.createLegacyGraph` (src/run.ts:156) destructures `llmConfig` and rebuilds `agentConfig.clientOptions` from it — `graphConfig.clientOptions` is silently dropped on the standard path. So in this harness we set `promptCache: true` on the llmConfig itself. After the fix, ours reports real cache_read / cache_creation per turn. Caching works. 2. New tasks ------------ - T5 multi-file-rename : rename `calc_total` → `calculateTotal` across src/lib.ts, src/index.ts, src/index.test.ts. Tests how each agent finds + applies the rename. - T6 image-read-and-describe (ours-only): copies a real PNG into the workspace (macOS Certificate Assistant icon) and asks the model to describe it. Exercises our `attachReadAttachments: 'images-only'` end-to-end. pi has no equivalent and is skipped via the new `skip: 'pi'` field. 3. Variance support ------------------- COMPARE_ITERS=N runs each task N times per side and reports the mean. Skipped tasks count cleanly (no false N/A entries). The verify check now force-fails when a runner errored, so a soft verify (e.g. T6's "file-still-on-disk" check) can't mask a real provider rejection. 4. Cost computation ------------------- We now compute our cost from the same per-turn breakdown using Sonnet 4.5 pricing ($3/$15 per Mtok input/output, $3.75 cache write, $0.30 cache read), so the cost columns are directly comparable. Live results (N=2) ------------------ Functional: pi 10/10 ✔, ours 12/12 ✔ (T6 ours-only adds 2). Wall time: ours wins all 5 shared tasks (~30% faster on average). Tool calls: ours fewer or equal on every shared task. Output tokens: ours fewer on every shared task. Cost: ours is ~6× more expensive per task even with caching on. Why the cost gap survives caching --------------------------------- Per-turn "input new" tokens: pi (T1): 35 ours (T1): 28298 pi (T4): 76 ours (T4): 48630 Pi only sends the new turn payload (~35 tokens); ~the entire system prefix and tool defs are cache-hits. Ours sends ~28k tokens of new input per turn even with caching on, which strongly suggests our cache prefix isn't covering tool definitions, OR the breakpoint position isn't matching across turns. Worth a dedicated PR to tune the cache breakpoints in `addCacheControl` / `AgentContext.systemRunnable` so tool defs sit inside the cached prefix; this will close the cost gap considerably. The harness made this gap impossible to miss — the entire reason for the cross-comparison. * feat(cache): tool-prefix cache breakpoint + harness accounting fix Two related changes; second one was the bigger win. 1. Tool-prefix cache breakpoint (Anthropic) ------------------------------------------- New `partitionAndMarkAnthropicToolCache` helper at src/messages/anthropicToolCache.ts. When `clientOptions.promptCache === true` and the provider is Anthropic, we now: - Stable-partition the bound tool list into [static, deferred] using `defer_loading` from `agentContext.toolDefinitions`. - Stamp `cache_control: { type: 'ephemeral' }` on the LAST static tool via the `extras` field LangChain's Anthropic adapter forwards (`AnthropicToolExtrasSchema`). This way, the cached prefix covers exactly the static tool inventory. Discovered deferred tools that arrive across turns sit *after* the breakpoint and don't invalidate the prefix. The wrap is non-mutating (clones with `Object.create(getProto…)` so the original tool reference is untouched) and idempotent. 10 unit tests in `src/messages/__tests__/anthropicToolCache.test.ts` cover partition order, stamp behaviour, prototype preservation, extras merging, and idempotency. Wired in `Graph.ts` right after `resolveLocalToolsForBinding`, gated on Anthropic + `promptCache: true`. In practice the impact is smaller than I expected because `addCacheControl` on the last user message ALREADY creates a cache breakpoint that covers the tools+system prefix as part of the cumulative cached prefix. The tool-level breakpoint becomes load- bearing in cases where: - A turn arrives without a fresh user message (resume after tool call only), - The user message would otherwise blow the cache breakpoint, or - Cache budget needs to be split across multiple breakpoints to fit inside Anthropic's 4-breakpoint cap. Still correct; still worth shipping; just not a 6× win on its own. 2. Harness accounting fix (the actual ~6× swing) ------------------------------------------------ `compare:pi`'s previous run reported ours' input tokens as ~28k–48k per turn vs pi's ~35–80, suggesting a 6× cost gap that survived caching. That number was wrong. LangChain's Anthropic adapter at src/llm/anthropic/utils/message_outputs.ts:31 reports `usage_metadata.input_tokens` as the SUM of uncached input + cache_creation + cache_read — i.e., the total prompt size, NOT the uncached portion. Pi reports `usage.input` as uncached only. So our "input new" column was double-counting cached tokens. The harness now subtracts `cache_read + cache_creation` from `input_tokens` when computing our "input new", so the column is apples-to-apples with pi's `input` field. Recomputed live results (N=2, caching on, real Anthropic): task pi cost ours cost delta T1 simple-edit $0.0158 $0.0171 +8% T2 fuzzy-edit $0.0157 $0.0173 +10% T3 syntax-fix $0.0210 $0.0248 +18% T4 type-error-loop $0.0325 $0.0292 -10% ← we win T5 multi-file $0.0264 $0.0312 +18% T6 image (ours-only) — $0.0158 Functional: pi 10/10 ✔, ours 12/12 ✔. Wall time: ours wins T3/T4/T5 (most-complex tasks), pi wins T1/T2. Tool calls: ours wins or ties every shared task. Output tokens: ours wins every shared task. Cost: pi ~10–20% cheaper on simpler tasks, ours wins the most complex one (T4). We're effectively at parity. The "we're 6× more expensive" claim posted earlier was the harness error. * feat(local-engine): workspace + exec seams + HITL on direct path Three changes that fit together — first two answer the "what about a second engine" + "what's the workspace" questions; the third was the last gap blocking workspace-policy `'ask'` from working end-to-end. 1. WorkspaceFS exec seam (engine reusability) --------------------------------------------- New `WorkspaceFS` interface in src/tools/local/workspaceFS.ts (read, write, stat, readdir, mkdir, realpath, unlink, open) with a `nodeWorkspaceFS` default. Every file-touching factory in the local coding suite (read_file, write_file, edit_file, grep_search, glob_search, list_directory, file checkpointer, binary detection helper) now routes through this interface. A future engine — stateful remote sandbox, in-memory test FS, ssh jail — supplies its own `WorkspaceFS` and inherits every tool's behavior (fuzzy-match edit, BOM/EOL preservation, syntax check, checkpointer, attachment classifier) for free. Same story for process spawning: the existing `local.spawn` is now nested under `local.exec.spawn` with the legacy top-level field still honoured. The seams are wired through three new resolvers: - `getWorkspaceRoots` — canonical root + additionalRoots - `getReadRoots`/`getWriteRoots` — split allow-outside knobs - `getWorkspaceFS` — `local.exec.fs` ?? nodeWorkspaceFS - `getSpawn` — `local.exec.spawn` ?? local.spawn ?? Node Plus 7 unit tests in src/tools/__tests__/workspaceSeam.test.ts that verify additionalRoots, the new allowReadOutside/allowWriteOutside split, legacy `cwd` + `allowOutsideWorkspace` back-compat, and that a custom WorkspaceFS actually receives the file tool calls. 2. First-class workspace boundary + policy hook ----------------------------------------------- Nested `local.workspace` config: workspace: { root: string, additionalRoots?: readonly string[], // monorepo: extend boundary allowReadOutside?: boolean, // split from write allowWriteOutside?: boolean, } Back-compat: top-level `cwd` and `allowOutsideWorkspace` still work and map to `workspace.root` / both `allowOutside*` flags. `resolveWorkspacePathSafe(path, config, intent)` now takes a `'read'` | `'write'` intent so reads and writes can have different allow- outside semantics. Default callers pass `'write'` for stricter clamping; read tools (read_file, grep_search, glob_search, list_directory) explicitly pass `'read'`. New `createWorkspacePolicyHook` in src/hooks/createWorkspacePolicyHook.ts returns a `PreToolUse` callback that: - inspects each tool call's input via per-tool path extractors (defaults cover the local-engine coding suite; hosts can override / extend), - returns `allow` for in-workspace paths (incl. additionalRoots), - returns `'ask'` (default) / `'allow'` / `'deny'` for outside paths, with separate read/write policies. This is exactly the user-visible "ask once / always" semantic claude-code and opencode have, but built on the existing PreToolUse + HITL machinery instead of a parallel permissions API. 14 unit tests in src/hooks/__tests__/createWorkspacePolicyHook.test.ts. 3. HITL interrupt() on the direct path (the missing piece) ---------------------------------------------------------- The previous direct-path hook commit fail-closed `'ask'` decisions because the interrupt machinery only existed in the event-dispatch path. The workspace policy hook above wants to use `'ask'` for real, so this commit lifts the gap. `runDirectToolWithLifecycleHooks` now: - When `humanInTheLoop.enabled === true` and PreToolUse returns `'ask'`: builds a single-tool `tool_approval` payload and raises a real LangGraph `interrupt()` (anchored to the node's RunnableConfig the same way `dispatchToolEvents` does — ToolNode disables LangSmith tracing so the AsyncLocalStorage frame must be re-established). - On resume: - `approve` runs the tool with the (possibly hook-rewritten) args - `reject` blocks via the new `blockDirectCall` helper - `respond` returns the host-supplied `responseText` as a synthetic success ToolMessage (no tool execution) - `edit` re-runs the tool with the host-edited args - When HITL is off: collapses to a fail-closed deny (matches the rest of the SDK's HITL-disabled default). One-time warning logged so hosts notice the gap. - `allowedDecisions` enforcement matches the event path — a decision type outside the policy's allowlist is fail-closed. `blockDirectCall` is the shared helper that synthesises the Blocked ToolMessage and fires `PermissionDenied`, used by every deny path (deny decision, fail-closed ask, reject, allowedDecisions violation, malformed respond). Updated the existing fail-closed-on-ask test in directToolHooks.test.ts to assert the new HITL-disabled behaviour explicitly. Live verification ----------------- New `npm run local:workspace` script: - workspace.root = temp dir - additionalRoots = sibling temp dir - createWorkspacePolicyHook with outsideRead/outsideWrite='ask' - humanInTheLoop.enabled = true - asks the model to read 3 files: in-workspace, sibling, outside Live result against real Anthropic Sonnet: ✔ inside.txt → workspace policy 'allow' → read INSIDE ✔ sibling.txt → 'allow' (additionalRoots) → read SIBLING ✔ outside.txt → 'ask' → INTERRUPT raised with action_request carrying the path → script auto-approves → tool runs → read SECRET Final: HITL interrupts handled = 1, successful tool messages = 3 The model didn't even need its "retry if blocked" instruction — HITL handled the approval transparently. Functional + non-API tests: 1245 passing (was 1224; +21 new across workspace seam + workspace policy hook + the updated direct-tool ask test). * feat(common): canonical tool-name constants for local coding tools LibreChat's `getToolIconType` (client/src/components/Chat/Messages/ Content/ToolOutput/ToolIcon.tsx) special-renders four of our tools by exact name match: `bash_tool`, `read_file`, `execute_code`, `run_tools_with_code`. We already use those names, so they pick up the right icon end-to-end. The newer local-engine tools (`write_file`, `edit_file`, `grep_search`, `glob_search`, `list_directory`, `compile_check`) were defined as per-file `Local*ToolName` consts — string literals that would silently drift if anyone renamed them in only one place, and that no consumer UI could discover programmatically. Promotes them to `Constants.*` (single source of truth) and adds two grouped lists alongside the workspace types so consumers can match against canonical strings: - `Constants.WRITE_FILE` ('write_file') - `Constants.EDIT_FILE` ('edit_file') - `Constants.GREP_SEARCH` ('grep_search') - `Constants.GLOB_SEARCH` ('glob_search') - `Constants.LIST_DIRECTORY` ('list_directory') - `Constants.COMPILE_CHECK` ('compile_check') - `LOCAL_CODING_TOOL_NAMES` — the 7 local-only tools (the 6 above plus READ_FILE) - `LOCAL_CODING_BUNDLE_NAMES` — the local-only set + the bash/code/ PTC pair the bundle wraps around the existing factories The per-file `Local*ToolName` and `CompileCheckToolName` exports are retained as back-compat aliases pointing at the canonical Constants so existing consumers keep working. Wired through: - `createWorkspacePolicyHook` default path extractors now key off `Constants.*` (was string literals). - `LocalCodingTools.ts` Local*ToolName aliases collapsed into `Constants.*` references. - `CompileCheckTool.ts` likewise. 5 new unit tests in localToolNames.test.ts pin: - per-file aliases match Constants - canonical strings match the wire-level expectations LibreChat (and other consumer UIs) special-case - LOCAL_CODING_BUNDLE_NAMES matches what `createLocalCodingTools` actually emits - LOCAL_CODING_TOOL_NAMES matches what `createLocalCodingToolDefinitions` emits - bundle list is a strict superset of the local-only list Net effect: when LibreChat (or any other consumer UI) eventually adds icons for write_file / edit_file / grep_search / glob_search / list_directory / compile_check, the wiring picks up automatically without an SDK change. And renames from one side without the other get caught at test time. Tests: 1250 passing (was 1245; +5 new). * fix(local): three Codex P2 review nits — bash args, rg cache, root-relative extras Three real issues Codex flagged on the PR. Each is small but behavior-affecting; tests pin each. #1 — `executeLocalCode` dropped `args` when lang was bash -------------------------------------------------------- For every other runtime (py, js, ts, php, go, rs, c, cpp, java, r, d, f90), `getRuntimeCommand` appends `input.args` as positional parameters. The `lang === 'bash'` short-circuit was calling `executeLocalBash(input.code, config)` and silently discarding `args`, so `{lang:'bash', code:'echo $1', args:['hi']}` printed an empty string instead of `hi`. Fix: when lang is bash AND args is non-empty, route through a new `executeLocalBashWithArgs` helper that uses the standard `bash -lc <code> -- <args...>` form so `$1`, `$2`, … resolve. Same AST validation as the no-args path. The plain `executeLocalBash` (used by the `bash_tool` factory itself) is unchanged. Test: `codex review fixes › executeLocalCode bash args › passes input.args as positional shell parameters when lang is bash`. #2 — ripgrep availability cache was process-global -------------------------------------------------- With per-Run `local.exec.spawn` backends (the seam added for the future remote-engine), a single `let cachedRgAvailable` would let Run A's "rg works" verdict poison Run B whose backend (e.g. a remote sandbox without rg) doesn't have it. Run B would skip the probe, try to spawn rg, throw, and never reach the Node fallback. Fix: cache is now a `WeakMap<LocalSpawn, Promise<boolean>>` keyed on the effective backend (whatever `getSpawn(config)` returns). WeakMap lets disposed backends GC their entry; the test reset hook re-creates the map. Test: `codex review fixes › ripgrep cache backend scope › does not bleed an "rg available" verdict from one backend to another`. #3 — `additionalRoots` resolved against process.cwd, not workspace root ---------------------------------------------------------------------- With `workspace: { root: '/repo/app', additionalRoots: ['../shared'] }`, the intended boundary is `/repo/shared`. The previous `path.resolve(extra)` anchored to the process cwd, producing `${process.cwd()}/../shared` — completely different on a server with a different cwd, and could deny the legitimate sibling or allow an unrelated directory. Fix: `getWorkspaceRoots` and `createWorkspacePolicyHook` now both do `isAbsolute(extra) ? resolve(extra) : resolve(root, extra)`. Behaviour matches what users would expect from a "monorepo sibling" config. Test: `codex review fixes › additionalRoots resolved against workspace root › treats relative additionalRoots as siblings of root, not of process.cwd`. Tests: 1254 passing (was 1250; +4 new). Lint + tsc clean. * docs(hitl): unstale the "event-driven only" caveats; pin resume scope Three JSDoc blocks were honest at PR #134 ship time but went stale once `6122460` (hooks fire on the legacy non-event-driven path) and `caed7a2` (HITL interrupt() lifted into the direct path) landed: - `HumanInTheLoopConfig` — "Scope: event-driven tools only" header plus the bullet declaring direct-path tools "bypass the hook system entirely. PreToolUse hooks do not fire for those tools and HITL approval does not gate them." Both halves are now false. - `ToolNodeOptions.hookRegistry` — "Only fires for event-driven tool calls; tools routed through directToolNames bypass hook dispatch entirely." Now false. - `ToolNodeOptions.humanInTheLoop` — "the interrupt path is only wired into the event-driven dispatch (dispatchToolEvents), not into directToolNames execution — direct tools bypass HITL entirely." Now false. Also re-evaluated the "mixed direct + event batches re-execute the direct half on resume" caveat. New tests in `directToolHITLResumeScope.test.ts` pin the actual scope: - When a single tool interrupts via the direct path, its body runs **once** total. The first pass interrupted before reaching runTool; the resume pass ran the body after the host's decision was applied. The PreToolUse hook fires twice (the documented idempotency caveat). - When a sibling tool already ran in the same batch before the interrupting tool, the sibling's body runs **twice** — once on each pass. This applies symmetrically to event-dispatched and direct siblings; LangGraph rolls back the entire ToolNode invocation, not just the direct half. Updated wording reflects both: HITL spans both paths uniformly, and the side-effect warning applies to "any tool with side effects in a batch where another tool interrupts" rather than singling out direct tools. Hosts that want a tool to skip HITL must omit it from any registered matcher rather than relying on the path it takes. Tests: 1256 passing (was 1254; +2 new for resume-scope pinning). * fix(local): two more Codex review nits — output OOM cap + bash_tool args #4 P1 — `spawnLocalProcess` could OOM the host on noisy commands ---------------------------------------------------------------- Previous implementation accumulated every stdout/stderr chunk into in-memory strings and only truncated post-`close`. A noisy command (`yes`, `cat /dev/urandom | base64`, a verbose build) could stream gigabytes before the cap kicked in — production-stability risk since local tools execute arbitrary shell. Fix: stream-time cap with three layers, all in `spawnLocalProcess`: 1. Per-stream in-memory buffer capped at `maxOutputChars * 2`. Past that, chunks divert to a lazy temp file (the spill). 2. The spill file is seeded with whatever was already buffered so it holds the FULL output (head from memory + tail from stream), not just the post-cap remainder. 3. New `local.maxSpawnedBytes` config (default 50 MiB) hard-kills the process tree once total stdout+stderr crosses the cap. Independent from `maxOutputChars` (which only affects what the model sees) — this is the OOM backstop. `finish()` now waits for the spill stream to flush before resolving, so the model never sees `full_output_path: …` for a still-being- written file. Tests pinned: `codex review fixes (round 2) › streaming output cap`: - `yes` gets killed in ~200ms when capped at 64 KiB (vs the 30s timeout that would otherwise apply) - 200 KiB output with an 8 KB inline cap successfully spills; the file holds more than the in-memory truncation - small outputs do NOT create a spill file (no perf regression) #5 P2 — `bash_tool` factory broke positional shell parameters ------------------------------------------------------------- Same shape as the previous `executeLocalCode` bash-args fix (`9b5fb85`), but in the `bash_tool` factory itself. It was appending args literally to the command string: `${command} ${args.map(shellQuote).join(' ')}` So `command: 'echo "$1"'` with `args: ['hi']` ran `echo "$1" hi` — `$1` was empty, `hi` got appended literally. Tool schema advertised args as execution arguments; the actual behavior didn't match. Fix: route through the `executeLocalBashWithArgs` helper (now exported from LocalExecutionEngine) which uses `bash -lc <command> -- <args...>` so `$1`, `$2`, … resolve correctly. Falls back to the no-args `executeLocalBash` when args is empty (no behavior change for that case). Test: `bash_tool args › populates positional shell parameters from input.args` (and a second case for missing args). Tests: 1261 passing (was 1256; +5 new). * fix(local): three more Codex nits — config.shell, probe cache scope, grep -e #6 P1 — validateBashCommand hard-coded DEFAULT_SHELL ---------------------------------------------------- The `bash -n -c <command>` syntax preflight was hard-coded to DEFAULT_SHELL ('bash' on POSIX, 'bash.exe' on Windows), so a Run configured with `local.shell: '/bin/sh'` (or pointing at zsh, dash, or any host where bash isn't available) would get its commands syntax-validated against a different binary than the one that would actually execute them. Valid commands could be rejected before execution; commands relying on bashisms could be falsely accepted on hosts without bash. Fix: route the syntax preflight through `config.shell ?? DEFAULT_SHELL` so validation uses the same binary the actual execution path uses. Test: `codex review fixes (round 3) › validateBashCommand honours configured shell › routes the -n preflight through local.shell when set`. Intercepts the spawn calls and asserts the syntax check used `/bin/sh`, not the DEFAULT_SHELL fallback. #7 P2 — syntax-check probe cache was process-global --------------------------------------------------- Same shape as the ripgrep-cache fix in `9b5fb85`: per-Run `local.exec.spawn` backends mean a "node available" / "python available" / "bash available" verdict from one backend's probe could poison subsequent Runs whose backend (e.g. a remote sandbox that has python but not node) doesn't have those tools. Fix: cache is now `WeakMap<LocalSpawn, ProbeCache>` keyed on the effective spawn backend (whatever `getSpawn(config)` returns). Disposed backends GC their entry; the test reset hook re-creates the map. Mirrors the ripgrep cache exactly. Test: `codex review fixes (round 3) › syntax-check probe cache is backend-keyed › does not bleed an "rg/node/python available" verdict from one backend to another`. #8 P2 — grep pattern parsed as flag when dash-prefixed ------------------------------------------------------ The `grep_search` tool sent `input.pattern` as a positional arg to `rg`. A pattern like `-foo` got parsed as an unknown flag and rg bailed out instead of searching for it. `rg --help` explicitly requires `-e/--regexp` (or `--`) for dash-prefixed patterns. Fix: pass the pattern via `-e <pattern>`. Same trick also defends against any future flag-conflict if a user query happens to look like an rg long option. The Node fallback path doesn't need this — JS RegExp accepts dash-prefixed patterns natively. Test: `codex review fixes (round 3) › grep passes pattern via -e › handles dash-prefixed patterns without rg interpreting them as flags`. Tests: 1264 passing (was 1261; +3 new). * fix(local): two more P1 Codex nits — quoted destructive + symlink-escape #9 P1 — dangerous-command gate missed quoted targets ---------------------------------------------------- `validateBashCommand` checks `dangerousCommandPatterns` against `normalized` — the post-`stripQuotedContent` form. That pass blanks the contents of every quoted span, so destructive targets that the LLM (accidentally or not) wraps in quotes — `rm -rf "/"`, `rm -rf "$HOME"`, `chmod -R 777 "/"` — slip past the regex even though they execute identically to the bare form. Real safety regression on the default-on guard. Fix: split the destructive check into two passes. 1. The existing `dangerousCommandPatterns` continue to run against `normalized`. Same false-positive defence (`echo "rm -rf /"` stays valid because the destructive text is wrapped by the OUTER quote pair, not by quotes around the path argument). 2. New `quotedDestructivePatterns` run against the ORIGINAL command string. Each pattern explicitly REQUIRES a matching quote pair around the destructive path (`"/"`, `"$HOME"`, `'/'`, etc.). `echo "rm -rf /"` doesn't match — the `/` isn't surrounded by quotes there, only the whole `rm -rf /` string is. Tests: `codex review fixes (round 4) › quoted destructive targets` covers `"/"`, `"$HOME"`, `'/'`, `chmod -R 777 "/"`, no-regression on bare forms, and the `echo "rm -rf /"` no-false-positive case. #10 P1 — workspace policy hook compared lexical paths only ---------------------------------------------------------- `createWorkspacePolicyHook` did `isAbsolute(p) ? resolve(p) : resolve(root, p)` and called it done. A symlink inside the workspace pointing outside (e.g. `workspace/escape → /etc/secret`) lexically appears under `workspace/` and got auto-allowed — even though the agent reading through it touches a path the policy meant to gate. Critical when the hook is the primary gate: the documented "ask the user" setup turns `allowReadOutside: true` so the file tools' own clamp is off, leaving the hook as the sole defence. Fix: the hook now realpaths both the candidate and the workspace roots before comparing. Roots are pre-realpath'd once at construction (memoized via a Promise so the per-call cost is one realpath) and candidate paths get the same `realpathOfPathOrAncestor` walk `resolveWorkspacePathSafe` already uses — so paths that don't yet exist (e.g. `write_file` to a brand-new path) still get checked against their nearest existing ancestor's realpath. Two test cases pin both halves: - `workspace/escape → extra/secret.txt` — hook DENIES read of `escape` even though it's lexically inside. - `extra/alt-mount → workspace/` — hook ALLOWS read of `extra/alt-mount/file.ts` because the realpath lands inside the workspace root (covers the alternate-mount case so we don't over-correct). Tests: 1272 passing (was 1264; +8 across both fixes). * fix(local): treat maxSpawnedBytes=0 as unlimited (Codex P2 #11) The public LocalExecutionConfig contract documents `maxSpawnedBytes: 0` as "no cap", but the kill check `totalSpawnedBytes > hardKillBytes` triggered on the very first byte when the cap was zero. Skip the kill path entirely when `hardKillBytes <= 0` so configs that explicitly opt out of the OOM backstop behave as documented. * chore(scripts): forward ON_RUN_STEP in pi-vs-ours harness Without an ON_RUN_STEP handler the aggregator's stepMap is empty when ON_RUN_STEP_COMPLETED arrives, so every tool call logged "No run step or runId found for completed step event" to stderr. Mirror the pattern used in src/scripts/code_exec.ts (and friends): register both events. Harness-only — no SDK behavior change. * fix(local): ESM-safe spill + surface rg failures in glob_search (Codex P1 #12, P2 #13) Two issues from the latest review pass: 1) `spawnLocalProcess`'s spill path used `require('fs')` inside an ESM-shipped module. Fine in CJS test runs, throws `ReferenceError: require is not defined` for any ESM consumer that triggers the overflow path. Switch to a static `import { createWriteStream } from 'fs'` so both build outputs work. 2) `createLocalGlobSearchTool` ignored ripgrep's exit code and stderr, so `rg --files <missing-target>` (exit 2) silently mapped to "No files found." — the agent then treated a tooling failure as a real absence of matches. Now we check `result.exitCode > 1` (and `timedOut`) and return an explicit `glob_search failed: <stderr>` string + an artifact carrying the error. Tests pinned: `codex review fixes (round 5) › spill path is ESM-safe` and `… › glob_search surfaces ripgrep failures`. The glob test uses an injected spawn backend so it runs deterministically on hosts without ripgrep installed. * fix(local): sandbox loopback bridge + additionalRoots writes (Codex P1 #14, P2 #15) Two issues with how `buildSandboxRuntimeConfig` lays out the SandboxRuntimeConfig: 1) The local programmatic-tool bridge listens on 127.0.0.1, but the sandbox default `allowedDomains: []` denies all outbound network — so `run_tools_with_code` / `run_tools_with_bash` fail under sandbox even though they work unsandboxed. Seed allowedDomains with `127.0.0.1`, `localhost`, `::1` (skip any host the user explicitly listed in `deniedDomains`, and dedupe against user-supplied `allowedDomains`). 2) The sandbox `allowWrite` allowlist was seeded with `cwd` only, so `workspace.additionalRoots` paths were resolvable by file tools but blocked for sandboxed shell/code. Now seed from `getWorkspaceRoots` so both surfaces share the same boundary. Exported `buildSandboxRuntimeConfig` so the round-6 tests can drive it directly without standing up the real native sandbox runtime. Test count: 1283 passing (was 1276). Lint baseline unchanged. * fix(direct-path): HITL edit reads updatedInput; PostToolUse updates registry (Codex P1 #16, P2 #17) Two bugs in `runDirectToolWithLifecycleHooks`: 1) The `edit` decision branch read `decision.args`, but the documented `ToolApprovalDecision` shape uses `updatedInput`. Hosts following the public type signature were silently ignored — the tool ran with the original (un-edited) arguments. Now mirrors the event-driven path's validation: read `updatedInput`, fail closed on missing/wrong-shaped payloads (returns an error ToolMessage instead of executing with undefined args). 2) When a `PostToolUse` hook returns `updatedOutput`, the direct path returned a new ToolMessage with the replacement content but never updated the `toolOutputReferences` registry. Subsequent `{{tool<i>turn<n>}}` substitutions then delivered the stale pre-hook bytes while the model observed the post-hook replacement. Now reads `_refKey`/`_refScope` from the message metadata (already stamped by `recordOutputReference`) and calls `toolOutputRegistry.set` with the replacement. Tests: `direct-path HITL: resume scope > edit decision (Codex P1 #16)` covers both the happy path (edited args reach the body) and the fail-closed branch. `ToolNode tool output references > PostToolUse updatedOutput updates the registry (Codex P2 #17)` chains two calls where the second references the first via {{tool0turn0}} and asserts the substituted value matches the post-hook content. * fix: snapshot direct-batch args + curl-first bash bridge (Codex P1 #18, P2 #19) P1 #18 (ToolNode.ts): direct-path tools resolved {{tool…turn…}} placeholders against the LIVE registry inside `runTool`, which runs AFTER awaiting PreToolUse hooks. In a `Promise.all`-driven batch a slow hook on call A could let call B finish first and register its output, then A's late re-resolve would substitute B's output into A's args — order-dependent leakage that violates same-turn isolation. Snapshot the registry once per batch in `run()` (mirrors the event-driven path), thread it through `RunToolBatchContext`, and have both `runDirectToolWithLifecycleHooks` and `runTool` resolve against the snapshot when present. P2 #19 (LocalProgrammaticToolCalling.ts): the bash bridge required `python3` on PATH, breaking `run_tools_with_bash` on minimal containers and Windows hosts without python3 installed. Bridge now serves a `?mode=text` variant that returns the already-serialized result body so bash callers can use `curl` (universally available) without pulling in a JSON parser. Bash helper tries curl first, falls back to python3 for environments without curl, errors helpfully if neither is available. Tests pinned: `ToolNode tool output references › direct-batch snapshot isolation (Codex P1 #18)` (slow PreToolUse hook + sibling finishes-first scenario), and `ProgrammaticToolCalling › bash bridge script does not require python3 (Codex P2 #19)` (asserts both helper branches present in the generated script with curl preferred). * fix(local): block `--` end-of-options + gate compile_check overrides (Codex P1 #20, P1 #21) P1 #20: the destructive-command guard required the target path to follow option flags directly, so `rm -rf -- "/"` (and similarly `chmod -R 777 -- "/"`) bypassed the check via the GNU/BSD end-of-options marker. Both `dangerousCommandPatterns` and `quotedDestructivePatterns` now allow an optional `(?:--\\s+)?` between the flags and the destructive target. P1 #21: `compile_check` ran the resolved command through `bash -lc` without going through `validateBashCommand`, so a host-supplied `command` override (or `compileCheck.command` config entry) could execute mutating/destructive commands even with `readOnly: true` or the dangerous-command guard normally in force. Now routes through `validateBashCommand(detection.command, config)` first; on failure returns an explanatory ToolMessage with `ran: false` rather than spawning the process. Tests pinned: `codex review fixes (round 6) › destructive guard handles `--` end-of-options` (5 cases including a benign-`--` no-regression check) and `… › compile_check enforces validateBashCommand + readOnly` (rm -rf, touch under readOnly, benign echo). Lint baseline unchanged. * chore: audit-pass cleanups + missing test coverage (manual review #1, #3, #5, #7, #9, #10, #11) Comprehensive-review audit findings, fixed in one pass: #1 (attachments full-file MIME sniff): switch classifyAttachment to open() + 12-byte header read for sniffing; only readFile() the full buffer when we're about to embed. A 9 MB image now allocates 12 bytes for sniffing instead of 9 MB. #3 (fallback walker SKIP_DIRS too small): expanded the Node-fallback walker's skip set to cover dist/build/.next/.cache/__pycache__/ .venv/venv/target/vendor/coverage/.tox/.mypy_cache and friends. ripgrep itself respects .gitignore so this only affects the fallback. #5 (dynamic fs/promises import): replaced two `await import('fs/ promises').then(...)` calls in CompileCheckTool with a static `import { readFile, stat } from 'fs/promises'` per AGENTS.md. #7 (test-only resets in public API): added @internal JSDoc to all three `_reset*ForTests` functions so IDE autocomplete signals they aren't part of the SDK surface. (The leading underscore was convention-only; @internal is the standard tag.) #9 (legacy fields missing @deprecated): added @deprecated tags to `LocalExecutionConfig.cwd` and `allowOutsideWorkspace` pointing to their workspace.* replacements. #10 (dead lexicallyInside compute in workspace policy hook): dropped the unused variable. Realpath is the source of truth for both the symlink-escape and alternate-mount cases; the lexical check provides no information so we don't pay for it. #11 (voided unused imports in pi comparison script): removed the `void readdir; void cp;` workaround and the dead imports they were silencing. Bonus: also addressed the comprehensive review's #2 (bitwise OR) by verifying it was a misread — the code uses `||`, not `|`. Manual review's reading was wrong; no fix needed. Larger findings addressed: #4 (no editStrategies tests): added `src/tools/local/__tests__/editStrategies.test.ts` with 12 cases covering exact / line-trimmed / whitespace-normalized / indentation-flexible match boundaries, start/end-of-file matches, unicode content, multi-line needles spanning blank lines, ambiguous-match rejection, and applyEdit semantics. #12 (no FileCheckpointer tests): added `src/tools/local/__tests__/FileCheckpointer.test.ts` with 6 cases covering capture+rewind happy path, idempotent capture, absent-file rewind (deletion), multi-file rewind, oversize-file tracking-but-not-snapshotting, and rewind across a deleted parent directory. Performance: #8 (lineWindow split-whole-file): replaced `content.split('\\n')` in `lineWindow` with a newline-walk that's O(start + limit) instead of O(file). For a 10 MB file with `offset: 1, limit: 10` this drops from "split the whole file into millions of strings" to ~10 indexOf calls. Falls back to the simple split when no limit is supplied. Deferred: #6 (spill files never cleaned) needs a Run-lifecycle hook to run cleanup; tracked separately rather than fixed inline. * fix: comprehensive review (round 7) — bridge hooks, sandbox latch, nested-shell, regex DoS, attachments via WorkspaceFS, expose checkpointer Codex P2 (sandbox warning latch): syntax preflight, ripgrep-availability, and node/python/bash probes used to spawn through the public path that fires `maybeWarnSandboxOff`, both emitting a misleading "sandbox is off" warning when the run had `sandbox.enabled: true` AND latching `sandboxOffWarned = true` so a later genuinely-unsandboxed execution silently skipped its warning. Added an opt-in `{ internal: true }` options arg to spawnLocalProcess that suppresses both. Threaded it through every internal probe. Manual #A (P1): the in-process programmatic-tool bridge invoked inner tools via `executeTools` directly, bypassing every PreToolUse hook the host registered. ToolNode now plumbs a `hookContext` (registry + runId/threadId/agentId) into the programmatic-tool factory; the bridge runs PreToolUse before each inner tool.invoke. `deny` and `ask` (HITL not reachable from inside an HTTP handler) fail closed; `updatedInput` is applied to the inner tool args. 4 tests pinned. Manual #B (P1): already fixed in c6e2632 (round 9 — compile_check routes through validateBashCommand). Reviewer was at 75a54b3, predates that commit. Manual #C (P1): `bash -lc "rm -rf $HOME"` (and friends) bypassed the destructive-command guard because `stripQuotedContent` blanks the nested payload before the patterns run. Added a third pattern list `nestedShellDestructivePatterns` that runs against the original command and matches destructive ops (`rm -rf`, `chmod -R 777`, `chown -R`) inside `<shell> -[l]?c "..."` and `eval "..."` payloads. 4 tests including a benign-nested-echo no-regression check. Manual #D (P2): `fallbackGrep` compiled the model-supplied pattern via `new RegExp` and ran it across every file, opening the door to catastrophic-backtracking DoS (`(a+)+$`-style patterns). Added a guardrail layer: 1024-char pattern length cap, nested-quantifier heuristic rejection, 5-second wall-clock budget for the overall search. Surfaced as a structured `FallbackGrepError` -> tool result with engine: 'node-fallback' + the kind tag. Hosts that need bulletproof regex safety should install ripgrep (RE2-based, no backtracking). 2 tests pinned. Manual #E (P2): `local.fileCheckpointing: true` in RunConfig was a silent no-op in the auto-bind path — `createLocalCodingTools()` made a checkpointer that was immediately discarded. Now the auto-bind path uses `createLocalCodingToolBundle()` when checkpointing is on, returns the checkpointer through `ResolveLocalToolsResult`, and ToolNode stashes it + exposes via `getFileCheckpointer()`. 2 tests pinned. Manual #F (P3): `classifyAttachment` used host `fs/promises` directly instead of the configured `WorkspaceFS`. A custom/remote engine could fail to embed valid attachments OR accidentally read a host path with the same absolute name. Added optional `fs` arg to classifyAttachment threaded from `read_file` through to its `WorkspaceFS`. Backward compatible — defaults to host fs/promises when no fs is supplied. Tests touched: 768 passing across all suites (was 750ish). Lint baseline unchanged. * chore: address audit-of-audit findings (4/4 valid) Follow-up audit on commit ebf8b6a flagged 4 nits/minors. All valid: F1 (NIT): the lineWindow perf fix introduced exactly the same `void <var>` pattern that finding #10 had me remove from createWorkspacePolicyHook. Embarrassing — dropped the dead `line` variable + `line++` + `void line`. F2 (MINOR): the editStrategies test commit message claimed coverage of "ambiguous-match rejection" but no test actually exercised the multi-match → null path. Added the missing case for the exact strategy. Noted in a comment that the looser strategies in the chain may still resolve duplicates unambiguously by falling through; whether that's correct is a separate design call. F3 (NIT): the new FileCheckpointer.test.ts had a `import('fs/ promises')` inside a catch block when `mkdir` was already available via the static import at the top. Static-imported `mkdir` and restructured the test to mkdir-then-writeFile unconditionally (the catch-and-retry pattern was unnecessary defensive plumbing). F4 (NIT): the unused-imports cleanup missed `sum`/`void sum` in the comparison harness. Removed the function and its void. * fix: expose file checkpointer through Run/Graph (audit follow-up) Round-7 finding E only fixed half the problem: ToolNode got `getFileCheckpointer()` so direct `new ToolNode(...)` users could reach it, but the normal `Run.create(...)` path constructed ToolNodes inline inside `StandardGraph.initializeTools` and dropped the reference. So `RunConfig.toolExecution.local.fileCheckpointing: true` was still a no-op for Run callers — and the public JSDoc on `fileCheckpointing` referenced a `Run.rewindFiles()` that didn't exist. Now: - Graph adds `getOrCreateFileCheckpointer()` — lazily constructs ONE per-Run checkpointer when local fileCheckpointing is on. Cleared in `clearHeavyState()` along with the other per-Run state. - `StandardGraph.initializeTools` threads it into BOTH ToolNode constructions (the agentContext branch and the traditional-tools branch). Multi-agent graphs end up sharing a single snapshot store, so `rewind()` reaches writes from any agent. - ToolNode accepts `fileCheckpointer?` in its constructor and passes it to `resolveLocalExecutionTools`, which forwards into `createLocalCodingToolBundle({ checkpointer })`. Caller-provided wins over the auto-created one. - Run gains `getFileCheckpointer()` and `rewindFiles()` — the documented APIs. `rewindFiles()` returns 0 when checkpointing is disabled (no-op rather than throwing). - `LocalExecutionConfig.fileCheckpointing` JSDoc updated to reference the now-real APIs. Tests pinned: `… › fileCheckpointer reachable through Run.getFileChec…

configurable jina url

6dc156a

JumpLink mentioned this pull request Mar 7, 2026

[Enhancement]: Support custom reranker provider with configurable URL and model name #65

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Configurable custom jina url#10

Configurable custom jina url#10
uhercik wants to merge 1 commit intodanny-avila:mainfrom
gratex:jina-custom-url

uhercik commented Aug 5, 2025

Uh oh!

davidjrh commented Aug 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

uhercik commented Aug 5, 2025

Uh oh!

davidjrh commented Aug 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants