🤝 fix: Load Handoff Agents for Agents API#12740
Conversation
Extracts the BFS discovery + ACL-gated initialization of handoff sub-agents into a shared `discoverConnectedAgents` helper in `@librechat/api` and wires it into the OpenAI-compatible `/v1/chat/completions` and Open Responses `/v1/responses` controllers. These endpoints previously only passed the primary agent config to `createRun` while keeping `primaryConfig.edges` intact, which forced `MultiAgentGraph` into multi-agent mode without loading the referenced sub-agents and caused StateGraph to throw "Found edge ending at unknown node <id>". The discovery helper also filters orphaned edges (deleted sub-agents or those the caller lacks VIEW permission on), so API users see the same graceful fallback the chat UI already had.
CI `tsc --noEmit -p packages/api/tsconfig.json` caught that the test helpers typed `req` as `express.Request`, which is not assignable to `DiscoverConnectedAgentsParams.req` (typed as `ServerRequest` whose `user` is `IUser`). Local jest passed because ts-jest is transpile-only, but the CI typecheck uses the full compiler.
There was a problem hiding this comment.
Pull request overview
Fixes multi-agent handoff failures on the OpenAI-compatible /v1/chat/completions and Open Responses /v1/responses endpoints by ensuring all edge-referenced sub-agents are discovered, ACL-checked, initialized, and included in the createRun() agent list (matching the UI initialize flow).
Changes:
- Introduces a shared BFS-based
discoverConnectedAgentshelper to load reachable sub-agents, merge MCP auth, and filter orphaned edges. - Updates OpenAI-compat and Responses controllers to use discovered
[primary, ...subAgents]and to route tool execution via per-agent tool contexts. - Refactors the UI
initializeClientflow to reuse the shared helper and updates unit tests/mocks accordingly.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/api/src/agents/validation.ts | Widens req typing for model validation to accept ServerRequest shape. |
| packages/api/src/agents/index.ts | Re-exports the new discovery helper. |
| packages/api/src/agents/discovery.ts | Adds shared BFS discovery + ACL gating + edge filtering + MCP auth merging. |
| packages/api/src/agents/discovery.spec.ts | Adds unit coverage for discovery behavior (multi-hop, orphan/permission filtering, cycles, etc.). |
| api/server/services/Endpoints/agents/initialize.js | Replaces inline BFS discovery with discoverConnectedAgents and keeps tool-context wiring. |
| api/server/controllers/agents/openai.js | Loads handoff sub-agents for /v1/chat/completions, passes full agent list to createRun, routes tools by agentId. |
| api/server/controllers/agents/responses.js | Same as above for /v1/responses (streaming + non-streaming). |
| api/server/controllers/agents/tests/openai.spec.js | Updates mocks for new discovery + initialization expectations. |
| api/server/controllers/agents/tests/responses.unit.spec.js | Updates mocks for new discovery + initialization expectations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const chain = await createSequentialChainEdges([primaryConfig.id].concat(agent_ids), '{convo}'); | ||
| collectEdges(chain as unknown as GraphEdge[]); | ||
| } |
| const modelsConfig = await getModelsConfig(req); | ||
| const { | ||
| agentConfigs: handoffAgentConfigs, | ||
| edges: discoveredEdges, | ||
| userMCPAuthMap: discoveredMCPAuthMap, | ||
| } = await discoverConnectedAgents( |
| const modelsConfig = await getModelsConfig(req); | ||
| const { | ||
| agentConfigs: handoffAgentConfigs, | ||
| edges: discoveredEdges, | ||
| userMCPAuthMap: discoveredMCPAuthMap, | ||
| } = await discoverConnectedAgents( |
GitNexus: 🚀 deployedThe |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e9e435b796
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| collectEdges(chain as unknown as GraphEdge[]); | ||
| } | ||
|
|
||
| const edges = filterOrphanedEdges(Array.from(edgeMap.values()), skippedAgentIds); |
There was a problem hiding this comment.
Remove edges that reference skipped agents as sources
After discovering sub-agents, this path only applies filterOrphanedEdges before returning edges, and that helper drops skipped IDs from to only. If a skipped/deleted/inaccessible sub-agent appears on the from side (for example, a bidirectional graph has A -> B and B -> A, and B is skipped), B -> A is still passed to createRun, which leaves an edge that references an agent not present in agents and can still fail graph compilation instead of gracefully skipping. The returned edge list should remove skipped IDs from both endpoints.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — valid P1. Fixed in 7418dd8 by widening filterOrphanedEdges to drop edges where either from OR to references a skipped id. Added a bidirectional-graph regression test in discovery.spec.ts plus updated the to-only cases in edges.spec.ts.
|
@codex review |
GitNexus: 🚀 deployedThe |
Addresses the P1 codex finding on #12740: `filterOrphanedEdges` previously only removed edges whose `to` referenced a skipped agent. Edges whose `from` was a skipped agent — the symmetric case in a bidirectional graph like `A <-> B` where `B` is deleted or the user lacks VIEW on it — leaked through to `createRun` and re-triggered `Found edge ending at unknown node <id>` at StateGraph compile time. The filter now drops an edge if either endpoint references a skipped id, and the existing `to`-only test cases were updated to reflect the stricter behavior. Adds a bidirectional-graph regression test in `discovery.spec.ts`.
|
@codex review |
GitNexus: 🚀 deployedThe |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7418dd83b5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| resourceType: ResourceType.AGENT, | ||
| resourceId: agent._id, | ||
| requiredPermission: PermissionBits.VIEW, |
There was a problem hiding this comment.
Enforce REMOTE_AGENT ACL when loading handoff sub-agents
discoverConnectedAgents authorizes discovered handoff agents with ResourceType.AGENT, but the OpenAI-compatible routes are explicitly protected by remote-agent permissions (createCheckRemoteAgentAccess / REMOTE_AGENT). This lets a caller who can access the orchestrator remotely and has regular AGENT visibility (but not REMOTE_AGENT visibility) execute an unshared sub-agent through /v1/chat/completions or /v1/responses, bypassing remote-sharing boundaries for sub-agents. The permission check here should be endpoint-aware (or use REMOTE_AGENT for these controllers) to preserve the same ACL model as route entry.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — confirmed the API routes gate on REMOTE_AGENT while discovery was only checking AGENT. Fixed in e491a23: added a resourceType param to discoverConnectedAgents (defaults to AGENT for the in-app chat path) and both API controllers now pass ResourceType.REMOTE_AGENT so every discovered sub-agent clears the same sharing boundary the route enforces. Test added in discovery.spec.ts.
Addresses the second P1 codex finding on #12740: the OpenAI-compat `/v1/chat/completions` and Open Responses `/v1/responses` routes gate the primary agent on `REMOTE_AGENT` (via `createCheckRemoteAgentAccess`), but `discoverConnectedAgents` was checking handoff sub-agents against the looser in-app `AGENT` resource type. That allowed a remote caller who could reach the orchestrator but had only in-app visibility on a sub-agent to invoke it via the API — bypassing the remote-sharing boundary. Adds an optional `resourceType` param to `discoverConnectedAgents` (defaulting to `AGENT` for the chat UI path) and passes `ResourceType.REMOTE_AGENT` from both API controllers so every discovered sub-agent clears the same sharing boundary enforced at route entry.
|
@codex review |
GitNexus: 🚀 deployedThe |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e491a23a77
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| endpointOption, | ||
| allowedProviders, |
There was a problem hiding this comment.
Enforce provider allowlist when initializing handoff agents
discoverConnectedAgents forwards endpointOption directly into initializeAgent, but on the OpenAI-compatible routes this option uses the primary provider endpoint (for example openai), not the agents endpoint. Because initializeAgent only checks allowedProviders when isAgentsEndpoint(endpointOption.endpoint) is true, discovered sub-agents bypass the configured provider allowlist and can run disallowed providers via handoff edges.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — confirmed. The OpenAI-compat routes build endpointOption with endpoint: agent.provider (e.g. openai), so the isAgentsEndpoint-gated allowedProviders guard in initializeAgent was being skipped for discovered sub-agents. Fixed in 7ec1ba2: the helper now overrides endpoint to EModelEndpoint.agents on every per-sub-agent init call so the allowlist is always enforced for handoff targets, regardless of the caller's entry endpoint. Test added.
Addresses the third P1 codex finding on #12740: `discoverConnectedAgents` forwarded the caller's `endpointOption` verbatim into `initializeAgent`, but on the OpenAI-compat routes that option's `endpoint` is the primary agent's provider (e.g. `openai`), not `agents`. `initializeAgent` only enforces `allowedProviders` when `isAgentsEndpoint(endpointOption.endpoint)` is true, so handoff sub-agents silently bypassed the provider allowlist configured under `endpoints.agents.allowedProviders`. Override `endpointOption.endpoint` to `EModelEndpoint.agents` for every per-sub-agent init call. The primary agent still uses the caller's endpointOption as before — this only affects the BFS-loaded handoff targets. Regression test asserts the override.
GitNexus: 🚀 deployedThe |
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7ec1ba246f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const edges = filterOrphanedEdges(Array.from(edgeMap.values()), skippedAgentIds); | ||
|
|
||
| return { | ||
| agentConfigs, | ||
| edges, |
There was a problem hiding this comment.
Prune disconnected sub-agents after orphan edge filtering
After filterOrphanedEdges removes edges that referenced skipped agents, discoverConnectedAgents still returns every initialized sub-agent in agentConfigs. In a graph like A -> B -> C, if B is skipped (missing or no VIEW), C is still initialized from the original edge scan, both edges are then removed, and callers pass C into createRun as an extra agent; because createRun switches to multi-agent mode when agents.length > 1, C can run as an unintended parallel start node even though no surviving edge connects it to A. This should prune agentConfigs to agents that remain reachable from the primary through the filtered edge set.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — confirmed. In an A -> B -> C graph (with B's edges stored directly on A), B being skipped stranded C as an initialized-but-disconnected agent, which createRun was then turning into a stray parallel start. Fixed in 4cbfffc: after filterOrphanedEdges, compute the reachability closure from the primary through surviving edges and prune agentConfigs to that set. Two regression tests added — one for the pruning case, one that verifies agents on surviving paths are kept.
Addresses the fourth P1 codex finding on #12740: BFS eagerly initializes every sub-agent referenced in the primary's edge scan, but once `filterOrphanedEdges` drops edges whose endpoints were skipped, some of those sub-agents end up disconnected from the primary. In an `A -> B -> C` graph (edges stored directly on A) where B is skipped (missing or no VIEW), both edges are filtered, but C was already loaded and would still be passed to `createRun` — which flips into multi-agent mode on `agents.length > 1` and turns C into an unintended parallel start node. After filtering edges, compute the set of agent ids reachable from the primary through the surviving edge set and prune `agentConfigs` to that set. Two regression tests added: one for the pruning case, one that confirms agents connected via surviving edges are still kept.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4cbfffc9ee
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| checkAgentInit, | ||
| skippedAgentIds, | ||
| onAgentInitialized: (agentId, agent, config) => { | ||
| agentConfigs.set(agentId, config); |
There was a problem hiding this comment.
Drop pruned agents from initializeClient run list
discoverConnectedAgents now prunes sub-agents that become unreachable after edge filtering, but this callback writes every initialized sub-agent into agentConfigs before that pruning happens, and the later merge step only adds entries (never removes stale ones). In a graph like A->B and B->C where B is skipped (missing/no access), C is pruned from the helper result but still remains in agentConfigs, so AgentClient passes [A, C] to createRun and C can execute as an unintended parallel start node.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — this was the missing reconciliation in initialize.js after the pass-4 prune. Fixed in e54440b: the callback no longer writes agentConfigs (it fires during BFS, before the helper knows which agents will end up disconnected), and the outer merge step now copies directly from the pruned discoveredConfigs — so stranded sub-agents like C in A->B->C with B skipped are no longer handed to AgentClient. The API controllers (openai.js / responses.js) were already using the helper's returned map directly, so they were unaffected.
GitNexus: 🚀 deployedThe |
…lback Addresses the fifth P1 codex finding on #12740: `onAgentInitialized` fires during BFS, BEFORE the helper prunes agents that become disconnected once `filterOrphanedEdges` runs. Writing the sub-agent straight into the outer `agentConfigs` there and then only additively merging the pruned `discoveredConfigs` left stranded entries in the outer map, and `AgentClient` would still hand them to `createRun` as extra parallel start nodes (the exact failure mode the pass-4 prune was meant to eliminate for the API controllers). Drop the `agentConfigs.set` from the callback and replace the additive merge with a direct copy from `discoveredConfigs`, which is now the single authoritative source of what the run should see. The per-agent tool context map is still populated during BFS — stale entries there are harmless because they're only read by closure inside `ON_TOOL_EXECUTE` and are unreachable once the agent is not in `agentConfigs`.
|
Hit the 5-round cap on codex review iterations. Summary of landed fixes from this review loop (all codex P1 findings, all validated as real bugs):
CI is green. Happy to run another pass if anything here looks wrong, but otherwise this is ready. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e54440b7d1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for (const agentId of [...agentConfigs.keys()]) { | ||
| if (!reachable.has(agentId)) { | ||
| agentConfigs.delete(agentId); | ||
| } |
There was a problem hiding this comment.
Prune unreachable edges before returning discovery result
After computing reachable, this function deletes disconnected agent configs but still returns the original edges array unchanged. In graphs where a skipped intermediate node disconnects a downstream subgraph (for example A->B->C with B skipped and C having C->D), agentConfigs drops C/D but edges can still contain C->D, so createRun receives edges that reference agents not present in agents and can fail multi-agent graph compilation with unknown-node errors. The returned edges should be filtered to only include endpoints in the reachable set.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Yep — confirmed and fixed in e000354. The reachability pass now filters both agentConfigs (unchanged) and the returned edges list in the same step, so A->B->C->D with B skipped correctly drops C->D instead of letting it through to createRun. Regression test prunes surviving-but-unreachable edges from the return value (A->B->C->D, B skipped) asserts every returned edge endpoint is either primaryConfig.id or a key of agentConfigs.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 222716d193
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| resourceType: ResourceType.REMOTE_AGENT, | ||
| }, | ||
| { | ||
| getAgent: db.getAgent, | ||
| checkPermission, |
There was a problem hiding this comment.
Reuse remote-access resolver for handoff permission checks
This discovery call checks sub-agents with PermissionService.checkPermission on REMOTE_AGENT, but the route-level middleware (createCheckRemoteAgentAccess) grants remote access via getRemoteAgentPermissions, which also treats AGENT owners (SHARE bit) as remotely authorized. In practice, a user can pass the primary model permission check yet have owned handoff agents silently skipped here, breaking multi-agent handoffs on /v1/chat/completions (and the same pattern is present in the Responses controller).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — confirmed. Fixed in 0de3684. Both API controllers now build their checkPermission wrapper using getRemoteAgentPermissions (imported from @librechat/api) with getEffectivePermissions injected from PermissionService, and compare the resulting bitmask against the required permission via hasPermissions. That mirrors what createCheckRemoteAgentAccess does at the route, so sub-agents owned by the caller (SHARE on AGENT, no explicit REMOTE_AGENT grant) are no longer silently skipped.
Addresses codex pass-10 P1 on #12740. The API controllers were handing `discoverConnectedAgents` a raw `PermissionService.checkPermission` call against `ResourceType.REMOTE_AGENT`, but the route-level middleware (`createCheckRemoteAgentAccess`) authorizes the primary agent via `getRemoteAgentPermissions`, which first consults the AGENT ACL and treats owners with the SHARE bit as remotely authorized even without an explicit REMOTE_AGENT grant. The mismatch meant a user could open the primary via `/v1/chat/completions` or `/v1/responses`, but their own owned handoff sub-agents were silently skipped — breaking multi-agent handoffs for the common "owner runs their own multi-agent orchestrator" case. Both controllers now pass `discoverConnectedAgents` a `checkPermission` wrapper that delegates to `getRemoteAgentPermissions` (with `getEffectivePermissions` injected from `PermissionService`) and compares the returned bitmask against the required permission via `hasPermissions`. Sub-agents are now authorized by the exact same rules the route middleware applies to the primary.
GitNexus: 🚀 deployedThe |
|
Codex Review: Didn't find any major issues. Nice work! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0de3684be9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return ids.every((id) => typeof id !== 'string' || reachableSet.has(id)); | ||
| }; | ||
|
|
||
| const reachable = new Set<string>([primaryConfig.id]); |
There was a problem hiding this comment.
Preserve valid multi-start branches during reachability prune
Seeding reachability with only primaryConfig.id causes this helper to drop edge-defined components that are intentionally disconnected from the primary (for example, a valid B -> C branch where B has no incoming edges). MultiAgentGraph treats no-incoming agents as start nodes, so those branches previously executed in parallel, but this filter now removes both their edges and agent configs, changing runtime behavior for existing multi-start graphs in every caller of discoverConnectedAgents.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — reconciled with the earlier pass-4 'prune accidental orphans' using pre-filter reachability as the signal. Fixed in 7cc7aed.
- Agents that were reachable from the primary via the pre-filter edges but lose that path after
filterOrphanedEdgesruns are accidental orphans (a skipped hop broke the chain) → prune. - Agents that were never reachable from the primary, even pre-filter, are intentional parallel starts → seed them into post-filter reachability so their component survives.
New test preserves user-defined parallel-start branches disconnected from the primary covers the A -> B + X -> Y scenario. The pass-audit A -> B -> C -> D, B skipped regression test still passes because C/D were pre-filter reachable through B and lose that reachability post-filter.
Addresses codex pass-11 P2 on #12740. The post-filter reachability prune seeded only from `primaryConfig.id`, which killed `MultiAgentGraph`'s legitimate multi-start pattern — a user-defined edge like `X -> Y` where X has no incoming path (X is an intentional parallel starting node, run alongside the primary) was being dropped because neither X nor Y was reachable from the primary. Reconcile the tension with pass-4 ("prune accidental orphans when an intermediate is skipped") by using pre-filter reachability as the signal: - An agent that WAS reachable from the primary via the original (pre-filter) edges but loses that path when `filterOrphanedEdges` runs is an accidental orphan (a skipped hop broke the chain) — prune. - An agent that was NEVER reachable from the primary, even pre-filter, is an intentional parallel start — seed it into post-filter reachability so its component survives. Surviving-edge endpoint references still keep an agent (co-sources in multi-source edges). New test `preserves user-defined parallel-start branches disconnected from the primary` covers the pass-11 scenario; the existing `A->B->C->D, B skipped` regression test continues to pass because C/D were pre-filter reachable through B and lose that reachability after filtering.
|
@codex review |
GitNexus: 🚀 deployedThe |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7cc7aedea0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for (const agentId of agentConfigs.keys()) { | ||
| if (!preFilterReachable.has(agentId)) { | ||
| postFilterSeeds.add(agentId); | ||
| } |
There was a problem hiding this comment.
Seed only disconnected branch roots after filtering
postFilterSeeds currently adds every initialized agent that was not reachable from the primary in the pre-filter graph. In a graph with a disconnected component like X -> Y, if X is skipped (missing/no VIEW) but Y loads, Y is still seeded here, so discovery returns agents: [primary, Y] with no connecting edge; createRun then treats Y as an unintended start node and executes it in parallel. This is a behavior regression in ACL/orphan scenarios and can trigger tool execution for agents that should have been pruned with their skipped upstream.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — fixed in 54de02c. Tightened the seed criterion from 'not pre-filter reachable from primary' to 'no incoming edge in pre-filter graph'. That catches the Y in X -> Y case: Y had an incoming edge pre-filter, so it's a downstream step, not a legitimate parallel start — and when X gets skipped, Y is correctly pruned rather than promoted to a root. Regression test does not promote a downstream orphan to a parallel start when its only upstream is skipped added; the pass-11 parallel-start preservation test still passes (X in the user-defined X -> Y branch has no incoming pre-filter, so it's still seeded).
…ng edge' Addresses codex pass-12 P1 on #12740. The pass-11 seed heuristic — 'agent is in `agentConfigs` but was not pre-filter reachable from the primary' — was too permissive. A downstream agent like Y in `X -> Y` where X gets skipped (missing / no VIEW) was never pre-filter reachable from the primary either, so the old rule promoted Y to a parallel start node and discovery returned `agents: [primary, Y]` with no connecting edge. The SDK then ran Y as an unintended parallel root — exactly the orphan behavior pass-4 wanted to prevent. Tighter criterion: seed a post-filter reachability root only when the agent had NO incoming edge in the pre-filter graph. That matches `MultiAgentGraph.analyzeGraph`'s "no-incoming-edge" definition of a start node applied to the user's original declared topology, so: - `A -> B` plus a user-defined `X -> Y` parallel branch: X has no incoming pre-filter → seeded → X and Y both survive. - `A -> B` plus `X -> Y` with X skipped: Y had an incoming pre-filter (`X -> Y`) → NOT seeded → Y is pruned as the orphan it is. - `A -> B -> C` with B skipped: C had an incoming pre-filter (`B -> C`) → NOT seeded → C is pruned. New test `does not promote a downstream orphan to a parallel start when its only upstream is skipped` locks in the pass-12 scenario. The pass-11 `preserves user-defined parallel-start branches` test continues to hold.
|
@codex review |
GitNexus: 🚀 deployedThe |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 54de02cb61
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| getUserCodeFiles: db.getUserCodeFiles, | ||
| getToolFilesByIds: db.getToolFilesByIds, | ||
| getCodeGeneratedFiles: db.getCodeGeneratedFiles, | ||
| filterFilesByAgentAccess, |
There was a problem hiding this comment.
Remove AGENT-only file filter from remote agent initialization
Passing filterFilesByAgentAccess into initializeAgent here makes remote API calls enforce ResourceType.AGENT on attached context files (api/server/services/Files/permissions.js uses resourceType: ResourceType.AGENT at the access check), even though this route authorizes callers with REMOTE_AGENT permissions via getRemoteAgentPermissions. A user who has REMOTE_AGENT_VIEWER but no AGENT_VIEW can invoke the shared agent, but all owner-attached context files get filtered out, so tools like file_search/context-backed retrieval silently stop working for remote consumers; the same regression is also introduced in responses.js with the identical dbMethods wiring.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Right — this was a regression I introduced when I refactored the DB-method bundle into a shared dbMethods object. Fixed in 6ce3462: both API controllers now omit filterFilesByAgentAccess from dbMethods (and drop the now-unused import). The in-app chat initialize.js keeps it because that path legitimately authorizes at the AGENT level.
This restores the pre-refactor API behavior — primeResources sees filterFiles: undefined and skips the per-file ACL check, so REMOTE_AGENT_VIEWER-only callers get the owner-attached context files the route's getRemoteAgentPermissions has already decided they can see.
Addresses codex pass-13 P1 on #12740. When I refactored the API controllers' DB-method bundle, I inadvertently started forwarding `filterFilesByAgentAccess` into `initializeAgent`. That helper calls `checkPermission` with `resourceType: ResourceType.AGENT`, but these routes authorize callers through `REMOTE_AGENT` (via `getRemoteAgentPermissions`). A user granted `REMOTE_AGENT_VIEWER` on a shared agent but lacking direct `AGENT_VIEW` could invoke the agent yet all its owner-attached context files would get silently filtered out — breaking `file_search`/context retrieval for remote consumers. Drop `filterFilesByAgentAccess` from the OpenAI-compat and Responses controllers' `dbMethods` (and remove the now-unused import). The chat UI's `initialize.js` keeps it since that path legitimately authorizes at the AGENT level. No functional change inside the helper — passing `undefined` simply tells `primeResources` to skip the per-file ACL filter, restoring the pre-refactor API behavior.
GitNexus: 🚀 deployedThe |
|
Codex Review: Didn't find any major issues. Swish! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ce34624e8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for (const agentId of [...agentConfigs.keys()]) { | ||
| if (!reachable.has(agentId) && !referencedByEdge.has(agentId)) { | ||
| agentConfigs.delete(agentId); |
There was a problem hiding this comment.
Prune unreachable co-sources from surviving edges
When an upstream agent is skipped, a multi-source edge can still reference a source that is no longer reachable (for example A->C, skipped X->B, and ['B','C']->D). This loop keeps B solely because it is still referenced by edges, so createRun includes B even though the reachability pass excluded it. In MultiAgentGraph.analyzeGraph, that leaves B with no incoming edges and it executes as an unintended parallel start node, which changes workflow behavior after permission/orphan filtering.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good call — fixed in 718b000. Instead of keeping unreachable co-sources via the referencedByEdge fallback (which then propped B up as an incoming-less parallel root), surviving multi-source edges now get their from array sanitized to drop any unreachable co-source. [B,C] -> D in the scenario you described becomes [C] -> D, every endpoint of every surviving edge is guaranteed to be in reachable, and the agent prune collapses to a strict reachable.has(agentId) check. Regression test strips unreachable co-sources from surviving multi-source edges (no stray parallel root) asserts B is absent from every surviving edge endpoint; all 22 prior discovery tests still pass unchanged.
Addresses codex pass-14 P1 on #12740. The earlier pass-8 fix kept any agent referenced as an endpoint of a surviving edge (via a `referencedByEdge` fallback) to avoid the SDK's `validateEdgeAgents` failing on missing nodes. But that fallback propped up unreachable co-sources too: with `[A -> C, X -> B, [B,C] -> D]` and X skipped, `X -> B` gets filtered, the `[B,C] -> D` fan-in survives because C is reachable, and B stays in `agentConfigs` solely because the fan-in still lists it. `MultiAgentGraph.analyzeGraph` then sees B with no incoming edge and runs it as an unintended parallel root. Sanitize surviving edges instead: for a kept edge whose `from` is an array, filter out any co-source that isn't reachable. The SDK's per-source `addEdge` fires independently, so dropping an unreachable co-source doesn't invalidate the remaining routes — in the scenario above `[B,C] -> D` becomes `[C] -> D`, every endpoint of every surviving edge is now reachable, and the agent prune collapses to a strict `reachable.has(agentId)` check. No more referenced-by-edge fallback. Regression test added: `strips unreachable co-sources from surviving multi-source edges (no stray parallel root)` — asserts B is absent from every surviving edge endpoint and the fan-in's `from` is just `['C']`. All 22 prior discovery tests still pass unchanged.
GitNexus: 🚀 deployedThe |
|
Codex Review: Didn't find any major issues. 🎉 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@codex review |
|
Codex Review: Didn't find any major issues. Another round soon, please! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
Codex has converged — two consecutive Initial bug fix
Codex passes 1–6 (initial six iterations)
External audit follow-up
Codex passes 7–14 (second wave)
Tests
The companion agents-SDK change (danny-avila/agents#111) was clean from codex in a single pass and remains unchanged. |
SummaryNo follow-up code changes are needed based on this trigger. The PR appears converged (two consecutive clean Checks
|
* 🤝 fix: load handoff sub-agents on OpenAI-compat endpoints (danny-avila#12726) Extracts the BFS discovery + ACL-gated initialization of handoff sub-agents into a shared `discoverConnectedAgents` helper in `@librechat/api` and wires it into the OpenAI-compatible `/v1/chat/completions` and Open Responses `/v1/responses` controllers. These endpoints previously only passed the primary agent config to `createRun` while keeping `primaryConfig.edges` intact, which forced `MultiAgentGraph` into multi-agent mode without loading the referenced sub-agents and caused StateGraph to throw "Found edge ending at unknown node <id>". The discovery helper also filters orphaned edges (deleted sub-agents or those the caller lacks VIEW permission on), so API users see the same graceful fallback the chat UI already had. * 🧪 fix: use ServerRequest in discovery spec helpers CI `tsc --noEmit -p packages/api/tsconfig.json` caught that the test helpers typed `req` as `express.Request`, which is not assignable to `DiscoverConnectedAgentsParams.req` (typed as `ServerRequest` whose `user` is `IUser`). Local jest passed because ts-jest is transpile-only, but the CI typecheck uses the full compiler. * 🪲 fix: drop orphan edges on both endpoints, not just `to` Addresses the P1 codex finding on danny-avila#12740: `filterOrphanedEdges` previously only removed edges whose `to` referenced a skipped agent. Edges whose `from` was a skipped agent — the symmetric case in a bidirectional graph like `A <-> B` where `B` is deleted or the user lacks VIEW on it — leaked through to `createRun` and re-triggered `Found edge ending at unknown node <id>` at StateGraph compile time. The filter now drops an edge if either endpoint references a skipped id, and the existing `to`-only test cases were updated to reflect the stricter behavior. Adds a bidirectional-graph regression test in `discovery.spec.ts`. * 🔒 fix: enforce REMOTE_AGENT ACL on handoff sub-agents for API routes Addresses the second P1 codex finding on danny-avila#12740: the OpenAI-compat `/v1/chat/completions` and Open Responses `/v1/responses` routes gate the primary agent on `REMOTE_AGENT` (via `createCheckRemoteAgentAccess`), but `discoverConnectedAgents` was checking handoff sub-agents against the looser in-app `AGENT` resource type. That allowed a remote caller who could reach the orchestrator but had only in-app visibility on a sub-agent to invoke it via the API — bypassing the remote-sharing boundary. Adds an optional `resourceType` param to `discoverConnectedAgents` (defaulting to `AGENT` for the chat UI path) and passes `ResourceType.REMOTE_AGENT` from both API controllers so every discovered sub-agent clears the same sharing boundary enforced at route entry. * 🧯 fix: enforce allowedProviders for discovered sub-agents Addresses the third P1 codex finding on danny-avila#12740: `discoverConnectedAgents` forwarded the caller's `endpointOption` verbatim into `initializeAgent`, but on the OpenAI-compat routes that option's `endpoint` is the primary agent's provider (e.g. `openai`), not `agents`. `initializeAgent` only enforces `allowedProviders` when `isAgentsEndpoint(endpointOption.endpoint)` is true, so handoff sub-agents silently bypassed the provider allowlist configured under `endpoints.agents.allowedProviders`. Override `endpointOption.endpoint` to `EModelEndpoint.agents` for every per-sub-agent init call. The primary agent still uses the caller's endpointOption as before — this only affects the BFS-loaded handoff targets. Regression test asserts the override. * ✂️ fix: prune unreachable sub-agents after orphan-edge filtering Addresses the fourth P1 codex finding on danny-avila#12740: BFS eagerly initializes every sub-agent referenced in the primary's edge scan, but once `filterOrphanedEdges` drops edges whose endpoints were skipped, some of those sub-agents end up disconnected from the primary. In an `A -> B -> C` graph (edges stored directly on A) where B is skipped (missing or no VIEW), both edges are filtered, but C was already loaded and would still be passed to `createRun` — which flips into multi-agent mode on `agents.length > 1` and turns C into an unintended parallel start node. After filtering edges, compute the set of agent ids reachable from the primary through the surviving edge set and prune `agentConfigs` to that set. Two regression tests added: one for the pruning case, one that confirms agents connected via surviving edges are still kept. * 🔁 fix: don't seed initialize.js agentConfigs from the pre-pruning callback Addresses the fifth P1 codex finding on danny-avila#12740: `onAgentInitialized` fires during BFS, BEFORE the helper prunes agents that become disconnected once `filterOrphanedEdges` runs. Writing the sub-agent straight into the outer `agentConfigs` there and then only additively merging the pruned `discoveredConfigs` left stranded entries in the outer map, and `AgentClient` would still hand them to `createRun` as extra parallel start nodes (the exact failure mode the pass-4 prune was meant to eliminate for the API controllers). Drop the `agentConfigs.set` from the callback and replace the additive merge with a direct copy from `discoveredConfigs`, which is now the single authoritative source of what the run should see. The per-agent tool context map is still populated during BFS — stale entries there are harmless because they're only read by closure inside `ON_TOOL_EXECUTE` and are unreachable once the agent is not in `agentConfigs`. * 🔬 fix: address audit findings on discovery helper Resolves findings from a comprehensive external audit of danny-avila#12740. **Finding 1 (CRITICAL) — stale edges survive the reachability prune.** The pass-4 prune removed unreachable agents from `agentConfigs` but left matching edges in the return value. In an `A -> B -> C -> D` graph (all edges stored on A) where B is skipped, `filterOrphanedEdges` drops A->B and B->C but keeps C->D (neither endpoint is skipped). The caller then sees `agentConfigs` without C/D but `edges` still references them, flipping `createRun` into multi-agent mode with mismatched agents/edges — the exact crash this PR is supposed to fix. Now filter the edge list to the reachable set in the same pass, so the returned shape is self-consistent: every edge endpoint is either the primary id or a key of `agentConfigs`. New regression test covers A->B->C->D with B skipped. **Finding 2 (MAJOR) — unconditional `getModelsConfig` on every API request.** The OpenAI-compat and Responses controllers called `getModelsConfig(req)` and `discoverConnectedAgents` even when the primary agent had no edges (the common single-agent API case). Gate both behind `primaryConfig.edges?.length > 0` so single-agent runs don't pay that cost. **Finding 5 (MINOR) — silent mutation of caller's `primaryConfig.userMCPAuthMap`.** The helper aliased that object and then `Object.assign`'d sub-agent entries into it, changing the caller's config in-place. Shallow-clone up front so the returned merged map is the only destination. **Finding 7 (NIT) — dead `?? []` coalescing.** `filterOrphanedEdges` always returns a concrete array, so the `discoveredEdges ?? []` fallback was never reached. Simplified the `primaryConfig.edges = …` assignment. Also adds a test that verifies `primaryConfig.userMCPAuthMap` is not mutated in-place. * 🧹 chore: address audit NITs on discovery helper Addresses two NIT findings from the post-fix audit: **F1** — the shallow clone on `primaryConfig.userMCPAuthMap` was only applied on the primary side; the `else` branch (hit when the primary had no MCP auth and the first sub-agent seeds the map) assigned the sub-agent's `config.userMCPAuthMap` directly, so a later sub-agent's `Object.assign` mutated the first one's map in place. Harmless in practice (per-request ephemeral objects) but asymmetric. Clone in the else branch too. Test added. **F2** — `initialize.js` had a defensive `if (agentConfigs.size > 0 && !edges) edges = []` normalizer. Pre-existing dead code: the helper now always returns a concrete array from `filteredEdges.filter(...)`. Removed for clarity. * 🕸 fix: require all sources reachable when traversing fan-in edges Addresses the seventh P1 codex finding on danny-avila#12740: the reachability BFS advanced through an edge as soon as any of its `from` endpoints matched the current frontier node (`sources.includes(current)`), but the subsequent edge filter required ALL sources to be reachable (`every`). The two-semantics mismatch let a fan-in edge like `{from: ['A','B'], to: 'C'}` mark C reachable purely via A even when B had no path from the primary, then drop the edge itself at filter time. Result: C survived in `agentConfigs` with no surviving edge connecting it to A, so `createRun` flipped into multi-agent mode on `agents.length > 1` and C ran as an unintended parallel root. Replace the BFS with a fixed-point iteration keyed on the same all-sources-reachable predicate used by the filter, so traversal and filtering stay aligned and multi-source edges only fire once every source is in the reachable set. Two regression tests added: - `{from: ['A','B'], to: 'C'}` with B having no incoming path — asserts neither B nor C leak into the result. - `A -> B`, `A -> C`, `['B','C'] -> D` — asserts the fan-in edge fires and D becomes reachable once both B and C are. * 🔀 fix: match SDK OR semantics for multi-source edge reachability Reverts the all-sources-required reachability gate from 4982f1c and replaces it with an any-source-reachable model, which matches how `@librechat/agents`'s `MultiAgentGraph.createWorkflow` actually wires multi-source edges at runtime (per-source `builder.addEdge(source, destination)`). With the previous `every` gate, a legitimate handoff edge `{ from: ['A', 'B'], to: 'C' }` where B had no incoming path was pruned along with C, regressing OR-semantics routing that the SDK would otherwise handle correctly. New behavior: 1. Reachability: an edge advances when ANY of its `from` endpoints is already reachable. Fixed-point iteration over `filteredEdges`. 2. Edge filter: keep an edge when it has at least one reachable source AND all destinations are reachable (a missing destination would still crash `StateGraph.compile` with `Found edge ending at unknown node`). 3. Agent prune: keep agents that are reachable OR referenced on any endpoint of a surviving edge. The second clause preserves co-sources in multi-source edges (B in `{ from: ['A','B'], to: 'C' }` when nothing else reaches B) so the SDK's per-source `addEdge` — and the `validateEdgeAgents` safety-net I added to the SDK in danny-avila#111 — still finds B as a node. The pass-audit A->B->C->D regression test continues to pass: with B skipped, `filterOrphanedEdges` drops both B-adjacent edges, reachability never expands past A, C->D has no reachable source so it gets filtered, and C/D are pruned because they're neither reachable nor referenced. * ✂️ fix: strip skipped co-members from multi-source/multi-dest edges Addresses codex pass-9 P2 on danny-avila#12740. `filterOrphanedEdges` previously dropped an edge whenever any `from` id was skipped, which was correct for scalar edges but over-aggressive for multi-source ones: the agents SDK adds one `builder.addEdge(source, destination)` per source, so `{ from: ['A','B'], to: 'C' }` with B skipped still has a valid `A -> C` route that was being thrown away. Now sanitize each endpoint: - Scalar skipped → drop the whole edge (no route survives). - Array with some skipped → strip the skipped ids, keep the edge with the surviving members. If the array empties out, drop the edge. Symmetric handling for `to` covers multi-destination fan-out when one co-destination is skipped. Tests updated/added: - `strips skipped co-sources from multi-source edges…` - `strips skipped co-destinations from multi-destination edges` - `drops multi-member edges only when every member on a side is skipped` - Discovery-side: `preserves valid routes when one co-source of a multi-source edge is skipped` asserts the end-to-end behavior — skipped co-source B gets stripped from the edge, A->C routing survives, and C remains in `agentConfigs`. * 🔓 fix: respect SHARE-on-AGENT fallback for handoff ACL on API routes Addresses codex pass-10 P1 on danny-avila#12740. The API controllers were handing `discoverConnectedAgents` a raw `PermissionService.checkPermission` call against `ResourceType.REMOTE_AGENT`, but the route-level middleware (`createCheckRemoteAgentAccess`) authorizes the primary agent via `getRemoteAgentPermissions`, which first consults the AGENT ACL and treats owners with the SHARE bit as remotely authorized even without an explicit REMOTE_AGENT grant. The mismatch meant a user could open the primary via `/v1/chat/completions` or `/v1/responses`, but their own owned handoff sub-agents were silently skipped — breaking multi-agent handoffs for the common "owner runs their own multi-agent orchestrator" case. Both controllers now pass `discoverConnectedAgents` a `checkPermission` wrapper that delegates to `getRemoteAgentPermissions` (with `getEffectivePermissions` injected from `PermissionService`) and compares the returned bitmask against the required permission via `hasPermissions`. Sub-agents are now authorized by the exact same rules the route middleware applies to the primary. * 🌱 fix: preserve user-defined parallel-start branches Addresses codex pass-11 P2 on danny-avila#12740. The post-filter reachability prune seeded only from `primaryConfig.id`, which killed `MultiAgentGraph`'s legitimate multi-start pattern — a user-defined edge like `X -> Y` where X has no incoming path (X is an intentional parallel starting node, run alongside the primary) was being dropped because neither X nor Y was reachable from the primary. Reconcile the tension with pass-4 ("prune accidental orphans when an intermediate is skipped") by using pre-filter reachability as the signal: - An agent that WAS reachable from the primary via the original (pre-filter) edges but loses that path when `filterOrphanedEdges` runs is an accidental orphan (a skipped hop broke the chain) — prune. - An agent that was NEVER reachable from the primary, even pre-filter, is an intentional parallel start — seed it into post-filter reachability so its component survives. Surviving-edge endpoint references still keep an agent (co-sources in multi-source edges). New test `preserves user-defined parallel-start branches disconnected from the primary` covers the pass-11 scenario; the existing `A->B->C->D, B skipped` regression test continues to pass because C/D were pre-filter reachable through B and lose that reachability after filtering. * 🎯 fix: tighten parallel-start seed criterion to 'no pre-filter incoming edge' Addresses codex pass-12 P1 on danny-avila#12740. The pass-11 seed heuristic — 'agent is in `agentConfigs` but was not pre-filter reachable from the primary' — was too permissive. A downstream agent like Y in `X -> Y` where X gets skipped (missing / no VIEW) was never pre-filter reachable from the primary either, so the old rule promoted Y to a parallel start node and discovery returned `agents: [primary, Y]` with no connecting edge. The SDK then ran Y as an unintended parallel root — exactly the orphan behavior pass-4 wanted to prevent. Tighter criterion: seed a post-filter reachability root only when the agent had NO incoming edge in the pre-filter graph. That matches `MultiAgentGraph.analyzeGraph`'s "no-incoming-edge" definition of a start node applied to the user's original declared topology, so: - `A -> B` plus a user-defined `X -> Y` parallel branch: X has no incoming pre-filter → seeded → X and Y both survive. - `A -> B` plus `X -> Y` with X skipped: Y had an incoming pre-filter (`X -> Y`) → NOT seeded → Y is pruned as the orphan it is. - `A -> B -> C` with B skipped: C had an incoming pre-filter (`B -> C`) → NOT seeded → C is pruned. New test `does not promote a downstream orphan to a parallel start when its only upstream is skipped` locks in the pass-12 scenario. The pass-11 `preserves user-defined parallel-start branches` test continues to hold. * 📁 fix: don't enforce AGENT-only file ACL on REMOTE_AGENT API callers Addresses codex pass-13 P1 on danny-avila#12740. When I refactored the API controllers' DB-method bundle, I inadvertently started forwarding `filterFilesByAgentAccess` into `initializeAgent`. That helper calls `checkPermission` with `resourceType: ResourceType.AGENT`, but these routes authorize callers through `REMOTE_AGENT` (via `getRemoteAgentPermissions`). A user granted `REMOTE_AGENT_VIEWER` on a shared agent but lacking direct `AGENT_VIEW` could invoke the agent yet all its owner-attached context files would get silently filtered out — breaking `file_search`/context retrieval for remote consumers. Drop `filterFilesByAgentAccess` from the OpenAI-compat and Responses controllers' `dbMethods` (and remove the now-unused import). The chat UI's `initialize.js` keeps it since that path legitimately authorizes at the AGENT level. No functional change inside the helper — passing `undefined` simply tells `primeResources` to skip the per-file ACL filter, restoring the pre-refactor API behavior. * 🪓 fix: strip unreachable co-sources from surviving multi-source edges Addresses codex pass-14 P1 on danny-avila#12740. The earlier pass-8 fix kept any agent referenced as an endpoint of a surviving edge (via a `referencedByEdge` fallback) to avoid the SDK's `validateEdgeAgents` failing on missing nodes. But that fallback propped up unreachable co-sources too: with `[A -> C, X -> B, [B,C] -> D]` and X skipped, `X -> B` gets filtered, the `[B,C] -> D` fan-in survives because C is reachable, and B stays in `agentConfigs` solely because the fan-in still lists it. `MultiAgentGraph.analyzeGraph` then sees B with no incoming edge and runs it as an unintended parallel root. Sanitize surviving edges instead: for a kept edge whose `from` is an array, filter out any co-source that isn't reachable. The SDK's per-source `addEdge` fires independently, so dropping an unreachable co-source doesn't invalidate the remaining routes — in the scenario above `[B,C] -> D` becomes `[C] -> D`, every endpoint of every surviving edge is now reachable, and the agent prune collapses to a strict `reachable.has(agentId)` check. No more referenced-by-edge fallback. Regression test added: `strips unreachable co-sources from surviving multi-source edges (no stray parallel root)` — asserts B is absent from every surviving edge endpoint and the fan-in's `from` is just `['C']`. All 22 prior discovery tests still pass unchanged.
* 🤝 fix: load handoff sub-agents on OpenAI-compat endpoints (danny-avila#12726) Extracts the BFS discovery + ACL-gated initialization of handoff sub-agents into a shared `discoverConnectedAgents` helper in `@librechat/api` and wires it into the OpenAI-compatible `/v1/chat/completions` and Open Responses `/v1/responses` controllers. These endpoints previously only passed the primary agent config to `createRun` while keeping `primaryConfig.edges` intact, which forced `MultiAgentGraph` into multi-agent mode without loading the referenced sub-agents and caused StateGraph to throw "Found edge ending at unknown node <id>". The discovery helper also filters orphaned edges (deleted sub-agents or those the caller lacks VIEW permission on), so API users see the same graceful fallback the chat UI already had. * 🧪 fix: use ServerRequest in discovery spec helpers CI `tsc --noEmit -p packages/api/tsconfig.json` caught that the test helpers typed `req` as `express.Request`, which is not assignable to `DiscoverConnectedAgentsParams.req` (typed as `ServerRequest` whose `user` is `IUser`). Local jest passed because ts-jest is transpile-only, but the CI typecheck uses the full compiler. * 🪲 fix: drop orphan edges on both endpoints, not just `to` Addresses the P1 codex finding on danny-avila#12740: `filterOrphanedEdges` previously only removed edges whose `to` referenced a skipped agent. Edges whose `from` was a skipped agent — the symmetric case in a bidirectional graph like `A <-> B` where `B` is deleted or the user lacks VIEW on it — leaked through to `createRun` and re-triggered `Found edge ending at unknown node <id>` at StateGraph compile time. The filter now drops an edge if either endpoint references a skipped id, and the existing `to`-only test cases were updated to reflect the stricter behavior. Adds a bidirectional-graph regression test in `discovery.spec.ts`. * 🔒 fix: enforce REMOTE_AGENT ACL on handoff sub-agents for API routes Addresses the second P1 codex finding on danny-avila#12740: the OpenAI-compat `/v1/chat/completions` and Open Responses `/v1/responses` routes gate the primary agent on `REMOTE_AGENT` (via `createCheckRemoteAgentAccess`), but `discoverConnectedAgents` was checking handoff sub-agents against the looser in-app `AGENT` resource type. That allowed a remote caller who could reach the orchestrator but had only in-app visibility on a sub-agent to invoke it via the API — bypassing the remote-sharing boundary. Adds an optional `resourceType` param to `discoverConnectedAgents` (defaulting to `AGENT` for the chat UI path) and passes `ResourceType.REMOTE_AGENT` from both API controllers so every discovered sub-agent clears the same sharing boundary enforced at route entry. * 🧯 fix: enforce allowedProviders for discovered sub-agents Addresses the third P1 codex finding on danny-avila#12740: `discoverConnectedAgents` forwarded the caller's `endpointOption` verbatim into `initializeAgent`, but on the OpenAI-compat routes that option's `endpoint` is the primary agent's provider (e.g. `openai`), not `agents`. `initializeAgent` only enforces `allowedProviders` when `isAgentsEndpoint(endpointOption.endpoint)` is true, so handoff sub-agents silently bypassed the provider allowlist configured under `endpoints.agents.allowedProviders`. Override `endpointOption.endpoint` to `EModelEndpoint.agents` for every per-sub-agent init call. The primary agent still uses the caller's endpointOption as before — this only affects the BFS-loaded handoff targets. Regression test asserts the override. * ✂️ fix: prune unreachable sub-agents after orphan-edge filtering Addresses the fourth P1 codex finding on danny-avila#12740: BFS eagerly initializes every sub-agent referenced in the primary's edge scan, but once `filterOrphanedEdges` drops edges whose endpoints were skipped, some of those sub-agents end up disconnected from the primary. In an `A -> B -> C` graph (edges stored directly on A) where B is skipped (missing or no VIEW), both edges are filtered, but C was already loaded and would still be passed to `createRun` — which flips into multi-agent mode on `agents.length > 1` and turns C into an unintended parallel start node. After filtering edges, compute the set of agent ids reachable from the primary through the surviving edge set and prune `agentConfigs` to that set. Two regression tests added: one for the pruning case, one that confirms agents connected via surviving edges are still kept. * 🔁 fix: don't seed initialize.js agentConfigs from the pre-pruning callback Addresses the fifth P1 codex finding on danny-avila#12740: `onAgentInitialized` fires during BFS, BEFORE the helper prunes agents that become disconnected once `filterOrphanedEdges` runs. Writing the sub-agent straight into the outer `agentConfigs` there and then only additively merging the pruned `discoveredConfigs` left stranded entries in the outer map, and `AgentClient` would still hand them to `createRun` as extra parallel start nodes (the exact failure mode the pass-4 prune was meant to eliminate for the API controllers). Drop the `agentConfigs.set` from the callback and replace the additive merge with a direct copy from `discoveredConfigs`, which is now the single authoritative source of what the run should see. The per-agent tool context map is still populated during BFS — stale entries there are harmless because they're only read by closure inside `ON_TOOL_EXECUTE` and are unreachable once the agent is not in `agentConfigs`. * 🔬 fix: address audit findings on discovery helper Resolves findings from a comprehensive external audit of danny-avila#12740. **Finding 1 (CRITICAL) — stale edges survive the reachability prune.** The pass-4 prune removed unreachable agents from `agentConfigs` but left matching edges in the return value. In an `A -> B -> C -> D` graph (all edges stored on A) where B is skipped, `filterOrphanedEdges` drops A->B and B->C but keeps C->D (neither endpoint is skipped). The caller then sees `agentConfigs` without C/D but `edges` still references them, flipping `createRun` into multi-agent mode with mismatched agents/edges — the exact crash this PR is supposed to fix. Now filter the edge list to the reachable set in the same pass, so the returned shape is self-consistent: every edge endpoint is either the primary id or a key of `agentConfigs`. New regression test covers A->B->C->D with B skipped. **Finding 2 (MAJOR) — unconditional `getModelsConfig` on every API request.** The OpenAI-compat and Responses controllers called `getModelsConfig(req)` and `discoverConnectedAgents` even when the primary agent had no edges (the common single-agent API case). Gate both behind `primaryConfig.edges?.length > 0` so single-agent runs don't pay that cost. **Finding 5 (MINOR) — silent mutation of caller's `primaryConfig.userMCPAuthMap`.** The helper aliased that object and then `Object.assign`'d sub-agent entries into it, changing the caller's config in-place. Shallow-clone up front so the returned merged map is the only destination. **Finding 7 (NIT) — dead `?? []` coalescing.** `filterOrphanedEdges` always returns a concrete array, so the `discoveredEdges ?? []` fallback was never reached. Simplified the `primaryConfig.edges = …` assignment. Also adds a test that verifies `primaryConfig.userMCPAuthMap` is not mutated in-place. * 🧹 chore: address audit NITs on discovery helper Addresses two NIT findings from the post-fix audit: **F1** — the shallow clone on `primaryConfig.userMCPAuthMap` was only applied on the primary side; the `else` branch (hit when the primary had no MCP auth and the first sub-agent seeds the map) assigned the sub-agent's `config.userMCPAuthMap` directly, so a later sub-agent's `Object.assign` mutated the first one's map in place. Harmless in practice (per-request ephemeral objects) but asymmetric. Clone in the else branch too. Test added. **F2** — `initialize.js` had a defensive `if (agentConfigs.size > 0 && !edges) edges = []` normalizer. Pre-existing dead code: the helper now always returns a concrete array from `filteredEdges.filter(...)`. Removed for clarity. * 🕸 fix: require all sources reachable when traversing fan-in edges Addresses the seventh P1 codex finding on danny-avila#12740: the reachability BFS advanced through an edge as soon as any of its `from` endpoints matched the current frontier node (`sources.includes(current)`), but the subsequent edge filter required ALL sources to be reachable (`every`). The two-semantics mismatch let a fan-in edge like `{from: ['A','B'], to: 'C'}` mark C reachable purely via A even when B had no path from the primary, then drop the edge itself at filter time. Result: C survived in `agentConfigs` with no surviving edge connecting it to A, so `createRun` flipped into multi-agent mode on `agents.length > 1` and C ran as an unintended parallel root. Replace the BFS with a fixed-point iteration keyed on the same all-sources-reachable predicate used by the filter, so traversal and filtering stay aligned and multi-source edges only fire once every source is in the reachable set. Two regression tests added: - `{from: ['A','B'], to: 'C'}` with B having no incoming path — asserts neither B nor C leak into the result. - `A -> B`, `A -> C`, `['B','C'] -> D` — asserts the fan-in edge fires and D becomes reachable once both B and C are. * 🔀 fix: match SDK OR semantics for multi-source edge reachability Reverts the all-sources-required reachability gate from 4982f1c and replaces it with an any-source-reachable model, which matches how `@librechat/agents`'s `MultiAgentGraph.createWorkflow` actually wires multi-source edges at runtime (per-source `builder.addEdge(source, destination)`). With the previous `every` gate, a legitimate handoff edge `{ from: ['A', 'B'], to: 'C' }` where B had no incoming path was pruned along with C, regressing OR-semantics routing that the SDK would otherwise handle correctly. New behavior: 1. Reachability: an edge advances when ANY of its `from` endpoints is already reachable. Fixed-point iteration over `filteredEdges`. 2. Edge filter: keep an edge when it has at least one reachable source AND all destinations are reachable (a missing destination would still crash `StateGraph.compile` with `Found edge ending at unknown node`). 3. Agent prune: keep agents that are reachable OR referenced on any endpoint of a surviving edge. The second clause preserves co-sources in multi-source edges (B in `{ from: ['A','B'], to: 'C' }` when nothing else reaches B) so the SDK's per-source `addEdge` — and the `validateEdgeAgents` safety-net I added to the SDK in danny-avila#111 — still finds B as a node. The pass-audit A->B->C->D regression test continues to pass: with B skipped, `filterOrphanedEdges` drops both B-adjacent edges, reachability never expands past A, C->D has no reachable source so it gets filtered, and C/D are pruned because they're neither reachable nor referenced. * ✂️ fix: strip skipped co-members from multi-source/multi-dest edges Addresses codex pass-9 P2 on danny-avila#12740. `filterOrphanedEdges` previously dropped an edge whenever any `from` id was skipped, which was correct for scalar edges but over-aggressive for multi-source ones: the agents SDK adds one `builder.addEdge(source, destination)` per source, so `{ from: ['A','B'], to: 'C' }` with B skipped still has a valid `A -> C` route that was being thrown away. Now sanitize each endpoint: - Scalar skipped → drop the whole edge (no route survives). - Array with some skipped → strip the skipped ids, keep the edge with the surviving members. If the array empties out, drop the edge. Symmetric handling for `to` covers multi-destination fan-out when one co-destination is skipped. Tests updated/added: - `strips skipped co-sources from multi-source edges…` - `strips skipped co-destinations from multi-destination edges` - `drops multi-member edges only when every member on a side is skipped` - Discovery-side: `preserves valid routes when one co-source of a multi-source edge is skipped` asserts the end-to-end behavior — skipped co-source B gets stripped from the edge, A->C routing survives, and C remains in `agentConfigs`. * 🔓 fix: respect SHARE-on-AGENT fallback for handoff ACL on API routes Addresses codex pass-10 P1 on danny-avila#12740. The API controllers were handing `discoverConnectedAgents` a raw `PermissionService.checkPermission` call against `ResourceType.REMOTE_AGENT`, but the route-level middleware (`createCheckRemoteAgentAccess`) authorizes the primary agent via `getRemoteAgentPermissions`, which first consults the AGENT ACL and treats owners with the SHARE bit as remotely authorized even without an explicit REMOTE_AGENT grant. The mismatch meant a user could open the primary via `/v1/chat/completions` or `/v1/responses`, but their own owned handoff sub-agents were silently skipped — breaking multi-agent handoffs for the common "owner runs their own multi-agent orchestrator" case. Both controllers now pass `discoverConnectedAgents` a `checkPermission` wrapper that delegates to `getRemoteAgentPermissions` (with `getEffectivePermissions` injected from `PermissionService`) and compares the returned bitmask against the required permission via `hasPermissions`. Sub-agents are now authorized by the exact same rules the route middleware applies to the primary. * 🌱 fix: preserve user-defined parallel-start branches Addresses codex pass-11 P2 on danny-avila#12740. The post-filter reachability prune seeded only from `primaryConfig.id`, which killed `MultiAgentGraph`'s legitimate multi-start pattern — a user-defined edge like `X -> Y` where X has no incoming path (X is an intentional parallel starting node, run alongside the primary) was being dropped because neither X nor Y was reachable from the primary. Reconcile the tension with pass-4 ("prune accidental orphans when an intermediate is skipped") by using pre-filter reachability as the signal: - An agent that WAS reachable from the primary via the original (pre-filter) edges but loses that path when `filterOrphanedEdges` runs is an accidental orphan (a skipped hop broke the chain) — prune. - An agent that was NEVER reachable from the primary, even pre-filter, is an intentional parallel start — seed it into post-filter reachability so its component survives. Surviving-edge endpoint references still keep an agent (co-sources in multi-source edges). New test `preserves user-defined parallel-start branches disconnected from the primary` covers the pass-11 scenario; the existing `A->B->C->D, B skipped` regression test continues to pass because C/D were pre-filter reachable through B and lose that reachability after filtering. * 🎯 fix: tighten parallel-start seed criterion to 'no pre-filter incoming edge' Addresses codex pass-12 P1 on danny-avila#12740. The pass-11 seed heuristic — 'agent is in `agentConfigs` but was not pre-filter reachable from the primary' — was too permissive. A downstream agent like Y in `X -> Y` where X gets skipped (missing / no VIEW) was never pre-filter reachable from the primary either, so the old rule promoted Y to a parallel start node and discovery returned `agents: [primary, Y]` with no connecting edge. The SDK then ran Y as an unintended parallel root — exactly the orphan behavior pass-4 wanted to prevent. Tighter criterion: seed a post-filter reachability root only when the agent had NO incoming edge in the pre-filter graph. That matches `MultiAgentGraph.analyzeGraph`'s "no-incoming-edge" definition of a start node applied to the user's original declared topology, so: - `A -> B` plus a user-defined `X -> Y` parallel branch: X has no incoming pre-filter → seeded → X and Y both survive. - `A -> B` plus `X -> Y` with X skipped: Y had an incoming pre-filter (`X -> Y`) → NOT seeded → Y is pruned as the orphan it is. - `A -> B -> C` with B skipped: C had an incoming pre-filter (`B -> C`) → NOT seeded → C is pruned. New test `does not promote a downstream orphan to a parallel start when its only upstream is skipped` locks in the pass-12 scenario. The pass-11 `preserves user-defined parallel-start branches` test continues to hold. * 📁 fix: don't enforce AGENT-only file ACL on REMOTE_AGENT API callers Addresses codex pass-13 P1 on danny-avila#12740. When I refactored the API controllers' DB-method bundle, I inadvertently started forwarding `filterFilesByAgentAccess` into `initializeAgent`. That helper calls `checkPermission` with `resourceType: ResourceType.AGENT`, but these routes authorize callers through `REMOTE_AGENT` (via `getRemoteAgentPermissions`). A user granted `REMOTE_AGENT_VIEWER` on a shared agent but lacking direct `AGENT_VIEW` could invoke the agent yet all its owner-attached context files would get silently filtered out — breaking `file_search`/context retrieval for remote consumers. Drop `filterFilesByAgentAccess` from the OpenAI-compat and Responses controllers' `dbMethods` (and remove the now-unused import). The chat UI's `initialize.js` keeps it since that path legitimately authorizes at the AGENT level. No functional change inside the helper — passing `undefined` simply tells `primeResources` to skip the per-file ACL filter, restoring the pre-refactor API behavior. * 🪓 fix: strip unreachable co-sources from surviving multi-source edges Addresses codex pass-14 P1 on danny-avila#12740. The earlier pass-8 fix kept any agent referenced as an endpoint of a surviving edge (via a `referencedByEdge` fallback) to avoid the SDK's `validateEdgeAgents` failing on missing nodes. But that fallback propped up unreachable co-sources too: with `[A -> C, X -> B, [B,C] -> D]` and X skipped, `X -> B` gets filtered, the `[B,C] -> D` fan-in survives because C is reachable, and B stays in `agentConfigs` solely because the fan-in still lists it. `MultiAgentGraph.analyzeGraph` then sees B with no incoming edge and runs it as an unintended parallel root. Sanitize surviving edges instead: for a kept edge whose `from` is an array, filter out any co-source that isn't reachable. The SDK's per-source `addEdge` fires independently, so dropping an unreachable co-source doesn't invalidate the remaining routes — in the scenario above `[B,C] -> D` becomes `[C] -> D`, every endpoint of every surviving edge is now reachable, and the agent prune collapses to a strict `reachable.has(agentId)` check. No more referenced-by-edge fallback. Regression test added: `strips unreachable co-sources from surviving multi-source edges (no stray parallel root)` — asserts B is absent from every surviving edge endpoint and the fan-in's `from` is just `['C']`. All 22 prior discovery tests still pass unchanged.
Summary
Fixes #12726 — the OpenAI-compatible
/v1/chat/completionsand Open Responses/v1/responsesendpoints throwFound edge ending at unknown node <id>when the target agent has handoff edges to sub-agents.discoverConnectedAgentshelper inpackages/api/src/agents/discovery.ts.[primaryConfig, ...subAgentConfigs]tocreateRunand route per-agent tool calls via a keyedagentToolContextsmap (matching whatinitializeClientalready does).VIEWpermission on) so the error path matches the existing chat UI behavior.api/server/services/Endpoints/agents/initialize.jsto the new helper — no behavior change, just deduplication.Root cause
primaryConfig.edgesis always loaded from the DB.packages/api/src/agents/run.ts:417routes toMultiAgentGraphwheneveredges.length > 0, regardless ofagents.length. The chat UI'sinitializeClienthad its own BFS that loaded every sub-agent referenced by those edges; the API controllers skipped that step and handedcreateRuna single-agentagents: [primaryConfig]with unresolved edges, which StateGraph then rejects at compile time.Test plan
packages/api/src/agents/discovery.spec.ts— 9 cases covering multi-hop BFS, orphaned-agent filtering, permission-denied filtering, dedup / cycle safety, legacyagent_idschain, MCP auth merge, validation failure, missing user.server/services/Endpoints/agents/initialize.spec.jsstill passes — adapted to the shared helper via DI overrides.server/controllers/agents/__tests__/openai.spec.jsandresponses.unit.spec.jsupdated with the new mocks; 22 tests pass.packages/api/src/agentstests pass (396 / 396, excluding the pre-existing tiktoken ESM e2e failure that's unrelated).api/server/controllers/agents+api/server/services/Endpoints/agentstests pass (208 / 208)./api/agents/v1/chat/completionsand/api/agents/v1/responses(streaming + non-streaming) with an orchestrator agent → expect handoff, not 500.Companion change
A matching safety-net fix in
danny-avila/agentsadds early validation inMultiAgentGraphso future consumers who hand it mismatchedagents/edgesget a clearedges reference agent(s) not present in agentserror instead of the cryptic LangGraph one.