Skip to content

🤝 fix: Load Handoff Agents for Agents API#12740

Merged
danny-avila merged 17 commits intodevfrom
claude/relaxed-colden-78db39
Apr 20, 2026
Merged

🤝 fix: Load Handoff Agents for Agents API#12740
danny-avila merged 17 commits intodevfrom
claude/relaxed-colden-78db39

Conversation

@danny-avila
Copy link
Copy Markdown
Owner

Summary

Fixes #12726 — the OpenAI-compatible /v1/chat/completions and Open Responses /v1/responses endpoints throw Found edge ending at unknown node <id> when the target agent has handoff edges to sub-agents.

  • Extracts the chat UI's BFS discovery + ACL-gated sub-agent initialization into a shared discoverConnectedAgents helper in packages/api/src/agents/discovery.ts.
  • Wires the helper into the two OpenAI-compat controllers so they pass [primaryConfig, ...subAgentConfigs] to createRun and route per-agent tool calls via a keyed agentToolContexts map (matching what initializeClient already does).
  • Filters orphaned edges (deleted sub-agents or sub-agents the caller lacks VIEW permission on) so the error path matches the existing chat UI behavior.
  • Collapses the previously-inline BFS in api/server/services/Endpoints/agents/initialize.js to the new helper — no behavior change, just deduplication.

Root cause

primaryConfig.edges is always loaded from the DB. packages/api/src/agents/run.ts:417 routes to MultiAgentGraph whenever edges.length > 0, regardless of agents.length. The chat UI's initializeClient had its own BFS that loaded every sub-agent referenced by those edges; the API controllers skipped that step and handed createRun a single-agent agents: [primaryConfig] with unresolved edges, which StateGraph then rejects at compile time.

Test plan

  • New unit tests: packages/api/src/agents/discovery.spec.ts — 9 cases covering multi-hop BFS, orphaned-agent filtering, permission-denied filtering, dedup / cycle safety, legacy agent_ids chain, MCP auth merge, validation failure, missing user.
  • Existing server/services/Endpoints/agents/initialize.spec.js still passes — adapted to the shared helper via DI overrides.
  • server/controllers/agents/__tests__/openai.spec.js and responses.unit.spec.js updated with the new mocks; 22 tests pass.
  • All packages/api/src/agents tests pass (396 / 396, excluding the pre-existing tiktoken ESM e2e failure that's unrelated).
  • All api/server/controllers/agents + api/server/services/Endpoints/agents tests pass (208 / 208).
  • Lint + rollup build clean across all workspaces.
  • Manual: POST to /api/agents/v1/chat/completions and /api/agents/v1/responses (streaming + non-streaming) with an orchestrator agent → expect handoff, not 500.
  • Manual regression: UI chat with the same orchestrator agent still works.
  • Manual: sub-agent deleted after orchestrator → API call still succeeds (orphan filter).
  • Manual: user API key lacks VIEW on a sub-agent → request skips that sub-agent and succeeds.

Companion change

A matching safety-net fix in danny-avila/agents adds early validation in MultiAgentGraph so future consumers who hand it mismatched agents/edges get a clear edges reference agent(s) not present in agents error instead of the cryptic LangGraph one.

Extracts the BFS discovery + ACL-gated initialization of handoff sub-agents
into a shared `discoverConnectedAgents` helper in `@librechat/api` and
wires it into the OpenAI-compatible `/v1/chat/completions` and Open
Responses `/v1/responses` controllers. These endpoints previously only
passed the primary agent config to `createRun` while keeping
`primaryConfig.edges` intact, which forced `MultiAgentGraph` into
multi-agent mode without loading the referenced sub-agents and caused
StateGraph to throw "Found edge ending at unknown node <id>".

The discovery helper also filters orphaned edges (deleted sub-agents or
those the caller lacks VIEW permission on), so API users see the same
graceful fallback the chat UI already had.
CI `tsc --noEmit -p packages/api/tsconfig.json` caught that the test
helpers typed `req` as `express.Request`, which is not assignable to
`DiscoverConnectedAgentsParams.req` (typed as `ServerRequest` whose
`user` is `IUser`). Local jest passed because ts-jest is transpile-only,
but the CI typecheck uses the full compiler.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes multi-agent handoff failures on the OpenAI-compatible /v1/chat/completions and Open Responses /v1/responses endpoints by ensuring all edge-referenced sub-agents are discovered, ACL-checked, initialized, and included in the createRun() agent list (matching the UI initialize flow).

Changes:

  • Introduces a shared BFS-based discoverConnectedAgents helper to load reachable sub-agents, merge MCP auth, and filter orphaned edges.
  • Updates OpenAI-compat and Responses controllers to use discovered [primary, ...subAgents] and to route tool execution via per-agent tool contexts.
  • Refactors the UI initializeClient flow to reuse the shared helper and updates unit tests/mocks accordingly.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
packages/api/src/agents/validation.ts Widens req typing for model validation to accept ServerRequest shape.
packages/api/src/agents/index.ts Re-exports the new discovery helper.
packages/api/src/agents/discovery.ts Adds shared BFS discovery + ACL gating + edge filtering + MCP auth merging.
packages/api/src/agents/discovery.spec.ts Adds unit coverage for discovery behavior (multi-hop, orphan/permission filtering, cycles, etc.).
api/server/services/Endpoints/agents/initialize.js Replaces inline BFS discovery with discoverConnectedAgents and keeps tool-context wiring.
api/server/controllers/agents/openai.js Loads handoff sub-agents for /v1/chat/completions, passes full agent list to createRun, routes tools by agentId.
api/server/controllers/agents/responses.js Same as above for /v1/responses (streaming + non-streaming).
api/server/controllers/agents/tests/openai.spec.js Updates mocks for new discovery + initialization expectations.
api/server/controllers/agents/tests/responses.unit.spec.js Updates mocks for new discovery + initialization expectations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +261 to +263
const chain = await createSequentialChainEdges([primaryConfig.id].concat(agent_ids), '{convo}');
collectEdges(chain as unknown as GraphEdge[]);
}
Comment thread api/server/controllers/agents/openai.js Outdated
Comment on lines +264 to +269
const modelsConfig = await getModelsConfig(req);
const {
agentConfigs: handoffAgentConfigs,
edges: discoveredEdges,
userMCPAuthMap: discoveredMCPAuthMap,
} = await discoverConnectedAgents(
Comment on lines +402 to +407
const modelsConfig = await getModelsConfig(req);
const {
agentConfigs: handoffAgentConfigs,
edges: discoveredEdges,
userMCPAuthMap: discoveredMCPAuthMap,
} = await discoverConnectedAgents(
@github-actions
Copy link
Copy Markdown
Contributor

GitNexus: 🚀 deployed

The LibreChat-pr-12740 index is now live on the MCP server.
Deploy run

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9e435b796

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/api/src/agents/discovery.ts Outdated
collectEdges(chain as unknown as GraphEdge[]);
}

const edges = filterOrphanedEdges(Array.from(edgeMap.values()), skippedAgentIds);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove edges that reference skipped agents as sources

After discovering sub-agents, this path only applies filterOrphanedEdges before returning edges, and that helper drops skipped IDs from to only. If a skipped/deleted/inaccessible sub-agent appears on the from side (for example, a bidirectional graph has A -> B and B -> A, and B is skipped), B -> A is still passed to createRun, which leaves an edge that references an agent not present in agents and can still fail graph compilation instead of gracefully skipping. The returned edge list should remove skipped IDs from both endpoints.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — valid P1. Fixed in 7418dd8 by widening filterOrphanedEdges to drop edges where either from OR to references a skipped id. Added a bidirectional-graph regression test in discovery.spec.ts plus updated the to-only cases in edges.spec.ts.

@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

@github-actions
Copy link
Copy Markdown
Contributor

GitNexus: 🚀 deployed

The LibreChat-pr-12740 index is now live on the MCP server.
Deploy run

Addresses the P1 codex finding on #12740: `filterOrphanedEdges`
previously only removed edges whose `to` referenced a skipped agent.
Edges whose `from` was a skipped agent — the symmetric case in a
bidirectional graph like `A <-> B` where `B` is deleted or the user
lacks VIEW on it — leaked through to `createRun` and re-triggered
`Found edge ending at unknown node <id>` at StateGraph compile time.

The filter now drops an edge if either endpoint references a skipped
id, and the existing `to`-only test cases were updated to reflect the
stricter behavior. Adds a bidirectional-graph regression test in
`discovery.spec.ts`.
@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

@github-actions
Copy link
Copy Markdown
Contributor

GitNexus: 🚀 deployed

The LibreChat-pr-12740 index is now live on the MCP server.
Deploy run

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7418dd83b5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/api/src/agents/discovery.ts Outdated
Comment on lines +165 to +167
resourceType: ResourceType.AGENT,
resourceId: agent._id,
requiredPermission: PermissionBits.VIEW,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Enforce REMOTE_AGENT ACL when loading handoff sub-agents

discoverConnectedAgents authorizes discovered handoff agents with ResourceType.AGENT, but the OpenAI-compatible routes are explicitly protected by remote-agent permissions (createCheckRemoteAgentAccess / REMOTE_AGENT). This lets a caller who can access the orchestrator remotely and has regular AGENT visibility (but not REMOTE_AGENT visibility) execute an unshared sub-agent through /v1/chat/completions or /v1/responses, bypassing remote-sharing boundaries for sub-agents. The permission check here should be endpoint-aware (or use REMOTE_AGENT for these controllers) to preserve the same ACL model as route entry.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — confirmed the API routes gate on REMOTE_AGENT while discovery was only checking AGENT. Fixed in e491a23: added a resourceType param to discoverConnectedAgents (defaults to AGENT for the in-app chat path) and both API controllers now pass ResourceType.REMOTE_AGENT so every discovered sub-agent clears the same sharing boundary the route enforces. Test added in discovery.spec.ts.

Addresses the second P1 codex finding on #12740: the OpenAI-compat
`/v1/chat/completions` and Open Responses `/v1/responses` routes gate
the primary agent on `REMOTE_AGENT` (via `createCheckRemoteAgentAccess`),
but `discoverConnectedAgents` was checking handoff sub-agents against
the looser in-app `AGENT` resource type. That allowed a remote caller
who could reach the orchestrator but had only in-app visibility on a
sub-agent to invoke it via the API — bypassing the remote-sharing
boundary.

Adds an optional `resourceType` param to `discoverConnectedAgents`
(defaulting to `AGENT` for the chat UI path) and passes
`ResourceType.REMOTE_AGENT` from both API controllers so every
discovered sub-agent clears the same sharing boundary enforced at
route entry.
@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

@github-actions
Copy link
Copy Markdown
Contributor

GitNexus: 🚀 deployed

The LibreChat-pr-12740 index is now live on the MCP server.
Deploy run

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e491a23a77

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/api/src/agents/discovery.ts Outdated
Comment on lines +208 to +209
endpointOption,
allowedProviders,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Enforce provider allowlist when initializing handoff agents

discoverConnectedAgents forwards endpointOption directly into initializeAgent, but on the OpenAI-compatible routes this option uses the primary provider endpoint (for example openai), not the agents endpoint. Because initializeAgent only checks allowedProviders when isAgentsEndpoint(endpointOption.endpoint) is true, discovered sub-agents bypass the configured provider allowlist and can run disallowed providers via handoff edges.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — confirmed. The OpenAI-compat routes build endpointOption with endpoint: agent.provider (e.g. openai), so the isAgentsEndpoint-gated allowedProviders guard in initializeAgent was being skipped for discovered sub-agents. Fixed in 7ec1ba2: the helper now overrides endpoint to EModelEndpoint.agents on every per-sub-agent init call so the allowlist is always enforced for handoff targets, regardless of the caller's entry endpoint. Test added.

Addresses the third P1 codex finding on #12740: `discoverConnectedAgents`
forwarded the caller's `endpointOption` verbatim into `initializeAgent`,
but on the OpenAI-compat routes that option's `endpoint` is the primary
agent's provider (e.g. `openai`), not `agents`. `initializeAgent` only
enforces `allowedProviders` when `isAgentsEndpoint(endpointOption.endpoint)`
is true, so handoff sub-agents silently bypassed the provider allowlist
configured under `endpoints.agents.allowedProviders`.

Override `endpointOption.endpoint` to `EModelEndpoint.agents` for every
per-sub-agent init call. The primary agent still uses the caller's
endpointOption as before — this only affects the BFS-loaded handoff
targets. Regression test asserts the override.
@github-actions
Copy link
Copy Markdown
Contributor

GitNexus: 🚀 deployed

The LibreChat-pr-12740 index is now live on the MCP server.
Deploy run

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ec1ba246f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/api/src/agents/discovery.ts Outdated
Comment on lines +288 to +292
const edges = filterOrphanedEdges(Array.from(edgeMap.values()), skippedAgentIds);

return {
agentConfigs,
edges,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prune disconnected sub-agents after orphan edge filtering

After filterOrphanedEdges removes edges that referenced skipped agents, discoverConnectedAgents still returns every initialized sub-agent in agentConfigs. In a graph like A -> B -> C, if B is skipped (missing or no VIEW), C is still initialized from the original edge scan, both edges are then removed, and callers pass C into createRun as an extra agent; because createRun switches to multi-agent mode when agents.length > 1, C can run as an unintended parallel start node even though no surviving edge connects it to A. This should prune agentConfigs to agents that remain reachable from the primary through the filtered edge set.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — confirmed. In an A -> B -> C graph (with B's edges stored directly on A), B being skipped stranded C as an initialized-but-disconnected agent, which createRun was then turning into a stray parallel start. Fixed in 4cbfffc: after filterOrphanedEdges, compute the reachability closure from the primary through surviving edges and prune agentConfigs to that set. Two regression tests added — one for the pruning case, one that verifies agents on surviving paths are kept.

Addresses the fourth P1 codex finding on #12740: BFS eagerly initializes
every sub-agent referenced in the primary's edge scan, but once
`filterOrphanedEdges` drops edges whose endpoints were skipped, some of
those sub-agents end up disconnected from the primary. In an `A -> B ->
C` graph (edges stored directly on A) where B is skipped (missing or
no VIEW), both edges are filtered, but C was already loaded and would
still be passed to `createRun` — which flips into multi-agent mode on
`agents.length > 1` and turns C into an unintended parallel start node.

After filtering edges, compute the set of agent ids reachable from the
primary through the surviving edge set and prune `agentConfigs` to that
set. Two regression tests added: one for the pruning case, one that
confirms agents connected via surviving edges are still kept.
@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4cbfffc9ee

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

checkAgentInit,
skippedAgentIds,
onAgentInitialized: (agentId, agent, config) => {
agentConfigs.set(agentId, config);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Drop pruned agents from initializeClient run list

discoverConnectedAgents now prunes sub-agents that become unreachable after edge filtering, but this callback writes every initialized sub-agent into agentConfigs before that pruning happens, and the later merge step only adds entries (never removes stale ones). In a graph like A->B and B->C where B is skipped (missing/no access), C is pruned from the helper result but still remains in agentConfigs, so AgentClient passes [A, C] to createRun and C can execute as an unintended parallel start node.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — this was the missing reconciliation in initialize.js after the pass-4 prune. Fixed in e54440b: the callback no longer writes agentConfigs (it fires during BFS, before the helper knows which agents will end up disconnected), and the outer merge step now copies directly from the pruned discoveredConfigs — so stranded sub-agents like C in A->B->C with B skipped are no longer handed to AgentClient. The API controllers (openai.js / responses.js) were already using the helper's returned map directly, so they were unaffected.

@github-actions
Copy link
Copy Markdown
Contributor

GitNexus: 🚀 deployed

The LibreChat-pr-12740 index is now live on the MCP server.
Deploy run

…lback

Addresses the fifth P1 codex finding on #12740: `onAgentInitialized`
fires during BFS, BEFORE the helper prunes agents that become
disconnected once `filterOrphanedEdges` runs. Writing the sub-agent
straight into the outer `agentConfigs` there and then only additively
merging the pruned `discoveredConfigs` left stranded entries in the
outer map, and `AgentClient` would still hand them to `createRun` as
extra parallel start nodes (the exact failure mode the pass-4 prune
was meant to eliminate for the API controllers).

Drop the `agentConfigs.set` from the callback and replace the additive
merge with a direct copy from `discoveredConfigs`, which is now the
single authoritative source of what the run should see. The
per-agent tool context map is still populated during BFS — stale
entries there are harmless because they're only read by closure inside
`ON_TOOL_EXECUTE` and are unreachable once the agent is not in
`agentConfigs`.
@danny-avila
Copy link
Copy Markdown
Owner Author

Hit the 5-round cap on codex review iterations. Summary of landed fixes from this review loop (all codex P1 findings, all validated as real bugs):

Pass Commit Fix
1 7418dd8 filterOrphanedEdges now drops edges where either from or to references a skipped agent — previously only to was checked, leaking B -> A through on a bidirectional graph where B was skipped.
2 e491a23 API routes now pass ResourceType.REMOTE_AGENT to discoverConnectedAgents — previously sub-agents were authorized against the looser AGENT resource, letting a caller bypass the remote-sharing boundary for handoff targets.
3 7ec1ba2 Sub-agent init now forces endpointOption.endpoint = EModelEndpoint.agents so initializeAgent's isAgentsEndpoint-gated allowedProviders guard always fires — previously the OpenAI-compat routes passed the primary's provider as the endpoint, bypassing the provider allowlist.
4 4cbfffc After filterOrphanedEdges, agents unreachable from the primary via surviving edges are pruned from agentConfigs — previously a stranded C in A -> B -> C (with B skipped) survived as a stray parallel start node.
5 e54440b initialize.js no longer seeds the outer agentConfigs from the pre-pruning BFS callback — now copies directly from the helper's pruned result, so the pass-4 fix actually lands on the in-app chat path too.

CI is green. Happy to run another pass if anything here looks wrong, but otherwise this is ready.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e54440b7d1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +318 to +321
for (const agentId of [...agentConfigs.keys()]) {
if (!reachable.has(agentId)) {
agentConfigs.delete(agentId);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prune unreachable edges before returning discovery result

After computing reachable, this function deletes disconnected agent configs but still returns the original edges array unchanged. In graphs where a skipped intermediate node disconnects a downstream subgraph (for example A->B->C with B skipped and C having C->D), agentConfigs drops C/D but edges can still contain C->D, so createRun receives edges that reference agents not present in agents and can fail multi-agent graph compilation with unknown-node errors. The returned edges should be filtered to only include endpoints in the reachable set.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep — confirmed and fixed in e000354. The reachability pass now filters both agentConfigs (unchanged) and the returned edges list in the same step, so A->B->C->D with B skipped correctly drops C->D instead of letting it through to createRun. Regression test prunes surviving-but-unreachable edges from the return value (A->B->C->D, B skipped) asserts every returned edge endpoint is either primaryConfig.id or a key of agentConfigs.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 222716d193

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread api/server/controllers/agents/openai.js Outdated
Comment on lines +290 to +294
resourceType: ResourceType.REMOTE_AGENT,
},
{
getAgent: db.getAgent,
checkPermission,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reuse remote-access resolver for handoff permission checks

This discovery call checks sub-agents with PermissionService.checkPermission on REMOTE_AGENT, but the route-level middleware (createCheckRemoteAgentAccess) grants remote access via getRemoteAgentPermissions, which also treats AGENT owners (SHARE bit) as remotely authorized. In practice, a user can pass the primary model permission check yet have owned handoff agents silently skipped here, breaking multi-agent handoffs on /v1/chat/completions (and the same pattern is present in the Responses controller).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — confirmed. Fixed in 0de3684. Both API controllers now build their checkPermission wrapper using getRemoteAgentPermissions (imported from @librechat/api) with getEffectivePermissions injected from PermissionService, and compare the resulting bitmask against the required permission via hasPermissions. That mirrors what createCheckRemoteAgentAccess does at the route, so sub-agents owned by the caller (SHARE on AGENT, no explicit REMOTE_AGENT grant) are no longer silently skipped.

Addresses codex pass-10 P1 on #12740. The API controllers were handing
`discoverConnectedAgents` a raw `PermissionService.checkPermission` call
against `ResourceType.REMOTE_AGENT`, but the route-level middleware
(`createCheckRemoteAgentAccess`) authorizes the primary agent via
`getRemoteAgentPermissions`, which first consults the AGENT ACL and
treats owners with the SHARE bit as remotely authorized even without
an explicit REMOTE_AGENT grant. The mismatch meant a user could open
the primary via `/v1/chat/completions` or `/v1/responses`, but their
own owned handoff sub-agents were silently skipped — breaking
multi-agent handoffs for the common "owner runs their own multi-agent
orchestrator" case.

Both controllers now pass `discoverConnectedAgents` a `checkPermission`
wrapper that delegates to `getRemoteAgentPermissions` (with
`getEffectivePermissions` injected from `PermissionService`) and
compares the returned bitmask against the required permission via
`hasPermissions`. Sub-agents are now authorized by the exact same
rules the route middleware applies to the primary.
@github-actions
Copy link
Copy Markdown
Contributor

GitNexus: 🚀 deployed

The LibreChat-pr-12740 index is now live on the MCP server.
Deploy run

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Nice work!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0de3684be9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/api/src/agents/discovery.ts Outdated
return ids.every((id) => typeof id !== 'string' || reachableSet.has(id));
};

const reachable = new Set<string>([primaryConfig.id]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve valid multi-start branches during reachability prune

Seeding reachability with only primaryConfig.id causes this helper to drop edge-defined components that are intentionally disconnected from the primary (for example, a valid B -> C branch where B has no incoming edges). MultiAgentGraph treats no-incoming agents as start nodes, so those branches previously executed in parallel, but this filter now removes both their edges and agent configs, changing runtime behavior for existing multi-start graphs in every caller of discoverConnectedAgents.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — reconciled with the earlier pass-4 'prune accidental orphans' using pre-filter reachability as the signal. Fixed in 7cc7aed.

  • Agents that were reachable from the primary via the pre-filter edges but lose that path after filterOrphanedEdges runs are accidental orphans (a skipped hop broke the chain) → prune.
  • Agents that were never reachable from the primary, even pre-filter, are intentional parallel starts → seed them into post-filter reachability so their component survives.

New test preserves user-defined parallel-start branches disconnected from the primary covers the A -> B + X -> Y scenario. The pass-audit A -> B -> C -> D, B skipped regression test still passes because C/D were pre-filter reachable through B and lose that reachability post-filter.

Addresses codex pass-11 P2 on #12740. The post-filter reachability
prune seeded only from `primaryConfig.id`, which killed
`MultiAgentGraph`'s legitimate multi-start pattern — a user-defined
edge like `X -> Y` where X has no incoming path (X is an intentional
parallel starting node, run alongside the primary) was being dropped
because neither X nor Y was reachable from the primary.

Reconcile the tension with pass-4 ("prune accidental orphans when an
intermediate is skipped") by using pre-filter reachability as the
signal:

- An agent that WAS reachable from the primary via the original
  (pre-filter) edges but loses that path when `filterOrphanedEdges`
  runs is an accidental orphan (a skipped hop broke the chain) — prune.
- An agent that was NEVER reachable from the primary, even pre-filter,
  is an intentional parallel start — seed it into post-filter
  reachability so its component survives.

Surviving-edge endpoint references still keep an agent (co-sources in
multi-source edges). New test `preserves user-defined parallel-start
branches disconnected from the primary` covers the pass-11 scenario;
the existing `A->B->C->D, B skipped` regression test continues to
pass because C/D were pre-filter reachable through B and lose that
reachability after filtering.
@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

@github-actions
Copy link
Copy Markdown
Contributor

GitNexus: 🚀 deployed

The LibreChat-pr-12740 index is now live on the MCP server.
Deploy run

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cc7aedea0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +371 to +374
for (const agentId of agentConfigs.keys()) {
if (!preFilterReachable.has(agentId)) {
postFilterSeeds.add(agentId);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Seed only disconnected branch roots after filtering

postFilterSeeds currently adds every initialized agent that was not reachable from the primary in the pre-filter graph. In a graph with a disconnected component like X -> Y, if X is skipped (missing/no VIEW) but Y loads, Y is still seeded here, so discovery returns agents: [primary, Y] with no connecting edge; createRun then treats Y as an unintended start node and executes it in parallel. This is a behavior regression in ACL/orphan scenarios and can trigger tool execution for agents that should have been pruned with their skipped upstream.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 54de02c. Tightened the seed criterion from 'not pre-filter reachable from primary' to 'no incoming edge in pre-filter graph'. That catches the Y in X -> Y case: Y had an incoming edge pre-filter, so it's a downstream step, not a legitimate parallel start — and when X gets skipped, Y is correctly pruned rather than promoted to a root. Regression test does not promote a downstream orphan to a parallel start when its only upstream is skipped added; the pass-11 parallel-start preservation test still passes (X in the user-defined X -> Y branch has no incoming pre-filter, so it's still seeded).

…ng edge'

Addresses codex pass-12 P1 on #12740. The pass-11 seed heuristic — 'agent
is in `agentConfigs` but was not pre-filter reachable from the primary' —
was too permissive. A downstream agent like Y in `X -> Y` where X gets
skipped (missing / no VIEW) was never pre-filter reachable from the
primary either, so the old rule promoted Y to a parallel start node and
discovery returned `agents: [primary, Y]` with no connecting edge. The
SDK then ran Y as an unintended parallel root — exactly the orphan
behavior pass-4 wanted to prevent.

Tighter criterion: seed a post-filter reachability root only when the
agent had NO incoming edge in the pre-filter graph. That matches
`MultiAgentGraph.analyzeGraph`'s "no-incoming-edge" definition of a
start node applied to the user's original declared topology, so:

- `A -> B` plus a user-defined `X -> Y` parallel branch: X has no
  incoming pre-filter → seeded → X and Y both survive.
- `A -> B` plus `X -> Y` with X skipped: Y had an incoming pre-filter
  (`X -> Y`) → NOT seeded → Y is pruned as the orphan it is.
- `A -> B -> C` with B skipped: C had an incoming pre-filter (`B -> C`)
  → NOT seeded → C is pruned.

New test `does not promote a downstream orphan to a parallel start when
its only upstream is skipped` locks in the pass-12 scenario. The pass-11
`preserves user-defined parallel-start branches` test continues to hold.
@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

@github-actions
Copy link
Copy Markdown
Contributor

GitNexus: 🚀 deployed

The LibreChat-pr-12740 index is now live on the MCP server.
Deploy run

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 54de02cb61

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread api/server/controllers/agents/openai.js Outdated
getUserCodeFiles: db.getUserCodeFiles,
getToolFilesByIds: db.getToolFilesByIds,
getCodeGeneratedFiles: db.getCodeGeneratedFiles,
filterFilesByAgentAccess,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove AGENT-only file filter from remote agent initialization

Passing filterFilesByAgentAccess into initializeAgent here makes remote API calls enforce ResourceType.AGENT on attached context files (api/server/services/Files/permissions.js uses resourceType: ResourceType.AGENT at the access check), even though this route authorizes callers with REMOTE_AGENT permissions via getRemoteAgentPermissions. A user who has REMOTE_AGENT_VIEWER but no AGENT_VIEW can invoke the shared agent, but all owner-attached context files get filtered out, so tools like file_search/context-backed retrieval silently stop working for remote consumers; the same regression is also introduced in responses.js with the identical dbMethods wiring.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right — this was a regression I introduced when I refactored the DB-method bundle into a shared dbMethods object. Fixed in 6ce3462: both API controllers now omit filterFilesByAgentAccess from dbMethods (and drop the now-unused import). The in-app chat initialize.js keeps it because that path legitimately authorizes at the AGENT level.

This restores the pre-refactor API behavior — primeResources sees filterFiles: undefined and skips the per-file ACL check, so REMOTE_AGENT_VIEWER-only callers get the owner-attached context files the route's getRemoteAgentPermissions has already decided they can see.

Addresses codex pass-13 P1 on #12740. When I refactored the API
controllers' DB-method bundle, I inadvertently started forwarding
`filterFilesByAgentAccess` into `initializeAgent`. That helper calls
`checkPermission` with `resourceType: ResourceType.AGENT`, but these
routes authorize callers through `REMOTE_AGENT` (via
`getRemoteAgentPermissions`). A user granted `REMOTE_AGENT_VIEWER` on
a shared agent but lacking direct `AGENT_VIEW` could invoke the agent
yet all its owner-attached context files would get silently filtered
out — breaking `file_search`/context retrieval for remote consumers.

Drop `filterFilesByAgentAccess` from the OpenAI-compat and Responses
controllers' `dbMethods` (and remove the now-unused import). The chat
UI's `initialize.js` keeps it since that path legitimately authorizes
at the AGENT level. No functional change inside the helper — passing
`undefined` simply tells `primeResources` to skip the per-file ACL
filter, restoring the pre-refactor API behavior.
@github-actions
Copy link
Copy Markdown
Contributor

GitNexus: 🚀 deployed

The LibreChat-pr-12740 index is now live on the MCP server.
Deploy run

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Swish!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ce34624e8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +419 to +421
for (const agentId of [...agentConfigs.keys()]) {
if (!reachable.has(agentId) && !referencedByEdge.has(agentId)) {
agentConfigs.delete(agentId);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prune unreachable co-sources from surviving edges

When an upstream agent is skipped, a multi-source edge can still reference a source that is no longer reachable (for example A->C, skipped X->B, and ['B','C']->D). This loop keeps B solely because it is still referenced by edges, so createRun includes B even though the reachability pass excluded it. In MultiAgentGraph.analyzeGraph, that leaves B with no incoming edges and it executes as an unintended parallel start node, which changes workflow behavior after permission/orphan filtering.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — fixed in 718b000. Instead of keeping unreachable co-sources via the referencedByEdge fallback (which then propped B up as an incoming-less parallel root), surviving multi-source edges now get their from array sanitized to drop any unreachable co-source. [B,C] -> D in the scenario you described becomes [C] -> D, every endpoint of every surviving edge is guaranteed to be in reachable, and the agent prune collapses to a strict reachable.has(agentId) check. Regression test strips unreachable co-sources from surviving multi-source edges (no stray parallel root) asserts B is absent from every surviving edge endpoint; all 22 prior discovery tests still pass unchanged.

Addresses codex pass-14 P1 on #12740. The earlier pass-8 fix kept any
agent referenced as an endpoint of a surviving edge (via a
`referencedByEdge` fallback) to avoid the SDK's `validateEdgeAgents`
failing on missing nodes. But that fallback propped up unreachable
co-sources too: with `[A -> C, X -> B, [B,C] -> D]` and X skipped,
`X -> B` gets filtered, the `[B,C] -> D` fan-in survives because C is
reachable, and B stays in `agentConfigs` solely because the fan-in
still lists it. `MultiAgentGraph.analyzeGraph` then sees B with no
incoming edge and runs it as an unintended parallel root.

Sanitize surviving edges instead: for a kept edge whose `from` is an
array, filter out any co-source that isn't reachable. The SDK's
per-source `addEdge` fires independently, so dropping an unreachable
co-source doesn't invalidate the remaining routes — in the scenario
above `[B,C] -> D` becomes `[C] -> D`, every endpoint of every
surviving edge is now reachable, and the agent prune collapses to a
strict `reachable.has(agentId)` check. No more referenced-by-edge
fallback.

Regression test added: `strips unreachable co-sources from surviving
multi-source edges (no stray parallel root)` — asserts B is absent
from every surviving edge endpoint and the fan-in's `from` is just
`['C']`. All 22 prior discovery tests still pass unchanged.
@github-actions
Copy link
Copy Markdown
Contributor

GitNexus: 🚀 deployed

The LibreChat-pr-12740 index is now live on the MCP server.
Deploy run

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Another round soon, please!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@danny-avila
Copy link
Copy Markdown
Owner Author

Codex has converged — two consecutive @codex review requests on the current tip (718b000) cleared without surfacing new findings. All CI green. Final tally of landed fixes from the review loop:

Initial bug fix

  • e9e435b — Main fix: shared discoverConnectedAgents helper + wired into both OpenAI-compat controllers.
  • 5cc74e3 — CI TypeScript check on the test helper type.

Codex passes 1–6 (initial six iterations)

Pass Commit Severity Fix
1 7418dd8 P1 filterOrphanedEdges drops edges whose from endpoint is skipped (not just to).
2 e491a23 P1 API routes pass ResourceType.REMOTE_AGENT so sub-agents honor the remote-sharing boundary.
3 7ec1ba2 P1 Sub-agents init with endpoint: agents so allowedProviders always enforces.
4 4cbfffc P1 Prune agents unreachable via surviving edges so stranded sub-agents don't become stray roots.
5 e54440b P1 initialize.js copies agentConfigs from the pruned helper result instead of via the pre-pruning callback.
6 e000354 P1 Reachability-based edge prune also filters the returned edges list (caught by external audit + codex).

External audit follow-up

  • e000354 — Findings 1/2/4/5/7: filter stale edges, gate getModelsConfig behind edges?.length, clone userMCPAuthMap, drop dead ?? [].
  • 6450aa8 — Audit NITs: clone in the else branch for symmetry, drop dead !edges guard in initialize.js.

Codex passes 7–14 (second wave)

Pass Commit Severity Fix
7 4982f1c P1 (superseded by pass-8) Initial attempt at fan-in reachability.
8 6879f45 P1 SDK treats multi-source edges as per-source addEdge (OR) — reverted pass-7 to match.
9 222716d P2 filterOrphanedEdges strips skipped co-members from arrays instead of dropping whole edges.
10 0de3684 P1 API controllers use getRemoteAgentPermissions so AGENT owners (SHARE bit) aren't silently skipped.
11 7cc7aed P2 Preserve user-defined parallel-start branches disconnected from the primary.
12 54de02c P1 Tighten parallel-start seed to "no incoming edge in pre-filter graph" — downstream orphans don't get promoted.
13 6ce3462 P1 Remove filterFilesByAgentAccess from API dbMethods (AGENT ACL → REMOTE_AGENT ACL regression).
14 718b000 P1 Sanitize unreachable co-sources from surviving multi-source edges (no stray parallel roots).

Tests

  • 23 discovery tests covering every scenario codex raised.
  • 30 edges tests covering the sanitized orphan filter.
  • 24 controller/service tests passing for openai, responses, initialize.
  • Every CI job green.

The companion agents-SDK change (danny-avila/agents#111) was clean from codex in a single pass and remains unchanged.

@chatgpt-codex-connector
Copy link
Copy Markdown

Summary

No follow-up code changes are needed based on this trigger. The PR appears converged (two consecutive clean @codex review runs on 718b0000b with green CI), so I did not modify files, create a commit, or open a new PR.

Checks

  • git status --short --branch (working tree is clean on branch work)

View task →

@danny-avila danny-avila merged commit d2cbd55 into dev Apr 20, 2026
11 checks passed
@danny-avila danny-avila deleted the claude/relaxed-colden-78db39 branch April 20, 2026 06:20
krgokul pushed a commit to syedhabib39/LibreChat that referenced this pull request Apr 20, 2026
* 🤝 fix: load handoff sub-agents on OpenAI-compat endpoints (danny-avila#12726)

Extracts the BFS discovery + ACL-gated initialization of handoff sub-agents
into a shared `discoverConnectedAgents` helper in `@librechat/api` and
wires it into the OpenAI-compatible `/v1/chat/completions` and Open
Responses `/v1/responses` controllers. These endpoints previously only
passed the primary agent config to `createRun` while keeping
`primaryConfig.edges` intact, which forced `MultiAgentGraph` into
multi-agent mode without loading the referenced sub-agents and caused
StateGraph to throw "Found edge ending at unknown node <id>".

The discovery helper also filters orphaned edges (deleted sub-agents or
those the caller lacks VIEW permission on), so API users see the same
graceful fallback the chat UI already had.

* 🧪 fix: use ServerRequest in discovery spec helpers

CI `tsc --noEmit -p packages/api/tsconfig.json` caught that the test
helpers typed `req` as `express.Request`, which is not assignable to
`DiscoverConnectedAgentsParams.req` (typed as `ServerRequest` whose
`user` is `IUser`). Local jest passed because ts-jest is transpile-only,
but the CI typecheck uses the full compiler.

* 🪲 fix: drop orphan edges on both endpoints, not just `to`

Addresses the P1 codex finding on danny-avila#12740: `filterOrphanedEdges`
previously only removed edges whose `to` referenced a skipped agent.
Edges whose `from` was a skipped agent — the symmetric case in a
bidirectional graph like `A <-> B` where `B` is deleted or the user
lacks VIEW on it — leaked through to `createRun` and re-triggered
`Found edge ending at unknown node <id>` at StateGraph compile time.

The filter now drops an edge if either endpoint references a skipped
id, and the existing `to`-only test cases were updated to reflect the
stricter behavior. Adds a bidirectional-graph regression test in
`discovery.spec.ts`.

* 🔒 fix: enforce REMOTE_AGENT ACL on handoff sub-agents for API routes

Addresses the second P1 codex finding on danny-avila#12740: the OpenAI-compat
`/v1/chat/completions` and Open Responses `/v1/responses` routes gate
the primary agent on `REMOTE_AGENT` (via `createCheckRemoteAgentAccess`),
but `discoverConnectedAgents` was checking handoff sub-agents against
the looser in-app `AGENT` resource type. That allowed a remote caller
who could reach the orchestrator but had only in-app visibility on a
sub-agent to invoke it via the API — bypassing the remote-sharing
boundary.

Adds an optional `resourceType` param to `discoverConnectedAgents`
(defaulting to `AGENT` for the chat UI path) and passes
`ResourceType.REMOTE_AGENT` from both API controllers so every
discovered sub-agent clears the same sharing boundary enforced at
route entry.

* 🧯 fix: enforce allowedProviders for discovered sub-agents

Addresses the third P1 codex finding on danny-avila#12740: `discoverConnectedAgents`
forwarded the caller's `endpointOption` verbatim into `initializeAgent`,
but on the OpenAI-compat routes that option's `endpoint` is the primary
agent's provider (e.g. `openai`), not `agents`. `initializeAgent` only
enforces `allowedProviders` when `isAgentsEndpoint(endpointOption.endpoint)`
is true, so handoff sub-agents silently bypassed the provider allowlist
configured under `endpoints.agents.allowedProviders`.

Override `endpointOption.endpoint` to `EModelEndpoint.agents` for every
per-sub-agent init call. The primary agent still uses the caller's
endpointOption as before — this only affects the BFS-loaded handoff
targets. Regression test asserts the override.

* ✂️ fix: prune unreachable sub-agents after orphan-edge filtering

Addresses the fourth P1 codex finding on danny-avila#12740: BFS eagerly initializes
every sub-agent referenced in the primary's edge scan, but once
`filterOrphanedEdges` drops edges whose endpoints were skipped, some of
those sub-agents end up disconnected from the primary. In an `A -> B ->
C` graph (edges stored directly on A) where B is skipped (missing or
no VIEW), both edges are filtered, but C was already loaded and would
still be passed to `createRun` — which flips into multi-agent mode on
`agents.length > 1` and turns C into an unintended parallel start node.

After filtering edges, compute the set of agent ids reachable from the
primary through the surviving edge set and prune `agentConfigs` to that
set. Two regression tests added: one for the pruning case, one that
confirms agents connected via surviving edges are still kept.

* 🔁 fix: don't seed initialize.js agentConfigs from the pre-pruning callback

Addresses the fifth P1 codex finding on danny-avila#12740: `onAgentInitialized`
fires during BFS, BEFORE the helper prunes agents that become
disconnected once `filterOrphanedEdges` runs. Writing the sub-agent
straight into the outer `agentConfigs` there and then only additively
merging the pruned `discoveredConfigs` left stranded entries in the
outer map, and `AgentClient` would still hand them to `createRun` as
extra parallel start nodes (the exact failure mode the pass-4 prune
was meant to eliminate for the API controllers).

Drop the `agentConfigs.set` from the callback and replace the additive
merge with a direct copy from `discoveredConfigs`, which is now the
single authoritative source of what the run should see. The
per-agent tool context map is still populated during BFS — stale
entries there are harmless because they're only read by closure inside
`ON_TOOL_EXECUTE` and are unreachable once the agent is not in
`agentConfigs`.

* 🔬 fix: address audit findings on discovery helper

Resolves findings from a comprehensive external audit of danny-avila#12740.

**Finding 1 (CRITICAL) — stale edges survive the reachability prune.**
The pass-4 prune removed unreachable agents from `agentConfigs` but left
matching edges in the return value. In an `A -> B -> C -> D` graph (all
edges stored on A) where B is skipped, `filterOrphanedEdges` drops A->B
and B->C but keeps C->D (neither endpoint is skipped). The caller then
sees `agentConfigs` without C/D but `edges` still references them,
flipping `createRun` into multi-agent mode with mismatched agents/edges
— the exact crash this PR is supposed to fix. Now filter the edge list
to the reachable set in the same pass, so the returned shape is
self-consistent: every edge endpoint is either the primary id or a key
of `agentConfigs`. New regression test covers A->B->C->D with B skipped.

**Finding 2 (MAJOR) — unconditional `getModelsConfig` on every API
request.** The OpenAI-compat and Responses controllers called
`getModelsConfig(req)` and `discoverConnectedAgents` even when the
primary agent had no edges (the common single-agent API case). Gate
both behind `primaryConfig.edges?.length > 0` so single-agent runs
don't pay that cost.

**Finding 5 (MINOR) — silent mutation of caller's
`primaryConfig.userMCPAuthMap`.** The helper aliased that object and
then `Object.assign`'d sub-agent entries into it, changing the caller's
config in-place. Shallow-clone up front so the returned merged map is
the only destination.

**Finding 7 (NIT) — dead `?? []` coalescing.**
`filterOrphanedEdges` always returns a concrete array, so the
`discoveredEdges ?? []` fallback was never reached. Simplified the
`primaryConfig.edges = …` assignment.

Also adds a test that verifies `primaryConfig.userMCPAuthMap` is not
mutated in-place.

* 🧹 chore: address audit NITs on discovery helper

Addresses two NIT findings from the post-fix audit:

**F1** — the shallow clone on `primaryConfig.userMCPAuthMap` was only
applied on the primary side; the `else` branch (hit when the primary
had no MCP auth and the first sub-agent seeds the map) assigned the
sub-agent's `config.userMCPAuthMap` directly, so a later sub-agent's
`Object.assign` mutated the first one's map in place. Harmless in
practice (per-request ephemeral objects) but asymmetric. Clone in the
else branch too. Test added.

**F2** — `initialize.js` had a defensive `if (agentConfigs.size > 0 &&
!edges) edges = []` normalizer. Pre-existing dead code: the helper now
always returns a concrete array from `filteredEdges.filter(...)`.
Removed for clarity.

* 🕸 fix: require all sources reachable when traversing fan-in edges

Addresses the seventh P1 codex finding on danny-avila#12740: the reachability BFS
advanced through an edge as soon as any of its `from` endpoints matched
the current frontier node (`sources.includes(current)`), but the
subsequent edge filter required ALL sources to be reachable (`every`).
The two-semantics mismatch let a fan-in edge like `{from: ['A','B'],
to: 'C'}` mark C reachable purely via A even when B had no path from
the primary, then drop the edge itself at filter time. Result: C
survived in `agentConfigs` with no surviving edge connecting it to A,
so `createRun` flipped into multi-agent mode on `agents.length > 1`
and C ran as an unintended parallel root.

Replace the BFS with a fixed-point iteration keyed on the same
all-sources-reachable predicate used by the filter, so traversal and
filtering stay aligned and multi-source edges only fire once every
source is in the reachable set.

Two regression tests added:
- `{from: ['A','B'], to: 'C'}` with B having no incoming path — asserts
  neither B nor C leak into the result.
- `A -> B`, `A -> C`, `['B','C'] -> D` — asserts the fan-in edge fires
  and D becomes reachable once both B and C are.

* 🔀 fix: match SDK OR semantics for multi-source edge reachability

Reverts the all-sources-required reachability gate from 4982f1c and
replaces it with an any-source-reachable model, which matches how
`@librechat/agents`'s `MultiAgentGraph.createWorkflow` actually wires
multi-source edges at runtime (per-source `builder.addEdge(source,
destination)`). With the previous `every` gate, a legitimate handoff
edge `{ from: ['A', 'B'], to: 'C' }` where B had no incoming path was
pruned along with C, regressing OR-semantics routing that the SDK
would otherwise handle correctly.

New behavior:

1. Reachability: an edge advances when ANY of its `from` endpoints is
   already reachable. Fixed-point iteration over `filteredEdges`.
2. Edge filter: keep an edge when it has at least one reachable source
   AND all destinations are reachable (a missing destination would
   still crash `StateGraph.compile` with `Found edge ending at unknown
   node`).
3. Agent prune: keep agents that are reachable OR referenced on any
   endpoint of a surviving edge. The second clause preserves co-sources
   in multi-source edges (B in `{ from: ['A','B'], to: 'C' }` when
   nothing else reaches B) so the SDK's per-source `addEdge` — and the
   `validateEdgeAgents` safety-net I added to the SDK in danny-avila#111 — still
   finds B as a node.

The pass-audit A->B->C->D regression test continues to pass: with B
skipped, `filterOrphanedEdges` drops both B-adjacent edges, reachability
never expands past A, C->D has no reachable source so it gets filtered,
and C/D are pruned because they're neither reachable nor referenced.

* ✂️ fix: strip skipped co-members from multi-source/multi-dest edges

Addresses codex pass-9 P2 on danny-avila#12740. `filterOrphanedEdges` previously
dropped an edge whenever any `from` id was skipped, which was correct
for scalar edges but over-aggressive for multi-source ones: the agents
SDK adds one `builder.addEdge(source, destination)` per source, so
`{ from: ['A','B'], to: 'C' }` with B skipped still has a valid
`A -> C` route that was being thrown away.

Now sanitize each endpoint:
- Scalar skipped → drop the whole edge (no route survives).
- Array with some skipped → strip the skipped ids, keep the edge with
  the surviving members. If the array empties out, drop the edge.

Symmetric handling for `to` covers multi-destination fan-out when one
co-destination is skipped. Tests updated/added:
- `strips skipped co-sources from multi-source edges…`
- `strips skipped co-destinations from multi-destination edges`
- `drops multi-member edges only when every member on a side is skipped`
- Discovery-side: `preserves valid routes when one co-source of a
  multi-source edge is skipped` asserts the end-to-end behavior —
  skipped co-source B gets stripped from the edge, A->C routing
  survives, and C remains in `agentConfigs`.

* 🔓 fix: respect SHARE-on-AGENT fallback for handoff ACL on API routes

Addresses codex pass-10 P1 on danny-avila#12740. The API controllers were handing
`discoverConnectedAgents` a raw `PermissionService.checkPermission` call
against `ResourceType.REMOTE_AGENT`, but the route-level middleware
(`createCheckRemoteAgentAccess`) authorizes the primary agent via
`getRemoteAgentPermissions`, which first consults the AGENT ACL and
treats owners with the SHARE bit as remotely authorized even without
an explicit REMOTE_AGENT grant. The mismatch meant a user could open
the primary via `/v1/chat/completions` or `/v1/responses`, but their
own owned handoff sub-agents were silently skipped — breaking
multi-agent handoffs for the common "owner runs their own multi-agent
orchestrator" case.

Both controllers now pass `discoverConnectedAgents` a `checkPermission`
wrapper that delegates to `getRemoteAgentPermissions` (with
`getEffectivePermissions` injected from `PermissionService`) and
compares the returned bitmask against the required permission via
`hasPermissions`. Sub-agents are now authorized by the exact same
rules the route middleware applies to the primary.

* 🌱 fix: preserve user-defined parallel-start branches

Addresses codex pass-11 P2 on danny-avila#12740. The post-filter reachability
prune seeded only from `primaryConfig.id`, which killed
`MultiAgentGraph`'s legitimate multi-start pattern — a user-defined
edge like `X -> Y` where X has no incoming path (X is an intentional
parallel starting node, run alongside the primary) was being dropped
because neither X nor Y was reachable from the primary.

Reconcile the tension with pass-4 ("prune accidental orphans when an
intermediate is skipped") by using pre-filter reachability as the
signal:

- An agent that WAS reachable from the primary via the original
  (pre-filter) edges but loses that path when `filterOrphanedEdges`
  runs is an accidental orphan (a skipped hop broke the chain) — prune.
- An agent that was NEVER reachable from the primary, even pre-filter,
  is an intentional parallel start — seed it into post-filter
  reachability so its component survives.

Surviving-edge endpoint references still keep an agent (co-sources in
multi-source edges). New test `preserves user-defined parallel-start
branches disconnected from the primary` covers the pass-11 scenario;
the existing `A->B->C->D, B skipped` regression test continues to
pass because C/D were pre-filter reachable through B and lose that
reachability after filtering.

* 🎯 fix: tighten parallel-start seed criterion to 'no pre-filter incoming edge'

Addresses codex pass-12 P1 on danny-avila#12740. The pass-11 seed heuristic — 'agent
is in `agentConfigs` but was not pre-filter reachable from the primary' —
was too permissive. A downstream agent like Y in `X -> Y` where X gets
skipped (missing / no VIEW) was never pre-filter reachable from the
primary either, so the old rule promoted Y to a parallel start node and
discovery returned `agents: [primary, Y]` with no connecting edge. The
SDK then ran Y as an unintended parallel root — exactly the orphan
behavior pass-4 wanted to prevent.

Tighter criterion: seed a post-filter reachability root only when the
agent had NO incoming edge in the pre-filter graph. That matches
`MultiAgentGraph.analyzeGraph`'s "no-incoming-edge" definition of a
start node applied to the user's original declared topology, so:

- `A -> B` plus a user-defined `X -> Y` parallel branch: X has no
  incoming pre-filter → seeded → X and Y both survive.
- `A -> B` plus `X -> Y` with X skipped: Y had an incoming pre-filter
  (`X -> Y`) → NOT seeded → Y is pruned as the orphan it is.
- `A -> B -> C` with B skipped: C had an incoming pre-filter (`B -> C`)
  → NOT seeded → C is pruned.

New test `does not promote a downstream orphan to a parallel start when
its only upstream is skipped` locks in the pass-12 scenario. The pass-11
`preserves user-defined parallel-start branches` test continues to hold.

* 📁 fix: don't enforce AGENT-only file ACL on REMOTE_AGENT API callers

Addresses codex pass-13 P1 on danny-avila#12740. When I refactored the API
controllers' DB-method bundle, I inadvertently started forwarding
`filterFilesByAgentAccess` into `initializeAgent`. That helper calls
`checkPermission` with `resourceType: ResourceType.AGENT`, but these
routes authorize callers through `REMOTE_AGENT` (via
`getRemoteAgentPermissions`). A user granted `REMOTE_AGENT_VIEWER` on
a shared agent but lacking direct `AGENT_VIEW` could invoke the agent
yet all its owner-attached context files would get silently filtered
out — breaking `file_search`/context retrieval for remote consumers.

Drop `filterFilesByAgentAccess` from the OpenAI-compat and Responses
controllers' `dbMethods` (and remove the now-unused import). The chat
UI's `initialize.js` keeps it since that path legitimately authorizes
at the AGENT level. No functional change inside the helper — passing
`undefined` simply tells `primeResources` to skip the per-file ACL
filter, restoring the pre-refactor API behavior.

* 🪓 fix: strip unreachable co-sources from surviving multi-source edges

Addresses codex pass-14 P1 on danny-avila#12740. The earlier pass-8 fix kept any
agent referenced as an endpoint of a surviving edge (via a
`referencedByEdge` fallback) to avoid the SDK's `validateEdgeAgents`
failing on missing nodes. But that fallback propped up unreachable
co-sources too: with `[A -> C, X -> B, [B,C] -> D]` and X skipped,
`X -> B` gets filtered, the `[B,C] -> D` fan-in survives because C is
reachable, and B stays in `agentConfigs` solely because the fan-in
still lists it. `MultiAgentGraph.analyzeGraph` then sees B with no
incoming edge and runs it as an unintended parallel root.

Sanitize surviving edges instead: for a kept edge whose `from` is an
array, filter out any co-source that isn't reachable. The SDK's
per-source `addEdge` fires independently, so dropping an unreachable
co-source doesn't invalidate the remaining routes — in the scenario
above `[B,C] -> D` becomes `[C] -> D`, every endpoint of every
surviving edge is now reachable, and the agent prune collapses to a
strict `reachable.has(agentId)` check. No more referenced-by-edge
fallback.

Regression test added: `strips unreachable co-sources from surviving
multi-source edges (no stray parallel root)` — asserts B is absent
from every surviving edge endpoint and the fan-in's `from` is just
`['C']`. All 22 prior discovery tests still pass unchanged.
krgokul pushed a commit to syedhabib39/LibreChat that referenced this pull request Apr 21, 2026
* 🤝 fix: load handoff sub-agents on OpenAI-compat endpoints (danny-avila#12726)

Extracts the BFS discovery + ACL-gated initialization of handoff sub-agents
into a shared `discoverConnectedAgents` helper in `@librechat/api` and
wires it into the OpenAI-compatible `/v1/chat/completions` and Open
Responses `/v1/responses` controllers. These endpoints previously only
passed the primary agent config to `createRun` while keeping
`primaryConfig.edges` intact, which forced `MultiAgentGraph` into
multi-agent mode without loading the referenced sub-agents and caused
StateGraph to throw "Found edge ending at unknown node <id>".

The discovery helper also filters orphaned edges (deleted sub-agents or
those the caller lacks VIEW permission on), so API users see the same
graceful fallback the chat UI already had.

* 🧪 fix: use ServerRequest in discovery spec helpers

CI `tsc --noEmit -p packages/api/tsconfig.json` caught that the test
helpers typed `req` as `express.Request`, which is not assignable to
`DiscoverConnectedAgentsParams.req` (typed as `ServerRequest` whose
`user` is `IUser`). Local jest passed because ts-jest is transpile-only,
but the CI typecheck uses the full compiler.

* 🪲 fix: drop orphan edges on both endpoints, not just `to`

Addresses the P1 codex finding on danny-avila#12740: `filterOrphanedEdges`
previously only removed edges whose `to` referenced a skipped agent.
Edges whose `from` was a skipped agent — the symmetric case in a
bidirectional graph like `A <-> B` where `B` is deleted or the user
lacks VIEW on it — leaked through to `createRun` and re-triggered
`Found edge ending at unknown node <id>` at StateGraph compile time.

The filter now drops an edge if either endpoint references a skipped
id, and the existing `to`-only test cases were updated to reflect the
stricter behavior. Adds a bidirectional-graph regression test in
`discovery.spec.ts`.

* 🔒 fix: enforce REMOTE_AGENT ACL on handoff sub-agents for API routes

Addresses the second P1 codex finding on danny-avila#12740: the OpenAI-compat
`/v1/chat/completions` and Open Responses `/v1/responses` routes gate
the primary agent on `REMOTE_AGENT` (via `createCheckRemoteAgentAccess`),
but `discoverConnectedAgents` was checking handoff sub-agents against
the looser in-app `AGENT` resource type. That allowed a remote caller
who could reach the orchestrator but had only in-app visibility on a
sub-agent to invoke it via the API — bypassing the remote-sharing
boundary.

Adds an optional `resourceType` param to `discoverConnectedAgents`
(defaulting to `AGENT` for the chat UI path) and passes
`ResourceType.REMOTE_AGENT` from both API controllers so every
discovered sub-agent clears the same sharing boundary enforced at
route entry.

* 🧯 fix: enforce allowedProviders for discovered sub-agents

Addresses the third P1 codex finding on danny-avila#12740: `discoverConnectedAgents`
forwarded the caller's `endpointOption` verbatim into `initializeAgent`,
but on the OpenAI-compat routes that option's `endpoint` is the primary
agent's provider (e.g. `openai`), not `agents`. `initializeAgent` only
enforces `allowedProviders` when `isAgentsEndpoint(endpointOption.endpoint)`
is true, so handoff sub-agents silently bypassed the provider allowlist
configured under `endpoints.agents.allowedProviders`.

Override `endpointOption.endpoint` to `EModelEndpoint.agents` for every
per-sub-agent init call. The primary agent still uses the caller's
endpointOption as before — this only affects the BFS-loaded handoff
targets. Regression test asserts the override.

* ✂️ fix: prune unreachable sub-agents after orphan-edge filtering

Addresses the fourth P1 codex finding on danny-avila#12740: BFS eagerly initializes
every sub-agent referenced in the primary's edge scan, but once
`filterOrphanedEdges` drops edges whose endpoints were skipped, some of
those sub-agents end up disconnected from the primary. In an `A -> B ->
C` graph (edges stored directly on A) where B is skipped (missing or
no VIEW), both edges are filtered, but C was already loaded and would
still be passed to `createRun` — which flips into multi-agent mode on
`agents.length > 1` and turns C into an unintended parallel start node.

After filtering edges, compute the set of agent ids reachable from the
primary through the surviving edge set and prune `agentConfigs` to that
set. Two regression tests added: one for the pruning case, one that
confirms agents connected via surviving edges are still kept.

* 🔁 fix: don't seed initialize.js agentConfigs from the pre-pruning callback

Addresses the fifth P1 codex finding on danny-avila#12740: `onAgentInitialized`
fires during BFS, BEFORE the helper prunes agents that become
disconnected once `filterOrphanedEdges` runs. Writing the sub-agent
straight into the outer `agentConfigs` there and then only additively
merging the pruned `discoveredConfigs` left stranded entries in the
outer map, and `AgentClient` would still hand them to `createRun` as
extra parallel start nodes (the exact failure mode the pass-4 prune
was meant to eliminate for the API controllers).

Drop the `agentConfigs.set` from the callback and replace the additive
merge with a direct copy from `discoveredConfigs`, which is now the
single authoritative source of what the run should see. The
per-agent tool context map is still populated during BFS — stale
entries there are harmless because they're only read by closure inside
`ON_TOOL_EXECUTE` and are unreachable once the agent is not in
`agentConfigs`.

* 🔬 fix: address audit findings on discovery helper

Resolves findings from a comprehensive external audit of danny-avila#12740.

**Finding 1 (CRITICAL) — stale edges survive the reachability prune.**
The pass-4 prune removed unreachable agents from `agentConfigs` but left
matching edges in the return value. In an `A -> B -> C -> D` graph (all
edges stored on A) where B is skipped, `filterOrphanedEdges` drops A->B
and B->C but keeps C->D (neither endpoint is skipped). The caller then
sees `agentConfigs` without C/D but `edges` still references them,
flipping `createRun` into multi-agent mode with mismatched agents/edges
— the exact crash this PR is supposed to fix. Now filter the edge list
to the reachable set in the same pass, so the returned shape is
self-consistent: every edge endpoint is either the primary id or a key
of `agentConfigs`. New regression test covers A->B->C->D with B skipped.

**Finding 2 (MAJOR) — unconditional `getModelsConfig` on every API
request.** The OpenAI-compat and Responses controllers called
`getModelsConfig(req)` and `discoverConnectedAgents` even when the
primary agent had no edges (the common single-agent API case). Gate
both behind `primaryConfig.edges?.length > 0` so single-agent runs
don't pay that cost.

**Finding 5 (MINOR) — silent mutation of caller's
`primaryConfig.userMCPAuthMap`.** The helper aliased that object and
then `Object.assign`'d sub-agent entries into it, changing the caller's
config in-place. Shallow-clone up front so the returned merged map is
the only destination.

**Finding 7 (NIT) — dead `?? []` coalescing.**
`filterOrphanedEdges` always returns a concrete array, so the
`discoveredEdges ?? []` fallback was never reached. Simplified the
`primaryConfig.edges = …` assignment.

Also adds a test that verifies `primaryConfig.userMCPAuthMap` is not
mutated in-place.

* 🧹 chore: address audit NITs on discovery helper

Addresses two NIT findings from the post-fix audit:

**F1** — the shallow clone on `primaryConfig.userMCPAuthMap` was only
applied on the primary side; the `else` branch (hit when the primary
had no MCP auth and the first sub-agent seeds the map) assigned the
sub-agent's `config.userMCPAuthMap` directly, so a later sub-agent's
`Object.assign` mutated the first one's map in place. Harmless in
practice (per-request ephemeral objects) but asymmetric. Clone in the
else branch too. Test added.

**F2** — `initialize.js` had a defensive `if (agentConfigs.size > 0 &&
!edges) edges = []` normalizer. Pre-existing dead code: the helper now
always returns a concrete array from `filteredEdges.filter(...)`.
Removed for clarity.

* 🕸 fix: require all sources reachable when traversing fan-in edges

Addresses the seventh P1 codex finding on danny-avila#12740: the reachability BFS
advanced through an edge as soon as any of its `from` endpoints matched
the current frontier node (`sources.includes(current)`), but the
subsequent edge filter required ALL sources to be reachable (`every`).
The two-semantics mismatch let a fan-in edge like `{from: ['A','B'],
to: 'C'}` mark C reachable purely via A even when B had no path from
the primary, then drop the edge itself at filter time. Result: C
survived in `agentConfigs` with no surviving edge connecting it to A,
so `createRun` flipped into multi-agent mode on `agents.length > 1`
and C ran as an unintended parallel root.

Replace the BFS with a fixed-point iteration keyed on the same
all-sources-reachable predicate used by the filter, so traversal and
filtering stay aligned and multi-source edges only fire once every
source is in the reachable set.

Two regression tests added:
- `{from: ['A','B'], to: 'C'}` with B having no incoming path — asserts
  neither B nor C leak into the result.
- `A -> B`, `A -> C`, `['B','C'] -> D` — asserts the fan-in edge fires
  and D becomes reachable once both B and C are.

* 🔀 fix: match SDK OR semantics for multi-source edge reachability

Reverts the all-sources-required reachability gate from 4982f1c and
replaces it with an any-source-reachable model, which matches how
`@librechat/agents`'s `MultiAgentGraph.createWorkflow` actually wires
multi-source edges at runtime (per-source `builder.addEdge(source,
destination)`). With the previous `every` gate, a legitimate handoff
edge `{ from: ['A', 'B'], to: 'C' }` where B had no incoming path was
pruned along with C, regressing OR-semantics routing that the SDK
would otherwise handle correctly.

New behavior:

1. Reachability: an edge advances when ANY of its `from` endpoints is
   already reachable. Fixed-point iteration over `filteredEdges`.
2. Edge filter: keep an edge when it has at least one reachable source
   AND all destinations are reachable (a missing destination would
   still crash `StateGraph.compile` with `Found edge ending at unknown
   node`).
3. Agent prune: keep agents that are reachable OR referenced on any
   endpoint of a surviving edge. The second clause preserves co-sources
   in multi-source edges (B in `{ from: ['A','B'], to: 'C' }` when
   nothing else reaches B) so the SDK's per-source `addEdge` — and the
   `validateEdgeAgents` safety-net I added to the SDK in danny-avila#111 — still
   finds B as a node.

The pass-audit A->B->C->D regression test continues to pass: with B
skipped, `filterOrphanedEdges` drops both B-adjacent edges, reachability
never expands past A, C->D has no reachable source so it gets filtered,
and C/D are pruned because they're neither reachable nor referenced.

* ✂️ fix: strip skipped co-members from multi-source/multi-dest edges

Addresses codex pass-9 P2 on danny-avila#12740. `filterOrphanedEdges` previously
dropped an edge whenever any `from` id was skipped, which was correct
for scalar edges but over-aggressive for multi-source ones: the agents
SDK adds one `builder.addEdge(source, destination)` per source, so
`{ from: ['A','B'], to: 'C' }` with B skipped still has a valid
`A -> C` route that was being thrown away.

Now sanitize each endpoint:
- Scalar skipped → drop the whole edge (no route survives).
- Array with some skipped → strip the skipped ids, keep the edge with
  the surviving members. If the array empties out, drop the edge.

Symmetric handling for `to` covers multi-destination fan-out when one
co-destination is skipped. Tests updated/added:
- `strips skipped co-sources from multi-source edges…`
- `strips skipped co-destinations from multi-destination edges`
- `drops multi-member edges only when every member on a side is skipped`
- Discovery-side: `preserves valid routes when one co-source of a
  multi-source edge is skipped` asserts the end-to-end behavior —
  skipped co-source B gets stripped from the edge, A->C routing
  survives, and C remains in `agentConfigs`.

* 🔓 fix: respect SHARE-on-AGENT fallback for handoff ACL on API routes

Addresses codex pass-10 P1 on danny-avila#12740. The API controllers were handing
`discoverConnectedAgents` a raw `PermissionService.checkPermission` call
against `ResourceType.REMOTE_AGENT`, but the route-level middleware
(`createCheckRemoteAgentAccess`) authorizes the primary agent via
`getRemoteAgentPermissions`, which first consults the AGENT ACL and
treats owners with the SHARE bit as remotely authorized even without
an explicit REMOTE_AGENT grant. The mismatch meant a user could open
the primary via `/v1/chat/completions` or `/v1/responses`, but their
own owned handoff sub-agents were silently skipped — breaking
multi-agent handoffs for the common "owner runs their own multi-agent
orchestrator" case.

Both controllers now pass `discoverConnectedAgents` a `checkPermission`
wrapper that delegates to `getRemoteAgentPermissions` (with
`getEffectivePermissions` injected from `PermissionService`) and
compares the returned bitmask against the required permission via
`hasPermissions`. Sub-agents are now authorized by the exact same
rules the route middleware applies to the primary.

* 🌱 fix: preserve user-defined parallel-start branches

Addresses codex pass-11 P2 on danny-avila#12740. The post-filter reachability
prune seeded only from `primaryConfig.id`, which killed
`MultiAgentGraph`'s legitimate multi-start pattern — a user-defined
edge like `X -> Y` where X has no incoming path (X is an intentional
parallel starting node, run alongside the primary) was being dropped
because neither X nor Y was reachable from the primary.

Reconcile the tension with pass-4 ("prune accidental orphans when an
intermediate is skipped") by using pre-filter reachability as the
signal:

- An agent that WAS reachable from the primary via the original
  (pre-filter) edges but loses that path when `filterOrphanedEdges`
  runs is an accidental orphan (a skipped hop broke the chain) — prune.
- An agent that was NEVER reachable from the primary, even pre-filter,
  is an intentional parallel start — seed it into post-filter
  reachability so its component survives.

Surviving-edge endpoint references still keep an agent (co-sources in
multi-source edges). New test `preserves user-defined parallel-start
branches disconnected from the primary` covers the pass-11 scenario;
the existing `A->B->C->D, B skipped` regression test continues to
pass because C/D were pre-filter reachable through B and lose that
reachability after filtering.

* 🎯 fix: tighten parallel-start seed criterion to 'no pre-filter incoming edge'

Addresses codex pass-12 P1 on danny-avila#12740. The pass-11 seed heuristic — 'agent
is in `agentConfigs` but was not pre-filter reachable from the primary' —
was too permissive. A downstream agent like Y in `X -> Y` where X gets
skipped (missing / no VIEW) was never pre-filter reachable from the
primary either, so the old rule promoted Y to a parallel start node and
discovery returned `agents: [primary, Y]` with no connecting edge. The
SDK then ran Y as an unintended parallel root — exactly the orphan
behavior pass-4 wanted to prevent.

Tighter criterion: seed a post-filter reachability root only when the
agent had NO incoming edge in the pre-filter graph. That matches
`MultiAgentGraph.analyzeGraph`'s "no-incoming-edge" definition of a
start node applied to the user's original declared topology, so:

- `A -> B` plus a user-defined `X -> Y` parallel branch: X has no
  incoming pre-filter → seeded → X and Y both survive.
- `A -> B` plus `X -> Y` with X skipped: Y had an incoming pre-filter
  (`X -> Y`) → NOT seeded → Y is pruned as the orphan it is.
- `A -> B -> C` with B skipped: C had an incoming pre-filter (`B -> C`)
  → NOT seeded → C is pruned.

New test `does not promote a downstream orphan to a parallel start when
its only upstream is skipped` locks in the pass-12 scenario. The pass-11
`preserves user-defined parallel-start branches` test continues to hold.

* 📁 fix: don't enforce AGENT-only file ACL on REMOTE_AGENT API callers

Addresses codex pass-13 P1 on danny-avila#12740. When I refactored the API
controllers' DB-method bundle, I inadvertently started forwarding
`filterFilesByAgentAccess` into `initializeAgent`. That helper calls
`checkPermission` with `resourceType: ResourceType.AGENT`, but these
routes authorize callers through `REMOTE_AGENT` (via
`getRemoteAgentPermissions`). A user granted `REMOTE_AGENT_VIEWER` on
a shared agent but lacking direct `AGENT_VIEW` could invoke the agent
yet all its owner-attached context files would get silently filtered
out — breaking `file_search`/context retrieval for remote consumers.

Drop `filterFilesByAgentAccess` from the OpenAI-compat and Responses
controllers' `dbMethods` (and remove the now-unused import). The chat
UI's `initialize.js` keeps it since that path legitimately authorizes
at the AGENT level. No functional change inside the helper — passing
`undefined` simply tells `primeResources` to skip the per-file ACL
filter, restoring the pre-refactor API behavior.

* 🪓 fix: strip unreachable co-sources from surviving multi-source edges

Addresses codex pass-14 P1 on danny-avila#12740. The earlier pass-8 fix kept any
agent referenced as an endpoint of a surviving edge (via a
`referencedByEdge` fallback) to avoid the SDK's `validateEdgeAgents`
failing on missing nodes. But that fallback propped up unreachable
co-sources too: with `[A -> C, X -> B, [B,C] -> D]` and X skipped,
`X -> B` gets filtered, the `[B,C] -> D` fan-in survives because C is
reachable, and B stays in `agentConfigs` solely because the fan-in
still lists it. `MultiAgentGraph.analyzeGraph` then sees B with no
incoming edge and runs it as an unintended parallel root.

Sanitize surviving edges instead: for a kept edge whose `from` is an
array, filter out any co-source that isn't reachable. The SDK's
per-source `addEdge` fires independently, so dropping an unreachable
co-source doesn't invalidate the remaining routes — in the scenario
above `[B,C] -> D` becomes `[C] -> D`, every endpoint of every
surviving edge is now reachable, and the agent prune collapses to a
strict `reachable.has(agentId)` check. No more referenced-by-edge
fallback.

Regression test added: `strips unreachable co-sources from surviving
multi-source edges (no stray parallel root)` — asserts B is absent
from every surviving edge endpoint and the fan-in's `from` is just
`['C']`. All 22 prior discovery tests still pass unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Agents API (Chat Completions + Responses) fails for multi-agent handoffs

2 participants