fix: Cross-provider tool-call ID + thinking-block compatibility#140
fix: Cross-provider tool-call ID + thinking-block compatibility#140danny-avila wants to merge 15 commits intodevfrom
Conversation
Cross-provider conversations carrying OpenAI Responses-style tool-call IDs (e.g. `fc_...|call_...`) hit a 400 from Anthropic on replay because the IDs contain `|` and can exceed 64 chars, violating Anthropic's `^[a-zA-Z0-9_-]+$` and length constraints. Adds `normalizeAnthropicToolCallId` and applies it at the four sites that emit IDs to the wire — `_convertLangChainToolCallToAnthropic` and the three `tool_result.tool_use_id` constructions in `_ensureMessageContents`. The function is pure and deterministic so paired tool_use and tool_result IDs stay matched without a session map. Server tool IDs (`srvtoolu_` prefix) are left untouched.
When a conversation crosses from Anthropic to OpenAI, prior `thinking`
and `redacted_thinking` content blocks were forwarded verbatim, which
OpenAI rejects with a 400 ("Invalid value: 'thinking'") and which loses
the reasoning narrative even when the request happens to succeed.
`_convertMessagesToOpenAIParams` now:
- flattens non-empty `thinking` blocks to `<thinking>...</thinking>`
text so the reasoning narrative survives as in-band context for the
next model
- drops empty `thinking` blocks (some providers reject empty content)
- drops `redacted_thinking` entirely (encrypted and useless to
non-Anthropic models)
- continues to pass thinking blocks through verbatim for Claude served
via OpenAI-shaped surfaces (e.g. OpenRouter), detected by model name
Also adds `src/scripts/poc-live-x-provider.ts` — a manual reproducer
that exercises the fix paths against real APIs and confirmed the
Anthropic ID normalization end-to-end.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 02da1c7989
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return id | ||
| .replace(/[^a-zA-Z0-9_-]/g, '_') | ||
| .slice(0, ANTHROPIC_TOOL_USE_ID_MAX_LENGTH); |
There was a problem hiding this comment.
Preserve uniqueness when normalizing truncated tool-call IDs
The normalization strategy can collapse distinct tool-call IDs into the same 64-char value because it only replaces invalid characters and truncates (slice(0, 64)) without adding any disambiguator. Anthropic treats tool_use.id as a unique identifier, so two long IDs with a shared prefix can become duplicates and either be rejected (tool_use ids must be unique) or mis-associated with tool_result blocks in multi-tool turns. Consider appending a short deterministic hash suffix before truncation to preserve uniqueness.
Useful? React with 👍 / 👎.
Codex review caught a real collision risk in `normalizeAnthropicToolCallId`:
two long Responses-style IDs that share a 64-char prefix (or two short IDs
whose only difference is an invalid char like `|` vs `.`) would sanitize +
truncate to the same 64-char value, producing duplicate `tool_use.id`
entries that Anthropic would reject ("tool_use ids must be unique") or
mis-associate with `tool_result` blocks in multi-tool turns.
Replace the plain truncation with `<sanitized-prefix>_<10-hex sha256>`,
hashing the original (pre-sanitization) input so two IDs that differ only
past the cutoff still map to distinct outputs. Output is bounded to 64
chars and the hash domain (40 bits) is more than sufficient within a
single conversation. Behavior for already-compliant IDs is unchanged.
|
@codex review |
Adds TEST 4 to the live PoC — two 96-char tool-call IDs sharing the first 80 chars, which would collapse to identical 64-char values under the previous truncate-only normalization. Confirms Anthropic accepts both blocks after the hash-suffix fix and the model correctly uses both tool results in its response.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d9d4898f5c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| }) | ||
| .filter(<T>(m: T | null): m is T => m !== null); |
There was a problem hiding this comment.
Avoid emitting empty assistant content arrays
Filtering out thinking/redacted_thinking blocks can leave content as [] for assistant messages that have no tool_calls (for example, a message containing only redacted_thinking or only empty thinking). This value is then forwarded unchanged, but Chat Completions requires assistant content arrays to contain at least one part (text or a single refusal), so this can still trigger a 400 on the OpenAI path. You already guard this case in the tool-calls branch by converting empty arrays to ''; the same normalization is needed for non-tool-call assistant messages.
Useful? React with 👍 / 👎.
Replaces gpt-4o-mini in the new cross-provider tests and live PoC. Pre-existing deepseek-v4-pro and anthropic/claude-sonnet-4-5 references are unchanged — those exercise different code paths (DeepSeek reasoning_content, Claude-via-OpenRouter detection).
Codex review caught a second case: an assistant message with no tool_calls but only thinking/redacted_thinking blocks would, after filtering for non-Claude targets, leave content as `[]`. Chat Completions requires assistant content arrays to contain at least one part, so this still 400s. Pre-normalize the filtered content: when thinking-block filtering empties the array, fall back to '' before assigning to the request payload. This subsumes the prior tool-calls-only guard, which is now removed as redundant.
|
@codex review |
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Comprehensive review caught real gaps and cleanups: - **Responses API parity (MAJOR)**: `_convertMessagesToOpenAIResponsesParams` silently dropped `thinking` and `redacted_thinking` blocks via its `return []` fallthrough. Now flattens to `output_text` for non-Claude targets and drops empty/redacted variants — same rationale as Chat Completions, applied to the Responses API content shape. - **Dedupe `isClaudeModel`**: extract a single `isClaudeModel(model?)` helper and reuse it across all three call sites in the OpenAI converter (was triplicated). Hoist the per-call check above `messages.flatMap` since `model` is constant. - **Tighter test coverage**: add cases for `model=undefined` defaulting to flatten/drop, interleaved thinking + text + redacted_thinking ordering, the Responses API path (two cases), and the empty-string tool-call ID edge. - **Typed error shape in PoC**: replace `Record<string, unknown>` cast with an explicit `SdkLikeError` shape per AGENTS.md guidance.
The previous fix lived inside `_convertMessagesToOpenAIParams`, but live verification revealed that the wrapper's `_generate` and `_streamResponseChunks` short-circuit to LangChain's parent class when neither `includeReasoningContent` nor `includeReasoningDetails` is set — which is the typical case. So Anthropic-shaped thinking blocks never reached our converter on the default path and OpenAI still 400ed. Pre-process messages with `flattenAnthropicThinkingForOpenAI` at the entry of both `_generate` and `_streamResponseChunks`, before the short-circuit. That way the rewrite happens regardless of which downstream converter ultimately serializes the request, and the fix applies to every OpenAI invocation through this wrapper. Verified live: TEST 2 (non-empty thinking → OpenAI) and TEST 3 (empty thinking → OpenAI) in `poc-live-x-provider.ts` now succeed where they previously hit `Invalid value: 'thinking'`.
Address follow-up audit findings on the cross-provider thinking-block fix: **F1 (MAJOR) — Wrapper coverage**: The previous commit wired the pre-processor into `LibreChatOpenAICompletions` only, leaving Anthropic → Azure OpenAI / Responses API / DeepSeek / xAI flows still forwarding raw thinking blocks. Move the pre-processor up to the outer wrapper classes (`ChatOpenAI`, `AzureChatOpenAI`, `ChatDeepSeek`, `ChatXAI`) by overriding `_generate` and `_streamResponseChunks`. `ChatMoonshot` extends `ChatOpenAI`, so it inherits the coverage. The inner `LibreChatOpenAICompletions` override is reverted as redundant. **F2 — Responses empty-content fallback**: The Responses API path now mirrors the Chat Completions guard — if filtering reduces assistant content to `[]`, fall back to `''` so the request still validates. **F3 — Dead-code in flatten helper**: `flattenAnthropicThinkingForOpenAI` previously did the rewrite work for non-AI messages and discarded it, flagging `mutated=true` while returning the unrewritten message. Bail out earlier for non-AI / non-array-content messages. **F4 — Responses path tests**: Added Claude-via-OpenRouter passthrough and empty-content fallback cases. **F5 — Field preservation test**: Expanded to seed all 7 AIMessage fields (`response_metadata`, `name`, `id`, `invalid_tool_calls`, `usage_metadata` plus the original three) and assert each survives the rewrite. Also added a non-AI passthrough test. **F7 — Parameter reassignment**: Eliminated naturally by hoisting the pre-processor to the outer classes; the inner `messages = ...` reassignment in `LibreChatOpenAICompletions` is gone. All 22 unit tests pass and live verification (TEST 1-4 in `poc-live-x-provider.ts`) still confirms end-to-end behavior.
Replace `messages = flattenAnthropicThinkingForOpenAI(...)` with `const normalized = ...` and reference `normalized` in both the streaming early-return and non-streaming conversion. Brings DeepSeek in line with the immutable-parameter pattern used by the other three wrappers.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: de2c0c0c44
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| runManager?: CallbackManagerForLLMRun | ||
| ): Promise<ChatResult> { | ||
| return super._generate( | ||
| flattenAnthropicThinkingForOpenAI(messages, this.model), |
There was a problem hiding this comment.
Skip pre-flattening for reasoning-enabled tool-call turns
Passing flattenAnthropicThinkingForOpenAI(...) into super._generate here causes a regression for models that use _convertMessagesToOpenAIParams with reasoning options (for example ChatMoonshot via includeReasoningContent): once thinking blocks are pre-converted to plain text, _convertMessagesToOpenAIParams no longer sets hasAnthropicThinkingBlock, and later clears assistant content on tool-call messages (completionParam.content = hasAnthropicThinkingBlock ? content : ''), so the <thinking>...</thinking> context is silently dropped whenever an assistant turn includes tool_calls.
Useful? React with 👍 / 👎.
Codex caught a regression introduced by hoisting the pre-processor: when the outer wrapper rewrites `thinking` blocks to `<thinking>...</thinking>` text, `_convertMessagesToOpenAIParams` no longer detects any raw `thinking` block, so its tool-call branch — which previously kept content only when `hasAnthropicThinkingBlock` was true — clears the flattened text. This silently dropped the reasoning narrative on every tool-call turn for models that route through this converter with reasoning options set (e.g. ChatMoonshot via `includeReasoningContent`). Replace the flag-based gate with a content-shape check: preserve non-empty array content, clear string content (OpenAI's standard for tool-call messages). Array content always represents work the upstream deliberately produced — pass-through thinking blocks for Claude targets, or pre-flattened `<thinking>` text from the outer wrapper — and should not be discarded. The now-unused `hasAnthropicThinkingBlock` flag and its setters are removed. Also adds two regression tests: pre-flattened text survives a tool-call turn, and string content is still cleared per OpenAI convention.
|
@codex review |
|
Codex Review: Didn't find any major issues. Keep them coming! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Replaces the inner .map().filter() with .flatMap returning [] for dropped blocks and [result] for kept ones. Eliminates the throwaway intermediate array allocation per AI message and brings the inner walker in line with the cross-provider library (pi-mono) it's loosely modeled on. Behavior is unchanged. Two tests added FIRST to pin behavior before the swap: - "preserves block order across interleaved thinking, text, and redacted_thinking" — locks the ordering contract that flatMap-vs-map could in principle disturb if returns were misordered - "keeps untouched AI messages reference-equal when other AI messages in the array were rewritten" — pins the per-message ref-equality fast path (when blockChanged stays false, the same msg reference passes through), independent of the whole-array fast path that was already covered Both tests pass on the .map().filter() implementation and continue to pass after the .flatMap swap, confirming behavioral equivalence.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3ffe05cf5d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| export function isClaudeModel(model?: string): boolean { | ||
| return ( | ||
| model?.includes('claude') === true || model?.includes('anthropic') === true | ||
| ); |
There was a problem hiding this comment.
Make Claude target detection case-insensitive
isClaudeModel only checks lowercase substrings, so model IDs like Anthropic/Claude-Sonnet-4.5 are misclassified as non-Claude targets. In that case the new rewrite paths (flattenAnthropicThinkingForOpenAI and message conversion) will flatten/drop thinking and redacted_thinking blocks instead of preserving them, which can break Claude tool-call continuation flows that require those blocks to be replayed unchanged. This regression is introduced by the new detection helper and affects any mixed-case model IDs coming from user/provider config.
Useful? React with 👍 / 👎.
Codex caught a P1 bug: \`isClaudeModel\` only matched lowercase \`claude\`/\`anthropic\` substrings, so provider configs that ship IDs like \`Anthropic/Claude-Sonnet-4.5\` or \`CLAUDE-3-5-SONNET\` were misclassified as non-Claude. The new flatten and Responses-conversion paths would drop signed \`thinking\` and \`redacted_thinking\` blocks for those models — exactly the blocks Claude requires replayed unchanged for tool-call continuation. Lower-case the input once before substring matching so any casing of the provider/model identifier is recognized. Test added FIRST: \`it.each\` parametrized over four mixed-case IDs (Anthropic/Claude-Sonnet-4.5, CLAUDE-3-5-SONNET, Claude-Opus-4.1, ANTHROPIC/CLAUDE-HAIKU). All four fail on the case-sensitive implementation and pass after the lowercase fix.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 036fb7779c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return false; | ||
| } | ||
| const normalized = model.toLowerCase(); | ||
| return normalized.includes('claude') || normalized.includes('anthropic'); |
There was a problem hiding this comment.
Detect Claude targets without model-name heuristics
The new Claude check relies only on model containing "claude" or "anthropic", so any OpenAI-compatible deployment that aliases a Claude backend to a custom name (for example, a gateway/deployment ID without those substrings) will be misclassified as non-Claude. In that case the new cross-provider path rewrites/drops thinking/redacted_thinking blocks, which can break Claude extended-thinking replay and cause follow-up tool turns to fail. This should be driven by an explicit provider capability/config flag instead of substring matching.
Useful? React with 👍 / 👎.
Codex follow-up (P2): the model-name heuristic in \`isClaudeModel\` misclassifies aliased Claude backends — gateways/deployments that expose Claude under names without \`claude\`/\`anthropic\` substrings. Those would have signed thinking and redacted_thinking blocks rewritten or dropped, breaking Claude tool-call continuation. Add an explicit \`claudeBackend?: boolean\` override that wins over the substring heuristic in both directions: - \`true\`: force pass-through on any model name (e.g. corporate alias \`internal-llm-v3\` that routes to Claude) - \`false\`: force flatten/drop even on a Claude-named model (e.g. \`claude-3-passthrough-shim\` that's actually a non-Claude passthrough) - undefined: fall back to the existing case-insensitive substring match — no behavior change for anyone not opting in Plumbed through: - New \`ClaudeBackendOptions\` interface; \`ConvertMessagesOptions\` extends it - \`flattenAnthropicThinkingForOpenAI\` and \`_convertMessagesToOpenAIResponsesParams\` accept an options arg - \`LibreChatOpenAIFields\` and \`LibreChatAzureOpenAIFields\` carry the field; stored on outer wrappers (\`ChatOpenAI\`, \`AzureChatOpenAI\`) and the inner \`LibreChatOpenAICompletions\`; threaded into the helper and converter calls DeepSeek and xAI wrappers are intentionally not extended — they're bound to providers that don't realistically route Claude. Follow-up if needed. Tests added FIRST (TDD): \`claudeBackend=true\` preserves thinking blocks on a non-Claude-named model; \`claudeBackend=false\` flattens on a Claude-named model. Both fail before the helper accepts the override, both pass after.
Previous commit scoped the override to ChatOpenAI/AzureChatOpenAI on the rationale that the other wrappers are 'tightly bound' to their providers. That's only true at the class name level — these wrappers are thin transports over the OpenAI wire shape and any of them can be pointed at any backend (Claude included) via baseURL. Restricting the escape hatch to a subset is asymmetric and leaves real users without recourse. Add `claudeBackend?: boolean` to ChatDeepSeek and ChatXAI constructor types, store on the instance, thread through both `flattenAnthropicThinkingForOpenAI` calls and (for DeepSeek) the `_convertDeepSeekMessages` -> `_convertMessagesToOpenAIParams` call. Behavior unchanged when the field is omitted. The helper-level test contract (added in the previous commit) already covers the override semantics for any caller; no new tests needed since these wrappers are pure pass-throughs of the same field.
|
@codex review |
|
Codex Review: Didn't find any major issues. More of your lovely PRs please. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
Two narrowly-scoped fixes for cross-provider 400 errors observed when conversation history crosses provider boundaries. Both bugs were confirmed live against real APIs before fixing — see
src/scripts/poc-live-x-provider.tsfor the reproducer.Fix 1 — Tool-call IDs survive an OpenAI Responses → Anthropic handoff
OpenAI Responses-style tool-call IDs (e.g.
fc_...|call_...) routinely contain|and exceed 64 chars. Anthropic enforces^[a-zA-Z0-9_-]+$and a 64-char cap and rejects with 400 on replay.Added
normalizeAnthropicToolCallId(pure, deterministic) and applied at the four wire-bound sites insrc/llm/anthropic/utils/message_inputs.ts:_convertLangChainToolCallToAnthropic(assistant tool_use.id)tool_result.tool_use_idconstructions in_ensureMessageContentsServer tool IDs (
srvtoolu_prefix) are left untouched. Anthropic-native IDs that already comply pass through unchanged. Determinism means pairedtool_use.idandtool_result.tool_use_idstay matched without any session-scoped Map.Fix 2 — Anthropic thinking blocks survive a → OpenAI handoff
thinkingandredacted_thinkingcontent blocks were being forwarded verbatim to OpenAI Chat Completions, which 400s with Invalid value: 'thinking'. Even when accepted (server may strip), the reasoning narrative is invisible to the next model._convertMessagesToOpenAIParamsnow:thinkingblocks to<thinking>...</thinking>text so the reasoning narrative survives as in-band contextthinkingblocks (empty content is rejected by some providers)redacted_thinkingentirely (encrypted, useless to non-Anthropic models)Test plan
Unit tests added — all passing:
src/llm/anthropic/utils/tool-id-normalization.test.ts— 9 tests covering helper + round-trip through_convertMessagesToAnthropicPayloadsrc/llm/openai/utils/messages.test.ts— 5 new test cases under "cross-provider thinking block handling"src/llm/anthropic/llm.spec.ts"streaming mode with a signal" reproduces on baseline9b2081b— unrelated)Live verification:
bun ./src/scripts/poc-live-x-provider.ts— Anthropic accepted the normalized payload and returned a valid response