fix(agent): render OpenAI tool-call arguments as a mapping for chat templates #2063

Open
EazyReal wants to merge 1 commit into THUDM:main from EazyReal:upstream-pr/agent-toolcall-args-mapping

@EazyReal EazyReal commented Jun 12, 2026

Contributor

Problem

In the OpenAI HTTP adapter (slime/agent/adapters/openai.py), _normalize_tool_call builds the assistant message that _translate_chat_messages collects into chain.chat_messages, which render_token_ids then feeds to tokenizer.apply_chat_template. That message is the render boundary. Until now, _normalize_tool_call serialized tool_calls[].function.arguments to a JSON string via json_arguments. Qwen-family chat templates iterate tool_call.arguments.items(), which expects a mapping. Given a string, they either iterate it character by character or raise a template error, so the rendered prompt diverges from the tokens the policy actually generated. In a token-capturing rollout that is a rollout/train token desync: the policy is trained on tokens it never sampled.

Before vs After

Same input — an assistant turn whose tool call carries arguments as the JSON string echoed on the OpenAI wire:

openai._normalize_tool_call(
    {"function": {"name": "lookup", "arguments": '{"q": "slime"}'}}
)

Before: arguments stays a JSON string, so the rendered assistant message is:

{"type": "function",
 "function": {"name": "lookup", "arguments": '{"q": "slime"}'}}   # str

The Qwen template's arguments.items() then runs on a string → AttributeError: 'str' object has no attribute 'items' (or, on templates that tolerate it, per-character iteration that emits a garbled tool block). Either way the rendered prompt no longer matches the sampled tokens.

After — the wire string is decoded back to the mapping the template expects:

{"type": "function",
 "function": {"name": "lookup", "arguments": {"q": "slime"}}}     # dict

arguments.items() now yields [("q", "slime")] and the tool call renders correctly.
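The failure and the fix can be reproduced without a tokenizer. The sketch below is a plain-Python stand-in for the template loop: render_tool_call and its tag format are hypothetical and are not the actual Qwen template. Only the .items() call mirrors the behavior described above.

```python
import json


def render_tool_call(name, arguments):
    """Hypothetical stand-in for a chat template's tool-call loop."""
    lines = [f"<tool_call name={name}>"]
    # Mirrors the template's `for key, value in arguments.items()`.
    for key, value in arguments.items():
        lines.append(f"  {key}={json.dumps(value)}")
    lines.append("</tool_call>")
    return "\n".join(lines)


wire_arguments = '{"q": "slime"}'  # JSON string as echoed on the OpenAI wire

# Before: the string reaches the loop and .items() fails.
try:
    render_tool_call("lookup", wire_arguments)
    string_error = None
except AttributeError as exc:
    string_error = str(exc)

# After: the string is decoded to a mapping first, so the loop works.
rendered = render_tool_call("lookup", json.loads(wire_arguments))
```

A real Jinja chat template may surface the failure as a template error rather than a bare AttributeError, but the cause is the same: the loop receives a string where it expects a mapping.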

Hostile inputs that are valid JSON but not a mapping are wrapped so .items() can never crash:

openai._normalize_tool_call(
    {"function": {"name": "lookup", "arguments": "[1, 2]"}}
)
# arguments -> {"_raw_arguments": [1, 2]}

Falsy non-dict values are real argument payloads, not "no arguments", and are preserved the same way — only None (or an empty wire string) maps to {}:

dict_arguments(0)     # -> {"_raw_arguments": 0}
dict_arguments([])    # -> {"_raw_arguments": []}
dict_arguments(None)  # -> {}

Fix

Add a small pure helper dict_arguments(value) -> dict in slime/agent/adapters/common.py that decodes echoed wire strings to a mapping at the render boundary:

  • dict passes through unchanged;
  • str is json.loads-decoded (an empty wire string → {}: the OpenAI wire encodes "no arguments" as an empty / "{}" string);
  • None → {} (no arguments); every other non-dict outcome, including falsy values like 0, False, and [], funnels through a {"_raw_arguments": ...} sentinel (already a convention in slime/agent/parsing.py), so the -> dict contract holds for all inputs, .items() can never crash, and no argument payload is ever silently dropped.

Switch the single render-boundary call site in _normalize_tool_call from json_arguments to dict_arguments.
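A minimal sketch of a helper with the behavior listed above, not the code in this PR. Two branches are assumptions the description does not spell out: a string that is not valid JSON is wrapped in the sentinel unchanged, and a wire string that decodes to JSON null is treated like None.

```python
import json
from typing import Any

RAW_KEY = "_raw_arguments"  # sentinel key named in the PR description


def dict_arguments(value: Any) -> dict:
    """Coerce tool-call arguments to a mapping without dropping payloads."""
    if isinstance(value, str):
        if not value.strip():
            return {}  # empty wire string means "no arguments"
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            # Assumption: keep undecodable text verbatim under the sentinel.
            return {RAW_KEY: value}
    if value is None:
        # None means "no arguments"; a decoded JSON null lands here too
        # (assumption, see the lead-in).
        return {}
    if isinstance(value, dict):
        return value  # already a mapping: pass through unchanged
    # Any other value, falsy or not, is a real payload and is preserved.
    return {RAW_KEY: value}
```

Gating on `is None` rather than on truthiness is what keeps 0, False, and [] from collapsing into an empty call.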

Why this is the right fix

  • Default-path safe. The dict-passthrough and empty/None → {} branches mean callers already passing a mapping (or no arguments) get bit-identical output; only echoed JSON-string arguments change, and they now match what the chat template expects.
  • Lossless. Truthiness is the wrong gate for "no arguments": 0, False, and [] are payloads the model emitted. Gating the sentinel on is None instead of truthiness means the normalization never rewrites a real payload to an empty call.
  • Outbound wire contract preserved. _openai_tool_calls (the response path) is untouched and still emits function.arguments as a JSON string, exactly as the OpenAI spec requires. Only the internal render boundary changed.
  • No new abstraction. It is one pure function reusing the existing {"_raw_arguments": ...} sentinel from parsing.py; the -> dict return type guarantees .items() is always safe.
  • CI-verifiable. tests/test_agent_adapters.py::test_openai_render_tool_call_arguments_are_dicts (marked unit) asserts the rendered arguments are the decoded mapping {"q": "slime"}, that "[1, 2]" wraps to {"_raw_arguments": [1, 2]}, and that falsy non-dict values are preserved losslessly (0 → {"_raw_arguments": 0}, [] → {"_raw_arguments": []}, None → {}). The file already runs in the CPU agent-adapter-test job (num_gpus: 0), so the test executes with no workflow change.
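The split between the outbound wire format and the internal render format can be illustrated with a round trip. This is a sketch under assumptions: to_wire_arguments is a hypothetical name standing in for what the response path does, not the body of _openai_tool_calls.

```python
import json


def to_wire_arguments(arguments: dict) -> str:
    """Hypothetical outbound encoder: the wire carries a JSON string."""
    return json.dumps(arguments)


internal = {"q": "slime"}

wire = to_wire_arguments(internal)  # outbound: str, as the OpenAI wire expects
decoded = json.loads(wire)          # render boundary: decoded back to a dict
```

The two directions are inverses for mapping payloads, which is why only the render boundary needed to change.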

@EazyReal EazyReal force-pushed the upstream-pr/agent-toolcall-args-mapping branch 2 times, most recently from b91e834 to f794e94 Compare June 12, 2026 06:21
…emplates

The OpenAI adapter's `_normalize_tool_call` builds the assistant message that is
later fed to `tokenizer.apply_chat_template`. It serialized
`tool_calls[].function.arguments` to a JSON *string* (via `json_arguments`)
before rendering. Qwen-family chat templates iterate `tool_call.arguments.items()`,
so a string argument mis-renders (per-character iteration / template error) and the
rendered prompt diverges from what the model actually saw. In a token-capturing
rollout this yields a rollout/train token desync.

Fix: add a small pure helper `dict_arguments(value) -> dict` in
`adapters/common.py` that decodes echoed wire strings back to a mapping at the
render boundary:
  - dict passthrough;
  - `json.loads` for strings (empty wire string -> `{}`);
  - only `None` means "no arguments"; every other non-dict value -- including
    falsy ones like `0`, `False`, `[]` -- funnels through the
    `{"_raw_arguments": ...}` sentinel (already a repo convention, see
    `agent/parsing.py`), so the `-> dict` contract holds for all inputs without
    silently dropping argument payloads.
Switch the single render-boundary call site in `_normalize_tool_call` from
`json_arguments` to `dict_arguments`.

The outbound wire path (`_openai_tool_calls`) is unchanged and still emits
`arguments` as a JSON string, as the OpenAI spec requires.

Verifiable via `tests/test_agent_adapters.py::test_openai_render_tool_call_arguments_are_dicts`
(marked `unit`), which asserts the rendered tool-call `arguments` is the decoded
mapping `{"q": "slime"}`, that non-dict values wrap in the sentinel, and that
falsy non-dict values (`0`, `[]`) are preserved rather than dropped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@EazyReal EazyReal force-pushed the upstream-pr/agent-toolcall-args-mapping branch from f794e94 to 0b6a708 Compare June 12, 2026 08:25
@jingshenghang

Collaborator

Hi, our refactoring of the agent framework has been merged into the main branch. The Codex/OpenAI testing and validation were insufficient; we welcome pull requests based on our latest refactor.

#2005
