fix(agent): render OpenAI tool-call arguments as a mapping for chat templates#2063
Open
EazyReal wants to merge 1 commit into
Open
fix(agent): render OpenAI tool-call arguments as a mapping for chat templates#2063EazyReal wants to merge 1 commit into
EazyReal wants to merge 1 commit into
Conversation
b91e834 to
f794e94
Compare
…emplates
The OpenAI adapter's `_normalize_tool_call` builds the assistant message that is
later fed to `tokenizer.apply_chat_template`. It serialized
`tool_calls[].function.arguments` to a JSON *string* (via `json_arguments`)
before rendering. Qwen-family chat templates iterate `tool_call.arguments.items()`,
so a string argument mis-renders (per-character iteration / template error) and the
rendered prompt diverges from what the model actually saw. In a token-capturing
rollout this yields a rollout/train token desync.
Fix: add a small pure helper `dict_arguments(value) -> dict` in
`adapters/common.py` that decodes echoed wire strings back to a mapping at the
render boundary:
- dict passthrough;
- `json.loads` for strings (empty wire string -> `{}`);
- only `None` means "no arguments"; every other non-dict value -- including
falsy ones like `0`, `False`, `[]` -- funnels through the
`{"_raw_arguments": ...}` sentinel (already a repo convention, see
`agent/parsing.py`), so the `-> dict` contract holds for all inputs without
silently dropping argument payloads.
Switch the single render-boundary call site in `_normalize_tool_call` from
`json_arguments` to `dict_arguments`.
The outbound wire path (`_openai_tool_calls`) is unchanged and still emits
`arguments` as a JSON string, as the OpenAI spec requires.
Verifiable via `tests/test_agent_adapters.py::test_openai_render_tool_call_arguments_are_dicts`
(marked `unit`), which asserts the rendered tool-call `arguments` is the decoded
mapping `{"q": "slime"}`, that non-dict values wrap in the sentinel, and that
falsy non-dict values (`0`, `[]`) are preserved rather than dropped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
f794e94 to
0b6a708
Compare
Collaborator
|
Hi, our refactoring of the agent framework has been merged into the main branch. The Codex/OpenAI testing and validation were insufficient; we welcome pull requests based on our latest refactor. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
In the OpenAI HTTP adapter (
slime/agent/adapters/openai.py),_normalize_tool_callbuilds the assistant message that_translate_chat_messagescollects intochain.chat_messages, whichrender_token_idsthen feeds totokenizer.apply_chat_template. That message is the render boundary. It serializedtool_calls[].function.argumentsto a JSON string viajson_arguments. Qwen-family chat templates iteratetool_call.arguments.items(), which expects a mapping — given a string they either iterate it character-by-character or raise a template error, so the rendered prompt diverges from the tokens the policy actually generated. In a token-capturing rollout that is a rollout/train token desync: the policy is trained on tokens it never sampled.Before vs After
Same input — an assistant turn whose tool call carries
argumentsas the JSON string echoed on the OpenAI wire:Before —
argumentsstays a JSON string, so the rendered assistant message is:{"type": "function", "function": {"name": "lookup", "arguments": '{"q": "slime"}'}} # strThe Qwen template's
arguments.items()then runs on a string →AttributeError: 'str' object has no attribute 'items'(or, on templates that tolerate it, per-character iteration that emits a garbled tool block). Either way the rendered prompt no longer matches the sampled tokens.After — the wire string is decoded back to the mapping the template expects:
{"type": "function", "function": {"name": "lookup", "arguments": {"q": "slime"}}} # dictarguments.items()now yields[("q", "slime")]and the tool call renders correctly.Hostile inputs that are valid JSON but not a mapping are wrapped so
.items()can never crash:Falsy non-dict values are real argument payloads, not "no arguments", and are preserved the same way — only
None(or an empty wire string) maps to{}:Fix
Add a small pure helper
dict_arguments(value) -> dictinslime/agent/adapters/common.pythat decodes echoed wire strings to a mapping at the render boundary:dictpasses through unchanged;strisjson.loads-decoded (an empty wire string →{}: the OpenAI wire encodes "no arguments" as an empty /"{}"string);None→{}(no arguments); every other non-dict outcome — including falsy values like0,False, and[]— funnels through a{"_raw_arguments": ...}sentinel (already a convention inslime/agent/parsing.py), so the-> dictcontract holds for all inputs,.items()can never crash, and no argument payload is ever silently dropped.Switch the single render-boundary call site in
_normalize_tool_callfromjson_argumentstodict_arguments.Why this is the right fix
None→{}branches mean callers already passing a mapping (or no arguments) get bit-identical output; only echoed JSON-string arguments change, and they now match what the chat template expects.0,False, and[]are payloads the model emitted. Gating the sentinel onis Noneinstead of truthiness means the normalization never rewrites a real payload to an empty call._openai_tool_calls(the response path) is untouched and still emitsfunction.argumentsas a JSON string, exactly as the OpenAI spec requires. Only the internal render boundary changed.{"_raw_arguments": ...}sentinel fromparsing.py; the-> dictreturn type guarantees.items()is always safe.tests/test_agent_adapters.py::test_openai_render_tool_call_arguments_are_dicts(markedunit) asserts the rendered arguments are the decoded mapping{"q": "slime"}, that"[1, 2]"wraps to{"_raw_arguments": [1, 2]}, and that falsy non-dict values are preserved losslessly (0→{"_raw_arguments": 0},[]→{"_raw_arguments": []},None→{}). The file already runs in the CPUagent-adapter-testjob (num_gpus: 0), so the test executes with no workflow change.