Skip to content

agent/tools: fix shorthand-schema bug that aborts agent turns mid-task#63

Merged
pufit merged 1 commit into
ClickHouse:mainfrom
alex-fedotyev:claude/fix-tool-schema-shorthand
May 13, 2026
Merged

agent/tools: fix shorthand-schema bug that aborts agent turns mid-task#63
pufit merged 1 commit into
ClickHouse:mainfrom
alex-fedotyev:claude/fix-tool-schema-shorthand

Conversation

@alex-fedotyev
Copy link
Copy Markdown
Contributor

Summary

The Claude Agent SDK's _build_schema (claude_agent_sdk/__init__.py:393-413) converts shorthand input_schema dicts by forcing every property into required: list(properties.keys()) and silently dropping descriptions and defaults via _python_type_to_json_schema. Most of nerve/agent/tools.py uses the shorthand form, so 23 tools advertise optional-with-default fields as required to the model.

This PR adds a thin wrapper around claude_agent_sdk.tool that pre-promotes shorthand dicts to the explicit JSON Schema form the SDK passes through unchanged. properties are kept intact (so descriptions and defaults survive) and required only lists fields that have no default. The four explicit schemas already in the file (_NOTIFY_SCHEMA, _ASK_USER_SCHEMA, _REACT_SCHEMA, _SEND_STICKER_SCHEMA) are untouched.

The fix is one place; every existing and future tool that uses the shorthand decorator picks it up automatically.

Symptoms (when this has occurred)

When the model trusted a documented default and omitted the field on a call, the bundled CLI threw McpToolCallError instead of returning a normal is_error: true tool result. The exception propagates up the streaming connection and ends the agent's turn early. From the user's POV the assistant just stopped mid-task. Recurring log signature:

[WARN] Streaming stall detected: NN.Ns gap between events (stall #1)
[ERROR] MCP server "nerve" Input validation error: '<field>' is a required property
[WARN] [Stall] tool_dispatch_end tool=mcp__nerve__<tool> outcome=error durationMs=...
[ERROR] McpToolCallError: McpToolCallError: Input validation error: '<field>' is a required property
[WARN] Streaming completed with 1 stall(s), total stall time: NN.Ns

After the stream completes, Nerve's engine registers a background-task tracker, so the session looks "done" even though the model intended to continue. Tools that have repro'd this in the wild (recent logs) include mcp__nerve__plan_propose and mcp__nerve__memorize. Any tool with at least one default-annotated field is exposed:

task_search, task_create, task_list, task_update, task_done, memory_recall, conversation_history, memory_records_by_date, memorize, memory_update, category_update, sync_status, list_sources, poll_source, poll_all_sources, read_source, plan_propose, plan_list, plan_decline, skill_create, skill_run_script, hoa_execute.

Side benefit

The shorthand path also routes property values through _python_type_to_json_schema, which has no dict branch and falls back to {"type": "string"} — so descriptions and defaults are stripped before the model ever sees them. Promoting shorthand to explicit schema fixes that as well; the model now receives the parameter descriptions Nerve has been writing for months.

Test plan

  • pytest tests/test_tools_schemas.py -v — 78 passed (4 wrapper behaviour cases + parametrised audit of every tool in ALL_TOOLS for type/properties/required correctness and "no defaulted field is required").
  • pytest tests/test_session_mcp.py tests/test_engine.py — 26 passed, no regressions.
  • pytest -q (full suite, excluding network-bound CLI tests) — 509 passed; 2 pre-existing failures in test_bootstrap.py and test_services.py related to the docker-mcp spike, unrelated to this change.

Out of scope

Bug 2 from the original analysis — that McpToolCallError is raised at all on validation failures instead of being returned as is_error: true — lives in the bundled Claude Code CLI (/$bunfs/root/src/entrypoints/cli.js:4894), not Nerve. This PR removes the most common trigger; the upstream fix would prevent any future schema mismatch from terminating a turn.

The Claude Agent SDK's _build_schema (claude_agent_sdk/__init__.py:393-413)
converts shorthand input_schema dicts by forcing every property into
required: list(properties.keys()) and silently dropping descriptions
and defaults via _python_type_to_json_schema. Almost every tool in
this file uses the shorthand form, so 23 tools advertised optional-
with-default fields as required to the model. When the model trusted
a documented default and omitted the field, the SDK threw
McpToolCallError, which propagates up the stream and ends the agent's
turn early; Nerve's engine then logs "Streaming completed with N
stall(s)" and registers a background-task tracker, leaving users with
the impression that the assistant stopped mid-task.

The wrapper pre-promotes shorthand dicts to the explicit JSON Schema
form the SDK passes through unchanged. The four explicit schemas
already in the file (_NOTIFY_SCHEMA, _ASK_USER_SCHEMA, _REACT_SCHEMA,
_SEND_STICKER_SCHEMA) are untouched. Side benefit: parameter
descriptions now reach the model for every tool that previously
relied on the shorthand.

Adds tests/test_tools_schemas.py with 78 assertions covering wrapper
behaviour and a parametrised audit of every tool in ALL_TOOLS.
@alex-fedotyev
Copy link
Copy Markdown
Contributor Author

Bumping for review. This is one of five small reliability fixes (#63, #64, #65, #66, #67) that compound: each closes one cause of chats that wedge mid-turn, lose context after restart, or get routed into the wrong session. Happy to split, rebase, or land any of them in any order; they don't depend on each other.

In daily use the chat surface is the single biggest friction right now, so any of these merging would help. Let me know if you want changes.

Copy link
Copy Markdown
Member

@pufit pufit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix!

@pufit pufit merged commit 4edf91b into ClickHouse:main May 13, 2026
pufit pushed a commit that referenced this pull request May 13, 2026
User-reported glitch: "I ask for something and there is no response, then
I ask again and it answers to the previous question." Five reliability
PRs (#63 shorthand-schema, #64 synthetic done, #65 stale sdk_session_id,
#66 idle timeout, #67 sticky session) each close one underlying cause.
Two gaps remain that none of those PRs cover.

Gap 1: client-side send silently drops payloads.
web/src/api/websocket.ts checked readyState === OPEN and no-op'd
otherwise. The 3s reconnect window leaves a hole: send() returns to the
caller and chatStore.sendMessage has already optimistically appended the
user message and set isStreaming=true. The user thinks the agent is
thinking but the message never reached the server, so the next reply
lands against a stale prompt.

Track readyState explicitly. CONNECTING or reconnect-scheduled now queues
the payload (bounded to 5 entries; oldest evicted) and flushes from
onopen. CLOSED-without-reconnect and CLOSING return 'dropped' so the
caller can revert. chatStore.sendMessage pops the optimistic user message
on 'dropped' and surfaces an inline assistant error so the user can
retry.

Gap 2: gateway initial-bind never replayed the broadcaster buffer.
The switch_session handler already shipped session_status with
buffered_events on session switch, but the initial-connect handshake at
server.py:286-311 didn't. Reload mid-turn (or a transient 3s WS drop)
and the in-flight stream was lost from the client's view even though
the events sat in broadcaster._session_buffers waiting to be replayed.

Lift the duplicated send-status construction into _send_session_status
and call it from both branches. Initial-bind gates on
broadcaster.is_buffering so idle sessions stay silent; switch_session
calls unconditionally so the client refreshes is_running/status on
every selection. The frontend handleSessionStatus already restores
streamingBlocks, panels, todos, and interaction state from the buffer
(handled by #69), so this is purely additive at the gateway.

Tests:
- 9 new asserts in tests/test_gateway_ws.py covering the helper output,
  the initial-bind gate, the switch_session regression path, and a
  load-fidelity check for buffer ordering.
- Full pytest run: 444 pass, 2 skip, 2 pre-existing failures unrelated
  (test_bootstrap docker-env detection and test_cli_upgrade docker
  mode, both noted in notes/repo-conventions/nerve.md).
- web/ tsc --noEmit clean, npm run build clean.

Out of scope (followups, not blocking):
- Stale-listener cleanup on swallowed send_json exceptions
  (server.py:298-301).
- Application-level message_received ack from engine after
  sessions.add_message.
- _session_locks TTL on session archive.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants