Fix silent null-arg tool dispatch causing runaway tool-call loops #5790
laran wants to merge 1 commit into spring-projects:main
Related: spring-projects#5754
Related: spring-projects#3333
Related: spring-projects#2383
Related: spring-projects#4464
Related: spring-projects#4617

The "Handle the possible null parameter situation in streaming mode" logic added in b059cdf silently replaced null or empty tool-call arguments with "{}". When a tool has required parameters, the downstream MethodToolCallback then called the method with null for every required argument and silently returned whatever the tool produced. Many tool implementations return a valid-looking but empty result in that case, which the model interprets as a transient failure and retries, often with the identical call. Combined with the absence of any iteration limit on Spring AI's tool-call recursion (spring-projects#3333), this can produce multi-million-token runaway loops in a single turn.

Fix:

1. DefaultToolCallingManager: when tool arguments are null or empty, raise a ToolExecutionException and route it through the standard ToolExecutionExceptionProcessor so the resulting error becomes a proper tool response. The model can then adjust its approach rather than retry blindly. Well-formed tool calls in the same batch still execute normally.
2. MethodToolCallback.buildMethodArguments: when a required parameter (default: @ToolParam.required = true) is missing from the tool input map, throw a ToolExecutionException with a clear "Missing required parameter" message. Previously, buildTypedArgument silently returned null for missing values, allowing method invocation to proceed with null arguments. Optional parameters marked with @ToolParam(required = false) still pass null through unchanged, and zero-parameter tools are unaffected.
3. Sync/AsyncMcpToolCallback: apply the same fix as (1) for MCP callbacks that had the identical silent-"{}" fallback.

Tests:

- DefaultToolCallingManagerTest: the two tests that previously asserted the buggy behavior (shouldHandleNullArgumentsInStreamMode, shouldHandleEmptyArgumentsInStreamMode) are rewritten to assert that the tool callback is NOT invoked and that the conversation history contains a tool response with the error from the exception processor. A new test (shouldExecuteValidToolsWhileReturningErrorForMalformedTool) verifies that a batch containing a malformed call and a valid call processes both independently.
- MethodToolCallbackExceptionHandlingTest: new tests verify that (a) missing required parameters throw ToolExecutionException and the underlying method is never invoked, (b) optional parameters are still allowed to be null, and (c) zero-param tools remain callable with "{}".
- SyncMcpToolCallbackTests / AsyncMcpToolCallbackTest: rewritten null/empty input tests to assert the new error path and verify that the MCP client is never invoked.
- DefaultToolCallingManagerTests#whenMixedMethodToolCallsInChatResponseThenExecute was implicitly relying on the silent-null behavior by calling TestGenericClass.call(String) with an empty args map. Updated to supply a value for the required parameter.

Loop safety of the new tests: every new test invokes its callback exactly once and asserts on the single result. There is no recursion, retry loop, or chat model involved, so the tests cannot themselves reproduce the runaway loop they guard against.

Signed-off-by: Laran Evans <laran@laranevans.com>
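The two guards described in the commit message can be sketched roughly as follows. This is an illustration only, not the code in the PR: ToolCall, ToolResponse, ToolExecutionError, executeBatch and buildArguments are hypothetical stand-ins for the Spring AI types, written so the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Dependency-free sketch; every type here is a hypothetical stand-in, not Spring AI's API. */
final class NullArgDispatchSketch {

    /** A tool call as emitted by the model; arguments may be null or blank. */
    record ToolCall(String id, String name, String arguments) {}

    /** One entry of the tool-response message returned to the model. */
    record ToolResponse(String id, String name, String responseData) {}

    /** Simplified analogue of ToolExecutionException. */
    static final class ToolExecutionError extends RuntimeException {
        final String toolName;

        ToolExecutionError(String toolName, String message) {
            super(message);
            this.toolName = toolName;
        }
    }

    /** Runs each call independently; a malformed call becomes an error response. */
    static List<ToolResponse> executeBatch(List<ToolCall> calls,
            Map<String, Function<String, String>> callbacks,
            Function<ToolExecutionError, String> exceptionProcessor) {
        List<ToolResponse> responses = new ArrayList<>();
        for (ToolCall call : calls) {
            String result;
            try {
                if (call.arguments() == null || call.arguments().isBlank()) {
                    // The old behavior would substitute "{}" here and invoke the tool anyway.
                    throw new ToolExecutionError(call.name(), "Tool call arguments are null or empty");
                }
                // Assumes the tool name is registered; unknown tools are out of scope here.
                result = callbacks.get(call.name()).apply(call.arguments());
            }
            catch (ToolExecutionError ex) {
                // The processor turns the failure into text the model can act on.
                result = exceptionProcessor.apply(ex);
            }
            responses.add(new ToolResponse(call.id(), call.name(), result));
        }
        return responses;
    }

    /**
     * Rejects a missing required parameter instead of passing null to the method.
     * requiredByParam maps parameter name to its required flag; pass an ordered map
     * (for example LinkedHashMap) so positions match the method signature.
     */
    static Object[] buildArguments(String toolName, Map<String, Boolean> requiredByParam,
            Map<String, Object> input) {
        Object[] args = new Object[requiredByParam.size()];
        int i = 0;
        for (Map.Entry<String, Boolean> param : requiredByParam.entrySet()) {
            Object value = input.get(param.getKey());
            if (value == null && param.getValue()) {
                throw new ToolExecutionError(toolName, "Missing required parameter: " + param.getKey());
            }
            args[i++] = value; // optional parameters still pass null through
        }
        return args;
    }

    public static void main(String[] args) {
        Map<String, Function<String, String>> callbacks = Map.of("weather", json -> "sunny");
        List<ToolCall> batch = List.of(
                new ToolCall("1", "weather", null),
                new ToolCall("2", "weather", "{\"city\":\"Oslo\"}"));
        for (ToolResponse response : executeBatch(batch, callbacks,
                ex -> "Error in " + ex.toolName + ": " + ex.getMessage())) {
            System.out.println(response);
        }
    }
}
```

In this sketch the malformed call never reaches the callback, while the well-formed call in the same batch still runs, which mirrors the behavior the rewritten tests assert.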
The fix is correct and well-scoped: routing null/empty args through …

One trust dimension worth adding to the exception handling: agent identity context on the exception. When a tool execution fails due to null/missing arguments, the exception currently contains the tool name and error type. In multi-agent Spring AI deployments where tool calls originate from different agents, the exception should also carry the calling agent identity. This is essential for diagnosing whether null-arg failures are concentrated in specific agents (suggesting a model-side bug or prompt issue) vs. distributed across all callers (suggesting a tool registration problem). A minimal addition to ToolExecutionException:

public class ToolExecutionException extends RuntimeException {
    private final String toolName;
    private final String callerAgentId; // which agent made this call
    // ...
}

This also matters for behavioral trust: an agent that consistently produces null-arg tool calls has a measurable behavioral signature, and a high null-arg error rate across sessions is an anomaly signal. If …
…lity

Per recommendation in spring-projects#5790 (comment)

In multi-agent Spring AI deployments, null-argument tool-call failures are difficult to attribute to a specific calling agent without a separate identity-propagation layer. This change makes ToolExecutionException a natural carrier for that identity.

Changes:

- ToolExecutionException: add @Nullable callerAgentId field, a new (ToolDefinition, String, Throwable) constructor, and getCallerAgentId(). The existing two-arg constructor delegates with null, so all call sites remain backward-compatible.
- ToolContext: add TOOL_CALLER_AGENT_ID constant, the agreed-upon key for multi-agent deployments to supply caller identity via the context map.
- DefaultToolCallingManager: in the null-args error path, extract TOOL_CALLER_AGENT_ID from the ToolContext and pass it to the new constructor, so observability tooling that inspects ToolExecutionException can attribute the failure to a specific agent without extra plumbing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
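A rough sketch of the shape this commit describes, under stated assumptions: the class below is a simplified stand-in that takes a tool name rather than a ToolDefinition, the string value of TOOL_CALLER_AGENT_ID is invented for illustration (the commit message does not give it), and nullArgsError is a hypothetical helper rather than a method in DefaultToolCallingManager.

```java
import java.util.Map;

/** Dependency-free sketch; names and values are assumptions, not Spring AI's API. */
final class CallerAgentIdSketch {

    /** Assumed key value; the commit only says such a constant is added to ToolContext. */
    static final String TOOL_CALLER_AGENT_ID = "toolCallerAgentId";

    /** Simplified analogue of ToolExecutionException carrying an optional caller identity. */
    static final class ToolExecutionError extends RuntimeException {
        private final String toolName;
        private final String callerAgentId; // null when the caller is unknown

        /** Existing two-arg form delegates with null, so old call sites keep compiling. */
        ToolExecutionError(String toolName, Throwable cause) {
            this(toolName, null, cause);
        }

        ToolExecutionError(String toolName, String callerAgentId, Throwable cause) {
            super(cause.getMessage(), cause);
            this.toolName = toolName;
            this.callerAgentId = callerAgentId;
        }

        String getToolName() {
            return toolName;
        }

        String getCallerAgentId() {
            return callerAgentId;
        }
    }

    /** Builds the null-args error, reading the caller id from the context map if present. */
    static ToolExecutionError nullArgsError(String toolName, Map<String, Object> toolContext) {
        Object id = (toolContext == null) ? null : toolContext.get(TOOL_CALLER_AGENT_ID);
        String callerAgentId = (id instanceof String s) ? s : null;
        return new ToolExecutionError(toolName, callerAgentId,
                new IllegalArgumentException("Tool call arguments are null or empty"));
    }

    public static void main(String[] args) {
        ToolExecutionError error = nullArgsError("weather",
                Map.of(TOOL_CALLER_AGENT_ID, "planner-agent"));
        System.out.println(error.getToolName() + " failed for caller " + error.getCallerAgentId());
    }
}
```

Keeping the identifier nullable means single-agent deployments that never populate the context key see no behavioral change.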
Good callout. I updated to include a callerAgentId.
@tzolov I'd like to encourage this fix be prioritized. I encountered this issue when running locally, using Opus 4.6, in an application that uses tools heavily. Because of the bug this PR fixes I ended up experiencing an infinite loop when calling the Anthropic API that cost me $125 in input tokens alone in a matter of 10 minutes. I've added several safeguards to protect myself since. But ultimately, fixing this bug is what's needed. Until it's fixed I'm just running a locally patched version. Please prioritize this ASAP. The bug has the potential to cause catastrophic API cost.