feat(agent): persist extended-thinking output on outbound messages #1267
Merged
Captures the LLM's ``ThinkingBlock`` content from the final response in the agent loop and stores it on ``Message.thinking_text``. Until now the content was extracted by the SDK but silently discarded by ``get_response_text``, so admins debugging "why did the agent reply this way" had no audit surface short of re-running the call.

Encrypted at rest under ``EncryptedString`` like ``body`` and ``tool_interactions_json``: thinking blocks routinely quote user content back at length, so they need the same treatment as the message body.

The capture point is the FINAL response in the agent loop (the one whose ``reply_text`` becomes ``Message.body``). Earlier rounds produce tool calls, and their thinking justifies a tool decision rather than the user-visible reply; keeping the persisted record aligned with what the user saw is the load-bearing semantic for the admin view that consumes this column (clawbolt-premium #456). Heartbeat-driven outbounds also pick up thinking via the same ``AgentResponse.thinking_text`` field, so the admin activity pane covers proactive replies too.

Migration 033 adds the column as ``NOT NULL`` with a server default of empty string, which keeps existing rows readable and lets inserts that don't list the column (raw SQL paths in tests, the ``messages_001..`` migration end-to-end test) keep working.
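The effect of the ``NOT NULL`` plus server-default choice can be illustrated with a small sketch. It uses the standard-library ``sqlite3`` module and a cut-down ``messages`` table as stand-ins; the real migration goes through Alembic, and the real column is mapped through ``EncryptedString``, neither of which is reproduced here.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real database (assumption).
conn = sqlite3.connect(":memory:")

# Cut-down, hypothetical pre-migration shape of the table.
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.execute("INSERT INTO messages (body) VALUES ('old row')")

# Same idea as migration 033: NOT NULL with a server-side default of ''.
conn.execute(
    "ALTER TABLE messages ADD COLUMN thinking_text TEXT NOT NULL DEFAULT ''"
)

# A raw-SQL insert written before the column existed does not list it,
# and still succeeds because the database fills in the default.
conn.execute("INSERT INTO messages (body) VALUES ('new row')")

rows = conn.execute(
    "SELECT body, thinking_text FROM messages ORDER BY id"
).fetchall()
print(rows)
```

Both the pre-existing row and the row inserted without the column read back with an empty ``thinking_text``, which is the property the migration relies on.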
Two cleanups from the self-review of #1267:

1. Migration 033 docstring claimed "Nullable / default empty" but the column is ``NOT NULL`` with a server default of empty string. The behavior is correct and intentional (raw-SQL inserts in older migration tests don't list the column and need the server default to backfill); only the docstring drifted. Brought the docstring in line with the actual ``op.add_column`` call.

2. The error-stop branch in ``ClawboltAgent.run`` returned ``AgentResponse(is_error_fallback=True, ...)`` with an implicit empty ``thinking_text`` even when the LLM produced thinking blocks before hitting the error stop. ``persist_outbound`` short-circuits on ``is_error_fallback``, so today this rides along on the in-memory response only, but downstream observers (and any future policy that records error fallbacks for triage) now get the reasoning that led to the bail-out.
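A minimal sketch of the second cleanup, together with the final-response capture rule, might look like the following. ``LLMRound``, the simplified ``run`` and ``persist_outbound`` signatures, and the fallback reply text are hypothetical stand-ins for illustration; only the ``AgentResponse`` field names come from the PR.

```python
from dataclasses import dataclass


@dataclass
class LLMRound:
    """Hypothetical stand-in for one LLM response inside the agent loop."""
    text: str
    thinking: str
    has_tool_calls: bool = False
    error_stop: bool = False


@dataclass
class AgentResponse:
    reply_text: str
    thinking_text: str = ""
    is_error_fallback: bool = False


def run(rounds: list[LLMRound], max_rounds: int = 5) -> AgentResponse:
    """Toy loop: only the last round processed contributes thinking."""
    final = None
    for final in rounds[:max_rounds]:
        if final.error_stop:
            # Carry the thinking out even though this is a fallback reply.
            return AgentResponse(
                reply_text="Sorry, something went wrong.",  # hypothetical text
                thinking_text=final.thinking,
                is_error_fallback=True,
            )
        if not final.has_tool_calls:
            break  # clean exit: no more tool calls
    if final is None:
        return AgentResponse(reply_text="")
    # Reached on a clean break or when max_rounds is exhausted.
    return AgentResponse(reply_text=final.text, thinking_text=final.thinking)


def persist_outbound(response: AgentResponse, store: list[dict]) -> bool:
    """Short-circuits on error fallbacks, as described above."""
    if response.is_error_fallback:
        return False
    store.append(
        {"body": response.reply_text, "thinking_text": response.thinking_text}
    )
    return True
```

In this sketch an error fallback still exposes ``thinking_text`` to in-memory observers, while nothing is written to the store for it.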
Description
Captures the LLM's extended-thinking blocks (Anthropic `ThinkingBlock`) on the final agent response and persists them on a new `Message.thinking_text` column. Today the SDK returns thinking content alongside text and tool calls; `get_response_text` extracts only text and `parse_tool_calls` only tool uses, so the reasoning stream is silently discarded. With the column populated, the assistant message row carries the reasoning that produced its body.

The motivation is the admin-side activity-pane request in clawbolt-premium#456: "give me a dropdown for each agent response that lets me expand and see the reasoning." Premium has nothing to surface today because the data isn't persisted; this PR fills that gap.

Capture point: the FINAL response in the agent loop, the one whose `reply_text` becomes `Message.body`. Earlier rounds produce tool calls, and their thinking justifies a tool decision rather than the user-visible reply, so keeping the persisted record aligned with what the user saw is the load-bearing semantic for the admin view that consumes this column. Heartbeat-driven outbound also picks up thinking via the same `AgentResponse.thinking_text` field, so proactive replies are covered too.

Encryption: `thinking_text` rides through `EncryptedString` like `body` and `tool_interactions_json`. Thinking blocks routinely quote user content back at length (names, addresses, integration payloads), so they need the same at-rest treatment as the message body.

Architecture
- `alembic/versions/033_add_thinking_text_to_messages.py`: new column on `messages` as `NOT NULL` with `server_default=""` so existing rows remain readable and raw-SQL inserts in older migration tests keep working.
- `backend/app/models.py`: `Message.thinking_text` mapped through `EncryptedString`; the model declares `server_default=""` so `Base.metadata.create_all()` (used by tests) builds an identical schema to what the migration produces.
- `backend/app/agent/llm_parsing.py`: new `get_response_thinking()` joins `ThinkingBlock` content across response blocks. Empty thinking strings are skipped so we don't render stray separators; the `signature` field is intentionally not surfaced (it has no audit value to a human reader and is only meaningful for replaying the block back to Anthropic).
- `backend/app/agent/core.py`: capture from the FINAL response only, in both exit paths (clean break with no more tool calls AND the max-rounds branch). `AgentResponse.thinking_text` carries the value out to the persistence layer.
- `backend/app/agent/router.py` + `heartbeat.py`: thread `response.thinking_text` into `session_store.add_message`.
- `backend/app/agent/session_db.py`: `add_message[_async]` and `add_message_by_session_id[_async]` accept a `thinking_text` kwarg; included in `_MESSAGE_UPDATABLE_FIELDS` so future updates stay consistent.
- `backend/app/agent/dto.py`: `StoredMessage.thinking_text` defaults to `""` so existing `StoredMessage(direction=..., body=...)` constructors stay compatible.

Tests

- `tests/test_llm_parsing.py` (5 new cases): single block, multi-block join, empty when absent, skips empty blocks, empty content list.
- `tests/test_session_db_async.py` (1 new case): outbound thinking round-trips through the encryption decorator and is visible on the reloaded `StoredMessage`.

Full OSS suite: 2396 passed, 2 skipped, 13 deselected. ruff/format/ty all clean.
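
The behaviour described for `get_response_thinking()` (join thinking across blocks, skip empty strings, ignore `signature`) can be sketched roughly as below. This is not the PR's code: the block classes are local stand-ins that mirror the `type`, `thinking`, and `signature` fields of the SDK's content blocks, the function takes a plain content list rather than a response object, and the `"\n\n"` separator is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Local stand-in for an SDK text block."""
    text: str
    type: str = "text"


@dataclass
class ThinkingBlock:
    """Local stand-in for an SDK thinking block."""
    thinking: str
    signature: str = ""
    type: str = "thinking"


def get_response_thinking(content: list[object], separator: str = "\n\n") -> str:
    """Join non-empty thinking strings; the signature is never surfaced."""
    parts = [
        block.thinking
        for block in content
        if getattr(block, "type", None) == "thinking"
        and getattr(block, "thinking", "")
    ]
    return separator.join(parts)


content = [
    ThinkingBlock("User asked for a quote.", signature="opaque"),
    ThinkingBlock(""),  # skipped, so no stray separator appears
    TextBlock("Here is your quote."),
    ThinkingBlock("Keep the reply short."),
]
joined = get_response_thinking(content)
```

The cases listed for `tests/test_llm_parsing.py` map onto this sketch directly: a response with no thinking blocks, or an empty content list, yields `""`.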
Type
Checklist
- `uv run pytest -v`
- `ruff check backend/ && ruff format --check backend/`

AI Usage