
feat(agent): persist extended-thinking output on outbound messages#1267

Merged
njbrake merged 2 commits into main from feat/persist-llm-thinking
May 8, 2026
Conversation

Member

@njbrake njbrake commented May 7, 2026

Description

Captures the LLM's extended-thinking blocks (Anthropic ThinkingBlock) on the final agent response and persists them in a new Message.thinking_text column. Today the SDK returns thinking content alongside text and tool calls, but get_response_text extracts only the text and parse_tool_calls only the tool uses, so the reasoning stream is silently discarded. With the column populated, the assistant message row carries the reasoning that produced its body.

The motivation is the admin-side activity-pane request in clawbolt-premium#456: "give me a dropdown for each agent response that lets me expand and see the reasoning." Premium has nothing to surface today because the data isn't persisted; this PR fills that gap.

Capture point: the FINAL response in the agent loop, the one whose reply_text becomes Message.body. Earlier rounds produce tool calls, and their thinking justifies a tool decision rather than the user-visible reply; keeping the persisted record aligned with what the user saw is the load-bearing semantic for the admin view that consumes this column. Heartbeat-driven outbound messages also pick up thinking via the same AgentResponse.thinking_text field, so proactive replies are covered too.
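The capture shape described above can be sketched as follows. Everything except AgentResponse.thinking_text is a hypothetical stand-in (the real loop lives in backend/app/agent/core.py and calls the Anthropic SDK); the point is that both exit paths, clean break and max-rounds, fall through to the same capture of the last response:

```python
from dataclasses import dataclass, field


@dataclass
class LLMResponse:
    """Stand-in for one round's SDK response."""
    text: str
    thinking: str = ""
    tool_calls: list = field(default_factory=list)


@dataclass
class AgentResponse:
    reply_text: str
    thinking_text: str = ""


def run_agent(call_llm, max_rounds: int = 3) -> AgentResponse:
    response = None
    for _ in range(max_rounds):
        response = call_llm()
        if not response.tool_calls:  # clean break: this is the final response
            break
        # ... execute tools, extend the conversation, loop again ...
    # Both exit paths (clean break AND exhausting max_rounds) land here, so
    # the thinking persisted is always the one behind the user-visible reply.
    return AgentResponse(reply_text=response.text,
                         thinking_text=response.thinking)
```

Earlier rounds' thinking is deliberately dropped: it justified a tool decision, not the reply the admin view will show.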

Encryption: thinking_text rides through EncryptedString like body and tool_interactions_json. Thinking blocks routinely quote user content back at length (names, addresses, integration payloads), so they need the same at-rest treatment as the message body.
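To make the at-rest treatment concrete, here is a toy stand-in for the bind/result round-trip an encrypting column type performs. Base64 is used only so the transformation is visible; the real EncryptedString presumably applies actual keyed encryption, and its internals are not shown in this PR:

```python
import base64


class ToyEncryptedString:
    """Toy stand-in for EncryptedString's bind/result round-trip.

    NOT real encryption -- base64 only makes the stored form visibly
    different from the plaintext for illustration.
    """

    def process_bind_param(self, value: str) -> str:
        # plaintext -> value actually stored in the thinking_text column
        return base64.b64encode(value.encode("utf-8")).decode("ascii")

    def process_result_value(self, stored: str) -> str:
        # stored value -> plaintext handed back to the ORM
        return base64.b64decode(stored.encode("ascii")).decode("utf-8")
```

Because thinking blocks quote user content back, the stored form must never be the plaintext, which is exactly what routing the column through the same type as body guarantees.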

Architecture

  • alembic/versions/033_add_thinking_text_to_messages.py: new column on messages as NOT NULL with server_default="" so existing rows remain readable and raw-SQL inserts in older migration tests keep working.
  • backend/app/models.py: Message.thinking_text mapped through EncryptedString; the model declares server_default="" so Base.metadata.create_all() (used by tests) builds an identical schema to what the migration produces.
  • backend/app/agent/llm_parsing.py: new get_response_thinking() joins ThinkingBlock content across response blocks. Empty thinking strings are skipped so we don't render stray separators; the signature field is intentionally not surfaced (it has no audit value to a human reader and is only meaningful for replaying the block back to Anthropic).
  • backend/app/agent/core.py: capture from the FINAL response only, in both exit paths (clean break with no more tool calls AND the max-rounds branch). AgentResponse.thinking_text carries the value out to the persistence layer.
  • backend/app/agent/router.py + heartbeat.py: thread response.thinking_text into session_store.add_message.
  • backend/app/agent/session_db.py: add_message[_async] and add_message_by_session_id[_async] accept thinking_text kwarg; included in _MESSAGE_UPDATABLE_FIELDS so future updates stay consistent.
  • backend/app/agent/dto.py: StoredMessage.thinking_text defaults to "" so existing StoredMessage(direction=..., body=...) constructors stay compatible.
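The get_response_thinking join described above might look roughly like this. The block types are local stand-ins for the Anthropic SDK's content blocks, and the separator is an assumption; only the behaviors named in the bullets (skip empty strings, ignore the signature field) are taken from the PR:

```python
from dataclasses import dataclass


@dataclass
class ThinkingBlock:  # stand-in for the SDK's thinking content block
    thinking: str
    signature: str = ""  # intentionally not surfaced, per the PR


@dataclass
class TextBlock:  # stand-in for the SDK's text content block
    text: str


def get_response_thinking(blocks: list) -> str:
    """Join non-empty ThinkingBlock content across response blocks.

    Empty thinking strings are skipped so no stray separators are
    rendered; the separator itself is an assumption of this sketch.
    """
    parts = [b.thinking for b in blocks
             if isinstance(b, ThinkingBlock) and b.thinking]
    return "\n\n".join(parts)
```

This mirrors how get_response_text handles TextBlock content: each helper pulls only its own block type, and thinking_text is simply a third extraction alongside text and tool uses.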

Tests

  • tests/test_llm_parsing.py (5 new cases): single block, multi-block join, empty when absent, skips empty blocks, empty content list.
  • tests/test_session_db_async.py (1 new case): outbound thinking round-trips through the encryption decorator and is visible on the reloaded StoredMessage.

Full OSS suite: 2396 passed, 2 skipped, 13 deselected. ruff/format/ty all clean.

Type

  • Feature
  • Bug fix
  • Refactor
  • Test
  • CI/CD
  • Documentation

Checklist

  • Tests pass (uv run pytest -v)
  • Lint passes (ruff check backend/ && ruff format --check backend/)
  • New tests added for new functionality
  • Bug fixes include regression tests

AI Usage

  • AI-assisted (Claude Code drafted the implementation, tests, and PR text after a planning conversation; reviewed by njbrake)
  • No AI used

njbrake added 2 commits May 7, 2026 18:36
Captures the LLM's ``ThinkingBlock`` content from the final response in
the agent loop and stores it on ``Message.thinking_text``. Until now the
content was extracted by the SDK but silently discarded by
``get_response_text``, so admins debugging "why did the agent reply
this way" had no audit surface short of re-running the call.

Encrypted at rest under ``EncryptedString`` like ``body`` and
``tool_interactions_json``: thinking blocks routinely quote user
content back at length, so they need the same treatment as the
message body.

The capture point is the FINAL response in the agent loop (the one
whose ``reply_text`` becomes ``Message.body``). Earlier rounds produce
tool calls and their thinking justifies a tool decision rather than
the user-visible reply; keeping the persisted record aligned with what
the user saw is the load-bearing semantic for the admin view that
consumes this column (clawbolt-premium #456).

Heartbeat-driven outbounds also pick up thinking via the same
``AgentResponse.thinking_text`` field, so the admin activity pane
covers proactive replies too.

Migration 033 adds the column as ``NOT NULL`` with a server default of
empty string, which keeps existing rows readable and lets ORM ``insert``
calls that don't list the column (raw SQL paths in tests, the
``messages_001..`` migration end-to-end test) keep working.
Two cleanups from the self-review of #1267:

1. Migration 033 docstring claimed "Nullable / default empty" but the
   column is ``NOT NULL`` with a server default of empty string. The
   behavior is correct and intentional (raw-SQL inserts in older
   migration tests don't list the column and need the server default
   to backfill); only the docstring drifted. Brought the docstring in
   line with the actual ``op.add_column`` call.

2. The error-stop branch in ``ClawboltAgent.run`` returned
   ``AgentResponse(is_error_fallback=True, ...)`` with an implicit
   empty ``thinking_text`` even when the LLM produced thinking blocks
   before hitting the error stop. ``persist_outbound`` short-circuits
   on ``is_error_fallback`` so this rides along the in-memory response
   only today, but downstream observers (and any future policy that
   records error fallbacks for triage) now get the reasoning that led
   to the bail-out.
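The second cleanup can be sketched as follows. All names except AgentResponse's is_error_fallback and thinking_text fields are hypothetical; the point is that the error stop now forwards whatever thinking the LLM had already produced instead of defaulting it to empty:

```python
from dataclasses import dataclass


@dataclass
class AgentResponse:
    reply_text: str
    is_error_fallback: bool = False
    thinking_text: str = ""


FALLBACK_TEXT = "Sorry, something went wrong."  # illustrative copy only


def error_stop(thinking_so_far: str) -> AgentResponse:
    # Before the cleanup, this branch returned
    #     AgentResponse(FALLBACK_TEXT, is_error_fallback=True)
    # with an implicit empty thinking_text. Now the reasoning that led to
    # the bail-out rides along on the in-memory response for observers,
    # even though persist_outbound still short-circuits on the flag.
    return AgentResponse(
        FALLBACK_TEXT,
        is_error_fallback=True,
        thinking_text=thinking_so_far,
    )
```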
@njbrake njbrake merged commit 59f0aa4 into main May 8, 2026
8 checks passed
@njbrake njbrake deleted the feat/persist-llm-thinking branch May 8, 2026 11:55