42 changes: 41 additions & 1 deletion strands-py/src/strands/agent/agent.py
Member

This adds a first-class SDK option to stream only the final answer, eliminating the need for consumer-side buffering.

What's the use case for this versus agent.invoke? The events are buffered as it is, so it's not clear to me why you would use this instead of agent.invoke, which provides the completed message as well.

Author

That is a fair question. You are right that text events for the final turn are buffered until EventLoopStopEvent arrives with end_turn, then flushed as a batch, so for the text content alone the user experience is similar to invoke(). However, I think there are two differences:

  1. Non-text events still stream in real time throughout the whole execution.

With invoke(), the caller is blocked until the entire agent loop completes, so they have no visibility into tool calls, lifecycle events, reasoning, or progress. With stream_async(stream_final_turn_only=True), the consumer still receives:

  • start_event_loop per turn (progress indicator)
  • current_tool_use events
  • etc.

  2. Users wrapping agents in SSE endpoints already use stream_async for everything else. At least in my project I tend to stick with the stream_async pattern (that's what the AWS examples and documentation use as well). Asking them to special-case invoke() for the final answer requires extra research on the user's part.
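
The SSE argument can be illustrated with a minimal sketch. It assumes the event shapes mentioned in this thread (`{"data": ...}` for text, plus `start_event_loop` and `current_tool_use` keys). `fake_stream`, `to_sse`, `sse_frames`, and `collect` are hypothetical helpers, and `fake_stream` only stands in for `agent.stream_async(..., stream_final_turn_only=True)`; it does not call the SDK.

```python
import asyncio
import json
from typing import Any, AsyncIterator


async def fake_stream() -> AsyncIterator[dict[str, Any]]:
    """Hypothetical stand-in for the agent stream with the flag enabled."""
    yield {"start_event_loop": True}                 # turn 1 starts
    yield {"current_tool_use": {"name": "lookup"}}   # tool call is visible live
    yield {"start_event_loop": True}                 # turn 2 (final) starts
    yield {"data": "Final "}                         # only final-turn text arrives
    yield {"data": "answer."}


def to_sse(event_name: str, payload: Any) -> str:
    """Format one server-sent event frame."""
    return f"event: {event_name}\ndata: {json.dumps(payload)}\n\n"


async def sse_frames(events: AsyncIterator[dict[str, Any]]) -> AsyncIterator[str]:
    """Single code path for progress, tool, and text events."""
    async for event in events:
        if "data" in event:
            yield to_sse("text", event["data"])
        elif "current_tool_use" in event:
            yield to_sse("tool", event["current_tool_use"])
        elif "start_event_loop" in event:
            yield to_sse("progress", "turn started")


async def collect() -> list[str]:
    return [frame async for frame in sse_frames(fake_stream())]


if __name__ == "__main__":
    for frame in asyncio.run(collect()):
        print(frame, end="")
```

The point of the sketch is that the endpoint keeps one `async for` loop for every event kind, instead of switching to invoke() just to obtain the final answer.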

@@ -65,7 +65,15 @@
 from ..tools.registry import ToolRegistry
 from ..tools.structured_output._structured_output_context import StructuredOutputContext
 from ..tools.watcher import ToolWatcher
-from ..types._events import AgentResultEvent, EventLoopStopEvent, InitEventLoopEvent, ModelStreamChunkEvent, TypedEvent
+from ..types._events import (
+    AgentResultEvent,
+    EventLoopStopEvent,
+    InitEventLoopEvent,
+    ModelStreamChunkEvent,
+    StartEventLoopEvent,
+    TextStreamEvent,
+    TypedEvent,
+)
 from ..types.agent import AgentInput, ConcurrentInvocationMode
 from ..types.content import ContentBlock, Message, Messages, SystemContentBlock
 from ..types.exceptions import ConcurrencyException, ContextWindowOverflowException
@@ -776,6 +784,7 @@ async def stream_async(
         invocation_state: dict[str, Any] | None = None,
         structured_output_model: type[BaseModel] | None = None,
         structured_output_prompt: str | None = None,
+        stream_final_turn_only: bool = False,
         **kwargs: Any,
     ) -> AsyncIterator[Any]:
         """Process a natural language prompt and yield events as an async iterator.
@@ -795,6 +804,11 @@ async def stream_async(
             invocation_state: Additional parameters to pass through the event loop.
             structured_output_model: Pydantic model type(s) for structured output (overrides agent default).
             structured_output_prompt: Custom prompt for forcing structured output (overrides agent default).
+            stream_final_turn_only: When True, buffers text events from intermediate turns and only yields
+                text events from the final turn (where stop_reason is "end_turn"). Non-text events such as
+                lifecycle, tool use, reasoning, and citation events are yielded normally regardless of this
+                setting. When False (default), all events are yielded as they are produced with no change
+                in behavior.
Contributor

Issue: The docstring says "Non-text events such as lifecycle, tool use, reasoning, and citation events are yielded normally regardless of this setting." While accurate, this creates an asymmetry that may confuse users: reasoning text from intermediate turns passes through (it's a ReasoningTextStreamEvent, not a TextStreamEvent), but regular text from those same turns does not. For agents using extended thinking, users would see intermediate reasoning but not intermediate text.

Suggestion: Consider calling this out explicitly in the docstring with a brief note, e.g.:

Note: Reasoning events from intermediate turns are still yielded since they are distinct
from text stream events. Only {"data": ...} text events are buffered/filtered.
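
A small sketch of the asymmetry described above, under stated assumptions: the four dataclasses are hypothetical stand-ins for StartEventLoopEvent, TextStreamEvent, ReasoningTextStreamEvent, and EventLoopStopEvent (treated here as unrelated sibling types), and `final_turn_only` mirrors the buffering logic in this diff rather than reusing it.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class StartTurn:      # stand-in for StartEventLoopEvent
    pass


@dataclass
class Text:           # stand-in for TextStreamEvent
    text: str


@dataclass
class Reasoning:      # stand-in for ReasoningTextStreamEvent
    text: str


@dataclass
class Stop:           # stand-in for EventLoopStopEvent
    reason: str


def final_turn_only(events: Iterable[object]) -> Iterator[object]:
    """Hold back Text events; pass every other event type straight through."""
    buffer: list[Text] = []
    for event in events:
        if isinstance(event, StartTurn):
            buffer.clear()                 # new turn: drop text held from the last one
        elif isinstance(event, Text):
            buffer.append(event)           # held until the turn's stop reason is known
            continue
        elif isinstance(event, Stop):
            if event.reason == "end_turn":
                yield from buffer          # final turn: release the held text
            buffer.clear()
        yield event                        # Reasoning is never a Text, so it lands here


run = [
    StartTurn(), Reasoning("I should call a tool."), Text("Let me check."), Stop("tool_use"),
    StartTurn(), Reasoning("Now I can answer."), Text("The answer is 4."), Stop("end_turn"),
]
seen = list(final_turn_only(run))
```

In `seen`, the intermediate reasoning is present while the intermediate text "Let me check." is not, which is the behavior the suggested docstring note would spell out.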

             **kwargs: Additional parameters to pass to the event loop.[Deprecating]

         Yields:
@@ -811,11 +825,21 @@ async def stream_async(
             Exception: Any exceptions from the agent invocation will be propagated to the caller.

         Example:
+            Stream all events (default behavior):
+
             ```python
             async for event in agent.stream_async("Analyze this data"):
                 if "data" in event:
                     yield event["data"]
             ```
+
+            Stream only the final answer (skip intermediate tool-use turns):
+
+            ```python
+            async for event in agent.stream_async("Analyze this data", stream_final_turn_only=True):
+                if "data" in event:
+                    yield event["data"]  # Only receives final turn text
+            ```
         """
         # Conditionally acquire lock based on concurrent_invocation_mode
         # Using threading.Lock instead of asyncio.Lock because run_async() creates
@@ -855,9 +879,25 @@ async def stream_async(
             try:
                 events = self._run_loop(messages, merged_state, structured_output_model, structured_output_prompt)

+                text_event_buffer: list[dict[str, Any]] = []
+
                 async for event in events:
                     event.prepare(invocation_state=merged_state)

+                    if stream_final_turn_only:
+                        if isinstance(event, StartEventLoopEvent):
+                            text_event_buffer.clear()
+                        elif isinstance(event, TextStreamEvent):
+                            text_event_buffer.append(event.as_dict())
+                            continue
+                        elif isinstance(event, EventLoopStopEvent):
+                            stop_reason = event["stop"][0]
+                            if stop_reason == "end_turn":
Contributor

Issue: When stream_final_turn_only=True and the final turn ends with a non-end_turn stop reason (e.g., max_tokens, cancelled, content_filtered), all buffered text from that turn is silently discarded. In production, this means if a model hits its token limit on the final turn, the user receives zero text output with no indication of what happened.

Suggestion: Consider flushing buffered text for any stop reason that is not tool_use (since tool_use is the only reason that indicates "this isn't the final turn"). For example:

elif isinstance(event, EventLoopStopEvent):
    stop_reason = event["stop"][0]
    if stop_reason != "tool_use":
        for buffered in text_event_buffer:
            callback_handler(**buffered)
            yield buffered
    text_event_buffer.clear()

This way, if the agent is cancelled or hits max_tokens on the final turn, the partial text is still delivered to the caller. If you decide to keep the current behavior, please document explicitly in the docstring that text is only delivered for end_turn stop reasons (not just "final turn").
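
To compare the two flush policies side by side, here is a hedged, self-contained sketch. It models events as plain dicts and assumes only what this thread states: text events carry a "data" key and the stop reason is the first element of event["stop"]. `filter_text` and `truncated_run` are hypothetical names, not SDK functions.

```python
from typing import Any, Callable, Iterable, Iterator

Event = dict[str, Any]


def filter_text(events: Iterable[Event], should_flush: Callable[[str], bool]) -> Iterator[Event]:
    """Buffer text per turn and release it only if should_flush(stop_reason) is true."""
    buffer: list[Event] = []
    for event in events:
        if "start_event_loop" in event:
            buffer.clear()
        elif "data" in event:
            buffer.append(event)
            continue
        elif "stop" in event:
            if should_flush(event["stop"][0]):
                yield from buffer
            buffer.clear()
        yield event


def truncated_run() -> list[Event]:
    """One tool-use turn, then a final turn cut off by the token limit."""
    return [
        {"start_event_loop": True},
        {"data": "Let me look that up."},
        {"stop": ("tool_use",)},
        {"start_event_loop": True},
        {"data": "Partial ans"},
        {"stop": ("max_tokens",)},
    ]


def text_of(events: Iterable[Event]) -> list[str]:
    return [e["data"] for e in events if "data" in e]


# Policy in this diff: flush only when the turn ends with "end_turn".
current = text_of(filter_text(truncated_run(), lambda reason: reason == "end_turn"))

# Suggested policy: flush for every stop reason except "tool_use".
suggested = text_of(filter_text(truncated_run(), lambda reason: reason != "tool_use"))
```

Under these assumptions `current` is empty, while `suggested` still contains the partial final-turn text; both policies drop the text from the tool-use turn.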

+                                for buffered in text_event_buffer:
+                                    callback_handler(**buffered)
+                                    yield buffered
+                            text_event_buffer.clear()
+
                     if event.is_callback_event:
                         as_dict = event.as_dict()
                         callback_handler(**as_dict)