fix(apps): resolve streaming error handling for 2-minute timeout & content stream not allowed#453
fix(apps): resolve streaming error handling for 2-minute timeout & content stream not allowed#453Athosone wants to merge 10 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Introduces a reusable async rate limiter and integrates it into HttpStream to pace outbound activity sends, with additional tests covering pacing and coalescing behavior.
Changes:
- Added
make_limiter()utility for async rate limiting. - Updated
HttpStreamto enforce a per-stream minimum send interval and optionally coalesce informative updates. - Added/updated tests to validate pacing, ordering, and coalescing semantics.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/apps/src/microsoft_teams/apps/utils/limiter.py | New async limiter utility based on monotonic time and asyncio.sleep(). |
| packages/apps/src/microsoft_teams/apps/utils/init.py | Exposes make_limiter via package exports. |
| packages/apps/src/microsoft_teams/apps/http_stream.py | Adds pacing/coalescing knobs and enforces pacing via limiter in _send_activity. |
| packages/apps/tests/test_limiter.py | New unit test validating limiter spacing semantics. |
| packages/apps/tests/test_http_stream.py | Updates existing tests to use shorter intervals; adds new pacing/coalescing tests. |
| from typing import Awaitable, Callable | ||
|
|
||
|
|
||
| def make_limiter(rate: int, period: float = 1.0) -> Callable[[], Awaitable[None]]: |
There was a problem hiding this comment.
Already handled: make_limiter raises ValueError for rate < 1 and period < 0, so rate == 0 (ZeroDivisionError) and negative periods fail fast at construction. period == 0 is allowed on purpose and means "no pacing" (interval 0), which is how callers disable the limiter.
HttpStream could send streaming activities faster than the Teams 1 request/second per-stream limit, which gets the stream throttled and returned as 403 ContentStreamNotAllowed (surfaced to the user as a red error toast). Two causes: _flush() sent every queued informative update back to back plus the text chunk, and the only pacing was a 0.5s reschedule that was dropped once the queue drained, so the next emit() fired immediately. Add a per-stream leaky-slot limiter (make_limiter) and gate every chunk send through it in _send_activity. Because the limiter state is shared across flushes, it paces both the in-flush burst and the next post-drain emit. min_send_interval is exposed (default 1.0s) so callers can buffer toward the docs' 1.5-2s advice. A burst of informative updates in one flush coalesces to the latest by default (coalesce_informative_updates, matching the docs' one-reused-message guidance); set it False to pace out every update instead.
813b322 to
402286b
Compare
|
@microsoft-github-policy-service agree company="Michelin" |
Acquire the per-stream limiter in _send instead of _send_activity so every HTTP attempt is paced, including retries and close()'s final send. This keeps a retry or the final message from landing within the interval of the last chunk and tripping the Teams 1 req/s throttle. acquire() only waits when a send would actually be too soon, so close() pays no latency after an idle gap; min_send_interval=0 disables pacing. Tests that don't assert pacing construct with min_send_interval=0 to stay fast. Addresses Copilot review feedback on microsoft#453.
|
Thanks a lot for taking the time to review @lilyydu I will come back to you soon. Btw I noticed two other issues while playing with it:
However I wonder if the httpstream should carry that responsability alone, are these constraints only tight to it? |
…ault, simpler limiter - Plumb stream_min_send_interval and stream_coalesce_informative_updates through AppOptions -> ActivitySender -> HttpStream - Flip coalesce_informative_updates default to False so informative updates are paced, not silently dropped; coalescing is now opt-in - Simplify make_limiter(rate, period) to make_limiter(interval) and reword its docstring as a fixed-interval gate (token bucket of size 1)
|
Thanks for the push here! Some thoughts:
Can we still use the same retry path and the same send pacing/limiter for informative updates, text chunks, retries, and final close. |
Yeah these are outlined in the official streaming docs. Admittedly, the streaming system here could help by just closing the stream and just doing regular updates. I reached out to the folks that implemented streaming to ask what their preferred recommendation would be for these cases. |
|
Hi ! If this is a platform limit that you cannot increase then yes we could fallback updating the message (this is what I am doing on my app at the moment but the experience is not as nice as with a stream :p). For your comment on this PR I totally agree and will update the code, removing the flag in app config and coalesce the informative update and applying the limiter to different code path that emit bytes to the platform. It matches my experience in the sense of, the user doesnt really care for the next narrative update of an agent or the result of a tool call if you already have the answer ready to be sent. |
…stream modes Informative updates are status replacements, not cumulative content, so a burst now always collapses to the latest one (skipped count logged at debug level) instead of being paced out one-by-one or gated behind a flag. The stream now has two explicit modes: informative mode sends the latest pending informative update per flush; once text streaming starts the stream permanently switches to text mode and informative updates are dropped. Text landing in the same flush as informative updates supersedes them. Removes the stream_min_send_interval / stream_coalesce_informative_updates plumbing from AppOptions and ActivitySender; pacing stays fixed at the Teams 1 req/s limit and all sends keep going through the same limiter/retry path.
…eping An informative-only flush after text streaming starts used to fall through and re-send the unchanged cumulative text, burning a paced limiter slot under the flush lock for a no-op request. The chunk send is now guarded on text actually having been added by this flush. Also from the cleanup pass: track the latest informative update with a scalar and counter instead of building a list that only ever yields its last element, drop the unreachable start_length branch, use the typed ChannelData.stream_type field instead of getattr, trim a comment that restated the limiter docs, and deduplicate test boilerplate behind _await_pending_flush/_track_send_times helpers (tests with no timing assertions now run with pacing disabled).
|
Hey @Athosone, We're still waiting on guidance from internal folks, but I'm gonna test out your implementation in the meantime. Thanks for responding to our requests! |
|
Hey @Athosone, Can you share what error message you get back for the 403 you are receiving from the 1 request /sec limitation? There's a couple that fall into that bucket here https://learn.microsoft.com/en-us/microsoftteams/platform/bots/streaming-ux?tabs=csharp#error-codes and want to make sure we cover our bases |
@Athosone soft bump so we can unblock you! |
|
SOrry about that, totally missed the notification, I got different error messages: And For the first one I morph the stream to a message For the second I implemented a chunker so that my adapter abstract the complexity to my main code. |
|
Hey @Athosone , First, thanks for all the investigation and work you've done here! Streaming can definitely be improved so we highly appreciate your feedback and suggestions. A number of issues have been raised so we think it's best to isolate each into its own respective PR In regard to the bursty informative updates, I repro'd the code snippets you attached, and it does result in 429s but these are handled via our retry logic. However, we do agree that we should have some mechanism to control the bursty behaviour. I tried out the fix you're proposing here with 100 informative updates but it coalesced into one single msg - not sure if that's the ideal experience too. Let's continue bouncing ideas in #452 for this The 403s you were receiving are due to the 2 errors you attached above (2 minute limitation and chunk size). We currently return all 403s as "cancelled" which does not properly showcase the error table I linked above. I'm going to update this PR to surface the 2 minute error instead and then open up additional PRs following after to surface the other errors. Apologies for the delay with resolving this but glad we are surfacing and inching closer to a better streaming exp :) |
Removes the per-stream leaky-bucket limiter and associated changes, resetting the affected files to their main baseline. This provides a clean starting point for a new implementation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Updated the description: PR has evolved due to a number of raised issues wrt streaming error handling. This PR solely focuses on handling the 403 for the 2-minute timeout, and surfacing the "Content stream is not allowed" error. This table highlights all the different error codes, we are concerned with the 403s:
TO BE DISCUSSED:
|
| stream_min_send_interval: float = 1.0, | ||
| stream_coalesce_informative_updates: bool = False, |
There was a problem hiding this comment.
Let's stream_coalesce_informative_updates should be true always. let's not expose this.
| def __init__( | ||
| self, | ||
| client: Client, | ||
| stream_min_send_interval: float = 1.0, |
There was a problem hiding this comment.
also don't think we need this.
| if message != "Content stream was cancelled by user.": | ||
| if message == "Content stream finished due to exceeded streaming time.": | ||
| self._timed_out = True | ||
| logger.warning("Teams encountered an error while streaming. Sending as a regular message.") | ||
| raise StreamCancelledError(message) from e |
There was a problem hiding this comment.
This will raise the StreamCancelled error to the handler. We should probably raise a different error here.
Can we just do a switch statement here for various types of canceled errors? Also point to the learn docs that describe these.
| if e.response.status_code == 403: | ||
| error = e.response.json().get("error", {}) | ||
| message = error.get("message", "") | ||
| if message != "Content stream was cancelled by user.": |
There was a problem hiding this comment.
let's also make sure that the backend has tests for these particular strings
There was a problem hiding this comment.
(and maybe a more resilient error could be to just check for phrases like "canceled", "exceeded streaming time" etc.)
PR has evolved due to a number of raised issues wrt streaming error handling. This PR solely focuses on handling the 403 for the 2-minute timeout, and surfacing the "Content stream is not allowed" error.
This table highlights all the different error codes, we are concerned with the 403s:
https://learn.microsoft.com/en-us/microsoftteams/platform/bots/streaming-ux?tabs=csharp#error-codes
TO BE DISCUSSED:
OLD DESCRIPTION:
Fixes #452
Problem
HttpStreamcan send streaming activities faster than the Teams 1 request/secondper-stream limit. Teams then throttles the stream and returns
403 ContentStreamNotAllowed, which_sendconverts toStreamCancelledErrorand the user sees as a red error toast.
Two causes in
http_stream.py:_flush()sent every queued informative update back to back, then the textchunk. A burst of
update()calls became a burst of POSTs in one flush.call_later(0.5, ...)reschedule, armed only when abacklog survived a flush. Once the queue drained,
_timeoutwasNone, so thenext
emit()fired an immediate flush with no pacing.Fix
A per-stream leaky-slot limiter (
make_limiterinutils/limiter.py), gatingevery chunk send in
_send_activity. Itsnext_slotis instance state sharedacross flushes, so it paces both the in-flush burst and the next
emit()afterthe queue drains. The first send is not delayed, so the progress bar still
appears promptly.
min_send_intervalonHttpStream.__init__(default 1.0s) lets callers buffertoward the docs' 1.5 to 2s advice.
coalesce_informative_updatesdefaults toTrue: a burst of informativeupdates in one flush collapses to the latest, matching the docs' "one
informative message, reused for each update". Set it
Falseto pace out everyupdate at 1 req/s instead (see Notes).
Tests
Limiter spacing and input validation; rapid
update()burst coalesces to thelatest by default; with
coalesce_informative_updates=Falseevery update is sentin order, none dropped, paced at the interval; emits are paced across flushes (the
post-drain case); the interval is configurable; the flag never drops text.
Existing
test_http_stream.pybehaviour is unchanged. Fullpackages/appssuitepasses, ruff and pyright clean.
Notes
_send, so every HTTP attempt is paced: retries andclose()'s final send included.acquire()only waits when a send wouldactually be too soon, so close pays no latency unless it lands within the
interval of the last chunk. Pass
min_send_interval=0to disable pacing.coalesce_informative_updates=False,_flushholds its lock across thepaced sleeps, so a very long informative burst could keep the lock long enough
that
close()times out. The default (coalesce on) avoids this.