fix(warm-transfer): capture job context for post-merge caller-room cleanup#1896
fix(warm-transfer): capture job context for post-merge caller-room cleanup#1896toubatbrian wants to merge 2 commits into
Conversation
pause() cleared the entire native AudioSource queue, permanently dropping up to queueSizeMs of generated-but-unplayed audio. On a false interruption (pause then resume) those frames were never replayed, so up to ~1s of agent speech was lost mid-sentence from both the live call and the recording. Keep a rolling window of recently pushed frames, capture the unplayed tail on pause(), and replay it on resume(), while discarding it on a real interruption (clearBuffer()). Also cap the default room output queue to 200ms to match Python. Co-authored-by: Cursor <cursoragent@cursor.com>
…eanup The post-merge ParticipantDisconnected listener runs from a native rtc-node FFI callback whose AsyncLocalStorage context is pinned to FfiClient-singleton creation, not the job context, so getJobContext() read an empty/stale store and threw as an unhandled rejection, leaving the caller room undeleted on hangup. Capture the JobContext eagerly in onEnter() and use jobCtx.deleteRoom() in the late handler (which also forwards the job's API credentials instead of relying on environment variables). Closes #1895. Co-authored-by: Cursor <cursoragent@cursor.com>
🦋 Changeset detectedLatest commit: 908767d The changes in this PR will be included in the next version bump. This PR includes changesets to release 35 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 908767da1e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (interrupted) { | ||
| this.replayFrames = []; |
There was a problem hiding this comment.
Do not carry replay frames into the next utterance
When a false interruption happens after the last TTS frame has already been captured, pause() stores replayFrames and clears the native queue, and forwardAudio still calls audioOutput.flush() when the TTS stream ends (agents/src/voice/generation.ts:907). This non-interrupted finish clears only the rolling window, so the saved tail survives and the next reply's first captureFrame() will replay stale audio from the previous utterance before the new audio instead of resuming it in the original segment.
Useful? React with 👍 / 👎.
| if (interrupted) { | ||
| this.replayFrames = []; | ||
| } |
There was a problem hiding this comment.
🟡 Unplayed audio tail from a previous response can leak into the next response
Saved replay frames are not discarded when a segment finishes without interruption (waitForPlayoutTask at agents/src/voice/room_io/_output.ts:546), so stale audio from one response can replay at the start of the next.
Impact: A listener may briefly hear the tail of the previous agent response at the beginning of the next response, causing a noticeable audio glitch.
Trigger: false interruption during playout after TTS flush
- TTS finishes generating, all frames are pushed via
captureFrame, andflush()startswaitForPlayoutTaskwhich awaits playout. - While audio drains, a false interruption fires:
pause()captures the unplayed tail intoreplayFrames(_output.ts:419-427) and callsclearQueue()(_output.ts:430). - The queue is now empty, so
waitForPlayout()resolves.waitForPlayoutTaskseesinterrupted=false(noclearBufferwas called) and skips thereplayFrames = []branch (_output.ts:546-548). recentFramesis cleared butreplayFramesretains stale frames from the finished segment.- When the next segment's first
captureFrameruns, it replays the stale frames (_output.ts:465-472) before pushing the new audio, injecting old content into the new response.
The conditional clear at _output.ts:546-548 should be unconditional — once a segment finishes (regardless of how), leftover replay frames are no longer valid.
| if (interrupted) { | |
| this.replayFrames = []; | |
| } | |
| this.replayFrames = []; |
Was this helpful? React with 👍 or 👎 to provide feedback.
| // Match Python (_output.py: queue_size_ms=200). The rtc-node AudioSource | ||
| // default is 1000ms; a smaller prebuffer keeps the playout queue close to | ||
| // realtime so interruptions take effect promptly. | ||
| queueSizeMs: 200, | ||
| }; |
There was a problem hiding this comment.
🚩 Behavioral change: default audio queue reduced from 1000ms to 200ms
The PR changes DEFAULT_ROOM_OUTPUT_OPTIONS.queueSizeMs from undefined (which fell through to the rtc-node AudioSource default of 1000ms) to 200 at agents/src/voice/room_io/room_io.ts:143. This is a 5x reduction in the audio prebuffer. The comment says it matches the Python SDK (_output.py: queue_size_ms=200). While this helps interruptions take effect more promptly, it also reduces the tolerance for TTS timing jitter — if TTS frames arrive in bursts with gaps >200ms, the smaller queue is more likely to underrun, potentially causing audible gaps. Existing users who relied on the 1000ms default may notice different behavior.
Was this helpful? React with 👍 or 👎 to provide feedback.
|
Closing in favor of a fresh branch (brian/warm-transfer-fix) rebased on the 1.5.0 js-ification work. Same fix will be re-applied there. |
Summary
WarmTransferTaskregisters aRoomEvent.ParticipantDisconnectedlistener on the caller room afterconnect_to_callermerges the human rep in. That handler calledgetJobContext()to build aRoomServiceClientand delete the caller room when a party hangs up.The handler runs from a native rtc-node FFI callback, whose
AsyncLocalStoragecontext is pinned toFfiClient-singleton creation — not the job's context. SogetJobContext()reads an empty (or stale) store and throws, surfacing as an unhandled promise rejection and leaving the 2-party SIP room undeleted after the agent has left.Runtime evidence
Probes against the real
@livekit/rtc-nodeconfirmed the mechanism:FfiClientcreated inside the job context → ALS store present in the native callback.FfiClientcreated outside the job context (the real worker case: the jobRoomis built beforerunWithJobContextAsync, and a pooled child process creates the singleton before this job) → store absent in the native callback, regardless of where the listener was registered or the event emitted.Post-fix probe in the same failing scenario:
broken_getJobContext_wouldThrow: truevsfixed_usable: trueinside the identical native callback.Fix
JobContexteagerly inonEnter()(this._jobCtx), where the live context is provably available.ParticipantDisconnectedhandler now usesthis._jobCtx.deleteRoom(callerRoomName)instead ofgetJobContext()at emit time.JobContext.deleteRoom()also forwards the job's API key/secret (the oldnew RoomServiceClient(url)dropped credentials and relied on env) and no-ops on fake jobs.mergeCalls()now sources itsRoomServiceClientURL + credentials from the captured context.Closes #1895.
Test plan
pnpm build:agents(incl.tscdeclarations)