fix(warm-transfer): capture job context for post-merge caller-room cleanup#1896

Closed
toubatbrian wants to merge 2 commits into main from fix/warm-transfer-job-context

Conversation

toubatbrian commented Jun 26, 2026

Summary

WarmTransferTask registers a RoomEvent.ParticipantDisconnected listener on the caller room after connect_to_caller merges the human rep in. That handler called getJobContext() to build a RoomServiceClient and delete the caller room when a party hangs up.

The handler runs from a native rtc-node FFI callback, whose AsyncLocalStorage context is pinned to FfiClient-singleton creation — not the job's context. So getJobContext() reads an empty (or stale) store and throws, surfacing as an unhandled promise rejection and leaving the 2-party SIP room undeleted after the agent has left.

Runtime evidence

Probes against the real @livekit/rtc-node confirmed the mechanism:

  • FfiClient created inside the job context → ALS store present in the native callback.
  • FfiClient created outside the job context (the real worker case: the job Room is built before runWithJobContextAsync, and a pooled child process creates the singleton before this job) → store absent in the native callback, regardless of where the listener was registered or the event emitted.

Post-fix probe in the same failing scenario: broken_getJobContext_wouldThrow: true vs fixed_usable: true inside the identical native callback.
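
The mechanism the probes describe can be reproduced without rtc-node. The sketch below is a minimal model, not the rtc-node implementation: FakeNativeSource and jobStore are hypothetical names, and Node's AsyncResource is used as a stand-in for a native callback source whose callbacks are pinned to the async context that was active when the source was constructed.

```typescript
import { AsyncLocalStorage, AsyncResource } from 'node:async_hooks';

// Hypothetical job store; the real agents package keeps its own.
const jobStore = new AsyncLocalStorage<{ jobId: string }>();

// Hypothetical stand-in for a native callback source such as the FfiClient
// singleton: callbacks run in the async context captured at construction.
class FakeNativeSource {
  private readonly resource = new AsyncResource('FakeNativeSource');

  dispatch<T>(callback: () => T): T {
    return this.resource.runInAsyncScope(callback);
  }
}

// Created before any job context exists (the pooled-process case).
const createdOutside = new FakeNativeSource();

const probe = jobStore.run({ jobId: 'job-1' }, () => {
  // Created while the job context is active.
  const createdInside = new FakeNativeSource();
  return {
    direct: jobStore.getStore()?.jobId,
    viaInside: createdInside.dispatch(() => jobStore.getStore()?.jobId),
    viaOutside: createdOutside.dispatch(() => jobStore.getStore()?.jobId),
  };
});

console.log(probe);
```

In this model only viaOutside loses the store, even though every lookup is made from inside jobStore.run. That matches the observation that where the listener was registered or the event emitted does not matter.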

Fix

  • Capture the JobContext eagerly in onEnter() (this._jobCtx), where the live context is provably available.
  • The late ParticipantDisconnected handler now uses this._jobCtx.deleteRoom(callerRoomName) instead of calling getJobContext() at emit time. JobContext.deleteRoom() also forwards the job's API key/secret (the previous call, new RoomServiceClient(url), dropped credentials and relied on environment variables) and no-ops on fake jobs.
  • mergeCalls() now sources its RoomServiceClient URL + credentials from the captured context.
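
The capture-then-reuse pattern in the bullets above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: JobContextLike, getJobContextLike, jobStore, and WarmTransferSketch are hypothetical stand-ins for JobContext, getJobContext(), the job's AsyncLocalStorage, and WarmTransferTask.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical, reduced shape of a job context; only deleteRoom is modelled.
interface JobContextLike {
  deleteRoom(roomName: string): Promise<void>;
}

const jobStore = new AsyncLocalStorage<JobContextLike>();

// Hypothetical equivalent of getJobContext(): throws when no store is active.
function getJobContextLike(): JobContextLike {
  const ctx = jobStore.getStore();
  if (!ctx) throw new Error('no job context found');
  return ctx;
}

class WarmTransferSketch {
  private jobCtx?: JobContextLike;

  // Runs while the job context is live, so the lookup succeeds here.
  onEnter(): void {
    this.jobCtx = getJobContextLike();
  }

  // May run from a callback with no store; uses the captured reference only.
  onParticipantDisconnected(callerRoomName: string): Promise<void> {
    return this.jobCtx ? this.jobCtx.deleteRoom(callerRoomName) : Promise.resolve();
  }
}

const deletedRooms: string[] = [];
const fakeJob: JobContextLike = {
  deleteRoom: async (roomName) => {
    deletedRooms.push(roomName);
  },
};

const task = new WarmTransferSketch();
jobStore.run(fakeJob, () => task.onEnter());

// Outside jobStore.run the late lookup fails, but the captured context works.
let lateLookupThrows = false;
try {
  getJobContextLike();
} catch {
  lateLookupThrows = true;
}
void task.onParticipantDisconnected('caller-room');
```

The handler never touches the store, so it behaves the same regardless of which async context the callback is dispatched from.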

Closes #1895.

Test plan

  • pnpm build:agents (incl. tsc declarations)
  • ESLint + Prettier on changed file
  • Mechanism + post-fix runtime probes (described above)
  • Live SIP end-to-end: run a warm transfer, then hang up the caller (and separately the rep) after the bridge; confirm the caller room is deleted and there is no "no job context found" unhandled rejection in the worker logs.

toubatbrian and others added 2 commits June 25, 2026 15:47
pause() cleared the entire native AudioSource queue, permanently dropping
up to queueSizeMs of generated-but-unplayed audio. On a false interruption
(pause then resume) those frames were never replayed, so up to ~1s of agent
speech was lost mid-sentence from both the live call and the recording.

Keep a rolling window of recently pushed frames, capture the unplayed tail
on pause(), and replay it on resume(), while discarding it on a real
interruption (clearBuffer()). Also cap the default room output queue to
200ms to match Python.

Co-authored-by: Cursor <cursoragent@cursor.com>
…eanup

The post-merge ParticipantDisconnected listener runs from a native rtc-node
FFI callback whose AsyncLocalStorage context is pinned to FfiClient-singleton
creation, not the job context, so getJobContext() read an empty/stale store and
threw as an unhandled rejection, leaving the caller room undeleted on hangup.

Capture the JobContext eagerly in onEnter() and use jobCtx.deleteRoom() in the
late handler (which also forwards the job's API credentials instead of relying
on environment variables). Closes #1895.

Co-authored-by: Cursor <cursoragent@cursor.com>

changeset-bot Bot commented Jun 26, 2026

🦋 Changeset detected

Latest commit: 908767d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 35 packages
Name Type
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-did Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-fishaudio Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-hume Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-liveavatar Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-minimax Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-mistralai Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-perplexity Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugin-soniox Patch
@livekit/agents-plugin-tavus Patch
@livekit/agents-plugins-test Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 908767da1e

Comment on lines +546 to +547
if (interrupted) {
  this.replayFrames = [];

P2: Do not carry replay frames into the next utterance

When a false interruption happens after the last TTS frame has already been captured, pause() stores replayFrames and clears the native queue. forwardAudio still calls audioOutput.flush() when the TTS stream ends (agents/src/voice/generation.ts:907), and this non-interrupted finish clears only the rolling window, so the saved tail survives. The next reply's first captureFrame() then replays stale audio from the previous utterance ahead of the new audio, instead of the tail being resumed within its original segment.

devin-ai-integration Bot left a comment

Devin Review found 2 potential issues.

Comment on lines +546 to +548
if (interrupted) {
  this.replayFrames = [];
}

🟡 Unplayed audio tail from a previous response can leak into the next response

Saved replay frames are not discarded when a segment finishes without interruption (waitForPlayoutTask at agents/src/voice/room_io/_output.ts:546), so stale audio from one response can replay at the start of the next.

Impact: A listener may briefly hear the tail of the previous agent response at the beginning of the next response, causing a noticeable audio glitch.

Trigger: false interruption during playout after TTS flush
  1. TTS finishes generating, all frames are pushed via captureFrame, and flush() starts waitForPlayoutTask which awaits playout.
  2. While audio drains, a false interruption fires: pause() captures the unplayed tail into replayFrames (_output.ts:419-427) and calls clearQueue() (_output.ts:430).
  3. The queue is now empty, so waitForPlayout() resolves. waitForPlayoutTask sees interrupted=false (no clearBuffer was called) and skips the replayFrames = [] branch (_output.ts:546-548).
  4. recentFrames is cleared but replayFrames retains stale frames from the finished segment.
  5. When the next segment's first captureFrame runs, it replays the stale frames (_output.ts:465-472) before pushing the new audio, injecting old content into the new response.

The conditional clear at _output.ts:546-548 should be unconditional — once a segment finishes (regardless of how), leftover replay frames are no longer valid.

Suggested change
- if (interrupted) {
-   this.replayFrames = [];
- }
+ this.replayFrames = [];
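
The trigger sequence and the suggested change can be checked against a small model. The sketch below is a simplified illustration, not the code in _output.ts: ReplayModel, its string "frames", and the clearAlways switch are hypothetical, and the native queue is modelled as a plain array.

```typescript
// Simplified, hypothetical model of the pause/replay bookkeeping.
class ReplayModel {
  private queue: string[] = []; // stands in for the native AudioSource queue
  private replayFrames: string[] = []; // unplayed tail saved by pause()

  constructor(private readonly clearAlways: boolean) {}

  captureFrame(frame: string): void {
    // Any saved tail is replayed before the new frame is pushed.
    if (this.replayFrames.length > 0) {
      this.queue.push(...this.replayFrames);
      this.replayFrames = [];
    }
    this.queue.push(frame);
  }

  pause(): void {
    // Save the unplayed tail, then clear the queue.
    this.replayFrames = [...this.queue];
    this.queue = [];
  }

  finishSegment(interrupted: boolean): void {
    // clearAlways = false models the conditional clear under review;
    // clearAlways = true models the suggested unconditional clear.
    if (interrupted || this.clearAlways) {
      this.replayFrames = [];
    }
  }

  drain(): string[] {
    const out = this.queue;
    this.queue = [];
    return out;
  }
}

function nextSegmentAudio(clearAlways: boolean): string[] {
  const model = new ReplayModel(clearAlways);
  model.captureFrame('a1'); // segment A fully captured
  model.captureFrame('a2');
  model.pause(); // false interruption while A is draining
  model.finishSegment(false); // A finishes without clearBuffer()
  model.captureFrame('b1'); // first frame of segment B
  return model.drain();
}

const withConditionalClear = nextSegmentAudio(false);
const withUnconditionalClear = nextSegmentAudio(true);
console.log({ withConditionalClear, withUnconditionalClear });
```

In this model the conditional clear lets a1 and a2 play ahead of b1, while the unconditional clear yields only b1.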

Comment on lines +140 to 144
// Match Python (_output.py: queue_size_ms=200). The rtc-node AudioSource
// default is 1000ms; a smaller prebuffer keeps the playout queue close to
// realtime so interruptions take effect promptly.
queueSizeMs: 200,
};

🚩 Behavioral change: default audio queue reduced from 1000ms to 200ms

The PR changes DEFAULT_ROOM_OUTPUT_OPTIONS.queueSizeMs from undefined (which fell through to the rtc-node AudioSource default of 1000ms) to 200 at agents/src/voice/room_io/room_io.ts:143. This is a 5x reduction in the audio prebuffer. The comment says it matches the Python SDK (_output.py: queue_size_ms=200). While this helps interruptions take effect more promptly, it also reduces the tolerance for TTS timing jitter — if TTS frames arrive in bursts with gaps >200ms, the smaller queue is more likely to underrun, potentially causing audible gaps. Existing users who relied on the 1000ms default may notice different behavior.
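
The jitter trade-off can be put in rough numbers. The sketch below uses a deliberately simplified model that is an assumption, not rtc-node's actual scheduling: the queue is full when a gap in TTS output begins, and playback drains it in real time. The 350 ms gap is a hypothetical value chosen for illustration.

```typescript
// Simplified model (an assumption, not rtc-node's scheduler): the queue is
// full when a gap in TTS output starts, and playback drains it in real time.
function underrunMs(gapMs: number, queueSizeMs: number): number {
  // Audible silence is whatever part of the gap the buffer cannot cover.
  return Math.max(0, gapMs - queueSizeMs);
}

const gapMs = 350; // hypothetical gap between bursts of TTS frames
const silenceWith1000 = underrunMs(gapMs, 1000);
const silenceWith200 = underrunMs(gapMs, 200);
console.log({ silenceWith1000, silenceWith200 });
```

Under these assumptions the 1000 ms queue absorbs the gap entirely, while the 200 ms queue leaves 150 ms of silence.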

@toubatbrian

Closing in favor of a fresh branch (brian/warm-transfer-fix) rebased on the 1.5.0 js-ification work. Same fix will be re-applied there.

Development

Successfully merging this pull request may close these issues.

Node WarmTransferTask post-merge cleanup calls getJobContext after job context is gone
