5 changes: 5 additions & 0 deletions .changeset/assemblyai-mode-silence.md
@@ -0,0 +1,5 @@
---
'@livekit/agents-plugin-assemblyai': patch
---

Respect AssemblyAI `mode` presets when defaulting turn silence and remap deprecated `u3-pro` to `universal-3-5-pro`.
2 changes: 1 addition & 1 deletion plugins/assemblyai/src/models.ts
@@ -8,7 +8,7 @@ export type STTModels =
| 'u3-rt-pro'
| 'u3-rt-pro-beta-1'
| 'universal-3-5-pro'
- // Deprecated alias — AssemblyAI maps this to `u3-rt-pro` server-side, but the
+ // Deprecated alias — AssemblyAI maps this to `universal-3-5-pro`, but the
// Python plugin emits a warning and rewrites it. Kept here so TS users don't
// break if they already pass it.
| 'u3-pro';
25 changes: 14 additions & 11 deletions plugins/assemblyai/src/stt.ts

🚩 updateOptions does not re-derive minTurnSilence when mode changes post-construction

If a user constructs `new STT()` with no `mode`, the constructor defaults `minTurnSilence` to 100. A later call to `updateOptions({ mode: 'balanced' })` leaves that baked-in `minTurnSilence=100` in place, because `updateOptions` (plugins/assemblyai/src/stt.ts:181-190) only spreads the new options over the old ones and does not re-run the constructor's conditional defaulting logic. On the next WebSocket reconnect, `min_turn_silence=100` is still sent, which can override the server's mode-based silence tuning.

This is a pre-existing design limitation (`updateOptions` has never re-derived dependent defaults), but the new mode-awareness makes the interaction slightly more surprising. In practice, `mode` is documented as connect-time only, so changing it via `updateOptions` is an unusual path.

(Refers to lines 181-190)
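
The interaction described in this comment can be reproduced with a small stand-in. This is a minimal sketch, not the plugin's code: `SketchSTT` and `SketchOptions` are hypothetical names, and the spread-only `updateOptions` is assumed from the comment's description rather than copied from `stt.ts`.

```typescript
// Hypothetical, simplified model of the plugin's option handling.
type SketchMode = 'min_latency' | 'balanced' | 'max_accuracy';

interface SketchOptions {
  mode?: SketchMode;
  minTurnSilence?: number;
}

class SketchSTT {
  opts: SketchOptions;

  constructor(opts: SketchOptions = {}) {
    // Same rule as the new constructor line in this PR:
    // default to 100 only when no mode preset is given.
    const minTurnSilence =
      opts.minTurnSilence ?? (opts.mode === undefined ? 100 : undefined);
    this.opts = { ...opts, minTurnSilence };
  }

  // Assumption: updateOptions only spreads the patch over the stored options
  // and never re-derives defaults that depend on other fields.
  updateOptions(patch: SketchOptions): void {
    this.opts = { ...this.opts, ...patch };
  }
}

// Constructed without a mode, so the latency-optimized default is baked in.
const sketch = new SketchSTT();

// Switching to a preset later does not clear the baked-in value.
sketch.updateOptions({ mode: 'balanced' });
console.log(sketch.opts.mode, sketch.opts.minTurnSilence);

// Passing the preset at construction time leaves the value unset instead.
const presetAtConstruction = new SketchSTT({ mode: 'balanced' });
console.log(presetAtConstruction.opts.minTurnSilence);
```

In this model, a caller who wants the preset's silence tuning after construction would need to build a new instance with `mode` set rather than patch the existing one.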

@@ -20,6 +20,7 @@ import type { RawData } from 'ws';
import { WebSocket } from 'ws';
import type { STTEncoding, STTModels, VoiceFocus } from './models.js';

+ // Speech models in the Universal-3 Pro family, which share the same parameter support.
const U3_PRO_MODELS = ['u3-rt-pro', 'u3-rt-pro-beta-1', 'universal-3-5-pro'] as const;

function isU3ProModel(model: STTModels): boolean {
@@ -72,11 +73,11 @@ export interface STTOptions {
maxTurnSilence?: number;
formatTurns?: boolean;
keytermsPrompt?: string[];
- /** Only supported with the `u3-rt-pro` model family. */
+ /** Only supported with the Universal-3 Pro model family. */
prompt?: string;
- /** Only supported with the `u3-rt-pro` model family. */
+ /** Only supported with the Universal-3 Pro model family. */
agentContext?: string;
- /** Only supported with the `u3-rt-pro` model family. Set at connection time only. */
+ /** Only supported with the Universal-3 Pro model family. Set at connection time only. */
previousContextNTurns?: number;
vadThreshold?: number;
/**
@@ -95,8 +96,8 @@ export interface STTOptions {
/** Background audio suppression aggressiveness, from 0.0 to 1.0. Connect-time only. */
voiceFocusThreshold?: number;
/**
- * Accuracy/latency preset for u3-rt-pro: `min_latency`, `balanced`, or `max_accuracy`.
- * Explicit silence, partials, or VAD options still take precedence over mode defaults.
+ * Accuracy/latency preset for the Universal-3 Pro model family: `min_latency`, `balanced`,
+ * or `max_accuracy`. Explicit turn-silence values still take precedence over mode defaults.
*/
mode?: 'min_latency' | 'balanced' | 'max_accuracy';
baseUrl: string;
@@ -132,8 +133,8 @@ export class STT extends stt.STT {
});

if (opts.speechModel === 'u3-pro') {
- log().warn("'u3-pro' is deprecated, use 'u3-rt-pro' instead.");
- opts.speechModel = 'u3-rt-pro';
+ log().warn("'u3-pro' is deprecated, use 'universal-3-5-pro' instead.");
+ opts.speechModel = 'universal-3-5-pro';
}

const speechModel = opts.speechModel ?? defaultSTTOptions.speechModel;
@@ -161,8 +162,8 @@ export class STT extends stt.STT {
);
}

- // Minimize latency; matches LK's end-of-turn detector well.
- const minTurnSilence = opts.minTurnSilence ?? 100;
+ // Minimize latency by default, but let AssemblyAI's mode preset control silence tuning.
+ const minTurnSilence = opts.minTurnSilence ?? (opts.mode === undefined ? 100 : undefined);

this.#opts = {
...defaultSTTOptions,
@@ -296,11 +297,13 @@ export class SpeechStream extends stt.SpeechStream {
}

async #connectWS(): Promise<WebSocket> {
- // u3-rt-pro family models default both min and max silence to 100ms when unset.
+ // Universal-3 Pro family models default both min and max silence to 100ms when unset.
+ // When a mode preset is selected, leave them unset unless explicitly provided so the
+ // server's per-mode silence tuning is not overridden by the latency-optimized default.
let minSilence = this.#opts.minTurnSilence;
let maxSilence = this.#opts.maxTurnSilence;
if (isU3ProModel(this.#opts.speechModel)) {
- if (minSilence === undefined) minSilence = 100;
+ if (minSilence === undefined && this.#opts.mode === undefined) minSilence = 100;
if (maxSilence === undefined) maxSilence = minSilence;
}
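
The connect-time rule in the hunk above can be isolated as a pure function for illustration. This is a sketch under stated assumptions: `resolveTurnSilence`, `SilenceInput`, and the `isU3Pro` flag are hypothetical names standing in for the `#connectWS` locals and the `isU3ProModel` check; only the two conditional assignments are taken from the diff.

```typescript
// Hypothetical helper mirroring the silence resolution shown in #connectWS.
type ModePreset = 'min_latency' | 'balanced' | 'max_accuracy';

interface SilenceInput {
  isU3Pro: boolean; // stand-in for isU3ProModel(speechModel)
  mode?: ModePreset;
  minTurnSilence?: number;
  maxTurnSilence?: number;
}

interface ResolvedSilence {
  minSilence?: number;
  maxSilence?: number;
}

function resolveTurnSilence(input: SilenceInput): ResolvedSilence {
  let minSilence = input.minTurnSilence;
  let maxSilence = input.maxTurnSilence;
  if (input.isU3Pro) {
    // Apply the 100ms default only when no mode preset was selected.
    if (minSilence === undefined && input.mode === undefined) minSilence = 100;
    // Max silence follows min silence unless it was set explicitly.
    if (maxSilence === undefined) maxSilence = minSilence;
  }
  return { minSilence, maxSilence };
}

// No preset: both values fall back to 100.
const noPreset = resolveTurnSilence({ isU3Pro: true });

// Preset selected: both values stay unset, leaving tuning to the server.
const withPreset = resolveTurnSilence({ isU3Pro: true, mode: 'balanced' });

// Explicit value plus preset: the explicit value wins and max follows it.
const explicitValue = resolveTurnSilence({
  isU3Pro: true,
  mode: 'balanced',
  minTurnSilence: 250,
});

console.log(noPreset, withPreset, explicitValue);
```

Keeping the rule in one function like this would also make the three cases straightforward to unit test, though the PR itself keeps the logic inline.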
