
fix: enable longform chunking for GigaAM via VadChunked #1362

Closed
android6 wants to merge 1 commit into cjpais:main from android6:fix/gigaam-longform-chunking

Conversation

android6 commented May 2, 2026

Summary

Wires up the existing VadChunked utility from transcribe-rs for GigaAM-v3 to fix gibberish output and silent failures on recordings longer than 60 seconds. No changes to transcribe-rs are needed; this uses only its public API. The UI is unchanged. Tested on a 4:25 Russian recording that previously failed silently every time; it now produces a coherent transcript.

How it works

Before:

Handy → gigaam_engine.transcribe(audio)
                       ↓
                       single ONNX inference over the full audio buffer
                       ↓
                       gibberish (>60s) or silent fail (>4min)

After (this PR):

Handy → chunker.transcribe(gigaam_engine, audio)
                ↓
                split audio at silence boundaries (Silero VAD)
                ↓
                for each chunk (≤30s):
                    gigaam_engine.transcribe(chunk)
                ↓
                join results with " "
                ↓
                coherent transcript
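
The "After" flow can be sketched in a few lines of Rust. This is an illustration only, not the transcribe-rs implementation: the `Transcriber` trait, `LenEcho`, `split_at_silence`, and `transcribe_chunked` are hypothetical names, and a plain amplitude threshold stands in for Silero VAD, which is a neural voice-activity detector.

```rust
/// Hypothetical stand-in for a speech model; not the transcribe-rs API.
trait Transcriber {
    fn transcribe(&mut self, samples: &[f32]) -> String;
}

/// Test double that "transcribes" a chunk as its length in samples.
struct LenEcho;

impl Transcriber for LenEcho {
    fn transcribe(&mut self, samples: &[f32]) -> String {
        samples.len().to_string()
    }
}

/// Split `audio` into chunks of at most `max_len` samples, cutting just
/// after the last quiet sample in each window when one exists.
/// Assumption: "quiet" means |sample| < `silence_threshold`; the PR uses
/// Silero VAD for this decision instead.
fn split_at_silence(audio: &[f32], max_len: usize, silence_threshold: f32) -> Vec<&[f32]> {
    assert!(max_len > 0, "max_len must be positive");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < audio.len() {
        let hard_end = (start + max_len).min(audio.len());
        let mut end = hard_end;
        // Only look for a silence cut when audio remains beyond this window.
        if hard_end < audio.len() {
            if let Some(pos) = audio[start..hard_end]
                .iter()
                .rposition(|s| s.abs() < silence_threshold)
            {
                end = start + pos + 1;
            }
        }
        chunks.push(&audio[start..end]);
        start = end;
    }
    chunks
}

/// Transcribe each chunk separately and join the non-empty results with " ".
fn transcribe_chunked<T: Transcriber>(
    model: &mut T,
    audio: &[f32],
    max_len: usize,
    silence_threshold: f32,
) -> String {
    split_at_silence(audio, max_len, silence_threshold)
        .into_iter()
        .map(|chunk| model.transcribe(chunk))
        .filter(|text| !text.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
}
```

With a six-sample buffer `[0.5, 0.5, 0.0, 0.5, 0.5, 0.5]`, `max_len = 4`, and a threshold of `0.1`, the splitter cuts after the silent third sample and yields two chunks of three samples each. In real use `max_len` would be expressed in samples: assuming 16 kHz audio, a 30 s cap is 480,000 samples.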

GigaAM-v3 was trained on segments of at most 25-30 s, which likely explains why output degrades on longer inputs. The new logic lives entirely between Handy and the model and uses the existing VadChunked from transcribe-rs. VadChunked has been available in transcribe-rs since 0.3.4; the current default, 0.3.8, already includes it. Whisper has its own chunking in C++, so only GigaAM is affected.
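
As a rough sanity check on the numbers above, the 4:25 test recording is 265 s, so a 30 s cap implies at least 9 model calls. The helper below is a hypothetical illustration of that ceiling division; real chunk counts will usually be higher, because cuts land on silence boundaries rather than exactly at the cap.

```rust
/// Lower bound on the number of chunks needed to cover a recording,
/// computed as ceiling division. Hypothetical helper, not part of the PR.
fn min_chunks(duration_secs: u32, max_chunk_secs: u32) -> u32 {
    assert!(max_chunk_secs > 0, "max_chunk_secs must be positive");
    (duration_secs + max_chunk_secs - 1) / max_chunk_secs
}
```

For the 265 s recording, `min_chunks(265, 30)` gives 9 and `min_chunks(265, 25)` gives 11.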

Changes

  • src-tauri/Cargo.toml: enable the vad-silero feature on transcribe-rs (the Silero VAD model is already shipped with Handy).
  • src-tauri/src/managers/transcription.rs:
    • LoadedEngine::GigaAM now holds (GigaAMModel, VadChunked).
    • On GigaAM load, builds the pipeline SileroVad → SmoothedVad → VadChunked with parameters tuned for the longform CTC use-case.
    • On transcribe, calls chunker.transcribe(model, &audio) instead of model.transcribe(...) directly.
  • Whisper, Parakeet, Moonshine, SenseVoice, Canary, Cohere — all unaffected.
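
The shape of the transcription.rs change can be sketched as below. Everything here is a stand-in so the example compiles on its own: `FakeWhisper`, `FakeGigaAm`, and `FakeChunker` are hypothetical, the two-field tuple variant is one assumed reading of "holds (GigaAMModel, VadChunked)", and fixed-size splitting replaces the VAD-driven chunking of the real VadChunked.

```rust
// Hypothetical stand-ins. In the PR the real types are GigaAMModel and
// VadChunked from transcribe-rs; their actual signatures are not shown here.
struct FakeWhisper;
struct FakeGigaAm;
struct FakeChunker {
    max_samples: usize, // assumed knob; the real chunker is VAD-driven
}

impl FakeWhisper {
    fn transcribe(&mut self, audio: &[f32]) -> String {
        format!("w{}", audio.len())
    }
}

impl FakeGigaAm {
    fn transcribe(&mut self, audio: &[f32]) -> String {
        format!("g{}", audio.len())
    }
}

impl FakeChunker {
    /// Run the model once per chunk and join the results with " ".
    fn transcribe(&self, model: &mut FakeGigaAm, audio: &[f32]) -> String {
        audio
            .chunks(self.max_samples) // panics if max_samples == 0
            .map(|chunk| model.transcribe(chunk))
            .collect::<Vec<_>>()
            .join(" ")
    }
}

enum LoadedEngine {
    // Unchanged engines keep calling the model directly.
    Whisper(FakeWhisper),
    // The GigaAM variant carries its chunker alongside the model.
    GigaAM(FakeGigaAm, FakeChunker),
}

impl LoadedEngine {
    fn transcribe(&mut self, audio: &[f32]) -> String {
        match self {
            LoadedEngine::Whisper(model) => model.transcribe(audio),
            // Route GigaAM through the chunker instead of the model.
            LoadedEngine::GigaAM(model, chunker) => chunker.transcribe(model, audio),
        }
    }
}
```

Keeping the chunker inside the enum variant means it is built once at model load and reused on every call, and the other engines' match arms stay untouched.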

cjpais (Owner) commented May 3, 2026

#1173

cjpais closed this May 3, 2026

2 participants