feat: add custom transcription prompt setting for Whisper models #1227
Open
egsok wants to merge 1 commit into cjpais:main from
Token-aware budget (112 tokens, half of Whisper's 224-token initial_prompt window) with per-script estimator (CJK ~2.2, Cyrillic ~0.5, Latin ~0.25).
10-language presets with native punctuation conventions, shortened to ≤69% of budget to leave room for Custom Words.
Progress bar with color coding (gray → yellow at 80% → red at 95%) and shared-budget hint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
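The budget rules in this commit message can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the code in TranscriptionPrompt.tsx: the per-script rates, the 112-token cap, the 80%/95% color thresholds, and the 69% preset limit come from the PR, while the Unicode ranges used to classify characters, charging every other character at the Latin rate, rounding up, and all names (estimateTokens, budgetColor, presetFits, scriptOf) are hypothetical.

```typescript
// Minimal sketch of a per-script token estimator (hypothetical names).
// Rates from the PR: CJK ~2.2, Cyrillic ~0.5, Latin ~0.25 tokens per character.
// Stored in hundredths of a token so the running total stays an integer.
const RATE_HUNDREDTHS = { cjk: 220, cyrillic: 50, other: 25 } as const;

// Whisper's initial_prompt window is 224 tokens; the prompt is capped at half.
const PROMPT_BUDGET = 112;

type ScriptClass = keyof typeof RATE_HUNDREDTHS;

// Assumption: only these BMP blocks count as CJK / Cyrillic; everything else
// (Latin, digits, punctuation, whitespace) is charged at the Latin rate.
function scriptOf(codePoint: number): ScriptClass {
  const inRange = (lo: number, hi: number): boolean => codePoint >= lo && codePoint <= hi;
  if (
    inRange(0x3040, 0x30ff) || // Hiragana and Katakana
    inRange(0x3400, 0x4dbf) || // CJK Unified Ideographs Extension A
    inRange(0x4e00, 0x9fff) || // CJK Unified Ideographs
    inRange(0xac00, 0xd7af) // Hangul Syllables
  ) {
    return "cjk";
  }
  if (inRange(0x0400, 0x04ff)) return "cyrillic"; // Cyrillic block
  return "other";
}

// Estimated token count, rounded up so the estimate errs on the high side.
function estimateTokens(text: string): number {
  let hundredths = 0;
  for (const ch of text) {
    // for...of iterates the string by code point
    hundredths += RATE_HUNDREDTHS[scriptOf(ch.codePointAt(0) ?? 0)];
  }
  return Math.ceil(hundredths / 100);
}

type BudgetColor = "gray" | "yellow" | "red";

// Progress-bar color: yellow from 80% of the budget, red from 95%.
function budgetColor(tokens: number, budget: number = PROMPT_BUDGET): BudgetColor {
  const ratio = tokens / budget;
  if (ratio >= 0.95) return "red";
  if (ratio >= 0.8) return "yellow";
  return "gray";
}

// Presets must stay at or under 69% of the budget to leave room for Custom Words.
function presetFits(text: string): boolean {
  return estimateTokens(text) <= Math.floor(PROMPT_BUDGET * 0.69);
}
```

Keeping the rates in integer hundredths avoids floating-point drift (2.2 is not exactly representable in binary), which could otherwise push Math.ceil up by one token on an exact boundary.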
Branch updated from b3e8c4f to d32fbfa.
Owner
I think this is probably a good idea. I'm just not 100% sure about the UI for this yet.
Problem
Whisper often drops punctuation entirely, producing a wall of unformatted text, especially for non-English languages. This is a well-known issue in the community.
The documented solution is to use initial_prompt to steer style: pass a well-punctuated paragraph as initial_prompt. Whisper doesn't follow instructions; it copies the style of the prompt. A paragraph full of commas, question marks, and em-dashes nudges the decoder to keep producing punctuation.
What I tested
I tested this in my fork with a hardcoded Russian punctuation prompt, and it works remarkably well. Russian transcriptions went from being completely unpunctuated maybe 5% of the time to basically always properly formatted output with commas, periods, question marks, em-dashes, and quotation marks, consistently across different recording lengths.
Why not just hardcode a default prompt
The prompt also acts as a language hint. A hardcoded prompt in one language biases Whisper's language detector: for example, an English prompt causes Russian speech to be transcribed as English when the language is set to "Auto". Pre-filling a default prompt would therefore break the experience for anyone using auto-detection with a non-English language.
Worth noting: a prompt in one language doesn't prevent Whisper from recognizing words in another. For example, a Russian prompt with the language set to "Auto" works fine for mixed Russian/English speech; English words are still transcribed correctly.
In the future, we could consider auto-populating the prompt based on the selected transcription language, but for now, empty by default is the safe choice.
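The future idea of auto-populating the prompt from the selected transcription language could look roughly like the sketch below. It is hypothetical and not part of this PR: the "auto" language code, the PRESETS table and its sample sentences, and the defaultPrompt name are all illustrative assumptions. The one rule it encodes comes from the reasoning above: with language detection on "Auto", any pre-filled prompt would bias detection, so the default stays empty.

```typescript
// Hypothetical per-language presets; the sample sentences are placeholders,
// not the PR's actual preset texts.
const PRESETS = new Map<string, string>([
  ["en", "Well, let's see. Is it ready? Yes, it is: the build passed."],
  ["ru", "Ну что ж, посмотрим. Готово? Да, готово: сборка прошла."],
]);

// Hypothetical default-prompt policy for the "auto-populate" idea.
// "auto" means Whisper detects the language itself, so a prompt in any one
// language would bias that detection; return an empty prompt instead.
function defaultPrompt(language: string): string {
  if (language === "auto") return "";
  return PRESETS.get(language) ?? ""; // unknown languages also get no prompt
}
```

A Map is used instead of a plain object so that lookups like "toString" cannot accidentally hit inherited Object.prototype members.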
Solution
A new Transcription Prompt setting under Settings → Advanced lets users provide sample text to guide Whisper's output style.
Key design decisions:
- Whisper's initial_prompt window is 224 tokens. Custom Words are prepended and share that budget, so the prompt is capped at 112 estimated tokens (half the window).
- A per-script token estimator (CJK ~2.2 tok/char, Cyrillic ~0.5, Latin ~0.25) enforces the limit in real time.
- A progress bar with color coding (gray → yellow at 80% → red at 95%) replaces the old character counter.
- A hint below the bar explains the shared budget: "a shorter prompt leaves more room for custom words."
Changes
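The transcription.rs change below places Custom Words first and the custom prompt last because Whisper keeps the tail of an over-long prompt and drops the beginning. A minimal TypeScript sketch of that behavior, assuming Custom Words are joined with ", " and separated from the prompt by a single space; buildInitialPrompt and keepTail are hypothetical names, and splitting on whitespace only stands in for real Whisper tokens.

```typescript
// Hypothetical sketch: Custom Words first, user prompt last, in one string.
// The separators (", " between words, " " before the prompt) are assumptions.
function buildInitialPrompt(customWords: string[], userPrompt: string | null): string {
  const parts: string[] = [];
  if (customWords.length > 0) parts.push(customWords.join(", "));
  const prompt = userPrompt?.trim();
  if (prompt) parts.push(prompt); // skip null, empty, or whitespace-only prompts
  return parts.join(" ");
}

// Left truncation: keep only the last `maxTokens` items, dropping the oldest.
function keepTail<T>(tokens: T[], maxTokens: number): T[] {
  if (maxTokens <= 0) return [];
  return tokens.length <= maxTokens ? tokens : tokens.slice(tokens.length - maxTokens);
}

// Usage: with a 4-"token" limit, the Custom Words fall off before the prompt does.
const initial = buildInitialPrompt(["Kubernetes", "PostgreSQL"], "Hi, how are you?");
const kept = keepTail(initial.split(/\s+/), 4);
console.log(kept.join(" ")); // prints "Hi, how are you?"
```

Because truncation eats from the front, putting the lower-priority dictionary words first means an over-budget combination loses vocabulary hints before it loses the punctuation-style sample.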
Backend (Rust):
- settings.rs: new transcription_prompt: Option<String> field
- transcription.rs: concatenate the prompt after custom words into initial_prompt. The custom prompt is placed last because Whisper truncates from the left, so dictionary words (lower priority) get truncated first.
- shortcut/mod.rs: new update_transcription_prompt command
- lib.rs: register the command
Frontend (TypeScript/React):
- TranscriptionPrompt.tsx: component with preset dropdown, textarea, token-aware progress bar, and contextual warnings
- AdvancedSettings.tsx: include the new component
- settingsStore.ts: wire up the setting to the backend command
- bindings.ts: add transcription_prompt to AppSettings and the updateTranscriptionPrompt command
- en/translation.json: all user-facing strings
Test plan