feat: add custom transcription prompt setting for Whisper models by egsok · Pull Request #1227 · cjpais/Handy

egsok · 2026-04-05T09:49:16Z

Problem

Whisper often drops punctuation entirely, producing a wall of unformatted text — especially for non-English languages. This is a well-known issue in the community:

openai/whisper#557 openai/whisper#194 — punctuation loss discussions
OpenAI Whisper Prompting Guide — official guidance on using initial_prompt to steer style

The documented solution: pass a well-punctuated paragraph as initial_prompt. Whisper doesn't follow instructions — it copies the style of the prompt. A paragraph full of commas, question marks, and em-dashes nudges the decoder to keep producing punctuation.

What I tested

I tested this in my fork with a hardcoded Russian punctuation prompt, and it works remarkably well. Previously, maybe 5% of Russian transcriptions came out completely unpunctuated; with the prompt, output is basically always properly formatted, with commas, periods, question marks, em-dashes, and quotation marks, consistently across different recording lengths.

Why not just hardcode a default prompt

The prompt also acts as a language hint. A hardcoded prompt in one language biases Whisper's language detector — e.g., an English prompt causes Russian speech to be transcribed as English when the language is set to "Auto". So pre-filling a default prompt would break the experience for anyone using auto-detection with a non-English language.

Worth noting: a prompt in one language doesn't prevent Whisper from recognizing words in another. For example, a Russian prompt with language set to "Auto" works fine for mixed Russian/English speech — English words are still transcribed correctly.

In the future, we could consider auto-populating the prompt based on the selected transcription language — but for now, empty by default is the safe choice.

Solution

A new Transcription Prompt setting under Settings → Advanced that lets users provide a sample text to guide Whisper's output style.

Key design decisions:

Empty by default — no language bias introduced for users who don't need it. When Transcription Language is set to "Auto", a non-empty prompt in a specific language can reduce language detection accuracy, so opting in is intentional.
10-language preset dropdown (EN, ES, FR, DE, PT, IT, RU, JA, ZH-CN, ZH-TW): each preset uses native punctuation conventions (Russian «ёлочки», German „Gänsefüßchen“, French « guillemets », Japanese 「括弧」, Chinese “引号”). Users can also write their own prompt.
Token-aware budget — Whisper's initial_prompt window is 224 tokens. Custom Words are prepended and share that budget, so the prompt is capped at 112 estimated tokens (half). A per-script token estimator (CJK ~2.2 tok/char, Cyrillic ~0.5, Latin ~0.25) enforces the limit in real time. A progress bar with color coding (gray → yellow at 80% → red at 95%) replaces the old character counter. A hint below the bar explains the shared budget: "a shorter prompt leaves more room for custom words."
Whisper-only — this setting only affects Whisper models. When a non-Whisper model is selected (Parakeet, GigaAM, Moonshine, Canary, Cohere, SenseVoice), a warning is shown. The setting remains visible and persisted so users don't lose their prompt when switching models.
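A rough sketch of how the token estimator and progress bar thresholds described above could fit together. The per-script ratios, the 112-token cap, and the 80%/95% color thresholds come from this description. The function names (estimateTokens, fitsBudget, barColor), the code-point ranges used to classify scripts, counting every other character at the Latin rate, and rounding up are assumptions for illustration, not necessarily what TranscriptionPrompt.tsx does.

```typescript
// Illustrative sketch only; names and script ranges are assumptions.
const WHISPER_PROMPT_WINDOW = 224; // tokens available to initial_prompt
const PROMPT_BUDGET_TOKENS = WHISPER_PROMPT_WINDOW / 2; // 112; the rest is left for Custom Words

// Estimated cost per character, in hundredths of a token (integers avoid float drift).
const CJK_COST = 220; // ~2.2 tok/char
const CYRILLIC_COST = 50; // ~0.5 tok/char
const DEFAULT_COST = 25; // ~0.25 tok/char (Latin; assumed for everything else too)

function isCjk(code: number): boolean {
  return (
    (code >= 0x3040 && code <= 0x30ff) || // Hiragana and Katakana
    (code >= 0x3400 && code <= 0x4dbf) || // CJK Extension A
    (code >= 0x4e00 && code <= 0x9fff) // CJK Unified Ideographs
  );
}

function isCyrillic(code: number): boolean {
  return code >= 0x0400 && code <= 0x052f; // Cyrillic and Cyrillic Supplement
}

// Walks UTF-16 code units, so characters outside the BMP count as two default-rate units.
function estimateTokens(text: string): number {
  let hundredths = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (isCjk(code)) hundredths += CJK_COST;
    else if (isCyrillic(code)) hundredths += CYRILLIC_COST;
    else hundredths += DEFAULT_COST;
  }
  return Math.ceil(hundredths / 100); // round up to stay on the safe side
}

function fitsBudget(text: string): boolean {
  return estimateTokens(text) <= PROMPT_BUDGET_TOKENS;
}

type BarColor = "gray" | "yellow" | "red";

// gray below 80% of the budget, yellow from 80%, red from 95%.
function barColor(tokens: number): BarColor {
  if (tokens * 100 >= 95 * PROMPT_BUDGET_TOKENS) return "red";
  if (tokens * 100 >= 80 * PROMPT_BUDGET_TOKENS) return "yellow";
  return "gray";
}
```

Under these assumptions, estimateTokens("привет") is 3 and estimateTokens("你好") rounds up to 5, so a CJK prompt reaches the 112-token cap after roughly 50 characters.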

Changes

Backend (Rust):

settings.rs — new transcription_prompt: Option<String> field
transcription.rs — concatenate prompt after custom words into initial_prompt. Custom prompt is placed last — Whisper truncates from the left, so dictionary words (lower priority) get truncated first
shortcut/mod.rs — new update_transcription_prompt command
lib.rs — register the command
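The ordering argument above can be illustrated with a toy sketch. It is written in TypeScript purely for illustration (the actual change is in the Rust file transcription.rs). The helper names buildInitialPrompt and truncateFromLeft, the ", " separator, and the use of whitespace-separated units in place of real Whisper tokens are all assumptions.

```typescript
// Hypothetical helper: custom words first, user prompt last.
function buildInitialPrompt(
  customWords: string[],
  transcriptionPrompt: string | null
): string | null {
  const words = customWords.map((w) => w.trim()).filter((w) => w.length > 0);
  const prompt = (transcriptionPrompt ?? "").trim();
  const parts: string[] = [];
  if (words.length > 0) parts.push(words.join(", ")); // separator is an assumption
  if (prompt.length > 0) parts.push(prompt);
  return parts.length > 0 ? parts.join(" ") : null;
}

// Toy stand-in for left truncation: keeps only the last `maxUnits`
// whitespace-separated units. Real truncation operates on tokens, not words.
function truncateFromLeft(text: string, maxUnits: number): string {
  const units = text.split(/\s+/).filter((u) => u.length > 0);
  return units.slice(Math.max(0, units.length - maxUnits)).join(" ");
}
```

For example, buildInitialPrompt(["Handy", "Parakeet"], "Hello, world! How are you?") yields "Handy, Parakeet Hello, world! How are you?", and keeping only the last 5 units of that string drops the two dictionary words while the style prompt survives intact.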

Frontend (TypeScript/React):

New TranscriptionPrompt.tsx component with preset dropdown, textarea, token-aware progress bar, and contextual warnings
AdvancedSettings.tsx — include the new component
settingsStore.ts — wire up the setting to the backend command
bindings.ts — add transcription_prompt to AppSettings and updateTranscriptionPrompt command
en/translation.json — all user-facing strings

Test plan

Token-aware budget (112 tokens, half of Whisper's 224-token initial_prompt window) with per-script estimator (CJK ~2.2, Cyrillic ~0.5, Latin ~0.25).
10-language presets with native punctuation conventions, shortened to ≤69% of budget to leave room for Custom Words.
Progress bar with color coding (gray → yellow at 80% → red at 95%) and shared-budget hint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cjpais · 2026-04-07T00:59:08Z

I think this is probably a good idea. I'm just not 100% sure about the UI for this yet.

egsok force-pushed the pr/transcription-prompt branch from b3e8c4f to d32fbfa on April 5, 2026, 11:05

egsok wants to merge 1 commit into cjpais:main from egsok:pr/transcription-prompt
