AI-native macOS menu-bar dictation for developer text.
Muninn records speech, transcribes it, runs the transcript through a configurable text pipeline, and injects the final text into the active app. The default pipeline is designed for code-adjacent dictation: commands, flags, package names, file paths, environment variables, acronyms, and other tokens that general-purpose dictation often changes.
- What Muninn does
- Quick start
- Install and run
- Configure Muninn
- Transcription providers
- Pipeline model
- Streaming transcription
- Contextual profiles and voices
- External control
- Privacy, replay, and debugging
- Development
- Current limits
- Source map
Default recorded-mode flow:
hotkey or tray click
-> record temporary WAV
-> resolve transcription provider route
-> transcribe with the first usable provider
-> run the refine step
-> run optional external filters
-> inject final text into the active app
Muninn includes:
- a macOS menu-bar app with a live tray indicator
- global hotkeys for push-to-talk, done-mode toggle, and cancel
- microphone capture to a temporary WAV, defaulting to 16 kHz mono
- a local-first transcription route across Apple Speech, whisper.cpp, Deepgram, OpenAI, and Google recorded transcription
- an optional streaming mode for providers that support live transcription in this codebase
- a built-in refine step that applies a conservative developer-dictation prompt
- external Unix filter support for custom pipeline steps
- keyboard-event text injection into the current app
- optional external control through muninn:// URLs and a localhost MCP server
- optional replay artifacts for debugging utterances
Default controls:
| Action | Default |
|---|---|
| Push-to-talk | ctrl with double_tap trigger and a 300 ms double-tap window |
| Done-mode toggle | ctrl + shift + d |
| Cancel active capture | ctrl + shift + x |
| Tray left click | Toggle: start when idle, stop when recording |
Hotkey changes are parsed from config, but live config reload does not replace active hotkey bindings. Restart Muninn after changing hotkeys.
Use this path when you want to run Muninn from this repository.
- macOS
- Rust 1.88.0 or newer
- Xcode command line tools for local builds
- macOS permissions for Microphone, Accessibility, and Input Monitoring
- Optional cloud provider keys when you use Deepgram, OpenAI, Google, or the default OpenAI-backed refine step
cargo build --release --bin muninn
Muninn reads config in this order:
- MUNINN_CONFIG
- $XDG_CONFIG_HOME/muninn/config.toml
- ~/.config/muninn/config.toml
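The lookup order above can be sketched as a small pure function. This is an illustration only: resolve_config_path and its parameters are hypothetical names, not Muninn's actual API, and treating an empty variable as unset is an assumption.

```rust
use std::path::PathBuf;

/// Treat an empty value like an unset one (assumption, not confirmed by the source).
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.filter(|s| !s.is_empty())
}

/// Hypothetical helper: pick the config path from already-read environment values.
fn resolve_config_path(
    muninn_config: Option<&str>,
    xdg_config_home: Option<&str>,
    home: &str,
) -> PathBuf {
    if let Some(explicit) = non_empty(muninn_config) {
        // 1. MUNINN_CONFIG names the file directly.
        return PathBuf::from(explicit);
    }
    if let Some(xdg) = non_empty(xdg_config_home) {
        // 2. $XDG_CONFIG_HOME/muninn/config.toml
        return PathBuf::from(xdg).join("muninn").join("config.toml");
    }
    // 3. ~/.config/muninn/config.toml
    PathBuf::from(home)
        .join(".config")
        .join("muninn")
        .join("config.toml")
}
```

Passing the environment values in as arguments keeps the sketch testable without touching the real process environment.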
If the resolved config file is missing, Muninn creates a launchable default config. To start from the sample config instead:
CONFIG_DIR="${XDG_CONFIG_HOME:-$HOME/.config}/muninn"
mkdir -p "$CONFIG_DIR"
cp configs/config.sample.toml "$CONFIG_DIR/config.toml"
Muninn loads ./.env from the current working directory by default. Existing shell environment variables override .env and config values.
Create .env with only the keys you use:
OPENAI_API_KEY=<OPENAI_API_KEY>
DEEPGRAM_API_KEY=<DEEPGRAM_API_KEY>
GOOGLE_API_KEY=<GOOGLE_API_KEY>
GOOGLE_STT_TOKEN=<GOOGLE_STT_TOKEN>
Set MUNINN_LOAD_DOTENV=0, false, or no to disable .env loading.
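A minimal sketch of how the MUNINN_LOAD_DOTENV switch could be interpreted. The function name dotenv_enabled is hypothetical, and trimming whitespace and ignoring ASCII case are assumptions; the source only lists the three disabling values.

```rust
/// Hypothetical helper: decide whether ./.env should be loaded,
/// given the raw value of MUNINN_LOAD_DOTENV (None when unset).
fn dotenv_enabled(raw: Option<&str>) -> bool {
    match raw {
        // Loading is on by default.
        None => true,
        Some(value) => {
            // Assumption: surrounding whitespace and letter case do not matter.
            let normalized = value.trim().to_ascii_lowercase();
            !matches!(normalized.as_str(), "0" | "false" | "no")
        }
    }
}
```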
cargo run --release --bin muninn
Expected result: Muninn appears in the macOS menu bar with an M tray indicator.
Grant these permissions to Muninn itself:
| Permission | Why Muninn needs it | System Settings path |
|---|---|---|
| Microphone | Record your speech | Privacy & Security > Microphone |
| Accessibility | Inject final text into the active app | Privacy & Security > Accessibility |
| Input Monitoring | Listen for global hotkeys while another app is active | Privacy & Security > Input Monitoring |
To verify the app:
- Focus a text field in another app.
- Click the Muninn tray icon to start recording.
- Speak a short phrase.
- Click the tray icon again to stop recording.
Expected result: Muninn transcribes the utterance, runs the pipeline, and types the final text into the focused app.
If macOS stops showing a permission prompt, reset the affected TCC service and relaunch Muninn:
tccutil reset ListenEvent
tccutil reset Accessibility
tccutil reset Microphone
cargo install muninn-speech-to-text
muninn
The package name is muninn-speech-to-text; the binary name is muninn.
MUNINN_CONFIG="$PWD/configs/config.sample.toml" cargo run --release --bin muninn
This is useful for local development because it avoids changing your user config.
The release workflow builds tar archives for:
- aarch64-apple-darwin
- x86_64-apple-darwin
After extracting a release archive, keep the binary at a stable path before granting macOS permissions:
mkdir -p "$HOME/.local/bin"
mv muninn "$HOME/.local/bin/muninn"
chmod +x "$HOME/.local/bin/muninn"
"$HOME/.local/bin/muninn"
macOS permissions attach to the exact app or binary identity. Moving or replacing a raw binary can require granting permissions again.
Use the app bundle when you want a stable app identity, muninn:// URL handling, and normal Login Items behavior.
cargo build --release --bin muninn
bash scripts/package-macos-app.sh
open dist/Muninn.app
The packaging script creates dist/Muninn.app, signs it ad hoc by default, and creates dist/Muninn.app.zip when ditto is available. Set CODESIGN_IDENTITY to use a Developer ID certificate, or set CODESIGN_APP=0 to skip signing.
Recommended app-bundle setup:
- Move dist/Muninn.app to /Applications/Muninn.app.
- Launch it once and grant permissions to Muninn.
- Add it under System Settings > General > Login Items.
- Keep [app].autostart = false when using Login Items.
Finder and Login Items do not inherit your shell environment. Store credentials in config or make sure Muninn's working directory contains the .env file you expect it to read.
Set [app].autostart = true to let Muninn write a LaunchAgent for the current executable path.
Behavior:
- Muninn writes ~/Library/LaunchAgents/com.bnomei.muninn.plist when it starts or reloads config.
- Changes take effect on the next macOS login.
- The LaunchAgent includes MUNINN_CONFIG.
- The LaunchAgent does not inherit interactive shell exports.
- When using Muninn.app, prefer macOS Login Items over this raw-binary LaunchAgent path.
The canonical sample is configs/config.sample.toml. The root schema lives in src/config.rs.
| Section | Purpose |
|---|---|
| [app] | Default profile, strict step contract, raw-binary autostart |
| [hotkeys.*] | Push-to-talk, done-mode toggle, and cancel bindings |
| [indicator] | Tray indicator visibility and colors |
| [recording] | WAV capture format, default mono = true and sample_rate_khz = 16 |
| [transcription] | Recorded versus streaming mode and ordered provider route |
| [pipeline] | Pipeline deadline, payload format, and post-transcription steps |
| [transcript] | Base prompt and prompt append text for the built-in refine step |
| [refine] | OpenAI-compatible refine endpoint, model, temperature, and guardrails |
| [voices.*] | Named refine behavior and optional one-letter tray glyph |
| [profiles.*] | Context-specific overrides for recording, route, pipeline, transcript, or refine |
| [[profile_rules]] | Ordered matchers for the frontmost app and window title |
| [external_control] | URL scheme and MCP recording-control settings |
| [logging] | Replay artifacts, retention, and debug detail |
| [providers.*] | Provider credentials, endpoints, models, and streaming settings |
The default provider route is local-first:
[transcription]
providers = ["apple_speech", "whisper_cpp", "deepgram", "openai", "google"]
Profiles can override only the route:
[profiles.mail.transcription]
providers = ["deepgram", "openai", "google"]
If you still have explicit stt_* steps in pipeline.steps, Muninn accepts them and infers the route from that order. New configs should prefer [transcription].providers.
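To make the legacy behavior concrete, here is a rough sketch of inferring a provider route from explicit stt_* step commands. The function infer_route is a hypothetical name, and mapping a command to a provider by stripping the stt_ prefix is an assumption based on the built-in command names listed in this README, not a description of Muninn's internals.

```rust
/// Hypothetical helper: derive an ordered provider route from step commands.
/// Non-STT steps (for example "refine" or external filters) are skipped.
fn infer_route(step_cmds: &[&str]) -> Vec<String> {
    step_cmds
        .iter()
        // Assumption: "stt_<provider>" maps to the provider id "<provider>".
        .filter_map(|cmd| cmd.strip_prefix("stt_"))
        .map(str::to_string)
        .collect()
}
```

For example, the steps stt_whisper_cpp, stt_openai, refine would yield the route whisper_cpp, openai in that order.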
Each pipeline step has:
- id
- cmd
- optional args
- optional io_mode
- timeout_ms
- on_error
Supported io_mode values:
| Value | Behavior |
|---|---|
| auto | Built-ins use envelope JSON; external commands default to text filtering |
| envelope_json | Step reads and writes the full JSON envelope |
| text_filter | Step reads transcript text and writes replacement text |
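The auto rule in the table can be sketched as follows. IoMode, BUILTIN_CMDS, and effective_io_mode are illustrative names; the built-in list is copied from the built-in step commands named in this README, and detecting a built-in by exact command match is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum IoMode {
    Auto,
    EnvelopeJson,
    TextFilter,
}

/// Built-in step commands as listed in this README.
const BUILTIN_CMDS: [&str; 6] = [
    "stt_apple_speech",
    "stt_whisper_cpp",
    "stt_deepgram",
    "stt_openai",
    "stt_google",
    "refine",
];

/// Hypothetical helper: resolve `auto` into a concrete mode for one step.
fn effective_io_mode(configured: IoMode, cmd: &str) -> IoMode {
    match configured {
        // Built-ins exchange the full JSON envelope.
        IoMode::Auto if BUILTIN_CMDS.contains(&cmd) => IoMode::EnvelopeJson,
        // External commands default to plain text filtering.
        IoMode::Auto => IoMode::TextFilter,
        // An explicit setting is kept as-is.
        explicit => explicit,
    }
}
```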
Supported on_error values:
| Value | Behavior |
|---|---|
| continue | Keep the previous envelope and run later steps |
| fallback_raw | Substitute transcript.raw_text and continue |
| abort | Stop the pipeline and surface the failure |
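A simplified sketch of the three on_error policies. The Envelope struct here is reduced to two fields for illustration and is not Muninn's real envelope type; settle_step is a hypothetical name. Reading "substitute transcript.raw_text" as "the raw transcript becomes the final text" is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Envelope {
    raw_text: Option<String>,   // stands in for transcript.raw_text
    final_text: Option<String>, // stands in for output.final_text
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum OnError {
    Continue,
    FallbackRaw,
    Abort,
}

/// Hypothetical helper: combine a step's result with its on_error policy.
/// Ok(envelope) means later steps run; Err(message) means the pipeline stops.
fn settle_step(
    previous: Envelope,
    result: Result<Envelope, String>,
    policy: OnError,
) -> Result<Envelope, String> {
    match (result, policy) {
        // A successful step always replaces the envelope.
        (Ok(next), _) => Ok(next),
        // continue: keep the previous envelope untouched.
        (Err(_), OnError::Continue) => Ok(previous),
        // fallback_raw: assumption, the raw transcript becomes the final text.
        (Err(_), OnError::FallbackRaw) => {
            let final_text = previous.raw_text.clone();
            Ok(Envelope { final_text, ..previous })
        }
        // abort: surface the failure.
        (Err(message), OnError::Abort) => Err(message),
    }
}
```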
Example:
[transcription]
providers = ["apple_speech", "whisper_cpp", "deepgram", "openai", "google"]
[[pipeline.steps]]
id = "refine"
cmd = "refine"
timeout_ms = 2500
on_error = "continue"
[[pipeline.steps]]
id = "uppercase"
cmd = "/usr/bin/tr"
args = ["[:lower:]", "[:upper:]"]
timeout_ms = 250
on_error = "continue"
transcript.system_prompt and transcript.system_prompt_append steer the built-in refine step. They do not change the speech-to-text provider, and Muninn does not parse appended JSON into provider-native adaptation APIs.
[transcript]
system_prompt = "Prefer minimal corrections. Focus on technical terms, developer tools, package names, commands, flags, file names, paths, env vars, acronyms, and obvious dictation errors. If uncertain, keep the original wording."
system_prompt_append = """
Vocabulary JSON:
{"terms":["Muninn","whisper.cpp","Deepgram","Cargo.toml"],"commands":["cargo test --all-targets","rg --files"],"paths":["src/config.rs",".env"]}
"""
| Provider | Recorded mode | Streaming mode | Credentials | Notes |
|---|---|---|---|---|
| Apple Speech | Yes | No | None | Local macOS 26+ provider. Uses Apple-managed Speech assets for the selected locale. |
| whisper.cpp | Yes | No | None | Local provider. Defaults to tiny.en, stored under ~/.local/share/muninn/models, with device = "auto". |
| Deepgram | Yes | Yes | DEEPGRAM_API_KEY or providers.deepgram.api_key | Recorded uploads use /v1/listen; streaming uses the live WebSocket API. |
| OpenAI | Yes | Yes | OPENAI_API_KEY or providers.openai.api_key | Recorded uploads are preflighted against OpenAI's 25 MB audio limit; streaming uses Realtime transcription. |
| Google | Yes | Not currently callable | GOOGLE_API_KEY, GOOGLE_STT_TOKEN, or config values | Recorded REST transcription works through the configured endpoint. The Google streaming adapter builds Speech-to-Text v2 requests, but the pinned google-cloud-speech-v2 1.12.0 dependency does not expose a callable streaming RPC, so Muninn reports google_official_client_streaming_rpc_unavailable. |
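As a rough feel for the 25 MB preflight mentioned in the OpenAI row, the arithmetic below estimates uncompressed WAV size. The function names are hypothetical, 16-bit samples and a 44-byte header are assumptions about the recording format, and whether the limit is counted as 25 * 1024 * 1024 bytes is also an assumption.

```rust
/// Approximate size in bytes of an uncompressed PCM WAV file.
fn wav_bytes(sample_rate_hz: u64, channels: u64, bytes_per_sample: u64, seconds: u64) -> u64 {
    // Assumption: canonical 44-byte PCM header; real files may carry extra chunks.
    const HEADER_BYTES: u64 = 44;
    HEADER_BYTES + sample_rate_hz * channels * bytes_per_sample * seconds
}

/// Hypothetical preflight check against an upload limit in bytes.
fn fits_upload_limit(file_bytes: u64, limit_bytes: u64) -> bool {
    file_bytes <= limit_bytes
}
```

Under these assumptions, the default 16 kHz mono capture produces about 32,000 bytes per second, so a ten-minute utterance (about 19.2 MB) would pass the check and a fifteen-minute one (about 28.8 MB) would not.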
The refine step is not an STT provider. It uses the [refine] config and OpenAI-compatible chat completions by default.
| Concern | Variables |
|---|---|
| Config path | MUNINN_CONFIG |
| .env loading | MUNINN_LOAD_DOTENV |
| Deepgram | DEEPGRAM_API_KEY, DEEPGRAM_STT_ENDPOINT, DEEPGRAM_STT_MODEL, DEEPGRAM_STT_LANGUAGE, MUNINN_DEEPGRAM_STUB_TEXT |
| OpenAI transcription and refine | OPENAI_API_KEY, MUNINN_OPENAI_STUB_TEXT, MUNINN_REFINE_STUB_TEXT |
| Google recorded transcription | GOOGLE_API_KEY, GOOGLE_STT_TOKEN, GOOGLE_STT_ENDPOINT, GOOGLE_STT_MODEL, MUNINN_GOOGLE_STUB_TEXT |
Stub variables are intended for local smoke checks and tests. They bypass live provider calls for the matching step.
Default behavior:
- providers.whisper_cpp.model unset resolves to tiny.en
- tiny.en resolves to ggml-tiny.en.bin
- default model directory is ~/.local/share/muninn/models
- Muninn auto-downloads known canonical models on first use
- explicit custom model paths must already exist
- device = "auto" uses Metal on supported Apple Silicon builds and CPU otherwise
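The default model resolution above can be sketched like this. Both function names are hypothetical. The ggml-<name>.bin pattern is confirmed here only for tiny.en; applying it to other model names is an assumption that follows the usual whisper.cpp file naming.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: map a configured model name to a model file name.
/// An unset model falls back to tiny.en.
fn model_file_name(model: Option<&str>) -> String {
    let name = model.unwrap_or("tiny.en");
    // Assumption: every canonical model follows the ggml-<name>.bin pattern.
    format!("ggml-{}.bin", name)
}

/// Hypothetical helper: full path of the model inside the model directory.
fn model_path(model_dir: &Path, model: Option<&str>) -> PathBuf {
    model_dir.join(model_file_name(model))
}
```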
Pre-warm the default model cache:
mkdir -p "$HOME/.local/share/muninn/models"
curl -L \
-o "$HOME/.local/share/muninn/models/ggml-tiny.en.bin" \
"https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en.bin"
If a local-only route points at a missing custom model, Muninn records a missing_whisper_cpp_model diagnostic and injects nothing unless another provider later produces transcript.raw_text.
Muninn passes an envelope through every built-in and external step. Built-in STT steps fill transcript.raw_text; transform steps such as refine write output.final_text. Injection prefers output.final_text and can fall back to transcript.raw_text.
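The injection preference described above amounts to a simple fallback. The Envelope struct below is a two-field stand-in for illustration, not Muninn's real envelope, and text_to_inject is a hypothetical name. How an empty output.final_text is treated is not stated in this README, so the sketch does not special-case it.

```rust
struct Envelope {
    raw_text: Option<String>,   // stands in for transcript.raw_text
    final_text: Option<String>, // stands in for output.final_text
}

/// Hypothetical helper: prefer the refined text, fall back to the raw transcript.
fn text_to_inject(envelope: &Envelope) -> Option<&str> {
    envelope
        .final_text
        .as_deref()
        .or(envelope.raw_text.as_deref())
}
```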
Built-in step commands:
| Command | Purpose |
|---|---|
| stt_apple_speech | Completed-recording Apple Speech transcription |
| stt_whisper_cpp | Completed-recording local whisper.cpp transcription |
| stt_deepgram | Completed-recording Deepgram transcription |
| stt_openai | Completed-recording OpenAI transcription |
| stt_google | Completed-recording Google REST transcription |
| refine | OpenAI-compatible developer-dictation cleanup |
Run a built-in step directly for smoke checks:
cargo run -q -- __internal_step <stt_apple_speech|stt_whisper_cpp|stt_deepgram|stt_openai|stt_google|refine>
Use the JSON fixtures in tests/fixtures for example input envelopes.
Recorded mode is the default. Enable streaming explicitly:
[transcription]
mode = "streaming"
providers = ["deepgram", "openai"]
[transcription.streaming]
frame_ms = 100
finish_timeout_ms = 10000
fallback_to_recorded_on_error = true
Streaming behavior:
- Deepgram streaming sends mono LINEAR16 audio over WebSocket.
- OpenAI streaming uses Realtime transcription and forces 24 kHz mono capture for that utterance.
- Google streaming is not currently callable because the pinned google-cloud-speech-v2 1.12.0 dependency exposes request and response types but no callable streaming method.
- Muninn still writes the completed WAV during streaming.
- When streaming fails and fallback_to_recorded_on_error = true, Muninn can run the completed-WAV route.
- A successful streaming transcript seeds transcript.raw_text; refine, scoring, replay, and injection use the same downstream pipeline as recorded mode.
- Interim streaming results are transient. Muninn does not show a partial transcript UI or persist partial transcript history.
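The fallback rule in the list above can be sketched as a small decision function. resolve_transcript and its closure parameter are hypothetical; errors are plain strings here only to keep the sketch short.

```rust
/// Hypothetical helper: choose the text that seeds transcript.raw_text.
/// `run_recorded_route` stands in for transcribing the completed WAV.
fn resolve_transcript<F>(
    streaming: Result<String, String>,
    fallback_to_recorded_on_error: bool,
    run_recorded_route: F,
) -> Result<String, String>
where
    F: FnOnce() -> Result<String, String>,
{
    match streaming {
        // A successful streaming transcript is used directly.
        Ok(text) => Ok(text),
        // Streaming failed, but the completed WAV is still available.
        Err(_) if fallback_to_recorded_on_error => run_recorded_route(),
        // No fallback configured: report the streaming failure.
        Err(message) => Err(message),
    }
}
```

Taking the recorded route as a closure means the slower completed-WAV transcription only runs when it is actually needed.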
Muninn can change refine behavior based on the frontmost app. It captures the bundle id, app name, and a best-effort window title, then applies the first matching profile_rules entry. If no rule matches, behavior falls back to [app].profile; the idle tray glyph falls back to M.
Resolution order:
- Start from the base config.
- Apply the matched voice, if the matched profile names one.
- Apply profile overrides last.
Voice means text-shaping behavior plus an optional tray glyph, not an audio voice.
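A compact sketch of first-match rule selection followed by the base, voice, profile layering. All type and function names are hypothetical, the settings are reduced to two fields, and two details are assumptions: a rule matches only when every matcher it specifies equals the captured value exactly, and a profile may override the same fields as a voice.

```rust
struct ProfileRule {
    profile: &'static str,
    bundle_id: Option<&'static str>,
    app_name: Option<&'static str>,
}

/// Return the profile of the first rule whose specified matchers all match.
fn match_profile<'a>(rules: &'a [ProfileRule], bundle_id: &str, app_name: &str) -> Option<&'a str> {
    rules
        .iter()
        .find(|rule| {
            rule.bundle_id.map_or(true, |b| b == bundle_id)
                && rule.app_name.map_or(true, |n| n == app_name)
        })
        .map(|rule| rule.profile)
}

#[derive(Debug, Clone, PartialEq)]
struct RefineSettings {
    system_prompt: String,
    indicator_glyph: char,
}

struct Overrides {
    system_prompt: Option<String>,
    indicator_glyph: Option<char>,
}

/// Apply one layer: fields that are set in the layer win over the base.
fn apply(base: RefineSettings, layer: &Overrides) -> RefineSettings {
    RefineSettings {
        system_prompt: layer.system_prompt.clone().unwrap_or(base.system_prompt),
        indicator_glyph: layer.indicator_glyph.unwrap_or(base.indicator_glyph),
    }
}

/// Base config first, then the voice (if any), then profile overrides last.
fn resolve(base: RefineSettings, voice: Option<&Overrides>, profile: &Overrides) -> RefineSettings {
    let with_voice = match voice {
        Some(v) => apply(base, v),
        None => base,
    };
    apply(with_voice, profile)
}
```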
[app]
profile = "default"
[voices.codex]
indicator_glyph = "C"
system_prompt = "Prefer terse developer dictation. Keep commands, flags, file names, and code tokens intact."
system_prompt_append = """
Vocabulary JSON:
{"terms":["Codex","Muninn","Cargo.toml"],"commands":["cargo test --all-targets","cargo clippy --all-targets -- -D warnings"]}
"""
[voices.terminal]
indicator_glyph = "T"
system_prompt = "Preserve shell commands exactly. Prefer minimal punctuation changes."
[profiles.codex]
voice = "codex"
[profiles.terminal]
voice = "terminal"
[[profile_rules]]
id = "codex-app"
profile = "codex"
app_name = "Codex"
[[profile_rules]]
id = "terminal-app"
profile = "terminal"
bundle_id = "com.apple.Terminal"
Tray behavior:
- idle preview shows the glyph for the matched voice, or
M - recording and processing freeze the resolved glyph for that utterance
- ? is reserved for missing-credentials feedback
Muninn can be driven by agents and scripts through two transports:
- muninn:// URL scheme, available for the packaged macOS .app
- localhost streamable-HTTP MCP server, disabled by default
Both transports use the same recording-control vocabulary as tray and hotkey events.
[external_control]
url_scheme_enabled = true
mcp_enabled = false
start_recording_enabled = false
mcp_bind_address = "127.0.0.1:2769"
Action semantics:
| Action | Behavior |
|---|---|
| start | Starts recording only when idle and start_recording_enabled = true |
| stop | Stops an active recording and runs the pipeline; no-op when idle |
| toggle | Starts when idle and allowed; otherwise stops an active recording |
| cancel | Discards an active recording without transcription or injection |
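The action table can be read as a small decision function over two inputs: whether a recording is active and whether external start is allowed. The enum and function names are hypothetical, only the idle and recording states are modeled, and the results for start while recording and cancel while idle are assumptions because the table does not spell them out.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Start,
    Stop,
    Toggle,
    Cancel,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    StartRecording,
    StopAndRunPipeline,
    DiscardRecording,
    NoOp,
    Rejected,
}

/// Hypothetical helper: map an external action to what should happen next.
fn decide(action: Action, recording: bool, start_recording_enabled: bool) -> Outcome {
    match (action, recording) {
        // Starting is gated by start_recording_enabled.
        (Action::Start, false) | (Action::Toggle, false) => {
            if start_recording_enabled {
                Outcome::StartRecording
            } else {
                Outcome::Rejected
            }
        }
        // Assumption: start while already recording changes nothing.
        (Action::Start, true) => Outcome::NoOp,
        (Action::Stop, true) | (Action::Toggle, true) => Outcome::StopAndRunPipeline,
        (Action::Stop, false) => Outcome::NoOp,
        (Action::Cancel, true) => Outcome::DiscardRecording,
        // Assumption: cancel with nothing to discard changes nothing.
        (Action::Cancel, false) => Outcome::NoOp,
    }
}
```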
External start is disabled by default because it starts microphone capture. Enabling start_recording_enabled = true is the local trust decision for configured agents and scripts.
The packaged .app registers muninn:// through CFBundleURLTypes.
| URL | Action |
|---|---|
| muninn://record, muninn://start | start |
| muninn://stop, muninn://done | stop |
| muninn://toggle | toggle |
| muninn://cancel, muninn://abort | cancel |
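A minimal sketch of the URL-to-action mapping in the table. UrlAction and action_for_url are hypothetical names, and ignoring a trailing slash is an assumption; query strings and other URL parts are not handled here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum UrlAction {
    Start,
    Stop,
    Toggle,
    Cancel,
}

/// Hypothetical helper: map a muninn:// URL to a recording-control action.
fn action_for_url(url: &str) -> Option<UrlAction> {
    let rest = url.strip_prefix("muninn://")?;
    // Assumption: "muninn://record/" is treated like "muninn://record".
    match rest.trim_end_matches('/') {
        "record" | "start" => Some(UrlAction::Start),
        "stop" | "done" => Some(UrlAction::Stop),
        "toggle" => Some(UrlAction::Toggle),
        "cancel" | "abort" => Some(UrlAction::Cancel),
        _ => None,
    }
}
```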
open "muninn://record"
A binary launched with cargo run does not receive these LaunchServices links.
When mcp_enabled = true, Muninn serves MCP at:
http://127.0.0.1:2769/mcp
Tools:
- get_status
- start_recording
- stop_recording
- cancel_recording
Example registration with an MCP-aware client:
auggie mcp add muninn --transport http --url http://127.0.0.1:2769/mcp
get_status is read-only and returns JSON like:
{
"state": "idle",
"recording_active": false,
"busy": false,
"permissions": {
"microphone": "granted",
"accessibility": "granted",
"input_monitoring": "granted"
}
}
state is one of idle, recording_active, permission_blocked, already_running, or failed.
Security constraints:
- The MCP server has no authentication.
- mcp_bind_address must be an explicit loopback socket address such as 127.0.0.1:2769 or [::1]:2769.
- Muninn refuses wildcard, LAN, hostname, and other non-loopback binds.
- The MCP server starts only at app launch. Changing mcp_enabled later requires restarting Muninn.
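The loopback-only constraint can be checked with the standard library alone. This is a sketch of one way to enforce it, not Muninn's actual validation code; validate_mcp_bind_address is a hypothetical name. Parsing into SocketAddr rejects hostnames because only IP literals parse, and is_loopback rejects wildcard and LAN addresses.

```rust
use std::net::SocketAddr;

/// Hypothetical helper: accept only explicit loopback socket addresses.
fn validate_mcp_bind_address(raw: &str) -> Result<SocketAddr, String> {
    // Hostnames such as "localhost:2769" fail here: only IP literals parse.
    let addr: SocketAddr = raw
        .parse()
        .map_err(|_| format!("not an explicit socket address: {}", raw))?;

    // 0.0.0.0, [::], and LAN addresses are not loopback.
    if addr.ip().is_loopback() {
        Ok(addr)
    } else {
        Err(format!("non-loopback bind refused: {}", addr))
    }
}
```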
Tracing logs go to stderr and are controlled with RUST_LOG.
RUST_LOG=recording=debug cargo run --release --bin muninn
Replay logging is disabled by default. When enabled:
- replay_detail = "minimal" stores sparse utterance metadata only
- replay_detail = "full_debug" stores redacted config, target context, final envelopes, pipeline outcome, refine context, and injection route
- replay_retain_audio = true keeps audio only when replay_detail = "full_debug"
- retained audio uses a hard link when possible and falls back to a copy
- full-debug snapshots redact provider secrets and prompt fields
- replay artifacts are for inspection, not re-run
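The audio-retention rules above can be sketched with std::fs. Both function names are hypothetical; the sketch shows the general hard-link-then-copy pattern and the full_debug gate, not Muninn's replay code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: audio is kept only in full_debug mode with the flag set.
fn should_retain_audio(replay_detail: &str, replay_retain_audio: bool) -> bool {
    replay_retain_audio && replay_detail == "full_debug"
}

/// Hypothetical helper: hard-link the recording into the replay directory,
/// falling back to a copy when linking fails (for example across volumes).
fn retain_audio(src: &Path, dst: &Path) -> io::Result<()> {
    match fs::hard_link(src, dst) {
        Ok(()) => Ok(()),
        // fs::copy returns the number of bytes copied; it is not needed here.
        Err(_) => fs::copy(src, dst).map(|_| ()),
    }
}
```

A hard link avoids duplicating the WAV on disk while keeping the replay artifact alive after the temporary recording path is removed.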
[logging]
replay_enabled = true
replay_detail = "minimal"
replay_retain_audio = false
replay_dir = "~/.local/state/muninn/replay"
replay_retention_days = 7
replay_max_bytes = 52428800
Common recovery checks:
| Symptom | Check |
|---|---|
| Hotkey does not start recording | Grant Input Monitoring to Muninn and restart after changing hotkey config |
| Tray click records but hotkey does not | Input Monitoring is missing or the hotkey listener needs restart |
| Text is not injected | Grant Accessibility to Muninn |
| No text is injected after a local-only Whisper route | Check for missing_whisper_cpp_model and verify the configured model path |
| External MCP start is rejected | Set external_control.start_recording_enabled = true and restart if the MCP server was not enabled at launch |
| Google streaming falls back or reports unavailable | Use recorded Google transcription, Deepgram streaming, or OpenAI streaming |
Run the core checks:
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-targets
The repository also includes prek hooks:
prek validate-config
prek run --all-files
prek install
Run the benchmark suite:
cargo bench --bench runtime_bottlenecks
Filter to one benchmark group:
cargo bench --bench runtime_bottlenecks pipeline_runner
cargo bench --bench runtime_bottlenecks replay_persist
The benchmark target focuses on per-utterance latency paths that do not require network calls:
- audio output transform and resampling
- envelope JSON round trips
- Google request-body construction
- profile and voice resolution
- replacement scoring
- in-process pipeline runner overhead
- replay persistence with and without retained audio artifacts
- Muninn's supported runtime is macOS.
- Apple Speech requires macOS 26+ and Apple-managed Speech assets.
- whisper.cpp and Apple Speech are completed-recording providers only.
- Google streaming request construction exists, but live Google streaming is not callable until the pinned official client exposes a streaming RPC.
- Streaming mode uses provider final text only. There is no partial transcript UI.
- Replay artifacts are for inspection, not deterministic replay.
- Provider-backed transcription needs realistic timeout budgets.
- The external-control MCP server has no authentication, is disabled by default, binds loopback-only, and starts only at app launch.
- The repository release workflow packages raw binaries; use the local packaging script when you need a .app bundle.
Use these files when checking README claims against source:
