
settings: Add ORT thread count setting with auto-tune benchmark #1120

Open
andrewleech wants to merge 3 commits into cjpais:main from andrewleech:feat/ort-thread-tuning

Conversation


andrewleech commented Mar 23, 2026

Summary

ORT's default of "use all cores" isn't always the fastest for ONNX inference — on my Zen 5 machine, 6 threads gives 7.1x realtime vs ~6.7x at 16 threads for Qwen3 0.6B Int4. But finding the optimal count requires manual trial-and-error with no feedback.

This adds two things:

  1. Thread count setting — a number input (0-32, where 0 = auto) in Advanced → Experimental Features that controls transcribe_rs::accel::set_ort_intra_threads. Changing it unloads the current model so the next transcription picks up the new session configuration.

  2. Auto-tune benchmark — an "Auto" button next to the input that opens a modal dialog, transcribes real recordings from the user's history at different thread counts, and shows the results. It tests a coarse grid (1, 2, 4, 6, 8, 12, 16, 20, 24, 32), starting from mid-range for quick initial feedback. On completion it shows a table of all results with RTF (real-time factor) so the user can see the performance curve and apply the best value.

(Screenshots: the thread count setting, the auto-tune progress modal, and the results table.)
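RTF here is the real-time factor: seconds of audio transcribed per wall-clock second, so higher is faster. A minimal sketch of how the results table could rank trials and suggest the best value — the type and helper names are illustrative, not Handy's actual code:

```rust
/// One benchmark trial result (illustrative type, not from the PR).
#[derive(Debug, Clone, PartialEq)]
struct TrialResult {
    threads: u32,
    audio_secs: f64,
    wall_secs: f64,
}

impl TrialResult {
    /// Real-time factor: audio seconds transcribed per wall-clock second.
    fn rtf(&self) -> f64 {
        self.audio_secs / self.wall_secs
    }
}

/// Pick the trial with the highest RTF to suggest as the new setting.
fn best_trial(results: &[TrialResult]) -> Option<&TrialResult> {
    results
        .iter()
        .max_by(|a, b| a.rtf().partial_cmp(&b.rtf()).unwrap())
}
```

With the numbers from the summary (7.1x at 6 threads vs ~6.7x at 16), `best_trial` would highlight the 6-thread row.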

The benchmark:

  • Selects recordings from history totalling ≥15 seconds (longest-first)
  • Auto-loads the selected model if it was idle-unloaded
  • Tests mid-range first, then max, then 1, then fills the remaining grid
  • Supports cancellation between trials
  • Uses RAII to guarantee the original thread count is restored on completion or panic
  • Shows all trial results in a table with the best highlighted
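The restore-on-completion-or-panic guarantee in the list above maps naturally onto a `Drop` guard. A sketch under the assumption that the thread count goes through a single global setter — mocked here with an `AtomicUsize`; the real code would call `transcribe_rs::accel::set_ort_intra_threads` instead:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the global ORT thread-count setting; the real code calls
// into transcribe-rs rather than a static.
static ORT_THREADS: AtomicUsize = AtomicUsize::new(0);

fn set_ort_threads(n: usize) {
    ORT_THREADS.store(n, Ordering::SeqCst);
}

fn get_ort_threads() -> usize {
    ORT_THREADS.load(Ordering::SeqCst)
}

/// RAII guard: remembers the user's configured thread count and restores
/// it when dropped, whether the benchmark finishes normally or panics.
struct ThreadCountGuard {
    original: usize,
}

impl ThreadCountGuard {
    fn new() -> Self {
        Self { original: get_ort_threads() }
    }
}

impl Drop for ThreadCountGuard {
    fn drop(&mut self) {
        set_ort_threads(self.original);
    }
}

fn run_benchmark_trials(grid: &[usize]) {
    let _guard = ThreadCountGuard::new();
    for &threads in grid {
        set_ort_threads(threads);
        // ... reload model, transcribe the sample recordings, record timing ...
    }
    // _guard drops here (or during unwinding), restoring the original value.
}
```

Because `drop` runs during unwinding too, a panicking trial still leaves the user's setting intact.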

Depends on transcribe-rs feat/ort-thread-count (PR forthcoming). The DROP commit pins to this branch and should be dropped when it merges upstream.

Testing

  • Tested on Windows (Zen 5 + AMD 860M iGPU) with Qwen3 ASR 0.6B Int4 and Parakeet V3
  • Verified: setting persists across restarts, model reloads on change, benchmark completes and applies result
  • Verified: cancel mid-benchmark restores previous state
  • Verified: benchmark with no recordings shows error message

Trade-offs and Alternatives

  • The benchmark requires a full model reload per trial (~4-12s each depending on model size) because ORT session thread count is baked in at creation time. For a 10-point grid this means ~1-2 minutes total.
  • The coarse grid [1,2,4,6,8,12,16,20,24,32] may miss the exact optimum but gets within ±2 threads of it.
  • The benchmark locks the engine exclusively — the user can't record while it runs.
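The mid-first trial ordering over this coarse grid can be sketched as follows (an illustration of the described behaviour, not the PR's exact code):

```rust
/// Order the coarse grid for benchmarking: a mid-range value first so the
/// user sees a representative result quickly, then the maximum, then 1,
/// then the remaining points in ascending order.
fn trial_order(grid: &[u32]) -> Vec<u32> {
    let mid = grid[grid.len() / 2];
    let max = *grid.last().expect("grid must be non-empty");
    let min = grid[0];
    let mut order = vec![mid, max, min];
    for &t in grid {
        if !order.contains(&t) {
            order.push(t);
        }
    }
    order
}
```

For the grid `[1, 2, 4, 6, 8, 12, 16, 20, 24, 32]` this yields 12 first, then 32, then 1, then the rest, so the modal shows a useful number within the first trial or two.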

@andrewleech
Author

Depends on transcribe-rs PR #64 for set_ort_intra_threads / get_ort_intra_threads. The DROP commit pins to that branch and should be dropped when it merges.

andrewleech force-pushed the feat/ort-thread-tuning branch 2 times, most recently from 596471b to b121f2e on March 23, 2026 at 05:36
pi-anl added 3 commits March 25, 2026 13:06
Exposes transcribe-rs set_ort_intra_threads as a user-facing setting
in Advanced → Experimental Features. Default 0 uses all cores.
Adds an "Auto" button next to the ONNX Thread Count setting that
benchmarks different thread counts against recordings from history.
Uses a coarse grid sweep (1,2,4,6,8,12,16,20,24,32), tests mid-range
first for quick feedback, shows a modal with live progress and a
results table with all timings on completion.
andrewleech force-pushed the feat/ort-thread-tuning branch from b121f2e to 041e651 on March 25, 2026 at 02:06
