settings: Add ORT thread count setting with auto-tune benchmark. #1120
Open
andrewleech wants to merge 3 commits into cjpais:main from
Conversation
Author
Depends on transcribe-rs PR #64.
Exposes transcribe-rs set_ort_intra_threads as a user-facing setting in Advanced → Experimental Features. Default 0 uses all cores.
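As a rough sketch of the "default 0 uses all cores" behavior described above (this is illustrative, not the PR's actual code; `resolve_thread_count` is a hypothetical helper, not a transcribe-rs API):

```rust
// Hypothetical helper showing how the setting value could map to an ORT
// intra-op thread count: 0 means "auto" (use every available core), any
// other value is an explicit override clamped to the UI's 0-32 range.
fn resolve_thread_count(setting: u32, available_cores: u32) -> u32 {
    match setting {
        0 => available_cores, // auto: defer to all cores
        n => n.min(32),       // explicit value, capped at the UI maximum
    }
}

fn main() {
    assert_eq!(resolve_thread_count(0, 16), 16); // auto uses every core
    assert_eq!(resolve_thread_count(6, 16), 6);  // explicit override
    assert_eq!(resolve_thread_count(99, 16), 32); // clamped to UI max
    println!("ok");
}
```

In the real code the resolved value would be handed to `transcribe_rs::accel::set_ort_intra_threads` before the next session is created.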
Adds an "Auto" button next to the ONNX Thread Count setting that benchmarks different thread counts against recordings from history. Uses a coarse grid sweep (1, 2, 4, 6, 8, 12, 16, 20, 24, 32), tests mid-range first for quick feedback, and shows a modal with live progress and, on completion, a results table with all timings.
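The "mid-range first" sweep order could be produced along these lines (an illustrative sketch, not the PR's implementation; `mid_first_order` is a hypothetical name):

```rust
// Reorder the coarse grid so mid-range thread counts are benchmarked first,
// alternating outward toward the extremes. The user gets useful numbers
// early while 1 and 32 (usually the worst candidates) run last.
fn mid_first_order(grid: &[u32]) -> Vec<u32> {
    let mid = grid.len() / 2;
    let mut out = vec![grid[mid]];
    let (mut lo, mut hi) = (mid, mid + 1);
    while lo > 0 || hi < grid.len() {
        if lo > 0 {
            lo -= 1;
            out.push(grid[lo]); // step left of the middle
        }
        if hi < grid.len() {
            out.push(grid[hi]); // step right of the middle
            hi += 1;
        }
    }
    out
}

fn main() {
    let order = mid_first_order(&[1, 2, 4, 6, 8, 12, 16, 20, 24, 32]);
    // 12 is tested first, the extremes 32 and 1 last.
    assert_eq!(order, vec![12, 8, 16, 6, 20, 4, 24, 2, 32, 1]);
    println!("{order:?}");
}
```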
Summary
ORT's default of "use all cores" isn't always the fastest for ONNX inference — on my Zen 5 machine, 6 threads gives 7.1x realtime vs ~6.7x at 16 threads for Qwen3 0.6B Int4. But finding the optimal count requires manual trial-and-error with no feedback.
This adds two things:
Thread count setting — a number input (0-32, where 0 = auto) in Advanced → Experimental Features that controls transcribe_rs::accel::set_ort_intra_threads. Changing it unloads the current model so the next transcription picks up the new session configuration.

Auto-tune benchmark — an "Auto" button next to the input that opens a modal dialog, transcribes real recordings from the user's history at different thread counts, and shows the results. It tests a coarse grid (1, 2, 4, 6, 8, 12, 16, 20, 24, 32), starting from mid-range for quick initial feedback. On completion it shows a table of all results with RTF (real-time factor) so the user can see the performance curve and apply the best value.
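For reference, the RTF figure shown in the results table is just audio duration divided by wall-clock transcription time (the same "x realtime" convention as the Summary's 7.1x number). A minimal sketch:

```rust
// Realtime factor as used in the results table, taken here as audio
// duration / processing time, so higher is faster: 7.1x realtime means a
// 71-second clip transcribes in 10 seconds.
fn realtime_factor(audio_secs: f64, transcribe_secs: f64) -> f64 {
    audio_secs / transcribe_secs
}

fn main() {
    let rtf = realtime_factor(71.0, 10.0);
    assert!((rtf - 7.1).abs() < 1e-9);
    println!("{rtf:.1}x realtime"); // prints "7.1x realtime"
}
```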
The benchmark:
Depends on transcribe-rs feat/ort-thread-count (PR forthcoming). The DROP commit pins to this branch and should be dropped when it merges upstream.
Testing
Trade-offs and Alternatives