fix(linux): enable GPU acceleration for ONNX models #1298
Closed — aasmall wants to merge 1 commit into cjpais:main
Conversation
This was referenced Apr 15, 2026
Linux was pinned at transcribe-rs 0.3.3 with only whisper-vulkan, leaving all ONNX models (Parakeet, Canary, Moonshine, SenseVoice) CPU-only while macOS and Windows had GPU acceleration. This bumps Linux to match the generic dependency version and adds ort-rocm + ort-cuda features for AMD and NVIDIA GPUs. It also clarifies the accelerator description UI so users understand which setting applies to Whisper vs ONNX models.
Owner: This is not going to get pulled in, mainly because we're going to be deprecating Onnx in the future.
Before Submitting This PR
Human Written Description
On Arch/Omarchy, GPU acceleration isn't fully wired up - most of the plumbing exists, but without the transcribe-rs features, it's non-functional. This PR narrowly enables required features to permit GPU acceleration if the runtime requirements are already met.
Prior art and context
This is a well-known pain point with active work in progress. I want to be explicit about where this PR fits:
The `OrtAcceleratorSetting` enum in `settings.rs` already has `Cuda`, `Rocm`, and `DirectMl` variants and an "Experimental" settings group for the dropdown. This PR just enables the backend features so those existing variants become real on Linux.

Scope of this PR (intentionally narrow)
This PR is the necessary but not sufficient change to enable GPU-accelerated ONNX inference on Linux. It enables the Rust code paths that register CUDA and ROCm execution providers, which unblocks users who already have a GPU-enabled onnxruntime on their system (e.g. Arch's `onnxruntime-rocm` package, NixOS overlays, from-source builds). Official release binaries will continue to ship a CPU-only onnxruntime; users of those see no change unless they also install a GPU-enabled onnxruntime separately.

This PR deliberately does not try to solve the bigger distribution question that @cjpais is actively working on (which onnxruntime variants to bundle, multi-artifact releases vs runtime downloads, AMD vs NVIDIA vs WebGPU, binary size trade-offs). It's a one-line change plus i18n copy that unblocks the system-library-override path without touching CI or distribution.
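To make the narrow scope concrete, the effect of the feature flags can be sketched as plain compile-time gating. This is an illustrative sketch, not the actual Handy/transcribe-rs API: the function and the accelerator-name strings are hypothetical; only the feature names (`ort-cuda`, `ort-rocm`) come from this PR.

```rust
// Hypothetical sketch of feature-gated accelerator availability.
// With neither feature enabled (today's Linux build), only the CPU
// branch compiles in; building with --features ort-cuda / ort-rocm
// compiles in the corresponding branches. No GPU toolkit is needed
// at build time, since cfg() is resolved purely by the compiler.
fn available_onnx_accelerators() -> Vec<&'static str> {
    // The CPU execution provider is always present.
    let mut eps = vec!["cpu"];

    // Only compiled when built with the ort-cuda feature.
    #[cfg(feature = "ort-cuda")]
    eps.push("cuda");

    // Only compiled when built with the ort-rocm feature.
    #[cfg(feature = "ort-rocm")]
    eps.push("rocm");

    eps
}

fn main() {
    println!("{:?}", available_onnx_accelerators());
}
```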
Related Issues/Discussions
Changes
One-line Cargo.toml change:
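The diff itself isn't reproduced on this page; based on the description above, the shape of the change would be roughly the following. The version and feature names come from this PR's description; the exact dependency-table layout in Handy's Cargo.toml is an assumption.

```toml
# Sketch only: Linux-specific transcribe-rs dependency, bumped from the
# pinned 0.3.3 to the generic 0.3.8 and with the GPU ONNX features added.
# The actual table layout in Handy's Cargo.toml may differ.
[target.'cfg(target_os = "linux")'.dependencies]
transcribe-rs = { version = "0.3.8", features = ["whisper-vulkan", "ort-rocm", "ort-cuda"] }
```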
Plus i18n copy updates to clarify the acceleration settings UI (Whisper Acceleration vs ONNX Acceleration descriptions now explicitly say which models each affects).
Note on Cargo.lock: Not included, intentionally. The generic dep on line 72 is already 0.3.8, so Cargo already resolves Linux to 0.3.8 today. This PR's only net change is adding features, which Cargo.lock tracks separately without version churn. If the maintainer wants the lock regenerated, `cargo update -p transcribe-rs -p ort` produces the diff.

Note on build-time SDKs and CI: I traced through the ort-sys 2.0.0-rc.12 build script to confirm CI won't need new toolchains. Two cases:
- `ORT_LIB_LOCATION` is set (CI's Ubuntu 22.04 + macOS x86_64 paths): ort-sys does dynamic linking with `rustc-link-lib=onnxruntime` and returns immediately. No feature inspection, no SDK requirement.
- `ORT_LIB_LOCATION` is not set (other CI paths): ort-sys tries to download a prebuilt matching feature set `"cu12,rocm"`. No such combined prebuilt exists in ort's dist table, so it logs a warning and falls back to `(target, "none")`, the same CPU-only binary that builds use today.

Either path produces a working build. The feature flags only enable Rust code paths that register the EP APIs; they don't require CUDA or ROCm toolkits at build time.
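For a user on the system-library-override path this PR unblocks, the recipe looks roughly like this. `ORT_LIB_LOCATION` is ort-sys's env var for linking a preinstalled onnxruntime; the `/usr/lib` path is an assumption about where a distro package (e.g. Arch's onnxruntime-rocm) installs `libonnxruntime.so`, so adjust it for your system.

```shell
# Point ort-sys at the directory containing the system libonnxruntime.so
# instead of letting it download a CPU-only prebuilt. /usr/lib is an
# assumed install location; check your distro's package layout.
export ORT_LIB_LOCATION=/usr/lib

# Then build with the GPU feature enabled (run inside the repo checkout):
#   cargo build --release --features ort-rocm

echo "ORT_LIB_LOCATION=$ORT_LIB_LOCATION"
```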
Testing
- Arch's `onnxruntime-rocm` package

Auto-triggered CI checks verified locally before marking ready:
- `cargo fmt --check`
- `bun run lint`, `bun run format:check` (excluding upstream's pre-existing AGENTS.md prettier issue that fails on main too)
- `bun run check:translations`: all 20 languages have complete keys
- `bun run test:playwright`: passes

Nix build not verified locally (no Nix on my machine). Based on the build script analysis above, the nix-check workflow should pass.
Screenshots/Videos
N/A — no UI changes beyond copy updates to the acceleration settings descriptions.