
fix(linux): enable GPU acceleration for ONNX models#1298

Closed
aasmall wants to merge 1 commit into cjpais:main from aasmall:linux-gpu-acceleration

Conversation


@aasmall aasmall commented Apr 15, 2026

Before Submitting This PR

Human Written Description

On Arch/Omarchy, GPU acceleration isn't fully wired up: most of the plumbing exists, but without the transcribe-rs backend features it is non-functional. This PR narrowly enables the required features so that GPU acceleration works when the runtime requirements are already met.

Prior art and context

This is a well-known pain point with active work in progress. I want to be explicit about where this PR fits:

  • Discussion #494 — "Support for NVIDIA GPU on parakeet models" is the active thread. @cjpais most recently stated: "It's being worked on officially, the main issue is distribution and ci/cd pipelines." Another user (ballyhoo38923) built Handy with CUDA locally and reported a 3.2x speedup on RTX 4060 Ti (Parakeet on Windows).
  • PR #1058 (merged) is @cjpais's pattern for experimental accelerator selection — the OrtAcceleratorSetting enum in settings.rs already has Cuda, Rocm, and DirectMl variants and an "Experimental" settings group for the dropdown. This PR just enables the backend features so those existing variants become real on Linux.
  • PR #1203 "try ort cuda" is @cjpais's own open experiment. This PR does not conflict — try ort cuda #1203 is broader and will likely supersede or absorb the Linux-specific piece whenever it lands.
  • PRs #958 and #985 were earlier community GPU attempts closed in favour of CJ's own approach. I read the close comments — the recurring theme was CI/distribution, which I explicitly address below.

Scope of this PR (intentionally narrow)

This PR is the necessary but not sufficient change to enable GPU-accelerated ONNX inference on Linux. It enables the Rust code paths that register CUDA and ROCm execution providers, which unblocks users who already have a GPU-enabled onnxruntime on their system (e.g. Arch's onnxruntime-rocm package, NixOS overlays, from-source builds). Official release binaries will continue to ship a CPU-only onnxruntime — users of those see no change unless they also install a GPU-enabled onnxruntime separately.

This PR deliberately does not try to solve the bigger distribution question that @cjpais is actively working on (which onnxruntime variants to bundle, multi-artifact releases vs runtime downloads, AMD vs NVIDIA vs WebGPU, binary size trade-offs). It's a one-line change plus i18n copy that unblocks the system-library-override path without touching CI or distribution.
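The net effect of the feature flags can be sketched as follows. This is a hypothetical, self-contained illustration, not the actual transcribe-rs or ort code; the execution-provider names follow ONNX Runtime's conventions, and the boolean parameters stand in for the `ort-cuda`/`ort-rocm` Cargo features:

```rust
// Hypothetical sketch: which execution providers get registered, depending on
// which backend features were compiled in. CPU remains the always-present fallback.
fn available_execution_providers(ort_cuda: bool, ort_rocm: bool) -> Vec<&'static str> {
    let mut eps = Vec::new();
    if ort_cuda {
        eps.push("CUDAExecutionProvider");
    }
    if ort_rocm {
        eps.push("ROCmExecutionProvider");
    }
    // Without the features (both false), only the CPU provider is registered,
    // which is the pre-PR behaviour on Linux.
    eps.push("CPUExecutionProvider");
    eps
}

fn main() {
    println!("{:?}", available_execution_providers(true, true));
}
```

Whether a GPU provider actually initializes at runtime still depends on the installed onnxruntime library, which is exactly why this PR leaves official CPU-only binaries unaffected.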

Changes

One-line Cargo.toml change:

-transcribe-rs = { version = "0.3.3", features = ["whisper-vulkan"] }
+transcribe-rs = { version = "0.3.8", features = ["whisper-cpp", "onnx", "whisper-vulkan", "ort-rocm", "ort-cuda"] }

Plus i18n copy updates to clarify the acceleration settings UI (Whisper Acceleration vs ONNX Acceleration descriptions now explicitly say which models each affects).

Note on Cargo.lock: Not included — intentionally. The generic dep on line 72 is already 0.3.8, so Cargo already resolves Linux to 0.3.8 today. This PR's only net change is adding features, which Cargo.lock tracks separately without version churn. If the maintainer wants the lock regenerated, cargo update -p transcribe-rs -p ort produces the diff.

Note on build-time SDKs and CI: I traced through the ort-sys 2.0.0-rc.12 build script to confirm CI won't need new toolchains. Two cases:

  • When ORT_LIB_LOCATION is set (CI's Ubuntu 22.04 + macOS x86_64 paths), ort-sys does dynamic linking with rustc-link-lib=onnxruntime and returns immediately. No feature inspection, no SDK requirement.
  • When ORT_LIB_LOCATION is not set (other CI paths), ort-sys tries to download a prebuilt matching feature set "cu12,rocm". No such combined prebuilt exists in ort's dist table, so it logs a warning and falls back to (target, "none") — the same CPU-only binary that builds use today.

Either path produces a working build. The feature flags only enable Rust code paths that register the EP APIs; they don't require CUDA or ROCm toolkits at build time.
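The fallback behaviour in the second case can be sketched like this. The function and dist-table contents are illustrative stand-ins, not the real ort-sys internals; the key point is that an unmatched (target, feature set) pair degrades to the CPU-only "none" artifact instead of failing the build:

```rust
// Hypothetical sketch of ort-sys's prebuilt-selection fallback described above.
// The dist table here is illustrative: it has no combined "cu12,rocm" artifact,
// mirroring the real situation for Linux.
fn select_prebuilt(target: &str, feature_set: &str) -> (String, String) {
    const DIST: &[(&str, &str)] = &[
        ("x86_64-unknown-linux-gnu", "none"),
        ("x86_64-unknown-linux-gnu", "cu12"),
    ];
    if DIST.iter().any(|&(t, f)| t == target && f == feature_set) {
        (target.to_string(), feature_set.to_string())
    } else {
        // Log a warning and fall back to the CPU-only binary.
        eprintln!("warning: no prebuilt for ({target}, {feature_set}); falling back to \"none\"");
        (target.to_string(), "none".to_string())
    }
}

fn main() {
    let (target, features) = select_prebuilt("x86_64-unknown-linux-gnu", "cu12,rocm");
    println!("{target} {features}");
}
```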

Testing

  • Tested on Framework 13 AMD (Ryzen AI 300, Radeon 880M) running Arch Linux with Hyprland
  • ROCm-accelerated ONNX inference confirmed working via system onnxruntime-rocm package
  • Parakeet V2 model loads and transcribes using GPU acceleration (previously CPU-only)
  • Whisper models continue to work with Vulkan acceleration (no regression)

Auto-triggered CI checks verified locally before marking ready:

  • cargo fmt --check
  • bun run lint, bun run format:check (excluding upstream's pre-existing AGENTS.md prettier issue that fails on main too)
  • bun run check:translations — all 20 languages have complete keys
  • bun run test:playwright — passes

Nix build not verified locally (no Nix on my machine). Based on the build script analysis above, the nix-check workflow should pass.

Screenshots/Videos

N/A — no UI changes beyond copy updates to the acceleration settings descriptions.

AI Assistance

  • AI was used (please describe below)

If AI was used:

  • Tools used: Claude Code (Claude Opus 4.7), with code review from CodeRabbit, OpenAI Codex, and Grok
  • How extensively: AI identified the version/feature mismatch and wrote the one-line fix. Multiple rounds of independent AI code review. AI also traced through the ort-sys build script to verify CI compatibility (see "Note on build-time SDKs and CI" above) and surveyed existing PRs/discussions to position this PR relative to prior art. Human tested on real hardware and made the scope decisions.

Linux was pinned at transcribe-rs 0.3.3 with only whisper-vulkan, leaving
all ONNX models (Parakeet, Canary, Moonshine, SenseVoice) CPU-only while
macOS and Windows had GPU acceleration. Bumps to match the generic dep
version and adds ort-rocm + ort-cuda features for AMD and NVIDIA GPUs.

Also clarifies the accelerator description UI so users understand which
setting applies to Whisper vs ONNX models.
@aasmall aasmall force-pushed the linux-gpu-acceleration branch from d94235f to 74fe7d1 on April 17, 2026 19:09
@aasmall aasmall marked this pull request as ready for review April 17, 2026 23:55
cjpais (Owner) commented Apr 19, 2026

This is not going to get pulled in mainly because we're going to be deprecating Onnx in the future
