
fix(linux): enable GPU acceleration for ONNX models#1298

Closed
aasmall wants to merge 1 commit into cjpais:main from aasmall:linux-gpu-acceleration

Conversation


@aasmall aasmall commented Apr 15, 2026

Before Submitting This PR

Human Written Description

On Arch/Omarchy, GPU acceleration isn't fully wired up: most of the plumbing exists, but without the transcribe-rs backend features it is non-functional. This PR narrowly enables the required features so that GPU acceleration works when the runtime requirements are already met.

Prior art and context

This is a well-known pain point with active work in progress. I want to be explicit about where this PR fits:

  • Discussion #494 — "Support for NVIDIA GPU on parakeet models" is the active thread. @cjpais most recently stated: "It's being worked on officially, the main issue is distribution and ci/cd pipelines." Another user (ballyhoo38923) built Handy with CUDA locally and reported a 3.2x speedup on RTX 4060 Ti (Parakeet on Windows).
  • PR #1058 (merged) is @cjpais's pattern for experimental accelerator selection — the OrtAcceleratorSetting enum in settings.rs already has Cuda, Rocm, and DirectMl variants and an "Experimental" settings group for the dropdown. This PR just enables the backend features so those existing variants become real on Linux.
  • PR #1203 "try ort cuda" is @cjpais's own open experiment. This PR does not conflict — try ort cuda #1203 is broader and will likely supersede or absorb the Linux-specific piece whenever it lands.
  • PRs #958 and #985 were earlier community GPU attempts closed in favour of CJ's own approach. I read the close comments — the recurring theme was CI/distribution, which I explicitly address below.

Scope of this PR (intentionally narrow)

This PR is the necessary but not sufficient change to enable GPU-accelerated ONNX inference on Linux. It enables the Rust code paths that register CUDA and ROCm execution providers, which unblocks users who already have a GPU-enabled onnxruntime on their system (e.g. Arch's onnxruntime-rocm package, NixOS overlays, from-source builds). Official release binaries will continue to ship a CPU-only onnxruntime — users of those see no change unless they also install a GPU-enabled onnxruntime separately.

This PR deliberately does not try to solve the bigger distribution question that @cjpais is actively working on (which onnxruntime variants to bundle, multi-artifact releases vs runtime downloads, AMD vs NVIDIA vs WebGPU, binary size trade-offs). It's a one-line change plus i18n copy that unblocks the system-library-override path without touching CI or distribution.
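The net effect of the feature flags can be sketched as follows. This is a hypothetical, self-contained illustration, not the actual transcribe-rs or ort code; the execution-provider names follow ONNX Runtime's conventions, and the boolean parameters stand in for the `ort-cuda`/`ort-rocm` Cargo features:

```rust
// Hypothetical sketch: which execution providers get registered, depending on
// which backend features were compiled in. CPU remains the always-present fallback.
fn available_execution_providers(ort_cuda: bool, ort_rocm: bool) -> Vec<&'static str> {
    let mut eps = Vec::new();
    if ort_cuda {
        eps.push("CUDAExecutionProvider");
    }
    if ort_rocm {
        eps.push("ROCmExecutionProvider");
    }
    // Without the features (both false), only the CPU provider is registered,
    // which is the pre-PR behaviour on Linux.
    eps.push("CPUExecutionProvider");
    eps
}

fn main() {
    println!("{:?}", available_execution_providers(true, true));
}
```

Whether a GPU provider actually initializes at runtime still depends on the installed onnxruntime library, which is exactly why this PR leaves official CPU-only binaries unaffected.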

Changes

One-line Cargo.toml change:

-transcribe-rs = { version = "0.3.3", features = ["whisper-vulkan"] }
+transcribe-rs = { version = "0.3.8", features = ["whisper-cpp", "onnx", "whisper-vulkan", "ort-rocm", "ort-cuda"] }

Plus i18n copy updates to clarify the acceleration settings UI (Whisper Acceleration vs ONNX Acceleration descriptions now explicitly say which models each affects).

Note on Cargo.lock: Not included — intentionally. The generic dep on line 72 is already 0.3.8, so Cargo already resolves Linux to 0.3.8 today. This PR's only net change is adding features, which Cargo.lock tracks separately without version churn. If the maintainer wants the lock regenerated, cargo update -p transcribe-rs -p ort produces the diff.

Note on build-time SDKs and CI: I traced through the ort-sys 2.0.0-rc.12 build script to confirm CI won't need new toolchains. Two cases:

  • When ORT_LIB_LOCATION is set (CI's Ubuntu 22.04 + macOS x86_64 paths), ort-sys does dynamic linking with rustc-link-lib=onnxruntime and returns immediately. No feature inspection, no SDK requirement.
  • When ORT_LIB_LOCATION is not set (other CI paths), ort-sys tries to download a prebuilt matching feature set "cu12,rocm". No such combined prebuilt exists in ort's dist table, so it logs a warning and falls back to (target, "none") — the same CPU-only binary that builds use today.

Either path produces a working build. The feature flags only enable Rust code paths that register the EP APIs; they don't require CUDA or ROCm toolkits at build time.
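The fallback behaviour in the second case can be sketched like this. The function and dist-table contents are illustrative stand-ins, not the real ort-sys internals; the key point is that an unmatched (target, feature set) pair degrades to the CPU-only "none" artifact instead of failing the build:

```rust
// Hypothetical sketch of ort-sys's prebuilt-selection fallback described above.
// The dist table here is illustrative: it has no combined "cu12,rocm" artifact,
// mirroring the real situation for Linux.
fn select_prebuilt(target: &str, feature_set: &str) -> (String, String) {
    const DIST: &[(&str, &str)] = &[
        ("x86_64-unknown-linux-gnu", "none"),
        ("x86_64-unknown-linux-gnu", "cu12"),
    ];
    if DIST.iter().any(|&(t, f)| t == target && f == feature_set) {
        (target.to_string(), feature_set.to_string())
    } else {
        // Log a warning and fall back to the CPU-only binary.
        eprintln!("warning: no prebuilt for ({target}, {feature_set}); falling back to \"none\"");
        (target.to_string(), "none".to_string())
    }
}

fn main() {
    let (target, features) = select_prebuilt("x86_64-unknown-linux-gnu", "cu12,rocm");
    println!("{target} {features}");
}
```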

Testing

  • Tested on Framework 13 AMD (Ryzen AI 300, Radeon 880M) running Arch Linux with Hyprland
  • ROCm-accelerated ONNX inference confirmed working via system onnxruntime-rocm package
  • Parakeet V2 model loads and transcribes using GPU acceleration (previously CPU-only)
  • Whisper models continue to work with Vulkan acceleration (no regression)

Auto-triggered CI checks verified locally before marking ready:

  • cargo fmt --check
  • bun run lint, bun run format:check (excluding upstream's pre-existing AGENTS.md prettier issue that fails on main too)
  • bun run check:translations — all 20 languages have complete keys
  • bun run test:playwright — passes

Nix build not verified locally (no Nix on my machine). Based on the build script analysis above, the nix-check workflow should pass.

Screenshots/Videos

N/A — no UI changes beyond copy updates to the acceleration settings descriptions.

AI Assistance

  • AI was used (please describe below)

If AI was used:

  • Tools used: Claude Code (Claude Opus 4.7), with code review from CodeRabbit, OpenAI Codex, and Grok
  • How extensively: AI identified the version/feature mismatch and wrote the one-line fix. Multiple rounds of independent AI code review. AI also traced through the ort-sys build script to verify CI compatibility (see "Note on build-time SDKs and CI" above) and surveyed existing PRs/discussions to position this PR relative to prior art. Human tested on real hardware and made the scope decisions.

Linux was pinned at transcribe-rs 0.3.3 with only whisper-vulkan, leaving
all ONNX models (Parakeet, Canary, Moonshine, SenseVoice) CPU-only while
macOS and Windows had GPU acceleration. Bumps to match the generic dep
version and adds ort-rocm + ort-cuda features for AMD and NVIDIA GPUs.

Also clarifies the accelerator description UI so users understand which
setting applies to Whisper vs ONNX models.
@aasmall aasmall force-pushed the linux-gpu-acceleration branch from d94235f to 74fe7d1 on April 17, 2026 19:09
@aasmall aasmall marked this pull request as ready for review April 17, 2026 23:55
cjpais (Owner) commented Apr 19, 2026

This is not going to get pulled in mainly because we're going to be deprecating Onnx in the future
