feat: Add Sherpa ONNX backend for ASR and TTS #8523

Merged
mudler merged 1 commit into mudler:master from richiejp:feat/add-sherpa-backend on Apr 24, 2026

Conversation

@richiejp
Collaborator

@richiejp richiejp commented Feb 12, 2026

The Sherpa backend can handle... almost everything related to voice. So far we have VAD, ASR, and TTS. It should be relatively simple to add wake words, diarization, etc. However, I've reached a point where there is so much to test that I'm just going to add one model we don't already have (Omnilingual ASR), aim to get the backend initially merged, and then expand on testing.

Sherpa supports a lot of models we already have Python backends for, but at a fraction of the size because it is all based on ONNX. We also have ONNX backends already, but it's not clear that we have GPU acceleration for all of those.

@netlify

netlify Bot commented Feb 12, 2026

Deploy Preview for localai ready!

Name Link
🔨 Latest commit ffc5018
🔍 Latest deploy log https://app.netlify.com/projects/localai/deploys/69949097c49885000868f16b
😎 Deploy Preview https://deploy-preview-8523--localai.netlify.app

@richiejp richiejp force-pushed the feat/add-sherpa-backend branch from 5f176ef to f705e60 Compare February 13, 2026 17:06
@richiejp richiejp force-pushed the feat/add-sherpa-backend branch from f705e60 to c696f1c Compare February 17, 2026 09:09
@richiejp richiejp marked this pull request as ready for review February 17, 2026 09:19
@richiejp richiejp force-pushed the feat/add-sherpa-backend branch 6 times, most recently from 2c97d0b to ffc5018 Compare February 17, 2026 16:00
@mudler
Owner

mudler commented Apr 21, 2026

seems I've completely missed this, sorry @richiejp !

@richiejp
Collaborator Author

No problem, I put it on hold while I was blocked on testing other stuff, but I can reboot it now. The main issue with this backend is testing: it has a huge feature/API surface.

@richiejp richiejp force-pushed the feat/add-sherpa-backend branch from ffc5018 to 85339fa Compare April 21, 2026 11:52
@mudler
Owner

mudler commented Apr 21, 2026

> No problem, I put it on hold while I was blocked on testing other stuff, but I can reboot it now. The main issue with this backend is testing: it has a huge feature/API surface.

👍 I see, yep. It would make sense then to try pointing Claude at https://github.com/mudler/LocalAI/tree/master/tests/e2e-backends, as we already have a "small" e2e suite that exercises backends directly via gRPC. This basically skips all the API e2e tests and jumps straight to exercising the backend. It is usually very good at scaffolding tests. Worth a shot.

mudler
mudler previously approved these changes Apr 21, 2026
Comment thread backend/go/sherpa-onnx/backend_test.go Outdated
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)

func TestSherpaBackendStruct(t *testing.T) {
Owner

would be nice to use ginkgo here for consistency

"os/exec"
"path/filepath"
"strings"
"testing"
Owner

ditto here

Comment thread backend/go/sherpa-onnx/backend.go Outdated
package main

/*
#cgo LDFLAGS: -lsherpa-onnx-c-api -lonnxruntime -lstdc++
Owner

no purego here?

@richiejp richiejp force-pushed the feat/add-sherpa-backend branch 5 times, most recently from 29c5906 to ad9f1c7 Compare April 24, 2026 10:42
Adds a new Go backend wrapping sherpa-onnx via purego (no cgo). Same
approach as opus/stablediffusion-ggml/whisper — a thin C shim
(csrc/shim.c + shim.h → libsherpa-shim.so) wraps the bits purego
can't reach directly: nested struct config writes, result-struct field
reads, and the streaming TTS callback trampoline. The Go side uses
opaque uintptr handles and purego.NewCallback for the TTS callback.
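
A minimal sketch of the opaque-handle pattern described above, not the PR's actual code. The `shim` struct holds function variables of the kind `purego.RegisterLibFunc` would bind to symbols in the shim library; the names `createRecognizer` and `destroyRecognizer` are hypothetical, and `fakeShim` is an in-memory stand-in so the example runs without the shared library.

```go
package main

import (
	"errors"
	"fmt"
)

// shim holds function variables of the kind purego.RegisterLibFunc would
// bind to exported C symbols. Both names are hypothetical, not the PR's.
type shim struct {
	createRecognizer  func(modelPath string, sampleRate int32) uintptr
	destroyRecognizer func(handle uintptr)
}

// Recognizer wraps an opaque handle. Go never dereferences it; it only
// passes the value back to the shim.
type Recognizer struct {
	lib    *shim
	handle uintptr
}

// NewRecognizer asks the shim for a handle and treats 0 as failure.
func NewRecognizer(lib *shim, modelPath string, sampleRate int32) (*Recognizer, error) {
	h := lib.createRecognizer(modelPath, sampleRate)
	if h == 0 {
		return nil, errors.New("shim returned a null handle")
	}
	return &Recognizer{lib: lib, handle: h}, nil
}

// Close releases the handle once; later calls are no-ops.
func (r *Recognizer) Close() {
	if r.handle != 0 {
		r.lib.destroyRecognizer(r.handle)
		r.handle = 0
	}
}

// fakeShim is an in-memory stand-in for the C library (assumption for the
// sketch). It records live handles in the supplied map.
func fakeShim(live map[uintptr]bool) *shim {
	next := uintptr(1)
	return &shim{
		createRecognizer: func(modelPath string, _ int32) uintptr {
			if modelPath == "" {
				return 0 // simulate a failed model load
			}
			h := next
			next++
			live[h] = true
			return h
		},
		destroyRecognizer: func(h uintptr) { delete(live, h) },
	}
}

func main() {
	live := map[uintptr]bool{}
	rec, err := NewRecognizer(fakeShim(live), "model.onnx", 16000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("live handles after create:", len(live))
	rec.Close()
	fmt.Println("live handles after close:", len(live))
}
```

Keeping the handle as a plain `uintptr` means the Go side never needs to know the layout of the C structs, which is the job the commit message assigns to the shim.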

Supports:
- VAD via sherpa-onnx's Silero VAD
- Offline ASR: Whisper, Paraformer, SenseVoice, Omnilingual CTC
- Online/streaming ASR: zipformer transducer with endpoint detection
  (AudioTranscriptionStream emits delta events during decode)
- Offline TTS: VITS (LJS, etc.)
- Streaming TTS: sherpa-onnx's callback API → PCM chunks on a channel,
  prefixed by a streaming WAV header
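
The streaming TTS item above can be sketched roughly as follows. This is an illustration under assumptions, not the backend's implementation: the header writes 0xFFFFFFFF into both size fields (one common convention when the total length is unknown), samples are assumed to be mono float32 converted to 16-bit PCM, and the `generate` parameter is a hypothetical stand-in for the engine invoking its callback.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// streamingWAVHeader builds a 44-byte 16-bit PCM WAV header. Both size
// fields are set to 0xFFFFFFFF because the total length is not known
// while streaming (an assumed convention, not taken from the PR).
func streamingWAVHeader(sampleRate uint32, channels uint16) []byte {
	const bitsPerSample = 16
	blockAlign := channels * bitsPerSample / 8
	byteRate := sampleRate * uint32(blockAlign)

	var buf bytes.Buffer
	le := binary.LittleEndian
	buf.WriteString("RIFF")
	binary.Write(&buf, le, uint32(0xFFFFFFFF)) // RIFF size: unknown
	buf.WriteString("WAVE")
	buf.WriteString("fmt ")
	binary.Write(&buf, le, uint32(16)) // fmt chunk size
	binary.Write(&buf, le, uint16(1))  // audio format: PCM
	binary.Write(&buf, le, channels)
	binary.Write(&buf, le, sampleRate)
	binary.Write(&buf, le, byteRate)
	binary.Write(&buf, le, blockAlign)
	binary.Write(&buf, le, uint16(bitsPerSample))
	buf.WriteString("data")
	binary.Write(&buf, le, uint32(0xFFFFFFFF)) // data size: unknown
	return buf.Bytes()
}

// floatToPCM16 clamps samples to [-1, 1] and encodes them as
// little-endian signed 16-bit PCM.
func floatToPCM16(samples []float32) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		if s > 1 {
			s = 1
		} else if s < -1 {
			s = -1
		}
		binary.LittleEndian.PutUint16(out[2*i:], uint16(int16(s*32767)))
	}
	return out
}

// streamTTS sends the header first, then one PCM chunk per callback
// invocation. generate is a hypothetical stand-in for the TTS engine.
func streamTTS(sampleRate uint32, generate func(onChunk func([]float32))) <-chan []byte {
	out := make(chan []byte)
	go func() {
		defer close(out)
		out <- streamingWAVHeader(sampleRate, 1)
		generate(func(samples []float32) {
			out <- floatToPCM16(samples)
		})
	}()
	return out
}

func main() {
	fake := func(onChunk func([]float32)) {
		onChunk([]float32{0, 0.5})
		onChunk([]float32{-1})
	}
	for chunk := range streamTTS(22050, fake) {
		fmt.Println("chunk bytes:", len(chunk))
	}
}
```

Because the channel is unbuffered, the callback blocks until the consumer has read the previous chunk, which gives simple backpressure; a real backend might prefer a buffered channel or a cancellation path.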

Gallery entries: omnilingual-0.3b-ctc-q8-sherpa (1600-language offline
ASR), streaming-zipformer-en-sherpa (low-latency streaming ASR),
silero-vad-sherpa, vits-ljs-sherpa.

E2E coverage: tests/e2e-backends for offline + streaming ASR,
tests/e2e for the full realtime pipeline (VAD + STT + TTS).

Assisted-by: claude-opus-4-7-1M [Claude Code]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
@richiejp richiejp force-pushed the feat/add-sherpa-backend branch from ad9f1c7 to fe68e6a Compare April 24, 2026 11:39
@richiejp
Collaborator Author

> > No problem, I put it on hold while I was blocked on testing other stuff, but I can reboot it now. The main issue with this backend is testing: it has a huge feature/API surface.

> 👍 I see, yep. It would make sense then to try pointing Claude at https://github.com/mudler/LocalAI/tree/master/tests/e2e-backends, as we already have a "small" e2e suite that exercises backends directly via gRPC. This basically skips all the API e2e tests and jumps straight to exercising the backend. It is usually very good at scaffolding tests. Worth a shot.

I think it should be using e2e-backends now for gRPC-level tests. There are also e2e tests based on the realtime API and lower-level tests in the backend source, so we have a three-tier approach.

Hopefully this is ready to go now. Next I'd want to use the real streaming for ASR and TTS in the realtime API. There are also still features in Sherpa that haven't been exposed.

I have to say, though, that I am not a huge fan of ONNX. It seems harder to package than GGML-based backends if you want all of the GPUs to work, and I haven't even tried that here; I'm only using CUDA and CPU.

@mudler mudler merged commit 13734ae into mudler:master Apr 24, 2026
17 checks passed
@mudler mudler added the enhancement New feature or request label Apr 24, 2026

2 participants