
fix(stage-ui): prevent Kokoro fp32-webgpu hang and fix STT mic device enumeration#1638

Open
ENTWOPY wants to merge 4 commits into moeru-ai:main from ENTWOPY:claude/happy-lalande



ENTWOPY (Contributor) commented Apr 11, 2026

Summary

  • Kokoro TTS no longer hangs on first visit: getDefaultKokoroModel now always returns q4f16 (WASM, ~320 MB) instead of fp32-webgpu (~700 MB). The WebGPU model made the settings page unresponsive on first load while the worker silently attempted a large background download. Users who want the full-precision WebGPU model can still select it manually.
  • Existing fp32-webgpu saves are migrated: kokoro-local.vue now also migrates any already-saved fp32-webgpu config to q4f16 on mount, so returning users are unblocked without having to reset their settings manually.
  • STT mic device dropdown no longer appears empty: hearing.vue now calls askPermission() in onMounted, so the browser permission dialog fires as soon as the user opens the Hearing page. Previously the dropdown stayed empty until the user manually triggered device enumeration, which made STT look completely broken.
  • Auto-selected device is now persisted: audio-device.ts adds a reverse watcher. When the useAudioDevice composable auto-selects a default mic after permission is granted, that selection is written back to localStorage. Without this, the dropdown and the audio stream could fall out of sync across reloads.
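A minimal sketch of the default-plus-migration behaviour described in the first two bullets. Only getDefaultKokoroModel, its hasWebGPU argument, and the `!config.model || config.model === 'fp32-webgpu'` condition come from this PR; the KokoroConfig shape, migrateKokoroConfig, and its "return true if the caller should persist" contract are hypothetical.

```typescript
// Model IDs taken from the PR description.
type KokoroModel = 'q4f16' | 'fp32-webgpu'

// Hypothetical persisted config shape.
interface KokoroConfig {
  model?: KokoroModel
}

// Always prefer the smaller WASM model. The argument is kept because
// call sites still pass hasWebGPU, but it no longer changes the result.
function getDefaultKokoroModel(_hasWebGPU: boolean): KokoroModel {
  return 'q4f16'
}

// Hypothetical mount-time migration: fill a missing model and downgrade a
// saved fp32-webgpu selection. Returns true when the caller should persist.
function migrateKokoroConfig(config: KokoroConfig, hasWebGPU: boolean): boolean {
  if (!config.model || config.model === 'fp32-webgpu') {
    config.model = getDefaultKokoroModel(hasWebGPU)
    return true
  }
  return false
}

// Usage: a returning user with the large model saved is moved to q4f16.
const saved: KokoroConfig = { model: 'fp32-webgpu' }
migrateKokoroConfig(saved, true) // saved.model is now 'q4f16'
```

Note that this condition fires on every mount, which is exactly the persistence concern the Codex review raises further down.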

Test plan

  • Open Settings → Providers → Speech → Kokoro TTS (Local) on a fresh profile (no saved model) — page should load without hanging and q4f16 should be selected automatically
  • Open the same page with fp32-webgpu previously saved in localStorage — page should migrate to q4f16 on mount instead of hanging
  • Open Settings → Modules → Hearing — browser should prompt for microphone permission immediately; after granting, the audio input dropdown should be populated
  • Reload the page after granting mic permission — the previously auto-selected device should still be selected in the dropdown

🤖 Generated with Claude Code

ENTWOPY added 3 commits April 11, 2026 01:49
On first visit to the Kokoro TTS settings page, config.model is
undefined so validateProviderConfig fails and the model/voices
never load. Auto-save the hardware-appropriate default (q4f16 for
WASM, fp32-webgpu for WebGPU) before validation so the model
initialises correctly without requiring manual selection.
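The ordering fix in this first commit (save a default before validating) can be sketched as below. validateProviderConfig is named in the commit, but its real signature is not shown, so the stand-in here only checks that a model is set; hardwareDefault and initKokoroConfig are hypothetical names. hardwareDefault reflects this commit's WebGPU-dependent choice, which the later commit replaces with an unconditional q4f16.

```typescript
type KokoroModel = 'q4f16' | 'fp32-webgpu'

interface KokoroConfig {
  model?: KokoroModel
}

// Behaviour as described in this commit: fp32-webgpu on WebGPU, q4f16 on WASM.
function hardwareDefault(hasWebGPU: boolean): KokoroModel {
  return hasWebGPU ? 'fp32-webgpu' : 'q4f16'
}

// Stand-in for the real validator: passes only when a model is configured.
function validateProviderConfig(config: KokoroConfig): boolean {
  return config.model !== undefined
}

// Fill and persist the default *before* validating, so a first visit
// (config.model === undefined) no longer fails validation.
function initKokoroConfig(
  config: KokoroConfig,
  hasWebGPU: boolean,
  save: (c: KokoroConfig) => void,
): boolean {
  if (!config.model) {
    config.model = hardwareDefault(hasWebGPU)
    save(config)
  }
  return validateProviderConfig(config)
}
```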
…eration

- kokoro/constants: always default to q4f16 (WASM, ~320MB) instead of
  fp32-webgpu (~700MB) to avoid indefinite page hang on first visit;
  WebGPU model is still selectable manually via the dropdown
- kokoro-local.vue: also migrate existing saved fp32-webgpu config to
  q4f16 on page mount so returning users are unblocked without a reset
- audio-device.ts: add reverse watcher so when the composable
  auto-selects a default device after permission is granted the
  selected device ID is written back to persisted storage, keeping the
  dropdown and stream in sync across reloads
- hearing.vue: call askPermission() in onMounted so the browser
  permission dialog fires immediately and the audio input dropdown is
  populated when the user first opens the Hearing settings page
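The reverse-watcher decision in audio-device.ts can be modeled as a pure function. This is a framework-free sketch, not the PR's code: resolveInputDevice, MediaDeviceLike, and the fallback rule (prefer an entry with deviceId 'default', which Chromium-based browsers list, otherwise the first audio input) are assumptions. In the Vue code, a watcher would call something like this after enumeration and write to localStorage only when shouldPersist is true.

```typescript
// Subset of MediaDeviceInfo needed here.
interface MediaDeviceLike {
  deviceId: string
  kind: string
}

interface Resolution {
  deviceId: string | undefined
  shouldPersist: boolean // true when the auto-selection must be written back
}

// Called after permission is granted and devices are enumerated.
function resolveInputDevice(
  persistedId: string | undefined,
  devices: MediaDeviceLike[],
): Resolution {
  const inputs = devices.filter(d => d.kind === 'audioinput')

  // Keep the saved device if it still exists; nothing to write back.
  if (persistedId && inputs.some(d => d.deviceId === persistedId))
    return { deviceId: persistedId, shouldPersist: false }

  // Otherwise auto-select and ask the caller to persist the choice, so the
  // dropdown and the stream agree after a reload.
  const fallback = inputs.find(d => d.deviceId === 'default') ?? inputs[0]
  return { deviceId: fallback?.deviceId, shouldPersist: fallback !== undefined }
}
```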

gemini-code-assist (bot) left a comment


Code Review

This pull request improves the initialization of audio devices and optimizes the default Kokoro model selection to prevent performance issues. It now requests microphone permissions on mount to populate the device list immediately and defaults to the lighter 'q4f16' model to avoid long downloads. Feedback was provided to ensure that errors during the permission request are surfaced to the user, as the current implementation silently catches them, which could leave the user confused if the device list remains empty.

Comment on lines +476 to +478
askPermission().catch(() => {
// Permission denied — the dropdown will remain empty and the user will see a warning.
})

medium priority

The askPermission() call catches errors but doesn't surface them to the user, despite the comment stating that the user will see a warning. If microphone permission is denied, the audio input dropdown will remain empty without any feedback, which can be confusing for users. It's better to capture the error and display it using the existing error ref so the user understands why the device list is empty.

  askPermission().catch((err) => {
    // Permission denied — the dropdown will remain empty.
    // We surface the error so the user knows why devices are missing.
    error.value = err instanceof Error ? err.message : String(err)
  })
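The suggestion can be run outside Vue with a plain object standing in for the error ref. This sketch injects askPermission as a parameter (in hearing.vue it comes from the audio-device composable); ErrorRef, toErrorMessage, and requestMicOnMount are hypothetical names. The message conversion matches the suggestion above.

```typescript
// Minimal stand-in for a Vue ref<string | undefined>.
interface ErrorRef {
  value: string | undefined
}

// Same conversion as the suggested catch handler.
function toErrorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err)
}

// Request mic access on mount and surface a denial instead of swallowing it.
async function requestMicOnMount(
  askPermission: () => Promise<void>,
  error: ErrorRef,
): Promise<void> {
  error.value = undefined
  try {
    await askPermission()
  }
  catch (err) {
    // Explains to the user why the device list is empty.
    error.value = toErrorMessage(err)
  }
}
```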


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b4128d768

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.


Comment on lines +115 to +116
if (!config.model || config.model === 'fp32-webgpu') {
config.model = getDefaultKokoroModel(hasWebGPU.value)

P2: Preserve user-selected fp32-webgpu model

This condition rewrites config.model from fp32-webgpu to q4f16 on every page mount, so users who explicitly choose the WebGPU model cannot persist that choice across navigation/reload. In practice, the app will silently revert them each time they revisit Kokoro settings, which breaks expected settings persistence and makes manual opt-in effectively non-sticky.
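One way to address this, sketched here and not part of the PR, is to run the fp32-webgpu downgrade only once per saved config. The kokoroMigrationVersion field and migrateOnce are hypothetical; the point is that a marker records that the migration already happened, so a later manual opt-in to fp32-webgpu survives navigation and reload.

```typescript
type KokoroModel = 'q4f16' | 'fp32-webgpu'

// Hypothetical persisted shape with a migration marker.
interface KokoroConfig {
  model?: KokoroModel
  kokoroMigrationVersion?: number
}

const CURRENT_MIGRATION = 1

// Returns true when the config changed and should be saved.
function migrateOnce(config: KokoroConfig): boolean {
  let changed = false
  if (!config.model) {
    config.model = 'q4f16'
    changed = true
  }
  // Downgrade legacy fp32-webgpu saves only if this migration never ran.
  if ((config.kokoroMigrationVersion ?? 0) < CURRENT_MIGRATION) {
    if (config.model === 'fp32-webgpu')
      config.model = 'q4f16'
    config.kokoroMigrationVersion = CURRENT_MIGRATION
    changed = true
  }
  return changed
}
```

The trade-off is one extra persisted field in exchange for making the downgrade a one-time fix rather than a rule re-applied on every mount.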


Add pattern to TOOLS_RELATED_ERROR_PATTERNS matching Groq's 400 response
"property 'X' is unsupported" — triggered by OpenAI-specific tool params
(e.g. capture_tool_errors) that the xsai library sends unconditionally.
On first failure airi now auto-retries without tools instead of surfacing
a raw 400 error to the user.
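A sketch of the pattern-plus-retry flow this commit describes. TOOLS_RELATED_ERROR_PATTERNS is named in the commit, but its existing entries are not shown, so the array below contains only the new Groq pattern; isToolsRelatedError, ChatRequest, and sendWithToolFallback are hypothetical stand-ins for airi's real request path.

```typescript
// Only the pattern added by this commit; other entries are omitted.
const TOOLS_RELATED_ERROR_PATTERNS: RegExp[] = [
  /property '[^']+' is unsupported/i, // Groq 400 for OpenAI-specific tool params
]

function isToolsRelatedError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err)
  return TOOLS_RELATED_ERROR_PATTERNS.some(p => p.test(message))
}

// Hypothetical request shape.
interface ChatRequest {
  messages: string[]
  tools?: unknown[]
}

// Try with tools; on a tools-related failure, retry once without them
// instead of surfacing the raw 400.
async function sendWithToolFallback<T>(
  req: ChatRequest,
  send: (r: ChatRequest) => Promise<T>,
): Promise<T> {
  try {
    return await send(req)
  }
  catch (err) {
    if (!req.tools || !isToolsRelatedError(err))
      throw err
    const withoutTools: ChatRequest = { messages: req.messages }
    return await send(withoutTools)
  }
}
```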
nekomeowww changed the title fix(tts/stt): prevent Kokoro fp32-webgpu hang and fix STT mic device enumeration fix(stage-ui): prevent Kokoro fp32-webgpu hang and fix STT mic device enumeration Apr 13, 2026
nekomeowww (Member) commented:

A rebase is needed; the branch has conflicts.

nekomeowww added the following labels on Apr 13, 2026:

  • bug/providers: Some providers aren't working
  • pr-review/ok-to-merge: Pull Request that looks good to maintainers, equivalent to LGTM
  • pr-review/hold/needs-rebase: Pull Request that needs rebase to either main branch or specific branch
  • pr-review/ok-to-deploy: Pull Request that confirmed to be deploy to either Preview or Prod safe
  • scope/audio-output: Scope related to audio output (TTS, Voice cloning, etc.)
