
feat(visual-chat): harden vision-text pipeline and packaged desktop runtime#1579

Open
Joker-of-Gotham wants to merge 9 commits into moeru-ai:main from Joker-of-Gotham:joker-of-gotham/feature/ai-visual-chat-multisource-realtime-pipeline

@Joker-of-Gotham commented on Apr 5, 2026

Description

This PR hardens the AIRI visual chat stack and aligns the shipped product around a secure vision+text realtime pipeline. It removes the remaining workspace-only assumptions from the desktop runtime path, tightens session access, and updates the docs / diagnostics / devtools to match the actual ollama-lite worker behavior.

What this PR adds

  • Session-scoped and local-admin access tokens for visual chat gateway access
  • Unified sessionId validation across gateway routes, websocket subscription, and storage paths
  • Protected room token issuance, diagnostics, and worker proxy surfaces
  • Packaged desktop runtime startup for bundled visual-chat gateway / worker entries
  • App-data-backed tunnel config, public endpoint files, and cloudflared cache paths
  • Safer visual-chat stop behavior that avoids blind port-based process killing
  • A clearer shipped interaction mode name: vision-text-realtime
  • Worker health / diagnostics / devtools / docs alignment with the fixed ollama-lite path
  • Apple Silicon Ollama discovery improvement and Windows ARM64 cloudflared download support

Technical path

This implementation keeps the current shipped runtime explicitly focused on realtime latest-frame vision plus typed text prompts. The gateway now derives stable local access tokens from a persisted secret, enforces them across session routes and websocket flows, and the desktop runtime can now resolve either a development workspace or a packaged runtime path.
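A minimal sketch of how stable session-scoped and local-admin tokens could be derived from one persisted secret, assuming HMAC-SHA256 over a purpose label and a 32-byte random secret stored as hex in an app-data file. The function names and label strings are hypothetical; the PR description does not show the actual derivation.

```typescript
import { Buffer } from 'node:buffer'
import { createHmac, randomBytes } from 'node:crypto'
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs'
import { dirname } from 'node:path'

// Hypothetical: load the gateway secret from app data, creating it on first run.
export function loadOrCreateGatewaySecret(secretFile: string): Buffer {
  if (existsSync(secretFile))
    return Buffer.from(readFileSync(secretFile, 'utf8').trim(), 'hex')
  const secret = randomBytes(32)
  mkdirSync(dirname(secretFile), { recursive: true })
  // mode 0o600 limits the file to the current user on POSIX systems
  writeFileSync(secretFile, secret.toString('hex'), { mode: 0o600 })
  return secret
}

// The same secret and label always produce the same token, so nothing else needs persisting.
function deriveToken(secret: Buffer, label: string): string {
  return createHmac('sha256', secret).update(label).digest('base64url')
}

export function deriveSessionToken(secret: Buffer, sessionId: string): string {
  return deriveToken(secret, `session:${sessionId}`)
}

export function deriveLocalAdminToken(secret: Buffer): string {
  return deriveToken(secret, 'local-admin')
}
```

Because tokens are recomputed rather than stored, rotating the secret file would revoke every issued token at once.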

Key design points:

  • session-scoped access for phone/shared clients
  • local-admin access for diagnostics, session management, and worker proxy routes
  • fail-closed desktop setup when neither workspace nor packaged runtime is available
  • packaged runtime startup via bundled visual-chat service dist entries
  • visual-chat runtime artifacts moved out of the repository root and into app-data locations
  • docs and devtools updated to stop advertising native duplex audio in ollama-lite mode
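The fail-closed choice between a development workspace and a packaged runtime could look roughly like this. The directory layout (<root>/visual-chat-gateway/dist/index.mjs and so on) and all names are assumptions for illustration; in a packaged Electron app, resourcesRoot would typically come from process.resourcesPath.

```typescript
import { existsSync } from 'node:fs'
import { join } from 'node:path'

export interface VisualChatRuntime {
  kind: 'workspace' | 'packaged'
  gatewayEntry: string
  workerEntry: string
}

// Hypothetical layout: both service entries must exist for a root to count as a runtime.
function runtimeAt(kind: VisualChatRuntime['kind'], root: string): VisualChatRuntime | null {
  const gatewayEntry = join(root, 'visual-chat-gateway', 'dist', 'index.mjs')
  const workerEntry = join(root, 'visual-chat-worker', 'dist', 'index.mjs')
  return existsSync(gatewayEntry) && existsSync(workerEntry)
    ? { kind, gatewayEntry, workerEntry }
    : null
}

// Prefer the dev workspace, fall back to packaged resources, otherwise refuse to start.
export function resolveVisualChatRuntime(opts: { workspaceRoot?: string, resourcesRoot?: string }): VisualChatRuntime {
  const runtime = (opts.workspaceRoot && runtimeAt('workspace', opts.workspaceRoot))
    || (opts.resourcesRoot && runtimeAt('packaged', opts.resourcesRoot))
  if (!runtime)
    throw new Error('Visual chat runtime not found: neither a workspace nor a packaged runtime is available')
  return runtime
}
```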

Linked Issues

N/A

Additional Context

Validation / Checks

  • pnpm install
  • pnpm -F @proj-airi/electron-eventa build
  • pnpm -F @proj-airi/stage-tamagotchi run typecheck:node
  • pnpm -F @proj-airi/visual-chat-ops doctor:visual-chat
  • pnpm -F @proj-airi/stage-tamagotchi build
  • pnpm -F @proj-airi/visual-chat-ops start:local
  • targeted pnpm exec moeru-lint --fix ... on touched visual-chat / desktop-runtime paths

Reviewer focus

  • Gateway auth / websocket subscription enforcement
  • Packaged desktop runtime startup in stage-tamagotchi
  • App-data path migration for tunnel and public endpoint artifacts
  • Docs / diagnostics / devtools alignment with the fixed ollama-lite pipeline

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new Visual Chat feature, adding a comprehensive architecture for real-time multimodal interaction. It includes a gateway service, an inference worker bridge, and integration with the AIRI ecosystem. My review identified a potential issue with process group termination in the development script, a need for better error feedback in the WebSocket subscription handler, and a missing timeout mechanism for user-triggered inference requests.

Comment on lines +12 to +17
const child = spawn('pnpm', ['exec', 'electron-vite', 'dev'], {
  cwd: process.cwd(),
  env,
  stdio: 'inherit',
  shell: true,
})

medium

On non-Windows platforms, process.kill(-child.pid!, 'SIGTERM') is used to terminate the entire process group. For this to work, the child must be spawned as a process group leader with the detached: true option. Without it, the child stays in the parent's process group, so no group with ID child.pid exists and the call typically fails with ESRCH, leaving the child tree running.

Suggested change
   const child = spawn('pnpm', ['exec', 'electron-vite', 'dev'], {
     cwd: process.cwd(),
     env,
     stdio: 'inherit',
     shell: true,
+    detached: process.platform !== 'win32',
   })
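A sketch of a cross-platform tree kill to pair with the detached spawn, assuming taskkill is acceptable on Windows (where negative PIDs and POSIX process groups do not apply). planTreeKill and terminateTree are hypothetical names; only the Node child_process APIs and the taskkill /PID, /T and /F flags are real.

```typescript
import type { ChildProcess } from 'node:child_process'
import { spawnSync } from 'node:child_process'

type TreeKillPlan =
  | { command: 'taskkill', args: string[] }
  | { command: 'signal', target: number }

// Decide how to terminate a child and everything it started on the given platform.
export function planTreeKill(pid: number, platform: string): TreeKillPlan {
  if (platform === 'win32')
    return { command: 'taskkill', args: ['/PID', String(pid), '/T', '/F'] }
  // A negative PID targets the process group whose ID equals pid; that group only
  // exists if the child was spawned with detached: true (i.e. as a group leader)
  return { command: 'signal', target: -pid }
}

export function terminateTree(child: ChildProcess): void {
  if (child.pid === undefined)
    return
  const plan = planTreeKill(child.pid, process.platform)
  if (plan.command === 'taskkill') {
    // /T also terminates child processes, /F forces termination
    spawnSync('taskkill', plan.args, { stdio: 'ignore' })
    return
  }
  try {
    process.kill(plan.target, 'SIGTERM')
  }
  catch (error) {
    // ESRCH means the group has already exited; anything else is unexpected
    if ((error as { code?: string }).code !== 'ESRCH')
      throw error
  }
}
```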

Comment on lines +31 to +32
if (!data.sessionToken || !options.authorizeSessionAccess?.(data.sessionId, data.sessionToken))
  return

medium

When a subscription request fails authorization, the message is silently ignored. It would be better to provide feedback to the client by sending an error event. Per repository rules, ensure the event includes a programmatic 'source' identifier for better extensibility.

        if (data.type === 'subscribe' && data.sessionId) {
          if (!data.sessionToken || !options.authorizeSessionAccess?.(data.sessionId, data.sessionToken)) {
            // Reply with an error event instead of silently dropping the request
            peer.send(JSON.stringify({ event: 'error', source: 'subscription_handler', sessionId: data.sessionId, data: { message: 'Unauthorized session access' }, timestamp: Date.now() }))
            return
          }
          // ...existing subscription handling continues here
        }
References
  1. For extensibility, identify the source of an event using a programmatic identifier (e.g., a source field in metadata) rather than relying on string patterns within the message content.
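A possible shape for such an error event, plus a client check that branches on the programmatic source rather than on the message text. The interface and helper names are hypothetical; only the field names come from the suggestion above.

```typescript
// Hypothetical shape of a gateway error event; field names follow the review suggestion.
export interface VisualChatErrorEvent {
  event: 'error'
  // Programmatic origin of the event, e.g. 'subscription_handler'
  source: string
  sessionId?: string
  data: { message: string }
  timestamp: number
}

export function createErrorEvent(source: string, message: string, sessionId?: string): VisualChatErrorEvent {
  return { event: 'error', source, sessionId, data: { message }, timestamp: Date.now() }
}

// Client side: decide by `source`, never by matching the human-readable message.
export function isSubscriptionError(raw: string): boolean {
  const parsed = JSON.parse(raw) as Partial<VisualChatErrorEvent>
  return parsed.event === 'error' && parsed.source === 'subscription_handler'
}
```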

Comment on lines +721 to +729
const timeoutId = auto
  ? setTimeout(() => {
    if (!abortController.signal.aborted) {
      state.stats.timedOut++
      log.withTag('realtime').log(`[trace:${traceId}] Inference timed out after ${AUTO_OBSERVE_INFERENCE_TIMEOUT_MS}ms`)
      abortController.abort()
    }
  }, AUTO_OBSERVE_INFERENCE_TIMEOUT_MS)
  : null

medium

User-triggered inference requests (auto: false) currently have no timeout mechanism. If the worker bridge or the upstream Ollama service hangs, the gateway's request will remain pending indefinitely, potentially leading to resource exhaustion or a degraded user experience. Consider applying a default timeout for all inference requests.

    const timeoutId = setTimeout(() => {
      if (!abortController.signal.aborted) {
        // Only auto-observe timeouts are counted in the timedOut stat
        if (auto)
          state.stats.timedOut++
        log.withTag('realtime').log(`[trace:${traceId}] Inference timed out after ${AUTO_OBSERVE_INFERENCE_TIMEOUT_MS}ms`)
        abortController.abort()
      }
    }, AUTO_OBSERVE_INFERENCE_TIMEOUT_MS)
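The suggestion does not show where the timer is cleared. A minimal sketch of a timeout wrapper that always clears it, assuming the inference call accepts an AbortSignal (for example by forwarding it to fetch); runWithTimeout is a hypothetical helper, not part of the PR.

```typescript
// Run an abortable task with a deadline; the timer is cleared however the task settles.
export async function runWithTimeout<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
  onTimeout?: () => void,
): Promise<T> {
  const controller = new AbortController()
  const timeoutId = setTimeout(() => {
    if (!controller.signal.aborted) {
      onTimeout?.() // e.g. bump stats and log the trace id
      controller.abort()
    }
  }, timeoutMs)
  try {
    return await task(controller.signal)
  }
  finally {
    clearTimeout(timeoutId)
  }
}
```

A task that ignores the signal keeps running after the deadline; the abort only helps if the worker bridge actually forwards it.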

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25761c705e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


router.post('/api/sessions/:sessionId/access', defineEventHandler((event) => {
  requireGatewayAccess(event)
  const sessionId = normalizeVisualChatSessionId(getRouterParam(event, 'sessionId')!)

P2: Handle invalid session IDs before normalizing in the access route

normalizeVisualChatSessionId(...) throws for malformed IDs, but this handler does not catch that exception, so a bad :sessionId currently bubbles out as a 500 instead of a 4xx client error. That makes malformed requests look like server failures and breaks the "unified sessionId validation" behavior expected by callers (the same uncaught pattern is also used in the restore route).
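One way to keep malformed IDs on the 4xx path is to convert the normalizer's exception into a client error at the route boundary. In h3 that would be throw createError({ statusCode: 400, statusMessage }); to stay dependency-free, this sketch uses a local HttpError class, and normalizeVisualChatSessionId here is a hypothetical stand-in with an assumed ID format.

```typescript
// Hypothetical stand-in for the PR's normalizer; the allowed ID format is an assumption.
const SESSION_ID_PATTERN = /^[\w-]{1,64}$/

export function normalizeVisualChatSessionId(raw: string): string {
  if (!SESSION_ID_PATTERN.test(raw))
    throw new Error(`Invalid visual chat sessionId: ${raw}`)
  return raw
}

export class HttpError extends Error {
  statusCode: number
  constructor(statusCode: number, message: string) {
    super(message)
    this.name = 'HttpError'
    this.statusCode = statusCode
  }
}

// Route-boundary helper: every missing or malformed ID becomes a 400, never a 500.
export function parseSessionIdParam(raw: string | undefined): string {
  if (!raw)
    throw new HttpError(400, 'Missing sessionId')
  try {
    return normalizeVisualChatSessionId(raw)
  }
  catch (error) {
    throw new HttpError(400, error instanceof Error ? error.message : 'Invalid sessionId')
  }
}
```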

Comment on lines +620 to +623
state.autoObserve.lastFrameFingerprint = fingerprint

const startedAt = Date.now()
await this.inferWithLatestFrame(sessionId, DEFAULT_AUTO_OBSERVE_PROMPT, true)

P2: Update auto-observe fingerprint only after successful inference

The frame fingerprint is persisted before running inferWithLatestFrame(...). If that inference fails or is aborted (e.g., transient worker outage), lastFrameFingerprint still advances, so subsequent cycles on the same frame are treated as "no change" and skipped. In practice this can stall auto-observe updates until a new frame arrives, even though no successful memory refresh happened.
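A sketch of the ordering the comment asks for: guard re-entry with an in-flight flag and commit the fingerprint only after inference resolves. The state shape and runAutoObserveCycle are hypothetical; the real service tracks more fields.

```typescript
// Hypothetical per-session auto-observe state.
export interface AutoObserveState {
  lastFrameFingerprint: string | null
  inFlight: boolean
}

export type AutoObserveOutcome = 'skipped' | 'busy' | 'updated' | 'failed'

export async function runAutoObserveCycle(
  state: AutoObserveState,
  fingerprint: string,
  infer: () => Promise<void>,
): Promise<AutoObserveOutcome> {
  if (state.lastFrameFingerprint === fingerprint)
    return 'skipped'
  // Block overlapping cycles with a flag instead of pre-committing the fingerprint
  if (state.inFlight)
    return 'busy'
  state.inFlight = true
  try {
    await infer()
    // Commit only on success, so a failed frame is retried on the next cycle
    state.lastFrameFingerprint = fingerprint
    return 'updated'
  }
  catch {
    return 'failed'
  }
  finally {
    state.inFlight = false
  }
}
```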

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: 4a3346bc5e

function handleChunk(chunk: Uint8Array | string) {
  const text = chunk.toString()
  appendOutput(text)
  const match = text.match(QUICK_TUNNEL_PATTERN)

P2: Parse quick-tunnel URL from accumulated output

cloudflared stdout/stderr chunks are not line- or token-aligned, but the matcher only checks the current text chunk. If the trycloudflare.com URL is split across chunk boundaries, the regex never matches and the command times out even though the tunnel is actually up, causing flaky share/start behavior in real runs.
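A minimal sketch of matching against accumulated output instead of a single chunk. The URL pattern is an assumption (the PR's QUICK_TUNNEL_PATTERN is not shown), and the buffer is capped so long-running output cannot grow it without bound. It also decodes bytes with TextDecoder, because toString() on a plain Uint8Array yields comma-separated byte values rather than text (Node stream chunks are Buffers, so the original works there).

```typescript
// Assumed shape of a quick-tunnel URL.
const QUICK_TUNNEL_PATTERN = /https:\/\/[a-z0-9-]+\.trycloudflare\.com/

export function createQuickTunnelUrlMatcher(maxBufferedChars = 16_384): (chunk: Uint8Array | string) => string | null {
  const decoder = new TextDecoder()
  let buffered = ''
  return (chunk) => {
    // stream: true keeps multi-byte sequences that straddle chunks intact
    buffered += typeof chunk === 'string' ? chunk : decoder.decode(chunk, { stream: true })
    // Keep only the tail; a URL still arriving is always at the end of the buffer
    if (buffered.length > maxBufferedChars)
      buffered = buffered.slice(-maxBufferedChars)
    return buffered.match(QUICK_TUNNEL_PATTERN)?.[0] ?? null
  }
}
```

Keeping one matcher per stream (stdout and stderr separately) avoids a URL being split by interleaved output from the other stream.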

Comment on lines +180 to +184
settled = true
resolve({
  name: tunnelName,
  url: '',
  close: () => child.kill(),

P1: Reject tunnel startup when readiness times out

When no readiness log line appears before TUNNEL_RUN_TIMEOUT_MS, this path resolves success with an empty URL instead of failing. That lets startNamedTunnels continue, write endpoint metadata, and report tunnels as running even when they may never have connected, which can leave users with broken public URLs and no immediate error.
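A sketch of a readiness wait that rejects on timeout and runs a cleanup callback (for example killing the cloudflared child), so a caller like startNamedTunnels never records a tunnel that did not connect. The readiness line is an assumption (cloudflared logs lines such as "Registered tunnel connection", but the exact text should be checked against the version in use), and waitForTunnelReady is a hypothetical name.

```typescript
import { Buffer } from 'node:buffer'
import { EventEmitter } from 'node:events'

// Resolve once readyPattern appears in the output; reject and clean up on timeout.
export function waitForTunnelReady(
  output: EventEmitter,
  readyPattern: RegExp,
  timeoutMs: number,
  onFailure: () => void,
): Promise<void> {
  return new Promise((resolve, reject) => {
    let buffered = ''
    let settled = false
    const finish = (error?: Error) => {
      if (settled)
        return
      settled = true
      clearTimeout(timer)
      output.off('data', onData)
      if (error) {
        onFailure() // e.g. kill the child so nothing is left half-started
        reject(error)
      }
      else {
        resolve()
      }
    }
    // Accumulate output so a readiness line split across chunks is still found
    const onData = (chunk: Buffer | string) => {
      buffered += Buffer.isBuffer(chunk) ? chunk.toString('utf8') : chunk
      if (readyPattern.test(buffered))
        finish()
    }
    const timer = setTimeout(() => finish(new Error(`Tunnel not ready after ${timeoutMs}ms`)), timeoutMs)
    output.on('data', onData)
  })
}
```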

@nekomeowww added the scope/extension, scope/agent, pr-review/way-too-large, priority/nice-to-have, feature, and scope/visual-input labels on Apr 7, 2026

@github-actions (bot) commented on Apr 7, 2026

⏳ Approval required for deploying to Cloudflare Workers (Preview) for stage-web.

Name: 🔭 Waiting for approval
Link: For maintainers, approve here

Hey, @nekomeowww, @sumimakito, @luoling8192, @LemonNekoGH, kindly take some time to review and approve this deployment when you are available. Thank you! 🙏

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: 55cc0f13c7

Comment on lines +49 to +51
if (!actual || expected.length !== actual.length)
  return false
return timingSafeEqual(Buffer.from(expected), Buffer.from(actual))

P2: Guard timing-safe compare against UTF-8 length mismatches

tokensMatch checks JS string length before calling timingSafeEqual, but timingSafeEqual compares the UTF-8 byte buffers produced by Buffer.from(...), whose byte length can differ from the string length. A header containing non-ASCII characters can have the same .length as the expected token while producing a different byte length, which makes timingSafeEqual throw and turns auth failures into 500s on requireGatewayAccess routes instead of a clean 403. This is externally triggerable with a crafted token header and should be handled as a normal auth miss.
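A sketch of a comparison that checks byte lengths rather than string lengths, so a mismatch is an ordinary false instead of a RangeError. It assumes the existing tokensMatch(expected, actual) signature; the body is illustrative, not the PR's code.

```typescript
import { Buffer } from 'node:buffer'
import { timingSafeEqual } from 'node:crypto'

// Constant-time token check that never throws on malformed input.
export function tokensMatch(expected: string, actual: string | null | undefined): boolean {
  if (!actual)
    return false
  const expectedBytes = Buffer.from(expected, 'utf8')
  const actualBytes = Buffer.from(actual, 'utf8')
  // timingSafeEqual throws when byte lengths differ, so compare bytes, not .length
  if (expectedBytes.length !== actualBytes.length)
    return false
  return timingSafeEqual(expectedBytes, actualBytes)
}
```

Returning early on a length mismatch reveals only the token length, which is fixed anyway for derived tokens.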

}
else {
  try {
    execSync('pkill -f "visual-chat-gateway|visual-chat-worker-minicpmo|visual-chat-ops/.+(share.ts|dev-tamagotchi.ts|setup-tunnel.ts)|cloudflared tunnel"', { stdio: 'ignore' })

P2: Limit stop command to AIRI-managed cloudflared processes

The Unix stop path unconditionally runs pkill -f ...|cloudflared tunnel, which matches any cloudflared tunnel process on the machine, not just tunnels started by this project. Running visual-chat-ops stop can therefore terminate unrelated Cloudflare tunnels used by other apps/sessions, causing collateral outages outside AIRI.
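One way to scope stop to AIRI-managed processes is to record the PIDs AIRI spawns in an app-data file and signal only those. This is a hypothetical sketch; the file format and helper names are assumptions, and the kill function is injectable so the logic can be exercised without signalling real processes.

```typescript
import { existsSync, readFileSync, rmSync, writeFileSync } from 'node:fs'

// Hypothetical record of a process AIRI itself started (cloudflared, gateway, worker).
export interface ManagedProcess {
  name: string
  pid: number
}

export function recordManagedProcess(pidFile: string, entry: ManagedProcess): void {
  const entries: ManagedProcess[] = existsSync(pidFile) ? JSON.parse(readFileSync(pidFile, 'utf8')) : []
  entries.push(entry)
  writeFileSync(pidFile, JSON.stringify(entries))
}

type KillFn = (pid: number, signal: string) => void

// Signal only recorded PIDs; unrelated cloudflared tunnels are never matched.
export function stopManagedProcesses(
  pidFile: string,
  kill: KillFn = (pid, signal) => { process.kill(pid, signal) },
): string[] {
  if (!existsSync(pidFile))
    return []
  const entries: ManagedProcess[] = JSON.parse(readFileSync(pidFile, 'utf8'))
  const stopped: string[] = []
  for (const { name, pid } of entries) {
    try {
      kill(pid, 'SIGTERM')
      stopped.push(name)
    }
    catch (error) {
      // ESRCH: the process already exited; anything else is a real failure
      if ((error as { code?: string }).code !== 'ESRCH')
        throw error
    }
  }
  rmSync(pidFile, { force: true })
  return stopped
}
```

PIDs can be reused after a process exits, so a stricter version might also confirm the command line before signalling a recorded PID.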

…ture/ai-visual-chat-multisource-realtime-pipeline

# Conflicts:
#	apps/stage-tamagotchi/tsconfig.node.json
#	apps/stage-tamagotchi/tsconfig.web.json
#	packages/stage-pages/tsconfig.json
#	packages/stage-ui/package.json
#	packages/stage-ui/src/components/markdown/markdown-renderer.vue
#	packages/stage-ui/src/composables/use-modules-list.ts
#	pnpm-lock.yaml
#	pnpm-workspace.yaml

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: ab7c572dd8

  console.info('\nInstallation complete.')
}

install()

P0: Guard install CLI from running on module import

Remove the top-level install() call or gate it behind a direct-execution check. This file is re-exported by packages/visual-chat-ops/src/index.ts, and apps/stage-tamagotchi/src/main/services/electron/visual-chat.ts imports that package at runtime; as written, simply importing the module can execute pnpm install and pnpm run build:packages, causing unexpected long-running side effects and startup failures in production/dev flows that only intended to use helper functions.

  console.info('\nPrune complete.')
}

prune()

P0: Guard prune CLI from running on module import

Do not invoke prune() at module top level; gate it with an isDirectExecution() check like the other CLI files. Because this module is also re-exported from packages/visual-chat-ops/src/index.ts, importing @proj-airi/visual-chat-ops can immediately run retention cleanup (pruneWithPolicy) against visual-chat data/log/cache directories, which is destructive behavior for callers that only intended to import library APIs.
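The install and prune comments ask for the same guard. A minimal sketch of such an isDirectExecution() check, assuming ES modules (import.meta.url) and that the CLI file is the script Node was started with; the repository's existing helper may be implemented differently.

```typescript
import { realpathSync } from 'node:fs'
import { fileURLToPath } from 'node:url'

// True only when the module at moduleUrl is the entry script of this process.
export function isDirectExecution(moduleUrl: string, entryPath: string | undefined = process.argv[1]): boolean {
  if (!entryPath)
    return false
  try {
    // realpath resolves symlinks on both sides before comparing
    return realpathSync(fileURLToPath(moduleUrl)) === realpathSync(entryPath)
  }
  catch {
    // Unresolvable paths are treated as "imported", so no side effects run
    return false
  }
}

// At the bottom of the CLI module, instead of a bare top-level call:
// if (isDirectExecution(import.meta.url))
//   install()
```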

…ture/ai-visual-chat-multisource-realtime-pipeline

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Reviewed commit: 83497f10f7

+ ' ($_.CommandLine -match \'visual-chat-gateway\')'
+ ' -or ($_.CommandLine -match \'visual-chat-worker-minicpmo\')'
+ ' -or ($_.CommandLine -match \'visual-chat-ops.+(share\\\\.ts|dev-tamagotchi\\\\.ts|setup-tunnel\\\\.ts)\')'
+ ' -or ($_.Name -match \'cloudflared\')'

P2: Scope Windows process kill filter to AIRI-managed tunnels

Tighten the Windows stop filter so it does not match all cloudflared processes by executable name alone. The current predicate includes ($_.Name -match 'cloudflared'), so running visual-chat-ops stop on Windows will terminate unrelated Cloudflare tunnels on the same machine, not just AIRI-managed ones; this causes collateral outages for other local services/users whenever they coexist.
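A sketch of a tighter Windows predicate, assuming AIRI starts cloudflared with an argument that points into its own app-data tunnel directory (for example a config file path), so the command line can be matched on that exact path. buildManagedCloudflaredFilter is hypothetical, and the output is meant to replace only the cloudflared clause of the existing filter. The .Contains check is case-sensitive, which is fine here because AIRI passes the same string it later searches for.

```typescript
// In single-quoted PowerShell strings the only escape is '' for a literal quote.
function toPowerShellLiteral(value: string): string {
  return `'${value.replace(/'/g, '\'\'')}'`
}

// Match cloudflared only when its command line contains AIRI's own tunnel directory.
export function buildManagedCloudflaredFilter(airiTunnelDir: string): string {
  return `($_.Name -match 'cloudflared') -and ($_.CommandLine -ne $null) -and $_.CommandLine.Contains(${toPowerShellLiteral(airiTunnelDir)})`
}
```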

Labels

  • feature: Related to feature
  • pr-review/way-too-large: Pull Request that way too large, not easy to review, be careful when reviewing
  • priority/nice-to-have: Issue, or Pull Request that nice to have but can be handled later
  • scope/agent: Scope related to how we harness agent, or build the agent workflow
  • scope/extension: Scope related to extension api, or internally known as tentacle api, mod api, plugin api
  • scope/visual-input: Scope related to vision/visual input (YOLO, OCR, VLM, etc.)

Projects

None yet

2 participants