feat(stage-tamagotchi,computer-use-mcp): implement browser-native DOM action routing #1648

Merged: nekomeowww merged 31 commits into moeru-ai:main from 3361559784:codex/desktop-v2-browser-routing on Apr 24, 2026
@3361559784

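This is step 2 of 3 on the Desktop track. Building on the observation layer from step 1/3, it addresses execution precision: eligible chrome_dom targets are intercepted and, where possible, bypass OS-level clicks and go straight into the browser extension/CDP event stream.

This PR includes:

  • Introduces browser-action-router, an explicit, typed routing decision tree for click, checkCheckbox, and setInputValue.
  • Adds complete handler-level integration tests.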
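
A routing decision of this kind could be sketched roughly as follows. This is only an illustration, not the code in browser-action-router.ts: the `Candidate` shape, the `decideRoute` name, the `bridgeConnected` flag, and the reason strings are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real types in the PR may differ.
type BrowserAction = 'click' | 'checkCheckbox' | 'setInputValue'

interface Candidate {
  source: 'chrome_dom' | 'other' // only chrome_dom targets are eligible
  tag: string // lower-case tag name, e.g. 'input'
  inputType?: string // value of the type attribute, if present
  selector?: string // CSS selector captured during observation
}

type Route =
  | { kind: 'browser_dom', action: BrowserAction, selector: string }
  | { kind: 'os_input', reason: string }

function decideRoute(candidate: Candidate, intent: 'click' | 'type', bridgeConnected: boolean): Route {
  // Anything that cannot be addressed through the bridge stays on OS input.
  if (candidate.source !== 'chrome_dom')
    return { kind: 'os_input', reason: 'not a chrome_dom candidate' }
  if (!bridgeConnected)
    return { kind: 'os_input', reason: 'bridge unavailable' }
  if (!candidate.selector)
    return { kind: 'os_input', reason: 'no selector for candidate' }

  const selector = candidate.selector
  const isCheckbox = candidate.tag === 'input' && candidate.inputType === 'checkbox'

  if (intent === 'type') {
    const isTextField = candidate.tag === 'textarea' || (candidate.tag === 'input' && !isCheckbox)
    return isTextField
      ? { kind: 'browser_dom', action: 'setInputValue', selector }
      : { kind: 'os_input', reason: 'not a text field' }
  }

  // Click intent: checkboxes get the dedicated action, everything else a DOM click.
  return { kind: 'browser_dom', action: isCheckbox ? 'checkCheckbox' : 'click', selector }
}
```

Returning a tagged union keeps the OS-input fallback explicit at every call site, since the caller has to handle the `os_input` branch before it can read `action` or `selector`.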

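Out of scope:

  • Chrome browser lifecycle management is still not handled (left to the core of step 3/3).
  • No special injection for native OS dialogs that are not DOM elements.

⚠️ High-risk items and follow-up debt (Known Debts):

  • The fallback can tear: if a bridge disconnect forces a fallback from browser_dom to os_input and the page reflows at that moment, the landing coordinates may still be invalid.
  • Concurrent requests are not guarded by a lock: overlapping commands sent in quick succession by the frontend could overwhelm the bridge. Not addressed in this PR.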

@github-actions

github-actions Bot commented Apr 12, 2026

⏳ Approval required for deploying to Cloudflare Workers (Preview) for stage-web.

Name Link
🔭 Waiting for approval For maintainers, approve here

Hey, maintainers, kindly take some time to review and approve this deployment when you are available. Thank you! 🙏

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50d218e4d6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread services/computer-use-mcp/src/server/register-desktop-grounding.ts Outdated
Comment thread services/computer-use-mcp/src/chrome-semantic-adapter.ts
Comment thread services/computer-use-mcp/src/server/register-desktop-grounding.ts Outdated
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a comprehensive desktop grounding layer for macOS automation, featuring a transparent visualization overlay, a read-only Chrome extension for DOM observation, and unified observation tools like desktop_observe and desktop_click_target. It also introduces a snap resolution engine and a descriptor-driven tool registration system. Feedback focuses on increasing the staleness threshold for click actions to better handle LLM latency, fixing a memory leak in the overlay polling timer, and improving the robustness of CSS selector generation and application detection consistency.

Comment thread services/computer-use-mcp/src/server/register-desktop-grounding.ts
Comment thread apps/stage-tamagotchi/src/renderer/pages/desktop-overlay-polling.ts
Comment thread services/computer-use-mcp/src/desktop-grounding.ts Outdated
Comment thread services/computer-use-mcp/src/chrome-semantic-adapter.ts
Comment thread services/computer-use-mcp/src/chrome-semantic-adapter.ts
Comment thread services/computer-use-mcp/src/chrome-semantic-adapter.ts
@nekomeowww added the apps/stage-tamagotchi, pr-review/ok-to-merge, pr-review/ok-to-deploy, scope/extension, scope/ui, and priority/urgent labels and removed the pr-review/ok-to-merge label on Apr 16, 2026
@3361559784 force-pushed the codex/desktop-v2-browser-routing branch from b815b7f to 4caca43 on April 20, 2026 07:13
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: d9d02ec11b

Comment thread services/computer-use-mcp/src/server/register-desktop-grounding.ts Outdated
Comment thread services/computer-use-mcp/chrome-extension/content.js Outdated
Comment thread services/computer-use-mcp/chrome-extension/background.js
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 7c35c91fa1

Comment thread services/computer-use-mcp/src/desktop-grounding.ts
Comment thread services/computer-use-mcp/src/chrome-semantic-adapter.ts
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: b0861357bd

Comment thread services/computer-use-mcp/src/server/action-executor.ts Outdated
Comment thread services/computer-use-mcp/chrome-extension/background.js
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 7d39ec590f

Comment thread services/computer-use-mcp/src/server/action-executor.ts
Comment thread services/computer-use-mcp/src/server/register-desktop-grounding.ts Outdated
Comment thread services/computer-use-mcp/chrome-extension/content.js Outdated
@3361559784 force-pushed the codex/desktop-v2-browser-routing branch from d015ff7 to d92d173 on April 23, 2026 08:36
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: d92d1732c2

Comment thread services/computer-use-mcp/chrome-extension/content.js
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 3a0d95e21c

Comment thread apps/stage-tamagotchi/src/renderer/pages/desktop-overlay.vue Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 3047fc1aa6

Comment thread apps/stage-tamagotchi/src/renderer/pages/desktop-overlay.vue Outdated
Comment thread services/computer-use-mcp/src/chrome-semantic-adapter.ts
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 09bdcb61b1

Comment thread services/computer-use-mcp/chrome-extension/background.js Outdated
Comment thread services/computer-use-mcp/src/server/register-desktop-grounding.ts Outdated
Comment thread services/computer-use-mcp/chrome-extension/manifest.json
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 1b5ca6a401

Comment thread services/computer-use-mcp/src/desktop-grounding.ts Outdated
Comment thread services/computer-use-mcp/chrome-extension/content.js
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: 5bb1b3f69b

Comment thread services/computer-use-mcp/chrome-extension/background.js
Copilot AI review requested due to automatic review settings April 24, 2026 13:58
Copilot AI left a comment

Pull request overview

This PR adds “desktop grounding” for the Desktop lane by introducing a unified observe→candidate→click flow, and routes eligible Chrome DOM targets into the browser extension/CDP path to improve click/type precision while maintaining OS-input fallback.

Changes:

  • Add new MCP actions (desktop_observe, desktop_click_target) with RunState support, strategy advisories, and transparency text.
  • Implement grounding pipeline (snapshot aggregation, candidate ranking/dedup, snap resolver) plus browser-dom routing for click/type.
  • Introduce tool descriptor registry + add Electron desktop overlay window (polls desktop_get_state) for visualizing snap/candidates.

Reviewed changes

Copilot reviewed 47 out of 50 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| services/computer-use-mcp/src/types.ts | Adds new action kinds/inputs and enriches DOM element metadata (role). |
| services/computer-use-mcp/src/transparency.ts | Adds intent/outcome strings + run-state summary for grounding snapshot/pointer. |
| services/computer-use-mcp/src/strategy.ts | Adds grounding-related advisories (observe-first, stale, duplicate click). |
| services/computer-use-mcp/src/state.ts | Stores grounding snapshot, pointer intent, last clicked candidate + update helpers. |
| services/computer-use-mcp/src/snap-resolver.ts | Implements snapping logic and candidate-ID resolution helpers. |
| services/computer-use-mcp/src/snap-resolver.test.ts | Unit tests for snap/geometry helpers. |
| services/computer-use-mcp/src/server/tool-descriptors/vscode.ts | Adds VS Code tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/types.ts | Defines descriptor types/validation for registry. |
| services/computer-use-mcp/src/server/tool-descriptors/task-memory.ts | Adds task memory tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/registry.ts | Implements descriptor registry/query/validation. |
| services/computer-use-mcp/src/server/tool-descriptors/registry.test.ts | Adds registry validation/query/coverage tests. |
| services/computer-use-mcp/src/server/tool-descriptors/register-helper.ts | Adds descriptor-driven tool registration helpers. |
| services/computer-use-mcp/src/server/tool-descriptors/pty.ts | Adds PTY tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/index.ts | Exports registry + helpers/types. |
| services/computer-use-mcp/src/server/tool-descriptors/display.ts | Adds display tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/desktop.ts | Adds desktop/browser-dom/workflow/meta tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/coding.ts | Adds coding tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/cdp.ts | Adds CDP tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/all.ts | Aggregates descriptors and initializes global registry. |
| services/computer-use-mcp/src/server/tool-descriptors/accessibility.ts | Adds accessibility tool descriptors. |
| services/computer-use-mcp/src/server/register-desktop-grounding.ts | Registers desktop_observe / desktop_click_target and routes to browser-dom when possible. |
| services/computer-use-mcp/src/server/action-executor.ts | Routes type_text through browser-dom setInputValue when appropriate. |
| services/computer-use-mcp/src/server.ts | Wires desktop grounding tool registration into server startup. |
| services/computer-use-mcp/src/desktop-grounding.ts | Implements unified grounding capture + candidate merge/dedup + agent formatting. |
| services/computer-use-mcp/src/desktop-grounding.test.ts | Unit tests for candidate building and formatting. |
| services/computer-use-mcp/src/desktop-grounding-types.ts | Adds grounding snapshot/candidate/snap/pointer intent types. |
| services/computer-use-mcp/src/chrome-semantic-adapter.ts | Captures Chrome semantics via extension/CDP and maps to candidates/selectors. |
| services/computer-use-mcp/src/chrome-semantic-adapter.test.ts | Unit tests for Chrome semantic adapter + routing integration. |
| services/computer-use-mcp/src/browser-action-router.ts | Adds explicit routing decision tree for click/type via browser-dom vs OS input. |
| services/computer-use-mcp/src/browser-action-router.test.ts | Unit tests for routing decisions and consistency. |
| services/computer-use-mcp/chrome-extension/msg_bridge.js | Adds isolated-world relay between background and main-world content script. |
| services/computer-use-mcp/chrome-extension/manifest.json | New MV3 extension manifest for grounding bridge. |
| services/computer-use-mcp/chrome-extension/icon48.png | Extension icon asset. |
| services/computer-use-mcp/chrome-extension/icon16.png | Extension icon asset. |
| services/computer-use-mcp/chrome-extension/icon128.png | Extension icon asset. |
| services/computer-use-mcp/chrome-extension/content.js | Main-world DOM collector + limited DOM action helpers. |
| services/computer-use-mcp/chrome-extension/background.js | Background worker that relays commands and maintains WS connection. |
| services/computer-use-mcp/chrome-extension/README.md | Extension documentation. |
| packages/stage-ui/src/stores/mcp-tool-bridge.ts | Adds a minimal MCP tool bridge interface for UI consumers. |
| packages/stage-ui/package.json | Exports the new stage-ui store entrypoint. |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay.vue | Adds overlay renderer to visualize pointer/candidate boxes. |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay-polling.ts | Adds polling/extraction logic for overlay (pure module). |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay-polling.test.ts | Unit tests for overlay polling/extraction. |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay-coordinates.ts | Coordinate conversion helpers for overlay rendering. |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay-coordinates.test.ts | Unit tests for overlay coordinate helpers. |
| apps/stage-tamagotchi/src/main/windows/desktop-overlay/rpc/index.electron.ts | Sets up Electron RPC services for overlay window. |
| apps/stage-tamagotchi/src/main/windows/desktop-overlay/index.ts | Creates the click-through always-on-top overlay window (env-gated). |
| apps/stage-tamagotchi/src/main/index.ts | Boots overlay window when AIRI_DESKTOP_OVERLAY=1. |
| apps/stage-tamagotchi/electron.vite.config.ts | Adds explicit alias for stage-ui stores/mcp-tool-bridge. |

Comment thread services/computer-use-mcp/chrome-extension/background.js
Comment thread services/computer-use-mcp/src/server/tool-descriptors/types.ts
Comment thread services/computer-use-mcp/src/desktop-grounding.ts
Comment thread services/computer-use-mcp/src/server/action-executor.ts
Comment thread services/computer-use-mcp/src/server/register-desktop-grounding.ts
Comment thread services/computer-use-mcp/src/server/register-desktop-grounding.ts
Comment thread services/computer-use-mcp/chrome-extension/README.md Outdated
Comment thread services/computer-use-mcp/chrome-extension/content.js Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

if ('success' in record)
  return Boolean(record.success)

P2: Unwrap frame payload before evaluating browser_dom success

This success predicate reads top-level frame.result.success, but extension replies are wrapped as { success: true, data: <tool result> } by content.js for any non-throwing call. That means frame-level failures like { data: { success: false, error: 'not found' } } are still counted as successful, so browser_dom tool summaries can claim N/N frame(s) succeeded when actions actually failed in every frame.
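
Under the reviewer's assumption that replies may arrive wrapped as `{ success, data }`, a predicate that tolerates both the flat and the wrapped shape could look like the sketch below. The `isRecord` helper and the exact payload shapes are assumptions for illustration; whether the wrapper actually exists in the protocol is disputed later in this thread.

```typescript
// Narrow an unknown value to a plain object so its keys can be inspected.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null
}

// Returns true only when the outer reply and, if present, the nested
// data payload both report success. Flat replies are handled as before.
function didBrowserDomFrameSucceed(result: unknown): boolean {
  if (!isRecord(result))
    return false

  const inner = result['data']
  // Hypothetical wrapped shape: { success: true, data: { success: false, error } }
  if (isRecord(inner) && 'success' in inner)
    return Boolean(result['success']) && didBrowserDomFrameSucceed(inner)

  return Boolean(result['success'])
}
```

With this shape a reply of `{ success: true, data: { success: false } }` counts as a failure, while a flat `{ success: true }` still counts as a success.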

Comment thread services/computer-use-mcp/src/server/register-desktop-grounding.ts Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Reviewed commit: d83d0dedbc

Comment thread services/computer-use-mcp/src/browser-action-router.ts Outdated
Comment thread services/computer-use-mcp/chrome-extension/content.js
@3361559784

Addressed the two new review comments in 66ac599:

P1 — role=textbox routing (browser-action-router.ts): Fixed. Removed the role === 'textbox' catch-all from isTextInputCandidate. Now only actual <input> and <textarea> elements are routed to setInputValue. Contenteditable divs fall through to OS typing via desktop_type_text.
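
The narrowed check described above might look roughly like this. The `DomCandidate` shape is a hypothetical stand-in for the real candidate type; only the rule itself (tag-based, no role catch-all) comes from the description.

```typescript
// Hypothetical candidate shape; only the fields needed for the check.
interface DomCandidate {
  tag: string // element tag name in any casing
  role?: string // ARIA role, deliberately ignored below
}

// Only real <input> and <textarea> elements qualify for setInputValue.
// A contenteditable div with role="textbox" does not, so it falls
// through to OS typing.
function isTextInputCandidate(candidate: DomCandidate): boolean {
  const tag = candidate.tag.toLowerCase()
  return tag === 'input' || tag === 'textarea'
}
```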

P2 — simulateKeystrokes (content.js): Added TODO — implementing per-character KeyboardEvent dispatch is non-trivial (need proper key/code/keyCode mapping, composition events for CJK, etc.). The current direct value assignment + input/change events works for the majority of forms. Will track as a follow-up.

Also fixed 2 eslint innerText lint errors (intentional use — innerText returns layout-aware visible text, textContent doesn't).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

content: [
  textContent(`Clicked selector "${selector}" in frame ${result.targetFrameId} at (${result.targetPoint.x}, ${result.targetPoint.y}).`),
],

P1: Check clickResults before reporting browser_dom_click success

browser_dom_click always emits a successful Clicked selector... response after clickSelector resolves, but clickSelector can return clickResults where every frame reports { success: false } (for example after a reflow between target lookup and click dispatch). In that case this tool still reports status: "ok" even though no DOM click happened, which can make downstream automation proceed under a false success assumption.


if ('success' in record)
  return Boolean(record.success)

P2: Unwrap nested browser_dom payloads before success counting

didBrowserDomFrameSucceed treats the outer frame.result.success as the operation outcome, but extension responses are wrapped and can look like { success: true, data: { success: false, error: ... } }. This causes frame-level failures to be counted as successes in summaries, so tools can report misleading X/Y frame(s) succeeded counts and hide real DOM-action failures.

@3361559784

Addressed latest review in efd1a13:

P1 — browser_dom_click clickResults (register-tools.ts): Fixed. Now validates per-frame clickResults before reporting success. Returns isError: true with status: 'click_miss' if no frame reported a successful DOM click. Same pattern as register-desktop-grounding.ts fix.
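
A per-frame validation along these lines could be sketched as follows. The `FrameClickResult` and `ClickResponse` shapes and the `summarizeClick` name are assumptions for the example; only the `isError: true` / `status: 'click_miss'` outcome is taken from the description above.

```typescript
// Hypothetical per-frame result, following the flat { success, error? } shape.
interface FrameClickResult {
  frameId: number
  success: boolean
  error?: string
}

// Simplified stand-in for the tool response.
interface ClickResponse {
  isError: boolean
  status: 'ok' | 'click_miss'
  text: string
}

function summarizeClick(selector: string, clickResults: FrameClickResult[]): ClickResponse {
  const succeeded = clickResults.filter(result => result.success)

  // No frame performed the click: report a miss instead of success.
  if (succeeded.length === 0) {
    const errors = clickResults
      .map(result => result.error)
      .filter((error): error is string => typeof error === 'string')
    const detail = errors.length > 0 ? `: ${errors.join('; ')}` : ''
    return { isError: true, status: 'click_miss', text: `No frame clicked "${selector}"${detail}` }
  }

  return {
    isError: false,
    status: 'ok',
    text: `Clicked selector "${selector}" in ${succeeded.length}/${clickResults.length} frame(s).`,
  }
}
```

An empty `clickResults` array is treated as a miss as well, which errs on the side of not claiming a click that cannot be confirmed.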

P2 — nested payload unwrapping (didBrowserDomFrameSucceed): Not taking this. The extension content.js returns flat { success: true/false, error?: string } — there is no { success: true, data: { success: false } } nesting in the actual protocol. The reviewer is assuming a wrapping pattern that doesn't exist in the codebase. didBrowserDomFrameSucceed checking record.success is correct.

autofix-ci Bot and others added 22 commits April 24, 2026 23:46
…ting click success

When executeAction returns a denial or approval-required result the
handler was proceeding unconditionally to the 'Clicked: success'
response and updating pointer/clicked-candidate state, misleading the
agent into assuming the target was clicked when it was only queued or
blocked by policy.

Fix: capture the CallToolResult from both the OS-input and browser-dom
fallback paths and return it early when structuredContent.status is
'approval_required' or 'denied', or when isError is set. Post-click
state updates are now only reached when execution actually occurred.
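
The early-return guard described in this commit message can be illustrated with a small synchronous sketch. `ToolResult` is a simplified stand-in for CallToolResult, and `clickTarget`, `isBlocked`, and `ClickState` are hypothetical names; the real handler is presumably asynchronous.

```typescript
// Simplified stand-in for the MCP CallToolResult shape.
interface ToolResult {
  isError?: boolean
  structuredContent?: { status?: string }
}

// Hypothetical slice of run state touched after a click.
interface ClickState {
  lastClickedCandidateId?: string
}

// True when the action did not actually execute.
function isBlocked(result: ToolResult): boolean {
  const status = result.structuredContent?.status
  return result.isError === true || status === 'approval_required' || status === 'denied'
}

function clickTarget(candidateId: string, execute: () => ToolResult, state: ClickState): ToolResult {
  const result = execute()

  // Return the policy or error result untouched; no post-click state is written.
  if (isBlocked(result))
    return result

  // Only reached when execution really happened.
  state.lastClickedCandidateId = candidateId
  return { structuredContent: { status: 'ok' } }
}
```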
…k unknown bridge actions as failed

P1 (content.js): getClickTarget returned click coordinates only under
the nested `center` object. BrowserDomExtensionBridge.clickSelector
reads top-level x/y from the unwrapped payload, so extension clicks
always failed to resolve a target point and fell back to OS input.
Expose x/y at both the top level and under center for compatibility.

P1 (background.js): unknown actions fell into the default branch and
returned { ok: true, result: { error: '...' } }. The bridge treated
any ok:true as success, so callers like setInputValue/checkCheckbox
saw a resolved promise and never triggered their OS-input fallback
paths, producing silent no-ops. Changed to return ok:false so the
bridge correctly rejects and callers fall back.
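
The difference between the two reply shapes can be shown with a minimal dispatcher sketch. The handler names (`ping`, `echo`), the `BridgeReply` type, and `unwrap` are hypothetical; the real switch lives in background.js and is written in JavaScript.

```typescript
type BridgeReply =
  | { ok: true, result: unknown }
  | { ok: false, error: string }

// Hypothetical handlers keyed by action name. A Map avoids accidental
// matches on inherited object keys such as 'toString'.
const handlers = new Map<string, (args: unknown) => unknown>([
  ['ping', () => 'pong'],
  ['echo', args => args],
])

function dispatch(action: string, args: unknown): BridgeReply {
  const handler = handlers.get(action)
  // Unknown actions are failures, not successes that carry an error payload.
  if (!handler)
    return { ok: false, error: `unknown action: ${action}` }
  return { ok: true, result: handler(args) }
}

// Caller side: ok:false becomes a thrown error, so fallback paths can run.
function unwrap(reply: BridgeReply): unknown {
  if (reply.ok)
    return reply.result
  throw new Error(reply.error)
}
```

Because `unwrap` throws on `ok: false`, a caller that wraps the bridge call in a try/catch reaches its OS-input fallback instead of treating the unknown action as a silent no-op.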
…coords; add write actions to extension

P1 (action-executor.ts): type_text with explicit x/y coordinates was still
routing through lastClickedCandidateId to setInputValue, so text could land
in a stale Chrome selector when the user had moved to a different app or
window. Gate the browser-dom type path out when explicit coordinates are
present — coordinates imply the caller has already targeted a specific screen
position that may not correspond to the last Chrome candidate.
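
As a rough illustration of that gate, the decision can be reduced to a pure predicate. The `TypeTextInput` shape and the `shouldTypeViaBrowserDom` name are assumptions; the actual check sits inside action-executor.ts.

```typescript
// Hypothetical type_text input: coordinates are optional.
interface TypeTextInput {
  text: string
  x?: number
  y?: number
}

// The browser-dom path is used only when the caller gave no explicit
// coordinates and the last clicked candidate is a known text input.
function shouldTypeViaBrowserDom(
  input: TypeTextInput,
  lastClickedCandidateId: string | undefined,
  candidateIsTextInput: boolean,
): boolean {
  const hasExplicitCoords = input.x !== undefined || input.y !== undefined
  if (hasExplicitCoords)
    return false
  return lastClickedCandidateId !== undefined && candidateIsTextInput
}
```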

P1 (background.js + content.js): the extension command switch only handled
read-only observation actions. BrowserDomExtensionBridge issues write
commands (setInputValue, checkCheckbox, selectOption) that all hit the
default branch and returned ok:false, so browser-dom write tools never
succeeded and always degraded to OS-input fallback.

Add setInputValue, checkCheckbox, selectOption implementations to content.js
(bridge-triggered, limited DOM mutations using native value setter +
framework-compatible events). Wire the three actions into background.js
switch to route them to the correct content.js methods via runCUAction.
rolldown v1.0.0-rc.15 does not reliably resolve exports-field glob
patterns ("./stores/*") against new files introduced in the same PR,
causing stage-tamagotchi build and typecheck to fail with UNLOADABLE_DEPENDENCY.

Add an explicit entry for mcp-tool-bridge alongside existing precise
entries to guarantee the module is resolvable regardless of glob
support in the bundler.
… setter

P1 (content.js): setInputValue always preferred HTMLInputElement.prototype
value setter via || chain. When the target is a <textarea>, calling the
input setter throws 'Illegal invocation'. Now selects the prototype matching
the actual element type (HTMLTextAreaElement vs HTMLInputElement).

P1 (action-executor.ts): setInputValue only throws on transport failure —
per-frame DOM failures (selector not found, frame stale) resolve with
{ success: false } in the frame result array. The code unconditionally set
usedBrowserDom = true, skipping OS typeText fallback on silent DOM failure.
Now checks that at least one frame reported success before marking browser-dom
used; otherwise re-throws to enter the catch fallback.
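
The control flow described here, where a resolved call with no successful frame is converted into a throw so that the existing catch branch performs the fallback, could be sketched like this. The version below is synchronous for brevity and uses hypothetical names (`FrameResult`, `typeWithFallback`); the real bridge calls are presumably asynchronous.

```typescript
// Hypothetical flat per-frame result.
interface FrameResult {
  success: boolean
  error?: string
}

// Tries the browser-dom path first and reports which path was used.
function typeWithFallback(
  setInputValue: () => FrameResult[],
  osTypeText: () => void,
): 'browser_dom' | 'os_input' {
  try {
    const frames = setInputValue()
    // A resolved call is not enough: at least one frame must report success.
    if (!frames.some(frame => frame.success))
      throw new Error('no frame reported success')
    return 'browser_dom'
  }
  catch {
    // Transport failures and silent DOM failures both end up here.
    osTypeText()
    return 'os_input'
  }
}
```

Funnelling both failure kinds through one catch branch keeps a single fallback path, at the cost of not distinguishing them in the return value.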

P1 (register-desktop-grounding.ts): checkCheckbox has the same frame-result
gap. Bridge resolve with all frames returning success:false was treated as
success, producing a 'Clicked: success' response without any DOM mutation.
Now validates frame results and throws on total failure to trigger OS click
fallback through the policy pipeline.
Dispatching a synthetic click after manually setting el.checked = target
can cause React controlled-component onChange handlers to toggle the value
back, since the browser's real event order is click→change. Only dispatch
change, which is sufficient for Vue @change and React onChange bindings.
Update the JSDoc to explain why no click event is emitted.
…n and ARIA checkbox guard

Build (stage-tamagotchi): @proj-airi/stage-ui alias points to a directory;
rolldown concatenates sub-paths without adding a file extension, so flat .ts
files at the stores/ root (mcp-tool-bridge.ts) are not resolved.
Add an explicit alias entry for mcp-tool-bridge before the directory-level
alias so rolldown finds the file directly.

P1 (content.js checkCheckbox): for custom ARIA checkboxes (div[role=checkbox])
writing el.checked only creates an expando property — no native state changes
and nothing is visually toggled. The bridge still returned success:true,
so desktop_click_target reported click success and skipped OS-input fallback.
Add an instanceof HTMLInputElement guard: non-native elements now return
success:false, which triggers the frame-result validation added earlier and
lets the handler fall back to OS click through the policy pipeline.
mcp-tool-bridge.ts was deleted in main by a refactor (5bbf955) that
moved LLM tool registration to llm-tools.ts. However, that refactor
targeted a different concern (xsai tool registration), while desktop-
overlay.vue uses getMcpToolBridge().callTool() to invoke computer-use
MCP tools at runtime — a separate bridge interface not replaced by the
refactor.

Restore the file with the original interface plus expanded JSDoc so the
desktop overlay pages can resolve and use it.
The overlay window never calls setMcpToolBridge(), so the polling controller
failed immediately with 'MCP tool bridge is not available'. Since the
computer-use MCP tools are already exposed to the main process via
electronMcpCallTool (registered during setupDesktopOverlayElectronInvokes),
we can bypass the bridge singleton entirely and invoke the eventa route directly.

The previous commit introduced an unresolved import error in the tamagotchi
renderer build because the relative path to shared/eventa stepped one directory
too high (../../../shared/eventa instead of ../../shared/eventa). Fixed the
path in desktop-overlay.vue, desktop-overlay-polling.ts, and its test file.
…WebSocket relay

- background.js: read blur/simulateKeystrokes from cmd.opts (not top-level)
  to match BrowserDomExtensionBridge.setInputValue() wire format
- state.ts: split updatePointerIntent into updatePointerIntent (intent only)
  and recordClickedCandidate (candidate id only), so duplicate-click guard
  only fires after execution succeeds, not before
- register-desktop-grounding.ts: move recordClickedCandidate after action
  execution; if policy denies/requires approval, candidate is not marked
- background.js: replace dead ws-incoming listener with native WebSocket
  relay connecting to BrowserDomExtensionBridge on localhost:8765
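
The state split in this commit can be illustrated with a small sketch. The class, the policy decision values, and the `clickTargetSketch` helper are assumptions made for the example; only the two method names come from the commit message.

```typescript
type PolicyDecision = 'allow' | 'deny' | 'needs_approval'
type ClickOutcome = 'clicked' | 'duplicate' | 'blocked' | 'failed'

// Hypothetical, reduced run state: intent and "already clicked" are tracked separately.
class RunStateSketch {
  pointerIntentCandidateId: string | undefined = undefined
  lastClickedCandidateId: string | undefined = undefined

  updatePointerIntent(candidateId: string): void {
    this.pointerIntentCandidateId = candidateId // intent only
  }

  recordClickedCandidate(candidateId: string): void {
    this.lastClickedCandidateId = candidateId // candidate id only
  }
}

function clickTargetSketch(
  state: RunStateSketch,
  candidateId: string,
  decision: PolicyDecision,
  execute: () => boolean,
): ClickOutcome {
  if (state.lastClickedCandidateId === candidateId)
    return 'duplicate' // duplicate-click guard

  state.updatePointerIntent(candidateId)
  if (decision !== 'allow')
    return 'blocked' // denied or awaiting approval: candidate is not marked
  if (!execute())
    return 'failed' // execution failed: retries stay possible

  state.recordClickedCandidate(candidateId) // only after a successful action
  return 'clicked'
}
```

Because the candidate is recorded last, a denied or failed attempt leaves the duplicate-click guard disarmed.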
…e extension

Add readInputValue, getComputedStyles, triggerEvent, waitForElement, and
clickAt to both background.js (command router) and content.js (DOM API).

These actions are called by BrowserDomExtensionBridge but were missing from
the extension switch block, causing them to return 'unknown action' and
making the bridge methods fail silently.

Also makes the content.js message handler async-aware so waitForElement
(which uses MutationObserver and returns a Promise) is properly awaited.
… paths, docstring

- findChromeWindowBounds: prefer exact foregroundApp match over substring
  search to avoid selecting wrong Chrome window when multiple exist
- README: update to reflect actual DOM-mutation capabilities, document
  WebSocket relay, remove stale 'no DOM mutations' claim
- Remove local filesystem paths (/Users/liuziheng/...) from provenance
  comments in background.js, content.js, msg_bridge.js
- ToolDescriptor: fix docstring to acknowledge defaultDeferred is optional
- Add TODO for WebSocket auth handshake follow-up
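
The exact-match preference in findChromeWindowBounds might look like the sketch below. The `WindowInfo` shape and the case-insensitive substring fallback are assumptions; the actual function's inputs are not shown on this page.

```typescript
interface Bounds { x: number, y: number, width: number, height: number }
// Hypothetical window record; the real observation type may differ.
interface WindowInfo { appName: string, bounds: Bounds }

function findWindowBoundsSketch(windows: WindowInfo[], foregroundApp: string): Bounds | undefined {
  // 1. Prefer the window whose app name matches the foreground app exactly.
  const exact = windows.find(w => w.appName === foregroundApp)
  if (exact)
    return exact.bounds

  // 2. Only then fall back to a looser substring search.
  const needle = foregroundApp.toLowerCase()
  const loose = windows.find(w => w.appName.toLowerCase().includes(needle))
  return loose ? loose.bounds : undefined
}
```

With a substring-first search, a window listed earlier (for example a different Chrome channel) could win even when the foreground app is present in the list.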
…porting success

clickSelector can resolve even when clickAt misses (e.g. layout shift after
getClickTarget). Previously the handler always reported success and locked
lastClickedCandidateId, blocking retries via duplicate-click protection.

Now inspects per-frame results from clickAt; if none report success, throws
to trigger OS-input fallback — matching the existing checkCheckbox pattern.
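
A minimal sketch of the per-frame check and fallback. The `FrameResult` shape follows the `fr.result.success` field mentioned elsewhere in this PR; the function names and the callback-based fallback are assumptions for illustration.

```typescript
interface FrameResult {
  frameId: number
  result?: { success: boolean, error?: string }
}

// Throws unless at least one frame explicitly reported success.
function assertAnyFrameSucceeded(frameResults: FrameResult[]): void {
  const ok = frameResults.some(fr => fr.result !== undefined && fr.result.success === true)
  if (!ok)
    throw new Error('click_miss: no frame reported a successful click')
}

// Tries the browser-DOM route first; a miss routes to the OS-input path.
function clickWithFallback(
  domClick: () => FrameResult[],
  osClick: () => void,
): 'browser_dom' | 'os_input' {
  try {
    assertAnyFrameSucceeded(domClick())
    return 'browser_dom'
  }
  catch {
    osClick()
    return 'os_input'
  }
}
```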
… fix lint

- isTextInputCandidate: remove role=textbox catch-all that incorrectly
  routed contenteditable divs to setInputValue (which only works on native
  input/textarea). These targets now fall through to OS typing.

- content.js: suppress unicorn/prefer-dom-node-text-content on intentional
  innerText uses (we want layout-aware visible text, not raw DOM text).

- content.js: add TODO for simulateKeystrokes option (currently ignored).
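
The narrowed routing check can be sketched as below. The `CandidateSketch` fields and the list of accepted input types are assumptions; only the rule "native input/textarea only, no role=textbox catch-all" comes from the commit message.

```typescript
// Hypothetical candidate description; the real grounding candidate type may differ.
interface CandidateSketch {
  tag: string
  role?: string
  inputType?: string
  contentEditable?: boolean
}

// Assumed set of text-like input types, for illustration only.
const TEXT_INPUT_TYPES = ['text', 'search', 'email', 'url', 'tel', 'password', 'number']

function isTextInputCandidateSketch(c: CandidateSketch): boolean {
  const tag = c.tag.toLowerCase()
  if (tag === 'textarea')
    return true
  if (tag === 'input')
    return TEXT_INPUT_TYPES.indexOf(c.inputType === undefined ? 'text' : c.inputType) !== -1

  // role=textbox and contenteditable targets are deliberately not matched,
  // so they fall through to OS typing instead of setInputValue.
  return false
}
```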
…dler

Same class of bug as register-desktop-grounding.ts fix (d83d0de).
clickSelector can resolve even when clickAt misses after a reflow.
Now inspects per-frame results; reports isError + click_miss status
if no frame reported success.
…andler

The message handler was wrapping method results as
  { success: true, data: <methodResult> }
which created a double-envelope: methods like setInputValue already return
  { success: false, error: 'not found' }
so consumers checking fr.result.success always saw true (transport layer)
while the real DOM failure was buried in fr.result.data.success.

Now returns method results directly. This fixes all clickResults/frameResults
success checks in register-desktop-grounding.ts, register-tools.ts, and
action-executor.ts — they were checking the correct field (fr.result.success)
but getting the wrong semantics due to the wrapper.
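
The double-envelope problem can be reproduced in a few lines. Both helper names are hypothetical; they stand in for the old and new behavior of the message handler.

```typescript
type MethodResult = { success: boolean, error?: string, data?: unknown }

// Old behavior: the transport wrapper always reports success: true.
function wrapLegacy(methodResult: MethodResult): { success: boolean, data: MethodResult } {
  return { success: true, data: methodResult }
}

// New behavior: the method result is returned as-is.
function passThrough(methodResult: MethodResult): MethodResult {
  return methodResult
}

// Consumer-side check, as done on fr.result.success.
function consumerSeesSuccess(result: { success: boolean }): boolean {
  return result.success
}
```

With `{ success: false, error: 'not found' }` as the method result, `consumerSeesSuccess(wrapLegacy(r))` is true while `consumerSeesSuccess(passThrough(r))` is false, which is the mismatch the commit removes.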
@3361559784 3361559784 force-pushed the codex/desktop-v2-browser-routing branch from 4b6c5ce to 9c0284f Compare April 24, 2026 15:51
- Restore DESKTOP_CLICK_SNAPSHOT_MAX_AGE_MS and foregroundWindowTitle from main
- Align updatePointerIntent signature to accept candidateId (matches main)
- Fix test call sites for updated signature
@3361559784 3361559784 force-pushed the codex/desktop-v2-browser-routing branch from 9c0284f to ab1114d Compare April 24, 2026 15:54
autofix-ci Bot and others added 2 commits April 24, 2026 15:57
The noop reference was introduced in moeru-ai#1647 without an import.
Use an inline arrow function instead.
@nekomeowww nekomeowww changed the title [2/3] feat(desktop): implement browser-native DOM action routing feat(stage-tamagotchi,computer-use-mcp): implement browser-native DOM action routing Apr 24, 2026
@nekomeowww nekomeowww merged commit 255ce71 into moeru-ai:main Apr 24, 2026
10 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da94fe735e


}
: snapshot.screenshot,
})
runtime.stateManager.updateGroundingSnapshot(snapshot)

P1: Strip screenshot bytes before saving grounding snapshot

desktop_observe now writes the full snapshot into RunState, including screenshot.dataBase64. Because desktop_get_state returns the whole run state and the overlay poller requests it repeatedly, this turns every state response into a large image payload and can significantly increase IPC traffic and memory churn in normal use. Keep returning the image in the observe response, but redact screenshot bytes when persisting lastGroundingSnapshot (as the previous code path did).
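
One way to follow this suggestion is to redact a copy before persisting, sketched below. The snapshot shape is an assumption apart from `screenshot.dataBase64`, and `redactForPersistence` is a hypothetical helper name.

```typescript
// Hypothetical, reduced snapshot shape; only screenshot.dataBase64 is taken from the review.
interface ScreenshotSketch { dataBase64: string, width: number, height: number }
interface GroundingSnapshotSketch { capturedAt: number, screenshot?: ScreenshotSketch }

// Returns a copy that is safe to keep in run state: image bytes are dropped,
// metadata is kept. The input snapshot is left untouched so the observe
// response can still carry the full image.
function redactForPersistence(snapshot: GroundingSnapshotSketch): GroundingSnapshotSketch {
  if (!snapshot.screenshot)
    return snapshot
  return {
    ...snapshot,
    screenshot: { ...snapshot.screenshot, dataBase64: '' },
  }
}
```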


Comment on lines 162 to 164
const result = await Promise.race([
currentCall,
config.callTool(MCP_TOOL_NAME),
new Promise<never>((_, reject) =>

P2: Cap timed-out overlay polls to prevent invoke buildup

This poll loop races callTool against a timeout but never cancels or tracks the timed-out invoke. If the Eventa call hangs (the code comment already calls out this startup race), each cycle schedules a new invoke while previous ones remain pending indefinitely, so prolonged outages accumulate unresolved RPC calls. Reintroduce an in-flight/lease cap (or equivalent guard) so fallback retries do not create unbounded outstanding calls.
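
A possible shape for the in-flight cap the review asks for. `InFlightCap` and `pollOnce` are hypothetical names, and this sketch only bounds outstanding calls; it does not cancel them.

```typescript
// Counts outstanding invokes and refuses new ones past a fixed limit.
class InFlightCap {
  private inFlight = 0
  private readonly max: number

  constructor(max: number) {
    this.max = max
  }

  tryAcquire(): boolean {
    if (this.inFlight >= this.max)
      return false
    this.inFlight += 1
    return true
  }

  release(): void {
    if (this.inFlight > 0)
      this.inFlight -= 1
  }

  get count(): number {
    return this.inFlight
  }
}

// Starts a call only if a slot is free; the slot is released when the call
// settles, even if the caller's own timeout gave up on it earlier.
function pollOnce<T>(cap: InFlightCap, call: () => Promise<T>): Promise<T> | undefined {
  if (!cap.tryAcquire())
    return undefined // skip this cycle instead of stacking another invoke

  const pending = call()
  const release = (): void => cap.release()
  pending.then(release, release)
  return pending
}
```

The caller can still race the returned promise against a timeout; a hung invoke then occupies one slot rather than adding a new pending call each cycle.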


// TODO: Add shared-secret auth handshake to prevent rogue localhost processes
// from hijacking the bridge. The bridge server should generate a token and
// inject it into chrome.storage.local so the extension can present it on hello.
const WS_URL = 'ws://localhost:8765'

P2: Use configured bridge endpoint instead of fixed localhost URL

The extension now hardcodes ws://localhost:8765, but the server runtime still supports non-default browser-dom bridge host/port configuration. In environments that set a different endpoint, the extension cannot connect, so browser-dom routing never activates and all DOM-native actions degrade to fallback behavior. Restore endpoint configurability (or handshake-based endpoint discovery) rather than a fixed URL.
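
Endpoint configurability could be restored with a small resolver like the sketch below. The config shape and function name are assumptions; how the extension would actually obtain the values (storage, handshake, options page) is left open.

```typescript
// Hypothetical config shape; both fields are optional overrides.
interface BridgeEndpointConfig { host?: string, port?: number }

const DEFAULT_BRIDGE_HOST = 'localhost'
const DEFAULT_BRIDGE_PORT = 8765

// Builds the WebSocket URL, falling back to the current hardcoded default
// whenever a field is missing or invalid.
function resolveBridgeUrl(config: BridgeEndpointConfig = {}): string {
  const rawHost = config.host === undefined ? '' : config.host.trim()
  const host = rawHost !== '' ? rawHost : DEFAULT_BRIDGE_HOST

  const p = config.port
  const validPort = typeof p === 'number' && Number.isInteger(p) && p > 0 && p <= 65535
  const port = validPort ? p : DEFAULT_BRIDGE_PORT

  return `ws://${host}:${port}`
}
```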



Labels

apps/stage-tamagotchi Desktop App: Windows & macOS & Linux pr-review/ok-to-deploy Pull Request that confirmed to be deploy to either Preview or Prod safe priority/urgent Issue, or Pull Request that urgent to be fixed or processed scope/extension Scope related to extension api, or internally known as tentacle api, mod api, plugin api scope/ui Scope related to UI/UX, or interface improve, perf, and bugs
