feat(stage-tamagotchi,computer-use-mcp): implement browser-native DOM action routing #1648
⏳ Approval required for deploying to Cloudflare Workers (Preview) for stage-web.
Hey, maintainers, kindly take some time to review and approve this deployment when you are available. Thank you! 🙏
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50d218e4d6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Code Review
This pull request implements a comprehensive desktop grounding layer for macOS automation, featuring a transparent visualization overlay, a read-only Chrome extension for DOM observation, and unified observation tools like desktop_observe and desktop_click_target. It also introduces a snap resolution engine and a descriptor-driven tool registration system. Feedback focuses on increasing the staleness threshold for click actions to better handle LLM latency, fixing a memory leak in the overlay polling timer, and improving the robustness of CSS selector generation and application detection consistency.
Force-pushed from b815b7f to 4caca43.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d9d02ec11b
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7c35c91fa1
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b0861357bd
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d39ec590f
Force-pushed from d015ff7 to d92d173.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d92d1732c2
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3a0d95e21c
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3047fc1aa6
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 09bdcb61b1
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1b5ca6a401
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5bb1b3f69b
Pull request overview
This PR adds “desktop grounding” for the Desktop lane by introducing a unified observe→candidate→click flow, and routes eligible Chrome DOM targets into the browser extension/CDP path to improve click/type precision while maintaining OS-input fallback.
Changes:
- Add new MCP actions (desktop_observe, desktop_click_target) with RunState support, strategy advisories, and transparency text.
- Implement grounding pipeline (snapshot aggregation, candidate ranking/dedup, snap resolver) plus browser-dom routing for click/type.
- Introduce tool descriptor registry + add Electron desktop overlay window (polls desktop_get_state) for visualizing snap/candidates.
Reviewed changes
Copilot reviewed 47 out of 50 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| services/computer-use-mcp/src/types.ts | Adds new action kinds/inputs and enriches DOM element metadata (role). |
| services/computer-use-mcp/src/transparency.ts | Adds intent/outcome strings + run-state summary for grounding snapshot/pointer. |
| services/computer-use-mcp/src/strategy.ts | Adds grounding-related advisories (observe-first, stale, duplicate click). |
| services/computer-use-mcp/src/state.ts | Stores grounding snapshot, pointer intent, last clicked candidate + update helpers. |
| services/computer-use-mcp/src/snap-resolver.ts | Implements snapping logic and candidate-ID resolution helpers. |
| services/computer-use-mcp/src/snap-resolver.test.ts | Unit tests for snap/geometry helpers. |
| services/computer-use-mcp/src/server/tool-descriptors/vscode.ts | Adds VS Code tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/types.ts | Defines descriptor types/validation for registry. |
| services/computer-use-mcp/src/server/tool-descriptors/task-memory.ts | Adds task memory tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/registry.ts | Implements descriptor registry/query/validation. |
| services/computer-use-mcp/src/server/tool-descriptors/registry.test.ts | Adds registry validation/query/coverage tests. |
| services/computer-use-mcp/src/server/tool-descriptors/register-helper.ts | Adds descriptor-driven tool registration helpers. |
| services/computer-use-mcp/src/server/tool-descriptors/pty.ts | Adds PTY tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/index.ts | Exports registry + helpers/types. |
| services/computer-use-mcp/src/server/tool-descriptors/display.ts | Adds display tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/desktop.ts | Adds desktop/browser-dom/workflow/meta tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/coding.ts | Adds coding tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/cdp.ts | Adds CDP tool descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/all.ts | Aggregates descriptors and initializes global registry. |
| services/computer-use-mcp/src/server/tool-descriptors/accessibility.ts | Adds accessibility tool descriptors. |
| services/computer-use-mcp/src/server/register-desktop-grounding.ts | Registers desktop_observe / desktop_click_target and routes to browser-dom when possible. |
| services/computer-use-mcp/src/server/action-executor.ts | Routes type_text through browser-dom setInputValue when appropriate. |
| services/computer-use-mcp/src/server.ts | Wires desktop grounding tool registration into server startup. |
| services/computer-use-mcp/src/desktop-grounding.ts | Implements unified grounding capture + candidate merge/dedup + agent formatting. |
| services/computer-use-mcp/src/desktop-grounding.test.ts | Unit tests for candidate building and formatting. |
| services/computer-use-mcp/src/desktop-grounding-types.ts | Adds grounding snapshot/candidate/snap/pointer intent types. |
| services/computer-use-mcp/src/chrome-semantic-adapter.ts | Captures Chrome semantics via extension/CDP and maps to candidates/selectors. |
| services/computer-use-mcp/src/chrome-semantic-adapter.test.ts | Unit tests for Chrome semantic adapter + routing integration. |
| services/computer-use-mcp/src/browser-action-router.ts | Adds explicit routing decision tree for click/type via browser-dom vs OS input. |
| services/computer-use-mcp/src/browser-action-router.test.ts | Unit tests for routing decisions and consistency. |
| services/computer-use-mcp/chrome-extension/msg_bridge.js | Adds isolated-world relay between background and main-world content script. |
| services/computer-use-mcp/chrome-extension/manifest.json | New MV3 extension manifest for grounding bridge. |
| services/computer-use-mcp/chrome-extension/icon48.png | Extension icon asset. |
| services/computer-use-mcp/chrome-extension/icon16.png | Extension icon asset. |
| services/computer-use-mcp/chrome-extension/icon128.png | Extension icon asset. |
| services/computer-use-mcp/chrome-extension/content.js | Main-world DOM collector + limited DOM action helpers. |
| services/computer-use-mcp/chrome-extension/background.js | Background worker that relays commands and maintains WS connection. |
| services/computer-use-mcp/chrome-extension/README.md | Extension documentation. |
| packages/stage-ui/src/stores/mcp-tool-bridge.ts | Adds a minimal MCP tool bridge interface for UI consumers. |
| packages/stage-ui/package.json | Exports the new stage-ui store entrypoint. |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay.vue | Adds overlay renderer to visualize pointer/candidate boxes. |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay-polling.ts | Adds polling/extraction logic for overlay (pure module). |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay-polling.test.ts | Unit tests for overlay polling/extraction. |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay-coordinates.ts | Coordinate conversion helpers for overlay rendering. |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay-coordinates.test.ts | Unit tests for overlay coordinate helpers. |
| apps/stage-tamagotchi/src/main/windows/desktop-overlay/rpc/index.electron.ts | Sets up Electron RPC services for overlay window. |
| apps/stage-tamagotchi/src/main/windows/desktop-overlay/index.ts | Creates the click-through always-on-top overlay window (env-gated). |
| apps/stage-tamagotchi/src/main/index.ts | Boots overlay window when AIRI_DESKTOP_OVERLAY=1. |
| apps/stage-tamagotchi/electron.vite.config.ts | Adds explicit alias for stage-ui stores/mcp-tool-bridge. |
💡 Codex Review
airi/services/computer-use-mcp/src/server/register-tools.ts
Lines 74 to 75 in a3c915f
This success predicate reads top-level frame.result.success, but extension replies are wrapped as { success: true, data: <tool result> } by content.js for any non-throwing call. That means frame-level failures like { data: { success: false, error: 'not found' } } are still counted as successful, so browser_dom tool summaries can claim N/N frame(s) succeeded when actions actually failed in every frame.
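The double-envelope problem described above can be illustrated with a small sketch. The type names (`FrameEnvelope`, `FrameResult`) and the helper `didFrameSucceed` here are hypothetical stand-ins, not the actual definitions in register-tools.ts; the only assumption taken from the review is that a reply may look like `{ success: true, data: { success: false, error: ... } }`.

```typescript
// Hypothetical shapes for illustration; the real types in register-tools.ts may differ.
interface FrameEnvelope {
  success: boolean
  data?: unknown
  error?: string
}

interface FrameResult {
  frameId: number
  result?: FrameEnvelope
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null
}

// A frame counts as succeeded only when the outer (transport) envelope reports
// success AND the inner tool payload, if it carries its own flag, agrees.
function didFrameSucceed(frame: FrameResult): boolean {
  const outer = frame.result
  if (!outer || !outer.success)
    return false
  const inner = outer.data
  if (isRecord(inner) && 'success' in inner)
    return inner.success === true
  return true
}

function summarizeFrames(frames: FrameResult[]): string {
  const succeeded = frames.filter(didFrameSucceed).length
  return `${succeeded}/${frames.length} frame(s) succeeded`
}
```

With a predicate of this shape, a wrapped failure no longer inflates the summary count. An alternative is to remove the wrapper at the source so that only one `success` flag exists.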
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d83d0dedbc
Addressed the two new review comments in 66ac599:
P1: role=textbox routing (browser-action-router.ts): Fixed. Removed the …
P2: simulateKeystrokes (content.js): Added …
Also fixed 2 eslint …
💡 Codex Review
airi/services/computer-use-mcp/src/server/register-tools.ts
Lines 565 to 567 in 6a3dc6a
browser_dom_click always emits a successful Clicked selector... response after clickSelector resolves, but clickSelector can return clickResults where every frame reports { success: false } (for example after a reflow between target lookup and click dispatch). In that case this tool still reports status: "ok" even though no DOM click happened, which can make downstream automation proceed under a false success assumption.
airi/services/computer-use-mcp/src/server/register-tools.ts
Lines 74 to 75 in 6a3dc6a
didBrowserDomFrameSucceed treats the outer frame.result.success as the operation outcome, but extension responses are wrapped and can look like { success: true, data: { success: false, error: ... } }. This causes frame-level failures to be counted as successes in summaries, so tools can report misleading X/Y frame(s) succeeded counts and hide real DOM-action failures.
Addressed latest review in efd1a13:
P1: browser_dom_click clickResults ( …
P2: nested payload unwrapping ( …
…ting click success
When executeAction returns a denial or approval-required result the handler was proceeding unconditionally to the 'Clicked: success' response and updating pointer/clicked-candidate state, misleading the agent into assuming the target was clicked when it was only queued or blocked by policy.
Fix: capture the CallToolResult from both the OS-input and browser-dom fallback paths and return it early when structuredContent.status is 'approval_required' or 'denied', or when isError is set. Post-click state updates are now only reached when execution actually occurred.
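A minimal sketch of that early-return guard, under stated assumptions: `ToolResultLike` is a simplified stand-in for the MCP `CallToolResult`, and `ClickState` / `finishClick` are hypothetical names, not the handler's real structure.

```typescript
// Simplified, hypothetical stand-in for the CallToolResult shape.
interface ToolResultLike {
  isError?: boolean
  structuredContent?: { status?: string }
}

// Hypothetical slice of run state touched after a click.
interface ClickState {
  lastClickedCandidateId?: string
}

// True when the action was blocked, queued for approval, or failed,
// i.e. when no click actually happened.
function shouldReturnEarly(executed: ToolResultLike): boolean {
  const outcome = executed.structuredContent?.status
  return executed.isError === true || outcome === 'approval_required' || outcome === 'denied'
}

function finishClick(state: ClickState, candidateId: string, executed: ToolResultLike): ToolResultLike {
  if (shouldReturnEarly(executed))
    return executed // surface the policy result unchanged; state stays untouched
  state.lastClickedCandidateId = candidateId
  return { structuredContent: { status: 'ok' } }
}
```

Returning the original result object, rather than a rewritten one, keeps the approval or denial details visible to the agent.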
…k unknown bridge actions as failed
P1 (content.js): getClickTarget returned click coordinates only under
the nested `center` object. BrowserDomExtensionBridge.clickSelector
reads top-level x/y from the unwrapped payload, so extension clicks
always failed to resolve a target point and fell back to OS input.
Expose x/y at both the top level and under center for compatibility.
P1 (background.js): unknown actions fell into the default branch and
returned { ok: true, result: { error: '...' } }. The bridge treated
any ok:true as success, so callers like setInputValue/checkCheckbox
saw a resolved promise and never triggered their OS-input fallback
paths, producing silent no-ops. Changed to return ok:false so the
bridge correctly rejects and callers fall back.
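Both fixes in this commit can be sketched in a few lines. The names `BridgeReply`, `routeCommand`, and `toClickTarget` are hypothetical and only mirror the behavior described in the message: unknown actions yield `ok: false`, and click coordinates appear both at the top level and under `center`.

```typescript
// Hypothetical reply envelope; the real background.js format may differ.
type BridgeReply =
  | { ok: true; result: unknown }
  | { ok: false; error: string }

type ActionHandler = (args: Record<string, unknown>) => unknown

// Route a command to its handler. Unknown actions are reported as failures
// so that callers can fall back to OS input instead of silently doing nothing.
function routeCommand(
  handlers: Map<string, ActionHandler>,
  action: string,
  args: Record<string, unknown> = {},
): BridgeReply {
  const handler = handlers.get(action)
  if (!handler)
    return { ok: false, error: `unknown action: ${action}` }
  try {
    return { ok: true, result: handler(args) }
  }
  catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) }
  }
}

interface RectLike {
  left: number
  top: number
  width: number
  height: number
}

// Expose the click point at the top level and under `center` for compatibility.
function toClickTarget(rect: RectLike) {
  const x = rect.left + rect.width / 2
  const y = rect.top + rect.height / 2
  return { x, y, center: { x, y } }
}
```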
…coords; add write actions to extension
P1 (action-executor.ts): type_text with explicit x/y coordinates was still routing through lastClickedCandidateId to setInputValue, so text could land in a stale Chrome selector when the user had moved to a different app or window. Gate the browser-dom type path out when explicit coordinates are present: coordinates imply the caller has already targeted a specific screen position that may not correspond to the last Chrome candidate.
P1 (background.js + content.js): the extension command switch only handled read-only observation actions. BrowserDomExtensionBridge issues write commands (setInputValue, checkCheckbox, selectOption) that all hit the default branch and returned ok:false, so browser-dom write tools never succeeded and always degraded to OS-input fallback. Add setInputValue, checkCheckbox, selectOption implementations to content.js (bridge-triggered, limited DOM mutations using native value setter + framework-compatible events). Wire the three actions into background.js switch to route them to the correct content.js methods via runCUAction.
rolldown v1.0.0-rc.15 does not reliably resolve exports-field glob
patterns ("./stores/*") against new files introduced in the same PR,
causing stage-tamagotchi build and typecheck to fail with UNLOADABLE_DEPENDENCY.
Add an explicit entry for mcp-tool-bridge alongside existing precise
entries to guarantee the module is resolvable regardless of glob
support in the bundler.
… setter
P1 (content.js): setInputValue always preferred HTMLInputElement.prototype
value setter via || chain. When the target is a <textarea>, calling the
input setter throws 'Illegal invocation'. Now selects the prototype matching
the actual element type (HTMLTextAreaElement vs HTMLInputElement).
P1 (action-executor.ts): setInputValue only throws on transport failure —
per-frame DOM failures (selector not found, frame stale) resolve with
{ success: false } in the frame result array. The code unconditionally set
usedBrowserDom = true, skipping OS typeText fallback on silent DOM failure.
Now checks that at least one frame reported success before marking browser-dom
used; otherwise re-throws to enter the catch fallback.
P1 (register-desktop-grounding.ts): checkCheckbox has the same frame-result
gap. Bridge resolve with all frames returning success:false was treated as
success, producing a 'Clicked: success' response without any DOM mutation.
Now validates frame results and throws on total failure to trigger OS click
fallback through the policy pipeline.
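The frame-result validation shared by the setInputValue and checkCheckbox fixes could look roughly like the sketch below. It is synchronous for brevity (the real bridge calls are presumably async), and `FrameOutcome`, `assertAnyFrameSucceeded`, and `runWithFallback` are hypothetical names.

```typescript
// Hypothetical per-frame outcome as returned by a bridge call.
interface FrameOutcome {
  success: boolean
  error?: string
}

// A resolved bridge call is not enough: at least one frame must report success.
function assertAnyFrameSucceeded(frames: FrameOutcome[], action: string): void {
  if (!frames.some(frame => frame.success))
    throw new Error(`${action}: no frame reported success`)
}

// Try the browser-dom path first; any throw (transport failure or total
// per-frame failure) enters the OS-input fallback.
function runWithFallback(
  domAttempt: () => FrameOutcome[],
  osFallback: () => void,
  action: string,
): 'browser_dom' | 'os_input' {
  try {
    assertAnyFrameSucceeded(domAttempt(), action)
    return 'browser_dom'
  }
  catch {
    osFallback()
    return 'os_input'
  }
}
```

Throwing on total failure reuses the existing catch path, so transport errors and silent DOM failures end up in the same fallback branch.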
Dispatching a synthetic click after manually setting el.checked = target can cause React controlled-component onChange handlers to toggle the value back, since the browser's real event order is click→change. Only dispatch change, which is sufficient for Vue @change and React onChange bindings. Update the JSDoc to explain why no click event is emitted.
…n and ARIA checkbox guard
Build (stage-tamagotchi): @proj-airi/stage-ui alias points to a directory; rolldown concatenates sub-paths without adding a file extension, so flat .ts files at the stores/ root (mcp-tool-bridge.ts) are not resolved. Add an explicit alias entry for mcp-tool-bridge before the directory-level alias so rolldown finds the file directly.
P1 (content.js checkCheckbox): for custom ARIA checkboxes (div[role=checkbox]) writing el.checked only creates an expando property; no native state changes and nothing is visually toggled. The bridge still returned success:true, so desktop_click_target reported click success and skipped OS-input fallback. Add an instanceof HTMLInputElement guard: non-native elements now return success:false, which triggers the frame-result validation added earlier and lets the handler fall back to OS click through the policy pipeline.
mcp-tool-bridge.ts was deleted in main by a refactor (5bbf955) that moved LLM tool registration to llm-tools.ts. However, that refactor targeted a different concern (xsai tool registration), while desktop-overlay.vue uses getMcpToolBridge().callTool() to invoke computer-use MCP tools at runtime, a separate bridge interface not replaced by the refactor. Restore the file with the original interface plus expanded JSDoc so the desktop overlay pages can resolve and use it.
The overlay window never calls setMcpToolBridge(), so the polling controller failed immediately with 'MCP tool bridge is not available'. Since the computer-use MCP tools are already exposed to the main process via electronMcpCallTool (registered during setupDesktopOverlayElectronInvokes), we can bypass the bridge singleton entirely and invoke the eventa route directly.
The previous commit introduced an unresolved import error in the tamagotchi renderer build because the relative path to shared/eventa stepped one directory too high (../../../shared/eventa instead of ../../shared/eventa). Fixed the path in desktop-overlay.vue, desktop-overlay-polling.ts, and its test file.
…WebSocket relay
- background.js: read blur/simulateKeystrokes from cmd.opts (not top-level) to match BrowserDomExtensionBridge.setInputValue() wire format
- state.ts: split updatePointerIntent into updatePointerIntent (intent only) and recordClickedCandidate (candidate id only), so duplicate-click guard only fires after execution succeeds, not before
- register-desktop-grounding.ts: move recordClickedCandidate after action execution; if policy denies/requires approval, candidate is not marked
- background.js: replace dead ws-incoming listener with native WebSocket relay connecting to BrowserDomExtensionBridge on localhost:8765
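The state split mentioned for state.ts can be pictured as follows. This is only a sketch: `GroundingStateSketch`, `PointerIntentLike`, and `isDuplicateClick` are hypothetical names, and the real RunState manager carries much more than these two fields.

```typescript
// Hypothetical pointer-intent shape.
interface PointerIntentLike {
  x: number
  y: number
}

class GroundingStateSketch {
  private pointerIntent: PointerIntentLike | undefined
  private lastClickedCandidateId: string | undefined

  // Records where the agent intends to click; safe to call before execution.
  updatePointerIntent(intent: PointerIntentLike): void {
    this.pointerIntent = intent
  }

  // Records the clicked candidate; call only after execution actually succeeded.
  recordClickedCandidate(candidateId: string): void {
    this.lastClickedCandidateId = candidateId
  }

  getPointerIntent(): PointerIntentLike | undefined {
    return this.pointerIntent
  }

  // Duplicate-click guard: fires only for candidates that were really clicked.
  isDuplicateClick(candidateId: string): boolean {
    return this.lastClickedCandidateId === candidateId
  }
}
```

Keeping the two writes separate means a denied or approval-pending click leaves the duplicate-click guard unarmed, so the agent can retry the same candidate.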
…e extension
Add readInputValue, getComputedStyles, triggerEvent, waitForElement, and clickAt to both background.js (command router) and content.js (DOM API). These actions are called by BrowserDomExtensionBridge but were missing from the extension switch block, causing them to return 'unknown action' and making the bridge methods fail silently.
Also makes the content.js message handler async-aware so waitForElement (which uses MutationObserver and returns a Promise) is properly awaited.
… paths, docstring
- findChromeWindowBounds: prefer exact foregroundApp match over substring search to avoid selecting wrong Chrome window when multiple exist
- README: update to reflect actual DOM-mutation capabilities, document WebSocket relay, remove stale 'no DOM mutations' claim
- Remove local filesystem paths (/Users/liuziheng/...) from provenance comments in background.js, content.js, msg_bridge.js
- ToolDescriptor: fix docstring to acknowledge defaultDeferred is optional
- Add TODO for WebSocket auth handshake follow-up
…porting success
clickSelector can resolve even when clickAt misses (e.g. layout shift after getClickTarget). Previously the handler always reported success and locked lastClickedCandidateId, blocking retries via duplicate-click protection. Now inspects per-frame results from clickAt; if none report success, throws to trigger OS-input fallback, matching the existing checkCheckbox pattern.
… fix lint
- isTextInputCandidate: remove role=textbox catch-all that incorrectly routed contenteditable divs to setInputValue (which only works on native input/textarea). These targets now fall through to OS typing.
- content.js: suppress unicorn/prefer-dom-node-text-content on intentional innerText uses (we want layout-aware visible text, not raw DOM text).
- content.js: add TODO for simulateKeystrokes option (currently ignored).
…dler
Same class of bug as register-desktop-grounding.ts fix (d83d0de). clickSelector can resolve even when clickAt misses after a reflow. Now inspects per-frame results; reports isError + click_miss status if no frame reported success.
…andler
The message handler was wrapping method results as
{ success: true, data: <methodResult> }
which created a double-envelope: methods like setInputValue already return
{ success: false, error: 'not found' }
so consumers checking fr.result.success always saw true (transport layer)
while the real DOM failure was buried in fr.result.data.success.
Now returns method results directly. This fixes all clickResults/frameResults
success checks in register-desktop-grounding.ts, register-tools.ts, and
action-executor.ts — they were checking the correct field (fr.result.success)
but getting the wrong semantics due to the wrapper.
Force-pushed from 4b6c5ce to 9c0284f.
- Restore DESKTOP_CLICK_SNAPSHOT_MAX_AGE_MS and foregroundWindowTitle from main
- Align updatePointerIntent signature to accept candidateId (matches main)
- Fix test call sites for updated signature
Force-pushed from 9c0284f to ab1114d.
The noop reference was introduced in moeru-ai#1647 without an import. Use an inline arrow function instead.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da94fe735e
}
: snapshot.screenshot,
})
runtime.stateManager.updateGroundingSnapshot(snapshot)
Strip screenshot bytes before saving grounding snapshot
desktop_observe now writes the full snapshot into RunState, including screenshot.dataBase64. Because desktop_get_state returns the whole run state and the overlay poller requests it repeatedly, this turns every state response into a large image payload and can significantly increase IPC traffic and memory churn in normal use. Keep returning the image in the observe response, but redact screenshot bytes when persisting lastGroundingSnapshot (as the previous code path did).
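One way to act on this suggestion is to persist a redacted copy of the snapshot while still returning the full one from the observe call. The shapes below (`ScreenshotLike`, `GroundingSnapshotLike`) and the helper `redactScreenshotBytes` are assumptions for illustration; only the `screenshot.dataBase64` field name comes from the comment.

```typescript
// Hypothetical snapshot shapes; only `dataBase64` is taken from the review comment.
interface ScreenshotLike {
  dataBase64?: string
  mimeType: string
  width: number
  height: number
}

interface GroundingSnapshotLike {
  snapshotId: string
  screenshot?: ScreenshotLike
}

// Return a copy that keeps screenshot metadata but drops the image bytes.
// The input snapshot is left untouched so it can still be returned to the caller.
function redactScreenshotBytes(snapshot: GroundingSnapshotLike): GroundingSnapshotLike {
  if (!snapshot.screenshot)
    return snapshot
  const { mimeType, width, height } = snapshot.screenshot
  return { ...snapshot, screenshot: { mimeType, width, height } }
}
```

The persisted state then stays small even when a poller reads it repeatedly, at the cost of the overlay not being able to recover the image from run state.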
const result = await Promise.race([
currentCall,
config.callTool(MCP_TOOL_NAME),
new Promise<never>((_, reject) =>
Cap timed-out overlay polls to prevent invoke buildup
This poll loop races callTool against a timeout but never cancels or tracks the timed-out invoke. If the Eventa call hangs (the code comment already calls out this startup race), each cycle schedules a new invoke while previous ones remain pending indefinitely, so prolonged outages accumulate unresolved RPC calls. Reintroduce an in-flight/lease cap (or equivalent guard) so fallback retries do not create unbounded outstanding calls.
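A rough sketch of the in-flight cap the comment asks for, assuming a hypothetical `InFlightCap` helper and `pollOnce` wrapper (neither exists in the PR). The key point is that a slot is released only when the underlying invoke settles, not when the timeout wins the race, so hung calls cannot pile up beyond the limit.

```typescript
// Hypothetical counter that bounds outstanding calls.
class InFlightCap {
  private inFlight = 0
  private readonly limit: number

  constructor(limit: number) {
    this.limit = limit
  }

  tryAcquire(): boolean {
    if (this.inFlight >= this.limit)
      return false
    this.inFlight += 1
    return true
  }

  release(): void {
    if (this.inFlight > 0)
      this.inFlight -= 1
  }

  get pending(): number {
    return this.inFlight
  }
}

// Race one call against a timeout. If the cap is exhausted, skip this cycle
// instead of issuing another invoke.
async function pollOnce<T>(
  cap: InFlightCap,
  call: () => Promise<T>,
  timeoutMs: number,
): Promise<T | 'skipped'> {
  if (!cap.tryAcquire())
    return 'skipped'
  let invoke: Promise<T>
  try {
    invoke = call()
  }
  catch (err) {
    cap.release() // synchronous throw: nothing is pending
    throw err
  }
  const free = () => cap.release()
  void invoke.then(free, free) // release when the invoke itself settles
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('poll timed out')), timeoutMs)
  })
  try {
    return await Promise.race([invoke, timeout])
  }
  finally {
    clearTimeout(timer)
  }
}
```

With a limit of 1, a hung Eventa call makes later cycles return `'skipped'` until it settles; whether skipped cycles should be surfaced in the overlay is a separate design choice.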
// TODO: Add shared-secret auth handshake to prevent rogue localhost processes
// from hijacking the bridge. The bridge server should generate a token and
// inject it into chrome.storage.local so the extension can present it on hello.
const WS_URL = 'ws://localhost:8765'
Use configured bridge endpoint instead of fixed localhost URL
The extension now hardcodes ws://localhost:8765, but the server runtime still supports non-default browser-dom bridge host/port configuration. In environments that set a different endpoint, the extension cannot connect, so browser-dom routing never activates and all DOM-native actions degrade to fallback behavior. Restore endpoint configurability (or handshake-based endpoint discovery) rather than a fixed URL.
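Endpoint configurability could be as small as a resolver that falls back to the current default. `BridgeEndpointConfig` and `resolveBridgeUrl` are hypothetical; where the config comes from (for example extension storage or a handshake) is left open here.

```typescript
// Hypothetical config shape; both fields are optional overrides.
interface BridgeEndpointConfig {
  host?: string
  port?: number
}

const DEFAULT_BRIDGE_HOST = 'localhost'
const DEFAULT_BRIDGE_PORT = 8765

// Build the WebSocket URL from overrides, defaulting to ws://localhost:8765.
function resolveBridgeUrl(config: BridgeEndpointConfig = {}): string {
  const host = config.host?.trim() || DEFAULT_BRIDGE_HOST
  const port = config.port ?? DEFAULT_BRIDGE_PORT
  if (!Number.isInteger(port) || port < 1 || port > 65535)
    throw new RangeError(`invalid bridge port: ${port}`)
  return `ws://${host}:${port}`
}
```

Keeping the default identical to the hardcoded value means existing setups keep working when no override is provided.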
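This is step 2 of 3 in the Desktop lane. Building on the observation layer from 1/3, it addresses execution precision: eligible chrome_dom targets are intercepted and, where possible, routed around OS-level clicks and dispatched directly into the browser extension/CDP event stream.
This PR includes:
browser-action-router, which provides an explicit, typed routing decision tree for click, checkCheckbox, and setInputValue.
Out of scope for this PR:
When browser_dom falls back to os_input, the landing coordinates can still be invalid if the page happens to reflow at that moment.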
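The routing decision tree might be sketched roughly as below. Everything here is an assumption pieced together from the commit messages in this thread (explicit coordinates bypass browser-dom typing, only native input/textarea use setInputValue, native checkboxes use checkCheckbox); `CandidateLike`, `routeClick`, `routeType`, and the list of text input types are hypothetical and not the actual browser-action-router.ts API.

```typescript
type Route = 'browser_dom' | 'os_input'
type DomAction = 'click' | 'checkCheckbox' | 'setInputValue'

// Hypothetical candidate shape.
interface CandidateLike {
  source: string
  selector?: string
  tag?: string
  inputType?: string
}

interface RouteDecision {
  route: Route
  action?: DomAction
  reason: string
}

// Assumed set of text-like input types; the real router may use another list.
const TEXT_INPUT_TYPES = new Set(['text', 'search', 'email', 'url', 'tel', 'password', 'number'])

function viaOsInput(reason: string): RouteDecision {
  return { route: 'os_input', reason }
}

// Returns a reason string when the browser-dom path is not usable.
function domBlocker(candidate: CandidateLike | undefined, bridgeConnected: boolean): string | undefined {
  if (!candidate)
    return 'no candidate'
  if (candidate.source !== 'chrome_dom')
    return 'not a chrome_dom target'
  if (!bridgeConnected)
    return 'bridge not connected'
  if (!candidate.selector)
    return 'no selector'
  return undefined
}

function routeClick(candidate: CandidateLike | undefined, bridgeConnected: boolean): RouteDecision {
  const blocker = domBlocker(candidate, bridgeConnected)
  if (blocker || !candidate)
    return viaOsInput(blocker ?? 'no candidate')
  if (candidate.tag === 'input' && candidate.inputType === 'checkbox')
    return { route: 'browser_dom', action: 'checkCheckbox', reason: 'native checkbox' }
  return { route: 'browser_dom', action: 'click', reason: 'selector available' }
}

function routeType(
  candidate: CandidateLike | undefined,
  bridgeConnected: boolean,
  hasExplicitCoords: boolean,
): RouteDecision {
  if (hasExplicitCoords)
    return viaOsInput('explicit coordinates given')
  const blocker = domBlocker(candidate, bridgeConnected)
  if (blocker || !candidate)
    return viaOsInput(blocker ?? 'no candidate')
  const isNativeText = candidate.tag === 'textarea'
    || (candidate.tag === 'input' && TEXT_INPUT_TYPES.has(candidate.inputType ?? 'text'))
  if (!isNativeText)
    return viaOsInput('not a native input/textarea')
  return { route: 'browser_dom', action: 'setInputValue', reason: 'native text input' }
}
```

Each decision carries a `reason` so the fallback to os_input can be explained in transparency text; the reflow caveat noted above still applies once the OS path is taken.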