[3/3] feat(desktop): intro agent-owned session and ghost pointer phases #1649
Code Review
This pull request introduces a comprehensive desktop grounding layer for macOS, enabling high-precision automation through a unified observation system that integrates screenshots, accessibility trees, and Chrome DOM data. Key additions include a transparent Electron overlay for ghost pointer visualization, a Chrome extension for read-only DOM observation, and a suite of MCP tools for target discovery and snap-resolved interactions. Feedback focuses on improving code maintainability by refactoring complex routing logic into helper functions and optimizing the initialization sequence of the desktop overlay window.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1a7371c4a0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
the session lifecycle separation between user and agent is the right architectural call. we deal with the same problem in our macOS desktop automation, where the agent needs to click and type without interfering with the user's real mouse and keyboard.

for the ghost pointer approach, one thing to watch out for on macOS: CGEvent posts always move the real system cursor. there's no native way to have a "virtual cursor" that only affects a specific app window. the workaround we use is to save the real cursor position before the agent acts (CGEventGetLocation), post the agent's events, then immediately restore the cursor position via CGWarpMouseCursorPosition. this creates a brief flicker but is functionally invisible at 60fps.

for the preview -> executing -> completed state flow, the visual feedback is important. but make sure the overlay layer doesn't intercept mouse events from the user during the executing phase. on macOS, NSWindow with level .floating and ignoresMouseEvents = true lets you render the pointer animation without blocking real input.

the CDP bridge lifecycle cleanup on session crash is critical. we've seen orphaned Chrome DevTools sessions consume GBs of memory over time. a heartbeat ping every 5 seconds with automatic teardown on 3 consecutive failures works well.
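The heartbeat rule suggested above (ping every 5 seconds, tear down after 3 consecutive failures) could be sketched roughly as follows. This is only an illustration: `HeartbeatMonitor`, `startHeartbeat`, and the `ping` callback are hypothetical names, not code from this PR or from the referenced repositories.

```typescript
// Hypothetical sketch: counts consecutive ping failures and tears down once.
class HeartbeatMonitor {
  private consecutiveFailures = 0;
  private tornDown = false;
  private readonly maxFailures: number;
  private readonly teardown: () => void;

  constructor(maxFailures: number, teardown: () => void) {
    this.maxFailures = maxFailures;
    this.teardown = teardown;
  }

  /** Feed the outcome of one ping. Returns true once teardown has run. */
  record(ok: boolean): boolean {
    if (this.tornDown) return true;
    if (ok) {
      // Any success resets the streak; only consecutive failures count.
      this.consecutiveFailures = 0;
      return false;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.maxFailures) {
      this.tornDown = true;
      this.teardown();
    }
    return this.tornDown;
  }
}

/** Wire the monitor to a timer. `ping` is assumed to resolve true on success. */
function startHeartbeat(
  ping: () => Promise<boolean>,
  monitor: HeartbeatMonitor,
  intervalMs = 5_000,
): () => void {
  const timer = setInterval(async () => {
    const ok = await ping().catch(() => false);
    if (monitor.record(ok)) clearInterval(timer);
  }, intervalMs);
  // The returned function stops the heartbeat without running teardown.
  return () => clearInterval(timer);
}
```

Keeping the counting logic separate from the timer makes the failure threshold testable without waiting on real intervals.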
our macOS MCP server with the CGEvent cursor save/restore pattern and AXUIElement window management for agent actions without stealing user focus: https://github.com/mediar-ai/mcp-server-macos-use/blob/main/Sources/MCPServer/main.swift

and the desktop element interaction layer in Rust (cross-platform, handles the same cursor management concerns): https://github.com/mediar-ai/terminator/blob/main/crates/terminator/src/element.rs
Thanks, this is very helpful. The point about CGEvent always moving the real system cursor is especially useful. Our current ghost-pointer phase is only a visual separation layer, so the save → act → restore cursor pattern is exactly the kind of practical mitigation I need to look at next on macOS. Good call as well on keeping the overlay fully non-intercepting during execution, and on adding a stricter bridge heartbeat / teardown path. Those are both real debt items on this branch, so I appreciate the concrete warning signs here.

This is gold, thank you. I'll study both references closely, especially the cursor save/restore flow and the AXUIElement-based window/session handling on macOS. That is very close to the problem boundary of this PR, so having a real implementation to compare against is extremely valuable. I really appreciate you sharing concrete code instead of just high-level advice; it gives me a much better implementation target than guessing through the macOS edge cases blind.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eb2888a8a9
@codex review this PR and verify that the critical bug fix is correct |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b776261021
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a385738fca
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5f447f81be
💡 Codex Review
airi/services/computer-use-mcp/src/browser-dom/extension-bridge.ts
Lines 66 to 67 in 847edd6
start() sets this.started = true before the websocket server has successfully bound. If binding fails (e.g. transient port collision), the catch block records the error but never resets started, and subsequent start() calls short-circuit forever; combined with close() not resetting started, the bridge cannot be restarted in-process after failure.
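One way to address this finding is to set the flag only after the bind succeeds and to clear it in `close()`. The sketch below illustrates that idea under simplifying assumptions: `RestartableBridge` and the synchronous `bind` callback are hypothetical stand-ins, and a real WebSocket server reports bind failures asynchronously, so the actual fix in `extension-bridge.ts` may look different.

```typescript
// Hypothetical stand-in for the bridge. `bind` represents whatever opens the
// WebSocket server and is assumed to throw when binding fails.
class RestartableBridge {
  private started = false;
  private readonly bind: () => void;
  lastError: unknown = undefined;

  constructor(bind: () => void) {
    this.bind = bind;
  }

  /** Returns true when the bridge is running after this call. */
  start(): boolean {
    if (this.started) return true;
    try {
      this.bind(); // may throw, e.g. on a transient port collision
      this.started = true; // mark started only after a successful bind
      this.lastError = undefined;
    } catch (error) {
      // `started` stays false, so a later start() retries instead of short-circuiting.
      this.lastError = error;
    }
    return this.started;
  }

  close(): void {
    // Reset the flag so the bridge can be started again in-process.
    this.started = false;
  }
}
```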
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ad82245c6d
@codex review and verify that the critical bug fix is correct |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b3ee33e455
💡 Codex Review
start() marks the bridge as started before attempting to bind the WebSocket server, but the catch block only records lastError and never clears started. If startup fails once (for example, temporary port bind failure), every later start() call returns early and the extension bridge can never recover without restarting the whole MCP process.
airi/services/computer-use-mcp/src/browser-dom/extension-bridge.ts
Lines 121 to 125 in 44d1cfe
The socket close handler updates connection status but leaves pending requests unresolved, so in-flight actions wait until the full request timeout before failing. In disconnect-prone sessions this adds ~10s stalls before fallback paths (e.g., OS-input fallback in click/type flows) can run, making browser actions appear hung.
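A common remedy for this kind of stall is to reject every in-flight request as soon as the socket closes, rather than letting each one run to its timeout. The following is a minimal sketch of that pattern; `PendingRequests` and its methods are hypothetical and are not the bridge's actual internals.

```typescript
// Hypothetical registry of in-flight requests keyed by request id.
interface PendingEntry {
  resolve: (value: unknown) => void;
  reject: (error: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

class PendingRequests {
  private readonly pending = new Map<string, PendingEntry>();

  get size(): number {
    return this.pending.size;
  }

  /** Register a request; the promise rejects on timeout if nothing settles it first. */
  add(id: string, timeoutMs: number): Promise<unknown> {
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(id);
        reject(new Error(`request ${id} timed out`));
      }, timeoutMs);
      this.pending.set(id, { resolve, reject, timer });
    });
  }

  /** Settle one request with a response. Returns false for unknown ids. */
  resolve(id: string, value: unknown): boolean {
    const entry = this.pending.get(id);
    if (entry === undefined) return false;
    clearTimeout(entry.timer);
    this.pending.delete(id);
    entry.resolve(value);
    return true;
  }

  /** Call from the socket close handler so callers fail fast and can fall back. */
  rejectAll(reason: string): number {
    const count = this.pending.size;
    for (const [id, entry] of this.pending) {
      clearTimeout(entry.timer);
      entry.reject(new Error(`request ${id} aborted: ${reason}`));
    }
    this.pending.clear();
    return count;
  }
}
```

With this in place, a fallback path such as OS-level input can start immediately after a disconnect instead of waiting out the full request timeout.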
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cefcd38dd5
Pull request overview
This PR finalizes the “Desktop” baseline by splitting agent-owned vs user sessions, adding a Chrome-specific session lifecycle, and introducing a desktop grounding + overlay pipeline (observe → snap/route → click + visual feedback loop).
Changes:
- Adds desktop grounding (`desktop_observe`, `desktop_click_target`) with snap resolution, staleness guards, and browser-dom routing hooks.
- Introduces an agent-owned Chrome session lifecycle (`ChromeSessionManager`, `desktop_ensure_chrome`) plus desktop session ownership state.
- Adds an Electron transparent overlay window that polls MCP state and renders ghost pointer phases + candidate boxes.
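To make "snap resolution" and "staleness guards" concrete, here is a rough sketch of how a raw click point might be snapped to the nearest observed candidate, and how an old snapshot might be rejected. Every name, the nearest-edge distance metric, and the thresholds are assumptions for illustration; the PR's `snap-resolver.ts` may use different rules.

```typescript
// All types and functions below are hypothetical illustrations.
interface Point { x: number; y: number }
interface Rect { x: number; y: number; width: number; height: number }
interface Candidate { id: string; bounds: Rect }
interface SnapResult { candidate: Candidate; point: Point; distance: number }

function rectCenter(r: Rect): Point {
  return { x: r.x + r.width / 2, y: r.y + r.height / 2 };
}

// Distance from a point to the nearest edge of a rect (0 when the point is inside).
function distanceToRect(p: Point, r: Rect): number {
  const dx = Math.max(r.x - p.x, 0, p.x - (r.x + r.width));
  const dy = Math.max(r.y - p.y, 0, p.y - (r.y + r.height));
  return Math.hypot(dx, dy);
}

// Pick the closest candidate within maxDistance and click its center.
function snapToCandidate(
  p: Point,
  candidates: Candidate[],
  maxDistance: number,
): SnapResult | null {
  let best: SnapResult | null = null;
  for (const candidate of candidates) {
    const distance = distanceToRect(p, candidate.bounds);
    if (distance > maxDistance) continue;
    if (best === null || distance < best.distance) {
      best = { candidate, point: rectCenter(candidate.bounds), distance };
    }
  }
  return best;
}

// Staleness guard: refuse to act on a snapshot older than maxAgeMs.
function isSnapshotStale(capturedAtMs: number, nowMs: number, maxAgeMs: number): boolean {
  return nowMs - capturedAtMs > maxAgeMs;
}
```

Returning `null` when nothing is within range lets the caller require a fresh observation instead of clicking an unverified coordinate.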
Reviewed changes
Copilot reviewed 62 out of 65 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| services/computer-use-mcp/src/utils/sleep.ts | Adds mockable async sleep helper for tests. |
| services/computer-use-mcp/src/types.ts | Extends action/types for desktop grounding and Chrome session metadata. |
| services/computer-use-mcp/src/transparency.ts | Adds intent/outcome messaging and run-state summary for grounding + pointer. |
| services/computer-use-mcp/src/strategy.ts | Adds grounding-related advisories (fresh observe required, stale snapshot, duplicate click). |
| services/computer-use-mcp/src/state.ts | Persists grounding snapshot/pointer intent and Chrome/desktop session fields in RunState. |
| services/computer-use-mcp/src/snap-resolver.ts | Implements snap-to-candidate coordinate resolver + geometry helpers. |
| services/computer-use-mcp/src/snap-resolver.test.ts | Unit tests for snap resolver and geometry helpers. |
| services/computer-use-mcp/src/server/tool-descriptors/vscode.ts | Adds tool descriptors for VS Code lane. |
| services/computer-use-mcp/src/server/tool-descriptors/types.ts | Introduces canonical ToolDescriptor types + validation helpers. |
| services/computer-use-mcp/src/server/tool-descriptors/task-memory.ts | Adds tool descriptors for task-memory lane. |
| services/computer-use-mcp/src/server/tool-descriptors/registry.ts | Adds descriptor registry with query/group/validation helpers. |
| services/computer-use-mcp/src/server/tool-descriptors/registry.test.ts | Tests registry initialization, completeness, and grounding tool enablement rules. |
| services/computer-use-mcp/src/server/tool-descriptors/register-helper.ts | Adds descriptor-driven MCP tool registration + descriptor lookup helpers. |
| services/computer-use-mcp/src/server/tool-descriptors/pty.ts | Adds PTY lane descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/index.ts | Exports consolidated descriptor APIs/registry/types. |
| services/computer-use-mcp/src/server/tool-descriptors/display.ts | Adds display lane descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/coding.ts | Adds coding lane descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/cdp.ts | Adds browser_cdp lane descriptors. |
| services/computer-use-mcp/src/server/tool-descriptors/all.ts | Aggregates all descriptors and initializes global registry. |
| services/computer-use-mcp/src/server/tool-descriptors/accessibility.ts | Adds accessibility lane descriptors. |
| services/computer-use-mcp/src/server/runtime.ts | Wires ChromeSessionManager + DesktopSessionController into server runtime. |
| services/computer-use-mcp/src/server/register-tools.ts | Adds browser-dom capability gating to return structured “unsupported actions” errors. |
| services/computer-use-mcp/src/server/register-tools-pty-approval.test.ts | Tests new browser-dom capability gating behaviors. |
| services/computer-use-mcp/src/server/register-chrome-session.ts | Registers desktop_ensure_chrome tool with policy/approval/audit and CDP auto-connect best-effort. |
| services/computer-use-mcp/src/server/register-chrome-session.test.ts | Tests approval-required flow and non-approval execution persistence for Chrome session tool. |
| services/computer-use-mcp/src/server/action-executor.ts | Routes type_text through browser-dom setInputValue for chrome_dom text inputs when applicable. |
| services/computer-use-mcp/src/server/action-executor.test.ts | Tests typing route guardrails (explicit coords bypass; unsupported setInputValue falls back). |
| services/computer-use-mcp/src/server.ts | Registers new desktop grounding + Chrome session tools. |
| services/computer-use-mcp/src/desktop-session.ts | Adds desktop execution ownership model and foreground enforcement hooks. |
| services/computer-use-mcp/src/desktop-session.test.ts | Unit tests for DesktopSessionController behavior. |
| services/computer-use-mcp/src/desktop-grounding.ts | Implements unified desktop observation aggregation and candidate dedup/ranking. |
| services/computer-use-mcp/src/desktop-grounding.test.ts | Tests candidate extraction/dedup, formatting output behavior. |
| services/computer-use-mcp/src/desktop-grounding-types.ts | Adds grounding snapshot/candidate/snap/pointer intent types (incl. ghost pointer phases). |
| services/computer-use-mcp/src/chrome-session-manager.ts | Adds macOS Chrome lifecycle manager (launch/join/new window, CDP port, focus restore). |
| services/computer-use-mcp/src/chrome-session-manager.test.ts | Tests ChromeSessionManager flows with mocked shell interactions. |
| services/computer-use-mcp/src/chrome-semantic-adapter.ts | Captures Chrome semantic snapshot via extension/CDP and maps to target candidates w/ coordinate transforms. |
| services/computer-use-mcp/src/browser-dom/extension-bridge.ts | Adds action capability gating, pending-request rejection on disconnect, and start retry hygiene. |
| services/computer-use-mcp/src/browser-dom/extension-bridge.test.ts | Tests read-only transport behavior, retryable startup, and in-flight rejection on disconnect. |
| services/computer-use-mcp/src/browser-dom/capabilities.ts | Adds capability helpers to detect unsupported browser-dom actions. |
| services/computer-use-mcp/src/browser-action-router.ts | Adds deterministic routing rules (browser_dom vs os_input) for click/type/checkbox/select. |
| services/computer-use-mcp/chrome-extension/msg_bridge.js | Adds isolated-world message relay between background and MAIN-world content API. |
| services/computer-use-mcp/chrome-extension/manifest.json | Adds MV3 manifest for read-only grounding extension. |
| services/computer-use-mcp/chrome-extension/icon48.png | Adds extension icon asset. |
| services/computer-use-mcp/chrome-extension/icon16.png | Adds extension icon asset. |
| services/computer-use-mcp/chrome-extension/icon128.png | Adds extension icon asset. |
| services/computer-use-mcp/chrome-extension/content.js | Adds MAIN-world read-only DOM observation API (window.__AIRI_DG__). |
| services/computer-use-mcp/chrome-extension/README.md | Documents extension behavior, architecture, and supported commands. |
| packages/plugin-sdk/src/plugin-host/core.ts | Refactors Object.fromEntries construction to avoid spread+map allocation. |
| packages/pipelines-audio/src/processors/tts-chunker.ts | Uses stack.at(-1) for cleaner stack top access. |
| apps/stage-tamagotchi/src/shared/eventa/index.ts | Adds DesktopOverlay readiness invoke contract. |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay.vue | Adds transparent renderer page for ghost pointer/candidate rendering and ripple animation. |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay-polling.ts | Adds pure polling/state-extraction logic with bootstrap readiness handshake and timeouts. |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay-coordinates.ts | Adds pure coordinate mapping helpers (screen ↔ overlay-local) and intersection tests. |
| apps/stage-tamagotchi/src/renderer/pages/desktop-overlay-coordinates.test.ts | Unit tests for overlay coordinate helpers. |
| apps/stage-tamagotchi/src/main/windows/desktop-overlay/rpc/index.electron.ts | Adds eventa RPC bootstrap for overlay window + readiness reporting. |
| apps/stage-tamagotchi/src/main/windows/desktop-overlay/rpc/contracts.ts | Re-exports overlay readiness contract/types. |
| apps/stage-tamagotchi/src/main/windows/desktop-overlay/index.ts | Creates always-on-top, click-through overlay BrowserWindow gated by env var. |
| apps/stage-tamagotchi/src/main/index.ts | Eagerly instantiates overlay window when AIRI_DESKTOP_OVERLAY=1. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e12267f977
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94629d5eac
90b1bdc to 18a4784
dbea76e to 723040f
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 666d652792
666d652 to 79d64d5
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0912e598c1
0912e59 to bdd16bc
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2ef6a9db3b
@nekomeowww the rebase is done
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e6675d8c6d
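This is step 3 of 3, wrapping up the baseline form of the Desktop track. It formally separates the responsibilities of the user session and the agent session, introduces a Chrome-specific session lifecycle, and adds visual separation of the pointer execution phases in the overlay layer, so that detection, interception, dispatch, and visual feedback form a closed loop.

This PR includes:
- `ChromeSessionManager` manages the PID and liveness tracking of agent-created windows.
- A `preview -> executing -> completed` state flow, with a ripple fade-out animation attached.

Out of scope for this PR:
- … `bringToFront`, so there will still be brief input interruptions.
- … `pure-cdp-background-action-router`) development.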
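The `preview -> executing -> completed` flow described above can be read as a small linear state machine. The sketch below is only an illustration of that reading: `PointerPhase`, `NEXT_PHASE`, and `advancePhase` are hypothetical names, and the actual phase type in `desktop-grounding-types.ts` may carry more states or data.

```typescript
// Hypothetical phase type mirroring the flow named in the description.
type PointerPhase = "preview" | "executing" | "completed";

// Each phase has at most one successor; `completed` is terminal.
const NEXT_PHASE: Record<PointerPhase, PointerPhase | null> = {
  preview: "executing",
  executing: "completed",
  completed: null,
};

function canAdvance(phase: PointerPhase): boolean {
  return NEXT_PHASE[phase] !== null;
}

function advancePhase(phase: PointerPhase): PointerPhase {
  const next = NEXT_PHASE[phase];
  if (next === null) {
    throw new Error(`no phase after "${phase}"`);
  }
  return next;
}
```

Encoding the order in a lookup table keeps skipped or reversed transitions, such as going from `completed` back to `preview`, from being expressed by accident.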