Skip to content

fix(plugin-lifecycle): OpenCode Zod schemas, OpenClaw isolation, session drift, lifecycle filter#2063

Closed
thedotmack wants to merge 2 commits intomainfrom
bugfix/plugin-lifecycle-bugs
Closed

fix(plugin-lifecycle): OpenCode Zod schemas, OpenClaw isolation, session drift, lifecycle filter#2063
thedotmack wants to merge 2 commits intomainfrom
bugfix/plugin-lifecycle-bugs

Conversation

@thedotmack
Copy link
Copy Markdown
Owner

Summary

Test plan

  • Verify OpenCode plugin loads without tool resolution crash (Zod schemas)
  • Verify OpenClaw installer completes without "plugin not found" error
  • Verify agents only see their own observations via context injection and slash commands
  • Verify session context uses current key, not stale alias
  • Verify observations are stored when workspaceDir is missing (falls back to $HOME)
  • Verify heartbeat sessions aren't summarized
  • Verify lifecycle events like NO_REPLY are filtered from observation storage

Closes #1947, closes #1948, closes #1949, closes #1950, closes #1951

🤖 Generated with Claude Code

…ion drift, lifecycle filter

- #1947: Convert OpenCode plugin tool args from plain objects to Zod schemas
- #1948: Pre-populate plugins.allow before install to fix chicken-and-egg
- #1949: Scope getContextForPrompt and slash commands to agent-specific project
- #1950: Prefer current sessionKey over stale aliases, fallback workspaceDir
- #1951: Filter lifecycle events from observations, exempt heartbeat sessions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 18, 2026

Warning

Rate limit exceeded

@thedotmack has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 3 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 3 minutes and 3 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ade94d31-7bd4-48d4-b8b6-850492598e27

📥 Commits

Reviewing files that changed from the base of the PR and between beea789 and 2c7967e.

📒 Files selected for processing (3)
  • openclaw/install.sh
  • openclaw/src/index.ts
  • src/integrations/opencode-plugin/index.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bugfix/plugin-lifecycle-bugs

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 18, 2026

Code Review — PR #2063

Good set of targeted fixes across five related issues. The PR description is clear and the test plan is well-structured. Below is my analysis by area.


src/integrations/opencode-plugin/index.ts — Zod schemas (Fix #1947)

✅ Correct fix. Converting { type: "string" } to z.string().describe(...) is the right change for OpenCode compatibility.

Minor: zod is now a runtime import. Confirm zod is in dependencies (not devDependencies) in the opencode-plugin's package.json, otherwise this will fail at runtime in production builds.


openclaw/install.sh — pre-populate plugins.allow (Fix #1948)

✅ Logic is correct. Pre-populating the allow list before calling plugins install cleanly solves the chicken-and-egg.

Concerns:

  • Node.js as a hard dependency: The installer now requires node to be available to modify the JSON config. If Node is absent, it silently warns and continues — then plugins install will likely fail anyway. Either document node as a prerequisite or use jq (with a node fallback) for broader portability.
  • No backup of config before mutation: Writing directly to $oc_config without a backup could corrupt user config if the process is interrupted mid-write. Consider writing to a temp file and mv-ing atomically:
    node -e "..." > "$oc_config.tmp" && mv "$oc_config.tmp" "$oc_config"
  • Minimal config clobbers existing file: The else branch creates { plugins: { allow: ['claude-mem'] } } from scratch. If the file doesn't exist yet that's fine, but if there's a race where it was just created by another process, you'd overwrite it. The -f check immediately above should prevent this, but worth a defensive note.

openclaw/src/index.ts — Session drift (Fix #1950)

✅ The session key preference logic is sound. Preferring ctx.sessionKey over a stale alias prevents drift across turns.

Concern — clearSessionContext on session_start: Clearing stale mappings on session_start is correct for /new and /reset scenarios. Verify that clearSessionContext doesn't inadvertently remove mappings that are still live if two sessions happen to share a canonical key (e.g., same workspace opened twice). If that case can't happen, no action needed.


openclaw/src/index.ts — Agent isolation (Fix #1949)

Potential bug in slash commands:

getContextForPrompt was correctly changed to use getProjectName(ctx) for agent-specific isolation. But the search and timeline slash commands both do:

const agentProject = baseProjectName;  // ⚠️ not agent-specific

The comment says "Scope search to the current agent's project for per-agent isolation" — but baseProjectName is the gateway-level project, not the agent project. If the intent truly is per-agent isolation, this should be getProjectName(ctx) (same as getContextForPrompt). If slash commands intentionally use the gateway's project, the comment is misleading and should be corrected.

Please clarify or fix this inconsistency before merging.


openclaw/src/index.ts — Lifecycle event filtering (Fix #1951)

✅ The exclusion set approach is clean. Using a Set for O(1) lookup is correct.

Concerns:

  • Incomplete system-session detection: before_agent_start marks sessions as system sessions when the prompt is "heartbeat", "no_reply", or "lifecycle_event". But EXCLUDED_LIFECYCLE_EVENT_TYPES also includes "user_re_engagement" and "session_start" — these aren't covered in the system-session detection logic. Should they be?

  • Content pattern anchor: EXCLUDED_LIFECYCLE_CONTENT_PATTERNS uses ^...$ anchors, so they only match if the entire response is a lifecycle keyword. This seems intentional to avoid false positives, but confirm that multi-line lifecycle noise (e.g., "heartbeat\n") doesn't slip through. The .trim() before .test() handles trailing whitespace, so single-line cases are fine.

  • system_session flag — worker-side handling: The initPayload.system_session = true flag is sent to the worker, but the diff doesn't include the worker endpoint handling it. If the worker ignores this flag today, the isolation is only enforced client-side. Is the worker-side handling in a separate PR or already merged?


openclaw/src/index.ts — workspaceDir fallback (Fix #1950)

Trade-off concern: Falling back to $HOME means observations from different projects with no workspaceDir all land under the home directory in the DB. This could mix unrelated project observations. The previous behavior (drop the observation) was lossy but clean. Consider whether a virtual key like "__unknown__" would be safer than $HOME, to keep them queryable but isolated from real home-directory projects.

The process.env.HOME check is slightly over-cautious (typeof process !== "undefined" is always true in Node/Bun), but harmless.


systemSessionIds — memory leak risk

systemSessionIds is cleaned up in agent_end (per-session delete) and disconnect (full clear). If a heartbeat session never fires agent_end (e.g., it times out or is abandoned), its entry stays in the set indefinitely. For long-running gateways with many heartbeats this could accumulate. Consider a TTL-based cleanup or capping the set size.


Summary

Area Status
Zod schemas (OpenCode) ✅ Correct, check zod dep placement
install.sh pre-populate ✅ Correct, atomic write recommended
Session drift fix ✅ Sound
Agent isolation in getContextForPrompt ✅ Correct
Agent isolation in slash commands ⚠️ Likely bug — uses baseProjectName not getProjectName(ctx)
Lifecycle filtering ✅ Good, minor gaps noted
system_session worker handling ❓ Not visible in diff — confirm worker handles it
systemSessionIds memory leak ⚠️ Low risk but worth a follow-up
workspaceDir fallback ⚠️ Consider "__unknown__" instead of $HOME

The slash command agentProject = baseProjectName inconsistency is the most important item to resolve before merge. Everything else is lower severity.

🤖 Generated with Claude Code

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 18, 2026

Greptile Summary

This PR bundles five bug fixes across OpenCode/OpenClaw integration: Zod schema conversion for tool args, pre-populating plugins.allow before install, per-agent project scoping in context injection, session-key canonicalization to prevent drift, and lifecycle-event filtering to suppress heartbeat/no-reply noise from observations and summaries.

  • P1 — Incomplete lifecycle session detection (openclaw/src/index.ts:857): before_agent_start only marks 3 of the 5 EXCLUDED_LIFECYCLE_EVENT_TYPES as system sessions, so user_re_engagement and session_start prompts still flow through summarization.
  • P1 — Slash-command isolation gap (openclaw/src/index.ts:1169,1260): /claude-mem-search and /claude-mem-timeline use baseProjectName instead of getProjectName(ctx), leaking cross-agent observations — the same leak getContextForPrompt was just fixed to prevent.

Confidence Score: 4/5

Two P1 logic gaps should be addressed before merging to fully deliver the isolation and filtering goals.

The Zod schema fix, install.sh chicken-and-egg fix, workspaceDir fallback, and session-drift canonicalization are all correct. However, the lifecycle-session detection in before_agent_start covers only 3 of 5 excluded types, and both slash commands scope to baseProjectName instead of the agent-specific project — the same isolation regression the PR set out to fix.

openclaw/src/index.ts — before_agent_start lifecycle detection (line 857) and slash command project scoping (lines 1169, 1260).

Important Files Changed

Filename Overview
openclaw/src/index.ts Core plugin with 5 bug fixes; two P1 issues — incomplete lifecycle session detection (only 3 of 5 EXCLUDED_LIFECYCLE_EVENT_TYPES checked in before_agent_start) and slash commands using baseProjectName instead of getProjectName(ctx) for per-agent isolation.
openclaw/install.sh Pre-populates plugins.allow before running plugins install to fix the config-validation chicken-and-egg; logic looks correct and handles both existing-config and fresh-install paths.
src/integrations/opencode-plugin/index.ts Straightforward Zod schema fix — converts plain-object arg definitions to z.string().describe(...) as required by OpenCode's plugin SDK.

Sequence Diagram

sequenceDiagram
    participant OC as OpenClaw Runtime
    participant P as claude-mem Plugin
    participant W as Worker

    OC->>P: session_start
    P->>P: clearSessionContext
    P->>P: rememberSessionContext

    OC->>P: before_agent_start(prompt)
    P->>P: check prompt vs lifecycle patterns
    P->>P: mark systemSession if heartbeat/no_reply/lifecycle_event
    P->>W: POST sessions/init

    OC->>P: tool_result_persist(toolName)
    alt toolName in EXCLUDED_LIFECYCLE_EVENT_TYPES
        P-->>OC: skip observation
    else content matches EXCLUDED_LIFECYCLE_CONTENT_PATTERNS
        P-->>OC: skip observation
    else workspaceDir missing
        P->>P: fallback to home dir
        P->>W: POST sessions/observations
    else
        P->>W: POST sessions/observations
    end

    OC->>P: agent_end
    alt contentSessionId in systemSessionIds
        P->>P: delete from systemSessionIds
        P->>W: scheduleSessionComplete (no summary)
    else
        P->>W: POST sessions/summary
        P->>W: scheduleSessionComplete
    end
Loading

Fix All in Claude Code

Reviews (1): Last reviewed commit: "fix(plugin-lifecycle): OpenCode Zod sche..." | Re-trigger Greptile

Comment thread openclaw/src/index.ts Outdated
Comment on lines +857 to +859
if (promptLower === "heartbeat" || promptLower === "no_reply" || promptLower === "lifecycle_event") {
systemSessionIds.add(contentSessionId);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Incomplete lifecycle session detection

The before_agent_start check only covers 3 of the 5 types defined in EXCLUDED_LIFECYCLE_EVENT_TYPES. Sessions arriving with a prompt of "user_re_engagement" or "session_start" will not be added to systemSessionIds, so they will proceed through normal summarization at agent_end despite being lifecycle noise — the exact problem this fix aims to prevent.

Suggested change
if (promptLower === "heartbeat" || promptLower === "no_reply" || promptLower === "lifecycle_event") {
systemSessionIds.add(contentSessionId);
}
if (
promptLower === "heartbeat" ||
promptLower === "no_reply" ||
promptLower === "no reply" ||
promptLower === "lifecycle_event" ||
promptLower === "user_re_engagement" ||
promptLower === "session_start"
) {
systemSessionIds.add(contentSessionId);
}

Fix in Claude Code

Comment thread openclaw/src/index.ts Outdated
const query = hasTrailingLimit ? pieces.slice(0, -1).join(" ") : raw;

// Scope search to the current agent's project for per-agent isolation
const agentProject = baseProjectName;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Slash-command project scoping uses base project, not agent-specific project

The PR description states these commands add "per-agent isolation", but both /claude-mem-search and /claude-mem-timeline hardcode agentProject = baseProjectName instead of calling getProjectName(ctx). When invoked inside an agent that has an agentId, this returns results from the shared base project (e.g. "openclaw") rather than the agent-scoped one (e.g. "openclaw-abc123"), leaking cross-agent observations — the same problem getContextForPrompt was fixed to avoid.

Suggested change
const agentProject = baseProjectName;
const agentProject = getProjectName(ctx);

Fix in Claude Code

Comment thread openclaw/src/index.ts Outdated

const query = parts.join(" ");
// Scope timeline to the current agent's project for per-agent isolation
const agentProject = baseProjectName;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Same base-project scoping issue in /claude-mem-timeline

Same concern as /claude-mem-search above: agentProject = baseProjectName should be getProjectName(ctx) to honour per-agent isolation for timeline lookups.

Suggested change
const agentProject = baseProjectName;
const agentProject = getProjectName(ctx);

Fix in Claude Code

Comment thread openclaw/src/index.ts
Comment on lines 944 to +947
if (!workspaceDir) {
api.logger.warn(`[claude-mem] Skipping observation persist because workspaceDir is unavailable: session=${canonicalKey} tool=${toolName}`);
return;
const homeDir = typeof process !== "undefined" && process.env?.HOME ? process.env.HOME : "/tmp";
api.logger.warn(`[claude-mem] workspaceDir unavailable for session=${canonicalKey} tool=${toolName}, falling back to ${homeDir}`);
workspaceDir = homeDir;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 /tmp fallback may silently mix observations across agents

When process.env.HOME is unset (e.g. some CI/containerised environments), every agent falls back to "/tmp" as workspaceDir. Observations from all agents are then bucketed under the same path, defeating per-agent isolation. Consider falling back to an agent- or session-scoped sub-directory (e.g. path.join("/tmp", canonicalKey)) or surfacing a clearer warning.

Fix in Claude Code

…h commands, safe fallback

1. Expand before_agent_start lifecycle detection to check all EXCLUDED_LIFECYCLE_EVENT_TYPES
   and content patterns instead of only 3 hardcoded strings (covers user_re_engagement,
   session_start in addition to heartbeat, no_reply, lifecycle_event)

2-3. Fix /claude-mem-search and /claude-mem-timeline to use agent-specific project names
     (e.g., openclaw-<agentId>) instead of the base project name, via session→project
     tracking populated by before_agent_start

4. Use unique /tmp/claude-mem-<sessionId> subdirectory when HOME is unset to prevent
   cross-agent observation mixing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude Bot commented Apr 18, 2026

Code Review

Overview

This PR addresses five distinct issues: OpenCode Zod schema compatibility, OpenClaw installation ordering, agent context isolation, session drift/key canonicalization, and lifecycle event noise filtering. The changes are well-scoped and the PR description is thorough.


Bugs / Issues

1. Memory leak — sessionProjectNames never cleared

sessionProjectNames is populated in before_agent_start but is absent from the cleanup handler that clears canonicalSessionKeys, sessionAliasesByCanonicalKey, and systemSessionIds. In long-running gateway processes this will grow unbounded.

// cleanup handler (~line 1044) — add this:
sessionProjectNames.clear();

2. Race condition — lastActiveProjectName is a shared mutable variable

lastActiveProjectName is set on every before_agent_start and then read as the fallback in both slash command handlers. In a multi-agent/concurrent scenario, Agent B's before_agent_start will overwrite lastActiveProjectName before Agent A's /claude-mem-search fires, silently returning the wrong project scope. The ctx.channel primary path is correct, but this fallback isn't safe.

Consider initializing lastActiveProjectName per-session rather than as a closure-level let, or removing the fallback entirely and logging a warning when the channel mapping is missing.

3. Stale session aliases are not evicted on key change

In rememberSessionContext, when a new ctx.sessionKey is adopted as canonical, the old mappings in canonicalSessionKeys and sessionAliasesByCanonicalKey for the previous key are not removed. The comment says "clearing stale mappings" but the code only establishes the new key — it doesn't evict the old one. This means the deduplication map keeps growing and old keys could still resolve.


Minor Issues

4. Redundant dual case-check for EXCLUDED_LIFECYCLE_EVENT_TYPES

EXCLUDED_LIFECYCLE_EVENT_TYPES.has(promptLower) ||
EXCLUDED_LIFECYCLE_EVENT_TYPES.has(promptText.trim()) ||

The set contains both lowercase entries ("heartbeat") and uppercase ones ("NO_REPLY"). The lowercase check misses "NO_REPLY" (because "NO_REPLY".toLowerCase() = "no_reply" which isn't in the set), and the original-case check misses lowercase input. The regex fallback catches the rest. This works but is confusing — it would be cleaner to normalize the set to all-lowercase and use only promptLower, with the regex as a secondary check.

5. workspaceDir fallback mixes cross-project observations

The fallback to $HOME is better than dropping observations, but any observations without a workspaceDir from different projects will share the same workspace key. If the worker scopes observations by workspaceDir, this could silently merge context across unrelated projects. Worth documenting or adding a log warn noting the potential mixing.

6. typeof process !== "undefined" check is dead code

In a Bun/Node runtime, process is always defined. This defensive check adds noise without benefit.


Positives

  • Zod migration (src/integrations/opencode-plugin/index.ts) is minimal and correct.
  • Install script correctly uses an env var to pass the config path to inline Node (avoids shell injection), and the fallback || warn keeps the install non-fatal.
  • system_session flag propagation to the worker is a clean design — the worker can independently filter without duplicating detection logic.
  • systemSessionIds.delete(contentSessionId) in agent_end prevents unbounded growth. The same pattern should be applied to sessionProjectNames.
  • session_start clearing stale context via clearSessionContext before rememberSessionContext is the right order.

Summary

Two concrete bugs worth fixing before merge: the sessionProjectNames memory leak and the lastActiveProjectName race condition. The stale alias eviction is lower priority but worth tracking. Everything else is clean.

@thedotmack
Copy link
Copy Markdown
Owner Author

Closing to start fresh from main — will redo fixes isolated in Docker container.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment