fix(compaction): tighten memory rules, fix history integrity, honest event flags by njbrake · Pull Request #1273 · mozilla-ai/clawbolt

njbrake · 2026-05-08T10:49:42Z

Description

Bundles the four sub-tasks from #1243 plus the broken-tool memory concern surfaced this week, when the CompanyCam photo-search bug (now fixed in v0.4.1) was persisted into a user's MEMORY.md as a durable "fact" by the compaction LLM.

Reframes the whole compaction layer around the principle that Clawbolt is not the system of record. The contractor's authoritative data lives in their integrations:

Source of truth	What it owns
QuickBooks	customers, contacts, invoices, estimates, items, payments
CompanyCam	projects, addresses, photos, project status
AppFolio	work orders, tenant info, vendor jobs
Heartbeat / Calendar	time-bounded reminders
Drive	saved files, receipt images

Memory exists for cross-system knowledge that lives nowhere else: pricing rules, cross-system relationships, disambiguation guidance, communication conventions. Anything an integration can answer live, the agent should look up live.

What changed (per #1243 sub-task)

1. HISTORY.md append integrity (memory_db.py, compaction.py)

MemoryStore.append_history now returns the row's new full plaintext under the same row-level lock that wrote it. Compaction's audit-snapshot path uses that return value instead of reconstructing the snapshot from a .strip()'d read of pre-write text. The reconstruction silently dropped the trailing newline that the DB write actually retains, so two consecutive history entries showed up jammed together in the audit even when they were separated in the live row.
Append also guarantees a newline separator at the entry boundary even if the stored text lacks one (e.g. a manual edit, or pre-guarantee data).
Regression tests in test_memory.py and test_compaction.py.

2. Reduce compaction prompt noise (compaction.py)

New _strip_assistant_noise pre-processing pass strips URLs from assistant content before sending to the compactor. Deep links to CompanyCam photos and QBO drafts no longer eat context budget.
User content is left untouched: contractor-pasted URLs are usually intentional.
Two regression tests pin the asymmetric behavior.

3. Tighten durable-memory rules (prompts/compaction.md) + broken-tool memory concern

Full prompt rewrite around the system-of-record principle. Explicit include / exclude lists. Memory must not duplicate facts that already live in QBO / CompanyCam / AppFolio / Drive / heartbeat.
Explicit prune rule on rewrite: drop items that have moved into a source-of-truth integration; drop completed or expired reminders; drop transient bug notes (a bug that was true yesterday is not a durable fact).
HISTORY.md format guidance: breadcrumbs over numbers, pointers over deep links.

4. Honest event flags (compaction.py)

memory_updated / user_profile_updated / soul_updated now reflect real persisted diffs. An LLM that echoes the existing memory verbatim no longer flips the flag to True.
The persisted CompactionEvent row, the compaction.summary structured log line, and the admin "memory updated" indicator are all now honest about whether anything actually changed.
Two regression tests cover the echo case.

Type

Checklist

Tests pass (uv run pytest -v) — 2466 passed, 2 skipped
Lint passes (ruff check backend/ && ruff format --check backend/)
New tests added for new functionality
Bug fixes include regression tests

New regression tests:

test_format_messages_strips_urls_from_assistant_content
test_format_messages_keeps_user_content_urls
test_compact_session_does_not_write_when_memory_unchanged
test_compact_session_summary_log_marks_memory_unchanged_when_llm_echoes
test_compaction_history_snapshot_matches_db_after_append
test_append_history_returns_new_full_text
test_append_history_inserts_separator_when_existing_lacks_newline

What this does NOT do

The "snapshot has entry, DB does not" symptom in Fix compaction history integrity and reduce low-signal compaction noise #1243 sub-task 1 is not directly diagnosed; I could not reproduce it from code-reading alone. The append-path hardening above removes the most plausible cause (the snapshot/DB mismatch on the trailing-newline boundary). If it recurs, we now have append_history's return-value as the source of truth for the snapshot, which closes the read-after-write race window.
Existing polluted MEMORY.md content (e.g. customer IDs, invoice line items already in users' memory) is not migrated. The next compaction will gradually prune under the new rules. If you want immediate cleanup, run an admin-triggered compaction with a steering note after this lands.

AI Usage

AI-assisted (designed and drafted by Claude via discussion with @njbrake; design pass walked through Jesse's actual MEMORY.md against the system-of-record principle before implementation; reasoning and design decisions are mine, prose is Claude's.)
No AI used

Fixes #1243

…event flags Bundles the four sub-tasks from #1243 plus the broken-tool memory concern surfaced when the CompanyCam search bug got persisted into a user's MEMORY.md as a "fact." Reframes the compaction prompt around the principle that Clawbolt is not the system of record. QuickBooks owns customers, contacts, invoices, estimates. CompanyCam owns projects, addresses, photos. AppFolio owns work orders. Heartbeat / Calendar own time-bounded reminders. Drive owns saved files. Memory exists for cross-system knowledge that lives nowhere else: pricing rules, cross-system relationships, disambiguation guidance, communication conventions. Specific changes: 1. Rewrite prompts/compaction.md with explicit include / exclude lists and a prune rule. Forbid: source-of-truth duplicates (customer IDs, emails, invoice line items, project addresses), transient tool-call failures and "X is broken" notes, completed reminders. Tighten HISTORY.md rules: breadcrumbs over numbers. 2. Pre-process assistant content with a URL-stripping pass before sending to the compactor, so deep links to CompanyCam photos and QBO drafts no longer eat context budget. User content untouched since contractor-pasted URLs are intentional. 3. Make MemoryStore.append_history return the row's new full plaintext under the same row-level lock that wrote it, and guarantee a newline separator at the entry boundary even if the stored text was missing the trailing newline. Compaction's audit snapshot now records what actually landed in the DB instead of reconstructing from a .strip()'d read that silently dropped the separator. 4. Make memory_updated / user_profile_updated / soul_updated reflect real persisted diffs. An LLM that echoes the existing memory verbatim no longer flips the flag to True. The persisted event row, the structured-summary log, and the admin "memory updated" indicator are all now honest. Fixes #1243 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…values Two follow-ups to the review of #1273: 1. Re-add the explicit "preserve every existing field" guard to the USER.md guidance. The previous prompt revision dropped it; without it, an LLM rewriting USER.md while updating one field could silently trim others. 2. Promote the underlying durable-memory rule from "the integration owns this fact" to "the integration can change this fact without telling Clawbolt." Memorizing a value the source system can mutate creates a stale-cache risk that's worse than just inefficient: a memorized phone number, email, status, or amount can become actively wrong by the next read. The worked example anchors the rule with a concrete archetype: AppFolio rotates tenant contact phone numbers every few days for privacy. A memorized number quoted back next week now belongs to a different tenant, and the contractor calls a stranger. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Apply the OSS AGENTS.md "Editing prompt files" guidance to the compaction prompt (107 -> 75 lines, same rules). Specifically: - Promote the stale-cache rule and AppFolio worked example into the Operating principle. They're the most important rules in the document and were buried halfway down. - Merge the standalone "Maintaining MEMORY.md" section into the MEMORY.md section. The prune rules were stated twice. - Flatten the section nesting. ``## What goes in each file`` with ``### user_profile``, ``### memory``, ``### soul`` subsections is more structure than the document needs. - Drop the redundant dispatch reminder at the end. The USER.md and MEMORY.md sections already say where each kind of fact lives. - Drop the parenthetical "(the agent should look these up...)" that duplicated what the source-of-truth table already states. Also tighten the no-op-echo comment in compaction.py from six lines to three. The "why" (honest event flags) is enough; the enumeration of downstream consumers (persisted flag, summary log, admin UI) was restating what's evident from where the flags get used. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The slim pass left real customer / contact names from a production user's MEMORY.md inline as worked examples (Arbors, Wilham, Brett Rentschler, Surman, paula@..., $7,413.49 estimate, txnId=544). Those are real businesses and people Jesse works with; they should not ship as durable values inside the system prompt of a public codebase. Replaced with placeholder slots (<Client>, <Tenant>, <Property Mgmt>) and generic numbers ($X,XXX, txnId=NNN). The patterns are identical; the LLM still gets the same shape signal without us shipping real relationships in the OSS repo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… user-explicit save override Two prompts touch the same files but were drifting in different directions. The base agent prompt told Marshall to write "client names and contact info, pricing history, supplier details" into MEMORY.md. The new compaction prompt says: those live in the integrations and don't belong in memory. The base agent was actively writing things the compactor was supposed to drop. Aligning the two: - ``instructions.md``: rewrite the MEMORY.md / USER.md descriptions to match the system-of-record framing. Drop "client names and contact info" from the include list; explicitly call out customer contact details, invoice contents, project addresses, and work-order state as do-not-write. - ``instructions.md``: new section "Remember this requests" that tells Marshall how to handle explicit user save requests. Honor them, but flag staleness risk on values that mutate upstream (rotating phone numbers being the canonical case) and warn against duplicating canonical-in-integration values. - ``instructions.md``: new short section explaining how Marshall answers "how do you remember things?" when asked. Honest, brief, do-not-volunteer. - ``compaction.md``: corresponding override clause: an explicit user save request in the conversation overrides the exclusion rules, even when the value overlaps with what an integration owns. The base agent owns the staleness warning; by the time the conversation reaches the compactor, an explicit save is intentional. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Strip the inline "e.g." illustrative examples from both prompts. The category names alone carry the rule; the worked-quote shapes were padding the bullet list. One disambiguating note kept on "Cross-system relationships" since the term itself benefits from a brief definition. The standalone AppFolio worked-example block in compaction.md stays as it anchors the most important rule (stale-cache risk) with a concrete failure mode. Also strips a similar e.g. from instructions.md "Multi-field tasks" that pre-dated this PR but matched the same pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…#1280) * feat(agent): bounded-growth policies for agent-managed markdown files Every agent-managed markdown surface (USER.md, SOUL.md, HEARTBEAT.md, MEMORY.md, HISTORY.md, BOOTSTRAP.md) now has an explicit byte budget declared in a single source-of-truth registry and enforced at the write boundary. Without these bounds, a long-lived user's working memory could grow without limit, bloating every system prompt and degrading both cost and response quality. Storage and prompt-injection paths are protected: * Workspace write_file / edit_file reject over-budget content with a clean ToolResult error so the agent can rewrite smaller. HISTORY.md rejects write_file / edit_file outright so the append-only invariant from #1273 cannot be bypassed. * memory_db write_*_async raise BudgetExceededError; compaction catches and logs, leaving the previous file in place rather than crashing or corrupting durable state. * memory_db.append_history applies windowed FIFO drop on overflow so HISTORY.md storage stays bounded across long-lived users. * System prompt builders tail-truncate over-budget legacy rows so a pre-cap row cannot poison every prompt forever. Cap is a uniform 25 KiB across surfaces, matching Claude Code's MEMORY.md cap and the existing compaction_event_snapshot per-file audit cap (so an in-budget surface always fits in audit rows without truncation). Fixes #1244 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(agent): tighten bounded-growth tests and logging Self-review follow-ups for #1280: * Tighten history-entry boundary regex from ``^\[`` to ``^\[\d{4}-\d{2}-\d{2}`` so a multi-line summary whose body line happens to start with ``[`` is not mis-split into two entries. * Strengthen the FIFO drop test to assert per-iteration sentinels (``iter=0000`` dropped, ``iter=0199`` retained). The old assertion used cycling timestamps that re-introduced earlier dates and was effectively tautological. * Add a BOOTSTRAP.md test for the disk path through write_file, and an under-budget scratch-file test that locks in the registry-is-opt-in behavior for unknown markdown surfaces. * Demote ``markdown_registry: truncated ...`` log from WARNING to INFO. The truncation event fires every turn for an affected user until the agent rewrites the file; a per-turn WARNING crowds out actionable signals. INFO still surfaces the event to operators searching for it. * Promote observability from a one-line out-of-scope note to its own section in ``docs/markdown_growth_policies.md`` covering the current state (compaction WARNING on rejection, registry INFO on truncation, ``compaction_events`` audit rows) and the shape of a follow-up dashboard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(agent): close PUT /user/profile bypass for bounded-growth cap Second-pass review of #1280 caught that ``PUT /api/user/profile`` accepts ``user_text`` / ``soul_text`` / ``heartbeat_text`` and writes them via raw ``setattr`` without going through the registry. The dashboard editor (or a premium admin) could push an over-budget blob that the agent paths now reject. The read-side ``truncate_for_injection`` clamps it for prompts, but the asymmetry breaks the cap as a "what the LLM sees" guarantee. * ``update_profile`` now validates each string field against ``assert_column_within_budget`` before any commit and returns 413 with the registry's actual / allowed sizes. * Three regression tests cover ``user_text`` / ``soul_text`` / ``heartbeat_text`` rejection and one sanity test covers the under-budget happy path. * Drive-by: switch the 413 status constant in ``user_memory.py`` and ``user_profile.py`` from the deprecated ``HTTP_413_REQUEST_ENTITY_TOO_LARGE`` to ``HTTP_413_CONTENT_TOO_LARGE`` to silence the Starlette deprecation warning the new tests surfaced. * Document ``_huge()``'s ASCII assumption so a future refactor does not silently change byte-size semantics in the integration tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

njbrake and others added 2 commits May 8, 2026 10:49

njbrake mentioned this pull request May 8, 2026

Receipts for read tools, fed into compaction as 'looked-up live' evidence #1274

Open

njbrake and others added 4 commits May 8, 2026 12:00

njbrake merged commit 071edc2 into main May 8, 2026
8 checks passed

njbrake deleted the fix/compaction-quality branch May 8, 2026 12:43

njbrake mentioned this pull request May 8, 2026

feat(agent): bounded-growth policies for agent-managed markdown files #1280

Merged

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(compaction): tighten memory rules, fix history integrity, honest event flags#1273

fix(compaction): tighten memory rules, fix history integrity, honest event flags#1273
njbrake merged 6 commits intomainfrom
fix/compaction-quality

njbrake commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

njbrake commented May 8, 2026

Description

What changed (per #1243 sub-task)

Type

Checklist

What this does NOT do

AI Usage

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant