refactor(notebook-doc): unified DocChangeset from single patch walk #1667

@rgbkrk

Problem

Three functions independently call doc.diff(before, after) and walk the same patches:

| Function | Crate | Calls doc.diff()? | Produces |
| --- | --- | --- | --- |
| diff_cells() | notebook-doc | Yes | CellChangeset |
| diff_metadata_touched() | notebook-doc | Yes | bool |
| compute_text_attributions() | runtimed-wasm (local fn) | Yes | Vec<TextAttribution> |

The WASM receive_frame() path calls both diff_cells and compute_text_attributions, so it runs doc.diff() twice and gets identical patches each time. The daemon calls diff_metadata_touched separately.

Additionally, the WASM path invalidates metadata_fingerprint_cache on every doc change (including cell source edits and output streaming), forcing re-serialization ~30/sec during execution. See the TODO at runtimed-wasm/src/lib.rs:1440.

Proposal

One function in notebook-doc that diffs once, walks patches once, classifies everything:

pub struct DocChangeset {
    pub cells: CellChangeset,
    pub metadata_changed: bool,
    pub text_patches: Vec<TextPatch>,  // raw splice/delete per cell source
}

pub fn diff_doc(
    doc: &mut AutoCommit,
    before: &[ChangeHash],
    after: &[ChangeHash],
) -> DocChangeset
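
A minimal sketch of the single-walk classification, under stated assumptions. `MockPatch`, `PatchKind`, the path layout (`cells/<cell_id>/source`, `metadata/...`), and the fields of `CellChangeset` and `TextPatch` are hypothetical stand-ins, not the real Automerge or notebook-doc types. `classify_patches` plays the role of the patch-walking half of `diff_doc`, after `doc.diff(before, after)` has already produced the patches.

```rust
/// Hypothetical, simplified stand-in for one Automerge patch.
#[derive(Debug, Clone, PartialEq)]
pub enum PatchKind {
    Splice { index: usize, text: String },
    Delete { index: usize, len: usize },
    Other,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MockPatch {
    /// Assumed path from the document root, e.g. ["cells", "c1", "source"].
    pub path: Vec<String>,
    pub kind: PatchKind,
}

/// Hypothetical shape; the real CellChangeset may carry more detail.
#[derive(Debug, Default, PartialEq)]
pub struct CellChangeset {
    pub changed_cell_ids: Vec<String>,
}

/// Structured variant of a text patch (cell_id + index + text + deleted).
#[derive(Debug, Clone, PartialEq)]
pub struct TextPatch {
    pub cell_id: String,
    pub index: usize,
    pub text: String,
    pub deleted: usize,
}

#[derive(Debug, Default, PartialEq)]
pub struct DocChangeset {
    pub cells: CellChangeset,
    pub metadata_changed: bool,
    pub text_patches: Vec<TextPatch>,
}

/// Walk the patches exactly once and fill every field of the changeset.
pub fn classify_patches(patches: &[MockPatch]) -> DocChangeset {
    let mut out = DocChangeset::default();
    for patch in patches {
        match patch.path.first().map(String::as_str) {
            Some("metadata") => out.metadata_changed = true,
            Some("cells") => {
                let Some(cell_id) = patch.path.get(1) else { continue };
                if !out.cells.changed_cell_ids.contains(cell_id) {
                    out.cells.changed_cell_ids.push(cell_id.clone());
                }
                // Only edits under the cell's source text become text patches.
                if patch.path.get(2).map(String::as_str) != Some("source") {
                    continue;
                }
                let (index, text, deleted) = match &patch.kind {
                    PatchKind::Splice { index, text } => (*index, text.clone(), 0),
                    PatchKind::Delete { index, len } => (*index, String::new(), *len),
                    PatchKind::Other => continue,
                };
                out.text_patches.push(TextPatch {
                    cell_id: cell_id.clone(),
                    index,
                    text,
                    deleted,
                });
            }
            _ => {}
        }
    }
    out
}
```

In this sketch an output-streaming patch marks its cell as changed but produces no text patch and leaves `metadata_changed` false, which is the distinction the WASM cache fix relies on.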

text_patches

Captures the raw splice/delete operations on cell source text — the same data compute_text_attributions currently extracts, but without actor labels (those come from extract_change_actors via get_changes, a separate query). The WASM consumer combines text patches + actors into TextAttribution.
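
One possible shape for the WASM-side combiner, as a sketch only. The fields of `TextPatch` and `TextAttribution`, the use of plain strings for actor labels, and the rule that every patch in a frame is attributed to all actors of that frame are assumptions made for illustration; the issue does not specify how patches and actors are matched.

```rust
/// Hypothetical structured text patch, without actor labels.
#[derive(Debug, Clone, PartialEq)]
pub struct TextPatch {
    pub cell_id: String,
    pub index: usize,
    pub text: String,
    pub deleted: usize,
}

/// Hypothetical attribution record: a text patch plus the actors behind it.
#[derive(Debug, Clone, PartialEq)]
pub struct TextAttribution {
    pub cell_id: String,
    pub index: usize,
    pub text: String,
    pub deleted: usize,
    pub actors: Vec<String>,
}

/// Thin combiner: no patch walking happens here, only labelling.
/// Assumption: every patch in the frame is attributed to all frame actors.
pub fn build_attributions(patches: &[TextPatch], actors: &[String]) -> Vec<TextAttribution> {
    patches
        .iter()
        .map(|p| TextAttribution {
            cell_id: p.cell_id.clone(),
            index: p.index,
            text: p.text.clone(),
            deleted: p.deleted,
            actors: actors.to_vec(),
        })
        .collect()
}
```

Keeping the actor lookup out of `TextPatch` means notebook-doc never needs to call `get_changes` on behalf of a consumer that does not want attributions.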

Consumer changes

WASM (runtimed-wasm):

let changeset = diff_doc(doc, &before, &after);
let actors = extract_change_actors(doc, &before);
let attributions = build_attributions(&changeset.text_patches, &actors);

if changeset.metadata_changed {
    self.metadata_fingerprint_cache = None;  // fixes the TODO
}
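
The effect of the conditional invalidation can be illustrated with a small stand-alone model. `Handle`, `on_frame`, `fingerprint`, and the `recomputes` counter are hypothetical names, and the "fingerprint" is a placeholder string rather than a real serialization; only the `metadata_fingerprint_cache` field name comes from the snippet above.

```rust
/// Hypothetical stand-in for the WASM handle's cached state.
#[derive(Debug, Default)]
pub struct Handle {
    pub metadata_fingerprint_cache: Option<String>,
    /// Counts how often the (expensive) fingerprint was rebuilt.
    pub recomputes: usize,
}

impl Handle {
    /// Drop the cache only when the changeset reports a metadata change.
    pub fn on_frame(&mut self, metadata_changed: bool) {
        if metadata_changed {
            self.metadata_fingerprint_cache = None;
        }
    }

    /// Return the cached fingerprint, rebuilding it lazily if it was dropped.
    pub fn fingerprint(&mut self, metadata: &str) -> String {
        if self.metadata_fingerprint_cache.is_none() {
            self.recomputes += 1;
            // Placeholder for the real re-serialization step.
            self.metadata_fingerprint_cache = Some(format!("len:{}", metadata.len()));
        }
        self.metadata_fingerprint_cache.clone().unwrap()
    }
}
```

In this model, thirty frames that only touch cell source or outputs leave `recomputes` unchanged, and a single metadata-touching frame triggers exactly one rebuild on the next read.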

Daemon (runtimed):

let changeset = diff_doc(doc, &before, &after);
if changeset.metadata_changed {
    check_and_broadcast_sync_state(room).await;
}

What moves where

  • Patch-walking guts of compute_text_attributions move into diff_doc() in notebook-doc
  • compute_text_attributions stays in runtimed-wasm as a thin combiner (text patches + actors → TextAttribution)
  • diff_cells() becomes a convenience wrapper: diff_doc(...).cells
  • diff_metadata_touched() becomes: diff_doc(...).metadata_changed
  • extract_change_actors() stays as-is (uses get_changes, not diff)
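
The wrapper relationship in the list above can be sketched as follows. `MockDoc` is a hypothetical stand-in for the Automerge document that simply counts diff invocations, and the `before`/`after` head arguments from the proposed signature are omitted to keep the example self-contained.

```rust
/// Hypothetical shape; the real CellChangeset may carry more detail.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct CellChangeset {
    pub changed_cell_ids: Vec<String>,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct DocChangeset {
    pub cells: CellChangeset,
    pub metadata_changed: bool,
}

/// Hypothetical document stand-in that records how many diffs were run.
#[derive(Debug, Default)]
pub struct MockDoc {
    pub diff_calls: usize,
    /// The changeset a diff would report, preset for the example.
    pub pending: DocChangeset,
}

/// Single entry point: one diff, one classified result.
pub fn diff_doc(doc: &mut MockDoc) -> DocChangeset {
    doc.diff_calls += 1;
    doc.pending.clone()
}

/// Convenience wrapper: still costs one full diff per call.
pub fn diff_cells(doc: &mut MockDoc) -> CellChangeset {
    diff_doc(doc).cells
}

/// Convenience wrapper: still costs one full diff per call.
pub fn diff_metadata_touched(doc: &mut MockDoc) -> bool {
    diff_doc(doc).metadata_changed
}
```

Each wrapper still pays for a diff, so a caller that needs more than one field should call `diff_doc` directly and read the fields off the result.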

Benefits

  1. WASM: one doc.diff() per sync frame instead of two
  2. WASM: metadata fingerprint cache only invalidated when metadata actually changed
  3. Daemon: one doc.diff() instead of separate metadata check
  4. Extension point: adding new patch categories is just a new field on DocChangeset

Design question

Should text_patches surface raw Automerge patch data or a more structured type (cell_id + index + text + deleted)? The structured approach is cleaner for consumers but couples notebook-doc to the text attribution concept. Raw patches keep notebook-doc focused on classification only.

Context
