FileBackend for Merkle{Reader,Writer} #512

Open
malcolmgreaves wants to merge 1 commit into mg/merkle_dyn_interfaces from mg/merkle_dyn_file_backend

Conversation

@malcolmgreaves
Collaborator

@malcolmgreaves malcolmgreaves commented May 1, 2026

Implements the existing custom-file-format Merkle tree node storage
and retrieval logic behind the MerkleReader and MerkleWriter traits as
the new FileBackend type. The implementation uses MerkleDbError internally
and converts it into an OxenError at the trait boundary.

The write sessions that FileBackend produces are Box'd because they are
returned as trait objects, so the session traits are required to be
object-safe (aka dyn-compatible).
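
As a rough illustration of why the sessions end up boxed, here is a minimal sketch of a writer trait that hands back a `Box<dyn ...>` session. The names `Writer`, `WriteSession`, `InMemoryBackend`, and `SketchError` are hypothetical stand-ins, not the crate's actual types, and the in-memory session only counts bytes.

```rust
// Hypothetical stand-in for the crate's error type.
#[derive(Debug)]
pub struct SketchError(pub String);

// A dyn-compatible session trait: no generic methods, no `Self` by value.
pub trait WriteSession {
    fn write(&mut self, bytes: &[u8]) -> Result<(), SketchError>;
    fn finish(&mut self) -> Result<usize, SketchError>;
}

// The writer returns a boxed trait object, so callers never name the
// concrete session type that a given backend uses.
pub trait Writer {
    fn begin<'a>(&'a self) -> Result<Box<dyn WriteSession + 'a>, SketchError>;
}

pub struct InMemoryBackend {
    pub label: String,
}

struct InMemorySession<'a> {
    label: &'a str,
    written: usize,
    finished: bool,
}

impl WriteSession for InMemorySession<'_> {
    fn write(&mut self, bytes: &[u8]) -> Result<(), SketchError> {
        if self.finished {
            return Err(SketchError(format!("session for {} already finished", self.label)));
        }
        self.written += bytes.len();
        Ok(())
    }

    fn finish(&mut self) -> Result<usize, SketchError> {
        self.finished = true;
        Ok(self.written)
    }
}

impl Writer for InMemoryBackend {
    fn begin<'a>(&'a self) -> Result<Box<dyn WriteSession + 'a>, SketchError> {
        Ok(Box::new(InMemorySession { label: &self.label, written: 0, finished: false }))
    }
}

// Works against any backend through `&dyn Writer`.
pub fn write_all(writer: &dyn Writer, chunks: &[&[u8]]) -> Result<usize, SketchError> {
    let mut session = writer.begin()?;
    for chunk in chunks {
        session.write(chunk)?;
    }
    session.finish()
}
```

The caller in `write_all` only sees `&dyn Writer` and `Box<dyn WriteSession>`, which is the property that lets a different backend be swapped in later.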

Also refactors MerkleNodeDB::to_node's logic into a new function on the
enum: EMerkleTreeNode::from_type_and_bytes.

Additionally, MerkleNodeDB's open_read_write, write_node, and add_child
switch from generic <N: TMerkleTreeNode> to &dyn TMerkleTreeNode so they're
callable through the dyn-compatible NodeWriteSession.
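
To make the generic-to-`&dyn` switch concrete, the sketch below contrasts the two signatures. `TreeNode`, `DirSketch`, `FileSketch`, `NodeSession`, and `BufferSession` are hypothetical names used only for illustration; the real traits carry more methods.

```rust
pub trait TreeNode {
    fn kind(&self) -> &'static str;
    fn payload(&self) -> Vec<u8>;
}

pub struct DirSketch {
    pub name: String,
}

pub struct FileSketch {
    pub size: u64,
}

impl TreeNode for DirSketch {
    fn kind(&self) -> &'static str { "dir" }
    fn payload(&self) -> Vec<u8> { self.name.as_bytes().to_vec() }
}

impl TreeNode for FileSketch {
    fn kind(&self) -> &'static str { "file" }
    fn payload(&self) -> Vec<u8> { self.size.to_le_bytes().to_vec() }
}

// Generic form: `N` is implicitly `Sized`, so a `&dyn TreeNode` cannot be passed.
pub fn write_node_generic<N: TreeNode>(out: &mut Vec<u8>, node: &N) {
    out.extend_from_slice(node.kind().as_bytes());
    out.extend(node.payload());
}

// Trait-object form: accepts any node, including one only known as `&dyn TreeNode`.
pub fn write_node_dyn(out: &mut Vec<u8>, node: &dyn TreeNode) {
    out.extend_from_slice(node.kind().as_bytes());
    out.extend(node.payload());
}

// A dyn-compatible session can only offer `&dyn TreeNode` in its methods.
pub trait NodeSession {
    fn add_child(&mut self, child: &dyn TreeNode);
}

pub struct BufferSession {
    pub buf: Vec<u8>,
}

impl NodeSession for BufferSession {
    fn add_child(&mut self, child: &dyn TreeNode) {
        // `write_node_generic(&mut self.buf, child)` would not compile here,
        // because `N` would have to be the unsized type `dyn TreeNode`.
        write_node_dyn(&mut self.buf, child);
    }
}

pub fn encode_children(children: &[&dyn TreeNode]) -> Vec<u8> {
    let mut session = BufferSession { buf: Vec::new() };
    {
        let dyn_session: &mut dyn NodeSession = &mut session;
        for child in children {
            dyn_session.add_child(*child);
        }
    }
    session.buf
}
```

The trade-off is dynamic dispatch instead of monomorphization on the write path; both forms produce the same bytes.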

Finally, fixes a bug in MerkleNodeDB::open_read_only: it previously set
node_id: MerkleHash to 0. The function now requires the caller to provide
the correct hash (taken from the path) and uses that.

@coderabbitai
Contributor

coderabbitai Bot commented May 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds a new file-backed Merkle node backend with reader/writer sessions and drop-enforced persistence; refactors MerkleNodeDB write APIs to accept trait objects (&dyn TMerkleTreeNode); and adds EMerkleTreeNode::from_type_and_bytes for type-directed deserialization.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Module Declaration (crates/lib/src/core/db/merkle_node.rs) | Exports new file_backend submodule and re-exports FileBackend, FileNodeSession, and FileWriteSession. |
| File Backend Implementation (crates/lib/src/core/db/merkle_node/file_backend.rs) | New file-backed backend implementing reader/writer interfaces: FileBackend<'repo>, FileWriteSession, FileNodeSession. Reader delegates to MerkleNodeDB (explicit exists pre-check for missing nodes); writer returns sessions where finish is idempotent and Drop enforces persistence. Includes unit tests for existence/get_node semantics, drop-based persistence, empty-children behavior, and writer completion. |
| Database API Refactor (crates/lib/src/core/db/merkle_node/merkle_node_db.rs) | Write-path APIs changed from generics (N: TMerkleTreeNode) to trait-object parameters (&dyn TMerkleTreeNode) for open_read_write, add_child, and write_node; write behavior otherwise unchanged. |
| Node Deserialization (crates/lib/src/model/merkle_tree/node.rs) | Adds pub fn from_type_and_bytes(dtype: MerkleTreeNodeType, data: &[u8]) -> Result<Self, rmp_serde::decode::Error> to construct EMerkleTreeNode from a type tag and msgpack bytes. |
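
The type-directed deserialization described for from_type_and_bytes can be sketched roughly as follows. This is not the crate's code: `NodeType`, `NodeSketch`, and `DecodeError` are hypothetical, and plain UTF-8 / little-endian decoding stands in for msgpack so the example needs only the standard library.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeType {
    Dir,
    File,
}

#[derive(Debug, PartialEq, Eq)]
pub struct DirSketch {
    pub name: String,
}

#[derive(Debug, PartialEq, Eq)]
pub struct FileSketch {
    pub num_bytes: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum NodeSketch {
    Dir(DirSketch),
    File(FileSketch),
}

#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    InvalidUtf8,
    WrongLength { expected: usize, actual: usize },
}

impl NodeSketch {
    // The type tag picks the decoder; the bytes alone do not say which
    // variant they encode.
    pub fn from_type_and_bytes(dtype: NodeType, data: &[u8]) -> Result<Self, DecodeError> {
        match dtype {
            NodeType::Dir => {
                let name = std::str::from_utf8(data).map_err(|_| DecodeError::InvalidUtf8)?;
                Ok(NodeSketch::Dir(DirSketch { name: name.to_string() }))
            }
            NodeType::File => {
                let raw = <[u8; 8] as std::convert::TryFrom<&[u8]>>::try_from(data)
                    .map_err(|_| DecodeError::WrongLength { expected: 8, actual: data.len() })?;
                Ok(NodeSketch::File(FileSketch { num_bytes: u64::from_le_bytes(raw) }))
            }
        }
    }
}
```

Keeping the dispatch in one function on the enum means every reader goes through the same tag-to-decoder mapping.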

Sequence Diagrams

sequenceDiagram
    participant Client
    participant FileBackend
    participant FileWriteSession
    participant FileNodeSession
    participant MerkleNodeDB

    Client->>FileBackend: begin()
    FileBackend->>FileWriteSession: create()
    FileWriteSession-->>Client: session

    Client->>FileWriteSession: open_node(node)
    FileWriteSession->>FileNodeSession: create(repo, node)
    FileNodeSession->>MerkleNodeDB: open_read_write(&dyn node)
    MerkleNodeDB-->>FileNodeSession: opened

    Client->>FileNodeSession: add_child(&dyn child)
    FileNodeSession->>MerkleNodeDB: add_child(&dyn child)
    MerkleNodeDB-->>FileNodeSession: ok

    Client->>FileNodeSession: finish()
    FileNodeSession->>MerkleNodeDB: close/flush
    MerkleNodeDB-->>FileNodeSession: closed
    FileNodeSession-->>FileWriteSession: closed
    FileWriteSession-->>Client: finish() (no-op)
sequenceDiagram
    participant Client
    participant FileBackend
    participant MerkleNodeDB

    Client->>FileBackend: exists(hash)
    FileBackend->>MerkleNodeDB: exists(hash)
    MerkleNodeDB-->>FileBackend: true/false
    FileBackend-->>Client: bool

    Client->>FileBackend: get_node(hash)
    FileBackend->>MerkleNodeDB: open_read & read dtype+bytes (if exists)
    MerkleNodeDB-->>FileBackend: dtype + bytes
    FileBackend->>FileBackend: EMerkleTreeNode::from_type_and_bytes(dtype, bytes)
    FileBackend-->>Client: Ok(Some(node)) / Ok(None)
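
The read path in the second diagram, where a missing hash yields Ok(None) rather than an I/O error, might look like the sketch below. `StoreSketch` is a hypothetical in-memory stand-in for the on-disk node files, and `u64` stands in for the hash type.

```rust
use std::collections::HashMap;
use std::io;

pub struct StoreSketch {
    nodes: HashMap<u64, Vec<u8>>,
}

impl StoreSketch {
    pub fn new() -> Self {
        StoreSketch { nodes: HashMap::new() }
    }

    pub fn insert(&mut self, hash: u64, bytes: Vec<u8>) {
        self.nodes.insert(hash, bytes);
    }

    pub fn exists(&self, hash: u64) -> bool {
        self.nodes.contains_key(&hash)
    }

    // Stand-in for opening a node file: like a filesystem open, it fails
    // with NotFound when the hash has no backing data.
    fn open_read_only(&self, hash: u64) -> io::Result<&[u8]> {
        self.nodes
            .get(&hash)
            .map(|bytes| bytes.as_slice())
            .ok_or_else(|| {
                io::Error::new(io::ErrorKind::NotFound, format!("no node file for {:x}", hash))
            })
    }

    // The explicit pre-check turns "missing" into Ok(None), so only real
    // I/O failures surface as Err.
    pub fn get_node(&self, hash: u64) -> io::Result<Option<Vec<u8>>> {
        if !self.exists(hash) {
            return Ok(None);
        }
        Ok(Some(self.open_read_only(hash)?.to_vec()))
    }
}
```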

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • CleanCut
  • gschoeni
  • rpschoenburg

Poem

🐰 I dug a file-backed burrow neat and bright,
Sessions hop in daylight, writes sealed tight,
Types bloom from bytes, each child finds its place,
Drop tucks the burrow — persistence in place,
A rabbit cheers the Merkle tree's new light.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly describes the main change: implementing FileBackend for the MerkleReader and MerkleWriter traits, which is the primary objective of the PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description clearly explains the implementation of FileBackend for Merkle{Reader,Writer} traits, refactoring work, and bug fixes related to the changeset. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/lib/src/core/db/merkle_node/file_backend.rs (1)

150-156: ⚡ Quick win

Drop implementation panics on I/O error.

Using expect() in Drop will panic if the underlying close() fails (e.g., disk full, permission error). While the test test_drop_finishes_file_node_session validates the happy path, production I/O errors would cause an unrecoverable panic.

Consider logging the error instead of panicking, since Drop cannot propagate errors and callers who need error guarantees should call finish() explicitly.

Proposed fix to avoid panic in Drop
 impl Drop for FileNodeSession {
     fn drop(&mut self) {
-        self.idempotent_finish()
-            .expect("Did not explicitly call finish() and encountered an error.");
+        if let Err(e) = self.idempotent_finish() {
+            log::error!(
+                "FileNodeSession::drop failed to close node {}: {}",
+                self.db.node_id,
+                e
+            );
+        }
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 150 - 156,
The Drop impl for FileNodeSession currently calls
self.idempotent_finish().expect(...), which will panic on I/O errors during
drop; change it to call idempotent_finish() and handle the Result without
panicking (e.g., match or if let Err(e) = ...) and log the error instead of
calling expect so drop never unwinds; reference the Drop for FileNodeSession,
the drop() method, idempotent_finish(), and the explicit finish() API (callers
who need to observe errors should use finish()) when locating the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/lib/src/core/db/merkle_node/file_backend.rs`:
- Around line 63-70: get_children currently calls MerkleNodeDB::open_read_only
directly and will propagate an IO error for missing hashes; make its behavior
consistent with get_node by first checking self.exists(hash) and returning
Ok(vec![]) (or Ok(None) if you prefer Option semantics and update signature/call
sites) when the hash doesn't exist, otherwise proceed to open_read_only and
map(); reference get_children, get_node, exists(), and
MerkleNodeDB::open_read_only when making the change.
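
A minimal sketch of the guarded get_children suggested here, next to the unguarded form it would replace. `ChildIndexSketch` and `ReadError` are hypothetical; children are plain `u64` hashes, and returning an empty vector for a missing hash is just one of the two options the comment mentions.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    Missing(u64),
}

pub struct ChildIndexSketch {
    children: HashMap<u64, Vec<u64>>,
}

impl ChildIndexSketch {
    pub fn new() -> Self {
        ChildIndexSketch { children: HashMap::new() }
    }

    pub fn insert(&mut self, hash: u64, kids: Vec<u64>) {
        self.children.insert(hash, kids);
    }

    pub fn exists(&self, hash: u64) -> bool {
        self.children.contains_key(&hash)
    }

    // Stand-in for opening the node file; errors on a missing hash.
    fn open_read_only(&self, hash: u64) -> Result<&Vec<u64>, ReadError> {
        self.children.get(&hash).ok_or(ReadError::Missing(hash))
    }

    // Unguarded: a missing hash propagates the open error.
    pub fn get_children_unguarded(&self, hash: u64) -> Result<Vec<u64>, ReadError> {
        Ok(self.open_read_only(hash)?.clone())
    }

    // Guarded: a missing hash is reported as "no children", matching the
    // pre-check that the node lookup performs.
    pub fn get_children(&self, hash: u64) -> Result<Vec<u64>, ReadError> {
        if !self.exists(hash) {
            return Ok(Vec::new());
        }
        Ok(self.open_read_only(hash)?.clone())
    }
}
```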

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7f28e24e-2669-496b-985b-a95c8e91a0ba

📥 Commits

Reviewing files that changed from the base of the PR and between c0a0819 and 7814565.

📒 Files selected for processing (7)
  • crates/lib/src/core/db/merkle_node.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/db/merkle_node/merkle_node_db.rs
  • crates/lib/src/model/merkle_tree.rs
  • crates/lib/src/model/merkle_tree/merkle_reader.rs
  • crates/lib/src/model/merkle_tree/merkle_writer.rs
  • crates/lib/src/model/merkle_tree/node.rs

Comment thread crates/lib/src/core/db/merkle_node/file_backend.rs
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch from 7814565 to f719e13 Compare May 1, 2026 20:31

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/lib/src/core/db/merkle_node/file_backend.rs`:
- Around line 47-55: The code returns db.node_id which is left as zero by
MerkleNodeDB::open_read_only; change the construction of MerkleNodeRecord in
this block to pass the requested hash (the local variable `hash`) instead of
`db.node_id` so the returned record reports the actual requested node id; update
the call to MerkleNodeRecord::new(...) to use `hash` as the first argument.
- Around line 140-155: The current idempotent_finish marks
FileNodeSession::finished before calling MerkleNodeDB::close which suppresses
retries on failure and the Drop impl panics via expect; change idempotent_finish
so it only sets self.finished = true after MerkleNodeDB::close(&mut self.db)
returns Ok (i.e., call close first and on success set finished), and modify the
Drop implementation for FileNodeSession to not panic—call
self.idempotent_finish() and ignore or log errors (e.g., .ok()) instead of using
expect so drop never aborts during unwinding.
- Around line 79-80: The implementation of MerkleWriter::begin uses a mismatched
self lifetime; change the signature in file_backend.rs from fn begin<'a>(&'a
self) -> Result<Box<dyn MerkleWriteSession + 'a>, OxenError> to fn
begin<'a>(&self) -> Result<Box<dyn MerkleWriteSession + 'a>, OxenError> so it
exactly matches the MerkleWriter trait; keep the body that returns
Box::new(FileWriteSession { repo: self.repo }) and ensure FileWriteSession and
the returned Box implement MerkleWriteSession with the correct lifetimes.
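
The second item above (mark the session finished only after close succeeds, and never panic in Drop) can be sketched like this. `FlakyHandle` and `SessionSketch` are hypothetical; the handle simulates a close that fails a fixed number of times, and `eprintln!` stands in for whatever logging the crate uses.

```rust
use std::io;

// Hypothetical resource whose close() fails `failures_left` times first.
pub struct FlakyHandle {
    failures_left: u32,
    closed: bool,
}

impl FlakyHandle {
    fn close(&mut self) -> io::Result<()> {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            return Err(io::Error::new(io::ErrorKind::Other, "simulated close failure"));
        }
        self.closed = true;
        Ok(())
    }
}

pub struct SessionSketch {
    handle: FlakyHandle,
    finished: bool,
}

impl SessionSketch {
    pub fn new(failures: u32) -> Self {
        SessionSketch {
            handle: FlakyHandle { failures_left: failures, closed: false },
            finished: false,
        }
    }

    pub fn is_finished(&self) -> bool {
        self.finished
    }

    pub fn is_closed(&self) -> bool {
        self.handle.closed
    }

    // Idempotent: a second call after success is a no-op, but a failed
    // close leaves `finished` false so the caller can retry.
    pub fn finish(&mut self) -> io::Result<()> {
        if self.finished {
            return Ok(());
        }
        self.handle.close()?;
        self.finished = true;
        Ok(())
    }
}

impl Drop for SessionSketch {
    fn drop(&mut self) {
        // Drop cannot return an error, so report it instead of panicking.
        if let Err(e) = self.finish() {
            eprintln!("SessionSketch::drop could not close handle: {}", e);
        }
    }
}
```

Callers that need to observe a close failure still have to call `finish()` themselves; the Drop path only guarantees that a failure does not abort the process.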
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 540f04b1-733d-4e51-a878-dcceed63a95b

📥 Commits

Reviewing files that changed from the base of the PR and between 7814565 and f719e13.

📒 Files selected for processing (4)
  • crates/lib/src/core/db/merkle_node.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/db/merkle_node/merkle_node_db.rs
  • crates/lib/src/model/merkle_tree/node.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/lib/src/core/db/merkle_node.rs
  • crates/lib/src/model/merkle_tree/node.rs

Comment thread crates/lib/src/core/db/merkle_node/file_backend.rs Outdated
Comment thread crates/lib/src/core/db/merkle_node/file_backend.rs Outdated
Comment thread crates/lib/src/core/db/merkle_node/file_backend.rs
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch from f719e13 to cf56acd Compare May 1, 2026 20:54

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
crates/lib/src/model/merkle_tree/node.rs (1)

76-92: ⚡ Quick win

Add a round-trip test for each node variant.

This is now the single dtype→decoder dispatch point. A small table-driven test that serializes each concrete node, then feeds its node_type() and bytes back through from_type_and_bytes, would catch enum/marker drift early.
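
A table-driven round-trip check of this kind could be shaped roughly as below. The two-variant `NodeSketch` and its hand-rolled byte encoding are assumptions made to keep the sketch standard-library only; the real test would cover every EMerkleTreeNode variant and use the crate's msgpack serializer.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeType {
    Dir,
    File,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NodeSketch {
    Dir(String),
    File(u64),
}

impl NodeSketch {
    pub fn node_type(&self) -> NodeType {
        match self {
            NodeSketch::Dir(_) => NodeType::Dir,
            NodeSketch::File(_) => NodeType::File,
        }
    }

    pub fn to_bytes(&self) -> Vec<u8> {
        match self {
            NodeSketch::Dir(name) => name.as_bytes().to_vec(),
            NodeSketch::File(size) => size.to_le_bytes().to_vec(),
        }
    }

    pub fn from_type_and_bytes(dtype: NodeType, data: &[u8]) -> Option<Self> {
        match dtype {
            NodeType::Dir => std::str::from_utf8(data)
                .ok()
                .map(|name| NodeSketch::Dir(name.to_string())),
            NodeType::File => <[u8; 8] as std::convert::TryFrom<&[u8]>>::try_from(data)
                .ok()
                .map(|raw| NodeSketch::File(u64::from_le_bytes(raw))),
        }
    }
}

// Serializes each table entry, decodes it through its own type tag, and
// returns the tags of any entries that did not come back unchanged.
pub fn round_trip_failures(table: &[NodeSketch]) -> Vec<NodeType> {
    let mut failures = Vec::new();
    for original in table {
        let decoded = NodeSketch::from_type_and_bytes(original.node_type(), &original.to_bytes());
        if decoded.as_ref() != Some(original) {
            failures.push(original.node_type());
        }
    }
    failures
}
```

A unit test would then assert that `round_trip_failures` returns an empty list for a table holding one representative value per variant.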

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/model/merkle_tree/node.rs` around lines 76 - 92, Add a
table-driven round-trip unit test that for each concrete node type (CommitNode,
DirNode, FileNode, VNode, FileChunkNode) constructs a representative instance,
serializes it to msgpack bytes (using the same serializer used elsewhere),
obtains its dtype via the instance's node_type() or equivalent, then calls
EMerkleTreeNode::from_type_and_bytes(dtype, &bytes) and asserts the returned
EMerkleTreeNode variant matches and equals the original (or its contained
struct). Ensure the test covers all variants of the EMerkleTreeNode enum and
fails if any dtype→decoder mapping in from_type_and_bytes is incorrect.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1ff46bee-c91f-45d7-9289-b4c7ece01845

📥 Commits

Reviewing files that changed from the base of the PR and between f719e13 and cf56acd.

📒 Files selected for processing (4)
  • crates/lib/src/core/db/merkle_node.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/db/merkle_node/merkle_node_db.rs
  • crates/lib/src/model/merkle_tree/node.rs
✅ Files skipped from review due to trivial changes (1)
  • crates/lib/src/core/db/merkle_node.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/lib/src/core/db/merkle_node/file_backend.rs

@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_interfaces branch from c0a0819 to a1b9ba7 Compare May 1, 2026 21:03
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch 5 times, most recently from e44ba49 to c3bbb8d Compare May 4, 2026 17:56
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_interfaces branch from a1b9ba7 to 9a95bc3 Compare May 4, 2026 21:51
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch from c3bbb8d to 9b727e2 Compare May 4, 2026 21:51
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_interfaces branch 3 times, most recently from 5faad04 to 53bc183 Compare May 4, 2026 22:06
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch 2 times, most recently from 4fcf7bf to 1fd6e68 Compare May 4, 2026 22:21
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_interfaces branch from 53bc183 to 6827c1e Compare May 4, 2026 22:21
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch from 1fd6e68 to ee3ce04 Compare May 4, 2026 22:21
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_interfaces branch from 6827c1e to b0d9aa5 Compare May 4, 2026 22:25
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch 6 times, most recently from 3454bd6 to c5ee136 Compare May 5, 2026 00:40
@malcolmgreaves
Collaborator Author

NOTE: Stacked PR. Must merge #510 before merging.

@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_interfaces branch from b0d9aa5 to 16ce636 Compare May 5, 2026 00:53
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch from c5ee136 to 96f735c Compare May 5, 2026 00:54
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_interfaces branch from 16ce636 to 5d23577 Compare May 5, 2026 01:18
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch from 96f735c to 814cb8f Compare May 5, 2026 01:18
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_interfaces branch from 5d23577 to 173f569 Compare May 5, 2026 17:09
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch from 814cb8f to 5babb48 Compare May 5, 2026 17:09
Comment on lines +161 to +166
impl Drop for FileNodeSession {
    fn drop(&mut self) {
        self.idempotent_finish()
            .expect("Did not explicitly call finish() and encountered an error.");
    }
}
Collaborator Author

@malcolmgreaves malcolmgreaves May 6, 2026


I am strongly considering removing this. It doesn't make conceptual sense to finish on drop.

Originally, I was thinking "make sure finish() is called if it is forgotten," but this is not that. We'll finish even on an error, which we don't want on a philosophical / conceptual design level.

The file backend writes eagerly and there's no "undo," so the writes will probably still be there. This finish is ensuring the file handles are flushed and closed. But arguably we don't want to do this if we're erroring out 🤔
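
One possible shape for the alternative being considered, sketched under assumptions rather than taken from the PR: the session is flushed only by an explicit `finish`, and Drop merely notes that an unfinished session was abandoned. `ExplicitSession`, the shared `Rc<Cell<bool>>` flags, and `write_then_finish` are hypothetical and exist only to make the behavior observable.

```rust
use std::cell::Cell;
use std::rc::Rc;

pub struct ExplicitSession {
    flushed: Rc<Cell<bool>>,
    abandoned: Rc<Cell<bool>>,
    finished: bool,
}

impl ExplicitSession {
    pub fn new(flushed: Rc<Cell<bool>>, abandoned: Rc<Cell<bool>>) -> Self {
        ExplicitSession { flushed, abandoned, finished: false }
    }

    // Consuming `self` makes it impossible to keep using a finished session.
    pub fn finish(mut self) -> Result<(), String> {
        self.flushed.set(true); // stand-in for flushing and closing file handles
        self.finished = true;
        Ok(())
    }
}

impl Drop for ExplicitSession {
    fn drop(&mut self) {
        // No flush here: an early return on an error path leaves the
        // session unflushed and only records that it was abandoned.
        if !self.finished {
            self.abandoned.set(true);
            eprintln!("session dropped without finish(); skipping flush");
        }
    }
}

pub fn write_then_finish(session: ExplicitSession, fail: bool) -> Result<(), String> {
    if fail {
        // `session` is dropped here without being flushed.
        return Err("write failed".to_string());
    }
    session.finish()
}
```

Note that Drop cannot tell an early `?` return apart from a forgotten `finish()`, which is part of why skipping the flush in Drop treats both the same way.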

@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_interfaces branch from 173f569 to f7a9eff Compare May 7, 2026 19:24
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch from 5babb48 to 466df7f Compare May 7, 2026 19:24
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_interfaces branch from f7a9eff to 8ff64ae Compare May 7, 2026 21:38
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch from 466df7f to 07c1e4e Compare May 7, 2026 21:39
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_interfaces branch 2 times, most recently from 40bfdbc to ef3729b Compare May 7, 2026 23:20
@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_dyn_file_backend branch from 07c1e4e to 6c2c648 Compare May 7, 2026 23:27