Skip to content

FileBackend for Merkle{Reader,Writer}#498

Closed
malcolmgreaves wants to merge 1 commit intomg/merkle_interfacefrom
mg/merkle_file_backend
Closed

FileBackend for Merkle{Reader,Writer}#498
malcolmgreaves wants to merge 1 commit intomg/merkle_interfacefrom
mg/merkle_file_backend

Conversation

@malcolmgreaves
Copy link
Copy Markdown
Collaborator

Implements the current custom file format based Merkle tree node storage
and retrieval logic for the MerkleReader and MerkleWriter traits as
the new FileBackend type.

Also refactors MerkleNodeDB::to_node's logic into a new function on the
enum EMerkleTreeNode::from_type_and_bytes.

Implements the current custom file format based Merkle tree node storage
and retrieval logic for the `MerkleReader` and `MerkleWriter` traits as
the new `FileBackend` type.

Also refactors `MerkleNodeDB::to_node`'s logic into a new function on the
enum `EMerkleTreeNode::from_type_and_bytes`.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • File-based storage backend for Merkle node operations now available
  • Refactor

    • Centralized node deserialization logic for improved code maintainability
    • Enhanced storage backend modularity and separation of concerns

Walkthrough

This PR introduces a new file-based Merkle node storage backend (FileBackend) that implements reader and writer trait interfaces for managing Merkle tree nodes. It also refactors deserialization logic in MerkleNodeDB and adds a centralized type-dispatch deserialization method in EMerkleTreeNode.

Changes

Cohort / File(s) Summary
Module exports
crates/lib/src/core/db/merkle_node.rs
Declares new public file_backend module and publicly re-exports FileBackend, FileNodeSession, and FileWriteSession.
File-based backend implementation
crates/lib/src/core/db/merkle_node/file_backend.rs
Introduces FileBackend<'repo> implementing MerkleReader and MerkleWriter traits. FileWriteSession<'repo> and FileNodeSession manage write operations with explicit and implicit (drop-based) cleanup semantics. Read operations delegate to MerkleNodeDB with safe-fallback behavior for missing hashes.
Merkle deserialization refactoring
crates/lib/src/core/db/merkle_node/merkle_node_db.rs, crates/lib/src/model/merkle_tree/node.rs
MerkleNodeDB::to_node refactored to delegate to new EMerkleTreeNode::from_type_and_bytes method, centralizing type-to-bytes decoding logic and removing redundant variant matching.

Sequence Diagrams

sequenceDiagram
    participant Client
    participant FileBackend
    participant FileWriteSession
    participant FileNodeSession
    participant MerkleNodeDB

    Client->>FileBackend: begin()
    FileBackend->>FileWriteSession: new()
    FileWriteSession-->>Client: FileWriteSession

    Client->>FileWriteSession: node_session(node_id)
    FileWriteSession->>MerkleNodeDB: open_read_write()
    MerkleNodeDB-->>FileNodeSession: session
    FileNodeSession-->>Client: FileNodeSession

    Client->>FileNodeSession: add_child()
    FileNodeSession->>MerkleNodeDB: (delegates)

    Client->>FileNodeSession: finish()
    FileNodeSession->>MerkleNodeDB: close()
    note over FileNodeSession: finish marked, Drop ensures<br/>cleanup if not explicit

    Client->>FileWriteSession: finish()
    FileWriteSession-->>Client: Ok(())
Loading
sequenceDiagram
    participant Client
    participant FileBackend
    participant MerkleNodeDB

    Client->>FileBackend: exists(hash)
    FileBackend->>MerkleNodeDB: hash_exists()
    MerkleNodeDB-->>FileBackend: bool or false
    FileBackend-->>Client: Ok(bool)

    Client->>FileBackend: get_node(hash)
    FileBackend->>MerkleNodeDB: get_node()
    MerkleNodeDB-->>FileBackend: MerkleNodeRecord or None
    FileBackend-->>Client: Ok(Some/None)

    Client->>FileBackend: get_children(hash)
    FileBackend->>MerkleNodeDB: get_children()
    MerkleNodeDB-->>FileBackend: Vec<(hash, node)> or empty
    FileBackend-->>Client: Ok(Vec)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • CleanCut
  • rpschoenburg

Poem

🐰 A new backend springs to life, file-backed and true,
Sessions nested gently, with drop-based cleanup too!
Deserialization blooms from one central place,
Merkle trees now flourish with elegance and grace! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: implementing FileBackend to provide MerkleReader and MerkleWriter trait implementations.
Description check ✅ Passed The description clearly relates to the changeset, explaining the FileBackend implementation and the refactoring of deserialization logic into EMerkleTreeNode::from_type_and_bytes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mg/merkle_file_backend

Review rate limit: 4/5 reviews remaining, refill in 12 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@malcolmgreaves malcolmgreaves changed the base branch from main to mg/merkle_interface April 29, 2026 03:28
@malcolmgreaves
Copy link
Copy Markdown
Collaborator Author

STACKED PR: Do not merge until #472 has been merged.

@malcolmgreaves
Copy link
Copy Markdown
Collaborator Author

@coderabbitai please re-review this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

@malcolmgreaves Sure! I'll re-review this PR now. Note that as mentioned in your stacked PR comment, this PR depends on #472 being merged first — I'll keep that context in mind during the review.


🧠 Learnings used
Learnt from: malcolmgreaves
Repo: Oxen-AI/Oxen PR: 502
File: crates/lib/src/model/merkle_tree/merkle_transport.rs:19-33
Timestamp: 2026-04-29T08:07:15.242Z
Learning: In Oxen-AI/Oxen, the `LegacyClientPush` variant of `PackOptions` (in `crates/lib/src/model/merkle_tree/merkle_transport.rs`) documents a `{prefix}/{suffix}/...` tar-gz layout, but hashes smaller than 0x1000 can produce an empty `dir_suffix` via `HexHash::node_db_prefix()`, collapsing the path to a single segment. This cannot be patched without a breaking migration. The planned resolution is either the zero-padding migration from PR `#474` (which zero-pads `MerkleHash` hex display to 32 chars) or a full migration to the LMDB backend if it proves strictly superior. Do not flag the single-segment path case in `LegacyClientPush` as a blocking issue.

Learnt from: malcolmgreaves
Repo: Oxen-AI/Oxen PR: 476
File: crates/lib/src/model/merkle_tree/merkle_hash.rs:128-130
Timestamp: 2026-04-21T22:36:11.956Z
Learning: In the Oxen-AI/Oxen codebase, `HexHash::node_db_prefix()` in `crates/lib/src/model/merkle_tree/merkle_hash.rs` has a known edge case where hashes smaller than 0x1000 produce an empty `dir_suffix`, resulting in a malformed single-segment path. The definitive fix is PR `#474` ("Make hex-encoded hash always 32 characters"), which zero-pads the `MerkleHash` hex display to 32 chars (`{:032x}`), but it requires a data migration. PR `#476` is an interim stopgap that avoids panics but does not resolve the empty-suffix edge case.
✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
crates/lib/src/core/db/merkle_node/file_backend.rs (1)

56-67: 💤 Low value

Docstring doesn't mention error behavior for missing hashes.

Unlike get_node which explicitly handles missing hashes by returning Ok(None), get_children will propagate an error from open_read_only if the hash doesn't exist. The docstring should clarify this to match the actual behavior.

📝 Suggested docstring update
     /// Retrieves the children of the node with the given `hash` from the store.
-    /// An empty vec means that either the node is a not a directory or virtual node or it is one
-    /// but has no files.
+    /// An empty vec means that either the node is not a directory or virtual node, or it is one
+    /// but has no files. Returns an error if the node does not exist; callers should use
+    /// [`exists`] first if they need to distinguish missing nodes from other errors.
     ///
     /// Alias for [`MerkleNodeDB::open_read_only`] & a `.map()` call..
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 56 - 67, The
docstring for get_children is missing the error behavior when the provided hash
is absent; update the comment for fn get_children to explicitly state that,
unlike get_node which returns Ok(None) for missing hashes, get_children calls
MerkleNodeDB::open_read_only and will propagate an Err (MerkleDbError) if the
hash does not exist or cannot be opened, and document that an empty Vec still
indicates either a non-directory/virtual node or a directory with no files.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@crates/lib/src/core/db/merkle_node/file_backend.rs`:
- Around line 56-67: The docstring for get_children is missing the error
behavior when the provided hash is absent; update the comment for fn
get_children to explicitly state that, unlike get_node which returns Ok(None)
for missing hashes, get_children calls MerkleNodeDB::open_read_only and will
propagate an Err (MerkleDbError) if the hash does not exist or cannot be opened,
and document that an empty Vec still indicates either a non-directory/virtual
node or a directory with no files.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 869f49f4-0b86-4fa7-9424-5551a5ce0742

📥 Commits

Reviewing files that changed from the base of the PR and between 4168bb8 and 44e7443.

📒 Files selected for processing (4)
  • crates/lib/src/core/db/merkle_node.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/db/merkle_node/merkle_node_db.rs
  • crates/lib/src/model/merkle_tree/node.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant