Add merkle tree interfaces for reading & writing#510
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds object-safe Merkle read/write traits and types (MerkleReader, MerkleWriter, MerkleNodeRecord, MerkleStore) and refactors MerkleNodeDB/file-path APIs to accept repository paths ( ChangesMerkle API: reader/writer/store
MerkleNodeDB and call-site migration to repo paths
Sequence DiagramsequenceDiagram
participant Client
participant MerkleWriter
participant MerkleWriteSession
participant NodeWriteSession
participant Backend
Client->>MerkleWriter: begin()
MerkleWriter-->>Client: Box<dyn MerkleWriteSession>
Client->>MerkleWriteSession: create_node(node, parent_id)
MerkleWriteSession-->>Client: Box<dyn NodeWriteSession>
Client->>NodeWriteSession: add_child(child)
Client->>NodeWriteSession: finish() (consumes Box)
NodeWriteSession->>Backend: persist node data
Client->>MerkleWriteSession: finish() (consumes Box)
MerkleWriteSession->>Backend: commit accumulated writes
Backend-->>Client: Result (Ok / Err)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/model/merkle_tree/merkle_reader.rs`:
- Around line 22-29: The get_children API currently returns
Result<Vec<(MerkleHash, MerkleTreeNode)>, OxenError> which conflates "missing
node" with "no children"; change the trait signature for get_children to return
Result<Option<Vec<(MerkleHash, MerkleTreeNode)>>, OxenError> (or alternatively
return a dedicated NotFound OxenError) so callers can distinguish absence from
an empty child list, then update all implementations of
MerkleReader::get_children and any callers to handle the Option (treat None as
missing hash and Some(vec![]) as a valid leaf/empty directory), and update the
doc comment above get_children to describe the new semantics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 43ddd224-ef75-4f9d-8fc0-0f6368a3f7f1
📒 Files selected for processing (3)
crates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/merkle_reader.rscrates/lib/src/model/merkle_tree/merkle_writer.rs
acfd963 to
c0a0819
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/model/merkle_tree/merkle_writer.rs`:
- Around line 11-19: The begin method's returned session lifetime is not tied to
the self borrow; change the signature of fn begin to borrow self with the same
lifetime parameter (i.e. make it fn begin<'a>(&'a self) -> Result<Box<dyn
MerkleWriteSession + 'a>, OxenError>) so the Box<dyn MerkleWriteSession + 'a>
cannot outlive the &self borrow, matching the documented contract and the
pattern used by MerkleWriteSession::create_node.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 16bbfaf2-593b-4131-a6de-17994a29271a
📒 Files selected for processing (3)
crates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/merkle_reader.rscrates/lib/src/model/merkle_tree/merkle_writer.rs
✅ Files skipped from review due to trivial changes (1)
- crates/lib/src/model/merkle_tree.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/lib/src/model/merkle_tree/merkle_reader.rs
9a95bc3 to
7cf2bb8
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs (1)
141-141: ⚡ Quick winUnify Merkle write DB target path for rewritten nodes
Line 141andLine 194write using&old_repo.path, whileLine 257writes using&new_repo.path. This currently works because both repos share the same path, but it bakes in an implicit assumption. Use one write target consistently (new_repo.path) to avoid split-tree writes if these repos ever diverge.Proposed diff
- let mut root_commit_db = - MerkleNodeDB::open_read_write(&old_repo.path, &commit_node, root_node.parent_id)?; + let mut root_commit_db = + MerkleNodeDB::open_read_write(&new_repo.path, &commit_node, root_node.parent_id)?; ... - let mut dir_db = - MerkleNodeDB::open_read_write(&old_repo.path, &dir, node.parent_id)?; + let mut dir_db = + MerkleNodeDB::open_read_write(&new_repo.path, &dir, node.parent_id)?;Also applies to: 193-195, 257-261
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs` at line 141, The Merkle write DB target is inconsistent: calls to MerkleNodeDB::open_read_write are using &old_repo.path in some places (e.g., the invocation around commit_node/root_node.parent_id and the block at lines ~193-195) while another call uses &new_repo.path (~257-261); change the write target for all rewritten-node writes to use &new_repo.path so all MerkleNodeDB::open_read_write(...) invocations that are writing rewritten nodes consistently pass &new_repo.path (update the two calls that currently pass &old_repo.path to pass &new_repo.path instead).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs`:
- Line 141: The Merkle write DB target is inconsistent: calls to
MerkleNodeDB::open_read_write are using &old_repo.path in some places (e.g., the
invocation around commit_node/root_node.parent_id and the block at lines
~193-195) while another call uses &new_repo.path (~257-261); change the write
target for all rewritten-node writes to use &new_repo.path so all
MerkleNodeDB::open_read_write(...) invocations that are writing rewritten nodes
consistently pass &new_repo.path (update the two calls that currently pass
&old_repo.path to pass &new_repo.path instead).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 90fce15b-cab7-47c5-bff0-d9d14383696a
📒 Files selected for processing (12)
crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/core/db/merkle_node/merkle_node_db.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/core/v_old/v0_19_0/index/commit_merkle_tree.rscrates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/merkle_reader.rscrates/lib/src/model/merkle_tree/merkle_writer.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rs
✅ Files skipped from review due to trivial changes (1)
- crates/lib/src/model/merkle_tree/merkle_reader.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/lib/src/model/merkle_tree.rs
- crates/lib/src/model/merkle_tree/merkle_writer.rs
7cf2bb8 to
5faad04
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/lib/src/core/db/merkle_node/merkle_node_db.rs (1)
472-482:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReset
children_filecursor before reading inmap.
mapreads withread_to_endfrom the current cursor position. Ifmapis called more than once on the same handle, subsequent calls can read 0 bytes and fail while parsing offsets.Suggested fix
let Some(children_file) = self.children_file.as_mut() else { return Err(MerkleDbError::WriteBeforeOpen); }; + children_file.seek(SeekFrom::Start(0))?; // Parse the node parent id let data_type = MerkleTreeNodeType::from_u8(lookup.data_type)?; let parent_id = MerkleTreeNode::deserialize_id(&lookup.data, data_type)?;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs` around lines 472 - 482, The children_file cursor must be reset before reading because read_to_end reads from the current position; update the code in the method that calls children_file.read_to_end (the map logic surrounding children_file, MerkleTreeNodeType::from_u8 and MerkleTreeNode::deserialize_id) to seek the children_file to the start (using Seek::seek with SeekFrom::Start(0)) before calling read_to_end, propagate any seek errors (like other io errors) and ensure the file handle implements Seek by importing std::io::Seek/SeekFrom if needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs`:
- Around line 305-309: The open function currently calls
util::fs::create_dir_all(&path) unconditionally, which mutates the filesystem
even when opened with read_only=true; change the logic in fn open
(MerkleNodeDb::open) to only create directories when read_only is false (i.e.,
if !path.exists() && !read_only then call util::fs::create_dir_all(&path) and
map_err(MerkleDbError::dir_create)), and for the read_only && !path.exists()
case return a suitable MerkleDbError (or propagate an existing error) instead of
creating directories; ensure references to create_dir_all and
MerkleDbError::dir_create are updated accordingly.
---
Outside diff comments:
In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs`:
- Around line 472-482: The children_file cursor must be reset before reading
because read_to_end reads from the current position; update the code in the
method that calls children_file.read_to_end (the map logic surrounding
children_file, MerkleTreeNodeType::from_u8 and MerkleTreeNode::deserialize_id)
to seek the children_file to the start (using Seek::seek with
SeekFrom::Start(0)) before calling read_to_end, propagate any seek errors (like
other io errors) and ensure the file handle implements Seek by importing
std::io::Seek/SeekFrom if needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b3dffc1a-a860-4264-893c-7a6103f79a57
📒 Files selected for processing (12)
crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/core/db/merkle_node/merkle_node_db.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/core/v_old/v0_19_0/index/commit_merkle_tree.rscrates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/merkle_reader.rscrates/lib/src/model/merkle_tree/merkle_writer.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rs
✅ Files skipped from review due to trivial changes (3)
- crates/lib/src/model/merkle_tree.rs
- crates/lib/src/repositories/tree.rs
- crates/lib/src/core/v_latest/index/commit_merkle_tree.rs
🚧 Files skipped from review as they are similar to previous changes (8)
- crates/lib/src/core/v_latest/entries.rs
- crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
- crates/lib/src/repositories/commits/commit_writer.rs
- crates/lib/src/core/v_old/v0_19_0/index/commit_merkle_tree.rs
- crates/lib/src/model/merkle_tree/merkle_reader.rs
- crates/lib/src/model/merkle_tree/merkle_writer.rs
- crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
- crates/lib/src/core/v_latest/commits.rs
6827c1e to
b0d9aa5
Compare
|
NOTE: Stacked PR! Must merge #523 before merging. |
b009a7f to
6736d09
Compare
b0d9aa5 to
16ce636
Compare
6736d09 to
c53b23d
Compare
16ce636 to
5d23577
Compare
5d23577 to
173f569
Compare
CleanCut
left a comment
There was a problem hiding this comment.
👍🏻 This implementation can work.
🎁 There are some opportunities for simplification that you should consider:
- Just like with the
VersionStoretrait, the trait design here can be simplified from 5 traits to just one:MerkleStore- A single trait design can still allow for implementations to be eager, buffer writes in-memory, and provide transaction semantics if necessary.
- The trait implementation can internally handle transaction semantics within its methods, rather than producing nested types representing various sessions that the user must create/use/finalize manually.
- Inner transactions/locking can be handled with private fields on the implementation struct, and happen within
create_node,add_child, (etc.) implementations. - The current design allows you to drop the structs without calling
finalize(). It would be easier to avoid problems if whateverfinalize()is supposed to do is folded into the internal transaction implementation.
MerkleNodeRecordis redundant (see comment below)
8ff64ae to
40bfdbc
Compare
New interface traits: `MerkleReader`, `MerkleWriter`, and unifying `MerkleStore`. **`MerkleStore`** A `MerkleStore` is just a `MerkleReader` + `MerkleWriter` where their `Error` type is the same. Includes a blanket impl. for any type that has a reader & writer impl. with equal `Error` types. **`MerkleReader`** Can retrieve Merkle tree nodes from storage given a hash. Supports existence checks, retrieving the node, and retrieving a node's children, if applicable. **`MerkleWriter`** Produces a `MerkleWriteSession` that allows for writing arbitrary Merkle tree nodes to the underlying physical store. Within the write session, writing a node creates a new `NodeWriteSession` instance. This instance can write a node as well as a node's children. Both session types have `finish`, which ensures their writes go to the underlying physical store. These tiered write sessions match how the Oxen code writes Merkle tree nodes. Additionally, this design allows for implementations to be eager, buffer writes in- memory, or provide transaction semantics.
40bfdbc to
ef3729b
Compare
New interface traits:
MerkleReader,MerkleWriter, and unifyingMerkleStore.MerkleStoreA
MerkleStoreis just aMerkleReader+MerkleWriterwhere theirErrortypeis the same. Includes a blanket impl. for any type that has a reader & writer impl.
with equal
Errortypes.MerkleReaderCan retrieve Merkle tree nodes from storage given a hash. Supports existence
checks, retrieving the node, and retrieving a node's children, if applicable.
MerkleWriterProduces a
MerkleWriteSessionthat allows for writing arbitrary Merkle tree nodesto the underlying physical store. Within the write session, writing a node creates a
new
NodeWriteSessioninstance. This instance can write a node as well as a node'schildren. Both session types have
finish, which ensures their writes go to theunderlying physical store.
These tiered write sessions match how the Oxen code writes Merkle tree nodes.
Additionally, this design allows for implementations to be eager, buffer writes in-
memory, or provide transaction semantics.