Add merkle tree interfaces for reading & writing#472
Add merkle tree interfaces for reading & writing#472malcolmgreaves wants to merge 1 commit intomg/oxen_macrosfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds read/write Merkle tree abstractions: new Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant MW as MerkleWriter
participant WS as MerkleWriteSession
participant NS as NodeWriteSession
Client->>MW: begin()
MW-->>Client: Session (WS)
Client->>WS: create_node(node, parent_id)
WS-->>Client: NodeSession (NS)
Client->>NS: add_child(child_node)
NS-->>Client: Ok
Client->>NS: finish()
NS-->>Client: Ok
Client->>WS: finish()
WS-->>Client: Ok
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Why are there lifetimes here? The upcoming LMDB implementation will need to maintain a heed::RwTxn for some explicit lifetime in the MerkleWriteSession. That same read-write transaction will be borrowed in the NodeWriteSession too, so that also needs to support an explicit lifetime.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/model/merkle_tree/merkle_reader.rs`:
- Around line 12-22: Replace the typo "underyling" with "underlying" in the
public API documentation comments for the Merkle reader methods — update the doc
comments for exists, get_node, get_children (and any other MerkleReader methods
in this file showing the same typo) so the docs read "underlying storage layer"
consistently.
- Around line 29-35: MerkleNodeRecord currently has only pub(crate) fields and
no public constructor, and the MerkleReader trait's get_node is not implemented
anywhere; add a pub fn new(...) -> MerkleNodeRecord constructor on
MerkleNodeRecord that accepts (hash: MerkleHash, dtype: MerkleTreeNodeType,
parent_id: Option<MerkleHash>, node: EMerkleTreeNode, num_children: u64) and
returns Self, then implement the MerkleReader::get_node method on the
appropriate type to construct and return MerkleNodeRecord via that new() so
external callers can obtain instances without accessing internal fields; also
correct the documentation typos changing "underyling" to "underlying" wherever
present.
In `@crates/lib/src/model/merkle_tree/merkle_writer.rs`:
- Around line 29-37: Update the documentation comments for
MerkleWriteSession/NodeWriteSession to correct typos: change "comitting" to
"committing", "finialize" to "finalize", "Peristance" to "Persistence",
"invairant" to "invariant", "guarentee" to "guarantee", and "chose" to "choose"
in the section that references create_node, add_child, and finish so the session
semantics read correctly and consistently.
- Around line 15-17: Update the intra-doc links in the `begin` method docs to be
explicit: change the first ambiguous [`finish`] (referring to the returned
session) to [`MerkleWriteSession::finish`] and the second ambiguous [`finish`]
(referring to per-node sessions) to [`NodeWriteSession::finish`], so the docs
unambiguously point to the correct methods.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c87ee43a-2cf4-45d1-96bb-1539f358f5f7
📒 Files selected for processing (3)
crates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/merkle_reader.rscrates/lib/src/model/merkle_tree/merkle_writer.rs
68453db to
f25df1c
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
crates/lib/src/model/merkle_tree/merkle_writer.rs (1)
31-33:⚠️ Potential issue | 🟡 MinorQualify remaining
finishintra-doc links to avoid ambiguous rustdoc resolution.Line 31 and Line 37 use unqualified [
finish] while referring to different methods (NodeWriteSession::finishvsMerkleWriteSession::finish). Please make these explicit as done in thebegindocs.Proposed doc fix
-/// Each [`NodeWriteSession`] must have its [`finish`] called to finalize the written node -/// information. Once all nodes have been written, the [`finish`] method of the [`MerkleWriteSession`] +/// Each [`NodeWriteSession`] must have its [`NodeWriteSession::finish`] called to finalize the written node +/// information. Once all nodes have been written, the [`MerkleWriteSession::finish`] method of the [`MerkleWriteSession`] /// must be called to persist the changes to the store. @@ -/// are called. The invariant is that [`finish`] must be called to **ensure** that writes are +/// are called. The invariant is that [`MerkleWriteSession::finish`] and each +/// [`NodeWriteSession::finish`] must be called to **ensure** that writes are /// persisted.Also applies to: 37-37
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/model/merkle_tree/merkle_writer.rs` around lines 31 - 33, Doc comments use ambiguous unqualified [`finish`] links; update them to fully-qualified intra-doc links `NodeWriteSession::finish` and `MerkleWriteSession::finish` so rustdoc resolves to the correct methods. Locate the doc sentences referencing the session finish behavior (the paragraph that mentions "Each `NodeWriteSession` must have its `finish` called..." and the paragraph about finalizing the Merkle write session) and replace the plain [`finish`] with `[`NodeWriteSession::finish`]` and `[`MerkleWriteSession::finish`]` respectively, matching the style used for `begin`.
🧹 Nitpick comments (1)
crates/lib/src/model/merkle_tree/merkle_writer.rs (1)
63-66: Clarify the contract for invalidfinishordering.Line 64 says calling
finishwith active node sessions is “invalid,” but expected behavior is unspecified (error vs panic vs rollback). Please define this contract explicitly so implementations stay consistent.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/model/merkle_tree/merkle_writer.rs` around lines 63 - 66, The docs for the MerkleWriter trait’s finish(self: Box<Self>) -> Result<(), Self::Error> are ambiguous about what "invalid" means when there are active node write sessions; update the contract comment on finish to state that calling finish while any node write sessions are still active must return an Err(Self::Error) (not panic) describing the invalid ordering, and specify that implementations should not perform partial commits or rollbacks but must leave the writer in a well-defined state documented (e.g., unchanged) so callers can handle the error; reference the finish method and the Self::Error type in the comment so implementers (and callers of open/finish session APIs) follow the same error-return semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/lib/src/model/merkle_tree/merkle_writer.rs`:
- Around line 31-33: Doc comments use ambiguous unqualified [`finish`] links;
update them to fully-qualified intra-doc links `NodeWriteSession::finish` and
`MerkleWriteSession::finish` so rustdoc resolves to the correct methods. Locate
the doc sentences referencing the session finish behavior (the paragraph that
mentions "Each `NodeWriteSession` must have its `finish` called..." and the
paragraph about finalizing the Merkle write session) and replace the plain
[`finish`] with `[`NodeWriteSession::finish`]` and
`[`MerkleWriteSession::finish`]` respectively, matching the style used for
`begin`.
---
Nitpick comments:
In `@crates/lib/src/model/merkle_tree/merkle_writer.rs`:
- Around line 63-66: The docs for the MerkleWriter trait’s finish(self:
Box<Self>) -> Result<(), Self::Error> are ambiguous about what "invalid" means
when there are active node write sessions; update the contract comment on finish
to state that calling finish while any node write sessions are still active must
return an Err(Self::Error) (not panic) describing the invalid ordering, and
specify that implementations should not perform partial commits or rollbacks but
must leave the writer in a well-defined state documented (e.g., unchanged) so
callers can handle the error; reference the finish method and the Self::Error
type in the comment so implementers (and callers of open/finish session APIs)
follow the same error-return semantics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9829bffb-72ae-4e34-9168-677dd0297054
📒 Files selected for processing (3)
crates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/merkle_reader.rscrates/lib/src/model/merkle_tree/merkle_writer.rs
✅ Files skipped from review due to trivial changes (1)
- crates/lib/src/model/merkle_tree/merkle_reader.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/lib/src/model/merkle_tree.rs
8c766ba to
96ee80c
Compare
|
@coderabbitai please re-review this entire PR |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/lib/src/model/merkle_tree/merkle_reader.rs (1)
43-56: EnforceMerkleNodeRecordinvariants in constructor.
newcurrently acceptshash/dtypeindependently fromnode(Line 43), so callers can build inconsistent records. Consider validating these relationships to prevent invalid state construction.Suggested guard in constructor
impl MerkleNodeRecord { pub fn new( hash: MerkleHash, dtype: MerkleTreeNodeType, parent_id: Option<MerkleHash>, node: EMerkleTreeNode, num_children: u64, ) -> Self { + debug_assert_eq!(&hash, node.hash()); + debug_assert_eq!(dtype, node.node_type()); Self { hash, dtype, parent_id, node, num_children, } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/model/merkle_tree/merkle_reader.rs` around lines 43 - 56, The constructor MerkleNodeRecord::new currently accepts hash, dtype, parent_id, node, and num_children independently which allows inconsistent records; update new to validate invariants (e.g., ensure node's internal hash equals the provided hash, node variant/type matches dtype, and num_children matches the node variant's expected children count) and fail fast on mismatch—preferably change the signature to return Result<Self, MerkleError> (or another error type) and return Err(...) when checks fail, or at minimum assert/unwrap with clear messages; reference the MerkleNodeRecord::new function, the dtype and node parameters, and num_children/parent_id fields when adding these checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/model/merkle_tree/merkle_writer.rs`:
- Around line 61-65: The trait method create_node currently takes &'b self but
must be changed to &'b mut self to allow transaction-backed backends to
open/commit write transactions; update the signature of create_node<N:
TMerkleTreeNode>(&'b mut self, node: &N, parent_id: Option<MerkleHash>) ->
Result<Self::NodeSession<'b>, Self::Error>, and propagate this change to any
trait definitions and where NodeWriteSession::add_child(&mut self, ...) is
expected so implementations (e.g., any MerkleWriteSession impls or future
heed-backed sessions) can mutably borrow self to manage transaction state
without relying on interior mutability; adjust trait bounds/usages and tests
accordingly.
---
Nitpick comments:
In `@crates/lib/src/model/merkle_tree/merkle_reader.rs`:
- Around line 43-56: The constructor MerkleNodeRecord::new currently accepts
hash, dtype, parent_id, node, and num_children independently which allows
inconsistent records; update new to validate invariants (e.g., ensure node's
internal hash equals the provided hash, node variant/type matches dtype, and
num_children matches the node variant's expected children count) and fail fast
on mismatch—preferably change the signature to return Result<Self, MerkleError>
(or another error type) and return Err(...) when checks fail, or at minimum
assert/unwrap with clear messages; reference the MerkleNodeRecord::new function,
the dtype and node parameters, and num_children/parent_id fields when adding
these checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 84dede9a-a6c4-4cc0-98e1-99730b5baa5f
📒 Files selected for processing (3)
crates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/merkle_reader.rscrates/lib/src/model/merkle_tree/merkle_writer.rs
| fn create_node<'b, N: TMerkleTreeNode>( | ||
| &'b self, | ||
| node: &N, | ||
| parent_id: Option<MerkleHash>, | ||
| ) -> Result<Self::NodeSession<'b>, Self::Error>; |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In the Rust heedcrate, do write operations onRwTxn require mutable access (&mut self), and is borrowing a mutable transaction into nested write-session objects the idiomatic pattern?
💡 Result:
Yes, write operations on RwTxn in the Rust heed crate require mutable access (&mut self). The Database::put method signature is pub fn put<'a, T>(&self, txn: &mut RwTxn<', ', T>, key: &'a KC::EItem, data: &'a DC::EItem) -> Result<>, explicitly taking &mut RwTxn for writes like put, delete, etc. All official examples use &mut wtxn for write operations, such as db.put(&mut wtxn, key, value)?. Yes, borrowing a mutable transaction into nested write-session objects is the idiomatic pattern. RwTxn supports nested transactions via RwTxn::nested, which takes a &mut RwTxn (parent) and returns a new RwTxn<'e, 'p> borrowing the parent's lifetime. Recent versions (e.g., 0.22.1-nested-rtxns) emphasize nested rtxns, and source code shows pub(crate) fn nested<'p: 'e>(env: &'e Env, parent: &'p mut RwTxn) -> Result<RwTxn<'e, 'p, T>>. This enables hierarchical transactions while maintaining LMDB's single-writer rule through borrowing.
Citations:
- 1: https://docs.rs/heed/latest/heed/struct.RwTxn.html
- 2: https://docs.rs/crate/heed/0.10.6/source/src/txn.rs
- 3: https://docs.rs/heed/0.11.0/heed/struct.Database.html
- 4: https://docs.rs/heed/latest/heed/struct.Env.html
- 5: https://docs.rs/heed/0.11.0/heed/
- 6: We can't write the entries of one database into another meilisearch/heed#189
- 7: https://docs.rs/heed3/latest/heed3/struct.RwTxn.html
🏁 Script executed:
# Find the trait definition and understand the structure
find crates/lib/src/model/merkle_tree -type f -name "*.rs" | head -20Repository: Oxen-AI/Oxen
Length of output: 913
🏁 Script executed:
# Look at the merkle_writer.rs file to see the trait definition
head -100 crates/lib/src/model/merkle_tree/merkle_writer.rsRepository: Oxen-AI/Oxen
Length of output: 4952
🏁 Script executed:
# Search for create_node calls to understand how it's used
rg "create_node" crates/lib/src/model/merkle_tree/ -A 3 -B 1Repository: Oxen-AI/Oxen
Length of output: 1189
🏁 Script executed:
# Check if heed is a dependency
rg "heed" crates/lib/Cargo.tomlRepository: Oxen-AI/Oxen
Length of output: 38
🏁 Script executed:
# Search for implementations of MerkleWriteSession
rg "impl.*MerkleWriteSession" --type rustRepository: Oxen-AI/Oxen
Length of output: 38
🏁 Script executed:
# Search for heed usage anywhere in the codebase
rg "heed" --type rustRepository: Oxen-AI/Oxen
Length of output: 1623
🏁 Script executed:
# Look for NodeWriteSession implementations which might reveal transaction handling
rg "impl.*NodeWriteSession" --type rustRepository: Oxen-AI/Oxen
Length of output: 38
🏁 Script executed:
# Check Cargo.toml in different locations for dependencies
find . -name "Cargo.toml" -exec grep -l "heed\|lmdb\|rocksdb" {} \;Repository: Oxen-AI/Oxen
Length of output: 155
🏁 Script executed:
# Search for struct definitions that might implement the traits
rg "struct.*Session" crates/lib/src/model/merkle_tree/ -A 5Repository: Oxen-AI/Oxen
Length of output: 38
🏁 Script executed:
# Look at the full merkle_tree module structure
ls -la crates/lib/src/model/merkle_tree/Repository: Oxen-AI/Oxen
Length of output: 537
🏁 Script executed:
# Search for any create_node method calls in the codebase
rg "\.create_node\(" --type rust -B 2 -A 2Repository: Oxen-AI/Oxen
Length of output: 38
🏁 Script executed:
# Check if there are any imports or uses of MerkleWriteSession
rg "MerkleWriteSession\|MerkleWriter" --type rustRepository: Oxen-AI/Oxen
Length of output: 38
🏁 Script executed:
# Search across all crates
rg "impl MerkleWriteSession\|impl MerkleWriter" --type rustRepository: Oxen-AI/Oxen
Length of output: 38
🏁 Script executed:
# Search for Merkle in all files to understand usage
rg "Merkle" crates/lib/src/ -l | head -20Repository: Oxen-AI/Oxen
Length of output: 795
🏁 Script executed:
# Check if there's any git history or comments about this
rg "create_node\|NodeWriteSession\|MerkleWriter" crates/lib/src/ --type rust -B 1 -A 1Repository: Oxen-AI/Oxen
Length of output: 38
🏁 Script executed:
# Look at the full merkle_writer.rs file to see all trait definitions
cat crates/lib/src/model/merkle_tree/merkle_writer.rsRepository: Oxen-AI/Oxen
Length of output: 4952
Change create_node to &'b mut self for transaction-backed implementations.
The current signature with &'b self is inconsistent with NodeWriteSession::add_child(&mut self, ...), which also performs state modification. For backends using transaction-based storage like heed, mutable access is required to safely manage write transactions. The current immutable design would necessitate interior mutability workarounds, which is less idiomatic and harder to reason about. Since no implementations of MerkleWriteSession exist yet in the codebase, this change can be made without breaking impact.
Proposed API adjustment
- fn create_node<'b, N: TMerkleTreeNode>(
- &'b self,
+ fn create_node<'b, N: TMerkleTreeNode>(
+ &'b mut self,
node: &N,
parent_id: Option<MerkleHash>,
) -> Result<Self::NodeSession<'b>, Self::Error>;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fn create_node<'b, N: TMerkleTreeNode>( | |
| &'b self, | |
| node: &N, | |
| parent_id: Option<MerkleHash>, | |
| ) -> Result<Self::NodeSession<'b>, Self::Error>; | |
| fn create_node<'b, N: TMerkleTreeNode>( | |
| &'b mut self, | |
| node: &N, | |
| parent_id: Option<MerkleHash>, | |
| ) -> Result<Self::NodeSession<'b>, Self::Error>; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/lib/src/model/merkle_tree/merkle_writer.rs` around lines 61 - 65, The
trait method create_node currently takes &'b self but must be changed to &'b mut
self to allow transaction-backed backends to open/commit write transactions;
update the signature of create_node<N: TMerkleTreeNode>(&'b mut self, node: &N,
parent_id: Option<MerkleHash>) -> Result<Self::NodeSession<'b>, Self::Error>,
and propagate this change to any trait definitions and where
NodeWriteSession::add_child(&mut self, ...) is expected so implementations
(e.g., any MerkleWriteSession impls or future heed-backed sessions) can mutably
borrow self to manage transaction state without relying on interior mutability;
adjust trait bounds/usages and tests accordingly.
6bf35c2 to
3fab13e
Compare
96ee80c to
ec54bc8
Compare
3fab13e to
3b3d207
Compare
ec54bc8 to
2fb9ae7
Compare
3b3d207 to
eefbbef
Compare
New interface traits: `MerkleReader`, `MerkleWriter`, and unifying `MerkleStore`. **`MerkleStore`** A `MerkleStore` is just a `MerkleReader` + `MerkleWriter` where their `Error` type is the same. Includes a blanket impl. for any type that has a reader & writer impl. with equal `Error` types. **`MerkleReader`** Can retrieve Merkle tree nodes from storage given a hash. Supports existence checks, retrieving the node, and retrieving a node's children, if applicable. **`MerkleWriter`** Produces a `MerkleWriteSession` that allows for writing arbitrary Merkle tree nodes to the underlying physical store. Within the write session, writing a node creates a new `NodeWriteSession` instance. This instance can write a node as well as a node's children. Both session types have `finish`, which ensures their writes go to the underlying physical store. These tiered write sessions match how the Oxen code writes Merkle tree nodes. Additionally, this design allows for implementations to be eager, buffer writes in- memory, or provide transaction semantics.
2fb9ae7 to
4168bb8
Compare
|
STACKED PR: Do not merge until #496 has been merged. |
New interface traits:
MerkleReader,MerkleWriter, and unifyingMerkleStore.MerkleStoreA
MerkleStoreis just aMerkleReader+MerkleWriterwhere theirErrortypeis the same. Includes a blanket impl. for any type that has a reader & writer impl.
with equal
Errortypes.MerkleReaderCan retrieve Merkle tree nodes from storage given a hash. Supports existence
checks, retrieving the node, and retrieving a node's children, if applicable.
MerkleWriterProduces a
MerkleWriteSessionthat allows for writing arbitrary Merkle tree nodesto the underlying physical store. Within the write session, writing a node creates a
new
NodeWriteSessioninstance. This instance can write a node as well as a node'schildren. Both session types have
finish, which ensures their writes go to theunderlying physical store.
These tiered write sessions match how the Oxen code writes Merkle tree nodes.
Additionally, this design allows for implementations to be eager, buffer writes in-
memory, or provide transaction semantics.