Refactor oxen's Merkle tree operations: use MerkleStore#501
Refactor oxen's Merkle tree operations: use MerkleStore#501malcolmgreaves wants to merge 1 commit into
Conversation
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR replaces direct Changes
Sequence DiagramsequenceDiagram
participant Caller
participant Session as MerkleWriteSession
participant NodeSession as NodeWriteSession
participant Backend as FileBackend
Caller->>Session: begin()
Note over Caller,Session: Replaces MerkleNodeDB::open_read_write()
Caller->>Session: create_node(node, parent_id)
Session->>NodeSession: node namespace created (NodeWriteSession)
Caller->>NodeSession: add_child(child_ref)
NodeSession->>Backend: persist child reference
alt recurse into child dir
Caller->>Session: create_node(child_node, parent_id)
Session->>NodeSession: child namespace created
Caller->>NodeSession: add_child(...)
Caller->>NodeSession: finish()
end
Caller->>NodeSession: finish()
Caller->>Session: finish()
Session->>Backend: flush/persist all changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 3/5 reviews remaining, refill in 16 minutes and 55 seconds. Comment |
|
STACKED PR: Do not merge until #498 has been merged. |
**Refactor**
Change all reading & writing of Merkle tree nodes to use the new traits:
- Use `MerkleReader::exists` calls in `commit_merkle_tree.rs`
- Use `MerkleReader::get_{node,children}` in `merkle_tree_node.rs`
- Use `MerkleWriter`'s write sessions in:
+ `create_empty_commit` & `create_initial_commit` (`commits.rs`)
+ `update_metadata` (`entries.rs`)
+ `commit_dir_entires`, `write_commit_entires`, `r_create_dir_node` (`commit_writer.rs`)
+ `run_on_commit`, `rewrite_nodes` (migration `..add_child_counts_to_nodes.rs`)
+ `write_tree`, `p_write_tree` (`tree.rs`)
Access to a `MerkleStore` for these operations is provided via the `LocalRepository`.
**LocalRepository**
Adds method `LocalRepository::merkle_store() -> impl MerkleStore + '_`. Provides the
repository's Merkle tree node storage implementation. The return type is opaque. It is
implemented using a new enum-based dispatch wrapper: a concrete, but private type
implemented by the new `define_merkle_store_dispatch!` macro. Dynamic loading of
different implementations based upon the on-disk format of the Merkle tree is ready to
be implemented.
**`define_merkle_store_dispatch!`**
A macro that creates `merkle_store_dispatch::StoreEnum`, a private type that only
`local_repository.rs` can use. The macro takes a mapping: a type that implements
the `MerkleReader` and `MerkleWriter` traits + the impl's `Error` type to a unique
enum variant name. The macro generates a new wrapper that uses `match` to dispatch
trait method calls to the active `MerkleStore` type. The macro also generates wrappers
for the `MerkleWriteSession` and `NodeWriteSession` traits for each unique variant.
There's also a `StoreError` enum wrapper arround the `Error` types with an
`IntoOxenError` impl. All of these enums use the same dispatch mechanisms.
396ece5 to
70ead12
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs`:
- Line 202: The call to old_session.create_node currently passes node.parent_id
which sets the directory's parent to the grandparent; change it to use the
current node's hash so the directory's parent is the immediate node — i.e. pass
Some(node.hash) (or equivalent) as the parent_id when calling
old_session.create_node(&dir, ...) so dir records get the correct parent_id;
update the invocation that currently uses node.parent_id to use node.hash
instead.
In `@crates/lib/src/repositories/commits/commit_writer.rs`:
- Around line 511-513: The new commit node is being created with None for parent
linkage (session.create_node(&node, None)), causing missing parent metadata when
callers use commit_dir_entries; update the call to propagate the head/previous
commit hash used elsewhere (as threaded through commit_dir_entries_new and
commit_dir_entries_with_parents) instead of None — locate the variable holding
the previous commit hash in this scope (the same value passed into create_node
in commit_dir_entries_new/commit_dir_entries_with_parents) and pass
Some(previous_commit_hash) (or the appropriate Option-wrapped identifier) into
session.create_node(&node, ...).
In `@crates/lib/src/repositories/tree.rs`:
- Around line 1070-1072: In p_write_tree, replace the panic! on encountering an
unexpected node (the match arm labeled "node =>") with returning a proper
OxenError; specifically return Err(OxenError::DisallowedNodeWrite(node.clone()))
(or construct the appropriate OxenError variant) so the function's Result path
handles the error instead of aborting the process; update any required
use/imports and ensure the match arm signature and types align with
p_write_tree's return type.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 12457e34-a6b6-4d4f-909b-c212393fb662
📒 Files selected for processing (9)
crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/model/repository/local_repository.rscrates/lib/src/model/repository/local_repository/merkle_store_dispatch.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rs
| // dir, | ||
| // vnode_size | ||
| // ); | ||
| let mut dir_ns = old_session.create_node(&dir, node.parent_id)?; |
There was a problem hiding this comment.
Use the current node's hash as the directory parent ID.
Line 202 passes node.parent_id when creating a rewritten directory node. That stores the grandparent's hash instead of the immediate parent, so migrated directory records get incorrect parent_id values. For example, the root dir under a commit ends up pointing at the previous commit instead of the current commit node. This should be Some(node.hash).
🛠️ Proposed fix
- let mut dir_ns = old_session.create_node(&dir, node.parent_id)?;
+ let mut dir_ns = old_session.create_node(&dir, Some(node.hash))?;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let mut dir_ns = old_session.create_node(&dir, node.parent_id)?; | |
| let mut dir_ns = old_session.create_node(&dir, Some(node.hash))?; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs`
at line 202, The call to old_session.create_node currently passes node.parent_id
which sets the directory's parent to the grandparent; change it to use the
current node's hash so the directory's parent is the immediate node — i.e. pass
Some(node.hash) (or equivalent) as the parent_id when calling
old_session.create_node(&dir, ...) so dir records get the correct parent_id;
update the invocation that currently uses node.parent_id to use node.hash
instead.
| node => { | ||
| // TODO: change this to `return Err(OxenError::DisallowedNodeWrite(node.clone()));` | ||
| panic!("p_write_tree Unexpected node type: {node:?}"); |
There was a problem hiding this comment.
Return an OxenError here instead of panicking.
p_write_tree already returns Result, so an unexpected node type should fail the write cleanly rather than crash the process. This is especially risky in library code handling persisted tree data.
Suggested fix
node => {
- // TODO: change this to `return Err(OxenError::DisallowedNodeWrite(node.clone()));`
- panic!("p_write_tree Unexpected node type: {node:?}");
+ return Err(OxenError::basic_str(format!(
+ "p_write_tree unexpected node type: {node:?}"
+ )));
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/lib/src/repositories/tree.rs` around lines 1070 - 1072, In
p_write_tree, replace the panic! on encountering an unexpected node (the match
arm labeled "node =>") with returning a proper OxenError; specifically return
Err(OxenError::DisallowedNodeWrite(node.clone())) (or construct the appropriate
OxenError variant) so the function's Result path handles the error instead of
aborting the process; update any required use/imports and ensure the match arm
signature and types align with p_write_tree's return type.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
crates/lib/src/repositories/tree.rs (1)
1070-1072:⚠️ Potential issue | 🟠 MajorReturn an error here instead of panicking.
p_write_treealready returnsResult, so this should fail cleanly instead of aborting the process. With the new shared-session write path, that panic can also stop the write mid-flight.🛠️ Suggested fix
node => { - // TODO: change this to `return Err(OxenError::DisallowedNodeWrite(node.clone()));` - panic!("p_write_tree Unexpected node type: {node:?}"); + return Err(OxenError::basic_str(format!( + "p_write_tree unexpected node type: {node:?}" + ))); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/repositories/tree.rs` around lines 1070 - 1072, Replace the panic in p_write_tree with returning an Err: construct and return Err(OxenError::DisallowedNodeWrite(node.clone())) (or wrap it in the appropriate Result::Err variant used by p_write_tree) instead of calling panic!, ensuring the function signature and error type are used consistently where node => { ... } currently panics; this lets callers handle the error rather than aborting the process.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/core/v_latest/commits.rs`:
- Around line 371-377: The initial commit session only creates the commit node
but never persists the root directory node, so the root hash stored in
dir_hash_db has no backing node; update the sequence in the commit routine to
call session.create_node(&dir_node, None)? (or the appropriate create_node
overload) to create and finish the dir_node before calling
commit_ns.add_child(&dir_node) and finishing the commit; ensure you still call
commit_ns.finish()? and session.finish()? so the root directory node is actually
written and the stored root hash resolves on a fresh repo (refer to symbols
commit_ns, dir_node, session, commit_node and dir_hash_db).
In `@crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs`:
- Around line 43-48: Replace the current error-upgrading behavior in
from_hash_uncached so missing nodes stay as Option instead of Err: add a new
maybe_from_hash (or change from_hash_uncached signature) that calls
repo.merkle_store().get_node(hash)? and returns Ok(None) when get_node() yields
None, or Ok(Some(MerkleTreeNode::from_record(record))) when present; update
callers like CommitMerkleTree::read_depth* to use maybe_from_hash to avoid
double I/O (exists() + get_node()). Ensure the new function preserves the
existing OxenError propagation for underlying store errors but does not wrap a
missing node into an OxenError.
---
Duplicate comments:
In `@crates/lib/src/repositories/tree.rs`:
- Around line 1070-1072: Replace the panic in p_write_tree with returning an
Err: construct and return Err(OxenError::DisallowedNodeWrite(node.clone())) (or
wrap it in the appropriate Result::Err variant used by p_write_tree) instead of
calling panic!, ensuring the function signature and error type are used
consistently where node => { ... } currently panics; this lets callers handle
the error rather than aborting the process.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d0458663-9d60-4814-b01d-c191bc7c0c27
📒 Files selected for processing (9)
crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/model/repository/local_repository.rscrates/lib/src/model/repository/local_repository/merkle_store_dispatch.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rs
✅ Files skipped from review due to trivial changes (1)
- crates/lib/src/repositories/commits/commit_writer.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
| fn from_hash_uncached(repo: &LocalRepository, hash: &MerkleHash) -> Result<Self, OxenError> { | ||
| let node_db = MerkleNodeDB::open_read_only(repo, hash)?; | ||
| let parent_id = node_db.parent_id; | ||
| let record = repo | ||
| .merkle_store() | ||
| .get_node(hash)? | ||
| // TODO: use a structured error variant | ||
| .ok_or_else(|| OxenError::basic_str(format!("Merkle node not found: {hash}")))?; |
There was a problem hiding this comment.
Avoid turning a missing node into an error on the hot path.
get_node() already gives you Option, but this wrapper upgrades None to Err. That forces callers like CommitMerkleTree::read_depth* to preflight with exists(), so tree loads now hit the store twice on every cache miss (exists() + get_node()). A maybe_from_hash path that preserves the Option semantics would avoid the extra I/O.
♻️ Possible shape
-fn from_hash_uncached(repo: &LocalRepository, hash: &MerkleHash) -> Result<Self, OxenError> {
- let record = repo
- .merkle_store()
- .get_node(hash)?
- .ok_or_else(|| OxenError::basic_str(format!("Merkle node not found: {hash}")))?;
+fn maybe_from_hash_uncached(
+ repo: &LocalRepository,
+ hash: &MerkleHash,
+) -> Result<Option<Self>, OxenError> {
+ let Some(record) = repo.merkle_store().get_node(hash)? else {
+ return Ok(None);
+ };
let parent_id = record.parent_id().copied();
- Ok(MerkleTreeNode {
+ Ok(Some(MerkleTreeNode {
hash: *hash,
node: record.into_node(),
parent_id,
children: Vec::new(),
- })
+ }))
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs` around lines 43 -
48, Replace the current error-upgrading behavior in from_hash_uncached so
missing nodes stay as Option instead of Err: add a new maybe_from_hash (or
change from_hash_uncached signature) that calls
repo.merkle_store().get_node(hash)? and returns Ok(None) when get_node() yields
None, or Ok(Some(MerkleTreeNode::from_record(record))) when present; update
callers like CommitMerkleTree::read_depth* to use maybe_from_hash to avoid
double I/O (exists() + get_node()). Ensure the new function preserves the
existing OxenError propagation for underlying store errors but does not wrap a
missing node into an OxenError.
Refactor
Change all reading & writing of Merkle tree nodes to use the new traits:
MerkleReader::existscalls incommit_merkle_tree.rsMerkleReader::get_{node,children}inmerkle_tree_node.rsMerkleWriter's write sessions in:create_empty_commit&create_initial_commit(commits.rs)update_metadata(entries.rs)commit_dir_entires,write_commit_entires,r_create_dir_node(commit_writer.rs)run_on_commit,rewrite_nodes(migration..add_child_counts_to_nodes.rs)write_tree,p_write_tree(tree.rs)Access to a
MerkleStorefor these operations is provided via theLocalRepository.LocalRepository
Adds method
LocalRepository::merkle_store() -> Box<dyn MerkleStore + '_>. Provides therepository's Merkle tree node storage implementation. The return type is dynamic: it
allows for selecting a compatible
MerkleStoreat runtime.