Refactor oxen's Merkle tree operations: use MerkleStore#513
Refactor oxen's Merkle tree operations: use MerkleStore#513malcolmgreaves wants to merge 1 commit intomg/merkle_dyn_file_backendfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughMigrate Merkle node I/O from per-node MerkleNodeDB read/write to a session-based MerkleStore / MerkleWriteSession model across repository accessors, node loading, tree/commit writers, metadata traversal, and a migration, with explicit per-namespace ChangesSession-backed Merkle write migration
Sequence Diagram(s)sequenceDiagram
participant Migrator as Migrator
participant OldRepo as OldRepo
participant NewRepo as NewRepo
participant OldSession as OldSession
participant NewSession as NewSession
Migrator->>OldRepo: merkle_store().begin()
Migrator->>NewRepo: merkle_store().begin()
OldRepo->>OldSession: create_node(commit_node)
OldSession->>OldSession: add_child(root_dir)
OldSession->>OldSession: finish()
Migrator->>OldSession: rewrite_nodes(dir...)
Migrator->>NewSession: rewrite_nodes(...)
note right of Migrator: rewrite_nodes creates per-node namespaces
Migrator->>NewSession: create_node(vnode)
NewSession->>NewSession: add_child(file/dir)
NewSession->>NewSession: finish()
OldSession->>OldSession: create_node(dir)
OldSession->>OldSession: add_child(vnode)
OldSession->>OldSession: finish()
Migrator->>OldSession: finish()
Migrator->>NewSession: finish()
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
3af1140 to
291ef08
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
crates/lib/src/core/v_latest/index/commit_merkle_tree.rs (1)
208-208: 🏗️ Heavy liftAvoid rebuilding the Merkle store on every existence check.
These guards are on the tree-read hot path, and
repo.merkle_store()allocates a new boxed backend each time. On large traversals this adds a lot of tiny allocations for no functional gain. Construct the store once per top-level read and thread a&dyn MerkleReader/&dyn MerkleStorethrough the helper chain instead.Also applies to: 233-233, 259-259, 283-283, 315-315, 344-344, 376-376
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/core/v_latest/index/commit_merkle_tree.rs` at line 208, The code repeatedly calls repo.merkle_store() (e.g., around the existence checks using exists(hash) in commit_merkle_tree.rs), causing per-check allocation; instead, obtain the Merkle store once at the start of the top-level read/traversal and pass a borrowed trait object through the helper chain (use &dyn MerkleReader or &dyn MerkleStore). Modify the public/top-level traversal/read entry function to accept an extra parameter like merkle: &dyn MerkleReader (or construct it once there via let merkle = repo.merkle_store()?; let merkle_ref: &dyn MerkleReader = &*merkle;) and update downstream helpers to take that borrowed trait reference rather than calling repo.merkle_store() themselves; replace all local repo.merkle_store().exists(...) usages with merkle_ref.exists(...) (and similarly for other Merkle calls) so no boxed backend is reallocated on the hot path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs`:
- Around line 197-200: The migrated directory is being created with the wrong
parent because old_session.create_node(&dir, node.parent_id) uses the original
node's parent; change the parent argument to the current node's identifier (the
node hash) so the rewritten directory is nested under the current node.
Specifically, in the block that builds DirNode (dir.get_opts(), DirNode::new,
dir_node_opts.num_entries) replace the second parameter to
old_session.create_node with the current node's id/hash (e.g., node.id or
node.hash depending on the struct field) instead of node.parent_id.
In `@crates/lib/src/repositories/commits/commit_writer.rs`:
- Around line 804-805: The code updates dir_hash_db (put/delete) while building
nodes using session: &dyn MerkleWriteSession and commit_ns: &mut dyn
NodeWriteSession, which can leave the cache pointing at uncommitted hashes if
later create_node/session.finish() fails; change the logic to buffer all
dir_hash_db mutations (in the block around the functions that perform node
creation and the code referenced in lines 822-834) and only apply (flush) those
buffered put/delete operations after session.finish() returns Ok(), or
alternatively perform the dir_hash_db updates inside the same transactional
write path so they are rolled back if session.finish() fails; ensure you
reference and guard around session.finish() and create_node/commit_ns usages so
no dir_hash_db mutation happens before successful finish().
- Around line 509-511: This branch creates the node namespace with
session.create_node(&node, None) which differs from
commit_dir_entries_with_parents and commit_dir_entries_new; change it to pass
the previous commit id used by those paths (the same parent-id/previous-commit
value they supply) instead of None so the created commit_ns uses the same
namespace semantics; locate the call in commit_dir_entries (around where
repo.merkle_store(), store.begin()?, and let mut commit_ns =
session.create_node(&node, None)? are defined) and replace the None with the
variable or expression representing the prior commit hash used by the other
implementations.
In `@crates/lib/src/repositories/tree.rs`:
- Around line 1068-1070: In write_tree (crates/lib/src/repositories/tree.rs) the
match arm handling an unexpected node currently panics; change it to return an
Err instead: replace the panic!("p_write_tree Unexpected node type: {node:?}")
with return Err(OxenError::DisallowedNodeWrite(node.clone())); so the
Result-returning API fails gracefully on malformed/unsupported node variants
(ensure node is cloned if the error type requires ownership).
---
Nitpick comments:
In `@crates/lib/src/core/v_latest/index/commit_merkle_tree.rs`:
- Line 208: The code repeatedly calls repo.merkle_store() (e.g., around the
existence checks using exists(hash) in commit_merkle_tree.rs), causing per-check
allocation; instead, obtain the Merkle store once at the start of the top-level
read/traversal and pass a borrowed trait object through the helper chain (use
&dyn MerkleReader or &dyn MerkleStore). Modify the public/top-level
traversal/read entry function to accept an extra parameter like merkle: &dyn
MerkleReader (or construct it once there via let merkle = repo.merkle_store()?;
let merkle_ref: &dyn MerkleReader = &*merkle;) and update downstream helpers to
take that borrowed trait reference rather than calling repo.merkle_store()
themselves; replace all local repo.merkle_store().exists(...) usages with
merkle_ref.exists(...) (and similarly for other Merkle calls) so no boxed
backend is reallocated on the hot path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e2173166-7ac4-4dfe-9b8e-b9a7fad7e26a
📒 Files selected for processing (8)
crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/model/repository/local_repository.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs (1)
42-47: 🏗️ Heavy liftReuse a borrowed store across the read pipeline.
These helpers are on the inner tree-loading path, and each call now rebuilds a boxed
MerkleStorebefore reading a single node. On large traversals that adds O(nodes) extra allocation/dyn-dispatch overhead on top of the storage I/O. Thread one borrowed reader/store through the load pipeline instead of recreating it per node.Also applies to: 80-88
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs` around lines 42 - 47, The helper from_hash_uncached (and the similar helpers at lines ~80-88) currently calls repo.merkle_store() inside each node read, causing repeated boxing/allocation; change their signatures to accept a borrowed store/reader (e.g., &impl MerkleStore or &dyn MerkleStore) as an extra parameter and thread that borrowed reference through the tree-loading call chain so you call store.get_node(hash)? once per node without recreating the boxed MerkleStore; update all callers in the load pipeline to pass the same borrowed store instead of calling repo.merkle_store() repeatedly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs`:
- Around line 42-47: The helper from_hash_uncached (and the similar helpers at
lines ~80-88) currently calls repo.merkle_store() inside each node read, causing
repeated boxing/allocation; change their signatures to accept a borrowed
store/reader (e.g., &impl MerkleStore or &dyn MerkleStore) as an extra parameter
and thread that borrowed reference through the tree-loading call chain so you
call store.get_node(hash)? once per node without recreating the boxed
MerkleStore; update all callers in the load pipeline to pass the same borrowed
store instead of calling repo.merkle_store() repeatedly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7ff56166-f113-45ae-842c-662826d31d18
📒 Files selected for processing (8)
crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/model/repository/local_repository.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rs
✅ Files skipped from review due to trivial changes (1)
- crates/lib/src/core/v_latest/commits.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/lib/src/core/v_latest/entries.rs
9ed02e2 to
67ae137
Compare
291ef08 to
844e985
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/lib/src/repositories/commits/commit_writer.rs`:
- Around line 933-940: The code in r_create_dir_node silently continues when
CommitMerkleTree::read_node(repo, node.hash(), false) returns None, dropping
that subtree; instead, treat a missing old directory as a fatal invariant breach
and return an error to abort the commit: replace the log+continue branch with
constructing and returning an appropriate error (e.g., Err(...) or a CommitError
variant) that includes the node.hash() and context so the caller fails the
commit rather than producing a partial tree, ensuring the failure propagates out
of r_create_dir_node (or use the ? operator if converting option to Result).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 59986dce-55e0-4d56-9249-9dd00c775dec
📒 Files selected for processing (8)
crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/model/repository/local_repository.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rs
✅ Files skipped from review due to trivial changes (2)
- crates/lib/src/model/repository/local_repository.rs
- crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- crates/lib/src/core/v_latest/commits.rs
- crates/lib/src/core/v_latest/entries.rs
- crates/lib/src/core/v_latest/index/commit_merkle_tree.rs
e44ba49 to
c3bbb8d
Compare
844e985 to
460483e
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/lib/src/repositories/commits/commit_writer.rs`:
- Around line 814-834: The code calls commit_ns.add_child(&dir_node) (and
similarly parent_ns.add_child(&vnode_obj) and vnode_ns.add_child(&dir_node>)
before the child's session (created via session.create_node / r_create_dir_node)
is finished, risking parents pointing at partially-written children; move the
add_child(...) calls so they occur after the child's node session is finished
(i.e., after dir_ns.finish() / vnode_ns.finish()), ensuring you create the child
with session.create_node, run r_create_dir_node, call finish() on the child's
namespace (dir_ns or vnode_ns), then add_child(...) on the parent (commit_ns,
parent_ns, or vnode_ns) and only after that update str_val_db or total_written
as needed; apply the same change pattern to the other occurrence referenced (the
vnode/parent blocks).
In `@crates/lib/src/repositories/tree.rs`:
- Around line 1057-1063: The parent is adding child entries (ns.add_child)
before the recursive write (p_write_tree) completes, which can leave parent
referencing unfinished subtrees if the recursion errors; change the order in the
EMerkleTreeNode match arms handling VNode and Directory so you call
p_write_tree(session, child, vnode/dir_node) first and only call
ns.add_child(vnode/dir_node)? after that recursive call returns successfully;
ensure the same adjustment is applied for both the VNode and Directory branches
and preserve error propagation so NodeWriteSession.finish() is only reached for
fully-written children.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7b14c970-bbc1-4939-8aef-4b853374dada
📒 Files selected for processing (8)
crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rscrates/lib/src/core/v_latest/commits.rscrates/lib/src/core/v_latest/entries.rscrates/lib/src/core/v_latest/index/commit_merkle_tree.rscrates/lib/src/model/merkle_tree/node/merkle_tree_node.rscrates/lib/src/model/repository/local_repository.rscrates/lib/src/repositories/commits/commit_writer.rscrates/lib/src/repositories/tree.rs
✅ Files skipped from review due to trivial changes (5)
- crates/lib/src/model/repository/local_repository.rs
- crates/lib/src/core/v_latest/commits.rs
- crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
- crates/lib/src/command/migrate/m20250111083535_add_child_counts_to_nodes.rs
- crates/lib/src/core/v_latest/entries.rs
460483e to
8171117
Compare
1fd6e68 to
ee3ce04
Compare
8171117 to
51ab345
Compare
ee3ce04 to
df2dd27
Compare
fa52ee2 to
f9502a5
Compare
3454bd6 to
c5ee136
Compare
f9502a5 to
7ecfb7f
Compare
|
NOTE: Stacked PR! Must merge #512 before merging. |
c5ee136 to
96f735c
Compare
7ecfb7f to
b36f043
Compare
96f735c to
814cb8f
Compare
30c2b85 to
4f65285
Compare
814cb8f to
5babb48
Compare
4f65285 to
3239a5d
Compare
5babb48 to
466df7f
Compare
5070203 to
ba031b3
Compare
466df7f to
07c1e4e
Compare
ba031b3 to
ff359e0
Compare
07c1e4e to
6c2c648
Compare
**Refactor**
Change all reading & writing of Merkle tree nodes to use the new traits:
- Use `MerkleReader::exists` calls in `commit_merkle_tree.rs`
- Use `MerkleReader::get_{node,children}` in `merkle_tree_node.rs`
- Use `MerkleWriter`'s write sessions in:
+ `create_empty_commit` & `create_initial_commit` (`commits.rs`)
+ `update_metadata` (`entries.rs`)
+ `commit_dir_entires`, `write_commit_entires`, `r_create_dir_node` (`commit_writer.rs`)
+ `run_on_commit`, `rewrite_nodes` (migration `..add_child_counts_to_nodes.rs`)
+ `write_tree`, `p_write_tree` (`tree.rs`)
Access to a `MerkleStore` for these operations is provided via the `LocalRepository`.
**LocalRepository**
Adds method `LocalRepository::merkle_store() -> Box<dyn MerkleStore + '_>`. Provides the
repository's Merkle tree node storage implementation. The return type is dynamic: it
allows for selecting a compatible `MerkleStore` at runtime.
**`OxenError` Updates**
Converts 2 string errors into:
- `MerkleNodeNotFound(HexHash)`: no such commit/dir/vnode in lookup
- `DisallowedNodeWrite(std::any::type_name_of_val)`
ff359e0 to
d152216
Compare
Refactor
Change all reading & writing of Merkle tree nodes to use the new traits:
MerkleReader::existscalls incommit_merkle_tree.rsMerkleReader::get_{node,children}inmerkle_tree_node.rsMerkleWriter's write sessions in:create_empty_commit&create_initial_commit(commits.rs)update_metadata(entries.rs)commit_dir_entires,write_commit_entires,r_create_dir_node(commit_writer.rs)run_on_commit,rewrite_nodes(migration..add_child_counts_to_nodes.rs)write_tree,p_write_tree(tree.rs)Access to a
MerkleStorefor these operations is provided via theLocalRepository.LocalRepository
Adds method
LocalRepository::merkle_store() -> Box<dyn MerkleStore + '_>. Provides therepository's Merkle tree node storage implementation. The return type is dynamic: it
allows for selecting a compatible
MerkleStoreat runtime.OxenErrorUpdatesConverts 2 string errors into:
MerkleNodeNotFound(HexHash): no such commit/dir/vnode in lookupDisallowedNodeWrite(std::any::type_name_of_val)