Reorder tree traversal during writing #531

Open
malcolmgreaves wants to merge 1 commit into mg/config_merkle_store from mg/reorder_merkle_write_ops

Conversation

@malcolmgreaves
Collaborator

@malcolmgreaves malcolmgreaves commented May 7, 2026

Reorders the tree traversal used when writing a new Merkle tree during
a commit, from depth-first to a layerwise depth-first approach. Before,
r_create_dir_node would start writing the node and children files
for a directory and keep their file handles open while it descended into each
child. It closed those handles only as it returned back up the call stack.
Since each recursive descent created a new node write session (for the
node being recursed on), a recursive call never modified the current stack
frame's session. Any modification to node or children files occurred only
within a single stack frame.

In the new approach, a directory's vnodes are staged in a Vec in the same
stack frame. They're directly written into the children file (and their
offsets stored in the node file) in one iteration. Any files in the directory
(via a vnode) are included in the node and children files in this same pass.
The dir node's write session is then finished, and the collected vnodes are
recursed into one at a time.
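
The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Dir`, `Session`, and `write_dir` are hypothetical stand-ins, vnodes are left out for brevity, and the "files" are recorded in an in-memory log rather than on disk. The point it tries to show is that a directory's session is finished before any recursion starts.

```rust
/// Hypothetical in-memory stand-in for a staged directory (not the Oxen types).
pub struct Dir {
    pub name: String,
    pub files: Vec<String>,
    pub subdirs: Vec<Dir>,
}

/// Hypothetical write session that buffers the child records of one node.
pub struct Session {
    node: String,
    children: Vec<String>,
}

impl Session {
    pub fn open(node: &str) -> Self {
        Session { node: node.to_string(), children: Vec::new() }
    }

    pub fn add_child(&mut self, child: &str) {
        self.children.push(child.to_string());
    }

    /// Consumes the session, so it cannot be touched after it is finished.
    pub fn finish(self, log: &mut Vec<String>) {
        log.push(format!("{} -> [{}]", self.node, self.children.join(", ")));
    }
}

/// Layerwise depth-first: record every child of `dir` in one pass, finish the
/// session, and only then recurse into the subdirectories staged in a Vec.
pub fn write_dir(dir: &Dir, log: &mut Vec<String>) {
    let mut session = Session::open(&dir.name);
    let mut staged: Vec<&Dir> = Vec::new();
    for file in &dir.files {
        session.add_child(file);
    }
    for sub in &dir.subdirs {
        session.add_child(&sub.name);
        staged.push(sub);
    }
    // The session is closed here, before any recursive call is made.
    session.finish(log);
    for sub in staged {
        write_dir(sub, log);
    }
}
```

Because `finish` takes `self` by value, the compiler rejects any attempt to keep writing to a directory's session once recursion has begun, which mirrors the invariant the description relies on.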

One performance benefit of this change is that there is only a constant number
of open file handles for the FileBackend: a node and children file per
call to r_create_dir_node. The file handles are no longer persisted through
recursive calls, meaning that the number of open file handles no longer grows
linearly with respect to the Merkle tree depth.
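
To make the handle-count claim concrete, here is a small model under stated assumptions: `HandleCounter`, `nested`, `phased`, and `peak_sessions` are hypothetical names, one "session" stands for the pair of node and children files, and the tree is simplified to a single chain of the given depth.

```rust
/// Tracks how many hypothetical write sessions are open at the same time.
#[derive(Default)]
pub struct HandleCounter {
    open: usize,
    peak: usize,
}

impl HandleCounter {
    fn open(&mut self) {
        self.open += 1;
        self.peak = self.peak.max(self.open);
    }

    fn close(&mut self) {
        self.open -= 1;
    }
}

/// Old ordering: the session stays open while descending into the child.
pub fn nested(depth: usize, counter: &mut HandleCounter) {
    if depth == 0 {
        return;
    }
    counter.open();
    nested(depth - 1, counter);
    counter.close();
}

/// New ordering: the session is finished before descending into the child.
pub fn phased(depth: usize, counter: &mut HandleCounter) {
    if depth == 0 {
        return;
    }
    counter.open();
    counter.close();
    phased(depth - 1, counter);
}

/// Runs one strategy on a chain of `depth` directories and reports the peak.
pub fn peak_sessions(strategy: fn(usize, &mut HandleCounter), depth: usize) -> usize {
    let mut counter = HandleCounter::default();
    strategy(depth, &mut counter);
    counter.peak
}
```

In this model `peak_sessions(nested, d)` equals `d`, while `peak_sessions(phased, d)` stays at 1 for any `d` greater than zero.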

Regression tests have been added to show that the node and children file
contents are byte-identical to what they were before this change.

@coderabbitai
Contributor

coderabbitai Bot commented May 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR refactors the merkle directory writer from a recursive pattern into a phased implementation. The caller now writes the root directory hash and invokes the refactored r_create_dir_node with explicit arguments. The new implementation handles empty directories, materializes vnodes, processes children (files and directories), and defers recursion into staged subdirectories until after vnode sessions complete. A regression test validates structural equivalence with the legacy implementation.

Changes

Merkle Directory Writer Phased Refactoring

| Layer / File(s) | Summary |
|---|---|
| Caller Update (crates/lib/src/repositories/commits/commit_writer.rs) | write_commit_entries writes root DirNode hash to dir_hash_db before calling r_create_dir_node with explicit parent_id, DirNode, and root_path parameters. |
| New Signature & Empty-Dir Early Return (crates/lib/src/repositories/commits/commit_writer.rs) | r_create_dir_node accepts parent_id and DirNode. Directories with no staged entries open a dir-node session, finish immediately to persist empty dirs, and return early. |
| Directory Session & VNode Materialization (crates/lib/src/repositories/commits/commit_writer.rs) | Directory node session is opened, EntryVNode objects are materialized into vnode children and added to the directory, and the session is finished before vnode subtree processing. |
| VNode Processing & Child Wiring (crates/lib/src/repositories/commits/commit_writer.rs) | For each vnode: open vnode session, compute or load subdirectory DirNodes (queuing staged ones for recursion and writing dir_hash_db), add file children with hashes and last_commit_id, finish vnode session. |
| Deferred Recursion into Staged Subdirs (crates/lib/src/repositories/commits/commit_writer.rs) | After all vnode sessions finish, iterate queued staged subdirectories and recursively call r_create_dir_node with the referencing vnode id as parent_id. |
| Regression Test: Pattern A vs Pattern C (crates/lib/src/repositories/commits/commit_writer.rs) | Preserves legacy recursive writer, runs both implementations on pinned inputs, snapshots deserialized merkle-store node triples, and asserts structural equality. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Oxen-AI/Oxen#411: Directly modifies the same commit_writer.rs logic including write_commit_entries, vnode handling, and dir_hash_db management.
  • Oxen-AI/Oxen#523: Introduces the API change to MerkleNodeDB::open_read_write accepting &Path directly, which this PR's refactor relies on.

Suggested reviewers

  • gschoeni
  • jcelliott
  • subygan

Poem

🐰 A writer once nested, now phased and refined,
VNodes dance in order, children find their line,
Empty dirs persist, recursion deferred with grace,
Legacy and new paths meet in test-land space!
Pattern A and C align—the merkle stores rejoice! 🌳

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Reorder tree traversal during writing' accurately summarizes the main change: shifting from depth-first to layerwise depth-first tree traversal in the Merkle tree writer. |
| Description check | ✅ Passed | The description clearly explains the traversal reordering, the performance implications regarding file handle management, and mentions regression tests to verify equivalence. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@malcolmgreaves malcolmgreaves force-pushed the mg/reorder_merkle_write_ops branch from bd9e34b to 154ba43 Compare May 7, 2026 05:50
@malcolmgreaves
Collaborator Author

NOTE: Stacked PR! #526 must be merged beforehand.

@malcolmgreaves malcolmgreaves force-pushed the mg/reorder_merkle_write_ops branch from 154ba43 to 7285f50 Compare May 7, 2026 06:03
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
crates/lib/src/repositories/commits/commit_writer.rs (1)

2176-2302: 💤 Low value

LGTM - Solid regression test with deterministic inputs.

The test correctly:

  • Pins the timestamp (line 2205) to ensure both patterns get the same commit_id
  • Uses empty existing_nodes to simulate an initial commit scenario
  • Validates both the hash key set and per-node structural equivalence

One optional enhancement: this test covers the initial commit case where existing_nodes is empty. A follow-up test exercising a subsequent commit (where existing_nodes contains prior directory nodes and some subdirs use the "look up old dir node" path at line 951) would strengthen coverage of the refactored code paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/repositories/commits/commit_writer.rs` around lines 2176 -
2302, Add a second test variant that mirrors
pattern_c_byte_equivalent_to_pattern_a but primes existing_nodes with prior
directory nodes to exercise the "look up old dir node" path: create a prior
commit (use create_commit_data, compute_commit_id, CommitNode::new and
CommitMerkleTree::dir_hash_db_path_from_commit_id), populate a HashMap<PathBuf,
MerkleTreeNode> named existing_nodes with the prior MerkleTreeNode entries for
relevant subdirs, then call split_into_vnodes(&repo, &dir_entries,
&existing_nodes, &new_commit) and run both write_commit_entries_pattern_a_legacy
and write_commit_entries comparisons as in the original test; ensure the
timestamp is pinned again so commit_id is deterministic and assert snapshot
equality exactly like the existing test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 07bf3888-41e2-4d6e-8b35-eb8c1b4144e7

📥 Commits

Reviewing files that changed from the base of the PR and between 154ba43 and 7285f50.

📒 Files selected for processing (1)
  • crates/lib/src/repositories/commits/commit_writer.rs

@malcolmgreaves malcolmgreaves force-pushed the mg/reorder_merkle_write_ops branch from 7285f50 to c55b4e4 Compare May 7, 2026 06:23
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/lib/src/repositories/commits/commit_writer.rs (1)

836-840: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove the legacy caller-side dir-hash invalidation.

Now that write_commit_entries invalidates from the normalized removed_children list, the old post-write cache_invalidate_dir_hash_db(dir_entries.values()) calls in commit_dir_entries_new and commit_dir_entries become dangerous. If staging emits a removed directory as a leaf name, those old calls can delete dir_0 instead of files/dir_0 and clobber an unrelated cache entry after the correct normalized delete already ran.

Suggested cleanup
@@
-    // Remove all the directories that are staged for removal
-    cache_invalidate_dir_hash_db(&dir_hash_db, dir_entries.values())?;
-
     Ok(node.to_commit())
@@
-    // Remove all the directories that are staged for removal
-    cache_invalidate_dir_hash_db(&dir_hash_db, dir_entries.values())?;
-
     Ok(node.to_commit())
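
The hazard described in this comment can be illustrated with a small model. This is a sketch under assumptions, not the project's code: `DirHashCache`, `normalize`, and `invalidate` are hypothetical names, and the cache is modeled as an in-memory map keyed by full relative path.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the dir-hash cache: full relative path -> hash.
pub type DirHashCache = BTreeMap<PathBuf, String>;

/// Resolves a staged entry name against the directory that contains it.
pub fn normalize(parent: &Path, name: &str) -> PathBuf {
    parent.join(name)
}

/// Removes `key` from the cache and reports whether an entry was deleted.
pub fn invalidate(cache: &mut DirHashCache, key: &Path) -> bool {
    cache.remove(key).is_some()
}
```

With entries for both `dir_0` and `files/dir_0` in the cache, invalidating `normalize(Path::new("files"), "dir_0")` removes only the intended entry. A second invalidation that passes the bare leaf name `dir_0` also succeeds, but it deletes the unrelated top-level entry, which is the clobbering this comment warns about.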
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/repositories/commits/commit_writer.rs` around lines 836 - 840,
Remove the legacy caller-side dir-hash invalidation that is now harmful: delete
the calls to cache_invalidate_dir_hash_db(...) that pass dir_entries.values()
from the commit_dir_entries_new and commit_dir_entries functions because
write_commit_entries already invalidates using the normalized removed_children
list; ensure no other code paths call cache_invalidate_dir_hash_db with raw dir
entry names (e.g., dir_entries.values()) so we only invalidate using the
normalized removed_children provided by write_commit_entries.
🧹 Nitpick comments (1)
crates/lib/src/repositories/commits/commit_writer.rs (1)

2279-2283: ⚡ Quick win

Wipe or assert dir_hash_db between the two runs.

Leaving the commit's dir-hash DB in place means run #2 can inherit stale keys from run #1, and this test never compares that DB anyway. A missed or mis-keyed str_val_db::put in the updated path would still pass here as long as the legacy run populated the key first.
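
A toy version of the leak, with all names hypothetical (`Db`, `legacy_writer`, `buggy_writer`, `key_present_after`) and a `HashMap` standing in for the on-disk dir-hash DB, shows why a wipe between runs matters:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the dir-hash DB shared by both runs.
pub type Db = HashMap<String, String>;

/// Models the legacy writer, which records the directory hash.
pub fn legacy_writer(db: &mut Db) {
    db.insert("files".to_string(), "h1".to_string());
}

/// Models a regressed writer that forgets to record the directory hash.
pub fn buggy_writer(_db: &mut Db) {}

/// Runs the legacy writer, optionally wipes the DB, runs `second`, and
/// reports whether the key is present afterwards.
pub fn key_present_after(second: fn(&mut Db), wipe_between: bool) -> bool {
    let mut db = Db::new();
    legacy_writer(&mut db);
    if wipe_between {
        db.clear();
    }
    second(&mut db);
    db.contains_key("files")
}
```

Without the wipe, `key_present_after(buggy_writer, false)` is `true`, so a check on the key would pass even though the second writer never wrote it. With the wipe, the same call returns `false` and the regression becomes visible.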

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/repositories/commits/commit_writer.rs` around lines 2279 -
2283, The wipe_tree_nodes function currently only removes .oxen/tree/nodes but
leaves the dir_hash_db in place; change wipe_tree_nodes (or add a companion
helper invoked between runs) to either delete/clear the commit's dir_hash_db
(the on-disk DB under .oxen/tree/dir_hash_db) or assert that it is empty before
the second run so keys from the first run cannot leak; reference the dir_hash_db
instance used by the code that calls str_val_db::put and ensure you call the
DB's clear/remove operation (or perform an explicit emptiness check/assert) on
the same LocalRepository dir_hash_db used by the commit writer.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3a43757a-a676-41f3-bc89-1494f2e6031b

📥 Commits

Reviewing files that changed from the base of the PR and between 7285f50 and c55b4e4.

📒 Files selected for processing (1)
  • crates/lib/src/repositories/commits/commit_writer.rs

Comment on lines +2232 to +2266
    for entry in walkdir::WalkDir::new(&nodes_dir)
        .follow_links(false)
        .min_depth(2)
        .max_depth(2)
        .into_iter()
        .filter_map(Result::ok)
    {
        let path = entry.path();
        let Ok(meta) = path.metadata() else { continue };
        if !meta.is_dir() {
            continue;
        }
        let rel = match path.strip_prefix(&nodes_dir) {
            Ok(p) => p,
            Err(_) => continue,
        };
        let components: Vec<&str> = rel
            .components()
            .filter_map(|c| match c {
                std::path::Component::Normal(s) => s.to_str(),
                _ => None,
            })
            .collect();
        if components.len() != 2 {
            continue;
        }
        let hex = format!("{}{}", components[0], components[1]);
        let Ok(hash_value) = u128::from_str_radix(&hex, 16) else {
            continue;
        };
        let hash = MerkleHash::new(hash_value);

        let Some(record) = store.get_node(&hash)? else {
            continue;
        };
Contributor


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fail the snapshot on unreadable node directories.

This helper currently drops walk errors and skips hashes that cannot be reopened from the store. That can hide exactly the partial-write or malformed-node regressions this test is supposed to catch; a directory under tree/nodes/<prefix>/<suffix> should be treated as required, not optional.
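
One possible shape for the stricter behaviour, shown only for the path-to-hash step and using the standard library alone: `parse_node_hash` is a hypothetical helper, and the `String` error type is a placeholder for whatever error type the crate actually uses.

```rust
use std::path::{Component, Path};

/// Parses a `<prefix>/<suffix>` node directory path into a 128-bit hash value.
/// Every malformed input becomes an error instead of being skipped silently.
pub fn parse_node_hash(rel: &Path) -> Result<u128, String> {
    let parts = rel
        .components()
        .map(|c| match c {
            Component::Normal(s) => s
                .to_str()
                .ok_or_else(|| format!("non-UTF-8 component in {}", rel.display())),
            other => Err(format!(
                "unexpected component {:?} in {}",
                other,
                rel.display()
            )),
        })
        .collect::<Result<Vec<&str>, String>>()?;

    if parts.len() != 2 {
        return Err(format!(
            "expected 2 path components, found {} in {}",
            parts.len(),
            rel.display()
        ));
    }

    let hex = format!("{}{}", parts[0], parts[1]);
    u128::from_str_radix(&hex, 16).map_err(|e| format!("bad hex {:?}: {}", hex, e))
}
```

A caller in the snapshot helper could then use `?` on the result, so that a stray or corrupt directory fails the test rather than dropping out of the comparison.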

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/repositories/commits/commit_writer.rs` around lines 2232 -
2266, The loop over walkdir currently swallows errors and skips problematic node
dirs (using filter_map(Result::ok) and many `continue`s), which hides
partial-write/malformed-node failures; update the code in commit_writer.rs (the
WalkDir iteration and the processing of each entry, e.g. where nodes_dir,
MerkleHash::new, and store.get_node are used) to propagate errors instead of
continuing: stop using filter_map(Result::ok) and handle Err from the iterator
by returning an Err, and replace the `continue` branches for metadata failure,
strip_prefix failure, unexpected component count, hex parse failure, and missing
store.get_node (None) with appropriate Err returns (with context) so the
snapshot fails on unreadable or missing node directories.

@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 5c4ad32 to 9e10a92 Compare May 7, 2026 19:25
@malcolmgreaves malcolmgreaves force-pushed the mg/reorder_merkle_write_ops branch 3 times, most recently from f22e66f to 8d71608 Compare May 7, 2026 19:27
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch 3 times, most recently from fd61e09 to 64e1254 Compare May 7, 2026 21:10
@malcolmgreaves malcolmgreaves force-pushed the mg/reorder_merkle_write_ops branch from 8d71608 to d2b7fde Compare May 7, 2026 21:14
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 64e1254 to 4886e86 Compare May 7, 2026 21:43
@malcolmgreaves malcolmgreaves force-pushed the mg/reorder_merkle_write_ops branch from d2b7fde to 3feb38a Compare May 7, 2026 21:46
@malcolmgreaves malcolmgreaves force-pushed the mg/config_merkle_store branch from 4886e86 to 3d3b1fd Compare May 7, 2026 23:39
@malcolmgreaves malcolmgreaves force-pushed the mg/reorder_merkle_write_ops branch from 3feb38a to 7dc9c8b Compare May 8, 2026 00:00
@malcolmgreaves malcolmgreaves force-pushed the mg/reorder_merkle_write_ops branch from 7dc9c8b to d1ae637 Compare May 8, 2026 00:16