
Implemented Merkle{Packer,Unpacker} for FileBackend#504

Closed
malcolmgreaves wants to merge 1 commit into mg/merkle_packing_interfaces from mg/merkle_pack_impls

Conversation

Collaborator

@malcolmgreaves malcolmgreaves commented Apr 29, 2026

MerkleTransport
Implemented the new MerkleTransport traits for packing up
and unpacking Merkle tree nodes as they're sent on the wire
from the oxen client to server. Preserves the existing tar-gz
formats specific to the FileBackend's physical store layout.

Supports the two unique tar-gz paths:

  • upload: only captures {prefix}/{suffix}/{node,children}
  • everything else: tree/nodes/{prefix}/{suffix}/{node,children}
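The two layouts differ only in whether the `tree/nodes/` prefix is already present in the entry path. A minimal sketch of the resolution rule, using only the standard library (the helper name `resolve_entry_path` is illustrative, not the PR's actual API):

```rust
use std::path::{Path, PathBuf};

/// Map a tar entry path to its destination under the repo's hidden dir.
/// Server-style entries already carry `tree/nodes/`; legacy upload
/// entries start at `{prefix}/{suffix}/...` and need it prepended.
fn resolve_entry_path(oxen_hidden: &Path, entry: &Path) -> PathBuf {
    let tree_nodes_prefix = Path::new("tree").join("nodes");
    if entry.starts_with(&tree_nodes_prefix) {
        oxen_hidden.join(entry)
    } else {
        oxen_hidden.join(&tree_nodes_prefix).join(entry)
    }
}

fn main() {
    let hidden = Path::new(".oxen");
    // Both layouts resolve to the same on-disk location.
    println!("{}", resolve_entry_path(hidden, Path::new("ab/cdef/node")).display());
    println!("{}", resolve_entry_path(hidden, Path::new("tree/nodes/ab/cdef/node")).display());
}
```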

Progress Bar in Bytes
Changes the progress bar for uploads & downloads to use bytes/s
instead of nodes/s. Nodes are not evenly sized, so progress will
look like it stops when handling a large file. This now hooks
into the reader & writer types that the trait uses.
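One way to hook byte counting into a reader can be sketched like this; `CountingReader` is a hypothetical stand-in (std-only) for the reader/writer hooks the PR wires into the progress bar:

```rust
use std::io::{self, Read};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Wrap any reader and tally bytes as they pass through, so a progress
/// bar can tick in bytes rather than nodes.
struct CountingReader<R> {
    inner: R,
    bytes: Arc<AtomicU64>,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.bytes.fetch_add(n as u64, Ordering::Relaxed);
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    let counter = Arc::new(AtomicU64::new(0));
    let payload: &[u8] = b"hello merkle"; // stands in for a tar-gz stream
    let mut reader = CountingReader { inner: payload, bytes: Arc::clone(&counter) };
    let mut sink = Vec::new();
    reader.read_to_end(&mut sink)?;
    // The progress bar would poll `counter` while the transfer runs.
    println!("{} bytes transferred", counter.load(Ordering::Relaxed));
    Ok(())
}
```

Because the count advances with every `read` call, progress moves smoothly even while a single large node is in flight.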

```rust
    })
}

/// Gunzip + collect tar entries into a deterministic map for byte-compat comparison.
```
Collaborator Author


All test code helpers from here down. This is quite verbose, but the point is to extract the existing logic for Merkle tree node packing & unpacking on main and preserve that here. This way, we can run both the copied old code as well as the newly refactored code and prove that they behave the same.
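The comparison strategy described above can be sketched as follows: unpack each archive (old packer vs. refactored packer) into a path-to-bytes map, then compare entry-for-entry. A `BTreeMap` keeps iteration deterministic. The gunzip/tar walking is elided, and `assert_byte_compatible` is an illustrative name, not the PR's helper:

```rust
use std::collections::BTreeMap;

/// Compare two unpacked archives entry-for-entry. Keys are entry paths,
/// values are the raw bytes of each entry.
fn assert_byte_compatible(
    old: &BTreeMap<String, Vec<u8>>,
    new: &BTreeMap<String, Vec<u8>>,
) -> Result<(), String> {
    if old.keys().ne(new.keys()) {
        return Err("archives contain different entry paths".to_string());
    }
    for (path, old_bytes) in old {
        if &new[path] != old_bytes {
            return Err(format!("entry {path} differs between old and new packers"));
        }
    }
    Ok(())
}

fn main() {
    let mut old = BTreeMap::new();
    old.insert("ab/cdef/node".to_string(), vec![0x01, 0x02]);
    let new = old.clone();
    assert!(assert_byte_compatible(&old, &new).is_ok());
}
```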

Contributor

coderabbitai Bot commented Apr 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9700ebaa-17e4-4e27-ac6d-9125f32cb2a6

📥 Commits

Reviewing files that changed from the base of the PR and between f7406e4 and 7d78e5c.

📒 Files selected for processing (4)
  • .gitignore
  • Cargo.toml
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/db/merkle_node/merkle_node_db.rs
🚧 Files skipped from review as they are similar to previous changes (4)
  • .gitignore
  • Cargo.toml
  • crates/lib/src/core/db/merkle_node/merkle_node_db.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Merkle node transport now supports tar-gz serialization with configurable compression levels
    • Added progress tracking capability for merkle operations
  • Bug Fixes

    • Enhanced security by preventing path traversal during archive extraction
    • Improved error diagnostics for archive validation and structure issues
    • Added strict validation for archive integrity and node identification

Walkthrough

Implements tar-gz Merkle node pack/unpack in FileBackend with legacy and server wire layouts, adds precise MerkleDbError variants, enables additional tokio-util features, and updates .gitignore to ignore .claude/scheduled_tasks.lock.

Changes

Cohort / File(s): Summary

Config / Dependencies (.gitignore, Cargo.toml):
Added .claude/scheduled_tasks.lock to gitignore; expanded workspace tokio-util features to include io and io-util.

Merkle transport implementation (crates/lib/src/core/db/merkle_node/file_backend.rs):
Large addition: FileBackend now implements MerklePacker and MerkleUnpacker using tar-gz serialization, supports two historical wire layouts (different gzip levels), validates tar entries (rejects path traversal and unsupported types), stages VFS repos to temp dirs, and exposes pack_nodes_byte_estimate. Extensive tests added.

Error types (crates/lib/src/core/db/merkle_node/merkle_node_db.rs):
Added multiple MerkleDbError variants for transport/tar validation and IO errors (FsTransport, CannotReadMerkle, UnsupportedTarEntry, PathTraversal, InvalidTarStructure, InvalidNodeIdHex, MissingNodeDir, MissingTreeNodesDir).
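A sketch of what a few of the listed variants might look like; the payload types and `Display` messages here are assumptions for illustration (the real crate likely derives them, e.g. via `thiserror`):

```rust
use std::fmt;

/// Illustrative subset of the new MerkleDbError variants named above.
#[derive(Debug)]
enum MerkleDbError {
    PathTraversal(String),
    UnsupportedTarEntry { path: String },
    InvalidNodeIdHex(String),
    MissingNodeDir(String),
}

impl fmt::Display for MerkleDbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::PathTraversal(p) => write!(f, "tar entry escapes repo root: {p}"),
            Self::UnsupportedTarEntry { path } => write!(f, "unsupported tar entry type at: {path}"),
            Self::InvalidNodeIdHex(h) => write!(f, "node id is not valid hex: {h}"),
            Self::MissingNodeDir(p) => write!(f, "expected node dir is missing: {p}"),
        }
    }
}

fn main() {
    let err = MerkleDbError::PathTraversal("../../etc/passwd".to_string());
    println!("{err}");
}
```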

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant FileBackend
    participant TarBuilder
    participant Compressor
    participant FileSystem

    Caller->>FileBackend: pack_nodes(repo, hashes, options)
    FileBackend->>FileBackend: check node dirs & options
    loop per hash
        FileBackend->>FileBackend: read node files/metadata
        FileBackend->>TarBuilder: add_entry(path, content)
    end
    FileBackend->>TarBuilder: finalize()
    alt gzip layout
        TarBuilder->>Compressor: compress(stream)
        Compressor-->>FileBackend: compressed_stream
    else legacy layout (different gzip level)
        TarBuilder->>Compressor: compress(stream, level)
        Compressor-->>FileBackend: compressed_stream
    end
    FileBackend->>FileSystem: write output
    FileSystem-->>FileBackend: ok
    FileBackend-->>Caller: packed_artifact
```
```mermaid
sequenceDiagram
    participant Caller
    participant FileBackend
    participant Validator
    participant Extractor
    participant VFSStaging
    participant FileSystem

    Caller->>FileBackend: unpack_nodes(archive, repo, options)
    FileBackend->>FileBackend: decompress(archive)
    loop for each tar entry
        FileBackend->>Validator: validate_entry(path, type)
        Validator-->>FileBackend: ok / reject
        alt valid
            FileBackend->>Extractor: extract(entry)
            alt VFS repo
                Extractor->>VFSStaging: stage to temp
                VFSStaging->>FileSystem: copy into VFS store
            else standard repo
                Extractor->>FileSystem: write to repo path
            end
        else rejected
            FileBackend-->>Caller: error (e.g., PathTraversal/UnsupportedTarEntry)
        end
    end
    FileSystem-->>FileBackend: all_written
    FileBackend-->>Caller: extracted_hashes
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • gschoeni
  • subygan

Poem

🐰 I hop through tarballs, gzip in tow,
Packing hashes row by row,
I guard against .. and broken names,
Stage VFS safely, no sneaky games.
A byte-estimate hums — off we go! 🥕✨

🚥 Pre-merge checks: ✅ 5 passed
  • Title check: ✅ Passed. The title directly summarizes the main changes: implementing MerklePacker and MerkleUnpacker traits for FileBackend, which is the primary focus of the changeset.
  • Description check: ✅ Passed. The description is directly related to the changeset, explaining the MerkleTransport trait implementations and progress reporting changes that align with the code modifications.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/lib/src/core/db/merkle_node/file_backend.rs`:
- Around line 377-389: The code currently calls file.unpack(&dst_path) and only
afterwards calls extract_hash_from_entry_path(&dst_path, oxen_hidden), which
lets malformed entries be written and causes skipped-existing entries to not
report their parsed hash; move the call to extract_hash_from_entry_path before
any side-effecting operations (before the exists/overwrite_existing check and
before file.unpack), so validation/classification runs first and returns or
records InvalidTarStructure/InvalidNodeIdHex errors as required, and when
extract_hash_from_entry_path yields a hash still allow the subsequent existence
check (using dst_path and overwrite_existing) to skip unpack but still include
the parsed hash in the returned set.
- Around line 355-373: The code currently only rejects ParentDir components but
still allows absolute or Windows-prefixed paths which will make dst_path escape
oxen_hidden; update the validation on the local variable path (used to compute
dst_path) to also reject absolute/root/prefix components (e.g.,
Component::RootDir and Component::Prefix) and any path where path.is_absolute()
or the first component is RootDir/Prefix, returning MerkleDbError::PathTraversal
or a new appropriate error; keep the existing ParentDir check and then perform
that extra guard before computing dst_path (the oxen_hidden, tree_nodes_prefix,
and dst_path logic should remain unchanged once the path is verified).
- Around line 349-354: The loop over tar entries in unpack() currently logs and
continues on an unreadable entry leading to partial installs; instead propagate
the error immediately: change the `for entry in entries { let Ok(mut file) =
entry else { ... continue; } }` handling to return an Err constructed from the
entry's error (or map it to the function's error type) rather than calling
log::error and continuing, so unpack() fails fast on corrupt tar entries and the
caller receives the failure.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1f5a48c2-b500-4b25-8ee1-7c341da833bb

📥 Commits

Reviewing files that changed from the base of the PR and between b37ba5f and f7406e4.

📒 Files selected for processing (4)
  • .gitignore
  • Cargo.toml
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/db/merkle_node/merkle_node_db.rs

Comment on lines +349 to +354

```rust
for entry in entries {
    let Ok(mut file) = entry else {
        log::error!("Could not unpack file in merkle tar archive");
        // TODO: raise this error to the caller instead!?
        continue;
    };
```
Contributor


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast on unreadable tar entries.

If one entry is corrupt, this currently logs and continues, so unpack() can return Ok(...) after a partial install. That makes archive corruption look successful.

Suggested change

```diff
-        let Ok(mut file) = entry else {
-            log::error!("Could not unpack file in merkle tar archive");
-            // TODO: raise this error to the caller instead!?
-            continue;
-        };
+        let mut file = entry.map_err(MerkleDbError::CannotReadMerkle)?;
```

Comment on lines +355 to +373

```rust
let path = file.path()?.into_owned();
// Path-traversal guard: refuse any entry whose path resolves above its container.
if path.components().any(|c| matches!(c, Component::ParentDir)) {
    return Err(MerkleDbError::PathTraversal(path.display().to_string()));
}
// Entry-type validation: only regular files and directories are allowed.
let entry_type = file.header().entry_type();
if !entry_type.is_file() && !entry_type.is_dir() {
    return Err(MerkleDbError::UnsupportedTarEntry {
        path: path.display().to_string(),
    });
}
// Server-style entries already contain `tree/nodes/...`; join directly.
// Legacy client-push entries begin at `{prefix}/{suffix}/...`; prepend `tree/nodes/`.
let dst_path = if path.starts_with(&tree_nodes_prefix) {
    oxen_hidden.join(&path)
} else {
    oxen_hidden.join(&tree_nodes_prefix).join(&path)
};
```
Contributor


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Reject absolute and prefixed tar paths before join().

The current guard only blocks `..` (ParentDir) components. An entry like `/tmp/pwn` or a Windows-prefixed path will cause `PathBuf::join` to discard `oxen_hidden` and unpack outside the repo.
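The escape happens because `Path::join` replaces the base entirely when given an absolute path. A quick std-only demonstration (Unix-style paths):

```rust
use std::path::{Path, PathBuf};

fn main() {
    let oxen_hidden = Path::new("repo/.oxen");
    // A relative entry stays under the base, as expected...
    assert_eq!(
        oxen_hidden.join("ab/cd/node"),
        PathBuf::from("repo/.oxen/ab/cd/node")
    );
    // ...but an absolute entry REPLACES the base entirely, so an archive
    // entry like `/tmp/pwn` would be written outside the repo.
    assert_eq!(oxen_hidden.join("/tmp/pwn"), PathBuf::from("/tmp/pwn"));
    println!("join() silently discards the base for absolute paths");
}
```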

Suggested change

```diff
-        if path.components().any(|c| matches!(c, Component::ParentDir)) {
+        if path.components().any(|c| {
+            matches!(
+                c,
+                Component::ParentDir | Component::RootDir | Component::Prefix(_)
+            )
+        }) {
             return Err(MerkleDbError::PathTraversal(path.display().to_string()));
         }
```

Comment on lines +377 to +389

```rust
if dst_path.exists() && !overwrite_existing {
    log::info!("Node already exists at {dst_path:?}, skipping");
    continue;
}
file.unpack(&dst_path)?;

// Extract the merkle hash from this entry's path, if it identifies one.
//
// After the path-resolution above, `dst_path` is of the form
// `<oxen_hidden>/tree/nodes/<rest>`. We classify entries by the SHAPE
// of `<rest>`, never by whether components happen to be hex. We assume that
// we have the hex-encoded hash as the `{prefix}/{suffix}` dirs.
if let Some(hash) = extract_hash_from_entry_path(&dst_path, oxen_hidden)? {
```
Contributor


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate and classify the entry before any skip/write side effects.

extract_hash_from_entry_path() is the only place that records parsed hashes and rejects InvalidTarStructure / InvalidNodeIdHex, but it runs only after the SkipExisting early return and after file.unpack(&dst_path). That means malformed entries can already be written to disk before unpack() returns Err, and duplicate entries are omitted from the returned hash set even though the unpack contract says parsed hashes should still be reported.
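The reordered control flow can be sketched like this (std-only; `extract_hash_from_entry_path` and `unpack_entry` are stubs standing in for the PR's helpers, intended only to show the ordering):

```rust
use std::collections::HashSet;
use std::path::Path;

/// Stub: the real helper validates the `{prefix}/{suffix}` shape and hex
/// encoding; here it just takes the final path component as the "hash".
fn extract_hash_from_entry_path(dst: &Path) -> Result<Option<String>, String> {
    Ok(dst.file_name().and_then(|n| n.to_str()).map(|s| s.to_string()))
}

/// Per-entry flow with validation hoisted above all side effects.
fn unpack_entry(
    dst: &Path,
    already_exists: bool,
    overwrite_existing: bool,
    hashes: &mut HashSet<String>,
) -> Result<(), String> {
    // 1. Classify/validate FIRST: malformed entries error out here and
    //    never touch disk.
    let hash = extract_hash_from_entry_path(dst)?;
    // 2. Record the parsed hash even when the write below is skipped, so
    //    skipped-existing entries still appear in the returned set.
    if let Some(h) = hash {
        hashes.insert(h);
    }
    // 3. Only now perform the side effect (`file.unpack` in the real code).
    if already_exists && !overwrite_existing {
        return Ok(()); // skip the write; hash is already recorded
    }
    // file.unpack(&dst)? would go here
    Ok(())
}

fn main() {
    let mut hashes = HashSet::new();
    unpack_entry(Path::new("tree/nodes/ab/cd/node"), true, false, &mut hashes).unwrap();
    // The skipped-existing entry's hash is still reported.
    assert!(hashes.contains("node"));
}
```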


@malcolmgreaves
Collaborator Author

STACKED PR: Do not merge until #502 has been merged.

