
Refactor to use Merkle{Packer,Unpacker} traits throughout #506

Closed
malcolmgreaves wants to merge 1 commit into mg/merkle_pack_impls from mg/merkle_pack_refactor


@malcolmgreaves
Collaborator

Refactors oxen's Merkle tree node transport between clients
and servers to use the new `MerklePacker` and `MerkleUnpacker`
traits. The `LocalRepository::merkle_store()` method now
returns an `impl TransportableMerkleStore + '_`, and the
private `merkle_store_dispatch::StoreEnum` now also dispatches
to the `MerkleTransport` methods.
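
As a rough illustration of the dispatch pattern described above, here is a minimal std-only sketch. The `Packer` trait, the two backends, the `StoreError` variants, and `Repo` are all hypothetical stand-ins; the real `MerklePacker`, `StoreEnum`, and `TransportableMerkleStore` signatures in oxen are not shown in this PR page and may differ.

```rust
use std::io::{self, Write};

/// Unified error type, standing in for `StoreError` (variants are assumed).
#[derive(Debug)]
enum StoreError {
    File(io::Error),
    Memory(String),
}

/// Hypothetical packing trait; each backend keeps its own error type.
trait Packer {
    type Error;
    /// Writes every node to `out` and returns the number of bytes written.
    fn pack_all(&self, out: &mut dyn Write) -> Result<u64, Self::Error>;
}

struct FileBackend { nodes: Vec<Vec<u8>> }
struct MemoryBackend { nodes: Vec<Vec<u8>> }

impl Packer for FileBackend {
    type Error = io::Error;
    fn pack_all(&self, out: &mut dyn Write) -> Result<u64, io::Error> {
        let mut written = 0u64;
        for node in &self.nodes {
            out.write_all(node)?;
            written += node.len() as u64;
        }
        Ok(written)
    }
}

impl Packer for MemoryBackend {
    type Error = String;
    fn pack_all(&self, out: &mut dyn Write) -> Result<u64, String> {
        if self.nodes.is_empty() {
            return Err("no nodes to pack".to_string());
        }
        let mut written = 0u64;
        for node in &self.nodes {
            out.write_all(node).map_err(|e| e.to_string())?;
            written += node.len() as u64;
        }
        Ok(written)
    }
}

/// Private dispatch enum: one variant per backend.
enum StoreEnum {
    File(FileBackend),
    Memory(MemoryBackend),
}

impl Packer for StoreEnum {
    type Error = StoreError;
    fn pack_all(&self, out: &mut dyn Write) -> Result<u64, StoreError> {
        // Delegate to the active backend and map its error into StoreError.
        match self {
            StoreEnum::File(b) => b.pack_all(out).map_err(StoreError::File),
            StoreEnum::Memory(b) => b.pack_all(out).map_err(StoreError::Memory),
        }
    }
}

/// Lets a borrowed store be handed out as `impl Packer + '_`.
impl<P: Packer> Packer for &P {
    type Error = P::Error;
    fn pack_all(&self, out: &mut dyn Write) -> Result<u64, P::Error> {
        (**self).pack_all(out)
    }
}

struct Repo { store: StoreEnum }

impl Repo {
    /// Callers see only the trait, never the enum behind it.
    fn merkle_store(&self) -> impl Packer<Error = StoreError> + '_ {
        &self.store
    }
}
```

With this shape, a caller writes `repo.merkle_store().pack_all(&mut buf)` and handles a single error type regardless of which backend is active.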

@coderabbitai
Contributor

coderabbitai Bot commented Apr 30, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Performance

    • Large tree uploads now stream packed data, reducing memory use.
  • Bug Fixes

    • Corrupted downloads now surface errors reliably instead of causing internal panics.
  • New Features

    • Progress indicators can increase total byte counts dynamically during uploads.
  • Documentation

    • Clarified merkle store pack/unpack behavior and error semantics.
  • Tests

    • Added regression test to ensure corrupted responses are detected and reported.
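
The bug-fix and test bullets above come down to propagating per-entry read errors instead of dropping them. The sketch below contrasts the two behaviors on a fake entry stream. It assumes entries arrive as `io::Result<String>`; the `MerkleDbError` enum here is a one-variant stand-in that only borrows the variant name from the walkthrough, and the helper names are hypothetical.

```rust
use std::io;

/// Stand-in error type; only the variant name is taken from the walkthrough.
#[derive(Debug)]
enum MerkleDbError {
    CannotReadMerkle(io::Error),
}

/// Lenient behavior: unreadable entries are silently dropped.
fn collect_skipping<I>(entries: I) -> Vec<String>
where
    I: Iterator<Item = io::Result<String>>,
{
    entries.filter_map(Result::ok).collect()
}

/// Strict behavior: the first unreadable entry aborts the whole unpack.
fn collect_strict<I>(entries: I) -> Result<Vec<String>, MerkleDbError>
where
    I: Iterator<Item = io::Result<String>>,
{
    let mut names = Vec::new();
    for entry in entries {
        names.push(entry.map_err(MerkleDbError::CannotReadMerkle)?);
    }
    Ok(names)
}

/// Builds a fake entry stream whose second item is corrupted.
fn corrupted_entries() -> Vec<io::Result<String>> {
    vec![
        Ok("tree/nodes/aaa".to_string()),
        Err(io::Error::new(io::ErrorKind::InvalidData, "corrupt gzip stream")),
        Ok("tree/nodes/bbb".to_string()),
    ]
}
```

A regression test in this style would feed `corrupted_entries()` to the strict variant and assert that it returns `Err(MerkleDbError::CannotReadMerkle(_))`, whereas the lenient variant would report two entries and hide the corruption.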

Walkthrough

Streaming-based merkle node transport replaces in-memory tar+gzip buffers for uploads and downloads. Client-side create_nodes and node_download_request now stream packing/unpacking via duplex + spawn_blocking and use merkle_store().pack_* / unpack. Progress APIs extended and tar-entry read errors are now propagated.
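
To make the streaming idea concrete without pulling in tokio, the sketch below swaps in a different mechanism: a bounded `std::sync::mpsc::sync_channel` plays the role of the duplex pipe and a plain thread plays the role of `spawn_blocking`. The types `ChunkWriter` and `ChunkReader`, the function `stream_nodes`, and the length-prefixed framing are all hypothetical; the PR itself streams tar+gzip over `tokio::io::duplex`.

```rust
use std::io::{self, Read, Write};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Write half of the pipe: each `write` sends one chunk to the consumer.
struct ChunkWriter(SyncSender<Vec<u8>>);

impl Write for ChunkWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0
            .send(buf.to_vec())
            .map_err(|_| io::Error::new(io::ErrorKind::BrokenPipe, "reader dropped"))?;
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Read half of the pipe: drains chunks as they arrive.
struct ChunkReader {
    rx: Receiver<Vec<u8>>,
    pending: Vec<u8>,
    pos: usize,
}

impl Read for ChunkReader {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        // Refill from the channel once the current chunk is exhausted.
        while self.pos == self.pending.len() {
            match self.rx.recv() {
                Ok(chunk) => {
                    self.pending = chunk;
                    self.pos = 0;
                }
                Err(_) => return Ok(0), // writer dropped: end of stream
            }
        }
        let n = out.len().min(self.pending.len() - self.pos);
        out[..n].copy_from_slice(&self.pending[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Packs `nodes` on a worker thread while the caller reads the bytes.
fn stream_nodes(nodes: Vec<Vec<u8>>) -> io::Result<Vec<u8>> {
    // Capacity 4 bounds how many chunks may be in flight at once.
    let (tx, rx) = sync_channel::<Vec<u8>>(4);
    let packer = thread::spawn(move || -> io::Result<()> {
        let mut w = ChunkWriter(tx);
        for node in &nodes {
            // Assumed framing: 4-byte big-endian length, then the payload.
            w.write_all(&(node.len() as u32).to_be_bytes())?;
            w.write_all(node)?;
        }
        Ok(()) // dropping `w` closes the channel and signals end of stream
    });
    let mut reader = ChunkReader { rx, pending: Vec::new(), pos: 0 };
    // A real consumer would forward each chunk to the HTTP body; the
    // sketch collects them only so the result is easy to check.
    let mut received = Vec::new();
    reader.read_to_end(&mut received)?;
    packer
        .join()
        .map_err(|_| io::Error::new(io::ErrorKind::Other, "packer panicked"))??;
    Ok(received)
}
```

The bounded channel is what keeps memory flat: the packer blocks once four chunks are queued, so at most a few chunks exist at a time no matter how large the tree is.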

Changes

| Cohort / File(s) | Summary |
|---|---|
| Client Streaming Pipeline (crates/lib/src/api/client/tree.rs) | create_nodes streams legacy-format tar-gz into HTTP body using a tokio::io::duplex, spawn_blocking packing, and ReaderStream; progress total is estimated and incremented per chunk. node_download_request streams response into a sync reader and calls merkle_store().unpack(...) inside spawn_blocking. Added regression test for corrupted gzip error propagation. |
| Merkle Store Transport Interface (crates/lib/src/model/repository/local_repository.rs, crates/lib/src/model/repository/local_repository/merkle_store_dispatch.rs) | LocalRepository::merkle_store() now returns impl TransportableMerkleStore. StoreEnum gained MerklePacker/MerkleUnpacker impls delegating pack_nodes, pack_all, and unpack with PackOptions/UnpackOptions, mapping backend errors into StoreError. |
| Tree Serialization Refactoring (crates/lib/src/repositories/tree.rs) | Replaced tar+gzip buffer creation/extraction with merkle_store().pack_* and unpack. Removed compress_full_tree and removed filesystem-based tar extraction/hash parsing. |
| Progress Tracking Enhancements (crates/lib/src/core/progress/push_progress.rs, crates/lib/src/core/progress/sync_progress.rs) | Added public inc_total_bytes(&self, delta: u64) on both PushProgress and SyncProgress to update progress bar total length during streaming operations. |
| Backend Error Handling & Tests (crates/lib/src/core/db/merkle_node/file_backend.rs) | Top-level docs clarified (implements MerkleStore). extract_tar_under now propagates archive.entries() errors as MerkleDbError::CannotReadMerkle instead of silently skipping; test call sites updated to use LocalRepository::merkle_store(). |
| Docs / Trait Clarifications (crates/lib/src/model/merkle_tree.rs, crates/lib/src/model/merkle_tree/merkle_transport.rs) | Rewrote TransportableMerkleStore docs to explain store vs transport error separation and IntoOxenError conversion; removed historical error-shape commentary from MerkleTransport docs. |
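
For the progress-tracking row, a growing total can be modeled with atomics so that the method takes `&self`, matching the `inc_total_bytes(&self, delta: u64)` signature quoted above. This is only a sketch: `StreamProgress`, `add_bytes`, `snapshot`, and `send_chunks` are hypothetical names, and the real `PushProgress` / `SyncProgress` types update a progress bar rather than bare counters.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for `PushProgress` / `SyncProgress`.
#[derive(Default)]
struct StreamProgress {
    total_bytes: AtomicU64,
    sent_bytes: AtomicU64,
}

impl StreamProgress {
    /// Grows the expected total; `&self` lets tasks share one tracker.
    fn inc_total_bytes(&self, delta: u64) {
        self.total_bytes.fetch_add(delta, Ordering::Relaxed);
    }

    /// Records bytes that have actually been sent.
    fn add_bytes(&self, delta: u64) {
        self.sent_bytes.fetch_add(delta, Ordering::Relaxed);
    }

    /// Returns `(sent, total)` as currently recorded.
    fn snapshot(&self) -> (u64, u64) {
        (
            self.sent_bytes.load(Ordering::Relaxed),
            self.total_bytes.load(Ordering::Relaxed),
        )
    }
}

/// Raises the total for each chunk before counting that chunk as sent,
/// since the final size is unknown until the stream ends.
fn send_chunks(progress: &StreamProgress, chunks: &[Vec<u8>]) {
    for chunk in chunks {
        let len = chunk.len() as u64;
        progress.inc_total_bytes(len);
        progress.add_bytes(len);
    }
}
```

Because the total is raised before the sent count, the sent count never exceeds the total in this single-threaded loop, which is the property a progress bar needs when the final size is not known up front.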

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant PackTask as Pack (spawn_blocking)
    participant Duplex as DuplexWriter
    participant HTTP as HTTP POST
    participant Server
    participant UnpackTask as Unpack (spawn_blocking)
    participant Merkle as MerkleStore

    Client->>PackTask: request pack_nodes / pack_all
    PackTask->>Duplex: write tar.gz bytes into duplex writer
    Duplex->>HTTP: stream body chunks (ReaderStream)
    HTTP->>Server: POST streamed tar.gz
    Server-->>HTTP: response (streamed body)
    HTTP->>UnpackTask: bridge response to sync reader
    UnpackTask->>Merkle: merkle_store().unpack(reader, UnpackOptions)
    Merkle-->>UnpackTask: unpack result (hashes / error)
    UnpackTask-->>Client: return result or propagate error

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • CleanCut
  • gschoeni

Poem

🐰
I packed the tar in a burrowed block,
Streamed bytes like carrots down a clock,
Progress hops along, not a byte to waste,
Unpack, verify — not a panic in haste,
Merkle leaves flutter in networked delight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Refactor to use Merkle{Packer,Unpacker} traits throughout' clearly and concisely summarizes the main change in the changeset, which is a comprehensive refactoring to use the new MerklePacker and MerkleUnpacker traits across multiple files. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing clear context about refactoring Merkle tree node transport to use the new MerklePacker and MerkleUnpacker traits, with specific mentions of updated return types and dispatch mechanisms. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@malcolmgreaves
Collaborator Author

STACKED PR: Do not merge until #504 has been merged.

Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/lib/src/core/db/merkle_node/file_backend.rs (1)

349-373: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Reject absolute and prefixed tar paths before join.

This guard only blocks `..`. An entry like `/tmp/evil` (or a Windows drive/prefix path) will cause `join` to drop the `.oxen/...` prefix and write outside the repo. The later `extract_hash_from_entry_path` check runs after `file.unpack`, so by then the escape has already occurred.

Suggested fix
         let mut file = entry.map_err(MerkleDbError::CannotReadMerkle)?;
         let path = file.path()?.into_owned();
-        // Path-traversal guard: refuse any entry whose path resolves above its container.
-        if path.components().any(|c| matches!(c, Component::ParentDir)) {
+        // Path-traversal guard: refuse any entry that can escape its container.
+        if path.components().any(|c| {
+            matches!(
+                c,
+                Component::ParentDir | Component::RootDir | Component::Prefix(_)
+            )
+        }) || path.is_absolute()
+        {
             return Err(MerkleDbError::PathTraversal(path.display().to_string()));
         }
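
The same guard can be exercised in isolation. The sketch below is a std-only restatement of the suggested check; the function names `is_contained` and `checked_join` are hypothetical and the real code returns `MerkleDbError::PathTraversal` rather than `None`.

```rust
use std::path::{Component, Path, PathBuf};

/// Returns true when no component can move the path outside its base:
/// no `..`, no root, and no Windows drive or UNC prefix.
fn is_contained(entry: &Path) -> bool {
    !entry.components().any(|c| {
        matches!(
            c,
            Component::ParentDir | Component::RootDir | Component::Prefix(_)
        )
    })
}

/// Joins `entry` onto `base` only after the guard has passed, so the
/// check always runs before any destination path is computed.
fn checked_join(base: &Path, entry: &Path) -> Option<PathBuf> {
    if is_contained(entry) {
        Some(base.join(entry))
    } else {
        None
    }
}
```

Ordering is the point here: `checked_join` cannot produce a destination for a rejected entry, so nothing downstream can unpack to it by accident.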
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 349 - 373,
Reject absolute/prefixed paths before performing any join/unpack: in the block
that currently inspects path (the local variable `path` in file_backend.rs) add
a guard that returns `MerkleDbError::PathTraversal` if `path.is_absolute()` or
if any component matches `Component::RootDir` or `Component::Prefix(_)` (to
cover Windows drive prefixes), in addition to the existing ParentDir check;
ensure this check is executed before computing `dst_path`, before `file.unpack`
and before calling `extract_hash_from_entry_path`, so `oxen_hidden.join(...)`
cannot be bypassed by absolute/prefixed entries (refer to `path`, `dst_path`,
`tree_nodes_prefix`, `oxen_hidden`, and `extract_hash_from_entry_path`).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7131f0d1-9ef9-4c07-bb2b-441382028cb4

📥 Commits

Reviewing files that changed from the base of the PR and between f7406e4 and 8a84c72.

📒 Files selected for processing (9)
  • crates/lib/src/api/client/tree.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/progress/push_progress.rs
  • crates/lib/src/core/progress/sync_progress.rs
  • crates/lib/src/model/merkle_tree.rs
  • crates/lib/src/model/merkle_tree/merkle_transport.rs
  • crates/lib/src/model/repository/local_repository.rs
  • crates/lib/src/model/repository/local_repository/merkle_store_dispatch.rs
  • crates/lib/src/repositories/tree.rs
💤 Files with no reviewable changes (1)
  • crates/lib/src/model/merkle_tree/merkle_transport.rs

@malcolmgreaves malcolmgreaves force-pushed the mg/merkle_pack_refactor branch from 8a84c72 to 91b63cb Compare April 30, 2026 02:39
Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/lib/src/core/db/merkle_node/file_backend.rs (1)

355-373: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Reject absolute and prefixed tar paths here.

The new guard only blocks `..`. An entry like `/tmp/pwned` or `C:\temp\pwned` still passes, and `PathBuf::join` will ignore `oxen_hidden`, so `file.unpack()` can write outside the repo before the later structure check runs.
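
The `join` behavior behind this finding is easy to reproduce with the standard library alone. In the sketch below, `naive_destination` and `stays_inside` are hypothetical helpers, and `/repo/.oxen` is a made-up base directory standing in for `oxen_hidden`.

```rust
use std::path::{Path, PathBuf};

/// Naive destination computation: joins the archive entry onto the base
/// directory with no validation, the pattern the review warns about.
fn naive_destination(base: &Path, entry: &str) -> PathBuf {
    base.join(entry)
}

/// Reports whether the joined destination still begins with `base`.
/// The comparison is lexical (component by component), not resolved
/// against the filesystem.
fn stays_inside(base: &Path, entry: &str) -> bool {
    naive_destination(base, entry).starts_with(base)
}
```

For a base of `/repo/.oxen`, the entry `/tmp/pwned` joins to `/tmp/pwned`, because joining an absolute path replaces the base entirely. Note that `stays_inside` still returns true for an entry such as `../x`, since the lexical prefix is intact; that is why `Component::ParentDir` has to be rejected separately.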

Suggested hardening
-        // Path-traversal guard: refuse any entry whose path resolves above its container.
-        if path.components().any(|c| matches!(c, Component::ParentDir)) {
+        // Path-traversal guard: refuse any entry that can escape its container.
+        if path.components().any(|c| {
+            matches!(
+                c,
+                Component::ParentDir | Component::RootDir | Component::Prefix(_)
+            )
+        }) {
             return Err(MerkleDbError::PathTraversal(path.display().to_string()));
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 355 - 373,
The current path-traversal guard only rejects ParentDir (..), but absolute or
prefixed archive paths (e.g. starting with Component::RootDir or
Component::Prefix like "C:") still pass and will cause PathBuf::join to ignore
oxen_hidden; update the validation in the unpack path by rejecting any path
whose components contain Component::RootDir or Component::Prefix (in addition to
Component::ParentDir) or where path.is_absolute() is true, returning
MerkleDbError::PathTraversal (or same error type) for those cases before
computing dst_path; keep the existing checks for entry_type and the existing
dst_path logic otherwise so server-style tree_nodes_prefix handling remains
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7a98c6c0-1797-4a1f-bddc-d2f99e356f0d

📥 Commits

Reviewing files that changed from the base of the PR and between 8a84c72 and 91b63cb.

📒 Files selected for processing (9)
  • crates/lib/src/api/client/tree.rs
  • crates/lib/src/core/db/merkle_node/file_backend.rs
  • crates/lib/src/core/progress/push_progress.rs
  • crates/lib/src/core/progress/sync_progress.rs
  • crates/lib/src/model/merkle_tree.rs
  • crates/lib/src/model/merkle_tree/merkle_transport.rs
  • crates/lib/src/model/repository/local_repository.rs
  • crates/lib/src/model/repository/local_repository/merkle_store_dispatch.rs
  • crates/lib/src/repositories/tree.rs
💤 Files with no reviewable changes (1)
  • crates/lib/src/model/merkle_tree/merkle_transport.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/lib/src/model/merkle_tree.rs
