Implemented Merkle{Packer,Unpacker} for FileBackend #504
Conversation
Force-pushed from 7eb154c to f7406e4.
```rust
/// Gunzip + collect tar entries into a deterministic map for byte-compat comparison.
```
All test code helpers from here down. This is quite verbose, but the point is to extract the existing logic for Merkle tree node packing & unpacking on main and preserve that here. This way, we can run both the copied old code as well as the newly refactored code and prove that they behave the same.
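For illustration, a minimal self-contained sketch of such a gunzip-and-collect comparison helper; the name, signature, and error handling here are assumptions, not the PR's actual test helper:

```rust
use std::collections::BTreeMap;
use std::io::Read;

use flate2::read::GzDecoder;

/// Decompress a tar.gz byte buffer and collect its entries into a BTreeMap
/// keyed by entry path, so two archives can be compared byte-for-byte
/// regardless of entry order.
fn collect_tar_entries(tar_gz_bytes: &[u8]) -> std::io::Result<BTreeMap<String, Vec<u8>>> {
    let decoder = GzDecoder::new(tar_gz_bytes);
    let mut archive = tar::Archive::new(decoder);
    let mut entries = BTreeMap::new();
    for entry in archive.entries()? {
        let mut entry = entry?;
        let path = entry.path()?.display().to_string();
        let mut contents = Vec::new();
        entry.read_to_end(&mut contents)?;
        entries.insert(path, contents);
    }
    Ok(entries)
}

// Usage in an equivalence test: pack the same nodes with the copied old code
// and the refactored code, then assert the collected entry maps are identical:
// assert_eq!(collect_tar_entries(&old_bytes)?, collect_tar_entries(&new_bytes)?);
```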
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration — Path: .coderabbit.yaml · Review profile: CHILL · Plan: Pro
📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (4)
Summary by CodeRabbit

📝 Walkthrough
Implements tar-gz Merkle node pack/unpack in FileBackend with legacy and server wire layouts, adds precise MerkleDbError variants, and enables additional …
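A rough sketch of what those error variants could look like; the variant names appear in the review comments below, but the payload types and attributes are assumptions rather than the actual code:

```rust
use thiserror::Error;

// Assumed shape only: variant names are taken from the review comments in
// this PR; payload types and error messages are illustrative.
#[derive(Debug, Error)]
pub enum MerkleDbError {
    #[error("could not read merkle archive entry: {0}")]
    CannotReadMerkle(#[from] std::io::Error),
    #[error("tar entry escapes the repository: {0}")]
    PathTraversal(String),
    #[error("unsupported tar entry type at {path}")]
    UnsupportedTarEntry { path: String },
    #[error("unexpected tar entry layout: {0}")]
    InvalidTarStructure(String),
    #[error("node id is not valid hex: {0}")]
    InvalidNodeIdHex(String),
}
```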
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant FileBackend
    participant TarBuilder
    participant Compressor
    participant FileSystem
    Caller->>FileBackend: pack_nodes(repo, hashes, options)
    FileBackend->>FileBackend: check node dirs & options
    loop per hash
        FileBackend->>FileBackend: read node files/metadata
        FileBackend->>TarBuilder: add_entry(path, content)
    end
    FileBackend->>TarBuilder: finalize()
    alt gzip layout
        TarBuilder->>Compressor: compress(stream)
        Compressor-->>FileBackend: compressed_stream
    else legacy layout (different gzip level)
        TarBuilder->>Compressor: compress(stream, level)
        Compressor-->>FileBackend: compressed_stream
    end
    FileBackend->>FileSystem: write output
    FileSystem-->>FileBackend: ok
    FileBackend-->>Caller: packed_artifact
```
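For readers unfamiliar with the crates involved, a minimal sketch of the pack flow in the diagram above using the `tar` and `flate2` crates; the function name and signature are illustrative, not the FileBackend's actual methods:

```rust
use std::io::Write;

use flate2::write::GzEncoder;
use flate2::Compression;

// Sketch: `entries` pairs an archive path with the node file bytes; real code
// would read these from the repo's node directories and pick paths per layout.
fn pack_nodes(entries: &[(String, Vec<u8>)]) -> std::io::Result<Vec<u8>> {
    let encoder = GzEncoder::new(Vec::new(), Compression::default());
    let mut builder = tar::Builder::new(encoder);
    for (path, contents) in entries {
        let mut header = tar::Header::new_gnu();
        header.set_size(contents.len() as u64);
        header.set_mode(0o644);
        header.set_cksum();
        builder.append_data(&mut header, path.as_str(), contents.as_slice())?;
    }
    // into_inner() finishes the tar stream; finish() flushes the gzip footer.
    let encoder = builder.into_inner()?;
    Ok(encoder.finish()?)
}
```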
```mermaid
sequenceDiagram
    participant Caller
    participant FileBackend
    participant Validator
    participant Extractor
    participant VFSStaging
    participant FileSystem
    Caller->>FileBackend: unpack_nodes(archive, repo, options)
    FileBackend->>FileBackend: decompress(archive)
    loop for each tar entry
        FileBackend->>Validator: validate_entry(path, type)
        Validator-->>FileBackend: ok / reject
        alt valid
            FileBackend->>Extractor: extract(entry)
            alt VFS repo
                Extractor->>VFSStaging: stage to temp
                VFSStaging->>FileSystem: copy into VFS store
            else standard repo
                Extractor->>FileSystem: write to repo path
            end
        else rejected
            FileBackend-->>Caller: error (e.g., PathTraversal/UnsupportedTarEntry)
        end
    end
    FileSystem-->>FileBackend: all_written
    FileBackend-->>Caller: extracted_hashes
```
Estimated code review effort: 🎯 4 (Complex) · ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 3
ℹ️ Review info
⚙️ Run configuration — Path: .coderabbit.yaml · Review profile: CHILL · Plan: Pro · Run ID: 1f5a48c2-b500-4b25-8ee1-7c341da833bb
📒 Files selected for processing (4):
- .gitignore
- Cargo.toml
- crates/lib/src/core/db/merkle_node/file_backend.rs
- crates/lib/src/core/db/merkle_node/merkle_node_db.rs
```rust
for entry in entries {
    let Ok(mut file) = entry else {
        log::error!("Could not unpack file in merkle tar archive");
        // TODO: raise this error to the caller instead!?
        continue;
    };
```
Fail fast on unreadable tar entries.
If one entry is corrupt, this currently logs and continues, so unpack() can return Ok(...) after a partial install. That makes archive corruption look successful.
Suggested change

```diff
-    let Ok(mut file) = entry else {
-        log::error!("Could not unpack file in merkle tar archive");
-        // TODO: raise this error to the caller instead!?
-        continue;
-    };
+    let mut file = entry.map_err(MerkleDbError::CannotReadMerkle)?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 349 - 354,
The loop over tar entries in unpack() currently logs and continues on an
unreadable entry leading to partial installs; instead propagate the error
immediately: change the `for entry in entries { let Ok(mut file) = entry else {
... continue; } }` handling to return an Err constructed from the entry's error
(or map it to the function's error type) rather than calling log::error and
continuing, so unpack() fails fast on corrupt tar entries and the caller
receives the failure.
```rust
let path = file.path()?.into_owned();
// Path-traversal guard: refuse any entry whose path resolves above its container.
if path.components().any(|c| matches!(c, Component::ParentDir)) {
    return Err(MerkleDbError::PathTraversal(path.display().to_string()));
}
// Entry-type validation: only regular files and directories are allowed.
let entry_type = file.header().entry_type();
if !entry_type.is_file() && !entry_type.is_dir() {
    return Err(MerkleDbError::UnsupportedTarEntry {
        path: path.display().to_string(),
    });
}
// Server-style entries already contain `tree/nodes/...`; join directly.
// Legacy client-push entries begin at `{prefix}/{suffix}/...`; prepend `tree/nodes/`.
let dst_path = if path.starts_with(&tree_nodes_prefix) {
    oxen_hidden.join(&path)
} else {
    oxen_hidden.join(&tree_nodes_prefix).join(&path)
};
```
Reject absolute and prefixed tar paths before join().
The current guard only blocks `..` (ParentDir) components. An entry like /tmp/pwn or a Windows-prefixed path will cause PathBuf::join to discard oxen_hidden and unpack outside the repo.
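For reference, the std::path behavior this relies on (plain std, not project code): joining an absolute path replaces the base entirely.

```rust
use std::path::PathBuf;

fn main() {
    let oxen_hidden = PathBuf::from(".oxen");
    // Joining an absolute path discards the base, so a malicious tar entry
    // path like "/tmp/pwn" would escape oxen_hidden.
    let joined = oxen_hidden.join("/tmp/pwn");
    assert_eq!(joined, PathBuf::from("/tmp/pwn"));
}
```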
Suggested change

```diff
-    if path.components().any(|c| matches!(c, Component::ParentDir)) {
+    if path.components().any(|c| {
+        matches!(
+            c,
+            Component::ParentDir | Component::RootDir | Component::Prefix(_)
+        )
+    }) {
         return Err(MerkleDbError::PathTraversal(path.display().to_string()));
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 355 - 373,
The code currently only rejects ParentDir components but still allows absolute
or Windows-prefixed paths which will make dst_path escape oxen_hidden; update
the validation on the local variable path (used to compute dst_path) to also
reject absolute/root/prefix components (e.g., Component::RootDir and
Component::Prefix) and any path where path.is_absolute() or the first component
is RootDir/Prefix, returning MerkleDbError::PathTraversal or a new appropriate
error; keep the existing ParentDir check and then perform that extra guard
before computing dst_path (the oxen_hidden, tree_nodes_prefix, and dst_path
logic should remain unchanged once the path is verified).
```rust
if dst_path.exists() && !overwrite_existing {
    log::info!("Node already exists at {dst_path:?}, skipping");
    continue;
}
file.unpack(&dst_path)?;

// Extract the merkle hash from this entry's path, if it identifies one.
//
// After the path-resolution above, `dst_path` is of the form
// `<oxen_hidden>/tree/nodes/<rest>`. We classify entries by the SHAPE
// of `<rest>`, never by whether components happen to be hex. We assume that
// we have the hex-encoded hash as the `{prefix}/{suffix}` dirs.
if let Some(hash) = extract_hash_from_entry_path(&dst_path, oxen_hidden)? {
```
Validate and classify the entry before any skip/write side effects.
extract_hash_from_entry_path() is the only place that records parsed hashes and rejects InvalidTarStructure / InvalidNodeIdHex, but it runs only after the SkipExisting early return and after file.unpack(&dst_path). That means malformed entries can already be written to disk before unpack() returns Err, and duplicate entries are omitted from the returned hash set even though the unpack contract says parsed hashes should still be reported.
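A minimal sketch of the reordering being asked for, reusing the names from the snippet above; the `hashes` set is assumed to be the collection unpack() returns, and this is not a committable suggestion:

```rust
// Classify the entry first so malformed paths fail before any write, and so
// skipped duplicates still contribute their parsed hash to the result set.
if let Some(hash) = extract_hash_from_entry_path(&dst_path, oxen_hidden)? {
    hashes.insert(hash);
}
// Only then decide whether to skip or write the entry.
if dst_path.exists() && !overwrite_existing {
    log::info!("Node already exists at {dst_path:?}, skipping");
    continue;
}
file.unpack(&dst_path)?;
```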
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 377 - 389,
The code currently calls file.unpack(&dst_path) and only afterwards calls
extract_hash_from_entry_path(&dst_path, oxen_hidden), which lets malformed
entries be written and causes skipped-existing entries to not report their
parsed hash; move the call to extract_hash_from_entry_path before any
side-effecting operations (before the exists/overwrite_existing check and before
file.unpack), so validation/classification runs first and returns or records
InvalidTarStructure/InvalidNodeIdHex errors as required, and when
extract_hash_from_entry_path yields a hash still allow the subsequent existence
check (using dst_path and overwrite_existing) to skip unpack but still include
the parsed hash in the returned set.
STACKED PR: Do not merge until #502 has been merged.
**`MerkleTransport`**
Implemented the new `MerkleTransport` traits for packing up
and unpacking Merkle tree nodes as they're sent on the wire
from the oxen client to server. Preserves the existing tar-gz
formats specific to the FileBackend's physical store layout.
Supports the two unique tar-gz path layouts (sketched just after this list):
- upload: only captures `{prefix}/{suffix}/{node,children}`
- everything else: `tree/nodes/{prefix}/{suffix}/{node,children}`
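A rough illustration of how the two layouts differ when building entry paths; the function name and the `upload` flag are hypothetical, not the crate's actual API:

```rust
use std::path::PathBuf;

// Sketch of the two archive entry layouts described above.
fn node_path_in_archive(prefix: &str, suffix: &str, file: &str, upload: bool) -> PathBuf {
    if upload {
        // upload: only `{prefix}/{suffix}/{node,children}`
        PathBuf::from(prefix).join(suffix).join(file)
    } else {
        // everything else: `tree/nodes/{prefix}/{suffix}/{node,children}`
        PathBuf::from("tree")
            .join("nodes")
            .join(prefix)
            .join(suffix)
            .join(file)
    }
}
```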
**Progress Bar in Bytes**
Changes the progress bar for uploads & downloads to use bytes/s
instead of nodes/s. Nodes are not evenly sized, so progress will
look like it stops when handling a large file. This now hooks
into the reader & writer types that the trait uses.
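A minimal sketch of the kind of byte-counting writer hook this describes; the PR's actual reader/writer types and wiring may differ:

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Wraps any `Write` and bumps a shared counter on every write, which a
// progress bar can poll to report bytes/s instead of nodes/s.
struct CountingWriter<W: Write> {
    inner: W,
    bytes_written: Arc<AtomicU64>,
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.bytes_written.fetch_add(n as u64, Ordering::Relaxed);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```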
Force-pushed from f7406e4 to 7d78e5c.