Implement Merkle{Packer,Unpacker} for FileBackend #515
malcolmgreaves wants to merge 1 commit into mg/merkle_dyn_pack_interfaces from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough
Adds tar.gz packing/unpacking transport for Merkle nodes to FileBackend (with VFS staging and byte-estimate), extends MerkleDbError with tar/transport-specific variants, adds extensive wire-format and security tests, and enables additional features for the workspace.
Changes
- Dependency features
- Merkle node transport & errors
Sequence Diagram(s)
sequenceDiagram
participant Client
participant FileBackend
participant Filesystem
participant TarWriter
Client->>FileBackend: pack_nodes(hashes, PackOptions)
FileBackend->>Filesystem: stat/iterate node directories
Filesystem-->>FileBackend: dir entries & file sizes
FileBackend->>TarWriter: write entries (tree/nodes/... or legacy layout)
TarWriter->>TarWriter: gzip compress (chosen level)
FileBackend->>Client: stream tar.gz bytes
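As a rough sketch of the pack side diagrammed above, the tar and flate2 crates can stream entries into a gzip-compressed archive; the entry-collection and naming below are illustrative assumptions, not the PR's actual FileBackend code.

use flate2::{write::GzEncoder, Compression};
use std::io::Write;
use std::path::PathBuf;
use tar::Builder;

// Append each on-disk node file into a gzip-compressed tar stream.
// `entries` pairs a source file with the archive entry name it should get,
// e.g. (".oxen/tree/nodes/{prefix}/{suffix}/node", "tree/nodes/{prefix}/{suffix}/node").
fn pack_entries<W: Write>(out: W, entries: &[(PathBuf, String)]) -> std::io::Result<W> {
    let encoder = GzEncoder::new(out, Compression::default());
    let mut tar = Builder::new(encoder);
    for (src_path, entry_name) in entries {
        tar.append_path_with_name(src_path, entry_name)?;
    }
    // Finish the tar archive, then flush the gzip stream and return the writer.
    tar.into_inner()?.finish()
}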
sequenceDiagram
participant Client
participant FileBackend
participant TarReader
participant Validator
participant Filesystem
Client->>FileBackend: unpack(tar_gz_stream, UnpackOptions)
FileBackend->>TarReader: decompress & iterate entries
TarReader->>Validator: validate (no '..', allowed types, layout)
Validator-->>FileBackend: entry ok / error
alt VFS repository
FileBackend->>Filesystem: write to TempDir
FileBackend->>Filesystem: copy_dir_all -> .oxen/
else Regular repository
FileBackend->>Filesystem: write directly under .oxen/
end
FileBackend->>FileBackend: parse node ids -> HashSet<MerkleHash>
FileBackend->>Client: return unpacked HashSet
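And a caller-side round trip might look roughly like this; the method shapes (pack_all writing into a buffer, unpack reading from a byte slice) are inferred from the test descriptions later in this thread and are not guaranteed signatures.

use std::collections::HashSet;

// Sketch only: FileBackend, MerkleHash, MerkleDbError, and UnpackOptions come from
// this PR; the exact method signatures here are assumptions.
fn round_trip(src: &FileBackend, dst: &FileBackend) -> Result<HashSet<MerkleHash>, MerkleDbError> {
    // Pack every node in the source backend into an in-memory tar.gz buffer.
    let mut packed: Vec<u8> = Vec::new();
    src.pack_all(&mut packed)?;
    // Unpack on the destination, skipping nodes that already exist on disk,
    // and get back the set of node hashes seen in the archive.
    dst.unpack(&mut &packed[..], UnpackOptions::SkipExisting)
}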
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ 5 passed checks
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Comment
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/core/db/merkle_node/file_backend.rs`:
- Around line 379-395: The code currently calls
extract_hash_from_entry_path(&dst_path, oxen_hidden) after create_dir_all, the
overwrite_existing check and file.unpack(), which lets invalid tar paths be
written and causes hashes to be skipped under UnpackOptions::SkipExisting; fix
by calling extract_hash_from_entry_path early—compute let hash_opt =
extract_hash_from_entry_path(&dst_path, oxen_hidden)? (or equivalent parser that
returns validation) immediately after computing dst_path and before any
create_dir_all / existence check / file.unpack(), then use that parsed result
for validation (return InvalidTarStructure if None/invalid), use hash_opt to
decide whether to skip adding to hashes when skipping existing files, and only
perform create_dir_all and file.unpack() after validation succeeds so invalid
entries are never written and hashes are consistently collected.
- Around line 354-359: The loop in file_backend.rs currently swallows tar entry
errors (for entry in entries { let Ok(mut file) = entry else { log::error!(...);
continue; } }) which allows partial installs; change it to propagate the entry
error as a MerkleDbError::CannotReadMerkle so the unpack fails immediately.
Replace the else/continue with an early Err return that converts the tar entry
error into MerkleDbError::CannotReadMerkle (including the original error message
for context), or use entry.map_err(|e| MerkleDbError::CannotReadMerkle(...))? to
bind mut file; ensure the error type returned by the surrounding function
matches this change.
- Around line 360-377: The code only checks for ParentDir components and misses
absolute/Windows-rooted tar paths which can discard the oxen_hidden prefix when
joined; in extract_tar_under (and similar to unpack_async_tar_archive) add a
guard that rejects path.is_absolute() and any component matching
Component::RootDir or Component::Prefix(_) (e.g., via path.components().any(|c|
matches!(c, Component::RootDir | Component::Prefix(_)))) and return a
MerkleDbError::PathTraversal (or existing appropriate error) before constructing
dst_path so absolute or rooted entries are refused.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a9f3ced5-5195-4725-8535-49844571d33c
📒 Files selected for processing (3)
- Cargo.toml
- crates/lib/src/core/db/merkle_node/file_backend.rs
- crates/lib/src/core/db/merkle_node/merkle_node_db.rs
let path = file.path()?.into_owned();
// Path-traversal guard: refuse any entry whose path resolves above its container.
if path.components().any(|c| matches!(c, Component::ParentDir)) {
    return Err(MerkleDbError::PathTraversal(path.display().to_string()));
}
// Entry-type validation: only regular files and directories are allowed.
let entry_type = file.header().entry_type();
if !entry_type.is_file() && !entry_type.is_dir() {
    return Err(MerkleDbError::UnsupportedTarEntry {
        path: path.display().to_string(),
    });
}
// Server-style entries already contain `tree/nodes/...`; join directly.
// Legacy client-push entries begin at `{prefix}/{suffix}/...`; prepend `tree/nodes/`.
let dst_path = if path.starts_with(&tree_nodes_prefix) {
    oxen_hidden.join(&path)
} else {
    oxen_hidden.join(&tree_nodes_prefix).join(&path)
🧩 Analysis chain
🌐 Web query:
In Rust, what does std::path::PathBuf::join do when the right-hand path is absolute or has a Windows drive prefix? In the tar crate, can Entry::path() surface archive paths with those forms?
💡 Result:
In Rust's standard library, std::path::PathBuf::join(path) creates a new PathBuf by adjoining the given path to self. If the right-hand path is absolute (starts with '/' on Unix or has both prefix and root on Windows), it replaces the current path entirely, returning the absolute path. On Windows, if the path has a prefix (e.g., drive letter like "C:") but no root, self is ignored and the prefixed path is returned unchanged. Examples from docs: Path::new("/etc").join("/bin/sh") yields "/bin/sh".
Citations:
- 1: https://doc.rust-lang.org/stable/std/path/struct.PathBuf.html
- 2: https://doc.rust-lang.org/std/path/struct.Path.html
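A small runnable illustration of that standard-library behavior:

use std::path::{Path, PathBuf};

fn main() {
    let oxen_hidden = Path::new("/repo/.oxen");
    // Relative right-hand side: appended under the base, as the unpack code expects.
    assert_eq!(
        oxen_hidden.join("tree/nodes/ab/cd"),
        PathBuf::from("/repo/.oxen/tree/nodes/ab/cd")
    );
    // Absolute right-hand side: the base is discarded entirely (Unix behavior).
    assert_eq!(oxen_hidden.join("/etc/passwd"), PathBuf::from("/etc/passwd"));
}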
🏁 Script executed:
# First, let's find and examine the file mentioned in the review
fd "file_backend.rs" | head -5Repository: Oxen-AI/Oxen
Length of output: 107
🏁 Script executed:
# Check the Cargo.toml to see what tar crate version is used
fd "Cargo.toml" | xargs grep -l "tar" | head -3Repository: Oxen-AI/Oxen
Length of output: 114
🏁 Script executed:
# Let's examine the actual code around lines 360-377
wc -l crates/lib/src/core/db/merkle_node/file_backend.rs
Repository: Oxen-AI/Oxen
Length of output: 112
🏁 Script executed:
# Read the specific lines mentioned in the review
sed -n '360,377p' crates/lib/src/core/db/merkle_node/file_backend.rs
Repository: Oxen-AI/Oxen
Length of output: 1044
🏁 Script executed:
# Get broader context around those lines to understand the function
sed -n '340,400p' crates/lib/src/core/db/merkle_node/file_backend.rs
Repository: Oxen-AI/Oxen
Length of output: 2761
🏁 Script executed:
# Check tar crate version in lib's Cargo.toml
grep -A2 -B2 "^tar" crates/lib/Cargo.tomlRepository: Oxen-AI/Oxen
Length of output: 231
🏁 Script executed:
# Check workspace Cargo.toml to see tar version
cat Cargo.toml | grep -A5 "^\[workspace"
Repository: Oxen-AI/Oxen
Length of output: 729
🏁 Script executed:
# Also search for any other path validation or safety checks in the codebase
rg "path_traversal|absolute|RootDir" crates/lib/src/core/db/merkle_node/ -A 2 -B 2Repository: Oxen-AI/Oxen
Length of output: 1066
🏁 Script executed:
# Check if there's a util::fs::unpack_async_tar_archive mentioned in the comment
fd "*.rs" -path "*/util/*" | xargs grep -l "unpack_async_tar_archive" | head -3Repository: Oxen-AI/Oxen
Length of output: 225
🏁 Script executed:
# Get the tar version from workspace dependencies
grep "^tar" Cargo.tomlRepository: Oxen-AI/Oxen
Length of output: 71
🏁 Script executed:
# Find and read the path traversal test
sed -n '/test_unpack_rejects_path_traversal/,/^[[:space:]]*\}/p' crates/lib/src/core/db/merkle_node/file_backend.rs | head -100
Repository: Oxen-AI/Oxen
Length of output: 903
🏁 Script executed:
# Find util::fs::unpack_async_tar_archive
rg "unpack_async_tar_archive" crates/lib/src/Repository: Oxen-AI/Oxen
Length of output: 1960
🏁 Script executed:
# Read more of the path traversal test
sed -n '/async fn test_unpack_rejects_path_traversal/,/^ \}/p' crates/lib/src/core/db/merkle_node/file_backend.rs
Repository: Oxen-AI/Oxen
Length of output: 1339
🏁 Script executed:
# Read the unpack_async_tar_archive function to see what checks it does
sed -n '/pub async fn unpack_async_tar_archive/,/^}\s*$/p' crates/lib/src/util/fs.rs | head -80
Repository: Oxen-AI/Oxen
Length of output: 1896
🏁 Script executed:
# Search for any tests with absolute paths in tar entries
rg "^/" crates/lib/src/core/db/merkle_node/file_backend.rs -B3 -A3Repository: Oxen-AI/Oxen
Length of output: 8959
🏁 Script executed:
# Check if there are other tests for extract_tar_under
rg "extract_tar_under" crates/lib/src/core/db/merkle_node/file_backend.rs -B2 -A2Repository: Oxen-AI/Oxen
Length of output: 608
🏁 Script executed:
# Look for test cases with absolute paths in the tar module
rg "test.*tar.*absolute|absolute.*tar.*test" crates/Repository: Oxen-AI/Oxen
Length of output: 38
🏁 Script executed:
# Check if tar crate version 0.4.44 has any security considerations
cat Cargo.toml | grep -A20 "\[workspace.dependencies\]" | grep -A2 "tar"
Repository: Oxen-AI/Oxen
Length of output: 266
🏁 Script executed:
# Look for any CVE or security issues related to absolute paths in tar unpacking
rg "absolute.*path|is_absolute" crates/lib/src/core/db/merkle_node/file_backend.rsRepository: Oxen-AI/Oxen
Length of output: 38
🏁 Script executed:
# Search for test with absolute paths or verify the fix suggestion
rg "RootDir|Prefix\|Component" crates/lib/src/Repository: Oxen-AI/Oxen
Length of output: 364
🏁 Script executed:
# See the context of the RootDir check in util/fs.rs
sed -n '/if components.next() == Some/,+5p' crates/lib/src/util/fs.rs
Repository: Oxen-AI/Oxen
Length of output: 199
🏁 Script executed:
# See the context of RootDir | Prefix check in fs.rs
rg "Component::RootDir \| Component::Prefix" crates/lib/src/util/fs.rs -B3 -A3Repository: Oxen-AI/Oxen
Length of output: 363
🏁 Script executed:
# Check the workspaces/files.rs context
rg "ParentDir \| Component::Prefix.*RootDir" crates/lib/src/core/v_latest/workspaces/files.rs -B3 -A3Repository: Oxen-AI/Oxen
Length of output: 442
🏁 Script executed:
# Check if there are any GitHub issues or discussions about tar with absolute paths
# Let's look at the imports and see if there's any normalization happening
sed -n '1,50p' crates/lib/src/core/db/merkle_node/file_backend.rs | grep -E "use|import"
Repository: Oxen-AI/Oxen
Length of output: 802
🏁 Script executed:
# Let's see the full file.path()?.into_owned() context
sed -n '355,385p' crates/lib/src/core/db/merkle_node/file_backend.rs
Repository: Oxen-AI/Oxen
Length of output: 1527
🏁 Script executed:
# Let's create a simple test to see if tar crate allows absolute paths
cat > /tmp/test_tar_absolute.rs << 'EOF'
use tar::Builder;
use std::io::Cursor;
fn main() {
let mut buf = Vec::new();
{
let mut tar = Builder::new(&mut buf);
let mut header = tar::Header::new_old();
header.set_size(0);
header.set_entry_type(tar::EntryType::Regular);
// Try setting an absolute path
let result = header.set_path("/etc/passwd");
println!("set_path with absolute: {:?}", result);
// Try raw bytes approach (like the test does)
let name_bytes = b"/absolute/path";
let old = header.as_old_mut();
old.name[..name_bytes.len()].copy_from_slice(name_bytes);
// Check what Entry::path would see
println!("Raw header name: {:?}", std::str::from_utf8(&old.name));
}
}
EOF
# Note: We can't run this without rustc, but we can reason about it
Repository: Oxen-AI/Oxen
Length of output: 38
🏁 Script executed:
# Let's look at what the tar crate documentation says - check if there are any cargo docs
rg "absolute.*path|set_path" crates/lib/Cargo.tomlRepository: Oxen-AI/Oxen
Length of output: 38
🏁 Script executed:
# Let's check the tar crate version and look for any path normalization
# by examining how paths are handled in existing code
rg "file\.path\(\)" crates/lib/src/ -B2 -A2 | head -50Repository: Oxen-AI/Oxen
Length of output: 1786
Add guards for absolute and rooted tar paths.
The current guard only checks for .. components. A tar entry with an absolute path (e.g., /etc/passwd) or Windows drive prefix will cause PathBuf::join to discard the oxen_hidden prefix entirely, writing the file outside the repository. The unpack_async_tar_archive function uses a safer component-iteration approach that skips RootDir and Prefix components; extract_tar_under should do the same. Add checks for path.is_absolute() and Component::RootDir | Component::Prefix(_) before constructing dst_path.
Suggested fix
- if path.components().any(|c| matches!(c, Component::ParentDir)) {
+ if path.is_absolute()
+ || path.components().any(|c| {
+ matches!(
+ c,
+ Component::ParentDir | Component::RootDir | Component::Prefix(_)
+ )
+ })
+ {
return Err(MerkleDbError::PathTraversal(path.display().to_string()));
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 360 - 377,
The code only checks for ParentDir components and misses absolute/Windows-rooted
tar paths which can discard the oxen_hidden prefix when joined; in
extract_tar_under (and similar to unpack_async_tar_archive) add a guard that
rejects path.is_absolute() and any component matching Component::RootDir or
Component::Prefix(_) (e.g., via path.components().any(|c| matches!(c,
Component::RootDir | Component::Prefix(_)))) and return a
MerkleDbError::PathTraversal (or existing appropriate error) before constructing
dst_path so absolute or rooted entries are refused.
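For reference, a component-iteration sanitizer in the spirit of the approach the comment attributes to unpack_async_tar_archive could look like the sketch below (illustrative only, not the project's actual helper):

use std::path::{Component, Path, PathBuf};

// Keep only Normal components. Refuse `..` outright and drop RootDir/Prefix so the
// cleaned path can never replace the destination it is later joined onto.
fn sanitize_entry_path(raw: &Path) -> Option<PathBuf> {
    let mut clean = PathBuf::new();
    for component in raw.components() {
        match component {
            Component::Normal(part) => clean.push(part),
            Component::CurDir => {}
            Component::RootDir | Component::Prefix(_) => {}
            Component::ParentDir => return None,
        }
    }
    Some(clean)
}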
b38fcd2 to c5c8270 (Compare)
d9c8c29 to 3e16ed9 (Compare)
c5c8270 to f12af68 (Compare)
3e16ed9 to 5009a3a (Compare)
c65ffd7 to 5813fbf (Compare)
NOTE: Stacked PR! Must merge #514 before merging.
5813fbf to 11eacec (Compare)
5009a3a to 484843a (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (3)
crates/lib/src/core/db/merkle_node/file_backend.rs (3)
352-357: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win: Propagate unreadable tar entries instead of continuing.
Continuing here lets a malformed archive partially install files and still return `Ok(...)`. Convert the entry error into `MerkleDbError::CannotReadMerkle` and abort the unpack.
Suggested fix
-        let Ok(mut file) = entry else {
-            log::error!("Could not unpack file in merkle tar archive");
-            // TODO: raise this error to the caller instead!?
-            continue;
-        };
+        let mut file = entry.map_err(MerkleDbError::CannotReadMerkle)?;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 352 - 357, The loop over tar archive entries in file_backend.rs currently swallows entry unpack errors and continues; change this to propagate the failure by converting the entry unpack error into MerkleDbError::CannotReadMerkle and returning Err(...) so unpack aborts instead of partially installing files. Locate the for entry in entries loop where it currently does let Ok(mut file) = entry else { log::error!(...); continue; } and replace the continue path with constructing and returning Err(MerkleDbError::CannotReadMerkle(...)) (include the original entry error message/context) so the caller receives the failure immediately.
358-376: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win: Reject absolute and rooted tar paths before joining them.
The current guard only rejects `..`. Absolute paths and Windows-rooted/prefixed paths can still cause `join` to discard `oxen_hidden`, which lets an archive write outside the repo.
Suggested fix
-        if path.components().any(|c| matches!(c, Component::ParentDir)) {
+        if path.is_absolute()
+            || path.components().any(|c| {
+                matches!(
+                    c,
+                    Component::ParentDir | Component::RootDir | Component::Prefix(_)
+                )
+            })
+        {
             return Err(MerkleDbError::PathTraversal(path.display().to_string()));
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 358 - 376, The path guard only rejects ParentDir but still allows absolute or platform-rooted paths which can make oxen_hidden.join(...) discard the prefix; update the validation around file.path() to also reject paths that are absolute or contain Component::RootDir or Component::Prefix (or use path.is_absolute()) and return MerkleDbError::PathTraversal with the offending path string; keep the existing checks for entry_type and then safely build dst_path using the now-ensured relative path (variables to edit: the `path` from file.path(), the guard that returns MerkleDbError::PathTraversal, and the dst_path construction that uses `tree_nodes_prefix` and `oxen_hidden`).
372-392: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win: Parse and validate the entry path before any write or skip decision.
`extract_hash_from_entry_path()` runs too late here. That means invalid tar structures can create directories/files before you return `InvalidTarStructure`, and `UnpackOptions::SkipExisting` drops hashes from the returned set even though the unpack contract says skipped entries should still be reported.
Suggested fix
         let dst_path = if path.starts_with(&tree_nodes_prefix) {
             oxen_hidden.join(&path)
         } else {
             oxen_hidden.join(&tree_nodes_prefix).join(&path)
         };
+        let parsed_hash = extract_hash_from_entry_path(&dst_path, oxen_hidden)?;
+
         if let Some(parent) = dst_path.parent() {
             std::fs::create_dir_all(parent)?;
         }
         if dst_path.exists() && !overwrite_existing {
             log::info!("Node already exists at {dst_path:?}, skipping");
+            if let Some(hash) = parsed_hash {
+                hashes.insert(hash);
+            }
             continue;
         }
         file.unpack(&dst_path)?;
-        if let Some(hash) = extract_hash_from_entry_path(&dst_path, oxen_hidden)? {
+        if let Some(hash) = parsed_hash {
             hashes.insert(hash);
         } else {
             log::warn!(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 372 - 392, Call extract_hash_from_entry_path(...) and validate the entry path before performing any filesystem side-effects (creating dirs or calling file.unpack) so malformed tar entries cannot create files/dirs before we return InvalidTarStructure; specifically move the extract_hash_from_entry_path(&dst_path, oxen_hidden)? call to precede std::fs::create_dir_all(parent)? and file.unpack(&dst_path)?, and when overwrite_existing is false and dst_path.exists() do not continue without recording the extracted hash into the unpack result set (UnpackOptions::SkipExisting must still report skipped-entry hashes) — update the logic around dst_path, overwrite_existing, and the returned hashes to always extract/record the hash before skipping or writing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/lib/src/core/db/merkle_node/file_backend.rs`:
- Around line 197-529: The impls fail because FileBackend lacks the &'repo
LocalRepository reference and uses a non-existent self.repo; add a repo: &'repo
LocalRepository field to the FileBackend struct (making it FileBackend<'repo>)
and update its constructor/creators to populate that field (you can keep
repo_path if still needed or remove it), then ensure methods referencing
self.repo (write_hashes_tar, write_all_tar, extract_tar_under usage in
MerklePacker::pack_nodes and MerkleUnpacker::unpack) compile against the new
field; alternatively, if you prefer not to store the repo, refactor those
callers (pack_nodes, pack_all, unpack and the helper functions) to accept &Path
and an is_vfs: bool instead of using self.repo.
- Around line 519-523: The VFS branch bypasses UnpackOptions::SkipExisting by
extracting into a TempDir then unconditionally calling util::fs::copy_dir_all,
which overwrites existing files; fix it by making the VFS path respect
overwrite_existing: either call extract_tar_under(reader, &oxen_hidden,
overwrite_existing) directly when self.repo.is_vfs(), or (if a TempDir is
required) replace the unconditional util::fs::copy_dir_all(tmp.path(),
&oxen_hidden) with a copy routine that checks overwrite_existing (the same logic
used by extract_tar_under) and skips existing files when overwrite_existing is
false; update the code around self.repo.is_vfs(), TempDir::new(),
extract_tar_under, and util::fs::copy_dir_all to apply this change.
---
Duplicate comments:
In `@crates/lib/src/core/db/merkle_node/file_backend.rs`:
- Around line 352-357: The loop over tar archive entries in file_backend.rs
currently swallows entry unpack errors and continues; change this to propagate
the failure by converting the entry unpack error into
MerkleDbError::CannotReadMerkle and returning Err(...) so unpack aborts instead
of partially installing files. Locate the for entry in entries loop where it
currently does let Ok(mut file) = entry else { log::error!(...); continue; } and
replace the continue path with constructing and returning
Err(MerkleDbError::CannotReadMerkle(...)) (include the original entry error
message/context) so the caller receives the failure immediately.
- Around line 358-376: The path guard only rejects ParentDir but still allows
absolute or platform-rooted paths which can make oxen_hidden.join(...) discard
the prefix; update the validation around file.path() to also reject paths that
are absolute or contain Component::RootDir or Component::Prefix (or use
path.is_absolute()) and return MerkleDbError::PathTraversal with the offending
path string; keep the existing checks for entry_type and then safely build
dst_path using the now-ensured relative path (variables to edit: the `path` from
file.path(), the guard that returns MerkleDbError::PathTraversal, and the
dst_path construction that uses `tree_nodes_prefix` and `oxen_hidden`).
- Around line 372-392: Call extract_hash_from_entry_path(...) and validate the
entry path before performing any filesystem side-effects (creating dirs or
calling file.unpack) so malformed tar entries cannot create files/dirs before we
return InvalidTarStructure; specifically move the
extract_hash_from_entry_path(&dst_path, oxen_hidden)? call to precede
std::fs::create_dir_all(parent)? and file.unpack(&dst_path)?, and when
overwrite_existing is false and dst_path.exists() do not continue without
recording the extracted hash into the unpack result set
(UnpackOptions::SkipExisting must still report skipped-entry hashes) — update
the logic around dst_path, overwrite_existing, and the returned hashes to always
extract/record the hash before skipping or writing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3c40c3a9-3131-457a-a28f-2563c95f5a61
📒 Files selected for processing (3)
- Cargo.toml
- crates/lib/src/core/db/merkle_node/file_backend.rs
- crates/lib/src/core/db/merkle_node/merkle_node_db.rs
✅ Files skipped from review due to trivial changes (2)
- Cargo.toml
- crates/lib/src/core/db/merkle_node/merkle_node_db.rs
11eacec to d4d0762 (Compare)
484843a to d524e90 (Compare)
🧹 Nitpick comments (1)
crates/lib/src/core/db/merkle_node/file_backend.rs (1)
1757-1790: 💤 Low value: Test only exercises VFS with `Overwrite`, not `SkipExisting`.
This test verifies the VFS staging branch works, but uses `UnpackOptions::Overwrite`. It doesn't test the `SkipExisting` policy, which is bypassed in the VFS code path (see earlier comment on lines 519-523). Consider adding a test that pre-populates the target, unpacks with `SkipExisting`, and asserts the pre-existing files weren't overwritten.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 1757 - 1790, The current test test_unpack_via_vfs_branch only exercises the VFS branch with UnpackOptions::Overwrite; add a new async test (e.g., test_unpack_via_vfs_skip_existing) that follows the same setup (pack_all into a buffer, init a temp repo, clone.set_vfs(Some(true))) but before calling FileBackend::unpack use the clone to pre-populate one or more hashes/files (create files or call FileBackend::new(&clone).unpack or otherwise write known content), then call FileBackend::new(&clone).unpack(&mut &packed[..], UnpackOptions::SkipExisting) and assert the pre-populated items were not changed (by reading content or checking timestamps) and that other items were installed; reference FileBackend::pack_all, FileBackend::unpack, UnpackOptions::SkipExisting, FileBackend::new, exists and clone.set_vfs to locate the relevant code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/lib/src/core/db/merkle_node/file_backend.rs`:
- Around line 1757-1790: The current test test_unpack_via_vfs_branch only
exercises the VFS branch with UnpackOptions::Overwrite; add a new async test
(e.g., test_unpack_via_vfs_skip_existing) that follows the same setup (pack_all
into a buffer, init a temp repo, clone.set_vfs(Some(true))) but before calling
FileBackend::unpack use the clone to pre-populate one or more hashes/files
(create files or call FileBackend::new(&clone).unpack or otherwise write known
content), then call FileBackend::new(&clone).unpack(&mut &packed[..],
UnpackOptions::SkipExisting) and assert the pre-populated items were not changed
(by reading content or checking timestamps) and that other items were installed;
reference FileBackend::pack_all, FileBackend::unpack,
UnpackOptions::SkipExisting, FileBackend::new, exists and clone.set_vfs to
locate the relevant code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e4ca3faf-c1d7-4bcd-a447-6e5af9082d90
📒 Files selected for processing (3)
- Cargo.toml
- crates/lib/src/core/db/merkle_node/file_backend.rs
- crates/lib/src/core/db/merkle_node/merkle_node_db.rs
✅ Files skipped from review due to trivial changes (2)
- Cargo.toml
- crates/lib/src/core/db/merkle_node/merkle_node_db.rs
11e30a5 to 5a896ca (Compare)
♻️ Duplicate comments (3)
crates/lib/src/core/db/merkle_node/file_backend.rs (3)
373-375: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win: Add guards for absolute and rooted tar paths.
The path-traversal guard only checks for `Component::ParentDir`. A tar entry with an absolute path (e.g., `/etc/passwd`) or a Windows drive prefix will cause `PathBuf::join` to discard the `oxen_hidden` prefix entirely, writing files outside the repository. This matches the check in `util::fs::unpack_async_tar_archive`, which also guards against `RootDir` and `Prefix` components.
Suggested fix
-        if path.components().any(|c| matches!(c, Component::ParentDir)) {
+        if path.is_absolute()
+            || path.components().any(|c| {
+                matches!(
+                    c,
+                    Component::ParentDir | Component::RootDir | Component::Prefix(_)
+                )
+            })
+        {
             return Err(MerkleDbError::PathTraversal(path.display().to_string()));
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 373 - 375, The path-traversal check in file_backend.rs only rejects Component::ParentDir but must also reject absolute/rooted tar paths; update the guard in the function that inspects `path.components()` (the block that currently returns `MerkleDbError::PathTraversal(path.display().to_string())`) to also match Component::RootDir and Component::Prefix (i.e., any(|c| matches!(c, Component::ParentDir | Component::RootDir | Component::Prefix))). Use the same MerkleDbError::PathTraversal error with the path display as before; mirror the behavior of util::fs::unpack_async_tar_archive when validating entries.
533-541: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift: VFS staging path bypasses `SkipExisting` semantics.
When `is_vfs` is true, extraction happens into a fresh temp directory where the `overwrite_existing` check inside `extract_tar_under` is always a no-op (nothing exists yet). Then `util::fs::copy_dir_all` performs an unconditional recursive copy that overwrites any colliding files in `oxen_hidden`, ignoring the `UnpackOptions::SkipExisting` parameter entirely.
The non-VFS path at line 540 correctly respects `UnpackOptions::SkipExisting` because it invokes `extract_tar_under` directly on the real destination.
Possible approaches
- Direct extraction for VFS: if the VFS can handle small file streams, extract directly like the non-VFS path.
- Copy with skip logic: replace `copy_dir_all` with a variant that checks `overwrite_existing` and skips existing destination files when false.
- Post-copy cleanup: after extracting to temp, iterate and only copy files that don't exist at the destination (if `!overwrite_existing`).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 533 - 541, The VFS branch bypasses UnpackOptions::SkipExisting by extracting into a TempDir then unconditionally calling util::fs::copy_dir_all; change this so that when self.is_vfs and overwrite_existing is false you do a conditional copy instead of copy_dir_all: after extract_tar_under(reader, tmp.path(), overwrite_existing) succeeds, if overwrite_existing is true keep util::fs::copy_dir_all(tmp.path(), &oxen_hidden) but if false iterate the entries under tmp.path() and for each entry compute the corresponding destination under oxen_hidden and only copy the entry when the destination does not already exist (preserving file/dir semantics, symlinks, and metadata); this keeps extract_tar_under, TempDir::new, is_vfs, and oxen_hidden intact while enforcing UnpackOptions::SkipExisting.
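A minimal sketch of the "copy with skip logic" option above, using only std::fs; the helper name is hypothetical and symlink/metadata handling is omitted:

use std::fs;
use std::io;
use std::path::Path;

// Recursively copy `src` into `dst`, never overwriting files that already exist,
// which is the behavior SkipExisting expects from the VFS staging branch.
fn copy_dir_skip_existing(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let from = entry.path();
        let to = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_dir_skip_existing(&from, &to)?;
        } else if !to.exists() {
            fs::copy(&from, &to)?;
        }
    }
    Ok(())
}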
365-370: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win: Propagate tar entry errors instead of continuing.
The `continue` on entry errors allows a malformed archive to partially install files and still return `Ok(...)`. The TODO comment indicates this was recognized as problematic. The suggested fix in the past review was to use `entry.map_err(MerkleDbError::CannotReadMerkle)?` to fail immediately.
Suggested fix
-        let Ok(mut file) = entry else {
-            log::error!("Could not unpack file in merkle tar archive");
-            // TODO: raise this error to the caller instead!?
-            continue;
-        };
+        let mut file = entry.map_err(MerkleDbError::CannotReadMerkle)?;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 365 - 370, Replace the current non-failing unpack logic that logs and continues when a tar entry is Err (the `let Ok(mut file) = entry else { log::error!(...); continue; }` pattern) with immediate error propagation by mapping the tar entry error into MerkleDbError::CannotReadMerkle and using the `?` operator (e.g., `let mut file = entry.map_err(MerkleDbError::CannotReadMerkle)?;`) so the function fails fast instead of partially installing files.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@crates/lib/src/core/db/merkle_node/file_backend.rs`:
- Around line 373-375: The path-traversal check in file_backend.rs only rejects
Component::ParentDir but must also reject absolute/rooted tar paths; update the
guard in the function that inspects `path.components()` (the block that
currently returns `MerkleDbError::PathTraversal(path.display().to_string())`) to
also match Component::RootDir and Component::Prefix (i.e., any(|c| matches!(c,
Component::ParentDir | Component::RootDir | Component::Prefix))). Use the same
MerkleDbError::PathTraversal error with the path display as before; mirror the
behavior of util::fs::unpack_async_tar_archive when validating entries.
- Around line 533-541: The VFS branch bypasses UnpackOptions::SkipExisting by
extracting into a TempDir then unconditionally calling util::fs::copy_dir_all;
change this so that when self.is_vfs and overwrite_existing is false you do a
conditional copy instead of copy_dir_all: after extract_tar_under(reader,
tmp.path(), overwrite_existing) succeeds, if overwrite_existing is true keep
util::fs::copy_dir_all(tmp.path(), &oxen_hidden) but if false iterate the
entries under tmp.path() and for each entry compute the corresponding
destination under oxen_hidden and only copy the entry when the destination does
not already exist (preserving file/dir semantics, symlinks, and metadata); this
keeps extract_tar_under, TempDir::new, is_vfs, and oxen_hidden intact while
enforcing UnpackOptions::SkipExisting.
- Around line 365-370: Replace the current non-failing unpack logic that logs
and continues when a tar entry is Err (the `let Ok(mut file) = entry else {
log::error!(...); continue; }` pattern) with immediate error propagation by
mapping the tar entry error into MerkleDbError::CannotReadMerkle and using the
`?` operator (e.g., `let mut file =
entry.map_err(MerkleDbError::CannotReadMerkle)?;`) so the function fails fast
instead of partially installing files.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3e00cbca-b911-4883-be41-02e5c929e3dd
📒 Files selected for processing (3)
- Cargo.toml
- crates/lib/src/core/db/merkle_node/file_backend.rs
- crates/lib/src/core/db/merkle_node/merkle_node_db.rs
✅ Files skipped from review due to trivial changes (2)
- Cargo.toml
- crates/lib/src/core/db/merkle_node/merkle_node_db.rs
98cb275 to 120f270 (Compare)
d4d0762 to 234e31d (Compare)
120f270 to c28bcd5 (Compare)
234e31d to 029e820 (Compare)
c28bcd5 to 35659ea (Compare)
2bd348f to cbf9b0a (Compare)
35659ea to a82b35d (Compare)
cbf9b0a to df8d6fb (Compare)
a82b35d to b12893f (Compare)
df8d6fb to f7950a6 (Compare)
**`MerklePacker` & `MerkleUnpacker`**
Implemented the new `MerkleTransport` traits for packing up
and unpacking Merkle tree nodes as they're sent on the wire
from the oxen client to server. Preserves the existing tar-gz
formats specific to the FileBackend's physical store layout.
Supports the two unique tar-gz paths:
- upload: only captures `{prefix}/{suffix}/{node,children}`
- everything else: `tree/nodes/{prefix}/{suffix}/{node,children}`
The `PackOptions` enum controls this specific behavior.
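As a sketch of how those two layouts could map onto the options enum (the variant names and the hash-splitting rule below are assumptions, not taken from this PR):

// Hypothetical shape of the enum and the entry-path derivation it controls.
enum PackOptionsSketch {
    // Client upload layout: entries start at `{prefix}/{suffix}/...`.
    Upload,
    // Everything else: entries start at `tree/nodes/{prefix}/{suffix}/...`.
    TreeNodes,
}

fn entry_dir(node_id_hex: &str, opts: &PackOptionsSketch) -> String {
    // Assume the node id splits into a short directory prefix plus the remainder.
    let (prefix, suffix) = node_id_hex.split_at(2);
    match opts {
        PackOptionsSketch::Upload => format!("{prefix}/{suffix}"),
        PackOptionsSketch::TreeNodes => format!("tree/nodes/{prefix}/{suffix}"),
    }
}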
**Progress Bar in Bytes**
Changes the progress bar for uploads & downloads to use bytes/s
instead of nodes/s. Nodes are not evenly sized, so progress will
look like it stops when handling a large file. This now hooks
into the reader & writer types that the trait uses.
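A rough idea of that hook, as a reader wrapper that reports byte counts to whatever progress sink the caller supplies (a sketch; the PR's actual reader/writer types are not shown here):

use std::io::{Read, Result};

// Wraps any reader and reports each chunk's byte count to a callback, so a
// progress bar can advance in bytes/s instead of nodes/s.
struct CountingReader<R, F: FnMut(u64)> {
    inner: R,
    on_bytes: F,
}

impl<R: Read, F: FnMut(u64)> Read for CountingReader<R, F> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        let n = self.inner.read(buf)?;
        (self.on_bytes)(n as u64);
        Ok(n)
    }
}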
**Errors**
`MerkleDbError` gains transport-side variants for tar-archive parsing
(`CannotReadMerkle`, `UnsupportedTarEntry`, `PathTraversal`,
`InvalidTarStructure`, `InvalidNodeIdHex`, `MissingNodeDir`,
`MissingTreeNodesDir`, `FsTransport`) plus a corresponding `tokio-util`
features bump (`io`, `io-util`) for async I/O bridging.
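A hedged sketch of what those variants could look like with thiserror; only the variant names come from this PR, the payloads and messages are guesses:

use thiserror::Error;

#[derive(Debug, Error)]
enum MerkleDbErrorSketch {
    #[error("cannot read merkle tar entry: {0}")]
    CannotReadMerkle(std::io::Error),
    #[error("unsupported tar entry type at {path}")]
    UnsupportedTarEntry { path: String },
    #[error("tar entry escapes the repository: {0}")]
    PathTraversal(String),
    #[error("tar archive does not match the expected node layout: {0}")]
    InvalidTarStructure(String),
    #[error("node id is not valid hex: {0}")]
    InvalidNodeIdHex(String),
    #[error("missing node directory: {0}")]
    MissingNodeDir(String),
    #[error("missing tree/nodes directory under {0}")]
    MissingTreeNodesDir(String),
    #[error("filesystem transport error: {0}")]
    FsTransport(String),
}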
**Testing**
Tests cover round-trips, path-traversal rejection, malformed entries,
empty tarballs, and on-disk byte parity with the existing `compress_*` /
`unpack_async_tar_archive` paths.
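In the same spirit, a condensed round-trip test might read like the sketch below; the setup helper is a placeholder, and the real assertions live in file_backend.rs:

#[tokio::test]
async fn test_pack_unpack_round_trip_sketch() -> Result<(), MerkleDbError> {
    // Hypothetical helper: builds two temp repos, the first with some committed nodes.
    let (src_repo, dst_repo) = make_two_temp_repos().await?;

    let mut packed = Vec::new();
    FileBackend::new(&src_repo).pack_all(&mut packed)?;

    let hashes = FileBackend::new(&dst_repo).unpack(&mut &packed[..], UnpackOptions::Overwrite)?;
    assert!(!hashes.is_empty(), "round trip should report the unpacked node ids");
    Ok(())
}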
b12893f to ac7ffb3 (Compare)