12 changes: 12 additions & 0 deletions crates/lib/src/model/merkle_tree.rs
@@ -1,11 +1,15 @@
pub mod merkle_hash;
pub mod merkle_reader;
pub mod merkle_transport;
pub mod merkle_writer;
pub mod node;
pub mod node_type;

pub use crate::model::merkle_tree::merkle_hash::MerkleHash;
pub use crate::model::merkle_tree::merkle_reader::MerkleReader;
pub use crate::model::merkle_tree::merkle_transport::{
    MerklePacker, MerkleTransport, MerkleUnpacker, PackOptions, UnpackOptions,
};
pub use crate::model::merkle_tree::merkle_writer::MerkleWriter;
pub use crate::model::merkle_tree::node::merkle_tree_node_cache;
pub use crate::model::merkle_tree::node_type::{
@@ -25,3 +29,11 @@ pub trait MerkleStore: MerkleReader + MerkleWriter {}
/// automatically a [`MerkleStore`]. The `?Sized` bound lets the marker apply
/// to `dyn MerkleStore` itself.
impl<T: MerkleReader + MerkleWriter + ?Sized> MerkleStore for T {}

/// A [`MerkleStore`] that can also pack and unpack Merkle tree nodes for transport.
pub trait TransportableMerkleStore: MerkleStore + MerkleTransport {}

/// Any type that is a [`MerkleStore`] and a [`MerkleTransport`] is automatically a
/// [`TransportableMerkleStore`]. The `?Sized` bound lets the marker apply to
/// `dyn TransportableMerkleStore` itself.
impl<T: MerkleStore + MerkleTransport + ?Sized> TransportableMerkleStore for T {}
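The marker-trait-plus-blanket-impl pattern used in both hunks can be sketched in isolation. The `Reader`/`Writer`/`Store` names below are hypothetical stand-ins for `MerkleReader`/`MerkleWriter`/`MerkleStore`, not the crate's real traits:

```rust
// Two capability traits, standing in for MerkleReader / MerkleWriter.
trait Reader {
    fn read_val(&self) -> u32;
}
trait Writer {
    fn write_val(&mut self, v: u32);
}

// Marker super-trait with no methods of its own.
trait Store: Reader + Writer {}

// Blanket impl: anything with both capabilities is automatically a Store.
// The `?Sized` bound lets the impl also cover unsized trait-object views.
impl<T: Reader + Writer + ?Sized> Store for T {}

struct Mem {
    v: u32,
}
impl Reader for Mem {
    fn read_val(&self) -> u32 {
        self.v
    }
}
impl Writer for Mem {
    fn write_val(&mut self, v: u32) {
        self.v = v;
    }
}

// Callers can work against the marker trait object directly.
fn use_store(s: &mut dyn Store) -> u32 {
    s.write_val(7);
    s.read_val()
}
```

Downstream code only names `Store` in its bounds; any type gaining both capabilities picks up the marker for free, which is exactly what the two blanket impls in this hunk buy `MerkleStore` and `TransportableMerkleStore`.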
111 changes: 111 additions & 0 deletions crates/lib/src/model/merkle_tree/merkle_transport.rs
@@ -0,0 +1,111 @@
use std::collections::HashSet;
use std::io::{Read, Write};

use crate::error::OxenError;
use crate::model::MerkleHash;

/// Wire-format selector for [`MerklePacker::pack_nodes`].
///
/// Two on-the-wire tar-gz layouts have coexisted as long as the merkle transport has
/// existed. Each call site must pick the variant that matches the peer it's writing
/// to; the trait makes no claim that a single canonical format exists.
// **No `Default` impl on purpose.** Picking a wire format is a protocol decision and
// must be made explicitly at every call site. **No `#[non_exhaustive]` on purpose.**
// Adding a future variant should be a deliberate breaking change that surfaces at
// every match arm — compile errors are the forcing function.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum PackOptions {
    /// Entries appear under `tree/nodes/{prefix}/{suffix}/...`. Compressed with
    /// [`flate2::Compression::fast`]. Used by all `repositories::tree::compress_*`
    /// helpers — the bytes any server download endpoint emits.
    ///
    /// [`flate2::Compression::fast`]: https://docs.rs/flate2/latest/flate2/struct.Compression.html#method.fast
    ServerCanonical,
    /// Entries appear under `{prefix}/{suffix}/...` with no `tree/nodes/` prefix.
    /// Compressed with [`flate2::Compression::default`]. Required by
    /// [`api::client::tree::create_nodes`] so older `oxen-server` deployments
    /// (which prepend `tree/nodes/` server-side at install time) install entries at
    /// the right paths.
    ///
    /// [`flate2::Compression::default`]: https://docs.rs/flate2/latest/flate2/struct.Compression.html#method.default
    /// [`api::client::tree::create_nodes`]: crate::api::client::tree::create_nodes
    LegacyClientPush,
}

/// Per-call extraction policy for [`MerkleUnpacker::unpack`].
///
/// **No `Default` impl on purpose.** The choice between overwriting and skipping is
/// path-dependent and must be made explicitly at every call site.
// **No `#[non_exhaustive]` on purpose** for the same reason as [`PackOptions`].
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum UnpackOptions {
    /// Overwrite files that already exist on disk. Matches `main`'s
    /// `util::fs::unpack_async_tar_archive` — the client download path's behaviour.
    Overwrite,
    /// Skip files that already exist on disk. Matches `main`'s
    /// `repositories::tree::unpack_nodes` — the server-side upload-consumer path.
    SkipExisting,
}

/// Produce transport-ready bytes from some subset (or all) of the backend's Merkle tree nodes.
///
/// Writes a tar-gz wire stream directly into the caller-provided sink. No buffer is
/// materialized inside the trait, so memory use is O(compressor window). Callers can
/// plug in HTTP response bodies, pipes, files, or in-memory `Vec<u8>` sinks as the writer.
///
/// dyn-compatible: callers can store this as `Box<dyn MerklePacker + '_>` or
/// `&dyn MerklePacker`. Methods take `&mut dyn Write` instead of generic `W: Write`
/// so the trait carries no per-call type parameters.
pub trait MerklePacker: Send + Sync {
    /// Pack the given node `hashes` into `out` as a tar-gz stream, in the layout
    /// selected by `opts`.
    ///
    /// Hashes not present in the store are silently skipped, and an empty `hashes`
    /// produces a valid but empty tarball. See [`PackOptions`] for the per-variant
    /// wire-format details.
    fn pack_nodes(
        &self,
        hashes: &HashSet<MerkleHash>,
        opts: PackOptions,
        out: &mut dyn Write,
    ) -> Result<(), OxenError>;
Comment on lines +60 to +71
Contributor

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Specify deterministic packing order for stable wire output.

Because input is a HashSet, iteration order is non-deterministic unless implementations sort before emitting entries. Please codify deterministic ordering in the trait docs to avoid unstable tar-gz bytes across runs/processes.

Suggested doc-level clarification
     /// Pack the given node `hashes` into `out` as a tar-gz stream, in the layout
     /// selected by `opts`.
+    ///
+    /// Ordering contract: output must be deterministic for the same logical input
+    /// set (e.g., sort hashes before emitting) rather than relying on `HashSet`
+    /// iteration order.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/model/merkle_tree/merkle_transport.rs` around lines 60 - 71,
The trait doc for pack_nodes must require a deterministic, stable ordering when
emitting entries because hashes is a HashSet; update the doc for fn pack_nodes
(and mention PackOptions) to state implementations must sort the provided
HashSet<MerkleHash> into a defined order (e.g., lexicographic on the MerkleHash
bytes or hex) before writing to out: &mut dyn Write so the tar-gz wire output is
stable across runs; also update any existing implementations to perform that
sort when iterating and packing, and mention OxenError remains the error type
for failures.


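The ordering fix this comment asks for reduces to one sort before iteration. A minimal sketch, using a `String` stand-in since `MerkleHash`'s internals aren't shown in this diff (the review suggests lexicographic order on the hash's hex form):

```rust
use std::collections::HashSet;

// Stand-in for MerkleHash: assume it renders to a stable hex string.
type Hash = String;

// Collect the set into a Vec and sort lexicographically, so tar entries are
// emitted in the same order on every run regardless of HashSet iteration order.
fn ordered(hashes: &HashSet<Hash>) -> Vec<Hash> {
    let mut v: Vec<Hash> = hashes.iter().cloned().collect();
    v.sort();
    v
}
```

An implementation of `pack_nodes` would iterate `ordered(hashes)` instead of the raw `HashSet`, making the tar-gz bytes byte-for-byte reproducible for the same logical input set.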
    /// Pack every node the backend currently holds into `out` as a tar-gz stream.
    ///
    /// Single-format: only the server-canonical layout has ever been emitted for a
    /// whole-tree pack on `main`. There is no legacy whole-tree variant, so this
    /// method does not accept [`PackOptions`].
    fn pack_all(&self, out: &mut dyn Write) -> Result<(), OxenError>;
}

/// Consume transport bytes and install the nodes they contain into the backend.
///
/// Reads the tar-gz wire format incrementally from `reader`. Nothing buffers the full
/// payload inside the trait. Async callers bridge a `Stream<Item = Bytes>` to a sync
/// [`Read`] via [`tokio_util::io::SyncIoBridge`] inside a [`tokio::task::spawn_blocking`].
///
/// dyn-compatible: callers can store this as `Box<dyn MerkleUnpacker + '_>` or
/// `&dyn MerkleUnpacker`. The reader is taken as `&mut dyn Read` for the same
/// reason as [`MerklePacker`]'s `&mut dyn Write` argument.
pub trait MerkleUnpacker: Send + Sync {
    /// Unpack the tar-gz stream from `reader` into the store, applying the existing-file
    /// policy in `opts`.
    ///
    /// Returns the set of hashes parsed from the tarball (not necessarily only those
    /// newly installed — entries skipped per [`UnpackOptions::SkipExisting`] still appear
    /// in the result, matching `main`'s `repositories::tree::unpack_nodes` behaviour).
    fn unpack(
        &self,
        reader: &mut dyn Read,
        opts: UnpackOptions,
    ) -> Result<HashSet<MerkleHash>, OxenError>;
Comment on lines +91 to +101
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Define archive-safety requirements in the unpack contract.

unpack installs filesystem-backed content but the trait contract doesn’t require rejecting unsafe tar entries (absolute paths, .. traversal, symlink escapes). Please make this explicit so all implementations enforce the same security baseline.

Suggested doc-level contract addition
 pub trait MerkleUnpacker: Send + Sync {
     /// Unpack the tar-gz stream from `reader` into the store, applying the existing-file
     /// policy in `opts`.
+    ///
+    /// Security contract:
+    /// - Reject entries that resolve outside the intended node root (absolute paths,
+    ///   `..` traversal, or symlink-based escapes).
+    /// - Reject malformed archive entries whose on-wire path cannot be mapped to a
+    ///   valid `MerkleHash`.
     ///
     /// Returns the set of hashes parsed from the tarball (not necessarily only those
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/lib/src/model/merkle_tree/merkle_transport.rs` around lines 91 - 101,
The unpack contract (fn unpack(&self, reader: &mut dyn Read, opts:
UnpackOptions) -> Result<HashSet<MerkleHash>, OxenError>) must explicitly
require archive-safety: update the doc comment to state that implementations
MUST reject or fail on unsafe tar entries (absolute paths, any path containing
.. that would escape the intended extraction root, and symlinks that point
outside the target store) and MUST sanitize relative paths and enforce
extraction into the store root; then update all implementations to validate each
tar header/path, canonicalize and check it remains within the target store root,
and return an Err(OxenError) when encountering an unsafe entry rather than
writing outside the store (reference UnpackOptions, MerkleHash and OxenError in
the doc so callers and implementers know the expected behavior).

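The name-level part of the safety check this comment asks for can be sketched with `std::path::Component` alone. This stdlib-only check rejects absolute paths and `..` traversal by inspection of the entry name; it does not cover symlink-based escapes, which need a separate check against the resolved target:

```rust
use std::path::{Component, Path};

// Returns true only for purely relative paths containing no `..`, no root,
// and at least one normal component — i.e. entries that cannot escape the
// extraction root by name alone.
fn is_safe_entry(path: &Path) -> bool {
    path.components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
        && path
            .components()
            .any(|c| matches!(c, Component::Normal(_)))
}
```

An `unpack` implementation would run each tar header's path through a check like this and return `Err(OxenError)` on failure, rather than writing outside the store root.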
}

/// Marker super-trait: a type that can both pack and unpack Merkle tree nodes for transport.
pub trait MerkleTransport: MerklePacker + MerkleUnpacker {}

/// This blanket impl makes any type that implements [`MerklePacker`] and
/// [`MerkleUnpacker`] automatically a [`MerkleTransport`]. The `?Sized` bound lets
/// the marker apply to `dyn MerkleTransport` itself, so the impl works for both
/// concrete backends and trait-object views over them.
impl<T: MerklePacker + MerkleUnpacker + ?Sized> MerkleTransport for T {}
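The dyn-compatibility claim in the `MerklePacker` docs — `&mut dyn Write` instead of a generic `W: Write` parameter — can be exercised with a toy stand-in (the `Packer`/`ToyPacker` names below are illustrative, not the crate's types):

```rust
use std::io::Write;

// Minimal stand-in for MerklePacker, keeping only the dyn-compatible shape:
// the method takes `&mut dyn Write`, so the trait carries no type parameters.
trait Packer {
    fn pack_all(&self, out: &mut dyn Write) -> std::io::Result<()>;
}

struct ToyPacker;
impl Packer for ToyPacker {
    fn pack_all(&self, out: &mut dyn Write) -> std::io::Result<()> {
        out.write_all(b"bytes")
    }
}

// Callers can hold the packer as a trait object and plug in any sink:
// here an in-memory Vec<u8>, but an HTTP body or file works the same way.
fn pack_into_vec(p: &dyn Packer) -> Vec<u8> {
    let mut buf = Vec::new();
    p.pack_all(&mut buf).expect("in-memory write cannot fail");
    buf
}
```

Had `pack_all` been generic over `W: Write`, `&dyn Packer` would not compile; the `&mut dyn Write` argument is what keeps the trait object-safe.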