MerkleDbNode error definition + HexHash, move node_db_prefix#470
MerkleDbNode error definition + HexHash, move node_db_prefix#470malcolmgreaves merged 5 commits intomainfrom
HexHash, move node_db_prefix#470Conversation
Removed many specific uses of `OxenError::basic_str` in the `MerkleNodeDB` implementation. Replaced with error-message equivalent variants in the new `MerkleDbError` enum. Added a `#[from]` wrapper for this in `OxenError`.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughIntroduce a typed Changes
Sequence Diagram(s)(Skipped — changes are API/type and helper refactors without a new multi-component sequential flow.) Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/lib/src/core/db/merkle_node/merkle_node_db.rs (1)
485-522: Consider removing or updating commented-out code.The commented-out
getmethod referencesMerkleError(lines 491, 495) which doesn't exist - likely a stale reference from before theMerkleDbErrorrename. If this code is no longer needed, consider removing it to reduce confusion. If it's intended for future use, update the error type references.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs` around lines 485 - 522, The commented-out pub fn get<D>(&self, hash: u128) block contains stale error references (MerkleError) and should either be removed or updated: if you intend to keep it, change MerkleError::ReadBeforeOpen and MerkleError::WriteBeforeOpen to the current error type (e.g., MerkleDbError or the appropriate enum), ensure the Err(...) return types match Result<D, OxenError> semantics, and verify lookup and children_file handling aligns with current field names; otherwise delete the entire commented block to avoid confusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs`:
- Around line 485-522: The commented-out pub fn get<D>(&self, hash: u128) block
contains stale error references (MerkleError) and should either be removed or
updated: if you intend to keep it, change MerkleError::ReadBeforeOpen and
MerkleError::WriteBeforeOpen to the current error type (e.g., MerkleDbError or
the appropriate enum), ensure the Err(...) return types match Result<D,
OxenError> semantics, and verify lookup and children_file handling aligns with
current field names; otherwise delete the entire commented block to avoid
confusion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 024ff882-a1b7-49ba-a8e6-4a23180a6b7a
📒 Files selected for processing (4)
crates/lib/src/core/db/merkle_node/merkle_node_db.rscrates/lib/src/error.rscrates/lib/src/model/merkle_tree/merkle_hash.rscrates/lib/src/repositories/commits/commit_writer.rs
CleanCut
left a comment
There was a problem hiding this comment.
There's a few obvious typos that a quick cargo check will let you fix. The pre-commit hooks must not have run.
| let as_hex = HexHash::new(hash); | ||
| let hash_str = as_hex.as_str(); |
There was a problem hiding this comment.
A couple things you could do for ergonomics if you wanted to:
- Sometimes it is difficult to remember what separate type/function to use to convert something, but easy to look for conversion methods on the type you already have. We could add a method like this to
MerkleHash:
pub fn to_hex_hash(&self) -> HexHash {
HexHash(format!("{self}"))
}- If you make a reference to
HexHashact like a&str, then you don't need a method that does an explicit conversion anymore:
impl Deref for HexHash {
type Target = str;
fn deref(&self) -> &Self::Target {
&self.0
}
}If you do both above, then this code becomes:
| let as_hex = HexHash::new(hash); | |
| let hash_str = as_hex.as_str(); | |
| let hash_str = hash.to_hex_hash(); |
and then you could remove all of this:
impl HexHash {
#[inline(always)]
pub fn new(hash: &MerkleHash) -> Self {
Self(format!("{hash}"))
}
pub(crate) fn as_str(&self) -> &str {
&self.0
}
}There was a problem hiding this comment.
I like to_hex_hash for MekleHash -- I'll add that one.
For &str, I want to be careful with this. In general, the intention is to not treat a HexHash as a string. I want there to be a little bit of friction so that a string can't be confused for a hexidecimal formatted MerkleHash value. With that context, I think having a Deref implementation for HexHash will make it too easy to break the type safety of the HexHash wrapper.
I added the as_str to make this prefix-suffix split function avoid two String allocations. 🤔 I could instead move this logic into HexHash. That way I can avoid having as_str all together and have the function return the 2-layered nested directories directly. I think this makes sense since this directory structure is a primary use case of hex-formatting the hash value.
HexHashHexHash, move node_db_prefix
c8a8da1 to
7595dc1
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/lib/src/core/db/merkle_node/merkle_node_db.rs (2)
439-468:⚠️ Potential issue | 🟡 MinorKeep
num_children()in sync while appending children.
MerkleNodeDB::num_children()falls back toself.num_childrenin write mode, butadd_child()only advancesdata_offset. That leavesnum_children()stuck at0until the DB is reopened and reloaded.💡 Proposed fix
children_file.write_all(&buf)?; self.data_offset += data_len; + self.num_children += 1; Ok(())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs` around lines 439 - 468, The add_child method currently updates only self.data_offset leaving MerkleNodeDB::num_children() returning stale 0 in write mode; after the writes succeed in add_child (in function add_child<N: TMerkleTreeNode>), increment self.num_children += 1 (do this after all write_all calls complete and before returning) so the in-memory counter stays in sync with appended children; reference the add_child function and the self.num_children and self.data_offset fields.
391-430:⚠️ Potential issue | 🟠 MajorMirror the serialized node bytes into writer state.
MerkleNodeDB::data()andMerkleNodeDB::node()read fromself.datawheneverlookupisNone, butwrite_node()never updates that buffer. Afteropen_read_write(),data()stays empty andnode()will try to decode an empty payload until the DB is reopened from disk.💡 Proposed fix
// Write data node_file.write_all(&buf)?; + self.data = buf; self.dtype = node.node_type(); self.node_id = node.hash(); self.parent_id = parent_id;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs` around lines 391 - 430, write_node currently writes the serialized node to disk but never updates the in-memory mirror, so subsequent calls to MerkleNodeDB::data()/::node() see an empty buffer; fix by assigning the serialized bytes into the DB state after writing: set self.data = buf (or a copy of buf), set self.data_len = data_len as u32 (or appropriate type), and ensure dtype/node_id/parent_id remain updated; update this in the write_node method of MerkleNodeDB so the in-memory fields (self.data, self.data_len, self.dtype, self.node_id, self.parent_id) reflect the just-written node.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs`:
- Around line 73-75: HexHash::node_db_prefix() currently slices hash_str[0..3]
unsafely which panics for short hex strings (e.g., MerkleHash::new(0)); update
HexHash::node_db_prefix (and/or the MerkleHash Display formatting) to ensure the
hex string is padded to a fixed minimum width before slicing (for example use
zero-padding like {:032x} for the underlying integer) or check length and return
a shorter-safe prefix when the string is smaller; then rebuild call sites such
as node_db_path (function node_db_path), tree (tree module call sites),
node_sync_status, commit_sync_status, and api::client::tree to rely on the
corrected node_db_prefix so no out-of-bounds slicing occurs.
---
Outside diff comments:
In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs`:
- Around line 439-468: The add_child method currently updates only
self.data_offset leaving MerkleNodeDB::num_children() returning stale 0 in write
mode; after the writes succeed in add_child (in function add_child<N:
TMerkleTreeNode>), increment self.num_children += 1 (do this after all write_all
calls complete and before returning) so the in-memory counter stays in sync with
appended children; reference the add_child function and the self.num_children
and self.data_offset fields.
- Around line 391-430: write_node currently writes the serialized node to disk
but never updates the in-memory mirror, so subsequent calls to
MerkleNodeDB::data()/::node() see an empty buffer; fix by assigning the
serialized bytes into the DB state after writing: set self.data = buf (or a copy
of buf), set self.data_len = data_len as u32 (or appropriate type), and ensure
dtype/node_id/parent_id remain updated; update this in the write_node method of
MerkleNodeDB so the in-memory fields (self.data, self.data_len, self.dtype,
self.node_id, self.parent_id) reflect the just-written node.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a69793b7-d89e-47ea-b588-75fef40199e6
📒 Files selected for processing (6)
crates/lib/src/api/client/tree.rscrates/lib/src/core/commit_sync_status.rscrates/lib/src/core/db/merkle_node/merkle_node_db.rscrates/lib/src/core/node_sync_status.rscrates/lib/src/model/merkle_tree/merkle_hash.rscrates/lib/src/repositories/tree.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/lib/src/model/merkle_tree/merkle_hash.rs
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
crates/lib/src/model/merkle_tree/merkle_hash.rs (1)
89-94: Consider documenting the panic invariant.The
expect()here is acceptable sinceHexHashcan only be constructed from a validMerkleHash, making the invariant always hold. A brief doc comment would help future maintainers understand why this is safe.📝 Suggested documentation
+/// Converts a `HexHash` back to the original `MerkleHash`. +/// +/// # Panics +/// This conversion cannot fail in practice because `HexHash` can only be +/// constructed from a valid `MerkleHash` via `to_hex_hash()` or `From` impls. impl From<HexHash> for MerkleHash { fn from(value: HexHash) -> Self { Self::from_str(&value.0) .expect("Invariant violation: HexHash was not constructed from a valid MerkleHash!") } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/model/merkle_tree/merkle_hash.rs` around lines 89 - 94, Add a brief doc comment above the impl From<HexHash> for MerkleHash (or above the from function) explaining the panic invariant: note that HexHash is only constructible from a valid MerkleHash elsewhere in the codebase, so the call to Self::from_str(...).expect(...) cannot fail at runtime; mention that the expect is intentional to document the invariant and reference the constructor that enforces validity. Ensure the comment references the impl From<HexHash> for MerkleHash and the expect usage so future maintainers understand the safety assumption.crates/lib/src/core/db/merkle_node/merkle_node_db.rs (1)
472-509: Stale commented-out code references non-existent type.Lines 478 and 482 reference
MerkleError::ReadBeforeOpenandMerkleError::WriteBeforeOpen, but the enum is namedMerkleDbError. If this code is intended for future use, update the type name; otherwise consider removing this dead code.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs` around lines 472 - 509, The commented-out get method references a non-existent MerkleError enum; update those references to the actual MerkleDbError (or remove the stale block if unused) so the commented code matches current types. Specifically, in the commented pub fn get<D>(&self, hash: u128) -> Result<D, OxenError> block update MerkleError::ReadBeforeOpen and MerkleError::WriteBeforeOpen to MerkleDbError::ReadBeforeOpen and MerkleDbError::WriteBeforeOpen (or delete the entire commented get method if you don't intend to keep it), and ensure symbols like lookup, children_file, and MerkleNodeDB.get remain consistent with the active codebase.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/model/merkle_tree/merkle_hash.rs`:
- Around line 111-112: The HexHash struct is missing the Clone derive which
causes tests calling hex.clone() to fail; update the struct declaration for
HexHash (the type named HexHash) to derive Clone in addition to Debug,
PartialEq, and Eq so instances can be cloned where used in tests and code that
calls clone().
- Line 159: The test loop in merkle_hash.rs uses the array `[0..1000]`, which
produces a single-element array containing a Range and runs the body once;
replace the iterator expression with the range `0..1000` so the for loop
iterates 1000 times (i.e., change the loop header `for _ in [0..1000] {` to use
the range form `for _ in 0..1000 {` in the test/function containing that loop).
---
Nitpick comments:
In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs`:
- Around line 472-509: The commented-out get method references a non-existent
MerkleError enum; update those references to the actual MerkleDbError (or remove
the stale block if unused) so the commented code matches current types.
Specifically, in the commented pub fn get<D>(&self, hash: u128) -> Result<D,
OxenError> block update MerkleError::ReadBeforeOpen and
MerkleError::WriteBeforeOpen to MerkleDbError::ReadBeforeOpen and
MerkleDbError::WriteBeforeOpen (or delete the entire commented get method if you
don't intend to keep it), and ensure symbols like lookup, children_file, and
MerkleNodeDB.get remain consistent with the active codebase.
In `@crates/lib/src/model/merkle_tree/merkle_hash.rs`:
- Around line 89-94: Add a brief doc comment above the impl From<HexHash> for
MerkleHash (or above the from function) explaining the panic invariant: note
that HexHash is only constructible from a valid MerkleHash elsewhere in the
codebase, so the call to Self::from_str(...).expect(...) cannot fail at runtime;
mention that the expect is intentional to document the invariant and reference
the constructor that enforces validity. Ensure the comment references the impl
From<HexHash> for MerkleHash and the expect usage so future maintainers
understand the safety assumption.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 80b81088-9f09-4b3a-8937-b8a460320869
📒 Files selected for processing (6)
crates/lib/src/api/client/tree.rscrates/lib/src/core/commit_sync_status.rscrates/lib/src/core/db/merkle_node/merkle_node_db.rscrates/lib/src/core/node_sync_status.rscrates/lib/src/model/merkle_tree/merkle_hash.rscrates/lib/src/repositories/tree.rs
✅ Files skipped from review due to trivial changes (2)
- crates/lib/src/core/commit_sync_status.rs
- crates/lib/src/repositories/tree.rs
|
|
||
| #[test] | ||
| fn test_hex_hash_conversions_and_node_db_prefix() { | ||
| for _ in [0..1000] { |
There was a problem hiding this comment.
Test loop iterates only once instead of 1000 times.
The expression [0..1000] creates a single-element array containing a Range<i32>, so the loop body executes exactly once. Use 0..1000 directly to iterate 1000 times.
🐛 Proposed fix
- for _ in [0..1000] {
+ for _ in 0..1000 {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for _ in [0..1000] { | |
| for _ in 0..1000 { |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/lib/src/model/merkle_tree/merkle_hash.rs` at line 159, The test loop
in merkle_hash.rs uses the array `[0..1000]`, which produces a single-element
array containing a Range and runs the body once; replace the iterator expression
with the range `0..1000` so the for loop iterates 1000 times (i.e., change the
loop header `for _ in [0..1000] {` to use the range form `for _ in 0..1000 {` in
the test/function containing that loop).
7595dc1 to
dada1b4
Compare
dada1b4 to
3029b09
Compare
Refactors the unique errors for
MerkleDbNode's operations into a new error enum.The new
MerkleDbErrorvariant replacements maintain the same error messages.OxenErrorhas a new#[from]wrapper for this error type.Introduces a new
HexHashtype, which is internal to theliboxencrate. This is a zero-sized wrapper around the
Stringproduced by encoding aMerkleHash's 128-bitunsigned integer into a hexadecimal formatted string. Construction is only possible via
this hex-formatting route: it's impossible to mix-up a regular
Stringand aHexHash.The
node_db_prefixfunction has been moved to become a method ofHexHashnow.This also avoids extra
Stringallocations as it creates the 2-layered nested directoryof
{hash prefix}/{hash suffix}, which is used for both Merkle node & versioned file storage.Also, fixes a bare
.unwrap()inr_create_dir_node. The function now propagates theErrto the caller instead of causing apanic!.