FileBackend for Merkle{Reader,Writer}#512
FileBackend for Merkle{Reader,Writer}#512malcolmgreaves wants to merge 1 commit intomg/merkle_dyn_interfacesfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new file-backed Merkle node backend with reader/writer sessions and drop-enforced persistence; refactors MerkleNodeDB write APIs to accept trait objects (&dyn TMerkleTreeNode); and adds EMerkleTreeNode::from_type_and_bytes for type-directed deserialization. Changes
Sequence DiagramssequenceDiagram
participant Client
participant FileBackend
participant FileWriteSession
participant FileNodeSession
participant MerkleNodeDB
Client->>FileBackend: begin()
FileBackend->>FileWriteSession: create()
FileWriteSession-->>Client: session
Client->>FileWriteSession: open_node(node)
FileWriteSession->>FileNodeSession: create(repo, node)
FileNodeSession->>MerkleNodeDB: open_read_write(&dyn node)
MerkleNodeDB-->>FileNodeSession: opened
Client->>FileNodeSession: add_child(&dyn child)
FileNodeSession->>MerkleNodeDB: add_child(&dyn child)
MerkleNodeDB-->>FileNodeSession: ok
Client->>FileNodeSession: finish()
FileNodeSession->>MerkleNodeDB: close/flush
MerkleNodeDB-->>FileNodeSession: closed
FileNodeSession-->>FileWriteSession: closed
FileWriteSession-->>Client: finish() (no-op)
sequenceDiagram
participant Client
participant FileBackend
participant MerkleNodeDB
Client->>FileBackend: exists(hash)
FileBackend->>MerkleNodeDB: exists(hash)
MerkleNodeDB-->>FileBackend: true/false
FileBackend-->>Client: bool
Client->>FileBackend: get_node(hash)
FileBackend->>MerkleNodeDB: open_read & read dtype+bytes (if exists)
MerkleNodeDB-->>FileBackend: dtype + bytes
FileBackend->>FileBackend: EMerkleTreeNode::from_type_and_bytes(dtype, bytes)
FileBackend-->>Client: Ok(Some(node)) / Ok(None)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/lib/src/core/db/merkle_node/file_backend.rs (1)
150-156: ⚡ Quick win
Dropimplementation panics on I/O error.Using
expect()inDropwill panic if the underlyingclose()fails (e.g., disk full, permission error). While the testtest_drop_finishes_file_node_sessionvalidates the happy path, production I/O errors would cause an unrecoverable panic.Consider logging the error instead of panicking, since
Dropcannot propagate errors and callers who need error guarantees should callfinish()explicitly.Proposed fix to avoid panic in Drop
impl Drop for FileNodeSession { fn drop(&mut self) { - self.idempotent_finish() - .expect("Did not explicitly call finish() and encountered an error."); + if let Err(e) = self.idempotent_finish() { + log::error!( + "FileNodeSession::drop failed to close node {}: {}", + self.db.node_id, + e + ); + } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/core/db/merkle_node/file_backend.rs` around lines 150 - 156, The Drop impl for FileNodeSession currently calls self.idempotent_finish().expect(...), which will panic on I/O errors during drop; change it to call idempotent_finish() and handle the Result without panicking (e.g., match or if let Err(e) = ...) and log the error instead of calling expect so drop never unwinds; reference the Drop for FileNodeSession, the drop() method, idempotent_finish(), and the explicit finish() API (callers who need to observe errors should use finish()) when locating the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/core/db/merkle_node/file_backend.rs`:
- Around line 63-70: get_children currently calls MerkleNodeDB::open_read_only
directly and will propagate an IO error for missing hashes; make its behavior
consistent with get_node by first checking self.exists(hash) and returning
Ok(vec![]) (or Ok(None) if you prefer Option semantics and update signature/call
sites) when the hash doesn't exist, otherwise proceed to open_read_only and
map(); reference get_children, get_node, exists(), and
MerkleNodeDB::open_read_only when making the change.
---
Nitpick comments:
In `@crates/lib/src/core/db/merkle_node/file_backend.rs`:
- Around line 150-156: The Drop impl for FileNodeSession currently calls
self.idempotent_finish().expect(...), which will panic on I/O errors during
drop; change it to call idempotent_finish() and handle the Result without
panicking (e.g., match or if let Err(e) = ...) and log the error instead of
calling expect so drop never unwinds; reference the Drop for FileNodeSession,
the drop() method, idempotent_finish(), and the explicit finish() API (callers
who need to observe errors should use finish()) when locating the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7f28e24e-2669-496b-985b-a95c8e91a0ba
📒 Files selected for processing (7)
crates/lib/src/core/db/merkle_node.rscrates/lib/src/core/db/merkle_node/file_backend.rscrates/lib/src/core/db/merkle_node/merkle_node_db.rscrates/lib/src/model/merkle_tree.rscrates/lib/src/model/merkle_tree/merkle_reader.rscrates/lib/src/model/merkle_tree/merkle_writer.rscrates/lib/src/model/merkle_tree/node.rs
7814565 to
f719e13
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/core/db/merkle_node/file_backend.rs`:
- Around line 47-55: The code returns db.node_id which is left as zero by
MerkleNodeDB::open_read_only; change the construction of MerkleNodeRecord in
this block to pass the requested hash (the local variable `hash`) instead of
`db.node_id` so the returned record reports the actual requested node id; update
the call to MerkleNodeRecord::new(...) to use `hash` as the first argument.
- Around line 140-155: The current idempotent_finish marks
FileNodeSession::finished before calling MerkleNodeDB::close which suppresses
retries on failure and the Drop impl panics via expect; change idempotent_finish
so it only sets self.finished = true after MerkleNodeDB::close(&mut self.db)
returns Ok (i.e., call close first and on success set finished), and modify the
Drop implementation for FileNodeSession to not panic—call
self.idempotent_finish() and ignore or log errors (e.g., .ok()) instead of using
expect so drop never aborts during unwinding.
- Around line 79-80: The implementation of MerkleWriter::begin uses a mismatched
self lifetime; change the signature in file_backend.rs from fn begin<'a>(&'a
self) -> Result<Box<dyn MerkleWriteSession + 'a>, OxenError> to fn
begin<'a>(&self) -> Result<Box<dyn MerkleWriteSession + 'a>, OxenError> so it
exactly matches the MerkleWriter trait; keep the body that returns
Box::new(FileWriteSession { repo: self.repo }) and ensure FileWriteSession and
the returned Box implement MerkleWriteSession with the correct lifetimes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 540f04b1-733d-4e51-a878-dcceed63a95b
📒 Files selected for processing (4)
crates/lib/src/core/db/merkle_node.rscrates/lib/src/core/db/merkle_node/file_backend.rscrates/lib/src/core/db/merkle_node/merkle_node_db.rscrates/lib/src/model/merkle_tree/node.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/lib/src/core/db/merkle_node.rs
- crates/lib/src/model/merkle_tree/node.rs
f719e13 to
cf56acd
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/lib/src/model/merkle_tree/node.rs (1)
76-92: ⚡ Quick winAdd a round-trip test for each node variant.
This is now the single dtype→decoder dispatch point. A small table-driven test that serializes each concrete node, then feeds its
node_type()and bytes back throughfrom_type_and_bytes, would catch enum/marker drift early.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/model/merkle_tree/node.rs` around lines 76 - 92, Add a table-driven round-trip unit test that for each concrete node type (CommitNode, DirNode, FileNode, VNode, FileChunkNode) constructs a representative instance, serializes it to msgpack bytes (using the same serializer used elsewhere), obtains its dtype via the instance's node_type() or equivalent, then calls EMerkleTreeNode::from_type_and_bytes(dtype, &bytes) and asserts the returned EMerkleTreeNode variant matches and equals the original (or its contained struct). Ensure the test covers all variants of the EMerkleTreeNode enum and fails if any dtype→decoder mapping in from_type_and_bytes is incorrect.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/lib/src/model/merkle_tree/node.rs`:
- Around line 76-92: Add a table-driven round-trip unit test that for each
concrete node type (CommitNode, DirNode, FileNode, VNode, FileChunkNode)
constructs a representative instance, serializes it to msgpack bytes (using the
same serializer used elsewhere), obtains its dtype via the instance's
node_type() or equivalent, then calls
EMerkleTreeNode::from_type_and_bytes(dtype, &bytes) and asserts the returned
EMerkleTreeNode variant matches and equals the original (or its contained
struct). Ensure the test covers all variants of the EMerkleTreeNode enum and
fails if any dtype→decoder mapping in from_type_and_bytes is incorrect.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1ff46bee-c91f-45d7-9289-b4c7ece01845
📒 Files selected for processing (4)
crates/lib/src/core/db/merkle_node.rscrates/lib/src/core/db/merkle_node/file_backend.rscrates/lib/src/core/db/merkle_node/merkle_node_db.rscrates/lib/src/model/merkle_tree/node.rs
✅ Files skipped from review due to trivial changes (1)
- crates/lib/src/core/db/merkle_node.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/lib/src/core/db/merkle_node/file_backend.rs
c0a0819 to
a1b9ba7
Compare
e44ba49 to
c3bbb8d
Compare
a1b9ba7 to
9a95bc3
Compare
c3bbb8d to
9b727e2
Compare
5faad04 to
53bc183
Compare
4fcf7bf to
1fd6e68
Compare
53bc183 to
6827c1e
Compare
1fd6e68 to
ee3ce04
Compare
6827c1e to
b0d9aa5
Compare
3454bd6 to
c5ee136
Compare
|
NOTE: Stacked PR. Must merge #510 before merging. |
b0d9aa5 to
16ce636
Compare
c5ee136 to
96f735c
Compare
16ce636 to
5d23577
Compare
96f735c to
814cb8f
Compare
5d23577 to
173f569
Compare
814cb8f to
5babb48
Compare
| impl Drop for FileNodeSession { | ||
| fn drop(&mut self) { | ||
| self.idempotent_finish() | ||
| .expect("Did not explicitly call finish() and encountered an error."); | ||
| } | ||
| } |
There was a problem hiding this comment.
I am strongly considering removing this. It doesn't make conceptual sense to finish on drop.
Originally, I was thinking "make sure finish() is called if it is forgotten," but this is not that. We'll finish even on an error, which we don't want on a philosophical / conceptual design level.
The file backend writes eagerly and there's no "undo," so the writes will probably still be there. This finish is ensuring the file handles are flushed and closed. But arguably we don't want to do this if we're erroring out 🤔
173f569 to
f7a9eff
Compare
5babb48 to
466df7f
Compare
f7a9eff to
8ff64ae
Compare
466df7f to
07c1e4e
Compare
40bfdbc to
ef3729b
Compare
Implements the current custom file format based Merkle tree node storage and retrieval logic for the `MerkleReader` and `MerkleWriter` traits as the new `FileBackend` type. The implementation internally uses `MerkleDbError`, but this is transformed at the trait boundry into an `OxenError`. The write sessions that `FileBackend` produces are `Box`'d because they are required to be object-safe (aka dyn-compatible). Also refactors `MerkleNodeDB::to_node`'s logic into a new function on the enum `EMerkleTreeNode::from_type_and_bytes` Additionally, `MerkleNodeDB`'s `open_read_write`, `write_node`, and `add_child` switch from generic `<N: TMerkleTreeNode>` to `&dyn TMerkleTreeNode` so they're callable through the dyn-compatible `NodeWriteSession`. Finally, fixes a bug in `MerkleNodeDb::open_read_only`: it was previously incorrectly setting the `node_id: MerkleHash` to `0`. The function now requires the right hash from the path to be provided and uses that.
07c1e4e to
6c2c648
Compare
Implements the current custom file format based Merkle tree node storage
and retrieval logic for the
MerkleReaderandMerkleWritertraits asthe new
FileBackendtype. The implementation internally usesMerkleDbError,but this is transformed at the trait boundry into an
OxenError.The write sessions that
FileBackendproduces areBox'd because they arerequired to be object-safe (aka dyn-compatible).
Also refactors
MerkleNodeDB::to_node's logic into a new function on theenum
EMerkleTreeNode::from_type_and_bytesAdditionally,
MerkleNodeDB'sopen_read_write,write_node, andadd_childswitch from generic
<N: TMerkleTreeNode>to&dyn TMerkleTreeNodeso they'recallable through the dyn-compatible
NodeWriteSession.Finally, fixes a bug in
MerkleNodeDb::open_read_only: it was previouslyincorrectly setting the
node_id: MerkleHashto0. The function now requiresthe right hash from the path to be provided and uses that.