Repo Integrity: atomic file write helpers (step 1.1)#545
Conversation
Foundation for the repository-integrity plan. Today HEAD, workspace
config, and version-store files are written directly to their final
path, so a process killed mid-write leaves a partially-written canonical
file on disk. Add a pair of helpers that always write via a sibling
temp file, fsync it, then rename it over the target — a crash before
the rename leaves the prior contents intact, a crash after leaves the
new contents.
util::fs::atomic_write_to_path(target, contents) // in-memory
util::fs::atomic_write_from_reader(target, reader) // streamed
Both go through a module-private `AtomicTempFile` that uses
`async-tempfile` for the temp creation (its Drop impl cleans up on
cancellation automatically) and runs the fsync / rename / best-effort
parent-dir fsync sequence on `commit`. The struct is private on purpose
— "you must commit, or the write is silently discarded" is the kind of
invariant that's easier to honor when only two callers (the helpers
above) in one file can construct one.
Step 1.2 (atomic HEAD), 1.3 (atomic workspace config), and 1.4
(version-store atomic write) migrate onto these helpers in follow-up
PRs.
Also migrates `OxenError::file_create_error` / `file_rename_error` off
`OxenError::basic_str` to new specific variants:
FileCreate(PathBuf, #[source] io::Error)
FileRename { src, dst, #[source] source: io::Error }
`file_rename_error`'s third parameter is tightened from `impl Debug`
to `std::io::Error`; both real callers already pass an `io::Error`.
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds temp-file-backed atomic write helpers (in-memory and streaming), new structured ChangesAtomic Filesystem Write Helpers
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
malcolmgreaves
left a comment
There was a problem hiding this comment.
Great design on AtomicTempFile and the pub parts only exposing the complete write path!
| let target_name = target.file_name().ok_or_else(|| { | ||
| OxenError::file_create_error( | ||
| target, | ||
| std::io::Error::other("target path has no filename component"), | ||
| ) | ||
| })?; |
There was a problem hiding this comment.
Minor, but you could check and error out before successfully creating the parent dir if it doesn't exist.
There was a problem hiding this comment.
Nice catch! ✅ Swapped the order.
| && let Ok(dir) = tokio::fs::File::open(parent).await | ||
| && let Err(err) = dir.sync_all().await | ||
| { | ||
| log::warn!("AtomicTempFile::commit: parent fsync failed for {parent:?}: {err}"); |
There was a problem hiding this comment.
If there's an error in the parent's open, I am pretty sure this will not log the warning. When the 2nd clause (tokio::fs::File::open) returns Err instead of Ok, the evaluation will both short-circuit and it treat the whole expression as false.
Did you want this to log a warning if dir.sync_all() didn't run? It seems like the intention is to warn if the directory fsync doesn't work as expected. Presumably not being able to run it is also a warning?
There was a problem hiding this comment.
Good point! ✅ I refactored the code so we warn in each case.
| Ok(()) | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Both of these functions and the entire AtomicTempFile idea are wonderful 💯
This PR adds two utility functions to write files atomically to the local filesystem.
Some of our most recent repository corruption (both client- and server-side) has occurred when operations were interrupted (i.e. the client or server is killed) while writing files to local storage on disk. It's extremely difficult (and slow) to constantly try to detect and recover from corrupted files every time we interact with storage-on-disk, so we need to do better in preventing the situation in the first place. These functions should facilitate that by making it so that we have high confidence that if a file exists, it contains the content we intended to write to it. These helpers accomplish that by writing to a temp file, and then renaming it over the target . A crash before the rename leaves the prior contents intact (and a temp file that can be cleaned up later). A crash after the rename leaves the new contents.
I also migrated
OxenError::file_create_errorandfile_rename_erroroff of usingOxenError::basic_strunder the hood. Instead, they use new specific variants: