Repo Integrity: atomic file write helpers (step 1.1)#545

Merged
CleanCut merged 2 commits into
mainfrom
cleancut/repo-integrity-1.1-atomic-writes
May 12, 2026

@CleanCut
Contributor

This PR adds two utility functions to write files atomically to the local filesystem.

Some of our most recent repository corruption (both client- and server-side) has occurred when operations were interrupted (i.e., the client or server was killed) while writing files to local storage on disk. It's extremely difficult (and slow) to constantly try to detect and recover from corrupted files every time we interact with storage on disk, so we need to do better at preventing the situation in the first place. These functions should help by giving us high confidence that if a file exists, it contains the content we intended to write to it. They accomplish that by writing to a temp file and then renaming it over the target. A crash before the rename leaves the prior contents intact (plus a temp file that can be cleaned up later). A crash after the rename leaves the new contents.

    util::fs::atomic_write_to_path(target, contents)   // for known-small files
    util::fs::atomic_write_from_reader(target, reader) // for large or streamed files
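A minimal synchronous sketch of the temp-file-then-rename sequence described above, using only `std::fs`. It is not the PR's implementation (the real helpers are async and build on `async-tempfile`); the function name `write_atomic_sync` and the `.<name>.tmp-<pid>-<n>` temp-file naming are hypothetical choices for illustration.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical per-process counter: gives concurrent writers distinct temp names.
static TEMP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical synchronous sketch of an atomic whole-file write.
pub fn write_atomic_sync(target: &Path, contents: &[u8]) -> io::Result<()> {
    // Validate the path before any filesystem side effect.
    let name = target.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "target path has no filename component")
    })?;
    let parent = match target.parent() {
        Some(p) if !p.as_os_str().is_empty() => p.to_path_buf(),
        _ => PathBuf::from("."),
    };
    fs::create_dir_all(&parent)?;

    // Sibling temp file: rename() is only atomic within one filesystem.
    let n = TEMP_COUNTER.fetch_add(1, Ordering::Relaxed);
    let tmp = parent.join(format!(
        ".{}.tmp-{}-{}",
        name.to_string_lossy(),
        std::process::id(),
        n
    ));

    let result = (|| -> io::Result<()> {
        let mut file = File::create(&tmp)?;
        file.write_all(contents)?;
        // Make the bytes durable before they become visible under the target name.
        file.sync_all()?;
        drop(file);
        // On POSIX, rename replaces the target in one step: readers see old or new, never a mix.
        fs::rename(&tmp, target)
    })();
    if result.is_err() {
        // Best effort: don't leave a stray temp file behind on failure.
        let _ = fs::remove_file(&tmp);
    }
    result?;

    // Best effort: persist the directory entry change itself.
    if let Ok(dir) = File::open(&parent) {
        let _ = dir.sync_all();
    }
    Ok(())
}
```

The temp file lives next to the target rather than in a system temp directory because `rename` cannot atomically move a file across filesystems, and `/tmp` is often a separate mount.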

I also migrated OxenError::file_create_error and file_rename_error off of using OxenError::basic_str under the hood. Instead, they use new specific variants:

    FileCreate(PathBuf, #[source] io::Error)
    FileRename { src, dst, #[source] source: io::Error }

Foundation for the repository-integrity plan. Today, HEAD, workspace
config, and version-store files are written directly to their final
path, so a process killed mid-write leaves a partially-written canonical
file on disk. Add a pair of helpers that always write via a sibling
temp file, fsync it, then rename it over the target: a crash before
the rename leaves the prior contents intact, and a crash after it leaves
the new contents.

    util::fs::atomic_write_to_path(target, contents)   // in-memory
    util::fs::atomic_write_from_reader(target, reader) // streamed

Both go through a module-private `AtomicTempFile` that uses
`async-tempfile` for the temp creation (its Drop impl cleans up on
cancellation automatically) and runs the fsync / rename / best-effort
parent-dir fsync sequence on `commit`. The struct is private on purpose
— "you must commit, or the write is silently discarded" is the kind of
invariant that's easier to honor when only two callers (the helpers
above) in one file can construct one.
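The commit-or-discard invariant that `AtomicTempFile` encodes can be illustrated with a synchronous drop guard. This is a sketch under assumptions: `TempFileGuard` and its fixed `<target>.tmp` name are hypothetical (a real implementation needs unique names so concurrent writers don't collide), and the actual struct relies on `async-tempfile`'s `Drop` impl for cleanup rather than deleting the file itself.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical sync analogue of a commit-or-discard temp file.
struct TempFileGuard {
    file: Option<File>,
    tmp_path: PathBuf,
    target: PathBuf,
    committed: bool,
}

impl TempFileGuard {
    fn create(target: &Path) -> io::Result<Self> {
        // Simplified naming "<target>.tmp"; not safe for concurrent writers.
        let mut tmp = target.as_os_str().to_os_string();
        tmp.push(".tmp");
        let tmp_path = PathBuf::from(tmp);
        let file = File::create(&tmp_path)?;
        Ok(Self { file: Some(file), tmp_path, target: target.to_path_buf(), committed: false })
    }

    fn write_all(&mut self, bytes: &[u8]) -> io::Result<()> {
        self.file.as_mut().expect("temp file is open until commit").write_all(bytes)
    }

    /// Takes `self` by value, so a guard can be committed at most once.
    fn commit(mut self) -> io::Result<()> {
        let file = self.file.take().expect("temp file is open until commit");
        file.sync_all()?;
        drop(file); // close the handle before renaming
        fs::rename(&self.tmp_path, &self.target)?;
        self.committed = true;
        Ok(())
    }
}

impl Drop for TempFileGuard {
    fn drop(&mut self) {
        if !self.committed {
            // Uncommitted (early return, error, panic): discard the temp file.
            drop(self.file.take());
            let _ = fs::remove_file(&self.tmp_path);
        }
    }
}
```

Because `commit` consumes the guard, every path that skips it (an early `?` return, a panic, or, in the async version, a cancelled future) ends in `Drop`, which removes the temp file and leaves the target untouched.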

Steps 1.2 (atomic HEAD), 1.3 (atomic workspace config), and 1.4
(version-store atomic write) migrate onto these helpers in follow-up
PRs.

Also migrates `OxenError::file_create_error` / `file_rename_error` off
`OxenError::basic_str` to new specific variants:

    FileCreate(PathBuf, #[source] io::Error)
    FileRename { src, dst, #[source] source: io::Error }

`file_rename_error`'s third parameter is tightened from `impl Debug`
to `std::io::Error`; both real callers already pass an `io::Error`.
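The new variants keep the underlying `io::Error` reachable through `std::error::Error::source`, which is what the `#[source]` attribute arranges. Below is a hand-written, std-only equivalent; the enum name `FsError` and the display strings are hypothetical, and the real `OxenError` presumably derives these impls (the `#[source]` attribute suggests `thiserror`) rather than writing them out.

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::path::PathBuf;

/// Hypothetical stand-in for the two new structured variants.
#[derive(Debug)]
pub enum FsError {
    FileCreate(PathBuf, io::Error),
    FileRename { src: PathBuf, dst: PathBuf, source: io::Error },
}

impl fmt::Display for FsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FsError::FileCreate(path, _) => write!(f, "could not create file {path:?}"),
            FsError::FileRename { src, dst, .. } => {
                write!(f, "could not rename {src:?} to {dst:?}")
            }
        }
    }
}

impl Error for FsError {
    // Equivalent of `#[source]`: callers can walk down to the io::Error.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            FsError::FileCreate(_, err) => Some(err),
            FsError::FileRename { source, .. } => Some(source),
        }
    }
}

/// Constructor taking a concrete `io::Error` (cf. the tightening from `impl Debug`).
pub fn file_rename_error(
    src: impl Into<PathBuf>,
    dst: impl Into<PathBuf>,
    err: io::Error,
) -> FsError {
    FsError::FileRename { src: src.into(), dst: dst.into(), source: err }
}
```

Taking `io::Error` directly, rather than `impl Debug`, is what makes the `source` chain possible: a value known only to be `Debug` can be formatted but cannot be returned as `&dyn Error`.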
@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f750c4af-2d0b-4a07-adf5-1dbfef63b139

📥 Commits

Reviewing files that changed from the base of the PR and between 19c9d48 and 847ca70.

📒 Files selected for processing (1)
  • crates/lib/src/util/fs.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/lib/src/util/fs.rs

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added atomic file-write operations that guarantee safe, consistent writes for both in-memory and streaming payloads, with buffering and explicit flush before commit.
  • Bug Fixes

    • Improved file-create and file-rename error reporting with structured errors that expose underlying IO errors for clearer diagnostics.
  • Documentation

    • Updated guidance on streamed async IO: recommends large-capacity buffering and explicit flush to prevent silently dropped bytes.
  • Tests

    • Added comprehensive tests covering atomic writes, overwrites, concurrency, and cleanup.

Walkthrough

Adds temp-file-backed atomic write helpers (in-memory and streaming), new structured OxenError variants for create/rename failures, explicit 10MB buffering and flush semantics for streamed writes, and a Tokio test suite validating correctness and concurrency.

Changes

Atomic Filesystem Write Helpers

  • Error types for filesystem operations (crates/lib/src/error.rs): New FileCreate and FileRename enum variants carry the affected path(s) and the underlying io::Error as their source; file_create_error and file_rename_error now return these structured variants.
  • Atomic write plumbing and imports (crates/lib/src/util/fs.rs): Adds the tokio::io::AsyncWriteExt import and a private AtomicTempFile implementing the create/fsync/rename/parent-fsync/cleanup flow.
  • Public atomic write APIs, in-memory and streaming (crates/lib/src/util/fs.rs): pub async fn atomic_write_to_path(...) writes in-memory bytes; pub async fn atomic_write_from_reader(...) streams into a 10MB BufWriter and explicitly calls flush().await? before commit.
  • Test suite for atomic write operations (crates/lib/src/util/fs.rs): Tokio tests validate round-trip correctness, overwrite behavior, parent-dir creation, empty payloads, the temp filename pattern and cleanup, cleanup after a failed stream, and correctness under concurrent writers.
  • Implementation guidance for streamed async IO (.claude/CLAUDE.md): Adds a rule: for streamed IO of unknown length, use large-capacity buffering (BufWriter::with_capacity(10 * 1024 * 1024, ...)), optionally a BufReader, and call flush().await? before any fsync, rename, or checksum step.
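The buffering-and-flush rule can be sketched synchronously with `std::io::BufWriter`. The function name `stream_to_temp` is hypothetical, and the final rename over the target is omitted. The PR's async code uses tokio's `BufWriter`, where the stakes are higher: tokio's documentation warns that an unflushed buffer is discarded when the writer is dropped, whereas std's `BufWriter` at least attempts a flush on drop (silently ignoring any error).

```rust
use std::fs::File;
use std::io::{self, BufWriter, Read, Write};
use std::path::Path;

// Capacity taken from the guidance above (10 MiB).
const STREAM_BUF_CAPACITY: usize = 10 * 1024 * 1024;

/// Hypothetical sync sketch: stream `reader` into `tmp_path`, then flush and fsync.
/// Returns the number of bytes copied; a rename over the target would follow.
pub fn stream_to_temp<R: Read>(mut reader: R, tmp_path: &Path) -> io::Result<u64> {
    let file = File::create(tmp_path)?;
    let mut writer = BufWriter::with_capacity(STREAM_BUF_CAPACITY, file);
    let copied = io::copy(&mut reader, &mut writer)?;
    // Explicit flush: surfaces write errors instead of losing them at drop time.
    writer.flush()?;
    // fsync is only meaningful once every buffered byte has reached the file.
    writer.get_ref().sync_all()?;
    Ok(copied)
}
```

Ordering is the point here: fsyncing or checksumming the file before the flush would operate on a file that may still be missing the tail of the stream.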

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Oxen-AI/Oxen#539: Edits to .claude/CLAUDE.md related to guidance in this PR.
  • Oxen-AI/Oxen#410: Prior changes shifting from blocking path IO to reader-based async streaming and large BufWriter usage.

Suggested reviewers

  • malcolmgreaves
  • gschoeni

Poem

🐰 I write with a temp-file, then hop to rename,
Buffers ten megs wide so the bytes stay the same.
A flush and a fsync, a careful small dance,
Now the file is atomic — hooray for chance! 📦✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check: ✅ Passed. The title accurately describes the main change: introducing atomic file write helpers for repository integrity. It is concise, specific, and clearly indicates the primary purpose of the changeset.
  • Description check: ✅ Passed. The description is well-related to the changeset, explaining the motivation (preventing corruption during interruptions), the implementation approach (temp file + rename pattern), and listing the new public APIs and error variants being introduced.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

Collaborator

@malcolmgreaves malcolmgreaves left a comment

Great design on AtomicTempFile and the pub parts only exposing the complete write path!

Comment thread crates/lib/src/util/fs.rs Outdated
Comment on lines +251 to +256
let target_name = target.file_name().ok_or_else(|| {
    OxenError::file_create_error(
        target,
        std::io::Error::other("target path has no filename component"),
    )
})?;
Collaborator

Minor, but you could perform this check, and error out, before creating the parent dir (when it doesn't already exist).

Contributor Author

Nice catch! ✅ Swapped the order.
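The reordering agreed on above (validate the path before any filesystem side effect) can be sketched as follows. `prepare_parent` is a hypothetical helper, not a function from the PR; it shows why the order matters for a path such as `fresh/..`, which has no filename component yet whose parent, `fresh`, would otherwise be created before the error is noticed.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the directory a sibling temp file should live in.
pub fn prepare_parent(target: &Path) -> io::Result<PathBuf> {
    // 1. Cheap, side-effect-free validation first.
    if target.file_name().is_none() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "target path has no filename component",
        ));
    }
    // 2. Only then create the parent directory (a no-op if it already exists).
    let parent = match target.parent() {
        Some(p) if !p.as_os_str().is_empty() => p.to_path_buf(),
        _ => PathBuf::from("."),
    };
    fs::create_dir_all(&parent)?;
    Ok(parent)
}
```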

Comment thread crates/lib/src/util/fs.rs Outdated
&& let Ok(dir) = tokio::fs::File::open(parent).await
&& let Err(err) = dir.sync_all().await
{
log::warn!("AtomicTempFile::commit: parent fsync failed for {parent:?}: {err}");
Collaborator

If opening the parent fails, I am pretty sure this will not log the warning. When the second clause (tokio::fs::File::open) returns Err instead of Ok, evaluation short-circuits and the whole condition is treated as false.

Did you want this to log a warning if dir.sync_all() didn't run? It seems like the intention is to warn if the directory fsync doesn't work as expected. Presumably not being able to run it is also a warning?

Contributor Author

Good point! ✅ I refactored the code so we warn in each case.
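One way to warn in each case, as the review requested, is to match on the `open` and `sync_all` results separately instead of chaining them with `&&`. This is a hypothetical std-only sketch: `fsync_parent_best_effort` is not the PR's function, `eprintln!` stands in for `log::warn!`, and opening a directory with `File::open` works on Unix but typically fails on Windows.

```rust
use std::fs::File;
use std::path::Path;

/// Hypothetical best-effort parent-directory fsync that reports both failure
/// modes. Returns the warning, if any, so callers and tests can inspect it.
pub fn fsync_parent_best_effort(parent: &Path) -> Option<String> {
    let warning = match File::open(parent) {
        // Previously this case was silently skipped by the let-chain.
        Err(err) => Some(format!("parent open failed for {parent:?}: {err}")),
        Ok(dir) => match dir.sync_all() {
            Err(err) => Some(format!("parent fsync failed for {parent:?}: {err}")),
            Ok(()) => None,
        },
    };
    if let Some(msg) = &warning {
        // Not fatal: the rename already succeeded; only durability is in doubt.
        eprintln!("warning: {msg}");
    }
    warning
}
```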

Comment thread crates/lib/src/util/fs.rs
Comment thread crates/lib/src/util/fs.rs
Ok(())
}
}

Collaborator

Both of these functions and the entire AtomicTempFile idea are wonderful 💯

@CleanCut CleanCut enabled auto-merge (squash) May 12, 2026 22:40
@CleanCut CleanCut merged commit 7187d24 into main May 12, 2026
9 checks passed
@CleanCut CleanCut deleted the cleancut/repo-integrity-1.1-atomic-writes branch May 12, 2026 22:49

Labels

None yet

Projects

None yet

2 participants