feat(upload): presigned direct-to-storage uploads#189

Merged
zfarrell merged 5 commits into main from worktree-sunny-fluttering-gray
Jun 27, 2026

Conversation

@zfarrell

Switch managed-table loads to the presigned/direct-to-storage upload path (SDK upload_file, published in hotdata 0.5.0), removing the legacy /v1/files proxy seam. Multipart concurrency defaults to 12, overridable via HOTDATA_UPLOAD_CONCURRENCY.
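
A minimal sketch of how such an override could be resolved, assuming the variable holds a positive integer and that unset, unparsable, or zero values fall back to the default of 12. The helper names `parse_concurrency` and `upload_concurrency` are hypothetical and are not taken from the PR or the SDK.

```rust
use std::env;

/// Default multipart concurrency, per the PR description.
const DEFAULT_UPLOAD_CONCURRENCY: usize = 12;

/// Hypothetical helper: turn an optional raw value into a concurrency level.
/// Falls back to the default when the value is absent, unparsable, or zero
/// (the fallback rules are an assumption, not confirmed by the PR).
fn parse_concurrency(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_UPLOAD_CONCURRENCY)
}

/// Hypothetical helper: read HOTDATA_UPLOAD_CONCURRENCY from the environment.
fn upload_concurrency() -> usize {
    parse_concurrency(env::var("HOTDATA_UPLOAD_CONCURRENCY").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback rules testable without mutating process-wide state.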

@codecov

codecov Bot commented Jun 26, 2026

Codecov Report

❌ Patch coverage is 81.31313% with 37 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/databases.rs | 61.62% | 33 Missing ⚠️ |
| src/sdk.rs | 96.42% | 4 Missing ⚠️ |

@claude claude Bot left a comment

Review

Blocking Issues

  • src/databases.rs (upload_parquet_url / TempGuard) — The downloaded temp file is leaked on every failed --url upload. upload_parquet_path exits via ApiError::exit -> std::process::exit(1) (sdk.rs:280-283), which does not unwind or run destructors, so the TempGuard never drops. This directly contradicts the code comments ("removed on every exit (success or failure)"). On large files this leaves multi-GB files in the temp dir on the common error paths (e.g. 501 PRESIGN_UNSUPPORTED, storage rejection).

Action Required

  • Ensure the temp file is removed before exiting on the upload-failure path — e.g. have upload_parquet_path return Result<String, ApiError> and clean up (drop / remove_file) before calling e.exit(), rather than exiting from inside a function whose caller owns the guard.

Non-blocking nits left inline (insecure temp-file creation; stale Cargo.toml comment).

Comment thread src/databases.rs Outdated
Comment on lines +456 to +457
// The guard's drop removes the temp file once this returns (or unwinds).
upload_parquet_path(api, temp.path(), size)

Blocking: the temp file is leaked on the upload-failure path, contradicting the comment ("removed on every exit (success or failure)" / "once this returns (or unwinds)").

upload_parquet_path exits on failure via Err(e) => e.exit(), and ApiError::exit calls std::process::exit(1) (sdk.rs:280-283). std::process::exit neither unwinds nor runs destructors, so the TempGuard in this function's scope is not dropped — the downloaded temp file (potentially many GB) is left behind in the system temp dir on every failed --url upload (e.g. 501 PRESIGN_UNSUPPORTED, storage rejection). Repeated failures can fill the temp partition (often a small tmpfs).

The success path is fine (normal return drops the guard). Only the process::exit error path leaks. Consider having upload_parquet_path return Result<String, ApiError> so this function can clean up (drop/remove the temp file) before exiting, e.g.:

let id = match upload_parquet_path(api, temp.path(), size) {
    Ok(id) => id,
    Err(e) => {
        drop(temp); // remove temp file before exiting
        e.exit();
    }
};

or explicitly std::fs::remove_file(temp.path()) before the exit.
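
For illustration, a self-contained sketch of the cleanup-before-exit shape using only the standard library. The `TempGuard` here is a stand-in written for this example, and `upload` and `upload_and_cleanup` are hypothetical names with a `String` error in place of `ApiError`; none of this is the PR's actual code.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Stand-in for the PR's TempGuard: removes the file when dropped.
struct TempGuard {
    path: PathBuf,
}

impl TempGuard {
    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempGuard {
    fn drop(&mut self) {
        // Best-effort cleanup; ignore errors such as "file already gone".
        let _ = fs::remove_file(&self.path);
    }
}

/// Hypothetical upload step that reports failure instead of exiting.
fn upload(path: &Path, fail: bool) -> Result<String, String> {
    if fail {
        Err(format!("upload of {} rejected", path.display()))
    } else {
        Ok("upload-id".to_string())
    }
}

/// Owns the guard, so the temp file is removed on both arms before the
/// caller ever sees the result (and before the caller can exit).
fn upload_and_cleanup(guard: TempGuard, fail: bool) -> Result<String, String> {
    let result = upload(guard.path(), fail);
    drop(guard); // explicit: the file is gone before we return
    result
}
```

With this shape the caller can still call `std::process::exit(1)` on `Err`, because no destructor is left pending in any live scope at that point.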

Comment thread src/databases.rs Outdated
Comment on lines +488 to +499
let nanos = SystemTime::now()
    .duration_since(UNIX_EPOCH)
    .map(|d| d.as_nanos())
    .unwrap_or(0);
let path = std::env::temp_dir().join(format!(
    "hotdata-upload-{}-{nanos}.parquet",
    std::process::id()
));
let guard = TempGuard { path };

let mut reader = pb.wrap_read(resp);
let mut file = std::fs::File::create(guard.path())?;

nit: (not blocking) this hand-rolls a predictable temp filename (hotdata-upload-{pid}-{nanos}.parquet) and opens it with File::create, which is O_CREAT|O_TRUNC without O_EXCL. In a shared/world-writable temp dir this is the classic insecure-temp-file pattern (CWE-377/CWE-59): an attacker who pre-creates a symlink at the path can redirect the write (std::io::copy follows it) and clobber a victim-writable file. tempfile is already a regular dependency (Cargo.toml:57) and is used in the tests — tempfile::Builder::new().suffix(".parquet").tempfile() creates the file atomically with O_EXCL and 0600 perms, and gives you the same path-based handle. Worth using here too. (Note: a tempfile handle still won't auto-delete across process::exit — see the leak comment above.)
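
To make the O_EXCL point concrete, here is a standard-library-only sketch of exclusive creation. It is not the `tempfile::Builder` call recommended above; it swaps in `OpenOptions::create_new` to show the same guarantee without an external crate, and the function name `create_exclusive` is hypothetical. It does not pick a random name or delete the file on drop, which `tempfile` handles for you.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Create `path` only if nothing exists there yet.
/// `create_new(true)` maps to O_CREAT|O_EXCL on Unix, so a pre-existing
/// file or symlink at `path` makes the open fail instead of being followed.
fn create_exclusive(path: &Path) -> io::Result<File> {
    let mut opts = OpenOptions::new();
    opts.write(true).create_new(true);
    #[cfg(unix)]
    {
        use std::os::unix::fs::OpenOptionsExt;
        opts.mode(0o600); // owner read/write only (subject to umask)
    }
    opts.open(path)
}
```

A second call on the same path returns an `AlreadyExists` error, which is the behavior that defeats a pre-planted symlink.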

claude Bot previously approved these changes Jun 26, 2026

@claude claude Bot left a comment

Both prior review issues are resolved:

  • The temp-file leak on the upload-failure path is fixed: upload_temp_file owns the NamedTempFile and drops it before returning the Err the caller exits on, so cleanup precedes process::exit. Tests pin both the success and failure arms.
  • The insecure temp-file pattern is fixed: download_to_temp now creates the file atomically via tempfile::Builder (O_EXCL + 0600).

LGTM.

@claude claude Bot left a comment

Prior review items resolved: the temp-file leak on the upload-failure path is now fixed (upload_temp_file drops the NamedTempFile before returning the Err the caller exits on, with tests pinning both arms), and the temp file is created atomically via tempfile::Builder (O_EXCL + 0600). No new blocking issues.

@zfarrell zfarrell merged commit 009ec38 into main Jun 27, 2026
14 checks passed
@zfarrell zfarrell deleted the worktree-sunny-fluttering-gray branch June 27, 2026 19:47