feat(upload): presigned direct-to-storage uploads#189
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Review
Blocking Issues
- src/databases.rs (upload_parquet_url / TempGuard) — The downloaded temp file is leaked on every failed
--urlupload.upload_parquet_pathexits viaApiError::exit->std::process::exit(1)(sdk.rs:280-283), which does not unwind or run destructors, so theTempGuardnever drops. This directly contradicts the code comments ("removed on every exit (success or failure)"). On large files this leaves multi-GB files in the temp dir on the common error paths (e.g.501 PRESIGN_UNSUPPORTED, storage rejection).
Action Required
- Ensure the temp file is removed before exiting on the upload-failure path — e.g. have
upload_parquet_pathreturnResult<String, ApiError>and clean up (drop /remove_file) before callinge.exit(), rather than exiting from inside a function whose caller owns the guard.
Non-blocking nits left inline (insecure temp-file creation; stale Cargo.toml comment).
| // The guard's drop removes the temp file once this returns (or unwinds). | ||
| upload_parquet_path(api, temp.path(), size) |
There was a problem hiding this comment.
Blocking: the temp file is leaked on the upload-failure path, contradicting the comment ("removed on every exit (success or failure)" / "once this returns (or unwinds)").
upload_parquet_path exits on failure via Err(e) => e.exit(), and ApiError::exit calls std::process::exit(1) (sdk.rs:280-283). std::process::exit neither unwinds nor runs destructors, so the TempGuard in this function's scope is not dropped — the downloaded temp file (potentially many GB) is left behind in the system temp dir on every failed --url upload (e.g. 501 PRESIGN_UNSUPPORTED, storage rejection). Repeated failures can fill the temp partition (often a small tmpfs).
The success path is fine (normal return drops the guard). Only the process::exit error path leaks. Consider having upload_parquet_path return Result<String, ApiError> so this function can clean up (drop/remove the temp file) before exiting, e.g.:
let id = match upload_parquet_path(api, temp.path(), size) {
Ok(id) => id,
Err(e) => {
drop(temp); // remove temp file before exiting
e.exit();
}
};or explicitly std::fs::remove_file(temp.path()) before the exit.
| let nanos = SystemTime::now() | ||
| .duration_since(UNIX_EPOCH) | ||
| .map(|d| d.as_nanos()) | ||
| .unwrap_or(0); | ||
| let path = std::env::temp_dir().join(format!( | ||
| "hotdata-upload-{}-{nanos}.parquet", | ||
| std::process::id() | ||
| )); | ||
| let guard = TempGuard { path }; | ||
|
|
||
| let mut reader = pb.wrap_read(resp); | ||
| let mut file = std::fs::File::create(guard.path())?; |
There was a problem hiding this comment.
nit: (not blocking) this hand-rolls a predictable temp filename (hotdata-upload-{pid}-{nanos}.parquet) and opens it with File::create, which is O_CREAT|O_TRUNC without O_EXCL. In a shared/world-writable temp dir this is the classic insecure-temp-file pattern (CWE-377/CWE-59): an attacker who pre-creates a symlink at the path can redirect the write (std::io::copy follows it) and clobber a victim-writable file. tempfile is already a regular dependency (Cargo.toml:57) and is used in the tests — tempfile::Builder::new().suffix(".parquet").tempfile() creates the file atomically with O_EXCL and 0600 perms, and gives you the same path-based handle. Worth using here too. (Note: a tempfile handle still won't auto-delete across process::exit — see the leak comment above.)
There was a problem hiding this comment.
Both prior review issues are resolved:
- The temp-file leak on the upload-failure path is fixed:
upload_temp_fileowns theNamedTempFileand drops it before returning theErrthe caller exits on, so cleanup precedesprocess::exit. Tests pin both the success and failure arms. - The insecure temp-file pattern is fixed:
download_to_tempnow creates the file atomically viatempfile::Builder(O_EXCL+ 0600).
LGTM.
There was a problem hiding this comment.
Prior review items resolved: the temp-file leak on the upload-failure path is now fixed (upload_temp_file drops the NamedTempFile before returning the Err the caller exits on, with tests pinning both arms), and the temp file is created atomically via tempfile::Builder (O_EXCL + 0600). No new blocking issues.
Switch managed-table loads to the presigned/direct-to-storage upload path (SDK
upload_file, published in hotdata 0.5.0), removing the legacy/v1/filesproxy seam. Multipart concurrency defaults to 12, overridable viaHOTDATA_UPLOAD_CONCURRENCY.