Clean up leaked remote compaction outputs in crash tests #14592

Open
xingbowang wants to merge 3 commits into facebook:main from xingbowang:2026_04_09_T263917962
@xingbowang
Contributor

Remote compaction jobs write per-job tmp_output_* directories under the crash-test DB. Those directories need to stay around while a live db_stress process is exercising resumable compaction, but terminal OpenAndCompact() failures were leaving them behind, so long crash-test runs could eventually fill /dev/shm.

Changes

db_stress_common.cc

  • Extract DestroyOutputDirectory() from CleanupOutputDirectory() so the low-level destroy (with fault-injection disable/re-enable) can be called independently.
  • Call DestroyOutputDirectory() on terminal OpenAndCompact() failures so the per-job output directory is cleaned up immediately instead of being leaked.
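
The control flow described above can be modeled roughly as follows. This is a language-neutral sketch written in Python, not the C++ in db_stress_common.cc: the FaultInjector class, the function names, and the status strings are all hypothetical, and what the real code does on success is not shown here.

```python
import os
import shutil
from contextlib import contextmanager


class FaultInjector:
    """Hypothetical stand-in for db_stress's fault-injection layer."""

    def __init__(self):
        self.enabled = True

    @contextmanager
    def disabled(self):
        # Pause injection, then restore the previous state even on error.
        previous = self.enabled
        self.enabled = False
        try:
            yield
        finally:
            self.enabled = previous


def destroy_output_directory(faults, output_dir):
    """Low-level destroy, run with fault injection paused."""
    with faults.disabled():
        shutil.rmtree(output_dir, ignore_errors=True)


def run_remote_compaction(faults, output_dir, open_and_compact):
    """Run one job; destroy its output directory only on terminal failure."""
    os.makedirs(output_dir, exist_ok=True)
    status = open_and_compact(output_dir)  # hypothetical status string
    if status == "terminal_failure":
        destroy_output_directory(faults, output_dir)
    return status
```

In this model a "resumable" outcome keeps the directory so a later attempt can pick it up, which matches the requirement that outputs survive while resumable compaction is being exercised.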

db_crashtest.py

  • Add cleanup_stale_remote_compaction_outputs(dbname), which prunes any leftover tmp_output_* directories that a killed child process may have leaked.
  • Call it in the blackbox path (between iterations and before the final verification run) and in the whitebox path (between iterations).
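
A minimal sketch of what such a helper might look like is shown below. Only the function name and the tmp_output_* pattern come from this PR; the body, the TMP_OUTPUT_PREFIX constant, and the returned list are assumptions made for illustration and may differ from the actual code in db_crashtest.py.

```python
import os
import shutil

# Assumed prefix, taken from the directory names mentioned in this PR.
TMP_OUTPUT_PREFIX = "tmp_output_"


def cleanup_stale_remote_compaction_outputs(dbname):
    """Remove leftover tmp_output_* directories directly under dbname.

    Returns the names that were removed (an illustrative choice; the
    real helper's return value is not described in this PR).
    """
    if not dbname or not os.path.isdir(dbname):
        # Missing DB directory: nothing to prune.
        return []
    removed = []
    for name in sorted(os.listdir(dbname)):
        if not name.startswith(TMP_OUTPUT_PREFIX):
            continue
        path = os.path.join(dbname, name)
        # Only real directories are pruned; files and symlinks are skipped.
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path, ignore_errors=True)
            removed.append(name)
    return removed
```

Matching on a name prefix at the top level of the DB directory keeps the pruning narrow, so .sst files and backup directories are never candidates for removal.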

db_crashtest_test.py

  • Test that the cleanup helper removes only tmp_output_* entries and leaves other DB contents (.sst files, .backup* dirs) untouched.
  • Test that a missing DB directory is handled gracefully (no-op).
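
The two cases above could be covered along these lines. This is a hedged sketch, not the tests from db_crashtest_test.py: the test class name, the file names, and the compact stand-in helper defined at the top (included only so the example runs on its own) are assumptions.

```python
import os
import shutil
import tempfile
import unittest


def cleanup_stale_remote_compaction_outputs(dbname):
    # Compact stand-in for the helper under test (assumed behavior).
    if not os.path.isdir(dbname):
        return
    for name in os.listdir(dbname):
        path = os.path.join(dbname, name)
        if name.startswith("tmp_output_") and os.path.isdir(path):
            shutil.rmtree(path, ignore_errors=True)


class CleanupStaleOutputsTest(unittest.TestCase):
    def test_removes_only_tmp_output_entries(self):
        with tempfile.TemporaryDirectory() as dbname:
            os.mkdir(os.path.join(dbname, "tmp_output_job_1"))
            os.mkdir(os.path.join(dbname, ".backup1"))
            with open(os.path.join(dbname, "000001.sst"), "w") as f:
                f.write("data")

            cleanup_stale_remote_compaction_outputs(dbname)

            # The leaked output directory is gone; everything else remains.
            self.assertEqual(
                sorted(os.listdir(dbname)), [".backup1", "000001.sst"]
            )

    def test_missing_db_directory_is_noop(self):
        with tempfile.TemporaryDirectory() as parent:
            missing = os.path.join(parent, "does_not_exist")
            # Must not raise, and must not create the directory.
            cleanup_stale_remote_compaction_outputs(missing)
            self.assertFalse(os.path.exists(missing))
```

Saved to a file, the sketch can be run with `python -m unittest <file>`.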

Refs T263917962.

Clean up tmp_output_* directories once a remote compaction job reaches a terminal failure, and prune any stale tmp_output_* directories between crash-test iterations so leaked state from a killed child process does not accumulate across the full run series.

Also add focused Python coverage for the stale-directory cleanup helper.

meta-cla Bot added the CLA Signed label on Apr 9, 2026

github-actions Bot commented Apr 9, 2026

✅ clang-tidy: No findings on changed lines

Completed in 0.0s.


meta-codesync Bot commented Apr 9, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D100203332.

# Conflicts:
#	tools/db_crashtest.py
#	tools/db_crashtest_test.py

meta-codesync Bot commented Apr 15, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D100203332.
