Add multi-DB stress testing support#14550
Closed
hx235 wants to merge 2 commits intofacebook:mainfrom
Closed
Conversation
|
| Check | Count |
|---|---|
concurrency-mt-unsafe |
6 |
cppcoreguidelines-avoid-non-const-global-variables |
4 |
| Total | 10 |
Details
db_stress_tool/db_stress_common.cc (2 warning(s))
db_stress_tool/db_stress_common.cc:25:56: warning: variable 'shared_wbm' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_common.cc:26:49: warning: variable 'shared_rate_limiter' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc (8 warning(s))
db_stress_tool/db_stress_tool.cc:47:5: warning: variable 'g_fault_fs_for_crash' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc:48:12: warning: variable 'g_num_dbs_for_crash' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc:101:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:465:5: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:470:5: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:495:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:548:29: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:567:11: warning: function is not thread safe [concurrency-mt-unsafe]
Summary: AnonExpectedState::Open() never initialized the base class persisted_seqno_ pointer, leaving it as nullptr. Any call to SetPersistedSeqno() or GetPersistedSeqno() on an AnonExpectedState (used when expected_values_dir is empty) would dereference a null pointer. Fix by allocating and assigning it in Open(), matching FileExpectedState's pattern. Differential Revision: D98556119
1 similar comment
Summary: Add --num_dbs flag for running N independent StressTest instances sharing one Env. Each DB gets its own paths, fault injection, and stress threads. Motivation: validate shared-resource behavior (block cache, WriteBufferManager, RateLimiter, compressed secondary cache) across concurrent DB instances, and provide the multi-DB harness for future shared components like the auto-tuner. Note: multi-DB mode (--num_dbs > 1) is not yet enabled in CI. The db_crashtest.py randomization is commented out with a TODO at line 481. Enabling it in CI will be done in a follow-up change. Per-DB path layout: --db=/tmp/mydb → /tmp/mydb/db_0, /tmp/mydb/db_1, ... --expected_values_dir=/tmp/ev → /tmp/ev/db_0, /tmp/ev/db_1, ... --secondaries_base=/tmp/sec → /tmp/sec/db_0, /tmp/sec/db_1, ... ComputeDbPaths() appends /db_<i> for num_dbs > 1; single-DB paths are unchanged. Per-DB isolation: - Per-DB FaultInjectionTestFS (independent fault injection per DB) - Per-DB crash callback (async-signal-safe C array of raw pointers) - Per-DB RunStressTestImpl() with own SharedState — all features (backup, checkpoint, secondary DB, continuous verification, remote compaction) work correctly in multi-DB mode via per-DB paths Shared resources (across all DBs): - Env and thread pools (compaction, flush, bottom-pri) - Block cache - WriteBufferManager - RateLimiter Guards (features disabled/rejected in multi-DB mode): - clear_column_family_one_in: pre-existing CF handle race condition - compressed_secondary_cache: per-DB CompressedCacheSetCapacityThread instances race on the shared global cache (SetCapacity(0)/SetCapacity(size)) - test_multi_ops_txns: uses a single global key_spaces_path file that persists key range descriptors — multiple DBs would overwrite each other's ranges, causing layout corruption on reopen Python driver (db_crashtest.py) changes: - cleanup_after_success() forwards num_dbs, expected_values_dir, secondaries_base, test_secondary, and continuous_verification_interval to --destroy_db_and_exit so all directories are properly cleaned - Lambda params (expected_values_dir, num_dbs, test_secondary) materialized early in blackbox/whitebox because cleanup_after_success reads them from cmd_params which stores raw lambdas from default_params Bonus cleanup in --destroy_db_and_exit: - Cleans expected_values_dir via Env::Default() (always local, even when the DB itself lives on a remote env_uri/fs_uri) - Cleans secondaries_base including auto-generated default path when test_secondary or continuous_verification_interval is enabled These were pre-existing leaks in single-DB mode too, but fixed here because the Python forwarding above requires the C++ side to handle these flags. Differential Revision: D96575551
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
Add --num_dbs flag for running N independent StressTest instances sharing
one Env. Each DB gets its own paths, fault injection, and stress threads.
Motivation: validate shared-resource behavior (block cache,
WriteBufferManager, RateLimiter, compressed secondary cache) across
concurrent DB instances, and provide the multi-DB harness for future
shared components like the auto-tuner.
Note: multi-DB mode (--num_dbs > 1) is not yet enabled in CI. The
db_crashtest.py randomization is commented out with a TODO at line 481.
Enabling it in CI will be done in a follow-up change.
Per-DB path layout:
--db=/tmp/mydb → /tmp/mydb/db_0, /tmp/mydb/db_1, ...
--expected_values_dir=/tmp/ev → /tmp/ev/db_0, /tmp/ev/db_1, ...
--secondaries_base=/tmp/sec → /tmp/sec/db_0, /tmp/sec/db_1, ...
Per-DB isolation:
(backup, checkpoint, secondary DB, continuous verification, remote
compaction) work correctly in multi-DB mode via per-DB paths
Shared resources (across all DBs):
Guards (features disabled/rejected in multi-DB mode):
instances race on the shared global cache (SetCapacity(0)/SetCapacity(size))
persists key range descriptors — multiple DBs would overwrite each
other's ranges, causing layout corruption on reopen
Python driver (db_crashtest.py) changes:
secondaries_base, test_secondary, and continuous_verification_interval
to --destroy_db_and_exit so all directories are properly cleaned
materialized early in blackbox/whitebox because cleanup_after_success
reads them from cmd_params which stores raw lambdas from default_params
Bonus cleanup in --destroy_db_and_exit:
when the DB itself lives on a remote env_uri/fs_uri)
test_secondary or continuous_verification_interval is enabled
These were pre-existing leaks in single-DB mode too, but fixed here
because the Python forwarding above requires the C++ side to handle
these flags.
Test Plan:
make db_stress
Result: PASSED
Single-DB backward compatibility (verification successful):
./db_stress --max_key=10000 --ops_per_thread=5000 --destroy_db_initially --db=/tmp/rocksdb_v_single
Result: PASSED
Multi-DB basic, 3 DBs (all verified):
./db_stress --num_dbs=3 --max_key=10000 --ops_per_thread=5000 --destroy_db_initially --db=/tmp/rocksdb_v_multi
Result: PASSED
Multi-DB + explicit expected_values_dir (all verified):
./db_stress --num_dbs=3 --max_key=10000 --ops_per_thread=5000 --destroy_db_initially --expected_values_dir=/tmp/test_ev --db=/tmp/rocksdb_v_evdir
Result: PASSED
[ongoing] CI stress test large-scale rehearsal