-
Notifications
You must be signed in to change notification settings - Fork 6.8k
db_stress: fix crashes during concurrent column family drop+recreate … #14601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -110,6 +110,10 @@ void StressTest::CleanUpColumnFamilies() { | |
| delete cf; | ||
| } | ||
| column_families_.clear(); | ||
| for (auto* cf : old_column_families_) { | ||
| delete cf; | ||
| } | ||
| old_column_families_.clear(); | ||
| for (auto* cf : secondary_cfhs_) { | ||
| delete cf; | ||
| } | ||
|
|
@@ -796,9 +800,17 @@ Status StressTest::SetOptions(ThreadState* thread) { | |
| } | ||
|
|
||
| Options StressTest::GetOptions(int cf_id) { | ||
| auto cfh = column_families_[cf_id]; | ||
| assert(cfh); | ||
| return db_->GetOptions(cfh); | ||
|
Comment on lines
-799
to
-801
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The old code treat CF id as the index of column_families_. This is right, if CF is never deleted and recreated. Given we do have code that recreate CF (although this is never called, that's why the bug was never triggered), we should not hold this assumption anymore. I think the fix is correct. Meantime, once we re-enable the drop/recreate test, accessing it would be better guarded by a mutex for thread safety. Although since it is just a pointer of array that is never resized, most hardware would guarantee thread safety on raw pointer read/write, but tsan might warn about this. |
||
| // cf_id is the RocksDB internal column family ID, not the index into | ||
| // column_families_. After a CF drop+recreate, the new CF gets a higher ID | ||
| // that may exceed column_families_.size(). Search for the matching handle. | ||
| for (auto* cfh : column_families_) { | ||
| if (cfh && cfh->GetID() == static_cast<uint32_t>(cf_id)) { | ||
| return db_->GetOptions(cfh); | ||
| } | ||
| } | ||
|
Comment on lines
-799
to
+810
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On cf id not found, we used to cause segfault, accessing nullptr. Now we just return default options_. Although with delete/recreate, this is a valid case we need to handle properly for remote compaction. I think it is better to surface CF not found error back to caller in remote compaction and have remote compaction just abort the compaction gracefully. |
||
| // CF was dropped and recreated with a new ID. Return base options since the | ||
| // remote compaction for the old CF will fail anyway. | ||
| return options_; | ||
| } | ||
|
|
||
| void StressTest::ProcessRecoveredPreparedTxns(SharedState* shared) { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you are here, we should probably fix the type, uint32_t is the right type for cf_id