PS-11020 [8.0]: MyRocks - concurrent ADD INDEX could pollute L0 or crash the server. by polchawa-percona · Pull Request #5921 · percona/percona-server

polchawa-percona · 2026-04-09T10:54:33Z

Problem:

Concurrent ADD INDEX could lead to L0 level being polluted with the bulk-loaded data that was supposed to go to LMAX. Moreover, if the user has set rocksdb_bulk_load_fail_if_not_bottommost_level session variable to true, the server could crash when concurrently running ADD INDEX.

The same issues occurred when ADD INDEX was running while a new table was concurrently created.

Solution:

Serialize bulk loads together with all index id allocations so they could ingest SST files only having a guarantee that all RocksDB keys are the highest when inserted (to avoid the risk of collision with overlapping key ranges in merged SST files that include both smaller and higher index id values).

Wait when ingesting bulk-loaded SST, or when creating a table, until all bulk loads for the smaller index id are finished.

… the server. Problem: Concurrent ADD INDEX could lead to L0 level being polluted with the bulk-loaded data that was supposed to go to LMAX. Moreover, if the user has set rocksdb_bulk_load_fail_if_not_bottommost_level session variable to true, the server could crash when concurrently running ADD INDEX. The same issues occurred when ADD INDEX was running while a new table was concurrently created. Solution: Serialize bulk loads together with all index id allocations so they could ingest SST files only having guarantee that all RocksDB keys are the highest when inserted (to avoid risk of collision with overlapping key ranges in merged SST files that include both smaller and higher index id values). Wait when ingesting bulk loaded SST, or when creating a table up, until all bulk loads for smaller index id are finished.

polchawa-percona · 2026-04-09T11:00:53Z

This is draft of PR - I still want to double check that I don’t need to remove index id in other cases than finish_bulk_load and I want to add some assertions.

inikep · 2026-04-09T11:46:53Z

Commits reviewed:

3b006d7a34831060fbbd4805b6623900eb4cdfb4 - Added MTR test
1bdec346d3383e073aa102f56abd9a2083fce2a6 - BUG PS-11020 MyRocks - concurrent ADD INDEX could pollute L0 or crash the server.

Scope:

Static review only.
I did not build or run tests.

Findings

1. High: multi-index `ALTER` drops reservations for indexes that have not been populated yet

inplace_populate_sk() populates added secondary indexes one by one, but it passes the full indexes set into finish_bulk_load() on every iteration. finish_bulk_load() then waits on and removes reservations for every index in that set immediately after the first ingest.

That means ALTER TABLE ... ADD INDEX ..., ADD INDEX ... can clear the reservation for later indexes before their own bulk load starts. A concurrent table/index creation with a larger index_id can then proceed between the first and second index population steps, reintroducing the same ordering violation this patch is trying to prevent for the remaining indexes.

References:

storage/rocksdb/ha_rocksdb.cc:14905-14907
storage/rocksdb/ha_rocksdb.cc:3993-4009

2. High: reserved index ids are leaked when `finish_bulk_load()` has no work to ingest

The new reservation is removed only in the post-ingest path. However, finish_bulk_load() has multiple successful early returns before that block, including the m_curr_bulk_load.size() == 0 case and the sst_commit_list.size() == 0 case.

For ALTER TABLE ... ADD INDEX on an empty table, or any other path where no SST files are produced, the new index id remains reserved forever even if the DDL succeeds. Future DDL that gets a larger index_id will keep waiting on that stale reservation.

References:

storage/rocksdb/ha_rocksdb.cc:3588-3593
storage/rocksdb/ha_rocksdb.cc:3931-3951
storage/rocksdb/ha_rocksdb.cc:4004-4009
storage/rocksdb/ha_rocksdb.cc:8900-8915

3. Medium: rollback/error paths do not clear the new reservation state

On failure, commit_inplace_alter_table(commit=false) cleans up uncommitted keydefs and ongoing create-index metadata, but it never removes entries from m_index_id_reserved_for_create.

So any error after create_key_def() reserves the id, but before the successful ingest cleanup runs, leaves a stale in-memory reservation behind until restart. This can happen on allocation failures, scan errors, duplicate-key failures before ingest, or any other abort path before the post-ingest cleanup.

References:

storage/rocksdb/ha_rocksdb.cc:14986-15034
storage/rocksdb/ha_rocksdb.cc:8900-8915

4. Medium: the new MTR test is timing-based and misses the leaked-reservation cases

The new test replaces deterministic synchronization with --sleep 5, so it now depends on machine speed/load rather than a specific engine state. That makes it vulnerable to CI flakiness.

It also inserts a row specifically to make the new secondary index non-empty, so it does not cover the no-work reservation leak described above.

References:

mysql-test/suite/rocksdb/t/alter_table_add_secondary_key.test:39-91

- always reclaim index id reservation in finish_bulk_load - finish_bulk_load is invoked within a for-loop (not outside)

polchawa-percona requested review from inikep and percona-ysorokin April 9, 2026 10:54

polchawa-percona self-assigned this Apr 9, 2026

polchawa-percona added the bug label Apr 9, 2026

Added MTR test

3b006d7

polchawa-percona changed the title ~~BUG PS-11020 MyRocks - concurrent ADD INDEX could pollute L0 or crash the server.~~ PS-11020 [8.0]: MyRocks - concurrent ADD INDEX could pollute L0 or crash the server. Apr 9, 2026

polchawa-percona added 3 commits April 10, 2026 10:51

Addressed review comments:

bb16ac2

- always reclaim index id reservation in finish_bulk_load - finish_bulk_load is invoked within a for-loop (not outside)

Formatting fix.

600e554

Formatting fix.

d54c2d6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PS-11020 [8.0]: MyRocks - concurrent ADD INDEX could pollute L0 or crash the server.#5921

PS-11020 [8.0]: MyRocks - concurrent ADD INDEX could pollute L0 or crash the server.#5921
polchawa-percona wants to merge 5 commits into
percona:8.0from
polchawa-percona:PS-11020

polchawa-percona commented Apr 9, 2026

Uh oh!

polchawa-percona commented Apr 9, 2026 •

edited

Loading

Uh oh!

inikep commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

polchawa-percona commented Apr 9, 2026

Uh oh!

polchawa-percona commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

inikep commented Apr 9, 2026

Findings

1. High: multi-index ALTER drops reservations for indexes that have not been populated yet

2. High: reserved index ids are leaked when finish_bulk_load() has no work to ingest

3. Medium: rollback/error paths do not clear the new reservation state

4. Medium: the new MTR test is timing-based and misses the leaked-reservation cases

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

polchawa-percona commented Apr 9, 2026 •

edited

Loading

1. High: multi-index `ALTER` drops reservations for indexes that have not been populated yet

2. High: reserved index ids are leaked when `finish_bulk_load()` has no work to ingest