Revert CQ shared store: Delete from index on remove or roll over (#13959)#16142
lhoguin merged 2 commits into rabbitmq:main
Conversation
@lukebakken Loïc will be back next week. We will have a day or two to decide before the final
…bbitmq#13959)" This reverts commit 0278980.
Restore three independent improvements from the reverted commit that are unrelated to the broken `current_file_removes` mechanism:

- Relax the `index_update_fields` assertion (`true =` -> `_ =`) so a missing key does not crash the process
- Add `prioritise_cast/3` to `rabbit_msg_store_gc` so delete requests are processed before compaction requests, avoiding unnecessary compaction of files that are already pending deletion
- The `compact_file/2` early-exit guard was already present after the revert
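As a rough illustration of the `prioritise_cast/3` change, here is a sketch of the callback shape. In RabbitMQ, `rabbit_msg_store_gc` is a `gen_server2`, and `gen_server2` invokes `prioritise_cast/3` to assign each queued cast a priority, dequeuing higher priorities first. The exact message terms below (`{delete, ...}`, `{compact, ...}`) are illustrative assumptions, not the literal terms used by the module:

```erlang
%% Sketch of a gen_server2 priority callback: returns a priority number for
%% each pending cast. Higher numbers are handled first, so a queued delete
%% request jumps ahead of any queued compaction request.
%% Message shapes here are illustrative, not the module's actual terms.
prioritise_cast({delete, _File}, _Len, _State) ->
    5;  %% process file deletions before anything else
prioritise_cast({compact, _File}, _Len, _State) ->
    0;  %% compaction waits until no deletes are queued
prioritise_cast(_Msg, _Len, _State) ->
    0.
```

The effect is that a file already scheduled for deletion is never pointlessly compacted first, which is what the commit message above describes.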
Well, this is something that shouldn't be rushed!
lhoguin
left a comment
Perhaps I went a bit too far on this. Deletion was too expensive so the PR fixed that, at the cost of making some compactions more expensive. A plain revert would mean we'd just go back to that state, but since we keep some changes I'm OK with merging this now and seeing how it fares overall in the wild.
In particular, perhaps the prioritisation of deletes is good enough to avoid the bigger issues of deletes.
At some point we should rework the file format to avoid these issues entirely. We need the file itself to contain information about messages existing, or holes existing, and what size they are. The in-memory index should only ever refer to existing messages and act more like a cache over what is in the files, to be used by readers. |
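To make the idea above concrete, here is one hypothetical shape for such a self-describing file format. Every slot in the file starts with a small header recording whether it holds a message or a hole, and the slot's size, so a scanner can skip regions without consulting the in-memory index. The field names, tag values, and widths are invented for illustration; they are not a design from the RabbitMQ team:

```erlang
%% Hypothetical self-describing entry layout: a 1-byte tag (message or hole)
%% followed by a 56-bit size, so each entry says what it is and how big it is.
-define(ENTRY_MSG,  0).
-define(ENTRY_HOLE, 1).

%% Encode a hole as a tagged header plus Size zero bytes of padding,
%% and a message as a tagged header plus its body.
encode_entry(hole, Size) ->
    <<?ENTRY_HOLE:8, Size:56/unsigned, 0:(Size*8)>>;
encode_entry({msg, Body}, _Size) ->
    Size = byte_size(Body),
    <<?ENTRY_MSG:8, Size:56/unsigned, Body/binary>>.

%% Decode the next entry; a reader can skip a hole in O(1) using its size.
decode_entry(<<?ENTRY_HOLE:8, Size:56/unsigned, _Pad:Size/binary, Rest/binary>>) ->
    {{hole, Size}, Rest};
decode_entry(<<?ENTRY_MSG:8, Size:56/unsigned, Body:Size/binary, Rest/binary>>) ->
    {{msg, Body}, Rest}.
```

With headers like these, the index can shrink to a read-side cache over live messages, as the comment suggests, because compaction no longer needs index lookups to tell data from garbage.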
@lukebakken so that you don't have to guess: the earliest this can ship now is
@Mergifyio backport v4.3.x (the PR will be marked as delayed for
✅ Backports have been created
Thanks @lhoguin
Summary
Reverts 0278980 ("CQ shared store: Delete from index on remove or roll over", PR #13959), which introduced a regression in the classic queue message store GC. Three independent improvements from that commit are retained.
Fixes #16141.
Problem
PR #13959 replaced the `scan_and_vacuum_message_file` call in `delete_file` with an eager index cleanup mechanism (`current_file_removes`). As a side effect, messages removed from non-current files now produce `not_found` index lookups during `scan_and_vacuum_message_file` instead of `previously_valid` ones. This was noted in the PR review by @gomoripeti but not addressed before merge.

Under high throughput with many queues, the byte-by-byte `scan_next_byte` scanning mode triggered by `not_found` entries causes GC compaction to fall far enough behind the publish rate that disk usage can grow without bound. The stall also causes consumer latency spikes and broker unresponsiveness on established TCP connections.

Reproduction: 100 classic queues at 500 msg/s (120 KB messages) with a slow-ack consumer queue in the same vhost (acks held 1-30 min, up to 1000 in flight). On an m7g.large with a 196 GB EBS volume, free disk fell from 185.4 GB to ~169 GB in ~100 minutes. Consumer latency reached a median of 1.5 s and a maximum of 568 s.
Reproduction scripts: https://github.com/lukebakken/rmq-gc-lag
Changes
Two commits:

1. Revert commit 0278980 ("CQ shared store: Delete from index on remove or roll over", #13959).
2. Restore three independent improvements from the reverted commit that are unrelated to the broken `current_file_removes` mechanism:
   - Relax the `index_update_fields` assertion (`true =` to `_ =`) so a missing key does not crash the process
   - Add `prioritise_cast/3` to `rabbit_msg_store_gc` so delete requests are processed before compaction requests, avoiding unnecessary compaction of files already pending deletion
   - The `compact_file/2` early-exit guard (file already deleted) was already present after the revert

Testing
Ran the reproduction workload against this branch for 60 minutes (three consecutive 20-minute monitoring windows) at 500 msg/s with ~1000 unacked messages. Disk held stable in a 0.5 GB oscillation band (184.96-185.47 GB). Ready messages held at 0 throughout. No latency spikes.
For comparison, the same workload against unpatched `main` lost ~16 GB of disk in ~100 minutes, with ready messages growing to 3500-4200.