
For 4.3.1: Revert CQ shared store: Delete from index on remove or roll over (#13959) (backport #16142)#16195

Merged
michaelklishin merged 2 commits into v4.3.x from mergify/bp/v4.3.x/pr-16142
Apr 27, 2026


@mergify mergify Bot commented Apr 20, 2026

Summary

Reverts 0278980 ("CQ shared store: Delete from index on remove or roll over", PR #13959), which introduced a regression in the classic queue message store GC. Three independent improvements from that commit are retained.

Fixes #16141.

Problem

PR #13959 replaced the scan_and_vacuum_message_file call in delete_file with an eager index cleanup mechanism (current_file_removes). As a side effect, messages removed from non-current files now produce not_found index lookups during scan_and_vacuum_message_file instead of previously_valid ones. This was noted in the PR review by @gomoripeti but not addressed before merge.

Under high throughput with many queues, the byte-by-byte scan_next_byte scanning mode triggered by not_found entries causes GC compaction to fall far enough behind the publish rate that disk usage can grow without bound. The stall also causes consumer latency spikes and broker unresponsiveness on established TCP connections.
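The cost difference between the two lookup outcomes can be illustrated with a toy model. This is a minimal sketch, not the actual `rabbit_msg_store` scanner: it assumes, as the description above implies, that an entry the index reports as valid or `previously_valid` lets the scanner skip the whole entry, while a `not_found` result makes it advance one byte at a time. The function name, the `entries` layout, and the lookup callbacks are hypothetical.

```python
from typing import Callable, Dict

VALID = "valid"
PREVIOUSLY_VALID = "previously_valid"
NOT_FOUND = "not_found"


def count_scan_steps(
    file_size: int,
    entries: Dict[int, int],
    lookup: Callable[[int], str],
) -> int:
    """Count scanner iterations over a toy message file.

    entries maps a byte offset to the size of the entry starting there.
    lookup(offset) models the index lookup result for that entry.
    """
    offset = 0
    steps = 0
    while offset < file_size:
        steps += 1
        size = entries.get(offset)
        if size is not None and lookup(offset) in (VALID, PREVIOUSLY_VALID):
            # The index vouches for this entry: skip it in one step.
            offset += size
        else:
            # Unknown bytes (or a not_found entry): advance a single byte.
            offset += 1
    return steps


# Ten hypothetical 1000-byte entries packed back to back.
ENTRIES = {i * 1000: 1000 for i in range(10)}
FILE_SIZE = 10 * 1000

steps_known = count_scan_steps(FILE_SIZE, ENTRIES, lambda _off: PREVIOUSLY_VALID)
steps_unknown = count_scan_steps(FILE_SIZE, ENTRIES, lambda _off: NOT_FOUND)
```

In this model the all-`previously_valid` file is scanned in 10 steps and the all-`not_found` file in 10000, so the work per file grows with the number of bytes behind removed messages rather than with the number of entries.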

Reproduction: 100 classic queues at 500 msg/s (120 KB messages) with a slow-ack consumer queue in the same vhost (acks held 1-30 min, up to 1000 in flight). On an m7g.large with a 196 GB EBS volume, free disk space fell from 185.4 GB to ~169 GB in ~100 minutes. Consumer latency reached a median of 1.5s and a max of 568s.

Reproduction scripts: https://github.com/lukebakken/rmq-gc-lag

Changes

Two commits:

  1. Revert 0278980 in full.
  2. Restore three independent improvements from that commit that are unrelated to the broken current_file_removes mechanism:
    • Relax index_update_fields assertion (true= to _=) so a missing key does not crash the process
    • Add prioritise_cast/3 to rabbit_msg_store_gc so delete requests are processed before compaction requests, avoiding unnecessary compaction of files already pending deletion
    • compact_file/2 early-exit guard (file already deleted) was already present after the revert
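The effect of prioritising delete requests, combined with the early-exit guard, can be sketched with a small priority mailbox. This is an illustrative model only, not the `rabbit_msg_store_gc` implementation: `GcMailbox`, `run_gc`, the request tuples, and the priority values are all hypothetical.

```python
import heapq
import itertools

# Hypothetical priorities: a higher number is served first.
PRIORITY = {"delete": 1, "compact": 0}


class GcMailbox:
    """Toy mailbox that serves higher-priority casts first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def cast(self, kind, file_id):
        # Negate the priority because heapq is a min-heap; the sequence
        # number keeps arrival order among requests of equal priority.
        heapq.heappush(self._heap, (-PRIORITY[kind], next(self._seq), kind, file_id))

    def drain(self):
        while self._heap:
            _, _, kind, file_id = heapq.heappop(self._heap)
            yield kind, file_id


def run_gc(requests):
    """Return the work actually performed for a list of (kind, file_id) requests."""
    mailbox = GcMailbox()
    for kind, file_id in requests:
        mailbox.cast(kind, file_id)

    deleted = set()
    work = []
    for kind, file_id in mailbox.drain():
        if kind == "delete":
            deleted.add(file_id)
            work.append(("delete", file_id))
        elif file_id in deleted:
            # Early exit: the file is already gone, nothing to compact.
            continue
        else:
            work.append(("compact", file_id))
    return work


performed = run_gc([("compact", 7), ("delete", 7), ("compact", 8)])
```

Here the compaction of file 7 is queued before its deletion, but the delete is served first, so `performed` contains only the delete of file 7 and the compaction of file 8.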

Testing

Ran the reproduction workload against this branch for 60 minutes (three consecutive 20-minute monitoring windows) at 500 msg/s with ~1000 unacked messages. Free disk space held stable in a 0.5 GB oscillation band (184.96-185.47 GB). Ready messages held at 0 throughout. No latency spikes.

For comparison, the same workload against unpatched main lost ~16 GB of free disk space in ~100 minutes, with ready messages growing to 3500-4200.
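As a rough back-of-the-envelope check, the unpatched figures can be turned into a loss rate. The sketch below reads the reported disk numbers as free space and assumes the loss stays linear, which the description does not claim; the inputs are the approximate values quoted above (185.4 GB, ~169 GB, ~100 minutes), and the function names are hypothetical.

```python
def free_space_loss_rate(start_gb: float, end_gb: float, minutes: float) -> float:
    """Average free-space loss in GB per minute over the observation window."""
    return (start_gb - end_gb) / minutes


def minutes_until_full(free_gb: float, loss_gb_per_min: float) -> float:
    """Linear extrapolation of how long the remaining free space would last."""
    return free_gb / loss_gb_per_min


# Approximate figures from the unpatched reproduction run.
rate = free_space_loss_rate(185.4, 169.0, 100)   # about 0.164 GB/min
remaining = minutes_until_full(169.0, rate)      # about 1030 minutes
```

Under those assumptions the unpatched broker loses roughly 0.16 GB of free space per minute and would exhaust the remaining ~169 GB in about 17 hours, whereas the patched run stays within a 0.5 GB band.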


This is an automatic backport of pull request #16142 done by [Mergify](https://mergify.com).

Restore three independent improvements from the reverted commit that
are unrelated to the broken current_file_removes mechanism:

- Relax index_update_fields assertion: true= -> _= so a missing key
  does not crash the process
- Add prioritise_cast/3 to rabbit_msg_store_gc so delete requests are
  processed before compaction requests, avoiding unnecessary compaction
  of files that are already pending deletion
- compact_file/2 early-exit guard was already present after the revert

(cherry picked from commit 69fd9ff)
@michaelklishin michaelklishin changed the title from "Revert CQ shared store: Delete from index on remove or roll over (#13959) (backport #16142)" to "DO NOT MERGE For 4.3.1: Revert CQ shared store: Delete from index on remove or roll over (#13959) (backport #16142)" Apr 20, 2026
@michaelklishin michaelklishin added this to the 4.3.1 milestone Apr 20, 2026
@michaelklishin michaelklishin changed the title from "DO NOT MERGE For 4.3.1: Revert CQ shared store: Delete from index on remove or roll over (#13959) (backport #16142)" to "For 4.3.1: Revert CQ shared store: Delete from index on remove or roll over (#13959) (backport #16142)" Apr 27, 2026
@michaelklishin michaelklishin merged commit cb0a57f into v4.3.x Apr 27, 2026
189 checks passed
@michaelklishin michaelklishin deleted the mergify/bp/v4.3.x/pr-16142 branch April 27, 2026 23:36