
Fix dlx checkout command redirection (backport #16203) (backport #16204) #16205

Merged
ansd merged 2 commits into v4.2.x from mergify/bp/v4.2.x/pr-16204 on Apr 23, 2026


mergify bot commented Apr 23, 2026

This commit supersedes #15548.

## What?

Fix the following genuine CI flake:

```
make -C deps/rabbit ct-rabbit_fifo_dlx_integration t=cluster_size_3:single_dlx_worker
```

Sometimes this test case failed with the logs showing the following:

```text
2026-02-24 09:06:19.413770+00:00 [debug] <0.2377.0> queue 'single_dlx_worker_source' in vhost '/': vote granted for term 3 votes 2
2026-02-24 09:06:19.414048+00:00 [debug] <0.2377.0> started rabbit_fifo_dlx_worker <0.2600.0> for queue 'single_dlx_worker_source' in vhost '/'
2026-02-24 09:06:19.414096+00:00 [notice] <0.2377.0> queue 'single_dlx_worker_source' in vhost '/': candidate -> leader in term: 3 machine version: 7, last applied 5
2026-02-24 09:06:19.414388+00:00 [debug] <0.2602.0> queue 'single_dlx_worker_source' in vhost '/': updating leader record to current node rmq-ct-cluster_size_3-1-28000@localhost
2026-02-24 09:06:19.414279+00:00 [info] <0.2377.0> queue 'single_dlx_worker_source' in vhost '/': leader saw request_vote_rpc from {'%2F_single_dlx_worker_source','rmq-ct-cluster_size_3-3-28144@localhost'} for term 4 abdicates term: 3!
2026-02-24 09:06:19.417479+00:00 [notice] <0.2377.0> queue 'single_dlx_worker_source' in vhost '/': leader -> follower in term: 4 machine version: 7, last applied 5
2026-02-24 09:06:19.417533+00:00 [debug] <0.2377.0> queue 'single_dlx_worker_source' in vhost '/': is not new, setting election timeout.
2026-02-24 09:06:19.417740+00:00 [info] <0.2377.0> queue 'single_dlx_worker_source' in vhost '/': declining vote for {'%2F_single_dlx_worker_source','rmq-ct-cluster_size_3-3-28144@localhost'} for term 4, candidate last log {index, term} was: {5,2}  last log entry {index, term} is: {{6,3}}
2026-02-24 09:06:19.417824+00:00 [debug] <0.2377.0> queue 'single_dlx_worker_source' in vhost '/': leader call - leader not known. Command will be forwarded once leader is known.
2026-02-24 09:06:19.418190+00:00 [debug] <0.2377.0> queue 'single_dlx_worker_source' in vhost '/' declining pre-vote to {'%2F_single_dlx_worker_source','rmq-ct-cluster_size_3-2-28072@localhost'} for term 3, current term 4
2026-02-24 09:06:19.428043+00:00 [debug] <0.2377.0> queue 'single_dlx_worker_source' in vhost '/': resetting last index to 5 from 6 in term 4
2026-02-24 09:06:19.428157+00:00 [info] <0.2377.0> queue 'single_dlx_worker_source' in vhost '/': detected a new leader {'%2F_single_dlx_worker_source','rmq-ct-cluster_size_3-3-28144@localhost'} in term 4
2026-02-24 09:06:19.428280+00:00 [debug] <0.2377.0> queue 'single_dlx_worker_source' in vhost '/': mem table overwriting detected whilst staging entries, opening new mem table
2026-02-24 09:06:19.436299+00:00 [debug] <0.2377.0> queue 'single_dlx_worker_source' in vhost '/': enabling ra cluster changes in 4, index 6
2026-02-24 09:06:19.436411+00:00 [debug] <0.2377.0> Terminating <31028.1894.0> since <31122.2516.0> becomes active rabbit_fifo_dlx_worker
2026-02-24 09:06:19.437003+00:00 [debug] <0.2377.0> Terminating <31122.2516.0> since <0.2600.0> becomes active rabbit_fifo_dlx_worker
2026-02-24 09:06:19.437107+00:00 [warning] <0.2600.0> Failed to process command {dlx,{checkout,<0.2600.0>,2}} on quorum queue leader {'%2F_single_dlx_worker_source',
2026-02-24 09:06:19.437107+00:00 [warning] <0.2600.0>                                                                                 'rmq-ct-cluster_size_3-1-28000@localhost'} because actual leader is {'%2F_single_dlx_worker_source',
2026-02-24 09:06:19.437107+00:00 [warning] <0.2600.0>                                                                                                                                                      'rmq-ct-cluster_size_3-3-28144@localhost'}.
```

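In this case, Ra forwarded the checkout command from the DLX worker on node 1 (the old leader) to node 3 (the new leader). Applying it made that worker the active DLX worker even though node 1 was no longer the leader. As a result, the quorum queue could end up without a working DLX worker.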

## How?

In this commit, we use `ra:pipeline_command/4` with a selective receive instead of `ra:process_command/3` for the DLX checkout command. This prevents Ra from automatically redirecting the checkout command to a new leader if a leader failover happens while the command is being processed.
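The pattern described above can be sketched in Erlang as follows. This is a minimal, hypothetical sketch, not the code in `rabbit_fifo_dlx_client.erl`: the module `dlx_checkout_sketch` and its `checkout/4` function are made up for illustration, and the reply shapes `{ra_event, _, {applied, [{Corr, Reply}]}}` and `{ra_event, _, {rejected, {not_leader, Leader, Corr}}}` are assumed from Ra's documented events for pipelined commands. The send function is passed in so the receive logic can be exercised without a Ra cluster; in real use it would be `fun ra:pipeline_command/4`.

```erlang
-module(dlx_checkout_sketch).
-export([checkout/4]).

%% Hypothetical helper, not RabbitMQ's implementation.
%% SendFun is assumed to behave like ra:pipeline_command/4: it sends Cmd to
%% ServerId tagged with Corr and returns ok without waiting for a reply.
checkout(SendFun, ServerId, Cmd, Timeout) ->
    %% A unique correlation lets the selective receive below match only the
    %% reply to this command. Assumes no other pipelined command from this
    %% process is in flight, so the applied list holds just this entry.
    Corr = make_ref(),
    ok = SendFun(ServerId, Cmd, Corr, normal),
    receive
        %% The command was applied by the cluster.
        {ra_event, _Leader, {applied, [{Corr, Reply}]}} ->
            {ok, Reply};
        %% ServerId was not the leader. Nothing is redirected to another
        %% member, so the caller decides what to do next.
        {ra_event, _From, {rejected, {not_leader, Leader, Corr}}} ->
            {error, {not_leader, Leader}}
    after Timeout ->
        %% No matching reply arrived within Timeout milliseconds.
        {error, timeout}
    end.
```

In a worker this might be called as `dlx_checkout_sketch:checkout(fun ra:pipeline_command/4, ServerId, CheckoutCmd, 5000)`, where `CheckoutCmd` and the 5000 ms timeout are placeholders. Surfacing `not_leader` and timeouts to the caller, instead of letting Ra forward the command, is the point of the change; how the real client reacts to these errors is not shown here.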


This is an automatic backport of pull request #16203 done by [Mergify](https://mergify.com).
This is an automatic backport of pull request #16204 done by [Mergify](https://mergify.com).

(cherry picked from commit 7768f99)
(cherry picked from commit 4495252)

# Conflicts:
#	deps/rabbit/src/rabbit_fifo_dlx_client.erl
mergify bot added the conflicts label Apr 23, 2026

mergify bot commented Apr 23, 2026

Cherry-pick of 4495252 has failed:

```
On branch mergify/bp/v4.2.x/pr-16204
Your branch is up to date with 'origin/v4.2.x'.

You are currently cherry-picking commit 449525258d.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   deps/rabbit/test/rabbit_fifo_dlx_integration_SUITE.erl

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   deps/rabbit/src/rabbit_fifo_dlx_client.erl
```

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

ansd added this to the 4.2.7 milestone Apr 23, 2026
ansd merged commit 2299dc6 into v4.2.x Apr 23, 2026
300 checks passed
ansd deleted the mergify/bp/v4.2.x/pr-16204 branch Apr 23, 2026 16:40