introduce a regression test for #8056 #8068
Conversation
(The integration test failures look to be a GitHub failure; could someone retry them?)
GitHub jobs have been pretty unstable lately; I will re-run them.
Thank you!
```rust
// Run multiple times to make hitting the race condition very likely.
for _ in 0..512 {
```
Since this is a race condition that we have to run multiple iterations in order to catch, I wonder if this might make sense to try and reproduce using loom (perhaps in addition to a normal test). It would be nice if we could use the model checker to deterministically simulate the interleaving that reproduces the bug, rather than just trying to run some code a whole bunch of times and hope one of them hits the race. It's hard to say for sure without a better understanding of the bug, but I wonder if loom's deadlock detection could catch this...
I originally thought loom would be a good fit, but this ended up being a quicker path for me to get a reproducer. Is there an existing loom test you think would be a good model for this type of test?
Co-authored-by: Eliza Weisman <eliza@elizas.website>
Ok, took me a while, but there's now a description of the bug + how this test attempts to replicate it. Let me know if they make sense.
Since this is a racy test that's not guaranteed to catch the issue every time, can you please try pushing the queue sharding code to see whether the error also triggers in CI? This will verify that the performance characteristics of the GitHub runners do not shift the timings so that the race no longer occurs.
Ok, cherry-picked the sharded queue commit onto this branch. Assuming the failures look as expected, I can then rebase it back off.
Looks good, thanks!
Ok, rebased that commit out.
refs #8067