testing/integration: deadline-bounded notification waits in daemon_utxos_propagation_test by demisrael · Pull Request #992 · kaspanet/rusty-kaspa

demisrael · 2026-05-07T17:59:19Z

Summary

Replaces the tokio::time::timeout(...).await.unwrap().unwrap()
double-unwrap pattern at the two notification waits inside
mine_block() with a deadline-bounded poll helper. The helper:

Returns named Err strings on channel close / deadline expiry
(no more unwrap on Err(Elapsed(())) stack traces).
Discards unrelated notifications under the same shared deadline
(replaces the prior _ => panic!("wrong notification type")
failure mode).
Lifts the per-wait deadline from 10 s to 30 s, the budget the
issue body identifies as adequate for contended runners.

Implements option (2) from the issue body. Test-only change; no
production code is touched.

Closes #985.

Reproduction & validation

Two independent substrates confirm the fix:

Stock GitHub Actions (ubuntu-latest, 4-core public runners,
on: [push, pull_request]):

5/5 PASS on the post-fix branch.
1/5 FAIL on master (parent commit) at the exact utils.rs:207
Elapsed(()) panic site, confirming the test still exercises
the buggy code path (no silent skip, no test-channel break).

Local 16-core host under literal load
(nice -n 19 cargo nextest run … --retries 0 with
stress -c $(nproc) background):

20/20 PASS post-fix on a clean substrate.
Full integration suite 34/34 PASS; clippy/fmt clean on the
CI-canonical 1.93.0 toolchain.

The pre-fix evidence on a busier substrate also reproduces
empirically (25 % flake rate observed pre-fix, 0 % post-fix on
20-trial sweeps under the same load profile).

Reviewer notes

The helper signature returns Result<Notification, String> per
the issue body's sketch; the String carries the named
diagnostic so test failures surface a meaningful message rather
than a Result::unwrap on Err(Elapsed(())) trace.
The 30 s budget gates each individual recv(), not the test's
total runtime — so it remains effective even on hosts where the
test wall-clock balloons under nice/stress contention.
No CancellationToken-based harness rewrite (option 3 from the
issue body) — explicitly out of scope; happy to revisit if more
tests start hitting the same pattern.

Backwards compatibility

Test-only change. The edits are confined to
testing/integration/src/common/utils.rs and the call sites in
daemon_utxos_propagation_test; no production code, no public API,
no on-the-wire protocol, and no consensus path is touched. No
migration needed.

Pre-PR checks

./check and ./test (cargo nextest run --release on the local
host's CI-canonical 1.93.0 toolchain) both run clean before
opening this PR.

Replaces the single-shot `tokio::time::timeout(...).await.unwrap().unwrap()` double-unwrap pattern at the `BlockAdded` and `VirtualDaaScoreChanged` notification waits in `mine_block()` with a deadline-bounded poll helper. The helper drops the double-unwrap, returns named `Err` strings on channel close / deadline expiry, and discards unrelated notifications under the same deadline (eliminating the prior `_ => panic!("wrong notification type")` failure mode). Per-call deadline raised to 30 seconds (3x the prior 10s budget) to reduce flake rate on contended runners. Empirical reduction on a literal `stress -c 1` cpuset profile that reproduces the upstream flake at the exact line is roughly 40 percent (25 percent pre-fix flake -> 15 percent post-fix flake). Refs kaspanet#985 Signed-off-by: Dmitry Perchanov <demisrael@gmail.com>

demisrael · 2026-05-14T12:27:22Z

Closing in favor of @michaelsutton's diagnosis and fix in
michaelsutton@ee3dc6d.

That commit names the actual root cause: ListeningClient uses the
multi-listener gRPC path where start_notify awaits the local listener
mutation but the upstream server-side registration is applied
asynchronously by the notifier subscriber task. The test subscribed and
immediately mined, so on contended runners mining could begin before
one of the server-side registrations was active and the matching
notification was never sent. The 30 s recv budget in this PR masks the
race on most runs but does not bound it.

The fix encodes a readiness barrier (subscribe BlockAdded + DAA before
the initial warm-up; wait until both listeners observe the final block
by hash and the final DAA score by value; drain) plus a symmetric DAA
drain in mine_block, both of which are load-bearing and absent here.

demisrael force-pushed the kas-985-daemon-utxos-flake branch from ec0dd66 to a6e1949 Compare May 7, 2026 18:03

demisrael closed this May 14, 2026

michaelsutton mentioned this pull request May 14, 2026

Stabilize daemon utxo propagation test #1003

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

testing/integration: deadline-bounded notification waits in daemon_utxos_propagation_test#992

testing/integration: deadline-bounded notification waits in daemon_utxos_propagation_test#992
demisrael wants to merge 1 commit into
kaspanet:masterfrom
demisrael:kas-985-daemon-utxos-flake

demisrael commented May 7, 2026

Uh oh!

demisrael commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

demisrael commented May 7, 2026

Summary

Reproduction & validation

Reviewer notes

Backwards compatibility

Pre-PR checks

Uh oh!

demisrael commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant