Draft
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates message routing/state handling to support subnet deletion by (1) pruning stale outgoing streams to subnets removed from the registry/network topology and (2) avoiding critical-error metrics for best-effort responses that become unroutable due to subnet deletion. It also adds a state machine test that exercises stream pruning plus expected handling of unroutable requests/responses after deletion.
Changes:
- Add
ReplicatedState::discard_streams_for_deleted_subnets()to remove streams whose destination subnet no longer exists inNetworkTopology. - Invoke stream pruning in
StateMachineImpl::execute_round()after the induction phase. - Treat unroutable best-effort responses as non-critical in
StreamBuilderImpl, and add a test covering the deleted-subnet scenario end-to-end.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| rs/replicated_state/src/replicated_state.rs | Adds an API to discard streams whose destination subnet is no longer present in the applied topology. |
| rs/messaging/src/state_machine.rs | Calls the new pruning method after induction to clean up stale streams before execution/routing. |
| rs/messaging/src/routing/stream_builder.rs | Downgrades logging/metrics for unroutable best-effort responses to avoid raising critical errors for expected deletion cases. |
| rs/messaging/src/state_machine/tests.rs | Adds a regression test ensuring stale streams are removed and no critical errors are emitted for expected unroutable best-effort messages. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR implements DSM adjustments to support subnet deletion under the following assumptions:
To this end, two changes are needed:
When a subnet is deleted from the registry, the local subnet's outgoing stream to it is now explicitly discarded as part of the same round's execution, after the induction phase. This is safe because all certified stream slices from the deleted subnet (produced before its key was removed from the registry) are inducted before the stream is dropped, and no new certified slices can arrive after deletion. The stream builder's existing route() returning None for deleted subnets already prevents a fresh stream from being recreated. Prior to this change, stale streams to deleted subnets would persist in ReplicatedState indefinitely.
No critical errors are raised for discarded best-effort responses that cannot be routed to the deleted subnet (which is expected).
A new test ensures that the (outbound) stream is indeed deleted, no critical errors are raised when dropping a bounded-wait response to the deleted subnet (both canister request and subnet message), and rejects responses are generated for requests already in the output queues to the mgmt canister and regular canister on the deleted subnet. Another test ensures that the (inbound) certified stream is garbage collected.