fix(ws): break reconnect-handler Arc cycle that leaked WebSocket clients (#325)#334
Open
skyc1e wants to merge 3 commits intoPolymarket:mainfrom
Open
fix(ws): break reconnect-handler Arc cycle that leaked WebSocket clients (#325)#334skyc1e wants to merge 3 commits intoPolymarket:mainfrom
skyc1e wants to merge 3 commits intoPolymarket:mainfrom
Conversation
`SubscriptionManager::start_reconnection_handler` (both `clob::ws::subscription` and `rtds::subscription`) spawns a detached `tokio` task that holds a strong `Arc<SubscriptionManager>` clone. The task also owns a cloned `ConnectionManager`, whose `watch::Sender` is what the task awaits on for state changes. Because the `Sender` only closes when every clone of it drops, and the strong `Arc` clone inside the spawned task prevents the owning manager (and therefore that `ConnectionManager` clone) from ever dropping, the task can never exit on its own — a reference cycle that leaked the entire channel (task, WebSocket, subscription manager) for the lifetime of the process every time a user dropped a `WsClient`. `start_reconnection_handler` now returns its `JoinHandle<()>`. A new `ws::task::AbortOnDrop` wrapper (thin `JoinHandle` + `Drop` that calls `abort()`) is stored next to the `Arc<SubscriptionManager>` inside `clob::ws::ChannelResources` and `rtds::ClientInner`. When the owning client drops, the wrapper drops, the task is aborted, its stack locals (including the strong `Arc` clone) are released, and the whole graph can drop normally. Public API is unchanged. A regression test in `clob::ws::subscription` constructs a `SubscriptionManager`, drops it together with the `AbortOnDrop`, and asserts the `Weak` downgrade cannot upgrade — i.e. the strong count reached zero. Unit tests for `AbortOnDrop` cover the cancel-pending and already-finished paths. Closes Polymarket#325
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #334 +/- ##
==========================================
+ Coverage 85.54% 86.89% +1.34%
==========================================
Files 32 33 +1
Lines 5167 5325 +158
==========================================
+ Hits 4420 4627 +207
+ Misses 747 698 -49
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Codecov flagged the reconnect-handle plumbing on the RTDS side as uncovered — `Client::new`, `Client::authenticate`, `Client::deauthenticate`, and `SubscriptionManager::start_reconnection_handler` each had zero patch coverage, so the regression fence for issue Polymarket#325 only caught the clob::ws path. Add focused regression tests without widening the public surface: * `rtds::subscription::reconnect_handler_tests` mirrors the clob::ws test: constructs an `Arc<SubscriptionManager>`, calls `start_reconnection_handler`, wraps in `AbortOnDrop`, drops, and asserts the `Weak` downgrade cannot upgrade. Proves the same Arc cycle is broken on the RTDS code path. * `rtds::client::teardown_tests` covers the three `reconnect_handle` field sites in `Client::{new, authenticate, deauthenticate}` via `tokio::test`s that poll a `Weak<SubscriptionManager>` after drop. `authenticate` and `deauthenticate` tests also assert the manager is _still_ reachable mid-round-trip, ensuring the `AbortOnDrop` is forwarded into the new `ClientInner` rather than dropped early. * A `#[cfg(test)]`-only `inner_subscriptions_for_test` accessor on `Client<S>` provides the weak-ref probe without exposing the private `ClientInner.subscriptions` field. Gated behind `#[expect(clippy::multiple_inherent_impl, ...)]` with a reason, per crate style. These tests exercise the same `AbortOnDrop::new` + `Drop` paths the existing `ws::task::tests` already cover, transitively raising `ws/task.rs` coverage as well.
Zorak556
added a commit
to Zorak556/polymarket-client-sdk
that referenced
this pull request
Apr 17, 2026
… sites Local fork patches in b270478 and e79fc91 made `subscribe_user_events`, `subscribe_orders`, `subscribe_trades`, and `heartbeats_active` async without updating the SDK's own tests/examples. The breakage was masked by b4a5614's library compilation errors; PR Polymarket#334 fixed the lib and exposed the stale call sites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #325.
SubscriptionManager::start_reconnection_handlerspawned a detachedtokiotask that held a strongArc<SubscriptionManager>— but the task also owned a clonedConnectionManagerwhosewatch::Senderis what the task awaits on. TheSendercouldn't close until every clone dropped, and the strongArcinside the task kept the manager (and thatSenderclone) alive indefinitely. A classic "task keeps itself alive by waiting on a signal only it can send" cycle, producing one leaked WebSocket + task + subscription manager per droppedWsClient/rtds::Client.The same bug existed in both
src/clob/ws/subscription.rsandsrc/rtds/subscription.rs; both are fixed here.Fix
Smallest diff that actually breaks the cycle:
start_reconnection_handlernow returnsJoinHandle<()>(bothclob::wsandrtds). Signature is the only public-ish change; all internal callers updated.ws::task::AbortOnDrop— thinJoinHandle<()>wrapper whoseDropcallsabort(). Single helper lives alongside the cycle it's paired with, documented with the full failure mode so future readers don't have to reconstruct it.ChannelResources(clob) andClientInner<S>(rtds) each store anAbortOnDropnext to theirArc<SubscriptionManager>. When the owning client (or a single removed channel, viadeauthenticate) drops, the wrapper drops, the task is aborted, its stack locals — including the strongArcclone — are released, and the whole graph drops normally.Why
AbortOnDropinstead of a directDrop for ChannelResources/Drop for ClientInner:ClientInnerhas a field-move flow (authenticate/deauthenticate) that destructures the struct. Onceimpl Dropis added to a struct, partial-move-and-rebuild patterns stop compiling. Wrapping the handle isolates the Drop to the wrapper and lets the outer struct stay move-friendly.Scope
src/clob/ws/subscription.rs— returnJoinHandle<()>, doc the contract; add the regression test.src/clob/ws/client.rs—ChannelResourcesgainsreconnect_handle: AbortOnDrop(annotated#[expect(dead_code, reason = "Field held only for its Drop side effect")]).src/rtds/subscription.rs— returnJoinHandle<()>, doc the contract.src/rtds/client.rs—ClientInnergains the samereconnect_handle: AbortOnDropfield;authenticate/deauthenticatethread it through.src/ws/task.rs(new,pub(crate)) — theAbortOnDropwrapper + module-level docs for the cycle.src/ws/mod.rs— register the new private submodule.No public API / type breakage. The
start_reconnection_handlerreturn type changes from()toJoinHandle<()>, but the method ispubon a type exposed primarily through the higher-levelClients, and no internal or downstream caller (that I can see) relied on the()return.Test plan
clob::ws::subscription::reconnect_handler_tests::aborting_reconnect_handle_releases_subscription_manager— constructs aSubscriptionManager+ callsstart_reconnection_handler, confirmsArc::strong_count >= 2while the task is live (proves the cycle pre-condition), drops both, and pollsWeak::strong_countuntil it reaches zero (or fails after 2s with the observed count). Without this PR's fix the strong count never decrements and the test times out.ws::task::tests::abort_on_drop_cancels_pending_task— pending future never completes after wrapper is dropped.ws::task::tests::abort_on_drop_is_noop_for_finished_task—abort()on an already-finished task doesn't panic.cargo check/cargo test— I don't have a Rust toolchain available in this sandbox, so the tests above are my best-effort and the broader crate test suite is unverified from my end. Happy to iterate if CI surfaces anything.Notes
JoinHandle+Drop"); I followed it.ConnectionManagercontract, no new dependencies. The existingtokiofeature-gating onws/rtdsalready covers the new module.clob::wsbut missing the RTDS side would leave a silent leak on that path.Note
Medium Risk
Touches WebSocket client lifecycle by changing reconnection-handler task ownership/cancellation; if mishandled it could stop automatic resubscription or abort tasks unexpectedly, but changes are localized and covered by new regression tests.
Overview
Fixes a WebSocket resource leak (issue #325) by making both CLOB and RTDS
SubscriptionManager::start_reconnection_handlerreturn aJoinHandle<()>and ensuring callers retain and cancel that task on teardown.Introduces
ws::task::AbortOnDrop(registered asws::task) and stores it alongside each channel/client’sArc<SubscriptionManager>(ChannelResourcesin CLOB WS andClientInnerin RTDS), so dropping the client/channel aborts the reconnection task and breaks theArc/watchreference cycle.Adds regression tests in both subscription modules plus RTDS client teardown tests to assert
SubscriptionManagerinstances are released after dropping the abort handle/client.Reviewed by Cursor Bugbot for commit d42a0d4. Bugbot is set up for automated code reviews on this repo. Configure here.