reverseproxy: Optionally detach stream (websockets) from config lifecycle #7649
francislavoie wants to merge 17 commits into master
Conversation
Can you do a bandwidth test through the proxy with CPU usage statistics before and after?
I don't think this will change bandwidth at all; the copy loop itself did not change. This will only affect memory usage and whether connections can be held through config reloads. But I can do some more benchmarking anyway to prove there's no regression, sure.
Force-pushed from a3b339d to 5a1ace3
I had AI write and run a benchmark.
Conclusion: No significant websocket CPU or bandwidth regression observed in this run matrix; differences are within margin of error.
This is amazing! I'm glad you were able to find a solution! This should be merged ASAP; then Caddy won't have any competition :)
WeidiDeng left a comment
A quick review.
I'll look into the details of the new copying mechanics and logs later, possibly testing locally.
Force-pushed from facbed8 to db86fda
Thank you for the effort on this topic!
I have found some time for a quick review of the core of the changes in reverseproxy.go and streaming.go. Unfortunately I think the issue discussed on line 542 in reverseproxy.go is substantial and will require some minor redesign. Other comments are just nitpicks.
Please take my review with a grain of salt, as it’s been a while since I last contributed to this project and I also did not run anything, just read the code.
I re-ran the memory usage report and got these numbers after the changes. Current memory report (this branch, current HEAD):
So essentially, the detached/retain memory usage is basically the same (still ~20MB lower than legacy), but legacy and close_delay now use the same amount of memory as before this PR. The changes since the initial PR mean this will not have any significant effect on existing configs, but enabling retain mode can reduce memory usage when there are a lot of connections. I think I should probably rename
Thanks, Francis. Given the performance, do you think this should just be the default behavior then?
Eventually, yes, I think we will likely want to change the default, but playing it safe with an opt-in is better for now. We'll want to hear how stable it is out in the wild.
Force-pushed from 12139f2 to eeb13f1
Summary
This PR hardens upgraded-stream handling in reverse_proxy to reduce config-retention pressure across reloads, improves safety around hijacked response paths, and adds stream-level observability.
Motivation
Long-lived upgraded streams (for example WebSocket/upgrade tunnels) should not unnecessarily retain full handler/config state across reloads. At the same time, middleware and response-writer behavior must remain safe once a connection is hijacked, and operators need clearer visibility into stream lifecycle and traffic.
What Changed
- Added reload behavior controls for upgraded streams: a new stream_detached option.
- Added handshake logging control: a stream_logs > skip_handshake option to suppress handshake access logging when desired (logger_name access).
- Refactored the upgraded-stream tunnel lifecycle.
- Made response/middleware paths hijack-aware (see the sketch after this list).
- Added reverse_proxy stream observability metrics.
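To illustrate the hijack-aware idea, here is a sketch of mine (not the PR's code; hijackAwareWriter is a hypothetical name): once the connection is hijacked for the tunnel, later writes through the middleware's ResponseWriter chain must become safe no-ops instead of corrupting the hijacked stream.

```go
package tunnel

import (
	"bufio"
	"net"
	"net/http"
	"sync/atomic"
)

// hijackAwareWriter is a hypothetical wrapper sketching the idea: after a
// successful Hijack, the normal ResponseWriter path is marked dead so any
// later middleware writes become safe no-ops on the hijacked connection.
type hijackAwareWriter struct {
	http.ResponseWriter
	hijacked atomic.Bool
}

func (w *hijackAwareWriter) Hijack() (net.Conn, *bufio.ReadWriter, error) {
	hj, ok := w.ResponseWriter.(http.Hijacker)
	if !ok {
		return nil, nil, http.ErrNotSupported
	}
	conn, rw, err := hj.Hijack()
	if err == nil {
		w.hijacked.Store(true) // the HTTP layer no longer owns the connection
	}
	return conn, rw, err
}

func (w *hijackAwareWriter) WriteHeader(code int) {
	if w.hijacked.Load() {
		return // drop: headers can't be sent on a hijacked connection
	}
	w.ResponseWriter.WriteHeader(code)
}

func (w *hijackAwareWriter) Write(p []byte) (int, error) {
	if w.hijacked.Load() {
		return 0, http.ErrHijacked
	}
	return w.ResponseWriter.Write(p)
}
```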
Tests
Expanded unit coverage for:
- the switchProtocolCopier sent/received accounting contract (sketched below).

Added integration coverage as well.
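For context, here is a minimal hypothetical sketch of what a sent/received accounting contract for a protocol-switch copier can look like (illustrative names; the PR's actual switchProtocolCopier may differ): each direction records the bytes it moved so the totals can feed logs and metrics.

```go
package tunnel

import (
	"io"
	"sync/atomic"
)

// byteCountingCopier is a hypothetical stand-in for switchProtocolCopier.
// It copies both directions of an upgraded stream and records totals:
// sent counts client-to-backend bytes, received counts backend-to-client.
type byteCountingCopier struct {
	user, backend  io.ReadWriter
	sent, received atomic.Int64
}

func (c *byteCountingCopier) copyToBackend(done chan<- error) {
	n, err := io.Copy(c.backend, c.user)
	c.sent.Add(n) // the accounting contract: count bytes actually copied
	done <- err
}

func (c *byteCountingCopier) copyFromBackend(done chan<- error) {
	n, err := io.Copy(c.user, c.backend)
	c.received.Add(n)
	done <- err
}
```

A unit test of such a contract would push known payloads through both directions and assert the counters match the payload sizes.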
Compatibility
- No behavior change unless stream_detached is enabled.
- Handshake access logging is unchanged unless stream_logs > skip_handshake is explicitly enabled.

Assistance Disclosure
Made heavy use of GitHub Copilot to plan and implement all this, using a mix of many different models through a long session.
I had AI produce a report of the memory usage before and after this PR:
Memory / GC report — upgrade stream lifecycle
Test configuration
- Heap measured after runtime.GC() + debug.FreeOSMemory()
- Baseline: commit 1a3e900 (parent of this PR)
- This PR: commit 8e991e1

Scenario matrix
Without commit 8e991e1 (baseline)

With commit 8e991e1

* Streams intentionally detached — expected flat memory until connections close naturally.
✓ Mid snapshot taken while streams are still alive confirms memory reflects live connection state.
Stream survival at every intermediate echo check (every 6 reloads):
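For reference, a minimal sketch of the measurement pattern the report describes, forcing a full GC with runtime.GC() and debug.FreeOSMemory() before reading heap stats (heapSnapshot is an illustrative name, not from the PR):

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapSnapshot forces a full collection before sampling, so HeapAlloc and
// HeapObjects reflect live state rather than uncollected garbage.
func heapSnapshot() (heapMiB float64, liveObjects uint64) {
	runtime.GC()
	debug.FreeOSMemory() // runs another GC and returns freed pages to the OS
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapAlloc) / (1 << 20), m.HeapObjects
}

func main() {
	heap, objs := heapSnapshot()
	fmt.Printf("heap: %.1f MiB, live objects: %d\n", heap, objs)
}
```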
Why "before" heap is ~20 MiB lower with this commit
The most significant number is live objects before the first reload: 219,675 (baseline) vs 91,280 (with commit) — a reduction of 128,395 objects (~58%) with 1,200 live streams. This directly drives the lower baseline heap.

Root cause in old code:
handleUpgradeResponse runs entirely inside the ServeHTTP goroutine and blocks for the full lifetime of the stream. That goroutine's stack keeps alive:
- h *Handler — the full handler config tree (upstreams, health checks, load balancer state, all middleware)
- req *http.Request — the full HTTP request, including headers, context, body, and TLS state
- rw http.ResponseWriter — the entire middleware response-writer chain (logging recorder, encode writer, intercept recorder, metrics recorder, etc.)
- res *http.Response — the upstream response
- backConnCloseCh context-cancellation plumbing

With 1,200 streams, that's 1,200 retained copies of all of the above.
With this commit: handleDetachedUpgradeTunnel detaches to a goroutine that captures only tunnelState (shared, small), conn, backConn, and a handful of scalar config values (buffer size, timeout duration). ServeHTTP returns immediately; the middleware chain, request, response, and handler are all released to the GC.

Per-stream detached object footprint: from ~183 objects down to ~76 objects.
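To make the retention contrast concrete, here is a rough sketch of the detachment pattern as described above (detachUpgradeTunnel and this minimal tunnelState are hypothetical names, not the PR's actual code): the spawned goroutine closes over only small values, so everything reachable from ServeHTTP becomes collectable the moment it returns.

```go
package tunnel

import (
	"io"
	"time"
)

// tunnelState holds only the small scalar config the copy loop needs,
// instead of keeping the whole handler/request object graph alive.
type tunnelState struct {
	bufSize int
	timeout time.Duration // illustrative: e.g. a stream inactivity timeout
}

// detachUpgradeTunnel sketches the pattern: the goroutine captures only
// st, conn, and backConn. The caller (ServeHTTP) returns immediately, so
// the handler, request, response, and response-writer chain become garbage.
func detachUpgradeTunnel(st *tunnelState, conn, backConn io.ReadWriteCloser) {
	go func() {
		defer conn.Close()
		defer backConn.Close()

		bufSize := st.bufSize
		if bufSize <= 0 {
			bufSize = 32 * 1024 // same default size io.Copy would use
		}

		errc := make(chan error, 2)
		go func() {
			_, err := io.CopyBuffer(conn, backConn, make([]byte, bufSize))
			errc <- err
		}()
		go func() {
			_, err := io.Copy(backConn, conn)
			errc <- err
		}()
		// When either direction ends, the deferred Closes tear down both
		// connections, which unblocks the other copier.
		<-errc
	}()
}
```

The key property is what the closure does not capture: no *Handler, no *http.Request, and no middleware response-writer chain.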
Summary