
reverseproxy: Optionally detach stream (websockets) from config lifecycle #7649

Open

francislavoie wants to merge 17 commits into master from proxy-stream-detached

Conversation

@francislavoie
Member

@francislavoie francislavoie commented Apr 13, 2026

Summary

This PR hardens upgraded-stream handling in reverse_proxy to reduce config-retention pressure across reloads, improves safety around hijacked response paths, and adds stream-level observability.

Motivation

Long-lived upgraded streams (for example WebSocket/upgrade tunnels) should not unnecessarily retain full handler/config state across reloads. At the same time, middleware and response-writer behavior must remain safe once a connection is hijacked, and operators need clearer visibility into stream lifecycle and traffic.

What Changed

  1. Added reload behavior controls for upgraded streams (see the Caddyfile sketch after this list):

    • New opt-in detached mode via stream_detached.
    • Legacy behavior preserved as the default (close upgraded streams on reload, with the existing delay behavior).
    • In detached mode, only streams targeting removed upstreams are closed.
  2. Added handshake logging control:

    • New stream_logs > skip_handshake option to suppress handshake access logging when desired.
    • Default behavior remains unchanged (handshake is logged).
    • Stream close logs can be emitted to the access logs by setting the logger name to access.
    • The log level can be changed from the default DEBUG, typically to INFO to match access logs.
  3. Refactored upgraded-stream tunnel lifecycle:

    • Decoupled tunnel tracking from full config references.
    • Threaded upstream address into finalize/upgrade handling so per-upstream cleanup is possible.
    • Kept H2 extended-connect and detached HTTP/1.1 upgrade paths explicit.
  4. Made response/middleware paths hijack-aware:

    • Response recorder now tracks hijacked state and treats 101 as final upgrade response.
    • Post-hijack write/finalize behavior is guarded.
    • Returned raw hijacked connection to keep upgraded stream traffic off recorder hot path.
    • Encode close/finalization path now skips inappropriate post-hijack processing.
  5. Added reverse_proxy stream observability metrics:

    • Active stream gauge.
    • Stream total counter by close result.
    • Stream duration histogram.
    • Directional byte counters (to backend / from backend).
    • Once-only lifecycle accounting helper to prevent double-finalization effects.
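
For concreteness, a minimal Caddyfile sketch of the opt-in from item 1 (hedged: stream_detached is the option name stated in this PR, but the site and upstream below are placeholders and the exact syntax should be confirmed against the final docs):

example.com {
	reverse_proxy backend:8080 {
		# Opt in: detach upgraded streams (e.g. websockets) from the config
		# lifecycle; on reload, only streams whose upstream was removed are closed.
		stream_detached
	}
}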

Tests

  1. Expanded unit coverage for:

    • Reload cleanup behavior (legacy close-all vs detached selective close).
    • Caddyfile option parsing for new stream options.
    • Stream metrics lifecycle accounting.
    • Response recorder hijack/101 behavior.
    • switchProtocolCopier sent/received accounting contract.
    • Extended CONNECT streaming (WebSockets over HTTP/2 or HTTP/3).
  2. Added integration coverage for:

    • Reload stress with long-lived upgraded streams and heap profile signal logging.
    • Upgrade behavior through encode/intercept middleware chains.
    • Targeted H2 upgrade/stream regression checks.

Compatibility

  1. Backward compatible by default:
    • Existing reload semantics remain unless stream_detached is enabled.
  2. Logging behavior is unchanged unless stream_logs > skip_handshake is explicitly enabled.

Assistance Disclosure

Made heavy use of GitHub Copilot to plan and implement all of this, using a mix of many different models over a long session.


I had AI produce a report of the memory usage before and after this PR:

Memory / GC report — upgrade stream lifecycle

Test configuration

| Parameter | Value |
| --- | --- |
| Streams opened | 1,200 upgraded WebSocket-style connections |
| Config reloads | 24 (50 ms apart, spread across test) |
| Echo checks | Every 6 reloads (so at reloads 6, 12, 18, 24) |
| Heap snapshot points | before first reload · mid (after all reloads, before close_delay fires) · after full cleanup |
| GC before each snapshot | runtime.GC() + debug.FreeOSMemory() |
| Baseline commit | 1a3e900 (parent of this PR) |
| Test commit | 8e991e1 |

Scenario matrix

Without commit 8e991e1 (baseline)

| Scenario | Before | Mid | After | Delta | Objects before | Objects after |
| --- | --- | --- | --- | --- | --- | --- |
| legacy (close on reload) | 118.4 MiB | 23.3 MiB | 22.4 MiB | −96.0 MiB | 219,675 | 39,550 |
| close_delay=3s | 120.6 MiB | 120.9 MiB ✓ | 24.1 MiB | −96.5 MiB | 219,941 | 42,388 |
| stream_detached | N/A (not available) | | | | | |

With commit 8e991e1

| Scenario | Before | Mid | After | Delta | Objects before | Objects after |
| --- | --- | --- | --- | --- | --- | --- |
| legacy (close on reload) | 100.4 MiB | 20.5 MiB | 20.1 MiB | −80.3 MiB | 91,280 | 38,566 |
| close_delay=3s | 100.6 MiB | 98.3 MiB ✓ | 20.5 MiB | −80.1 MiB | 92,640 | 41,308 |
| stream_detached | 100.1 MiB | 98.1 MiB ✓ | 98.1 MiB | −2.0 MiB* | 91,807 | 86,771 |

* Streams intentionally detached — expected flat memory until connections close naturally.

✓ The mid snapshot, taken while streams are still alive, confirms that memory reflects live connection state.

Stream survival at every intermediate echo check (every 6 reloads):

| Scenario | Reload 6 | Reload 12 | Reload 18 | Reload 24 |
| --- | --- | --- | --- | --- |
| legacy (both) | 0/1200 | 0/1200 | 0/1200 | 0/1200 |
| close_delay (both) | 1200/1200 | 1200/1200 | 1200/1200 | 1200/1200 → 0 after delay |
| detached (with commit) | 1200/1200 | 1200/1200 | 1200/1200 | 1200/1200 |

Why "before" heap is ~20 MiB lower with this commit

The most significant number is objects before first reload: 219,675 (baseline) vs 91,280 (with commit) — a reduction of 128,395 objects (~58%) with 1,200 live streams. This directly drives the lower baseline heap.

Root cause in old code: handleUpgradeResponse runs entirely inside the ServeHTTP goroutine and blocks for the full lifetime of the stream. That goroutine's stack keeps alive:

  • h *Handler — full handler config tree (upstreams, health checks, load balancer state, all middleware)
  • req *http.Request — full HTTP request including headers, context, body, TLS state
  • rw http.ResponseWriter — the entire middleware response-writer chain (logging recorder, encode writer, intercept recorder, metrics recorder, etc.)
  • res *http.Response — the upstream response
  • A second goroutine per stream for backConnCloseCh context-cancellation plumbing

With 1,200 streams that's 1,200 retained copies of all of the above.

With this commit: handleDetachedUpgradeTunnel detaches to a goroutine that captures only tunnelState (shared, small), conn, backConn, and a handful of scalar config values (buffer size, timeout duration). ServeHTTP returns immediately; the middleware chain, request, response, and handler are all released to the GC.

Per-stream detached object footprint: from ~183 objects down to ~76 objects.
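
To make the retention mechanics concrete, here is a minimal Go sketch of the detached pattern (assumed names; this is not the PR's actual handleDetachedUpgradeTunnel code): the goroutine that outlives ServeHTTP closes over only the two connections, so the handler, request, response, and middleware writer chain become collectable as soon as ServeHTTP returns.

package tunnel

import (
	"io"
	"net"
)

// detachedTunnel stands in for the PR's tunnelState: only the pieces the copy
// loops actually need, with no reference back to the handler, request,
// response, or middleware response-writer chain.
type detachedTunnel struct {
	clientConn net.Conn // hijacked client connection
	backConn   net.Conn // upstream connection
}

// run copies bytes in both directions until either side closes.
func (t detachedTunnel) run() {
	done := make(chan struct{}, 2)
	copyDir := func(dst, src net.Conn) {
		_, _ = io.Copy(dst, src)
		_ = dst.Close()
		_ = src.Close()
		done <- struct{}{}
	}
	go copyDir(t.backConn, t.clientConn)
	go copyDir(t.clientConn, t.backConn)
	<-done
	<-done
}

// Detach starts the tunnel on its own goroutine and returns immediately, which
// is the property that lets ServeHTTP return (and its captured state be freed)
// while the stream stays open.
func Detach(clientConn, backConn net.Conn) {
	t := detachedTunnel{clientConn: clientConn, backConn: backConn}
	go t.run()
}

Per the description above, the real detached goroutine also carries the shared tunnelState and a few scalar options (buffer size, timeout) rather than just the two connections.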

Summary

| | Baseline (no commit) | With commit | Improvement |
| --- | --- | --- | --- |
| Heap per 1,200 active streams | ~119 MiB | ~100 MiB | −19 MiB (~16%) |
| Objects per 1,200 active streams | ~220K | ~92K | −128K (~58%) |
| Heap after legacy cleanup | ~22 MiB | ~20 MiB | comparable |
| close_delay: streams alive during all 24 reloads | ✓ | ✓ | same behaviour |
| stream_detached: streams alive during all 24 reloads | N/A | ✓ | new capability |
| stream_detached: streams echo correctly at every reload | N/A | ✓ | new capability |

@francislavoie francislavoie added this to the 2.x milestone Apr 13, 2026
@francislavoie francislavoie added the feature ⚙️ New feature or request label Apr 13, 2026
@francislavoie
Member Author

I changed stream_log_skip_handshake into stream_logs with a nested structure so the log level and logger name can be changed; it defaults to DEBUG under the name http.handlers.reverse_proxy.stream. I figure people might want to use the access log's name and INFO level so that both the open and close of a websocket get logged, but these logs aren't in the same format as typical access logs, so I don't think this should be on by default.
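
A hedged sketch of what that nesting might look like in a Caddyfile. Only stream_logs and skip_handshake are named in this PR; the level and name sub-options below are hypothetical spellings, used purely to illustrate overriding the default DEBUG level and the http.handlers.reverse_proxy.stream logger name described above:

reverse_proxy backend:8080 {
	stream_logs {
		# Named in this PR: skip logging the upgrade handshake.
		skip_handshake
		# Hypothetical sub-options: log stream open/close at INFO under the
		# access logger name so they appear alongside access logs.
		level INFO
		name access
	}
}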

@Geramy

Geramy commented Apr 18, 2026

Can you do a bandwidth test through the proxy with CPU usage statistics before and after?

@francislavoie
Member Author

I don't think this will change bandwidth at all; the copy loop itself did not change. This will only affect memory usage and whether connections can be held through config reloads. But I can do some more benchmarking anyway to prove there's no regression, sure.

@francislavoie francislavoie force-pushed the proxy-stream-detached branch from a3b339d to 5a1ace3 Compare April 18, 2026 01:26
@francislavoie
Member Author

francislavoie commented Apr 18, 2026

I had AI write and run a benchmark

Benchmark

| Case | Avg Throughput (MB/s) | Avg Messages/s | Avg CPU (%) | Avg Errors |
| --- | --- | --- | --- | --- |
| master (old), retain=false | 288.970 | 147,952.5 | 558.890 | 0.000 |
| this (new), retain=false | 288.084 | 147,498.9 | 560.957 | 0.000 |
| this (new), retain=true | 288.492 | 147,708.0 | 560.853 | 0.000 |

| Comparison | Throughput Delta | Messages/s Delta | CPU Delta |
| --- | --- | --- | --- |
| this (new) retain=false vs master (old) retain=false | -0.31% | -0.31% | +0.37% |
| this (new) retain=true vs this (new) retain=false | +0.14% | +0.14% | -0.02% |

Conclusion

No significant websocket CPU or bandwidth regression was observed in this run matrix; the differences are within the margin of error.

Benchmark code

bench_websocket_regression.sh

#!/usr/bin/env bash
set -euo pipefail

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
TMP_DIR="${TMPDIR:-/tmp}/caddy-ws-bench-$$"
MASTER_WT="$TMP_DIR/wt-master"
RESULTS_CSV="$ROOT_DIR/caddytest/integration/ws_benchmark_results.csv"

DURATION="${CADDY_BENCH_DURATION:-10s}"
CLIENTS="${CADDY_BENCH_CLIENTS:-32}"
PAYLOAD_BYTES="${CADDY_BENCH_PAYLOAD_BYTES:-1024}"
RUNS="${CADDY_BENCH_RUNS:-3}"

mkdir -p "$TMP_DIR"

cleanup() {
  git -C "$ROOT_DIR" worktree remove --force "$MASTER_WT" >/dev/null 2>&1 || true
  rm -rf "$TMP_DIR"
}
trap cleanup EXIT

echo "Preparing benchmark tooling and binaries..."

WSBENCH_BIN="$TMP_DIR/wsbench"
CURRENT_BIN="$TMP_DIR/caddy-current"
MASTER_BIN="$TMP_DIR/caddy-master"

GOFLAGS="${GOFLAGS:-}"

(
  cd "$ROOT_DIR"
  go build $GOFLAGS -o "$WSBENCH_BIN" ./caddytest/integration/wsbench
  go build $GOFLAGS -o "$CURRENT_BIN" ./cmd/caddy
)

git -C "$ROOT_DIR" worktree add --detach "$MASTER_WT" master >/dev/null
(
  cd "$MASTER_WT"
  go build $GOFLAGS -o "$MASTER_BIN" ./cmd/caddy
)

CLK_TCK="$(getconf CLK_TCK)"

run_case() {
  local ref_name="$1"
  local caddy_bin="$2"
  local retain="$3"

  local init_cfg="$TMP_DIR/init-${ref_name}-${retain}.caddyfile"
  cat > "$init_cfg" <<EOF
{
  admin 127.0.0.1:2999
  http_port 9080
  https_port 9443
  grace_period 1ns
  skip_install_trust
}

127.0.0.1:9080 {
  respond "boot"
}
EOF

  local caddy_log="$TMP_DIR/caddy-${ref_name}-${retain}.log"
  "$caddy_bin" run --config "$init_cfg" --adapter caddyfile >"$caddy_log" 2>&1 &
  local caddy_pid=$!

  local started=0
  for _ in $(seq 1 50); do
    if curl -fsS "http://127.0.0.1:2999/config/" >/dev/null 2>&1; then
      started=1
      break
    fi
    sleep 0.1
  done

  if [[ "$started" -ne 1 ]]; then
    echo "ERROR,$ref_name,$retain,failed_to_start" >> "$RESULTS_CSV"
    kill "$caddy_pid" >/dev/null 2>&1 || true
    wait "$caddy_pid" >/dev/null 2>&1 || true
    return
  fi

  for run in $(seq 1 "$RUNS"); do
    local cpu_before cpu_after
    cpu_before="$(awk '{print $14+$15}' "/proc/$caddy_pid/stat")"

    local bench_out code
    set +e
    bench_out="$($WSBENCH_BIN \
      --admin 127.0.0.1:2999 \
      --proxy 127.0.0.1:9080 \
      --duration "$DURATION" \
      --clients "$CLIENTS" \
      --payload-bytes "$PAYLOAD_BYTES" \
      --retain="$retain" 2>"$TMP_DIR/wsbench-${ref_name}-${retain}-${run}.err")"
    code=$?
    set -e
    if [[ "$code" -ne 0 ]]; then
      if [[ "$code" -eq 3 ]]; then
        echo "UNSUPPORTED,$ref_name,$retain,$run" >> "$RESULTS_CSV"
        continue
      fi
      echo "ERROR,$ref_name,$retain,$run,wsbench_failed" >> "$RESULTS_CSV"
      continue
    fi

    cpu_after="$(awk '{print $14+$15}' "/proc/$caddy_pid/stat")"

    IFS=',' read -r elapsed_sec messages bytes_total mbps msg_per_sec err_count <<< "$bench_out"

    local cpu_jiffies cpu_sec cpu_pct
    cpu_jiffies=$((cpu_after - cpu_before))
    cpu_sec="$(awk -v j="$cpu_jiffies" -v clk="$CLK_TCK" 'BEGIN { printf "%.6f", j/clk }')"
    cpu_pct="$(awk -v c="$cpu_sec" -v e="$elapsed_sec" 'BEGIN { if (e <= 0) { print "0.00" } else { printf "%.2f", (100*c)/e } }')"

    echo "RESULT,$ref_name,$retain,$run,$elapsed_sec,$messages,$bytes_total,$mbps,$msg_per_sec,$cpu_sec,$cpu_pct,$err_count" >> "$RESULTS_CSV"
  done

  curl -fsS -X POST "http://127.0.0.1:2999/stop" >/dev/null 2>&1 || true
  kill "$caddy_pid" >/dev/null 2>&1 || true
  wait "$caddy_pid" >/dev/null 2>&1 || true
}

cat > "$RESULTS_CSV" <<EOF
kind,ref,retain,run,elapsed_sec,messages,bytes_total,mbps,msg_per_sec,cpu_sec,cpu_pct,errors
EOF

echo "Running matrix: master/current x retain(false/true)"
run_case "master" "$MASTER_BIN" "false"
run_case "master" "$MASTER_BIN" "true"
run_case "current" "$CURRENT_BIN" "false"
run_case "current" "$CURRENT_BIN" "true"

echo
echo "Raw results written to: $RESULTS_CSV"
echo "Computing averages..."

awk -F',' '
BEGIN {
  print "";
  print "Averages by case:";
  print "ref,retain,samples,avg_mbps,avg_cpu_pct,avg_msg_per_sec,avg_errors";
}
$1 == "RESULT" {
  key = $2 "," $3;
  cnt[key]++;
  mbps[key] += $8;
  cpu[key] += $11;
  mps[key] += $9;
  errs[key] += $12;
}
END {
  for (k in cnt) {
    printf "%s,%d,%.3f,%.3f,%.1f,%.3f\n", k, cnt[k], mbps[k]/cnt[k], cpu[k]/cnt[k], mps[k]/cnt[k], errs[k]/cnt[k];
  }
}
' "$RESULTS_CSV"

wsbench/main.go

package main

import (
	"bufio"
	"errors"
	"flag"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/textproto"
	"os"
	"strconv"
	"strings"
	"sync"
	"sync/atomic"
	"time"
)

type config struct {
	adminAddr   string
	proxyAddr   string
	duration    time.Duration
	clients     int
	payloadSize int
	retain      bool
}

type benchClient struct {
	conn   net.Conn
	reader *bufio.Reader
}

type backend struct {
	addr   string
	ln     net.Listener
	server *http.Server
	mu     sync.Mutex
	conns  map[net.Conn]struct{}
}

func main() {
	cfg, err := parseFlags()
	if err != nil {
		fatalf("invalid flags: %v", err)
	}

	be, err := startBackend()
	if err != nil {
		fatalf("start backend: %v", err)
	}
	defer be.close()

	if err := loadProxyConfig(cfg.adminAddr, cfg.proxyAddr, be.addr, cfg.retain); err != nil {
		if cfg.retain && strings.Contains(strings.ToLower(err.Error()), "stream_retain_on_reload") {
			fatalExitf(3, "retain mode unsupported by this ref: %v", err)
		}
		fatalf("load proxy config: %v", err)
	}

	clients := make([]*benchClient, 0, cfg.clients)
	for i := 0; i < cfg.clients; i++ {
		client, openErr := openUpgradedClient(cfg.proxyAddr)
		if openErr != nil {
			closeClients(clients)
			fatalf("open upgraded client %d: %v", i, openErr)
		}
		clients = append(clients, client)
	}
	defer closeClients(clients)

	payload := make([]byte, cfg.payloadSize)
	for i := range payload {
		payload[i] = byte('a' + (i % 26))
	}

	var (
		messages atomic.Uint64
		bytes    atomic.Uint64
		errorsN  atomic.Uint64
	)

	deadline := time.Now().Add(cfg.duration)
	start := time.Now()

	var wg sync.WaitGroup
	for _, client := range clients {
		wg.Add(1)
		go func(c *benchClient) {
			defer wg.Done()
			for time.Now().Before(deadline) {
				if err := c.echo(payload); err != nil {
					errorsN.Add(1)
					return
				}
				messages.Add(1)
				bytes.Add(uint64(len(payload) * 2))
			}
		}(client)
	}
	wg.Wait()

	elapsed := time.Since(start)
	elapsedSec := elapsed.Seconds()
	if elapsedSec <= 0 {
		elapsedSec = 1e-9
	}
	msgCount := messages.Load()
	bytesTotal := bytes.Load()
	mbps := (float64(bytesTotal) / (1024 * 1024)) / elapsedSec
	msgPerSec := float64(msgCount) / elapsedSec

	fmt.Printf("%.6f,%d,%d,%.6f,%.6f,%d\n", elapsedSec, msgCount, bytesTotal, mbps, msgPerSec, errorsN.Load())
}

func parseFlags() (config, error) {
	cfg := config{}
	flag.StringVar(&cfg.adminAddr, "admin", "127.0.0.1:2999", "caddy admin host:port")
	flag.StringVar(&cfg.proxyAddr, "proxy", "127.0.0.1:9080", "reverse proxy host:port")
	flag.DurationVar(&cfg.duration, "duration", 10*time.Second, "benchmark duration")
	flag.IntVar(&cfg.clients, "clients", 32, "concurrent upgraded clients")
	flag.IntVar(&cfg.payloadSize, "payload-bytes", 1024, "payload bytes per message")
	flag.BoolVar(&cfg.retain, "retain", false, "enable stream_retain_on_reload in reverse_proxy config")
	flag.Parse()

	if cfg.adminAddr == "" || cfg.proxyAddr == "" {
		return cfg, errors.New("admin and proxy addresses are required")
	}
	if cfg.duration <= 0 {
		return cfg, errors.New("duration must be > 0")
	}
	if cfg.clients <= 0 {
		return cfg, errors.New("clients must be > 0")
	}
	if cfg.payloadSize <= 0 {
		return cfg, errors.New("payload-bytes must be > 0")
	}
	return cfg, nil
}

func startBackend() (*backend, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}

	be := &backend{
		addr:  ln.Addr().String(),
		ln:    ln,
		conns: make(map[net.Conn]struct{}),
	}
	be.server = &http.Server{Handler: http.HandlerFunc(be.serveHTTP)}

	go func() {
		_ = be.server.Serve(ln)
	}()

	return be, nil
}

func (b *backend) serveHTTP(w http.ResponseWriter, r *http.Request) {
	if !strings.EqualFold(r.Header.Get("Connection"), "Upgrade") || !strings.EqualFold(r.Header.Get("Upgrade"), "stress-stream") {
		http.Error(w, "upgrade required", http.StatusUpgradeRequired)
		return
	}

	hijacker, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "hijack unsupported", http.StatusInternalServerError)
		return
	}

	conn, rw, err := hijacker.Hijack()
	if err != nil {
		return
	}
	b.trackConn(conn)

	_, _ = rw.WriteString("HTTP/1.1 101 Switching Protocols\r\nConnection: Upgrade\r\nUpgrade: stress-stream\r\n\r\n")
	_ = rw.Flush()

	go func() {
		defer b.untrackConn(conn)
		defer conn.Close()
		_, _ = io.Copy(conn, conn)
	}()
}

func (b *backend) trackConn(conn net.Conn) {
	b.mu.Lock()
	b.conns[conn] = struct{}{}
	b.mu.Unlock()
}

func (b *backend) untrackConn(conn net.Conn) {
	b.mu.Lock()
	delete(b.conns, conn)
	b.mu.Unlock()
}

func (b *backend) close() {
	_ = b.server.Close()
	_ = b.ln.Close()

	b.mu.Lock()
	defer b.mu.Unlock()
	for conn := range b.conns {
		_ = conn.Close()
	}
	clear(b.conns)
}

func loadProxyConfig(adminAddr, proxyAddr, backendAddr string, retain bool) error {
	host, port, err := net.SplitHostPort(proxyAddr)
	if err != nil {
		return err
	}
	httpPort, err := strconv.Atoi(port)
	if err != nil {
		return err
	}

	directives := ""
	if retain {
		directives = "\n\t\tstream_retain_on_reload"
	}

	config := fmt.Sprintf(`
{
	admin %s
	http_port %d
	https_port 9443
	grace_period 1ns
	skip_install_trust
}

%s:%d {
	reverse_proxy %s {%s
	}
}
`, adminAddr, httpPort, host, httpPort, backendAddr, directives)

	req, err := http.NewRequest(http.MethodPost, "http://"+adminAddr+"/load", strings.NewReader(config))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "text/caddyfile")

	client := &http.Client{Timeout: 15 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("status=%d body=%s", resp.StatusCode, strings.TrimSpace(string(body)))
	}
	return nil
}

func openUpgradedClient(proxyAddr string) (*benchClient, error) {
	conn, err := net.DialTimeout("tcp", proxyAddr, 5*time.Second)
	if err != nil {
		return nil, err
	}

	request := strings.Join([]string{
		"GET /upgrade HTTP/1.1",
		"Host: " + proxyAddr,
		"Connection: Upgrade",
		"Upgrade: stress-stream",
		"",
		"",
	}, "\r\n")
	if _, err := io.WriteString(conn, request); err != nil {
		_ = conn.Close()
		return nil, err
	}

	reader := bufio.NewReader(conn)
	tproto := textproto.NewReader(reader)
	statusLine, err := tproto.ReadLine()
	if err != nil {
		_ = conn.Close()
		return nil, err
	}
	if !strings.Contains(statusLine, "101") {
		_ = conn.Close()
		return nil, fmt.Errorf("unexpected upgrade status: %s", statusLine)
	}

	headers, err := tproto.ReadMIMEHeader()
	if err != nil {
		_ = conn.Close()
		return nil, err
	}
	if !strings.EqualFold(headers.Get("Connection"), "Upgrade") {
		_ = conn.Close()
		return nil, fmt.Errorf("unexpected upgrade headers: %v", headers)
	}

	return &benchClient{conn: conn, reader: reader}, nil
}

func (c *benchClient) echo(payload []byte) error {
	deadline := time.Now().Add(2 * time.Second)
	if err := c.conn.SetWriteDeadline(deadline); err != nil {
		return err
	}
	if _, err := c.conn.Write(payload); err != nil {
		return err
	}
	if err := c.conn.SetReadDeadline(deadline); err != nil {
		return err
	}

	buf := make([]byte, len(payload))
	if _, err := io.ReadFull(c.reader, buf); err != nil {
		return err
	}
	if !equalBytes(buf, payload) {
		return errors.New("payload mismatch")
	}
	return nil
}

func closeClients(clients []*benchClient) {
	for _, client := range clients {
		if client != nil && client.conn != nil {
			_ = client.conn.Close()
		}
	}
}

func equalBytes(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func fatalf(format string, args ...any) {
	_, _ = fmt.Fprintf(os.Stderr, format+"\n", args...)
	os.Exit(1)
}

func fatalExitf(code int, format string, args ...any) {
	_, _ = fmt.Fprintf(os.Stderr, format+"\n", args...)
	os.Exit(code)
}

ws_benchmark_results.csv

kind,ref,retain,run,elapsed_sec,messages,bytes_total,mbps,msg_per_sec,cpu_sec,cpu_pct,errors
RESULT,master,false,1,10.000151,1501963,3076020224,293.347710,150194.027639,56.390000,563.89,0
RESULT,master,false,2,10.000140,1449172,2967904256,283.037447,144915.173071,54.870000,548.69,0
RESULT,master,false,3,10.000117,1487499,3046397952,290.523736,148748.152953,56.410000,564.09,0
UNSUPPORTED,master,true,1
UNSUPPORTED,master,true,2
UNSUPPORTED,master,true,3
RESULT,current,false,1,10.000107,1517738,3108327424,296.430023,151772.171727,57.810000,578.09,0
RESULT,current,false,2,10.000185,1397386,2861846528,272.921912,139736.018726,53.010000,530.09,0
RESULT,current,false,3,10.000112,1509902,3092279296,294.899419,150988.502572,57.470000,574.69,0
RESULT,current,true,1,10.000187,1489710,3050926080,290.953558,148968.221594,56.590000,565.89,0
RESULT,current,true,2,10.000092,1450261,2970134528,283.251498,145024.767135,54.980000,549.79,0
RESULT,current,true,3,10.000406,1491372,3054329856,291.271780,149131.151285,56.690000,566.88,0

Ran with: CADDY_BENCH_DURATION=10s CADDY_BENCH_RUNS=3 CADDY_BENCH_CLIENTS=32 CADDY_BENCH_PAYLOAD_BYTES=1024 ./caddytest/integration/bench_websocket_regression.sh

@Geramy

Geramy commented Apr 18, 2026

This is amazing! I'm glad you were able to find a solution! This should be merged ASAP, then Caddy won't have any competition :)

Member

@WeidiDeng WeidiDeng left a comment


A quick review.

I'll look into the details of the new copying mechanics and logs later, possibly testing locally.

Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/encode/encode.go Outdated
@francislavoie francislavoie requested a review from WeidiDeng April 18, 2026 18:20
@francislavoie francislavoie force-pushed the proxy-stream-detached branch 4 times, most recently from facbed8 to db86fda Compare April 19, 2026 22:16
Contributor

@mmm444 mmm444 left a comment


Thank you for the effort on this topic!

I have found some time for a quick review of the core of the changes in reverseproxy.go and streaming.go. Unfortunately I think the issue discussed on line 542 in reverseproxy.go is substantial and will require some minor redesign. Other comments are just nitpicks.

Please take my review with a grain of salt, as it’s been a while since I last contributed to this project and I also did not run anything, just read the code.

Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/reverseproxy.go Outdated
@francislavoie francislavoie requested a review from mmm444 April 21, 2026 12:17
@francislavoie
Member Author

francislavoie commented Apr 21, 2026

I re-ran the memory usage report, got these numbers after the changes:

Current memory report (this branch, current HEAD):

  • legacy: before 117.7 MiB, mid 23.1 MiB, after 22.2 MiB, delta -95.6 MiB, objects 221040 -> 38501
  • close_delay: before 119.6 MiB, mid 119.9 MiB, after 23.3 MiB, delta -96.2 MiB, objects 221364 -> 41024
  • retain: before 99.5 MiB, mid 97.8 MiB, after 97.8 MiB, delta -1.8 MiB, objects 89152 -> 84580

So essentially, the detached/retain memory usage is basically the same (still ~20 MiB lower than legacy), but legacy and close_delay now use the same amount of memory as before this PR.

The changes since the initial PR mean this will not have any significant effect on existing configs, but enabling retain mode can reduce memory usage when there are a lot of connections.

I think I should probably rename stream_retain_on_reload to stream_detached. The name would more directly relate to the behaviour rather than to the side effect of its relationship to config reloads. Edit: I did so.

@mholt
Member

mholt commented Apr 21, 2026

Thanks Francis. Given the performance, do you think that this should just be the default behavior then?

@francislavoie
Member Author

Eventually, yes, I think we will likely want to change the default, but playing it safe with an opt-in is better for now. We'll want to hear how stable it is out in the wild.

@francislavoie francislavoie force-pushed the proxy-stream-detached branch from 12139f2 to eeb13f1 Compare April 25, 2026 09:42
