
reverseproxy: Optionally detach stream (websockets) from config lifecycle #7649

Open

francislavoie wants to merge 17 commits into master from proxy-stream-detached

Conversation

@francislavoie
Member

@francislavoie francislavoie commented Apr 13, 2026

Summary

This PR hardens upgraded-stream handling in reverse_proxy to reduce config-retention pressure across reloads, improves safety around hijacked response paths, and adds stream-level observability.

Motivation

Long-lived upgraded streams (for example WebSocket/upgrade tunnels) should not unnecessarily retain full handler/config state across reloads. At the same time, middleware and response-writer behavior must remain safe once a connection is hijacked, and operators need clearer visibility into stream lifecycle and traffic.

What Changed

  1. Added reload behavior controls for upgraded streams (see the Caddyfile sketch after this list):

    • New opt-in detached mode via stream_detached.
    • Legacy behavior preserved as the default (close upgraded streams on reload, with the existing delay behavior).
    • In detached mode, only streams targeting removed upstreams are closed.
  2. Added handshake logging control:

    • New stream_logs > skip_handshake option to suppress handshake access logging when desired.
    • Default behavior remains unchanged (handshake is logged).
    • Stream close logs can be emitted to the access logs by setting the logger name to access.
    • The log level can be changed from the default DEBUG, typically to INFO to match access logs.
  3. Refactored upgraded-stream tunnel lifecycle:

    • Decoupled tunnel tracking from full config references.
    • Threaded upstream address into finalize/upgrade handling so per-upstream cleanup is possible.
    • Kept H2 extended-connect and detached HTTP/1.1 upgrade paths explicit.
  4. Made response/middleware paths hijack-aware:

    • Response recorder now tracks hijacked state and treats 101 as final upgrade response.
    • Post-hijack write/finalize behavior is guarded.
    • Returned raw hijacked connection to keep upgraded stream traffic off recorder hot path.
    • Encode close/finalization path now skips inappropriate post-hijack processing.
  5. Added reverse_proxy stream observability metrics:

    • Active stream gauge.
    • Stream total counter by close result.
    • Stream duration histogram.
    • Directional byte counters (to backend / from backend).
    • Once-only lifecycle accounting helper to prevent double-finalization effects.
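
For concreteness, a minimal Caddyfile sketch of the opt-in from item 1 (hedged: stream_detached is the option name stated in this PR, but the site and upstream below are placeholders and the exact syntax should be confirmed against the final docs):

example.com {
	reverse_proxy backend:8080 {
		# Opt in: detach upgraded streams (e.g. websockets) from the config
		# lifecycle; on reload, only streams whose upstream was removed are closed.
		stream_detached
	}
}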

Tests

  1. Expanded unit coverage for:

    • Reload cleanup behavior (legacy close-all vs detached selective close).
    • Caddyfile option parsing for new stream options.
    • Stream metrics lifecycle accounting.
    • Response recorder hijack/101 behavior.
    • switchProtocolCopier sent/received accounting contract.
    • Extended CONNECT streaming (WebSockets over HTTP/2 or HTTP/3).
  2. Added integration coverage for:

    • Reload stress with long-lived upgraded streams and heap profile signal logging.
    • Upgrade behavior through encode/intercept middleware chains.
    • Targeted H2 upgrade/stream regression checks.

Compatibility

  1. Backward compatible by default:
    • Existing reload semantics remain unless stream_detached is enabled.
  2. Logging behavior is unchanged unless stream_logs > skip_handshake is explicitly enabled.

Assistance Disclosure

Made heavy use of GitHub Copilot to plan and implement all of this, using a mix of many different models over a long session.


I had AI produce a report of the memory usage before and after this PR:

Memory / GC report — upgrade stream lifecycle

Test configuration

| Parameter | Value |
| --- | --- |
| Streams opened | 1,200 upgraded WebSocket-style connections |
| Config reloads | 24 (50 ms apart, spread across test) |
| Echo checks | Every 6 reloads (so at reloads 6, 12, 18, 24) |
| Heap snapshot points | before first reload · mid (after all reloads, before close_delay fires) · after full cleanup |
| GC before each snapshot | runtime.GC() + debug.FreeOSMemory() |
| Baseline commit | 1a3e900 (parent of this PR) |
| Test commit | 8e991e1 |

Scenario matrix

Without commit 8e991e1 (baseline)

| Scenario | Before | Mid | After | Delta | Objects before | Objects after |
| --- | --- | --- | --- | --- | --- | --- |
| legacy (close on reload) | 118.4 MiB | 23.3 MiB | 22.4 MiB | −96.0 MiB | 219,675 | 39,550 |
| close_delay=3s | 120.6 MiB | 120.9 MiB ✓ | 24.1 MiB | −96.5 MiB | 219,941 | 42,388 |
| stream_detached | N/A (not available) | | | | | |

With commit 8e991e1

| Scenario | Before | Mid | After | Delta | Objects before | Objects after |
| --- | --- | --- | --- | --- | --- | --- |
| legacy (close on reload) | 100.4 MiB | 20.5 MiB | 20.1 MiB | −80.3 MiB | 91,280 | 38,566 |
| close_delay=3s | 100.6 MiB | 98.3 MiB ✓ | 20.5 MiB | −80.1 MiB | 92,640 | 41,308 |
| stream_detached | 100.1 MiB | 98.1 MiB ✓ | 98.1 MiB | −2.0 MiB* | 91,807 | 86,771 |

* Streams intentionally detached — expected flat memory until connections close naturally.

✓ The mid snapshot, taken while streams are still alive, confirms that memory reflects live connection state.

Stream survival at every intermediate echo check (every 6 reloads):

| Scenario | Reload 6 | Reload 12 | Reload 18 | Reload 24 |
| --- | --- | --- | --- | --- |
| legacy (both) | 0/1200 | 0/1200 | 0/1200 | 0/1200 |
| close_delay (both) | 1200/1200 | 1200/1200 | 1200/1200 | 1200/1200 → 0 after delay |
| detached (with commit) | 1200/1200 | 1200/1200 | 1200/1200 | 1200/1200 |

Why "before" heap is ~20 MiB lower with this commit

The most significant number is objects before first reload: 219,675 (baseline) vs 91,280 (with commit) — a reduction of 128,395 objects (~58%) with 1,200 live streams. This directly drives the lower baseline heap.

Root cause in old code: handleUpgradeResponse runs entirely inside the ServeHTTP goroutine and blocks for the full lifetime of the stream. That goroutine's stack keeps alive:

  • h *Handler — full handler config tree (upstreams, health checks, load balancer state, all middleware)
  • req *http.Request — full HTTP request including headers, context, body, TLS state
  • rw http.ResponseWriter — the entire middleware response-writer chain (logging recorder, encode writer, intercept recorder, metrics recorder, etc.)
  • res *http.Response — the upstream response
  • A second goroutine per stream for backConnCloseCh context-cancellation plumbing

With 1,200 streams that's 1,200 retained copies of all of the above.

With this commit: handleDetachedUpgradeTunnel detaches to a goroutine that captures only tunnelState (shared, small), conn, backConn, and a handful of scalar config values (buffer size, timeout duration). ServeHTTP returns immediately; the middleware chain, request, response, and handler are all released to the GC.

Per-stream detached object footprint: from ~183 objects down to ~76 objects.
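
To make the retention mechanics concrete, here is a minimal Go sketch of the detached pattern (assumed names; this is not the PR's actual handleDetachedUpgradeTunnel code): the goroutine that outlives ServeHTTP closes over only the two connections, so the handler, request, response, and middleware writer chain become collectable as soon as ServeHTTP returns.

package tunnel

import (
	"io"
	"net"
)

// detachedTunnel stands in for the PR's tunnelState: only the pieces the copy
// loops actually need, with no reference back to the handler, request,
// response, or middleware response-writer chain.
type detachedTunnel struct {
	clientConn net.Conn // hijacked client connection
	backConn   net.Conn // upstream connection
}

// run copies bytes in both directions until either side closes.
func (t detachedTunnel) run() {
	done := make(chan struct{}, 2)
	copyDir := func(dst, src net.Conn) {
		_, _ = io.Copy(dst, src)
		_ = dst.Close()
		_ = src.Close()
		done <- struct{}{}
	}
	go copyDir(t.backConn, t.clientConn)
	go copyDir(t.clientConn, t.backConn)
	<-done
	<-done
}

// Detach starts the tunnel on its own goroutine and returns immediately, which
// is the property that lets ServeHTTP return (and its captured state be freed)
// while the stream stays open.
func Detach(clientConn, backConn net.Conn) {
	t := detachedTunnel{clientConn: clientConn, backConn: backConn}
	go t.run()
}

Per the description above, the real detached goroutine also carries the shared tunnelState and a few scalar options (buffer size, timeout) rather than just the two connections.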

Summary

| | Baseline (no commit) | With commit | Improvement |
| --- | --- | --- | --- |
| Heap per 1,200 active streams | ~119 MiB | ~100 MiB | −19 MiB (~16%) |
| Objects per 1,200 active streams | ~220K | ~92K | −128K (~58%) |
| Heap after legacy cleanup | ~22 MiB | ~20 MiB | comparable |
| close_delay: streams alive during all 24 reloads | ✓ | ✓ | same behaviour |
| stream_detached: streams alive during all 24 reloads | N/A | ✓ | new capability |
| stream_detached: streams echo correctly at every reload | N/A | ✓ | new capability |

@francislavoie francislavoie added this to the 2.x milestone Apr 13, 2026
@francislavoie francislavoie added the feature ⚙️ New feature or request label Apr 13, 2026
@francislavoie
Member Author

I changed stream_log_skip_handshake into stream_logs with a nested structure so the log level and logger name can be changed; it defaults to DEBUG under the name http.handlers.reverse_proxy.stream. I figure people might want to use the access log's name and INFO level so that both the open and close of a websocket get logged, but these logs aren't in the same format as typical access logs, so I don't think this should be on by default.
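
A hedged sketch of what that nesting might look like in a Caddyfile. Only stream_logs and skip_handshake are named in this PR; the level and name sub-options below are hypothetical spellings, used purely to illustrate overriding the default DEBUG level and the http.handlers.reverse_proxy.stream logger name described above:

reverse_proxy backend:8080 {
	stream_logs {
		# Named in this PR: skip logging the upgrade handshake.
		skip_handshake
		# Hypothetical sub-options: log stream open/close at INFO under the
		# access logger name so they appear alongside access logs.
		level INFO
		name access
	}
}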

@Geramy

Geramy commented Apr 18, 2026

Can you do a bandwidth test through the proxy with CPU usage statistics before and after?

@francislavoie
Member Author

I don't think this will change bandwidth at all; the copy loop itself did not change. This will only affect memory usage and whether connections can be held through config reloads. But I can do some more benchmarking anyway to prove there's no regression, sure.

@francislavoie francislavoie force-pushed the proxy-stream-detached branch from a3b339d to 5a1ace3 Compare April 18, 2026 01:26
@francislavoie
Member Author

francislavoie commented Apr 18, 2026

I had AI write and run a benchmark

Benchmark

| Case | Avg Throughput (MB/s) | Avg Messages/s | Avg CPU (%) | Avg Errors |
| --- | --- | --- | --- | --- |
| master (old), retain=false | 288.970 | 147,952.5 | 558.890 | 0.000 |
| this (new), retain=false | 288.084 | 147,498.9 | 560.957 | 0.000 |
| this (new), retain=true | 288.492 | 147,708.0 | 560.853 | 0.000 |

| Comparison | Throughput Delta | Messages/s Delta | CPU Delta |
| --- | --- | --- | --- |
| this (new) retain=false vs master (old) retain=false | -0.31% | -0.31% | +0.37% |
| this (new) retain=true vs this (new) retain=false | +0.14% | +0.14% | -0.02% |

Conclusion

No significant websocket CPU or bandwidth regression was observed in this run matrix; the differences are within the margin of error.

Benchmark code

bench_websocket_regression.sh

#!/usr/bin/env bash
set -euo pipefail

ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
TMP_DIR="${TMPDIR:-/tmp}/caddy-ws-bench-$$"
MASTER_WT="$TMP_DIR/wt-master"
RESULTS_CSV="$ROOT_DIR/caddytest/integration/ws_benchmark_results.csv"

DURATION="${CADDY_BENCH_DURATION:-10s}"
CLIENTS="${CADDY_BENCH_CLIENTS:-32}"
PAYLOAD_BYTES="${CADDY_BENCH_PAYLOAD_BYTES:-1024}"
RUNS="${CADDY_BENCH_RUNS:-3}"

mkdir -p "$TMP_DIR"

cleanup() {
  git -C "$ROOT_DIR" worktree remove --force "$MASTER_WT" >/dev/null 2>&1 || true
  rm -rf "$TMP_DIR"
}
trap cleanup EXIT

echo "Preparing benchmark tooling and binaries..."

WSBENCH_BIN="$TMP_DIR/wsbench"
CURRENT_BIN="$TMP_DIR/caddy-current"
MASTER_BIN="$TMP_DIR/caddy-master"

GOFLAGS="${GOFLAGS:-}"

(
  cd "$ROOT_DIR"
  go build $GOFLAGS -o "$WSBENCH_BIN" ./caddytest/integration/wsbench
  go build $GOFLAGS -o "$CURRENT_BIN" ./cmd/caddy
)

git -C "$ROOT_DIR" worktree add --detach "$MASTER_WT" master >/dev/null
(
  cd "$MASTER_WT"
  go build $GOFLAGS -o "$MASTER_BIN" ./cmd/caddy
)

CLK_TCK="$(getconf CLK_TCK)"

run_case() {
  local ref_name="$1"
  local caddy_bin="$2"
  local retain="$3"

  local init_cfg="$TMP_DIR/init-${ref_name}-${retain}.caddyfile"
  cat > "$init_cfg" <<EOF
{
  admin 127.0.0.1:2999
  http_port 9080
  https_port 9443
  grace_period 1ns
  skip_install_trust
}

127.0.0.1:9080 {
  respond "boot"
}
EOF

  local caddy_log="$TMP_DIR/caddy-${ref_name}-${retain}.log"
  "$caddy_bin" run --config "$init_cfg" --adapter caddyfile >"$caddy_log" 2>&1 &
  local caddy_pid=$!

  local started=0
  for _ in $(seq 1 50); do
    if curl -fsS "http://127.0.0.1:2999/config/" >/dev/null 2>&1; then
      started=1
      break
    fi
    sleep 0.1
  done

  if [[ "$started" -ne 1 ]]; then
    echo "ERROR,$ref_name,$retain,failed_to_start" >> "$RESULTS_CSV"
    kill "$caddy_pid" >/dev/null 2>&1 || true
    wait "$caddy_pid" >/dev/null 2>&1 || true
    return
  fi

  for run in $(seq 1 "$RUNS"); do
    local cpu_before cpu_after
    cpu_before="$(awk '{print $14+$15}' "/proc/$caddy_pid/stat")"

    local bench_out code
    set +e
    bench_out="$($WSBENCH_BIN \
      --admin 127.0.0.1:2999 \
      --proxy 127.0.0.1:9080 \
      --duration "$DURATION" \
      --clients "$CLIENTS" \
      --payload-bytes "$PAYLOAD_BYTES" \
      --retain="$retain" 2>"$TMP_DIR/wsbench-${ref_name}-${retain}-${run}.err")"
    code=$?
    set -e
    if [[ "$code" -ne 0 ]]; then
      if [[ "$code" -eq 3 ]]; then
        echo "UNSUPPORTED,$ref_name,$retain,$run" >> "$RESULTS_CSV"
        continue
      fi
      echo "ERROR,$ref_name,$retain,$run,wsbench_failed" >> "$RESULTS_CSV"
      continue
    fi

    cpu_after="$(awk '{print $14+$15}' "/proc/$caddy_pid/stat")"

    IFS=',' read -r elapsed_sec messages bytes_total mbps msg_per_sec err_count <<< "$bench_out"

    local cpu_jiffies cpu_sec cpu_pct
    cpu_jiffies=$((cpu_after - cpu_before))
    cpu_sec="$(awk -v j="$cpu_jiffies" -v clk="$CLK_TCK" 'BEGIN { printf "%.6f", j/clk }')"
    cpu_pct="$(awk -v c="$cpu_sec" -v e="$elapsed_sec" 'BEGIN { if (e <= 0) { print "0.00" } else { printf "%.2f", (100*c)/e } }')"

    echo "RESULT,$ref_name,$retain,$run,$elapsed_sec,$messages,$bytes_total,$mbps,$msg_per_sec,$cpu_sec,$cpu_pct,$err_count" >> "$RESULTS_CSV"
  done

  curl -fsS -X POST "http://127.0.0.1:2999/stop" >/dev/null 2>&1 || true
  kill "$caddy_pid" >/dev/null 2>&1 || true
  wait "$caddy_pid" >/dev/null 2>&1 || true
}

cat > "$RESULTS_CSV" <<EOF
kind,ref,retain,run,elapsed_sec,messages,bytes_total,mbps,msg_per_sec,cpu_sec,cpu_pct,errors
EOF

echo "Running matrix: master/current x retain(false/true)"
run_case "master" "$MASTER_BIN" "false"
run_case "master" "$MASTER_BIN" "true"
run_case "current" "$CURRENT_BIN" "false"
run_case "current" "$CURRENT_BIN" "true"

echo
echo "Raw results written to: $RESULTS_CSV"
echo "Computing averages..."

awk -F',' '
BEGIN {
  print "";
  print "Averages by case:";
  print "ref,retain,samples,avg_mbps,avg_cpu_pct,avg_msg_per_sec,avg_errors";
}
$1 == "RESULT" {
  key = $2 "," $3;
  cnt[key]++;
  mbps[key] += $8;
  cpu[key] += $11;
  mps[key] += $9;
  errs[key] += $12;
}
END {
  for (k in cnt) {
    printf "%s,%d,%.3f,%.3f,%.1f,%.3f\n", k, cnt[k], mbps[k]/cnt[k], cpu[k]/cnt[k], mps[k]/cnt[k], errs[k]/cnt[k];
  }
}
' "$RESULTS_CSV"

wsbench/main.go

package main

import (
	"bufio"
	"errors"
	"flag"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/textproto"
	"os"
	"strconv"
	"strings"
	"sync"
	"sync/atomic"
	"time"
)

type config struct {
	adminAddr   string
	proxyAddr   string
	duration    time.Duration
	clients     int
	payloadSize int
	retain      bool
}

type benchClient struct {
	conn   net.Conn
	reader *bufio.Reader
}

type backend struct {
	addr   string
	ln     net.Listener
	server *http.Server
	mu     sync.Mutex
	conns  map[net.Conn]struct{}
}

func main() {
	cfg, err := parseFlags()
	if err != nil {
		fatalf("invalid flags: %v", err)
	}

	be, err := startBackend()
	if err != nil {
		fatalf("start backend: %v", err)
	}
	defer be.close()

	if err := loadProxyConfig(cfg.adminAddr, cfg.proxyAddr, be.addr, cfg.retain); err != nil {
		if cfg.retain && strings.Contains(strings.ToLower(err.Error()), "stream_retain_on_reload") {
			fatalExitf(3, "retain mode unsupported by this ref: %v", err)
		}
		fatalf("load proxy config: %v", err)
	}

	clients := make([]*benchClient, 0, cfg.clients)
	for i := 0; i < cfg.clients; i++ {
		client, openErr := openUpgradedClient(cfg.proxyAddr)
		if openErr != nil {
			closeClients(clients)
			fatalf("open upgraded client %d: %v", i, openErr)
		}
		clients = append(clients, client)
	}
	defer closeClients(clients)

	payload := make([]byte, cfg.payloadSize)
	for i := range payload {
		payload[i] = byte('a' + (i % 26))
	}

	var (
		messages atomic.Uint64
		bytes    atomic.Uint64
		errorsN  atomic.Uint64
	)

	deadline := time.Now().Add(cfg.duration)
	start := time.Now()

	var wg sync.WaitGroup
	for _, client := range clients {
		wg.Add(1)
		go func(c *benchClient) {
			defer wg.Done()
			for time.Now().Before(deadline) {
				if err := c.echo(payload); err != nil {
					errorsN.Add(1)
					return
				}
				messages.Add(1)
				bytes.Add(uint64(len(payload) * 2))
			}
		}(client)
	}
	wg.Wait()

	elapsed := time.Since(start)
	elapsedSec := elapsed.Seconds()
	if elapsedSec <= 0 {
		elapsedSec = 1e-9
	}
	msgCount := messages.Load()
	bytesTotal := bytes.Load()
	mbps := (float64(bytesTotal) / (1024 * 1024)) / elapsedSec
	msgPerSec := float64(msgCount) / elapsedSec

	fmt.Printf("%.6f,%d,%d,%.6f,%.6f,%d\n", elapsedSec, msgCount, bytesTotal, mbps, msgPerSec, errorsN.Load())
}

func parseFlags() (config, error) {
	cfg := config{}
	flag.StringVar(&cfg.adminAddr, "admin", "127.0.0.1:2999", "caddy admin host:port")
	flag.StringVar(&cfg.proxyAddr, "proxy", "127.0.0.1:9080", "reverse proxy host:port")
	flag.DurationVar(&cfg.duration, "duration", 10*time.Second, "benchmark duration")
	flag.IntVar(&cfg.clients, "clients", 32, "concurrent upgraded clients")
	flag.IntVar(&cfg.payloadSize, "payload-bytes", 1024, "payload bytes per message")
	flag.BoolVar(&cfg.retain, "retain", false, "enable stream_retain_on_reload in reverse_proxy config")
	flag.Parse()

	if cfg.adminAddr == "" || cfg.proxyAddr == "" {
		return cfg, errors.New("admin and proxy addresses are required")
	}
	if cfg.duration <= 0 {
		return cfg, errors.New("duration must be > 0")
	}
	if cfg.clients <= 0 {
		return cfg, errors.New("clients must be > 0")
	}
	if cfg.payloadSize <= 0 {
		return cfg, errors.New("payload-bytes must be > 0")
	}
	return cfg, nil
}

func startBackend() (*backend, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}

	be := &backend{
		addr:  ln.Addr().String(),
		ln:    ln,
		conns: make(map[net.Conn]struct{}),
	}
	be.server = &http.Server{Handler: http.HandlerFunc(be.serveHTTP)}

	go func() {
		_ = be.server.Serve(ln)
	}()

	return be, nil
}

func (b *backend) serveHTTP(w http.ResponseWriter, r *http.Request) {
	if !strings.EqualFold(r.Header.Get("Connection"), "Upgrade") || !strings.EqualFold(r.Header.Get("Upgrade"), "stress-stream") {
		http.Error(w, "upgrade required", http.StatusUpgradeRequired)
		return
	}

	hijacker, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "hijack unsupported", http.StatusInternalServerError)
		return
	}

	conn, rw, err := hijacker.Hijack()
	if err != nil {
		return
	}
	b.trackConn(conn)

	_, _ = rw.WriteString("HTTP/1.1 101 Switching Protocols\r\nConnection: Upgrade\r\nUpgrade: stress-stream\r\n\r\n")
	_ = rw.Flush()

	go func() {
		defer b.untrackConn(conn)
		defer conn.Close()
		_, _ = io.Copy(conn, conn)
	}()
}

func (b *backend) trackConn(conn net.Conn) {
	b.mu.Lock()
	b.conns[conn] = struct{}{}
	b.mu.Unlock()
}

func (b *backend) untrackConn(conn net.Conn) {
	b.mu.Lock()
	delete(b.conns, conn)
	b.mu.Unlock()
}

func (b *backend) close() {
	_ = b.server.Close()
	_ = b.ln.Close()

	b.mu.Lock()
	defer b.mu.Unlock()
	for conn := range b.conns {
		_ = conn.Close()
	}
	clear(b.conns)
}

func loadProxyConfig(adminAddr, proxyAddr, backendAddr string, retain bool) error {
	host, port, err := net.SplitHostPort(proxyAddr)
	if err != nil {
		return err
	}
	httpPort, err := strconv.Atoi(port)
	if err != nil {
		return err
	}

	directives := ""
	if retain {
		directives = "\n\t\tstream_retain_on_reload"
	}

	config := fmt.Sprintf(`
{
	admin %s
	http_port %d
	https_port 9443
	grace_period 1ns
	skip_install_trust
}

%s:%d {
	reverse_proxy %s {%s
	}
}
`, adminAddr, httpPort, host, httpPort, backendAddr, directives)

	req, err := http.NewRequest(http.MethodPost, "http://"+adminAddr+"/load", strings.NewReader(config))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "text/caddyfile")

	client := &http.Client{Timeout: 15 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("status=%d body=%s", resp.StatusCode, strings.TrimSpace(string(body)))
	}
	return nil
}

func openUpgradedClient(proxyAddr string) (*benchClient, error) {
	conn, err := net.DialTimeout("tcp", proxyAddr, 5*time.Second)
	if err != nil {
		return nil, err
	}

	request := strings.Join([]string{
		"GET /upgrade HTTP/1.1",
		"Host: " + proxyAddr,
		"Connection: Upgrade",
		"Upgrade: stress-stream",
		"",
		"",
	}, "\r\n")
	if _, err := io.WriteString(conn, request); err != nil {
		_ = conn.Close()
		return nil, err
	}

	reader := bufio.NewReader(conn)
	tproto := textproto.NewReader(reader)
	statusLine, err := tproto.ReadLine()
	if err != nil {
		_ = conn.Close()
		return nil, err
	}
	if !strings.Contains(statusLine, "101") {
		_ = conn.Close()
		return nil, fmt.Errorf("unexpected upgrade status: %s", statusLine)
	}

	headers, err := tproto.ReadMIMEHeader()
	if err != nil {
		_ = conn.Close()
		return nil, err
	}
	if !strings.EqualFold(headers.Get("Connection"), "Upgrade") {
		_ = conn.Close()
		return nil, fmt.Errorf("unexpected upgrade headers: %v", headers)
	}

	return &benchClient{conn: conn, reader: reader}, nil
}

func (c *benchClient) echo(payload []byte) error {
	deadline := time.Now().Add(2 * time.Second)
	if err := c.conn.SetWriteDeadline(deadline); err != nil {
		return err
	}
	if _, err := c.conn.Write(payload); err != nil {
		return err
	}
	if err := c.conn.SetReadDeadline(deadline); err != nil {
		return err
	}

	buf := make([]byte, len(payload))
	if _, err := io.ReadFull(c.reader, buf); err != nil {
		return err
	}
	if !equalBytes(buf, payload) {
		return errors.New("payload mismatch")
	}
	return nil
}

func closeClients(clients []*benchClient) {
	for _, client := range clients {
		if client != nil && client.conn != nil {
			_ = client.conn.Close()
		}
	}
}

func equalBytes(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func fatalf(format string, args ...any) {
	_, _ = fmt.Fprintf(os.Stderr, format+"\n", args...)
	os.Exit(1)
}

func fatalExitf(code int, format string, args ...any) {
	_, _ = fmt.Fprintf(os.Stderr, format+"\n", args...)
	os.Exit(code)
}

ws_benchmark_results.csv

kind,ref,retain,run,elapsed_sec,messages,bytes_total,mbps,msg_per_sec,cpu_sec,cpu_pct,errors
RESULT,master,false,1,10.000151,1501963,3076020224,293.347710,150194.027639,56.390000,563.89,0
RESULT,master,false,2,10.000140,1449172,2967904256,283.037447,144915.173071,54.870000,548.69,0
RESULT,master,false,3,10.000117,1487499,3046397952,290.523736,148748.152953,56.410000,564.09,0
UNSUPPORTED,master,true,1
UNSUPPORTED,master,true,2
UNSUPPORTED,master,true,3
RESULT,current,false,1,10.000107,1517738,3108327424,296.430023,151772.171727,57.810000,578.09,0
RESULT,current,false,2,10.000185,1397386,2861846528,272.921912,139736.018726,53.010000,530.09,0
RESULT,current,false,3,10.000112,1509902,3092279296,294.899419,150988.502572,57.470000,574.69,0
RESULT,current,true,1,10.000187,1489710,3050926080,290.953558,148968.221594,56.590000,565.89,0
RESULT,current,true,2,10.000092,1450261,2970134528,283.251498,145024.767135,54.980000,549.79,0
RESULT,current,true,3,10.000406,1491372,3054329856,291.271780,149131.151285,56.690000,566.88,0

Ran with: CADDY_BENCH_DURATION=10s CADDY_BENCH_RUNS=3 CADDY_BENCH_CLIENTS=32 CADDY_BENCH_PAYLOAD_BYTES=1024 ./caddytest/integration/bench_websocket_regression.sh

@Geramy

Geramy commented Apr 18, 2026

This is amazing! I'm glad you were able to find a solution! This should be merged ASAP, then Caddy won't have any competition :)

Member

@WeidiDeng WeidiDeng left a comment


A quick review.

I'll look into the details of the new copying mechanics and logs later, possibly testing locally.

Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/encode/encode.go Outdated
@francislavoie francislavoie requested a review from WeidiDeng April 18, 2026 18:20
@francislavoie francislavoie force-pushed the proxy-stream-detached branch 4 times, most recently from facbed8 to db86fda Compare April 19, 2026 22:16
Contributor

@mmm444 mmm444 left a comment


Thank you for the effort on this topic!

I have found some time for a quick review of the core of the changes in reverseproxy.go and streaming.go. Unfortunately I think the issue discussed on line 542 in reverseproxy.go is substantial and will require some minor redesign. Other comments are just nitpicks.

Please take my review with a grain of salt, as it’s been a while since I last contributed to this project and I also did not run anything, just read the code.

Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/streaming.go Outdated
Comment thread modules/caddyhttp/reverseproxy/reverseproxy.go Outdated
@francislavoie francislavoie requested a review from mmm444 April 21, 2026 12:17
@francislavoie
Member Author

francislavoie commented Apr 21, 2026

I re-ran the memory usage report, got these numbers after the changes:

Current memory report (this branch, current HEAD):

  • legacy: before 117.7 MiB, mid 23.1 MiB, after 22.2 MiB, delta -95.6 MiB, objects 221040 -> 38501
  • close_delay: before 119.6 MiB, mid 119.9 MiB, after 23.3 MiB, delta -96.2 MiB, objects 221364 -> 41024
  • retain: before 99.5 MiB, mid 97.8 MiB, after 97.8 MiB, delta -1.8 MiB, objects 89152 -> 84580

So essentially, the detached/retain memory usage is basically the same (still ~20 MiB lower than legacy), but legacy and close_delay now use the same amount of memory as before this PR.

The changes since the initial PR mean this will not have any significant effect on existing configs, but enabling retain mode can reduce memory usage when there are a lot of connections.

I think I should probably rename stream_retain_on_reload to stream_detached. The name would more directly relate to the behaviour rather than to the side effect of its relationship to config reloads. Edit: I did so.

@mholt
Member

mholt commented Apr 21, 2026

Thanks Francis. Given the performance, do you think that this should just be the default behavior then?

@francislavoie
Member Author

Eventually, yes, I think we will likely want to change the default, but playing it safe with an opt-in is better for now. We'll want to hear how stable it is out in the wild.

@francislavoie francislavoie force-pushed the proxy-stream-detached branch from 12139f2 to eeb13f1 Compare April 25, 2026 09:42
