
Fix intermittent 500 errors from worker pool concurrency issues #3494

Open

chubes4 wants to merge 11 commits into WordPress:trunk from chubes4:fix/pool-proxy-streamed-response-lifecycle

Conversation


@chubes4 chubes4 commented Apr 14, 2026

Summary

Fixes intermittent HTTP 500 errors that occur under multi-client concurrent load — most visibly when a user has several WordPress admin tabs open in Studio and refreshes them simultaneously.

The root cause is a pool proxy lifecycle mismatch: createObjectPoolProxy's withInstance releases a worker back to the pool as soon as the proxied method's outer promise resolves. For requestStreamed() that's when the StreamedPHPResponse object is returned (headers available, body still streaming), not when the body is fully drained. But the worker's SinglePHPInstanceManager doesn't release the PHP instance until response.finished fires. During that window the pool proxy treats the worker as free and hands it to a concurrent request — which hits isAcquired === true and surfaces as HTTP 500.
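
For illustration, here is a minimal sketch of that early-release pattern. It is a simplified model, not the actual object-pool-proxy.ts code; the pool shape and helper names are assumed from the description above.

// Illustrative only: a simplified model of the per-method acquire/release
// behavior described above, not the actual object-pool-proxy.ts code.
function wrapProxiedMethod<T extends object>(
	pool: { acquire(): Promise<T>; release(instance: T): void },
	methodName: string
) {
	return async (...args: unknown[]) => {
		const instance = await pool.acquire();
		try {
			// For requestStreamed(), this promise resolves as soon as the
			// StreamedPHPResponse is returned: headers are available, but
			// the body is still streaming.
			return await (instance as any)[methodName](...args);
		} finally {
			// Released here, while the worker's SinglePHPInstanceManager
			// still holds its PHP instance until response.finished fires.
			// A concurrent request routed to this "free" worker then sees
			// isAcquired === true and surfaces as HTTP 500.
			pool.release(instance);
		}
	};
}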

Approach

Expose a new __withInstance(fn) method on Pooled<T> that acquires an instance, invokes a caller-supplied callback with it, and releases only after the callback promise resolves (or throws). Callers whose logical unit of work outlives a single proxied method call — streamed responses being the canonical case — use this to bind the worker lifecycle to the full request lifecycle instead of the first-promise-to-resolve.
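
A minimal sketch of the intended semantics, assuming a plain acquire/release pool shape (illustrative; the real method is exposed on the Pooled<T> proxy target in object-pool-proxy.ts):

// __withInstance holds the worker for the caller's entire callback and
// releases only once that callback's promise settles (resolve or throw).
async function withInstanceSketch<T, R>(
	pool: { acquire(): Promise<T>; release(instance: T): void },
	fn: (instance: T) => Promise<R>
): Promise<R> {
	const instance = await pool.acquire();
	try {
		// The callback can span several awaits, e.g. requestStreamed()
		// followed by fully draining the response body, and the worker
		// stays bound to the caller for all of it.
		return await fn(instance);
	} finally {
		pool.release(instance);
	}
}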

In run-cli.ts, handleRequest now wraps requestStreamed() + streamResponse() inside a __withInstance callback. The worker is held for the entire time the response body is streaming to the HTTP client, eliminating the race window. start-server.ts's handleRequest signature changes from (request) => Promise<StreamedPHPResponse> to (request, streamResponse) => Promise<void> to carry the stream-drain lifecycle back into the handler.
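
Roughly, the handler contract change looks like this (types simplified; PHPRequest and StreamedPHPResponse are placeholder names standing in for the actual imports):

// Placeholder types, not the real library shapes.
type PHPRequest = unknown;
type StreamedPHPResponse = unknown;

// Before: the handler returned the streamed response, and the pool wrapper
// released the worker as soon as this promise resolved (headers ready).
type HandleRequestBefore = (request: PHPRequest) => Promise<StreamedPHPResponse>;

// After: the handler receives a streamResponse callback and resolves only
// once streaming has finished, keeping the drain inside __withInstance.
type HandleRequestAfter = (
	request: PHPRequest,
	streamResponse: (response: StreamedPHPResponse) => Promise<void>
) => Promise<void>;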

No behavioral changes outside of the CLI request path. Streaming semantics are preserved — large bodies and SSE flows still stream progressively, they're just bounded to one worker for their duration.

Changes

  • object-pool-proxy.ts: Add __withInstance<R>(fn): Promise<R> to Pooled<T> and createObjectPoolProxy. The method is exposed directly on the proxy target so it's found by the prop in _target check before hitting the generic proxy trap.
  • run-cli.ts: Switch handleRequest to use playgroundPool.__withInstance(async (instance) => { ... instance.requestStreamed(request); await streamResponse(response); }). Also retypes playgroundPool from Pooled<PlaygroundCliWorker> to Pooled<RemoteAPI<PlaygroundCliWorker>> to honestly reflect that the pool holds comlink remote proxies, not raw workers. This moves existing implicit type-laundering to the honest cross-package boundary.
  • start-server.ts: handleRequest signature changes to a streaming-callback pattern so the full request→stream→drain lifecycle stays within the pool proxy's __withInstance scope.
  • single-php-instance-manager.ts: Fix grammar on the "already acquired" error message.
  • Tests: 4 new spec cases for __withInstance (callback-held release, sync/async throw release, raw-instance identity). Updated run-cli.spec.ts error-handling stub to target __withInstance. Updated start-server.spec.ts for the new callback signature.

Test plan

Unit tests

npx vitest run --config packages/php-wasm/universal/vite.config.ts packages/php-wasm/universal/src/lib/object-pool-proxy.spec.ts
npx vitest run --config packages/playground/cli/vite.config.ts packages/playground/cli/tests/start-server.spec.ts
npx vitest run --config packages/playground/cli/vite.config.ts packages/playground/cli/tests/run-cli.spec.ts

All pass.

Multi-client reproducer (the bug as it manifests in practice)

The bug surfaces most reliably when multiple independent clients hit the server simultaneously — e.g. several browser admin tabs refreshing at once. Each "tab" is its own shell process with its own cookie jar; each fires a 10-URL WP admin asset fanout:

Script: multi-tab-stress.sh
#!/usr/bin/env bash
URL="http://localhost:8881"
TAB_COUNT="${1:-6}"
WAVES="${2:-20}"
OUT=/tmp/multi-tab.txt
> "$OUT"

tab_session() {
    local tab_id="$1"
    local jar="/tmp/tab-jar-$tab_id.txt"
    > "$jar"
    for u in "/" "/wp-admin/" "/wp-login.php" \
        "/wp-includes/js/wp-emoji-release.min.js" \
        "/wp-content/themes/twentytwentyfive/style.css" \
        "/wp-includes/css/dist/block-library/style.min.css" \
        "/wp-includes/js/jquery/jquery.min.js" \
        "/wp-includes/js/dist/i18n.min.js" \
        "/favicon.ico" "/wp-json/"; do
        curl -sL -o /dev/null -w "%{http_code}\n" -m 30 \
            -b "$jar" -c "$jar" \
            -H "User-Agent: tab-$tab_id" \
            "$URL$u" >> "$OUT" &
    done
    wait
    rm -f "$jar"
}
export -f tab_session
export URL OUT

for wave in $(seq 1 "$WAVES"); do
    for tab_id in $(seq 1 "$TAB_COUNT"); do
        (tab_session "$tab_id") &
    done
    wait
done
echo "Total: $(wc -l < "$OUT")"
awk '{print $1}' "$OUT" | sort | uniq -c | sort -rn

Tested against a Studio site (http://localhost:8881), Playground built from this branch vs. stock upstream/trunk, both installed via self-hosted tarballs and loaded identically by Studio.

Results:

Branch        Scenario                    500 rate
This PR       6 tabs × 20 waves, run 1    0 / 1200 = 0%
This PR       6 tabs × 20 waves, run 2    0 / 1200 = 0%
This PR       10 tabs × 15 waves          0 / 1500 = 0%
Stock trunk   6 tabs × 20 waves, run 1    63 / 1200 = 5.3%
Stock trunk   6 tabs × 20 waves, run 2    66 / 1200 = 5.5%
Stock trunk   10 tabs × 15 waves          98 / 1500 = 6.5%

Totals: this PR 0 / 3900 (0%). Stock trunk 227 / 3900 (5.8%).

Same workload against both branches: this PR is consistently zero, stock trunk consistently fails 5-7% of requests.

Additional stress (this PR only, to confirm robustness)

  • 10 mixed attack patterns (3, 6, 10, 12, 20-way concurrent bursts; sustained 30s loads; garbage paths; mid-stream aborts; 10MB POST bodies; 16KB URIs; cookie floods; bare TCP abort barrages): 0 pool-related 500s across ~3,800 requests
  • Across ~25 minutes of sustained stress, zero worker exit events in the CLI log

PR scope note

An earlier version of this PR also included dead-worker eviction (__removeInstance + an onExit handler in run-cli.ts) in response to an existing "@TODO: Should we respawn the worker if it exited with an error" comment. Per @brandonpayton's review, that's orthogonal to the pool-release-timing bug and belongs in a separate PR backed by evidence of real worker crashes. I removed the eviction code in 64a8f02 — ~25 minutes of sustained multi-pattern stress showed zero worker exit events, so the eviction path never triggered on this workload. If worker-process crash recovery is wanted later, it should be designed from a real repro (or, per Brandon's suggestion, via "fail startup altogether" for the startup case).

This PR is now scoped to the one observed bug: the pool-release-timing race, fixed with __withInstance.

Fixes #3492

AI assistance

  • AI assistance: Yes
  • Tool(s): Claude Code (Opus 4.7)
  • Used for: Root cause analysis of the pool proxy lifecycle mismatch, initial implementation, iterative refinement in response to review feedback (the __withInstance callback API in particular came out of re-reading Copilot's note about buffered vs streamed semantics), the multi-client stress harness, and drafting this PR description. Every commit reviewed, tested, and validated by the author.

@chubes4 chubes4 requested review from a team, brandonpayton and Copilot April 14, 2026 23:36
Contributor

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes intermittent HTTP 500s under concurrent load by aligning worker-pool acquisition/release with the full request lifecycle and evicting crashed workers from the pool.

Changes:

  • Updates the server handleRequest contract to stream responses via a callback that remains in-scope until streaming completes.
  • Switches CLI request handling away from early-releasing streaming calls and removes dead worker proxies from the pool on worker exit.
  • Adds opt-in debug logging and introduces pool eviction (__removeInstance) to prevent routing requests to crashed workers.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.

Summary per file:

  • packages/playground/cli/tests/start-server.spec.ts: Updates tests to match the new handleRequest(request, streamResponse) callback signature.
  • packages/playground/cli/src/start-server.ts: Changes request handling to pass a streamResponse callback and adds opt-in HTTP debug logs.
  • packages/playground/cli/src/run-cli.ts: Evicts dead workers from the pool on exit and switches to buffered request() to avoid early worker release.
  • packages/php-wasm/universal/src/lib/single-php-instance-manager.ts: Adds opt-in debug logging (with stack traces) for double-acquire conflicts.
  • packages/php-wasm/universal/src/lib/object-pool-proxy.ts: Adds pool eviction (__removeInstance) and opt-in debug logging around acquire/release/queue behavior.


Comment on lines +73 to +94
function removeInstance(instance: unknown): void {
	const inst = instance as T;

	// Remove from the canonical list
	const idx = instances.indexOf(inst);
	if (idx !== -1) {
		instances.splice(idx, 1);
	}

	// Remove from the free list if it's currently free
	const freeIdx = freeInstances.indexOf(inst);
	if (freeIdx !== -1) {
		freeInstances.splice(freeIdx, 1);
		log(
			`removeInstance: removed free instance (free=${freeInstances.length})`
		);
	} else {
		// Instance is currently busy — mark it for removal on release
		removedInstances.add(inst);
		log(`removeInstance: marked busy instance for removal`);
	}
}

Copilot AI Apr 14, 2026


removeInstance() can cause queued acquisitions to hang indefinitely when it’s called for a busy instance and that instance is later discarded on release(). In that path (discarded removed instance), the function returns early without waking a queued waiter, so if the removed instance was the last remaining capacity (or removal reduces capacity below demand), callers can deadlock awaiting acquire(). A concrete fix is to redesign the wait queue to support rejection (store {resolve, reject}), and when capacity is removed (especially when it reaches 0) fail queued requests with a clear error; alternatively, on discarding a removed instance, immediately attempt to satisfy the next waiter using any remaining free instance (if present) and/or keep the instance in the canonical count until it’s actually released.
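
For concreteness, a hedged sketch of the reject-capable wait queue suggested here (names are illustrative, not the actual implementation; a later commit on this branch took essentially this approach before the eviction code was dropped from the PR):

// Waiters carry both resolve and reject so the pool can fail queued
// acquisitions instead of letting them hang when capacity disappears.
type Waiter<T> = {
	resolve: (instance: T) => void;
	reject: (error: Error) => void;
};

function removeInstanceSketch<T>(
	allInstances: T[],
	freeInstances: T[],
	waitQueue: Waiter<T>[],
	instance: T
): void {
	const idx = allInstances.indexOf(instance);
	if (idx !== -1) {
		allInstances.splice(idx, 1);
	}
	const freeIdx = freeInstances.indexOf(instance);
	if (freeIdx !== -1) {
		freeInstances.splice(freeIdx, 1);
	}
	// Once capacity reaches zero, no future release() will ever wake the
	// queue, so reject every queued waiter with a clear error.
	if (allInstances.length === 0) {
		while (waitQueue.length > 0) {
			waitQueue.shift()!.reject(new Error('Pool has no usable instances left'));
		}
	}
}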

Comment thread packages/php-wasm/universal/src/lib/object-pool-proxy.ts Outdated
Comment on lines 48 to 51
 export function createObjectPoolProxy<T extends object>(
-	instances: T[]
+	instances: T[],
+	debugLabel?: string
 ): Pooled<T> {

Copilot AI Apr 14, 2026


createObjectPoolProxy() now mutates the caller-provided instances array (via instances.splice(...)) when evicting. That’s a behavioral change from typical pool constructors and can create surprising side effects if the caller retains and uses that array elsewhere. Consider copying into an internal allInstances array and mutating that instead (or explicitly document that the input array is treated as owned and will be mutated).

Comment on lines +76 to +80
	// Remove from the canonical list
	const idx = instances.indexOf(inst);
	if (idx !== -1) {
		instances.splice(idx, 1);
	}

Copilot AI Apr 14, 2026


createObjectPoolProxy() now mutates the caller-provided instances array (via instances.splice(...)) when evicting. That’s a behavioral change from typical pool constructors and can create surprising side effects if the caller retains and uses that array elsewhere. Consider copying into an internal allInstances array and mutating that instead (or explicitly document that the input array is treated as owned and will be mutated).

Comment on lines +167 to +172
const proxyTarget = {} as Pooled<T>;
// Expose __removeInstance directly on the target so it's found
// by the `prop in _target` check before hitting the proxy trap.
(proxyTarget as any).__removeInstance = removeInstance;

return new Proxy(proxyTarget, {

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a stringly-named control method (__removeInstance) onto a generic Pooled<T> can mask a real __removeInstance property/method on T (if one exists) since the proxy will return the pool control function instead of forwarding to the underlying instance. To avoid potential name collisions, prefer a Symbol-keyed control API (or a separate returned control object) so proxy consumers can’t accidentally shadow/override underlying instance members.

Comment thread packages/playground/cli/src/run-cli.ts Outdated
	}

-	return response;
+	await streamResponse(StreamedPHPResponse.fromPHPResponse(response));

Copilot AI Apr 14, 2026


Switching from requestStreamed() to buffered request() changes runtime behavior in a way that can materially impact production loads: large downloads/responses will be fully buffered in memory and the client won’t receive any bytes until PHP completes, which can increase memory pressure and latency (and can break long-lived streaming semantics like SSE). Since this PR already introduces a callback-based streaming contract in start-server.ts, consider addressing the lifecycle mismatch by keeping streaming but ensuring the worker remains acquired until the stream finishes (e.g., have the pool/proxy hold the instance until handleStreamedResponse completes, or provide a pool method that ties instance release to stream completion) so you retain streaming benefits without reintroducing early release.

@chubes4 chubes4 force-pushed the fix/pool-proxy-streamed-response-lifecycle branch 2 times, most recently from 11a145e to 5bfb7fd on April 14, 2026 at 23:52
pento added a commit to pento/claudaborative-editing that referenced this pull request Apr 17, 2026
Vendors tarballs of WordPress/wordpress-playground#3494 under vendor/ and
references them via a direct dependency (@wp-playground/cli) and an npm
override (@php-wasm/universal). Without this patch the object-pool proxy
returns a worker to the free list before its streamed response finishes;
concurrent e2e requests then land on a mid-response worker and WordPress
returns 500. Released builds (3.1.20) still have the bug.

Verified: all 19 e2e tests pass locally with 4 Playwright workers (3.1m).

Revisit once the upstream PR lands — drop vendor/ and bump to the next
released version.
Member

@pento pento left a comment


Only commenting on the functionality: I've tested this PR on my own repo, and I can confirm that it makes e2e test runs (i.e., tests that hammer the server with a heap of simultaneous connections) much more reliable.

I'll leave the code review to a member of @WordPress/playground-maintainers, but 👍🏻 on the functionality of the fix!

Collaborator

mho22 commented Apr 22, 2026

@brandonpayton This pull request may fix the recurring test failure in test-playground-cli (macos-latest).

Member

pento commented Apr 22, 2026

Hmm, it looks like this PR doesn't fix that intermittent failure, unfortunately.

Collaborator

mho22 commented Apr 23, 2026

@chubes4 It looks like there is no mention of workers: 3 inside the runCLI call in file-locking.spec.ts.

This seems weird since @pento explicitly merged trunk into this pull request. I'll let you rebase with trunk and we'll figure out if workers: 3 will make these annoying failing tests disappear.

chubes4 added 2 commits April 23, 2026 12:47
Two related bugs caused intermittent HTTP 500 errors under normal
concurrent request loads (e.g. a browser page load firing parallel
requests for HTML, CSS, JS, and REST API).

1. Pool proxy released workers before PHP finished:
   handleRequest called playgroundPool.requestStreamed() through the
   pool proxy. The proxy released the worker when the promise resolved
   (streaming started), but the worker's SinglePHPInstanceManager held
   the PHP instance until the stream finished. A subsequent request
   routed to that 'free' worker hit isAcquired === true and returned
   HTTP 500.

   Fix: Switch to buffered request() which holds the worker until PHP
   completes. Change start-server.ts to a callback pattern so the full
   request-to-response lifecycle stays within the pool proxy scope.

2. Crashed workers were never removed from the pool:
   When a worker thread exited unexpectedly (e.g. EADDRINUSE), its API
   proxy remained in the pool. Requests routed to the dead proxy would
   fail and be released back, creating an infinite failure cycle.

   Fix: Add __removeInstance() to createObjectPoolProxy so callers can
   evict dead instances. Wire up the onExit handler in run-cli.ts to
   remove dead workers from the pool.

Fixes WordPress#3492

## AI disclosure

Per WordPress AI Guidelines (https://make.wordpress.org/ai/handbook/ai-guidelines/):

AI assistance: Yes
Tool: Claude Code (Claude Opus 4.6)
Used for: Root cause analysis of the pool proxy lifecycle mismatch,
implementing the fix (buffered request approach, callback pattern,
__removeInstance for dead worker eviction), updating tests, and
drafting the commit message. All code reviewed, tested, and validated
by the author against a running Studio site with concurrent request
benchmarks.
…tStreamed()

The handleRequest code was changed to call playgroundPool.request()
instead of playgroundPool.requestStreamed(). The test still stubbed
.requestStreamed, so the error was never thrown and WordPress returned
a 404 instead of 500.
@chubes4 chubes4 force-pushed the fix/pool-proxy-streamed-response-lifecycle branch from 0411b39 to 9022794 on April 23, 2026 at 16:47
Author

chubes4 commented Apr 23, 2026

@mho22 interesting, now a different test is failing.

chubes4 added 4 commits April 23, 2026 15:00
…er's array

Two related fixes to createObjectPoolProxy:

1. Deadlock when pool drains to empty. If every instance is removed
   (e.g. all workers crashed) while a caller is awaiting acquire(),
   the promise would previously hang forever — no instance will ever
   be released to wake the queue. Now: removeInstance() fails all
   queued waiters with a clear error when capacity drops to 0, and
   acquire() fails fast against an already-empty pool instead of
   enqueueing. Waiter queue entries now carry both resolve and reject
   to support this.

2. Caller array mutation. removeInstance() previously called
   instances.splice() on the caller-provided array, producing
   side effects visible outside the pool. Now the pool copies the
   input into an internally-owned allInstances array at construction
   and only ever mutates that copy.

Three new tests cover the new behaviors (mid-flight drain rejects
queued waiters; post-drain acquire fails fast; caller's array is
left intact after eviction).
'The PHP instance already acquired' → 'The PHP instance is already
acquired'. This message surfaces during concurrency failures so
tightening it up makes logs easier to parse.
The 500-under-concurrency fix (a2285ec) switched handleRequest from
requestStreamed() to buffered request() because the pool proxy released
workers when the outer promise resolved — which, for streaming, was when
the response *started* streaming (headers available), not when it
finished. Concurrent requests could then be routed to a worker still
draining its previous body, producing 'PHP instance already acquired'
errors.

Buffering fixes concurrency but regresses streaming: large bodies are
held entirely in memory and SSE flows become request/response.

Expose the existing internal withInstance helper on the pool proxy as
__withInstance, then scope acquire/release around the full stream drain
in run-cli.ts:

    await playgroundPool.__withInstance(async (instance) => {
        const response = await instance.requestStreamed(request);
        ...cookie handling...
        await streamResponse(response);
        // Instance released only after streamResponse fully drains.
    });

This ties the worker lifecycle to the HTTP response lifecycle: the pool
holds the instance for the entire duration of a request, exactly as the
buffered fix did, while the body still streams directly to the client
without intermediate buffering.

Tests: 4 new spec cases cover callback-held release semantics
(pre-release concurrent calls must queue), sync/async error release,
and raw-instance identity. Existing 17 tests unchanged.

The Pooled<PlaygroundCliWorker> / Pooled<RemoteAPI<PlaygroundCliWorker>>
type divergence was previously papered over by the Promisified<T>
method-proxy shape; __withInstance's contravariant callback parameter
makes it observable. Localized unknown casts keep public signatures
unchanged — a broader refactor of the playgroundPool typing is out of
scope for this PR.
chubes4 added 2 commits April 23, 2026 17:08
The pool is constructed from RemoteAPI<PlaygroundCliWorker> instances
(comlink proxies over MessagePort), but was declared as
Pooled<PlaygroundCliWorker>. Before __withInstance, the two shapes
were structurally compatible because Promisified<T> only exposed method
signatures — and RemoteAPI<T> mirrors T's method surface with
promisified returns. __withInstance's callback parameter puts T in a
contravariant position, which makes the structural difference
observable.

The previous commit resolved this with two localized casts at the
assignment site and inside the __withInstance callback. This commit
replaces both with honest local typing: playgroundPool, the
RunCLIServer.playground public shape, zipSite(), and both
BlueprintsV{1,2}Handler.bootWordPress() signatures all move to
Pooled<RemoteAPI<PlaygroundCliWorker>>. The hot path
(handleRequest + __withInstance callback) is now cast-free, and future
lifecycle-hook additions to Pooled<T> will type-check without per-site
widening.

The casts don't disappear entirely — they migrate outward to the
cross-package boundary with @wp-playground/blueprints and xdebug-bridge,
which expect UniversalPHP (= LimitedPHPApi | Remote<...> | Pooled<...>).
UniversalPHP's Pooled<LimitedPHPApi> arm shares the same underlying
lie (the pool never holds raw LimitedPHPApi, only remote proxies to it),
but widening it is a public-type change across ~100 call sites in
blueprints, wordpress, and other packages — worth a dedicated PR rather
than folding into a streaming-fix. A follow-up issue will track the
UniversalPHP cleanup.

Net: same number of casts, but they now live at the honest
cross-package boundary instead of inside this PR's own additions.
The error-handling test (tests/run-cli.spec.ts:'should return 500 when
the request handler throws an error') stubs the entry point that
handleRequest calls into, relying on the stub to throw and the Express
error path to surface a 500. The test has now tracked three call sites:

- Original streaming path → stubbed .requestStreamed()
- Buffered-fix (a2285ec) → stubbed .request() (commit 9022794)
- Lifecycle-scoped streaming revival (this PR) → now stubs .__withInstance()

handleRequest no longer calls .request() or .requestStreamed() on the
pool proxy directly; both are invoked on the raw instance inside a
.__withInstance() callback. Stubbing .__withInstance() is the
equivalent seam — it intercepts the entire acquire/invoke/release
cycle and fails fast, which is what the test wants.

CI signal for this change: test-playground-cli (macos, ubuntu) reported
'expected 404 to be 500' on the previous commit because the stale
.request() stub was being ignored.
@brandonpayton
Member

Hi @chubes4, thanks for noticing these issues and working on this PR!

Pool proxy released workers before PHP finished

This was a big oversight on my part with the multi-worker changes. The PHP instance streaming a response is busy until it is done streaming, and using a busy instance breaks things.

At first glance, I feel like we might be able to address this issue with fewer changes. Maybe not, but I'll try to sketch something in the morning and review this PR in more detail.

Crashed workers were never removed from the pool

It would be good to be able to recover in the event of a worker crash. Since this is a separate issue from the overused PHP workers, let's move these fixes to a separate PR.

One thing I'm concerned about here is finding and fixing the reasons for crashed workers. Workers shouldn't be crashing at all. And if workers crash on startup, perhaps we should just fail startup altogether.

When a worker thread exited unexpectedly (e.g. EADDRINUSE)

When was this happening? AFAIK, the HTTP listener is currently owned by the main thread. Maybe other listeners have been added that I'm unaware of 🤔 Will look...

Were there other reasons you were seeing workers exiting unexpectedly?

Strip __removeInstance and the onExit→eviction wiring. The eviction
code was defensive against a hypothetical case where a CLI worker
thread exits unexpectedly post-startup — but that was never observed
in practice, and the EADDRINUSE example in the original PR body was
pattern-matched from an unrelated @php-wasm/node networking-proxy leak
that doesn't touch the CLI worker pool.

Per maintainer feedback (brandonpayton on WordPress#3494): worker crash
recovery is a separate concern worth its own PR backed by real crash
evidence, and 'fail startup altogether' may be the better answer for
the startup case anyway. This PR now only addresses the observed
pool-release-timing bug via __withInstance.

Also removes the pool-drain/capacity-zero bookkeeping that only
existed as a consequence of eviction: the removedInstances Set, the
release() early-discard path, the acquire() fast-fail on empty pool,
and the {resolve, reject} waiter shape (no caller rejects anymore).
The internal allInstances copy is also gone — nothing mutates the
caller's array now that eviction is out, so the defensive copy is
unnecessary.

Removes 3 eviction-related tests from the spec. The 4 __withInstance
tests (added in edbfee1) stay — they cover the streaming lifecycle
fix directly.
Author

chubes4 commented Apr 24, 2026

Thanks for the review, @brandonpayton!

You're right that the eviction was defensive programming. Admittedly, Claude picked up a TODO: comment in the existing code and ran with it, and it slipped past me. I've updated the PR to drop that, as there were no worker crashes during extensive stress testing.

The overall diff is much smaller now and I've hammered the site to ensure the fix still stands.

All tests are now passing 💯


Development

Successfully merging this pull request may close these issues.

Intermittent 500 errors: Object pool proxy releases worker before streamed response finishes
