[CLI] Pin file-locking test suite to 3 workers#3521
Conversation
The `multi shared locks` test at file-locking.spec.ts:1090 fires three concurrent PHP requests with `Promise.all`, each busy-waiting on a coordination file that only the next script in the chain can write. It therefore needs three PHP workers running concurrently. Before #3504 the CLI hardcoded 6 workers, which was always enough. After #3504 the default dropped to `min(6, max(1, cpus - 1))`. On `macos-latest` GitHub runners (3 CPUs) that resolves to 2 workers, leaving PHP3 permanently queued and deadlocking the test. The cascade is worse than a single timeout: the `runCLI` server is shared across the whole describe via `beforeAll`, and when the multi-shared test times out its two workers are still spinning inside PHP-level `while (file_get_contents(...) \!== ...)` loops — vitest can abort the JS `await`, but there's no hook to unblock PHP. The four two-script tests that follow in file order then inherit a pool with zero free workers and each times out at 60 s. Pin the suite to `workers: 3` so it's robust to CI host CPU counts. Three is the minimum the multi-shared test needs; anything lower re-introduces the cascade. Follow-up to #3504.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Pins the Playground CLI file-locking test suite’s CLI worker pool size to prevent worker starvation deadlocks on low-CPU CI runners (notably macos-latest).
Changes:
- Passes an explicit
workers: 3option to the sharedrunCLI()server created inbeforeAll. - Documents why fewer than 3 workers leads to deadlock and cascading timeouts in subsequent tests.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Independently reproduced and verified the fix locally. Came at this from the other direction (looking at the remaining intermittent failure on #3494) and landed on the same root cause and the same fix. Repro on macOS (M-series), same PR-branch-merged-with-trunk tree CI runs against:
The cascade you describe matches what I see locally too — the Confirms this is independent of #3494; #3494 fixes a different class of concurrency bug (early worker release + dead-worker eviction). 👍 on shipping this. |
|
While looking at the failing test on this PR ( |
|
@mho22 No problem! Thanks for your work |
Motivation for the change, related issues
Follow-up to #3504.
test-playground-cli (macos-latest)started failing consistently after #3504 merged. Every failing run showed the same five timeouts inpackages/playground/cli/tests/file-locking.spec.ts:PHP flock() > should grant multiple shared locks on a filePHP flock() > should release a shared lock when its associated file descriptor is closedPHP flock() > should release an exclusive lock when its associated file descriptor is closedPHP flock() > should release a shared lock when the owning process exitsPHP flock() > should release an exclusive lock when the owning process exitsRoot cause
The
multi shared lockstest atfile-locking.spec.ts:1090fires three concurrent PHP requests withPromise.all. Each script busy-waits on a coordination file that only the next script in the chain can write, so all three must run concurrently.Before #3504 the CLI hardcoded 6 workers, which was always enough. After #3504 the default dropped to
min(6, max(1, cpus - 1)). GitHub'smacos-latestrunners report 3 CPUs, resolving to 2 workers — so PHP3 is permanently queued and the test deadlocks.Why four unrelated tests also fail
The
runCLIserver is created once inbeforeAlland shared across every test in thedescribe. When the multi-shared test times out, its two workers are still spinning inside PHP-levelwhile (file_get_contents(...) \!== ...)loops — vitest can abort the JSawait, but there's no hook to unblock PHP. The four two-script tests that follow multi-shared in file order each doPromise.all([fetchScript(php1), fetchScript(php2)]), inherit a pool with zero free workers, and time out at 60 s each.Evidence this is the right model:
expect(lock_acquired).toBe(false)-style failures.test-playground-cli (macos-latest)across all branches shows no consistent failure pattern before [CLI] Add --workers=<n|auto> flag to configure worker thread count #3504 merged.Implementation details
Pass
workers: 3torunCLI()in the suite'sbeforeAll. Three is the minimum the multi-shared test needs; anything lower re-introduces the cascade. Explicit integer values bypass themin(6, cpus-1)default clamp (confirmed by thehonors an explicit --workers=3test atrun-cli.spec.ts:1886), so the suite is now robust to host CPU count.No production code changes. Only the test file is touched.
Testing Instructions (or ideally a Blueprint)
On CI, the
test-playground-cli (macos-latest)job should turn green. Ubuntu and Windows runners were already passing (their default worker count was already ≥3) and should remain so.AI disclosure
Per WordPress AI Guidelines:
AI assistance: Yes
Tool: Claude Code (Claude Opus 4.7)
Used for: Root-cause analysis of the worker-starvation cascade, drafting the fix and this PR description. All code was reviewed and tested by the author.