[CLI] Add --workers=<n|auto> flag to configure worker thread count#3504
[CLI] Add --workers=<n|auto> flag to configure worker thread count#3504
Conversation
The worker count was previously hardcoded to 6, capping multi-client workloads (e2e suites, load testing) at six in-flight requests. The deprecated --experimental-multi-worker flag accepted a number but its value was silently discarded. Add a new --workers flag that accepts a positive integer or "auto" (= max(1, cpus-1)). The default is min(6, cpus-1), preserving today's behavior on typical dev machines and shrinking on small hosts. --experimental-multi-worker still parses but now emits a deprecation warning pointing at --workers.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new --workers=<n|auto> CLI flag to control the number of request-handling worker threads, restoring an “auto” mode and keeping the existing 6-worker default behavior on larger hosts.
Changes:
- Introduces
--workersparsing/validation in the CLI and a pureresolveWorkerCount()helper. - Adds unit/integration-style tests covering defaults,
auto, explicit counts, invalid values, and the deprecated flag warning. - Updates CLI documentation to describe
--workersand deprecate--experimental-multi-worker.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/playground/cli/src/run-cli.ts | Adds --workers option, deprecation warning, and resolveWorkerCount() implementation |
| packages/playground/cli/tests/run-cli.spec.ts | Adds coverage for worker resolution logic and CLI flag behavior |
| packages/playground/cli/README.md | Documents new --workers flag and deprecates --experimental-multi-worker |
| packages/docs/site/docs/developers/05-local-development/04-wp-playground-cli.md | Updates Docusaurus CLI reference with --workers and deprecation note |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Thanks @pento for this great pull request. There's still a test that won't stop failing but I am pretty sure this isn't related to this PR. So we're almost good! I discussed with @brandonpayton and we agreed on warning users on a minimal amount of workers in order for certain blueprints to work. Something like : const targetWorkerCount = resolveWorkerCount(args.workers);
if (targetWorkerCount < 6) {
logger.warn(
`Worker count is below the safe threshold (6). ` +
`This may increase the likelihood of deadlock` +
`due to workers blocking on file locks.`
);
}What do you think? |
|
Thanks for the feedback, @mho22! The failing test suite seemed to be the same as was failing in other PRs when I created this one, so I didn't investigate. I think a warning could be useful, though I'm leaning towards only showing it if the user has deliberately set the worker count to less than 6. If the count has been restricted by the number of cores available, there's not really much they can do, making it a warning that can't be actioned. |
Some blueprints deadlock when fewer than 6 workers are available, because workers block on file locks. Surface two distinct warnings so users can tell a config mistake from a hardware ceiling: - Explicit --workers=<n|auto> resolving below 6 points them at the flag they just set. - Default path reduced below 6 by cpus-1 names the CPU count so the lever is "bigger host," not "different flag."
|
#3494 also fixes: Automattic/studio#3116 |
|
@pento Briliant. Thanks for the contribution. |
## Motivation for the change, related issues Follow-up to #3504. `test-playground-cli (macos-latest)` started failing consistently after #3504 merged. Every failing run showed the same five timeouts in `packages/playground/cli/tests/file-locking.spec.ts`: - `PHP flock() > should grant multiple shared locks on a file` - `PHP flock() > should release a shared lock when its associated file descriptor is closed` - `PHP flock() > should release an exclusive lock when its associated file descriptor is closed` - `PHP flock() > should release a shared lock when the owning process exits` - `PHP flock() > should release an exclusive lock when the owning process exits` ### Root cause The `multi shared locks` test at `file-locking.spec.ts:1090` fires three concurrent PHP requests with `Promise.all`. Each script busy-waits on a coordination file that only the *next* script in the chain can write, so all three must run concurrently. Before #3504 the CLI hardcoded 6 workers, which was always enough. After #3504 the default dropped to `min(6, max(1, cpus - 1))`. GitHub's `macos-latest` runners report 3 CPUs, resolving to **2 workers** — so PHP3 is permanently queued and the test deadlocks. ### Why four unrelated tests also fail The `runCLI` server is created once in `beforeAll` and shared across every test in the `describe`. When the multi-shared test times out, its two workers are still spinning inside PHP-level `while (file_get_contents(...) \!== ...)` loops — vitest can abort the JS `await`, but there's no hook to unblock PHP. The four two-script tests that follow multi-shared in file order each do `Promise.all([fetchScript(php1), fetchScript(php2)])`, inherit a pool with zero free workers, and time out at 60 s each. Evidence this is the right model: - The two-script tests that run *before* multi-shared pass (deny-exclusive-with-shared, deny-shared-with-exclusive). The two-script tests *after* fail. Failures aren't interleaved among the earlier ones. - Every failure is a 60 s timeout, never an assertion error. Independent flock races would surface as `expect(lock_acquired).toBe(false)`-style failures. - A 200-run audit of `test-playground-cli (macos-latest)` across all branches shows no consistent failure pattern before #3504 merged. ## Implementation details Pass `workers: 3` to `runCLI()` in the suite's `beforeAll`. Three is the minimum the multi-shared test needs; anything lower re-introduces the cascade. Explicit integer values bypass the `min(6, cpus-1)` default clamp (confirmed by the `honors an explicit --workers=3` test at `run-cli.spec.ts:1886`), so the suite is now robust to host CPU count. No production code changes. Only the test file is touched. ## Testing Instructions (or ideally a Blueprint) ```bash # Passes on macOS-like hosts with 3 CPUs: npx nx test-playground-cli playground-cli --testFile=file-locking.spec.ts # To reproduce the original failure, temporarily change workers: 3 to workers: 2: # expect 5 timeouts in the PHP flock() describe. ``` On CI, the `test-playground-cli (macos-latest)` job should turn green. Ubuntu and Windows runners were already passing (their default worker count was already ≥3) and should remain so. ## AI disclosure Per [WordPress AI Guidelines](https://make.wordpress.org/ai/handbook/ai-guidelines/): **AI assistance:** Yes **Tool:** Claude Code (Claude Opus 4.7) **Used for:** Root-cause analysis of the worker-starvation cascade, drafting the fix and this PR description. All code was reviewed and tested by the author.
Motivation for the change, related issues
The Playground CLI hardcodes the number of request-handling worker threads to 6 (chosen to match HTTP/1.1's per-origin browser connection limit). That's fine for a single browser, but it caps multi-client workloads — parallel Playwright contexts, load tests, fanning e2e suites — at six in-flight requests, which is too few on modern hosts.
The old
--experimental-multi-workerflag that let users override this was deprecated and its value is now silently ignored, and the pre-hardcoding default ofcpus - 1was dropped along with it. There's no way to turn the dial back up.Implementation details
Adds a new
--workers=<n|auto>flag:autousesmax(1, os.cpus().length - 1)— restores the old pre-hardcoding default.min(6, max(1, cpus - 1)): preserves today's 6-worker behavior on machines with ≥7 cores, shrinks on small hosts so we don't spawn more workers than cores. TheMath.max(1, ...)guard covers single-core hosts and restricted environments whereos.cpus()returns an empty array.coerce: rejects 0, negatives, non-integers, and any non-autostring with a clear error message, routed through the existing.fail()handler.The resolver is a small exported pure function (
resolveWorkerCount) so the min/max/cpu logic is unit-testable without spinning up a server.--experimental-multi-workerstays parsed-but-ignored — existing invocations don't break — but now emits a one-linelogger.warnpointing users at--workers.Docs updated in both
packages/playground/cli/README.mdand the Docusaurus CLI reference page.Breaking changes
None. The default worker count is unchanged on typical dev machines (≥7 cores → still 6). On hosts with 1–6 cores the default drops to
cpus - 1, which is a behavior change on small hosts but strictly safer (we were previously spawning more workers than cores there).Testing Instructions (or ideally a Blueprint)
Unit tests (fast, no server spawn):
npx nx run playground-cli:test-playground-cli --testNamePattern="resolveWorkerCount|worker count"All 14 new tests should pass: default, explicit number,
--workers=1(single-worker bootstrap),--workers=auto, invalid0, invalidabc, deprecation warning for--experimental-multi-worker, plus 7 unit tests stubbingos.cpus()to cover the min/max/empty-array edges.Manual smoke:
Multi-client verification (the motivating use case):