chore(repo): parallelize e2e-ci tasks on large/xlarge agents #35325
FrozenPandaz merged 53 commits into master
The reservePorts/updateJson/CYPRESS_BASE_URL approach turned out to be chasing the wrong root cause and caused the dev-server to exit with code 1 mid-startup. Restore master's simpler form (hostPort=4200) and rely on the existing e2e-angular parallelism=1 workflow rule to avoid on-agent port collisions instead.
The angular host generator hardcoded port 4200 in three places and
add-e2e.ts hardcoded ciBaseUrl='http://localhost:4200'. That made it
impossible for parallel e2e tests to run safely on the same agent —
two hosts both fight for 4200.
- Add 'port' to host schema (mirrors application's existing port
option), threaded through applicationGenerator, setupMf, the rspack
remote port computation, and the final serve.options.port write.
- Fix add-e2e.ts so e2eCiBaseUrl uses e2ePort instead of a literal
4200, matching e2eWebServerAddress.
- Drop the e2e-angular parallelism=1 workflow rule we added to
paper over the collision; angular e2e now runs at the default 2/4
parallelism alongside other projects.
- module-federation-lib.test.ts reserves a port and passes
--port=${hostPort} to the host generator.
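The add-e2e.ts fix above amounts to deriving both URLs from one port value instead of mixing a variable with a literal 4200. A minimal sketch of that idea follows; `e2eUrls` and the `E2eUrls` shape are hypothetical names used only for illustration and are not taken from the generator source.

```typescript
// Hypothetical shape: the real generator stores these values in its own options.
interface E2eUrls {
  webServerAddress: string;
  ciBaseUrl: string;
}

// Derive both addresses from the same port so they cannot drift apart.
// 4200 is kept as the default to mirror the framework default in the text.
export function e2eUrls(port: number = 4200): E2eUrls {
  const address = `http://localhost:${port}`;
  return { webServerAddress: address, ciBaseUrl: address };
}
```

With a reserved port passed in, the CI base URL follows the web server address automatically rather than pointing at 4200.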
…yability rspack test
The third 'it' block ('should support host and remote with library type
var') uses runCommandUntil without an explicit timeout, falling back to
the 30s default. Under e2e-ci parallelism=2/4 this is too tight for an
rspack module-federation cypress run and the test times out. Match the
120s timeout the angular module-federation tests already use.
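To illustrate why the default matters, here is a rough sketch of a wait-for-output helper with a timeout. It is not the actual `runCommandUntil` implementation: `waitForOutput` is a hypothetical name, and an `EventEmitter` emitting `"data"` events stands in for a child process's output stream.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for runCommandUntil: resolves once the accumulated
// output satisfies `isReady`, rejects if `timeoutMs` elapses first.
export function waitForOutput(
  source: EventEmitter,
  isReady: (output: string) => boolean,
  timeoutMs: number = 30_000 // mirrors the 30s default mentioned above
): Promise<string> {
  return new Promise((resolve, reject) => {
    let output = "";

    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`Timed out after ${timeoutMs}ms waiting for output`));
    }, timeoutMs);

    function cleanup(): void {
      clearTimeout(timer);
      source.off("data", onData);
    }

    function onData(chunk: unknown): void {
      output += String(chunk);
      if (isReady(output)) {
        cleanup();
        resolve(output);
      }
    }

    source.on("data", onData);
  });
}
```

A slow but healthy serve fails in exactly the same way as a broken one when the timeout is too short, which is why passing an explicit 120s value is the smaller fix here.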
…operability
Same flake pattern as the independent-deployability bump: runCommandUntil defaults to 30s, which is too tight for an rspack/webpack module-federation cypress run under parallelism=2/4. Bump to 120s, matching other MF tests.
The reservePort() calls were never threaded into the host generator calls. Without --port, the generator defaults host serve to 4200, which collides with other parallel tests using port 4200.
Move reservePort default start from 4200 to 6100 so reserved ports never collide with parallel tests that generate apps using the framework default 4200 (e.g. @nx/react:app, @nx/angular:app without --devServerPort). Also pin a port in workspace-legacy.test.ts which was generating a react app without an explicit port and racing on 4200.
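A minimal sketch of lock-file-backed port reservation starting at 6100 is shown below. It is an assumption-laden illustration, not the code in `e2e/utils/port-utils.ts`: `claimPort`, `releasePort`, the lock directory name, and the upper bound 6999 are all hypothetical. The sketch only arbitrates between cooperating processes; it does not check whether the OS port is actually free, and it does not clean up stale locks.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical lock directory for this sketch only.
const LOCK_DIR = path.join(os.tmpdir(), "example-port-locks");

// Claim the first port in [start, end] whose lock file does not exist yet.
// The "wx" flag maps to O_CREAT | O_EXCL, so creation fails with EEXIST if
// another process created the same file first.
export function claimPort(
  start: number = 6100,
  end: number = 6999,
  lockDir: string = LOCK_DIR
): number {
  fs.mkdirSync(lockDir, { recursive: true });
  for (let port = start; port <= end; port++) {
    try {
      const fd = fs.openSync(path.join(lockDir, `${port}.lock`), "wx");
      fs.writeSync(fd, String(process.pid)); // record the owner for debugging
      fs.closeSync(fd);
      return port;
    } catch (err) {
      // EEXIST means another claimant holds this port; anything else is a real error.
      if ((err as { code?: string }).code !== "EEXIST") throw err;
    }
  }
  throw new Error(`No unclaimed port in ${start}-${end}`);
}

// Release a claimed port by deleting its lock file.
export function releasePort(port: number, lockDir: string = LOCK_DIR): void {
  fs.rmSync(path.join(lockDir, `${port}.lock`), { force: true });
}
```

Starting the scan above 4200 keeps reserved ports away from apps that fall back to the framework default, which never consult the lock directory.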
core-webpack-ssr.test.ts: SSR `server` target binds its own socket via
`@nx/{webpack,rspack}:ssr-dev-server` and defaults to 4200. Pin it
alongside `serve` so parallel tests don't collide.
cache-no-daemon.test.ts: remote-cache.js fixture listens on $PORT (defaults
to 3000). Reserve a unique port and pass via fork env so multiple parallel
agents on the same host don't fight for 3000.
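Passing the reserved port through the child's environment can be sketched as follows. This is illustrative only: the inline script is a hypothetical stand-in for the remote-cache.js fixture (it just echoes the port it would listen on), and `spawnSync` replaces `fork` to keep the example synchronous.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical stand-in for the fixture: report the port a server would bind,
// falling back to 3000 when PORT is unset.
const FIXTURE_SCRIPT = "console.log(process.env.PORT || 3000)";

// Run the stand-in fixture with PORT overridden and return what it printed.
export function portSeenByChild(port: number): string {
  const child = spawnSync(process.execPath, ["-e", FIXTURE_SCRIPT], {
    // Spread the parent env so PATH etc. survive, then override PORT.
    env: { ...process.env, PORT: String(port) },
    encoding: "utf8",
  });
  return child.stdout.trim();
}
```

The same `env` option shape is accepted by `fork`, so the override carries over unchanged to a forked fixture.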
…dle-project
The setup phase ran 'gradlew help --task :init' and 'gradlew init' with the default daemon mode, leaving a long-lived gradle daemon holding inotify watches on the project dir. Later 'nx import' steps run 'git filter-branch --tree-filter' on the same tree, and the daemon's concurrent filesystem activity made git fail with 'Unable to read current working directory'. Adding --no-daemon to setup-time gradle commands keeps the daemon-isolation property of the parallel-e2e setup intact (we can't 'gradlew --stop' globally without killing other concurrent tests' daemons) while preventing setup-spawned daemons from outliving setup. Test-body gradle calls still use the daemon for speed.
…dle-project [Self-Healing CI Rerun]
…e 'listening on'
The runCommandUntil match string in module-federation-host-remote.test.ts waited for 'listening on localhost:$port', but the current nx serve output is 'All remotes started, server ready at http://localhost:$port'. The serve actually succeeds, but the test never recognises that and times out on the wrong matcher.
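A readiness predicate for the current serve output could look like the sketch below. `isHostReady` is a hypothetical helper name (the test itself passes an inline predicate), and the negative lookahead is an extra precaution, not something the commit describes: it stops port 6100 from matching a line that actually ends in 61000.

```typescript
// Hypothetical predicate: true once the serve output reports the host as
// ready on exactly the given port.
export function isHostReady(output: string, port: number): boolean {
  const ready = new RegExp(
    `All remotes started, server ready at http://localhost:${port}(?!\\d)`
  );
  return ready.test(output);
}
```

Keeping the matcher in one place makes it cheaper to update the next time the serve message changes.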
✅ The fix from Nx Cloud was applied
We introduced a separate reservePort() call for the esbuild standalone app serve so each serve command holds its own atomic lock-file-backed port, rather than reusing the already-released appPort. This prevents a parallel e2e task on the same agent (enabled by this PR's increased parallelism) from binding to the freed OS port before the second serve starts, which was causing the "Port already in use" failure.
Tip
✅ We verified this fix by re-running e2e-angular:e2e-ci--src/projects-build-and-test.test.ts.
Suggested Fix changes
diff --git a/e2e/angular/src/projects-build-and-test.test.ts b/e2e/angular/src/projects-build-and-test.test.ts
index 6e1b0edb..d61a092d 100644
--- a/e2e/angular/src/projects-build-and-test.test.ts
+++ b/e2e/angular/src/projects-build-and-test.test.ts
@@ -105,15 +105,16 @@ describe('Angular Projects - Build and Test', () => {
// port and process cleanup
await killProcessAndPorts(process.pid, appPort);
+ const esbuildPort = await reservePort();
const esbProcess = await runCommandUntil(
- `serve ${esbuildStandaloneApp} -- --port=${appPort}`,
+ `serve ${esbuildStandaloneApp} -- --port=${esbuildPort}`,
(output) =>
output.includes(`Application bundle generation complete`) &&
- output.includes(`localhost:${appPort}`)
+ output.includes(`localhost:${esbuildPort}`)
);
// port and process cleanup
- await killProcessAndPorts(esbProcess.pid, appPort);
+ await killProcessAndPorts(esbProcess.pid, esbuildPort);
}, 1000000);
it('should successfully work with rspack for build', async () => {
Co-authored-by: FrozenPandaz <FrozenPandaz@users.noreply.github.com>
This reverts commit 573559a.
Current Behavior
Many e2e-ci projects are pinned to serial execution on each CI agent via project-specific overrides in `.nx/workflows/dynamic-changesets.yaml`:
- `e2e-gradle`, `e2e-angular`, `e2e-node`, `e2e-react` → 1 task per `linux-extra-large`
- `e2e-next`, `e2e-plugin` → 2 tasks per `linux-extra-large`
- `e2e-release`, `e2e-nuxt`, `e2e-web`, `e2e-eslint`, `e2e-remix`, `e2e-cypress`, `e2e-docker`, `e2e-js`, `e2e-nx`, `e2e-nx-init`, `e2e-dotnet`, `e2e-workspace-create`, `e2e-rollup` → 1 on large / 2 on xlarge
These overrides exist because the underlying tests collide on hardcoded ports when run concurrently on the same agent.
Expected Behavior
A single rule lets every `e2e-ci**` task run with parallelism 2 on `linux-large` and 3 on `linux-extra-large`, shrinking total agent time and removing per-project special cases.
To make that safe, this PR:
- Adds `reservePort()`/`reservePorts()` to `e2e/utils/port-utils.ts`. The helper claims a port via an atomic `O_EXCL` lock file under `/tmp/nx-e2e-port-locks`, so two parallel processes on the same agent cannot reserve the same port. The existing `getAvailablePort()` is deprecated: probing port 0 and then binding it seconds later opens a TOCTOU race where another e2e task could grab the same port in between.
- Migrates hardcoded ports to `reservePort()`:
  - `e2e/vite/src/vite.test.ts`: `serve-static` test (was 8081)
  - `e2e/web/src/web-webpack.test.ts`: webpack ssl serve (was 5000)
  - `e2e/storybook/src/storybook-angular.test.ts` and `storybook-nested.test.ts` (was 4400)
  - `e2e/node/src/node-server.test.ts`: express/fastify/koa/nest framework tests (was 7000–7003) and waitUntilTargets test (was 4444/4445)
  - `e2e/node/src/node-esm-support.test.ts`: 9 tests previously defaulting to port 3000
`e2e/cypress`, `e2e/playwright`, and the React module-federation tests still hardcode ports through their generators; those will be addressed in follow-up PRs as CI surfaces collisions.
Related Issue(s)
N/A — internal CI optimization.