feat(testing): add upgrade version integration test workflow#4702

Open
Xeoneid wants to merge 7 commits into Dokploy:canary from Xeoneid:canary

@Xeoneid Xeoneid commented Jun 26, 2026

Summary

Recent releases have surfaced upgrade regressions — services broken, logs lost, containers failing to start (see #4049, #4245, #4367). This PR adds a manually-triggered GitHub Actions workflow that validates Dokploy's upgrade path across multiple version pairs, catching these regressions before they reach users.

What the workflow does

  1. Generates an upgrade pair matrix from Docker Hub tags: every stable tag in [floor_version, target_version) is paired with target_version, which defaults to the highest available tag when left empty. Each pair becomes an independent matrix job.
  2. Installs Dokploy at version A via the official install.sh using the DOKPLOY_VERSION env var — no "install latest then downgrade" step, which would risk running version B's migrations against version A's schema.
  3. Creates test resources via the tRPC API: a project, PostgreSQL 15, MongoDB 7.0, and three web apps (nginx, echo-server, traefik/whoami).
  4. Asserts pre-upgrade health: every Swarm service must report 1/1 replicas and every resource must have status done.
  5. Upgrades to version B via docker service update --image.
  6. Asserts post-upgrade health — same checks, ensuring no service disruption or data loss.
  7. Dumps diagnostics (service list, logs, signup response) on any failure.
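
The pairing rule in step 1 can be sketched as follows. This is a minimal illustration, not the workflow's actual script: it assumes stable tags have the form `vMAJOR.MINOR.PATCH`, that the tag list has already been fetched from Docker Hub, and the names `parse_version` and `upgrade_pairs` are hypothetical.

```python
import re

# Assumption: a "stable" tag looks like v0.29.4; tags such as "latest",
# "canary", or pre-release suffixes are ignored.
STABLE_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag):
    """Return (major, minor, patch) for a stable tag, or None otherwise."""
    match = STABLE_TAG.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def upgrade_pairs(tags, floor_version, target_version=None):
    """Pair every stable tag in [floor_version, target_version) with the target."""
    versions = {}
    for tag in tags:
        parsed = parse_version(tag)
        if parsed is not None:
            versions[tag] = parsed
    if not versions:
        return []

    # An empty target means "highest available stable tag".
    target = target_version or max(versions, key=versions.get)
    floor_key = parse_version(floor_version)
    target_key = parse_version(target)
    if floor_key is None or target_key is None:
        raise ValueError("floor_version and target_version must be stable tags")

    # Half-open interval: the floor is included, the target itself is not.
    sources = sorted(
        (tag for tag, key in versions.items() if floor_key <= key < target_key),
        key=versions.get,
    )
    return [{"from": tag, "to": target} for tag in sources]
```

Versions are compared as integer tuples rather than strings, so a tag like `v0.9.0` sorts before `v0.10.0`. The returned list of `{"from", "to"}` dictionaries can be serialized to JSON and handed to a matrix job.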

Design decisions

  • Consecutive pairs (v[i] → v[i+1]) instead of "every version → latest" — mirrors real-world sequential upgrades and keeps the matrix bounded.
  • fail-fast: false — all pairs run independently so failures are isolated to specific version transitions.
  • workflow_dispatch only — manual trigger to avoid burning Actions minutes on every push.
  • Test credentials (CiTest1234!, CiPg1pass, CiMg1pass) are intentional throwaway values on ephemeral CI runners only.

Test plan

  • Trigger via Actions → Upgrade Integration Test → Run workflow
  • Verify the matrix generates expected pairs for the given floor_version / target_version
  • Confirm pre-upgrade health checks pass for at least one pair
  • Confirm post-upgrade health checks pass — all services 1/1, all statuses done
  • Verify the "Dump state on failure" step produces useful diagnostics on a forced failure
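
For reviewers who want to reproduce the "all services 1/1" check by hand, a rough sketch is shown below. It assumes the check reads `docker service ls --format '{{.Name}} {{.Replicas}}'` on the Swarm manager; the function names are hypothetical and the workflow itself may implement the check differently.

```python
import subprocess


def unhealthy_services(listing, expected="1/1"):
    """Return (name, replicas) for each service whose replica count is not `expected`."""
    failures = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        name, replicas = parts[0], parts[1]
        if replicas != expected:
            failures.append((name, replicas))
    return failures


def list_services():
    """Ask the local Swarm manager for one 'name replicas' line per service."""
    result = subprocess.run(
        ["docker", "service", "ls", "--format", "{{.Name}} {{.Replicas}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

On a runner with Docker available, `unhealthy_services(list_services())` returning an empty list corresponds to a passing replica check. The resource status check (all statuses done) goes through the Dokploy API and is not covered by this sketch.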

Related

Upgrade regressions that motivated this workflow: #4049, #4245, #4367.

Xeoneid and others added 7 commits June 26, 2026 13:11
…ia DOKPLOY_VERSION

Previous run failed with 'This script must be run as root'. install.sh also
respects DOKPLOY_VERSION and ADVERTISE_ADDR env vars, so we can install
version A directly instead of installing latest and downgrading (which
would risk B's migrations running on A's expected schema).
…nse formats

In Dokploy <= ~v0.24, project.create returns a flat object with just
projectId at the top level; no environment concept exists yet, so
resources (postgres, mongo, apps) are created with projectId.

In Dokploy >= ~v0.25, project.create returns nested {project, environment}
objects and resources require environmentId.

Previous workflow always used .project.projectId and .environment.environmentId,
both of which evaluated to null on old versions, causing curl to exit with
code 22 (HTTP error) on every subsequent resource-create call.

Fix: extract PROJECT_ID with a // fallback, check ENV_ID nullness, and
build a SCOPE fragment (either 'environmentId' or 'projectId') used in
all resource-creation tRPC calls.
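
The fallback described in this commit message could look roughly like the sketch below. It is written in Python for illustration only; the workflow itself appears to use jq-style `//` fallbacks in shell. The function name `resolve_scope` is hypothetical, and the input is assumed to be the `project.create` payload after any tRPC envelope has been unwrapped.

```python
def resolve_scope(response):
    """Return (project_id, scope) for either project.create response shape."""
    project = response.get("project") or {}
    environment = response.get("environment") or {}

    # Nested shape first, then the flat top-level projectId (like jq's `//`).
    project_id = project.get("projectId") or response.get("projectId")
    if not project_id:
        raise ValueError("project.create response has no projectId")

    # Newer versions scope resources by environment, older ones by project.
    env_id = environment.get("environmentId")
    if env_id:
        scope = {"environmentId": env_id}
    else:
        scope = {"projectId": project_id}
    return project_id, scope
```

The returned `scope` dictionary would then be merged into the input of each resource-creation call, so the same code path serves both response formats.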
…reate

v0.20.x used a completely different postgres/mongo/app API schema —
projectId fallback returned boolean false, not the expected resource object.
Since the goal is testing v0.29.4+, drop the old-API compatibility shim
and fail fast with a clear message if environmentId is missing.

Also: default floor_version changed from v0.20.0 to v0.29.4 to reduce
the number of matrix jobs and focus on the stable modern API surface.
The create step was calling postgres.start / mongo.start immediately after
create. In Dokploy, .start only scales an already-deployed Swarm service;
on a freshly created resource the service doesn't exist yet, so the server
ran 'docker service scale ci-pg-db-xxx=1' against a missing service and
returned a 500 (curl -sf → exit 22 → step failure).

.deploy is the mutation that actually builds and creates the Swarm service.
Applications already used .deploy correctly; this aligns the databases.
- Replace "every version → latest" with consecutive pairs (v[i] → v[i+1])
- Remove curated v0.25→v0.26 pair logic
- Remove extra_pairs workflow_dispatch input
- Replace second form field (extra_pairs) with target_version
- target_version sets version B ("to"); empty = highest available tag
- Pair every tag in [floor_version, target_version) with target_version
@Xeoneid Xeoneid marked this pull request as ready for review June 26, 2026 22:35
@Xeoneid Xeoneid requested a review from Siumauricio as a code owner June 26, 2026 22:35
@Xeoneid Xeoneid changed the title Add upgrade integration test workflow feat(testing): add upgrade version integration test workflow Jun 26, 2026
@dosubot dosubot Bot added the size:XS (This PR changes 0-9 lines, ignoring generated files.) and pr-open labels Jun 26, 2026