[chore] execute integration tests in CI #3194

Open
Kielek wants to merge 24 commits into open-telemetry:main from Kielek:integrationtests

Member

Kielek commented Apr 2, 2026

Changes

[chore] execute integration tests in CI
Adjust tests to the new semantic conventions.
Relax some rules that were too strict. See #3194 (comment)
Add healthchecks. See #3194 (comment)
I doubt that we need to make any Helm changes.
Executed a couple of times locally. I hope there are no more flaky tests, but some may still occur because of overly strict rules used for categorization purposes.

Future considerations:
In the integration tests, reuse the Docker images already built in other steps. This should significantly reduce the job's run time.

Merge Requirements

For new feature contributions, please make sure you have completed the following
essential items:

  • [ ] CHANGELOG.md updated to document new feature additions
  • [ ] Appropriate documentation updates in the docs
  • [ ] Appropriate Helm chart updates in the helm-charts

Maintainers will not merge until the above have been completed. If you're unsure
which docs need to be changed, ping the
@open-telemetry/demo-approvers.

Kielek requested a review from a team as a code owner April 2, 2026 04:49
Kielek force-pushed the integrationtests branch from e88981f to d100516 April 2, 2026 05:36
Kielek marked this pull request as draft April 2, 2026 05:48
Kielek force-pushed the integrationtests branch 3 times, most recently from 3bbf5e0 to 4325057 April 2, 2026 09:18
Member Author

Kielek commented Apr 7, 2026

Some of the reasoning behind disabling span type checking:
image

github-actions bot added the helm-update-required label (Requires an update to the Helm chart when released) Apr 7, 2026
Kielek force-pushed the integrationtests branch from d3a7c6e to f798a01 April 7, 2026 10:00
Member Author

Kielek commented Apr 7, 2026

Healthchecks notes

Problem

The traceBasedTests container started executing tests as soon as Docker reported services as started, not ready. JVM-based services (ad, fraud-detection) take 15–30s to initialize, and other services had varying startup times. This caused intermittent test failures with empty "results": {} — the gRPC/HTTP trigger fired before the target service was accepting connections.

Solution

All changes are isolated to docker-compose-tests.yml; there are no changes to docker-compose.yml or any Dockerfile.

The file leverages Docker Compose's service merge behaviour to inject healthcheck blocks into the base services via the include directive, without touching the base file.

Healthchecks added

| Service | Image type | Check command |
| --- | --- | --- |
| ad | JVM / Alpine | bash /dev/tcp/localhost/9555 |
| cart | .NET / Alpine | nc -z localhost 7070 |
| currency | C++ / Alpine | nc -z localhost 7001 |
| email | Ruby / Alpine | ruby -e "require 'socket'; TCPSocket.new('localhost', 6060).close" |
| frontend | Next.js / distroless Node 24 | /nodejs/bin/node HTTP GET to http://frontend:8080/ |
| llm | Python / Alpine | python3 -c "import socket; socket.create_connection(('localhost',8000),2).close()" |
| payment | Node.js / distroless Node 22 | /nodejs/bin/node TCP connect to port 50051 |
| product-reviews | Python / Alpine | nc -z localhost 3551 |
| quote | PHP / Alpine | php -r "fsockopen('localhost', 8090) or die('fail');" |
| recommendation | Python / Alpine | nc -z localhost 9001 |
| postgresql | postgres:17 | pg_isready -U root |
| valkey-cart | Valkey / Alpine | valkey-cli ping |
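
The language-native checks above all reduce to "open a TCP connection and close it again". As a minimal sketch of that idea, here is the python3 one-liner from the llm row expanded into a small function. The names port_is_open and healthcheck_exit_code are hypothetical and are not part of the PR; the default host, port, and 2-second timeout mirror the llm row.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection completes the TCP handshake, so success means
        # a process is accepting connections on that port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def healthcheck_exit_code(host: str = "localhost", port: int = 8000) -> int:
    """Map the check to a process exit code: 0 for success, 1 for failure."""
    return 0 if port_is_open(host, port) else 1
```

A healthcheck command built on this would end with `sys.exit(healthcheck_exit_code())`, since Docker treats exit code 0 as healthy and 1 as unhealthy. Note that this only proves the port accepts connections, not that the service behind it is fully initialized.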

Wait-for sidecars (distroless services)

Three services use distroless images with no shell or network tools available, so healthchecks cannot be defined on the containers themselves. Instead, lightweight sidecar containers poll for readiness and exit with code 0 once the target responds, and traceBasedTests waits on them with condition: service_completed_successfully:

| Sidecar | Target | Image used |
| --- | --- | --- |
| checkout-ready | checkout:5050 (gRPC) | ghcr.io/grpc-ecosystem/grpc-health-probe:v0.4.39 |
| product-catalog-ready | product-catalog:3550 (gRPC) | ghcr.io/grpc-ecosystem/grpc-health-probe:v0.4.39 |
| shipping-ready | shipping:50050 (HTTP/TCP) | busybox:1.37 |
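
The "poll until ready, then exit 0" pattern used by these sidecars can be sketched as follows. This is only an illustration in Python, not what the PR ships: the actual sidecars use the grpc-health-probe and busybox images listed above, and the gRPC probes query the gRPC health service rather than testing a plain TCP connect. The function name wait_for_port and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, deadline_s: float = 60.0,
                  interval_s: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or deadline_s elapses."""
    give_up_at = time.monotonic() + deadline_s
    while True:
        try:
            # A completed handshake means the target is accepting connections.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Not ready yet (refused, timed out, or name not resolvable).
            pass
        if time.monotonic() >= give_up_at:
            return False
        time.sleep(interval_s)
```

A sidecar entrypoint built on this would call something like `sys.exit(0 if wait_for_port("shipping", 50050) else 1)`, so that a dependent service gated on service_completed_successfully starts only after the target accepts connections.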

Notable findings

  • Alpine busybox nc -z returns non-zero on ports that don't speak raw TCP (HTTP, Ruby Sinatra, Python Flask, PHP built-in server). Language-native socket checks (ruby/python3/php) are used for those services instead.
  • Distroless Node images (gcr.io/distroless/nodejs*) do not expose node on $PATH; the binary is at /nodejs/bin/node.
  • Next.js (frontend) binds to the container's eth0 interface, not loopback — 127.0.0.1 connections are refused. The healthcheck connects via the service's own DNS name (http://frontend:8080/).

Future

Some of the healthchecks could potentially be moved into the production code, but that is far beyond the scope of this PR.

Kielek marked this pull request as ready for review April 7, 2026 11:09
Contributor

puckpuck commented Apr 9, 2026

We removed tracetesting from CI because it is no longer actively maintained, because of the time it added to our CI pipeline, and because of the flakiness of the tests themselves.

We discussed this during the SIG call, and the call to action is to create a small, lightweight framework, owned by the Demo SIG, that tests for the existence of traces in Jaeger. We already have a proof of concept working locally.

Member Author

Kielek commented Apr 9, 2026

@puckpuck, I agree that dropping tracetests is a great direction. For now, there are no big issues with running them until a better solution is settled. The reason for enabling them as is:
There are tons of vulnerable dependencies in this repository. Merging updates (mostly from dependabot) without automated tests is time-consuming and error-prone.

I would treat it as an interim step only.

@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions bot added the Stale label Apr 17, 2026
Member Author

Kielek commented Apr 17, 2026

@puckpuck, do you think it is worth narrowing the scope of this PR to just fixing the integration tests?
Would that help you with rewriting them for the new solution?

Kielek removed the Stale label Apr 17, 2026
@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Labels

helm-update-required Requires an update to the Helm chart when released