Add per-LitAPI health check endpoints #712

Open
discobot wants to merge 1 commit into Lightning-AI:main from discobot:fix/605-per-api-health

@discobot

What does this PR do?

Fixes #605.

Each LitAPI now exposes its own health check at {api_path}{healthcheck_path} — e.g. /api1/health, /api2/health (and /api1/my_server/health with a custom healthcheck_path). The per-API check reports worker readiness for that API's workers only, plus its user-defined health() hook, returning the same 200 "ok" / 503 "not ready" contract as the global endpoint. Auth matches the API's predict route (setup_auth(lit_api)).
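
To make the route shape and the response contract concrete, here is a minimal sketch in plain Python. It does not use the server's code: `per_api_health_path` and `health_response` are hypothetical names, and the plain string concatenation is an assumption read off the `{api_path}{healthcheck_path}` pattern above.

```python
from typing import Tuple


def per_api_health_path(api_path: str, healthcheck_path: str = "/health") -> str:
    """Build the per-API route as {api_path}{healthcheck_path} (assumed plain concatenation)."""
    return f"{api_path}{healthcheck_path}"


def health_response(workers_ready: bool, hook_ok: bool) -> Tuple[int, str]:
    """Map the two health signals onto the 200 "ok" / 503 "not ready" contract."""
    if workers_ready and hook_ok:
        return 200, "ok"
    return 503, "not ready"
```

Under these assumptions, `per_api_health_path("/api1", "/my_server/health")` yields `/api1/my_server/health`, matching the custom `healthcheck_path` example above.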

The per-API logic lives in a single helper, LitServer._check_lit_api_health, and the global healthcheck endpoint now aggregates it across all APIs, so the two can't drift. Worker keys are matched by exact {endpoint}_{worker_id} parsing rather than a prefix check so an API named predict doesn't pick up predict_v2's workers. Global endpoint behavior is unchanged, including 503 on empty workers_setup_status and calling the health() hook even before workers are ready.
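
The difference between exact key parsing and a prefix check can be sketched as follows. This is an illustration, not the code in `LitServer._check_lit_api_health`: the function names, the `"ready"` status string, the numeric worker id, and treating an API with no matching workers as not ready are all assumptions.

```python
from typing import Callable, Dict

READY = "ready"  # stand-in for the server's real "worker is ready" status value


def workers_for(endpoint: str, status: Dict[str, str]) -> Dict[str, str]:
    """Keep only workers whose key is exactly `{endpoint}_{worker_id}`."""
    selected = {}
    for key, state in status.items():
        # Split on the LAST underscore so "predict_v2_0" parses as ("predict_v2", "0").
        name, sep, worker_id = key.rpartition("_")
        if sep and name == endpoint and worker_id.isdigit():
            selected[key] = state
    return selected


def check_api_health(endpoint: str, status: Dict[str, str],
                     health_hook: Callable[[], bool]) -> bool:
    """Per-API check: this API's workers plus its user-defined hook."""
    hook_ok = health_hook()  # called even when workers are not ready yet
    own = workers_for(endpoint, status)
    # Assumption: an API with no registered workers counts as not ready.
    workers_ready = bool(own) and all(state == READY for state in own.values())
    return workers_ready and hook_ok


def check_global_health(apis: Dict[str, Callable[[], bool]],
                        status: Dict[str, str]) -> bool:
    """Global check: aggregate the per-API helper across every API."""
    if not status:
        return False  # empty workers_setup_status maps to 503
    results = [check_api_health(ep, status, hook) for ep, hook in apis.items()]
    return all(results)
```

With `{"predict_0": "ready", "predict_v2_0": "starting"}`, a prefix check such as `key.startswith("predict")` would attribute both workers to `predict`; the exact parse attributes only `predict_0`, so `predict` reports healthy while `predict_v2` and the global check do not.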

Two existing tests stubbed workers_setup_status with a synthetic "worker-0" key, a format launch_inference_worker never produces; they now use the real "predict_0" format. New tests cover a two-API server (healthy API 200, unhealthy API 503, global 503) and the custom healthcheck_path + custom api_path combination, parametrized over zmq like the neighboring health tests.

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

As a user running multiple LitAPIs behind one server, I can now ask "is this specific API healthy?" by hitting /api1/health instead of only the aggregate /health, which masks which API is degraded. This makes per-API liveness/readiness probes (Kubernetes, load balancers) and targeted alerting possible without custom workaround endpoints.
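
A monitoring script along these lines could use the new endpoints to report which API is degraded. It is a sketch under assumptions: `fetch_status` and `probe_apis` are hypothetical helpers, the base URL and API paths in the usage note are placeholders, and an API with auth enabled would also need its credentials sent with the request.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for `url`, or 0 if the server is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # non-2xx answers such as 503 are raised as HTTPError
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def probe_apis(base_url: str, api_paths: Iterable[str],
               healthcheck_path: str = "/health",
               fetch: Callable[[str], int] = fetch_status) -> Dict[str, bool]:
    """Map each API path to True if its own health endpoint answers 200."""
    return {
        path: fetch(f"{base_url}{path}{healthcheck_path}") == 200
        for path in api_paths
    }
```

For example, `probe_apis("http://localhost:8000", ["/api1", "/api2"])` would return one boolean per API, which a readiness probe or alert rule can act on individually.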

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Yes 🙃


🤖 Implemented with the help of Claude Code.

Each LitAPI now exposes its own health check at {api_path}{healthcheck_path}, e.g. /predict/health, reporting worker readiness and the user-defined health() hook for that API only. The global healthcheck endpoint reuses the same per-API helper and aggregates across all APIs, so its behavior is unchanged. Two existing tests used a synthetic worker status key format the server never produces; they now use the real {endpoint}_{worker_id} format. Closes Lightning-AI#605.


Development

Successfully merging this pull request may close these issues.

Access Health check endpoint for each LitAPI independently

1 participant