[ENG-3846]: Add liveness probe for temporal workers #329

Open
gtoonstra wants to merge 1 commit into main from gerard-eng-3846-improve-k8s-broken-pool-behavior

Conversation

@gtoonstra
Collaborator

Summary

  • Adds a liveness section to worker-temporal/values.yaml (enabled by default, port 8091, 60s initial delay, 20s period, 5s timeout, 3 failures before restart)
  • Sets TEMPORAL_LIVENESS_PORT env var so the worker starts the /livez HTTP server
  • Exposes port 8091 as a named liveness container port
  • Wires up a livenessProbe using httpGet to /livez — K8s will restart the pod when the worker reports unhealthy (worker stopped, fatal error, or broken process pool)
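
For reviewers, the rendered container spec implied by the bullets above might look roughly like the fragment below. This is a sketch, not the chart's actual template output: the probe and port field names are standard Kubernetes, the numbers come from the summary, but the container name and the choice to reference the port by name rather than by number are assumptions.

```yaml
# Illustrative fragment of a rendered worker container spec.
# Values (8091, 60/20/5/3, /livez) are taken from the PR summary.
containers:
  - name: worker-temporal          # hypothetical container name
    env:
      - name: TEMPORAL_LIVENESS_PORT
        value: "8091"              # tells the worker to start the /livez HTTP server
    ports:
      - name: liveness             # named container port from the summary
        containerPort: 8091
        protocol: TCP
    livenessProbe:
      httpGet:
        path: /livez
        port: liveness             # assumption: referenced by name; 8091 would also work
      initialDelaySeconds: 60      # wait before the first probe
      periodSeconds: 20            # probe interval
      timeoutSeconds: 5            # per-probe timeout
      failureThreshold: 3          # consecutive failures before the container is restarted
```

With these settings, a worker that starts reporting unhealthy would be restarted after three consecutive failed probes, which is on the order of a minute at a 20s period.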

Depends on datafold/datafold#12364.

Test plan

  • Deploy to staging and confirm livenessProbe shows Success in pod events
  • Confirm workers restart when the process pool is broken (per the test steps in the linked PR)
  • Verify existing metrics port is unaffected when both metrics.enabled and liveness.enabled are true

🤖 Generated with Claude Code

@github-actions

🔍 Kubeconform Validation Results

All cloud provider configurations passed Kubernetes API schema validation!

| Cloud Provider | Status |
|---|---|
| AWS | ✅ Passed |
| GCP | ✅ Passed |
| Azure | ✅ Passed |

The rendered Kubernetes manifests conform to the Kubernetes API specification across all cloud providers.

Wires up the /livez HTTP endpoint (datafold/datafold#12364) into the
worker-temporal Helm chart: exposes port 8091, sets TEMPORAL_LIVENESS_PORT,
and configures a K8s livenessProbe that restarts the pod when any worker
stops running, a fatal error occurs, or the process pool breaks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@gtoonstra gtoonstra force-pushed the gerard-eng-3846-improve-k8s-broken-pool-behavior branch from 9620731 to 5a50eac Compare May 13, 2026 18:53
@github-actions

🔍 Kubeconform Validation Results

All cloud provider configurations passed Kubernetes API schema validation!

| Cloud Provider | Status |
|---|---|
| AWS | ✅ Passed |
| GCP | ✅ Passed |
| Azure | ✅ Passed |

The rendered Kubernetes manifests conform to the Kubernetes API specification across all cloud providers.

@gtoonstra gtoonstra changed the title [ENG-3846]: Add liveness probe for temporal workers [ENG-3903]: Verify Slack connection before workspace scan May 18, 2026
@gtoonstra gtoonstra changed the title [ENG-3903]: Verify Slack connection before workspace scan [ENG-3846]: Add liveness probe for temporal workers May 18, 2026