[ENG-3964]: Set worker-thunderbolt to process pool#333

Open
fpopa wants to merge 1 commit into main from filip/eng-3964-thunderbolt-process-pool

@fpopa fpopa commented May 26, 2026

Summary

  • Flips worker-thunderbolt.temporal.workerPoolType from thread to process in charts/datafold/values.yaml, matching the other long-LLM workers (worker-compute, worker-highmem, worker-storage[-high], worker-monitors).
  • Bumps charts/datafold/Chart.yaml from 0.10.89 to 0.10.90.
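For reviewers, the values change above can be sketched as the following fragment. It assumes the dotted path worker-thunderbolt.temporal.workerPoolType maps directly onto nested keys in charts/datafold/values.yaml; any sibling keys under worker-thunderbolt are omitted here, not removed.

```yaml
# charts/datafold/values.yaml (fragment; nesting assumed from the dotted path)
worker-thunderbolt:
  temporal:
    # Previously "thread"; now matches the other long-LLM workers.
    workerPoolType: process
```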

Why

ENG-3935 surfaced heartbeat timeouts on worker-thunderbolt under heavy concurrent agent dispatches: in thread pool mode, GIL contention starves the heartbeat daemon thread. The sibling long-LLM workers all use the process pool, where the SDK's SharedHeartbeatSender routes heartbeats through a cross-process queue, decoupling them from any single process's GIL. worker-thunderbolt was inadvertently left on thread.

A live saas override (spec.components.worker-thunderbolt.rawValues.temporal.workerPoolType: "process") has been applied since 2026-05-20 with no regressions over the bake-in window.
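Expanded from the dotted path above, the override would look roughly like this. Only the keys named in the path are shown; the enclosing resource (apiVersion, kind, metadata) is not stated in this PR and is deliberately left out rather than guessed.

```yaml
# Per-cluster override fragment (nesting assumed from the dotted path)
spec:
  components:
    worker-thunderbolt:
      rawValues:
        temporal:
          # Redundant once the chart default is "process"
          workerPoolType: "process"
```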

Once this is merged and the operator picks it up, the per-cluster rawValues override on saas (and ecolab, if applied) can be dropped.

Ref: ENG-3964, ENG-3935, #331.

Test plan

  • Render the chart and confirm only the worker-thunderbolt pool type changes.
  • After merge + operator pickup on saas, verify no heartbeat timeouts and agent runs complete normally.
  • Drop the per-cluster rawValues.temporal.workerPoolType override on saas (and ecolab if applicable).

🤖 Generated with Claude Code

Brings worker-thunderbolt in line with the other long-LLM workers
(worker-compute/highmem/storage[-high]/monitors), which all run the
process pool to avoid GIL contention starving the heartbeat thread
under heavy concurrent agent dispatches. Saas has been running this
override since 2026-05-20 without regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fpopa fpopa requested review from dmigo and gtoonstra May 26, 2026 11:00
@fpopa fpopa enabled auto-merge (squash) May 26, 2026 11:00
Comment thread charts/datafold/values.yaml