Tune cluster liveness polling cadence #2035

Open
AlexCheema wants to merge 1 commit into codex/event-router-nack-tuning from codex/cluster-liveness-cadence

@AlexCheema
Contributor

Why

Snapshot/state reconciliation gets workers caught up correctly, but stale topology still affects placement and dashboard freshness. The original branch reduced the time it takes to notice dead nodes and connection changes. This PR keeps that behavioral tuning isolated from the event-sourcing changes.

How

  • Master now checks node inactivity every 1s instead of every 10s.
  • Master considers a node inactive after 5s instead of 30s.
  • Workers poll connection reachability every 2s instead of every 10s.
  • The values are local variables in each loop so future tuning is explicit.
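
For reviewers who want a picture of the master-side change, here is a minimal sketch of an inactivity check with the new values held as loop-local variables. It is not the code from this PR: `find_inactive_nodes`, `master_liveness_loop`, `last_seen`, `on_inactive`, and `stop` are hypothetical names chosen for illustration, and the real loop may be structured differently. Only the 1s interval and 5s timeout come from the description above.

```python
import asyncio
import time


def find_inactive_nodes(last_seen, now, inactivity_timeout_seconds=5.0):
    """Return the IDs of nodes not seen within the timeout, sorted for stable output.

    `last_seen` maps a node ID to the monotonic timestamp of its last message
    (a hypothetical structure, assumed for this sketch).
    """
    return sorted(
        node_id
        for node_id, seen_at in last_seen.items()
        if now - seen_at > inactivity_timeout_seconds
    )


async def master_liveness_loop(last_seen, on_inactive, stop):
    """Check for inactive nodes until `stop` (an asyncio.Event) is set."""
    # Tuning values live here as locals, so changing them is an explicit edit.
    check_interval_seconds = 1.0      # was 10s before this PR
    inactivity_timeout_seconds = 5.0  # was 30s before this PR

    while not stop.is_set():
        now = time.monotonic()
        for node_id in find_inactive_nodes(last_seen, now, inactivity_timeout_seconds):
            last_seen.pop(node_id, None)
            on_inactive(node_id)
        try:
            # Sleep for one interval, but wake up early if asked to stop.
            await asyncio.wait_for(stop.wait(), timeout=check_interval_seconds)
        except asyncio.TimeoutError:
            pass
```

Under this sketch's assumptions, a node that stops reporting is flagged at most about 6s later (5s timeout plus up to 1s until the next check), compared with about 40s (30s plus 10s) under the old values. The worker-side reachability poll would follow the same shape with a 2s interval.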

Tests

  • uv run pytest
  • uv run basedpyright
  • uv run ruff check
  • nix fmt
