
fix(master): add --no-master flag and prevent --no-api nodes from self-electing #2135

Open
5F0jd2vLq54RerYW wants to merge 3 commits into exo-explore:main from 5F0jd2vLq54RerYW:pr/election-no-master


@5F0jd2vLq54RerYW

Summary

Two related fixes for node role management in multi-node clusters:

1. --no-master flag

New CLI flag that skips Master instantiation entirely. Previously, every node started a Master, even when it was intended to run as a pure worker (e.g. a GPU worker with no API surface). With --no-master, the Master event loop, command processor, and download coordinator are not started.

2. --no-api nodes must not self-elect

When a node is started with --no-api (spawn_api=False), it should not be eligible to become master: if an API-less node wins an election, no API is reachable in the cluster. Previously, a --no-api node in a solo partition would self-elect, because the bully algorithm had no other candidates. This PR sets is_candidate = args.spawn_api and not args.no_master, which gives non-candidate nodes a seniority of -1 so that any API-bearing node beats them.

A complementary guard in election.py ensures non-candidate nodes re-propose the last known master during solo partitions instead of winning by default.
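The candidacy rule and the solo-partition guard can be sketched roughly as follows. This is an illustration only, not the code in election.py: the helper names (`advertised_seniority`, `pick_winner`, `solo_proposal`), the tie-break on node id, and the assumption that candidate seniorities are non-negative are all hypothetical. Only the -1 seniority and the re-propose behaviour come from the description above.

```python
from typing import Optional

NON_CANDIDATE_SENIORITY = -1  # value quoted in the PR description


def advertised_seniority(is_candidate: bool, seniority: int) -> int:
    """Seniority a node puts forward (hypothetical helper).

    Non-candidates always advertise -1, regardless of their own seniority.
    """
    return seniority if is_candidate else NON_CANDIDATE_SENIORITY


def pick_winner(proposals: dict[str, int]) -> str:
    """Highest advertised seniority wins.

    Breaking ties on node id is an assumption made for this sketch.
    """
    return max(proposals, key=lambda node: (proposals[node], node))


def solo_proposal(
    node_id: str, is_candidate: bool, last_known_master: Optional[str]
) -> Optional[str]:
    """What a node proposes when it sees no other candidates.

    A candidate proposes itself; a non-candidate re-proposes the last
    master it knew about (possibly None) instead of winning by default.
    """
    return node_id if is_candidate else last_known_master


if __name__ == "__main__":
    proposals = {
        "gpu-worker": advertised_seniority(is_candidate=False, seniority=5),
        "api-node": advertised_seniority(is_candidate=True, seniority=0),
    }
    print(pick_winner(proposals))
    print(solo_proposal("gpu-worker", False, "api-node"))
```

Under these assumptions, a worker started with --no-api loses to any candidate even if its own seniority is higher, and when isolated it keeps pointing at the previous master rather than electing itself.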

Flag precedence

| Flags | Result |
| --- | --- |
| --no-master | master=None, is_candidate=False |
| --no-api | is_candidate=False, seniority=-1 |
| --force-master | seniority=1_000_000 (only applies when is_candidate=True) |
| --no-master --force-master | master=None (--no-master wins) |
| --no-api --force-master | seniority=-1 (--no-api wins) |
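The precedence rules above can be expressed as a small resolver. The following is a sketch under stated assumptions, not the parser in src/exo/main.py: the `Role` dataclass, `build_parser`, `resolve_role`, and the `force_master` attribute name are hypothetical, and the argparse wiring for --no-api (`dest="spawn_api"`, `store_false`) is inferred from the `spawn_api=False` wording in the description. The `is_candidate` expression is the one quoted in the PR.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    starts_master: bool         # False corresponds to master=None
    is_candidate: bool          # eligible to win an election
    force_master_applies: bool  # whether the 1_000_000 seniority is used


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser carrying only the three flags discussed here.
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-master", action="store_true")
    parser.add_argument("--no-api", dest="spawn_api", action="store_false")
    parser.add_argument("--force-master", action="store_true")
    return parser


def resolve_role(args: argparse.Namespace) -> Role:
    # Expression quoted in the PR description.
    is_candidate = args.spawn_api and not args.no_master
    return Role(
        starts_master=not args.no_master,
        is_candidate=is_candidate,
        # --force-master is ignored for non-candidates, so --no-master
        # and --no-api both take precedence over it.
        force_master_applies=args.force_master and is_candidate,
    )


if __name__ == "__main__":
    for argv in (["--no-master", "--force-master"], ["--no-api", "--force-master"]):
        print(argv, resolve_role(build_parser().parse_args(argv)))
```

Per the table, --no-api alone removes candidacy but does not set master=None, which is why `starts_master` depends only on --no-master in this sketch.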

Tests

  • 14 existing election tests (src/exo/shared/tests/test_election.py) all pass
  • Added exo_rs stub to src/exo/shared/tests/conftest.py so tests run without a compiled Rust binary
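One common way to stub a compiled extension in a conftest.py is to register a placeholder in `sys.modules` before any test imports it. The sketch below shows that general technique and is not necessarily what this PR's conftest.py does; `ensure_importable` is a hypothetical helper name.

```python
import importlib
import sys
from unittest.mock import MagicMock


def ensure_importable(name: str):
    """Return the real module if it imports, else register a MagicMock stub.

    Hypothetical helper: after this call, `import <name>` and
    `from <name> import X` succeed even without the compiled binary.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        stub = MagicMock(name=name)
        sys.modules[name] = stub
        return stub


# Placed at module level in conftest.py so it runs before test collection.
ensure_importable("exo_rs")
```

A MagicMock stub accepts any attribute access or call, so it only suits tests that never depend on the extension's real behaviour. Imports of submodules (for example a dotted `exo_rs.something`) would need their own `sys.modules` entries.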

🤖 Generated with Claude Code

Codex and others added 3 commits May 31, 2026 17:17
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…f-electing

Adds --no-master CLI flag that skips Master instantiation entirely, and
couples election candidacy to the CLI flags to prevent worker-only nodes
from winning elections during solo partitions.

Changes:
- src/exo/main.py: Add --no-master flag; skip Master() when set.
  Set is_candidate = args.spawn_api and not args.no_master so that
  --no-api nodes (spawn_api=False) and --no-master nodes never self-elect.
- src/exo/shared/election.py: Store is_candidate on the Election instance.
  In _election_status(), non-candidate nodes re-propose the last known master
  instead of winning by default during solo partitions.

Flag precedence (per spec):
- --no-master wins over --force-master: no Master is instantiated, no election.
- --no-api wins over --force-master: is_candidate=False, seniority=-1.

Also adds exo_rs stub to src/exo/shared/tests/conftest.py so election tests
run without a compiled Rust extension binary.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@5F0jd2vLq54RerYW
Author

Note: nix is not available in this dev environment, so I ran ruff format (which is what nix fmt invokes for Python files per the treefmt config in flake.nix) and pushed the formatting commit. The Rust/Svelte/TOML formatters don't apply to this PR (Python-only change).
