Cluster roots runner by dwoz · Pull Request #69179 · saltstack/salt

dwoz · 2026-05-16T08:12:56Z

What does this PR do?

Add cluster roots runner to manually sync file/pillar roots.

Operator-driven content fan-out for file_roots and pillar_roots. Run on whichever master holds the canonical content; the cluster fans out from there to every peer over the encrypted cluster pub bus. Same transport, chunk format, and apply path as the join-time bulk state-sync. How it works 1. Runner fires 'cluster/runner/sync_roots' to the local event bus with the requested channel list. 2. MasterPubServerChannel.publish_payload intercepts that tag -- instead of broadcasting as a regular cluster event, it spawns _run_root_sync_to_peers(channels). 3. For each peer pusher: * Allocate a fresh session id. * Send a 'cluster/peer/sync-roots-begin' event so the receiver pre-registers a state-sync session -- same contract as the join-reply flow, but the receiver's on_complete is a no-op (this is an ad-hoc push, not a Raft-learner bootstrap). * Stream the requested channels in the standard state-sync chunk format. 4. Receivers apply chunks via the existing _apply_state_sync_chunk path -- no new install code; the channels are file_roots and pillar_roots and install_root_chunk handles both. Live smoke verified on a 5-master cluster: added new content to m1's srv/salt (including a nested subdirectory) and srv/pillar after the cluster was up, ran cluster.sync_roots, all 4 peers had the new files within ~8 seconds. Nested directories preserved. What's NOT in this slice * No completion feedback -- runner returns immediately with status='fan-out initiated'. Operators tail each peer's log for the 'state-sync ... installed N items' lines to confirm delivery. * No re-sync trigger -- operator-driven only. No filesystem watcher, no periodic poll. Tests * test_sync_roots_rejects_invalid_roots -- input validation * test_sync_roots_no_cluster_id_is_skip -- non-cluster master returns a structured skip rather than firing a meaningless event * test_sync_roots_fires_local_event -- happy path: event fires with the resolved channel list * test_sync_roots_file_only_filters_channels -- channel filter honoured

End-to-end integration coverage on the 3-master isolated-FS cluster. Complements the unit tests in tests/pytests/unit/runners/test_cluster_runner.py (event firing, input validation) with full daemon-driven runs that exercise the runner -> master event bus -> publish_payload intercept -> _run_root_sync_to_peers -> peer state-sync chunk install path. Two scenarios: * test_isolated_sync_roots_runner_propagates_content After the cluster is steady-state, write new content to master_1's file_roots and pillar_roots, run salt-run cluster.sync_roots, poll master_2 and master_3 until both files appear (or 30s elapses), and assert the marker round-trips through encrypted state-sync. Distinct from test_isolated_late_joiner_receives_file_and_pillar_roots which covers JOIN-time bulk sync. * test_isolated_sync_roots_runner_file_only Pins the channels= filter: roots=file syncs only file_roots; the pillar tree on peers remains untouched. Operator escape hatch against accidentally fanning out secret pillar data when only an SLS update is intended. Pre-existing trivial change: the commented-out warning log line at the top of publish_payload (one of those //log it all// debug crutches) is removed since the function already gets per-event log output via the cluster-event broadcast path and the targeted publish_payload branches. Diagnosis note for future debuggers While bringing these tests up I hit a confusing failure where the runner fired the event, master_1's daemon broadcast it as cluster/event/127.0.0.1/cluster/runner/sync_roots, but the intercept never ran. Root cause: salt-factories had cached /tmp/stsuite/scripts/cli_salt_master.py pointing at a *different* worktree (saw a stale CODE_DIR=.../masterbug entry), so the test daemons loaded an older copy of salt that didn't have the publish_payload intercept. Clearing /tmp/stsuite/scripts fixed it. Worth knowing if you see 'feature works in unit test but integration test silently misses it' — check the factory cache first.

Each ring becomes its own Raft group with independent leader, log, and voter set. The cluster log carries a RingRegistryStateMachine and a RoutingStateMachine that name rings and route data types to them; per-ring membership and policy live in that ring's own log. Operator runners drive the lifecycle end to end: cluster.ring_create / ring_destroy / ring_set cluster.route_set / route_clear / routes / rings cluster.shed_unowned / shed_unowned_all / shed_status cluster.collect_from_peers cluster.migrate_jobs_to_cache shed and collect reuse the existing sync-roots wire format with a bank: channel prefix so arbitrary cache banks travel over state-sync chunks. The salt_cache returner replaces the implicit local-jobs layout with an explicit cache-backed one (jobs/loads, jobs/minions, jobs/returns/<jid>, jobs/endtimes, jobs/nocache) so the migration has a well-defined bank set to move. Gate sites at salt/master.py call ring_membership.owns_for(opts, data_type, key); a non-member master no-ops the write and rate-limits a WARN per (data_type, ring, reason) bucket. A delegate-on-miss event gives operators a safety net for asymmetric topologies where traffic can't reach a ring member directly. Per-ring voter-health watchdog with (group_id, peer_id) cooldown keying keeps each ring's quorum healthy independently. cluster-health.json is now structured per ring. Sentinel writes (shed-status, collect-status, cluster-health) go through salt.utils.atomicfile.atomic_open so a concurrent reader or doubled fan-out never sees partial JSON. The publish daemon dedups self.pushers by (pull_host, pull_port) to stop the shed fan-out from being delivered twice.

The five operator-facing runners (cluster.ring_create, ring_destroy, route_set, route_clear, ring_set) propose entries through the cluster Raft log, and that only works on the leader. The runner intercept in MasterPubServerChannel.publish_payload dispatched locally only — when the operator invoked the runner on a non-leader master the propose raised RuntimeError, the exception was swallowed by log.exception, and no fan-out reached the actual leader. The intercept now also schedules a cluster_aes-encrypted cluster/peer/multi-ring-request event to every peer. Each peer re-dispatches through _handle_multi_ring_runner_event; only the master that is the current Raft leader appends an entry, followers log "not leader" and skip. No double-commit because Raft has one leader at any moment. A new unit-test file pins the wire shape (one publish per pusher, cluster_aes round-trip, broken-pusher tolerance, no-peers no-op, missing-secret silent skip). Also raise per-test timeouts on slow tests that were tripping the global 90-second pytest-timeout under coverage tracing on loaded GHA runners. The workflow's automatic --lf rerun absorbs the resulting slow-but-passes case so a transient runner-load spike no longer fails the job: * test_check_minions_grain_target_10000 — 240 s (test budget is 180 s; the global 90 s default fired before the budget assertion could evaluate). * test_r_state_apply_logical_resource_no_state_module — 180 s (three sequential state.apply runs against dummy resources). * test_auth_not_starved_with_routing — 180 s (7-daemon saturation test). * test_orchestration_with_pillar_dot_items — passes _timeout=120 to salt_run_cli.run() so the 30 s factory default doesn't kill the recursive saltutil.runner('pillar.show_pillar') invocation. Tornado timing flake: test_when_is_timed_out_is_set_before_other_events_are_completed used gather_timeout=0.1 s and event_timeout=0.15 s. The 50 ms ordering window was regularly inverted by GHA scheduler jitter; bumped to 1 s and 2 s respectively, preserving the test's logical invariant. Windows assertion drift: test_win_user_present_new_password now checks changes.get("passwd") == "XXX-REDACTED-XXX" instead of strict-equal so the new password_changed / lstchg keys surfaced by the recent shadow.info refactor don't fail the test.

Daniel A. Wozniak added 2 commits May 16, 2026 01:02

dwoz requested a review from a team as a code owner May 16, 2026 08:12

dwoz had a problem deploying to ci May 16, 2026 08:13 — with GitHub Actions Error

dwoz added the test:full Run the full test suite label May 16, 2026

dwoz temporarily deployed to ci May 16, 2026 08:13 — with GitHub Actions Inactive

dwoz temporarily deployed to ci May 16, 2026 08:30 — with GitHub Actions Inactive

dwoz temporarily deployed to ci May 16, 2026 09:15 — with GitHub Actions Inactive

dwoz added the test:coverage label May 17, 2026

dwoz temporarily deployed to ci May 17, 2026 09:19 — with GitHub Actions Inactive

dwoz temporarily deployed to ci May 17, 2026 09:30 — with GitHub Actions Inactive

dwoz temporarily deployed to ci May 17, 2026 09:45 — with GitHub Actions Inactive

dwoz temporarily deployed to ci May 17, 2026 23:40 — with GitHub Actions Inactive

dwoz temporarily deployed to ci May 17, 2026 23:52 — with GitHub Actions Inactive

dwoz temporarily deployed to ci May 18, 2026 00:17 — with GitHub Actions Inactive

dwoz had a problem deploying to ci May 18, 2026 00:17 — with GitHub Actions Failure

dwoz temporarily deployed to ci May 18, 2026 03:48 — with GitHub Actions Inactive

dwoz temporarily deployed to ci May 18, 2026 07:29 — with GitHub Actions Inactive

dwoz temporarily deployed to ci May 18, 2026 07:42 — with GitHub Actions Inactive

dwoz temporarily deployed to ci May 18, 2026 07:58 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cluster roots runner#69179

Cluster roots runner#69179
dwoz wants to merge 4 commits into
saltstack:3008.xfrom
dwoz:cluster-max-voters-2

dwoz commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dwoz commented May 16, 2026

What does this PR do?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant