fix: prefer 192.168.100.x subnet for TB4 ring — explicit priority 0 beats bridge0 aliases #2099
Open
mpuodziukas-labs wants to merge 3 commits into
…eats bridge0 aliases
192.168.100.0/24 is the dedicated raw-TB4 P2P subnet (en1 direct, <0.5 ms RTT). macOS classifies it as maybe_ethernet, which ties with LAN, so bridge0 routes (192.168.2.x) win the tiebreak and produce EHOSTUNREACH in ring subprocesses. An explicit prefix check gives the TB4 subnet priority 0; all other types shift up by 1.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…acy TOML cards missing backends field
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem
On macOS with a Thunderbolt 4 direct link between two Apple Silicon nodes, ring TCP connections fail with EHOSTUNREACH (errno 65) despite the peer being pingable.
Root cause: The _find_ip_prioritised function in placement_utils.py selects ring IPs by interface type. TB4 creates two IP aliases on the same physical interface:
- 192.168.100.x: the dedicated raw-TB P2P subnet, assigned to en1 directly
- 192.168.2.x: a compatibility alias, also on en1 but routed via bridge0
Both IPs have interface type maybe_ethernet, so they tie in the priority sort. When bridge0 wins the tiebreak, TCP connections from spawned ring subprocesses get EHOSTUNREACH: macOS routes subprocess traffic differently than the parent process, and the bridge route doesn't resolve for them.
Fix
Add an explicit prefix check for 192.168.100.0/24 (the standard TB4 P2P subnet) and assign it priority 0, so it is always the first choice for ring. All other interface types shift up by 1.
Verified
- 192.168.100.1/2 subnet
- mlx-community/Qwen3-0.6B-4bit → single-node and ring mode
Also included
fix: default backends to [MlxMetal]. ModelCard.backends is required since "Add node backends to model cards" (#2071), but old TOML cards don't have the field. Defaulting to [MlxMetal] prevents a ValidationError on startup when loading custom model cards.
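For reviewers who want to see the shape of the ring-IP change without opening the diff, the prefix check described under Fix could look roughly like the sketch below. It is not the code in placement_utils.py: the function names, the candidate tuple layout, and every interface-type label other than maybe_ethernet are assumptions made for illustration.

```python
import ipaddress

# The TB4 point-to-point subnet named in this PR.
TB4_P2P_SUBNET = ipaddress.ip_network("192.168.100.0/24")

# Hypothetical type-based ordering (lower is better). Only "maybe_ethernet"
# appears in the PR text; the other labels and values are illustrative.
BASE_PRIORITY = {
    "thunderbolt": 0,
    "maybe_ethernet": 1,
    "ethernet": 1,
    "wifi": 2,
    "unknown": 3,
}


def ring_ip_priority(ip: str, interface_type: str) -> int:
    """Return a sort key for a candidate ring IP (lower wins)."""
    # Explicit prefix check: the TB4 subnet always gets priority 0.
    if ipaddress.ip_address(ip) in TB4_P2P_SUBNET:
        return 0
    # Every type-based priority shifts up by 1 so nothing else can tie with 0.
    base = BASE_PRIORITY.get(interface_type, BASE_PRIORITY["unknown"])
    return base + 1


def pick_ring_ip(candidates):
    """Pick the best IP from (ip, interface_type) pairs, or None if empty."""
    if not candidates:
        return None
    best = min(candidates, key=lambda c: ring_ip_priority(c[0], c[1]))
    return best[0]


# Both aliases report the same interface type, but the prefix check
# breaks the tie in favour of the direct TB4 address.
aliases = [
    ("192.168.2.5", "maybe_ethernet"),
    ("192.168.100.1", "maybe_ethernet"),
]
chosen = pick_ring_ip(aliases)
```

With the two aliases above, the bridge0-routed address gets key 2 and the TB4 address gets key 0, so the choice no longer depends on the tiebreak order.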
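The backends default can be illustrated the same way. The sketch below uses only the standard library as a stand-in; the real ModelCard is presumably a validated model (the PR mentions ValidationError), and ModelCardSketch, Backend, card_from_dict, and the model_id field are hypothetical names, not the project's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Backend(Enum):
    # Only MlxMetal is named in the PR; other backends are omitted here.
    MLX_METAL = "MlxMetal"


@dataclass
class ModelCardSketch:
    """Stdlib stand-in for a model card, not the project's ModelCard."""

    model_id: str
    # Defaulting the field lets legacy cards without "backends" still load.
    backends: list = field(default_factory=lambda: [Backend.MLX_METAL])


def card_from_dict(data: dict) -> ModelCardSketch:
    """Build a card from parsed TOML data, tolerating a missing backends key."""
    kwargs = {"model_id": data["model_id"]}
    if "backends" in data:
        kwargs["backends"] = [Backend(name) for name in data["backends"]]
    return ModelCardSketch(**kwargs)


# A legacy card (no backends key) and a newer card that sets it explicitly.
legacy = card_from_dict({"model_id": "custom/legacy-model"})
modern = card_from_dict({"model_id": "custom/new-model", "backends": ["MlxMetal"]})
```

Using a default factory rather than a shared list literal keeps each card's backends list independent of the others.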