Fix localizer multi-GPU failure (model.to vs device_map) #404

Merged
KartikP merged 1 commit into main from fix-localizer-device-map
Jun 3, 2026

KartikP commented Jun 3, 2026

Contributor

On multi-GPU scoring tiers, HuggingfaceSubject loads models with device_map='auto', which shards them across GPUs. The localizer's extract_representations then called model.to(device), which tried to consolidate the sharded model onto a single GPU:

  • 8B (fp32 ~32GB): OOMs trying to move the whole model onto one ~22GB A10G.
  • 4B: partial consolidation leaves layers split, then the forward pass raises RuntimeError: Expected all tensors to be on the same device (cuda:0 and cuda:1).
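The 8B figure can be checked with a rough weights-only estimate. This is a sketch, not part of the change: `weights_gb` is a hypothetical helper, the 22 GB budget is the approximate A10G capacity quoted above, and activations and other overhead are ignored.

```python
def weights_gb(n_params: float, bytes_per_param: int = 4) -> float:
    """Approximate weight memory in decimal GB (weights only, no activations)."""
    return n_params * bytes_per_param / 1e9


# Assumed per-GPU budget, taken from the ~22GB A10G figure in the description.
A10G_USABLE_GB = 22.0

fp32_8b = weights_gb(8e9)  # 8B parameters at 4 bytes each (fp32)
fits_on_one_gpu = fp32_8b <= A10G_USABLE_GB

print(f"8B fp32 weights: ~{fp32_8b:.0f} GB, fits on one GPU: {fits_on_one_gpu}")
```

Under these assumptions the fp32 weights alone exceed a single card, which is consistent with the OOM when the whole model is moved to one device.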

Both surfaced as scoring failures on the 4-GPU medium tier (Jenkins run 190). The 0.6B and 1.7B models were unaffected because they land on the single-GPU small tier, where no device_map is used and the .to() call is a no-op.

Fix: skip model.to(device) when the model is already dispatched via device_map (hf_device_map set). Inputs are already sent to self.device (the input-embedding shard), so the sharded forward works and 8B can use all GPUs instead of OOMing on one. Single-GPU loads are unchanged.

KartikP merged commit abcc58a into main Jun 3, 2026
11 checks passed