fix(script): correct GLM-4.7 expert_model_parallel_size for single-node 8 GPU#2077

Open
aoshen02 wants to merge 2 commits into THUDM:main from aoshen02:fix/glm47-ep-size-mismatch

@aoshen02

Summary

PR #1749 downscaled actor-num-nodes from 2 to 1 (16 GPU → 8 GPU) and tensor-model-parallel-size from 4 to 2, but did not adjust expert-model-parallel-size accordingly.

With expert-tensor-parallel-size=1, expert-model-parallel-size=8, pipeline-model-parallel-size=2:

  • ETP × EP × PP = 1 × 8 × 2 = 16 > world_size (8)
  • Megatron's parallel_state.py then fails its assertion: world_size (8) is not divisible by expert_tensor_model_pipeline_parallel size (16)
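
The arithmetic above can be checked before launching. The following is a minimal sketch that mirrors only the divisibility condition named in the error message; it is not Megatron's code, and the function name `expert_parallel_fits` is hypothetical.

```python
def expert_parallel_fits(world_size: int, etp: int, ep: int, pp: int) -> bool:
    """Return True if world_size is divisible by ETP * EP * PP.

    Hypothetical helper: it reproduces only the condition quoted in the
    error message, not any other check Megatron may perform.
    """
    return world_size % (etp * ep * pp) == 0


# Settings left behind by PR #1749 on a single node with 8 GPUs.
print(expert_parallel_fits(world_size=8, etp=1, ep=8, pp=2))  # False

# Same settings with expert-model-parallel-size reduced to 4.
print(expert_parallel_fits(world_size=8, etp=1, ep=4, pp=2))  # True
```

With the original 2-node, 16-GPU layout the same call with `ep=8` returns True, which is consistent with the mismatch appearing only after the downscale.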

Fix

Change expert-model-parallel-size from 8 to 4 so that ETP(1) × EP(4) × PP(2) = 8 = world_size.
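
To see which EP values satisfy this condition for a given layout, a short enumeration helps. This is a sketch under the assumption that the only constraint is the divisibility of world_size by ETP × EP × PP; the helper `valid_ep_sizes` is hypothetical, and other constraints (for example, on the number of experts) are not checked here.

```python
def valid_ep_sizes(world_size: int, etp: int, pp: int) -> list[int]:
    """List EP values for which world_size is divisible by ETP * EP * PP.

    Hypothetical helper for illustration; it checks only the divisibility
    condition and ignores any model-specific limits on EP.
    """
    return [
        ep
        for ep in range(1, world_size + 1)
        if world_size % (etp * ep * pp) == 0
    ]


# Single node, 8 GPUs, ETP=1, PP=2.
print(valid_ep_sizes(world_size=8, etp=1, pp=2))  # [1, 2, 4]
```

Under these assumptions, 4 is the largest EP value that passes the check on 8 GPUs, while 8 passes only at a world size of 16.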

Test plan

  • Verify that run-glm4.7-30B-A3B.sh launches on a single node with 8 GPUs without hitting the assertion

🤖 Generated with Claude Code

aoshen02 and others added 2 commits June 15, 2026 01:55
…de 8 GPU

PR THUDM#1749 downscaled actor-num-nodes from 2 to 1 (16 GPU → 8 GPU) and
tensor-model-parallel-size from 4 to 2, but did not adjust
expert-model-parallel-size accordingly.

With ETP=1, EP=8, PP=2: ETP × EP × PP = 16 > world_size (8), which
causes an assert in Megatron's parallel_state.py:
  "world_size (8) is not divisible by expert_tensor_model_pipeline_parallel size (16)"

Fix: EP 8 → 4 so that ETP(1) × EP(4) × PP(2) = 8 = world_size.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>