fix(script): correct GLM-4.7 expert_model_parallel_size for single-node 8 GPU by aoshen02 · Pull Request #2077 · THUDM/slime

aoshen02 · 2026-06-15T01:56:15Z

Summary

PR #1749 downscaled actor-num-nodes from 2 to 1 (16 GPU → 8 GPU) and tensor-model-parallel-size from 4 to 2, but did not adjust expert-model-parallel-size accordingly.

With expert-tensor-parallel-size=1, expert-model-parallel-size=8, pipeline-model-parallel-size=2:

ETP × EP × PP = 1 × 8 × 2 = 16 > world_size (8)
Megatron's parallel_state.py then fails with the assertion: "world_size (8) is not divisible by expert_tensor_model_pipeline_parallel size (16)"
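Below is a minimal Python sketch of the divisibility condition behind that error. It restates the rule implied by the error message and is not Megatron's actual parallel_state.py code. The function name `check_expert_parallel_layout` is hypothetical.

```python
def check_expert_parallel_layout(world_size: int, etp: int, ep: int, pp: int) -> int:
    """Return the implied expert data-parallel size, or raise if the layout does not fit.

    Simplified restatement of the rule in the error message above;
    NOT Megatron's actual implementation.
    """
    group = etp * ep * pp  # GPUs spanned by one ETP x EP x PP grid
    if world_size % group != 0:
        raise ValueError(
            f"world_size ({world_size}) is not divisible by "
            f"expert_tensor_model_pipeline_parallel size ({group})"
        )
    return world_size // group


# Layout that fails on one 8-GPU node: ETP=1, EP=8, PP=2.
try:
    check_expert_parallel_layout(world_size=8, etp=1, ep=8, pp=2)
except ValueError as err:
    print(err)
```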

Fix

expert-model-parallel-size 8 → 4 so that ETP(1) × EP(4) × PP(2) = 8 = world_size.
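To see why 4 is the natural replacement, the sketch below lists every EP size that satisfies the same divisibility rule for a given world size, ETP and PP. The helper `valid_ep_sizes` is hypothetical and only checks divisibility. Other constraints Megatron enforces, such as EP evenly dividing the number of experts, are not modeled here.

```python
def valid_ep_sizes(world_size: int, etp: int, pp: int) -> list[int]:
    """EP sizes for which ETP * EP * PP divides world_size (divisibility only)."""
    return [ep for ep in range(1, world_size + 1) if world_size % (etp * ep * pp) == 0]


# For comparison: 16 GPUs, assuming ETP=1 and PP=2 there too. EP=8 fits.
print(valid_ep_sizes(16, etp=1, pp=2))  # [1, 2, 4, 8]
# Single node, 8 GPUs: EP=8 no longer fits.
print(valid_ep_sizes(8, etp=1, pp=2))   # [1, 2, 4]

# The largest valid EP uses all 8 GPUs, leaving an expert data-parallel size of 1.
ep = max(valid_ep_sizes(8, etp=1, pp=2))
print(ep, 8 // (1 * ep * 2))  # 4 1
```

Picking the largest valid EP keeps as many experts sharded across GPUs as possible, which is consistent with the original EP=8 intent on the larger cluster.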

Test plan

Verify that run-glm4.7-30B-A3B.sh launches on a single 8-GPU node without the assertion error.
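As a cheaper check before a full launch, a hypothetical pre-flight helper could parse the parallelism flags from the script's argument string and apply both divisibility rules: TP × PP for the dense layers and ETP × EP × PP for the experts. This is a sketch under stated assumptions. `preflight` is not part of slime or Megatron, it ignores context parallelism and other constraints, and its defaults of 1 may not match Megatron's own defaults.

```python
import argparse
import shlex


def preflight(arg_string: str, world_size: int) -> dict:
    """Hypothetical check of TP/PP/EP/ETP flags against world_size.

    Only restates the two divisibility rules discussed in this PR; context
    parallelism and other Megatron constraints are ignored.
    """
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    # Defaults of 1 are a simplification; Megatron's own defaults may differ
    # (for example, an unset ETP may fall back to TP).
    for flag in (
        "--tensor-model-parallel-size",
        "--pipeline-model-parallel-size",
        "--expert-model-parallel-size",
        "--expert-tensor-parallel-size",
    ):
        parser.add_argument(flag, type=int, default=1)
    # parse_known_args skips every other flag in the launch command.
    args, _unknown = parser.parse_known_args(shlex.split(arg_string))

    tp, pp = args.tensor_model_parallel_size, args.pipeline_model_parallel_size
    ep, etp = args.expert_model_parallel_size, args.expert_tensor_parallel_size

    if world_size % (tp * pp) != 0:
        raise ValueError(f"world_size ({world_size}) not divisible by TP*PP ({tp * pp})")
    if world_size % (etp * ep * pp) != 0:
        raise ValueError(
            f"world_size ({world_size}) not divisible by ETP*EP*PP ({etp * ep * pp})"
        )
    return {"dp": world_size // (tp * pp), "expert_dp": world_size // (etp * ep * pp)}


fixed = (
    "--tensor-model-parallel-size 2 --pipeline-model-parallel-size 2 "
    "--expert-model-parallel-size 4 --expert-tensor-parallel-size 1 --micro-batch-size 1"
)
print(preflight(fixed, world_size=8))  # {'dp': 2, 'expert_dp': 1}
```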

🤖 Generated with Claude Code

…de 8 GPU

PR THUDM#1749 downscaled actor-num-nodes from 2 to 1 (16 GPU → 8 GPU) and tensor-model-parallel-size from 4 to 2, but did not adjust expert-model-parallel-size accordingly.

With ETP=1, EP=8, PP=2: ETP × EP × PP = 16 > world_size (8), which causes an assert in Megatron's parallel_state.py: "world_size (8) is not divisible by expert_tensor_model_pipeline_parallel size (16)"

Fix: EP 8 → 4 so that ETP(1) × EP(4) × PP(2) = 8 = world_size.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

aoshen02 and others added 2 commits June 15, 2026 01:55

Merge branch 'main' into fix/glm47-ep-size-mismatch

0ec57a9

aoshen02 wants to merge 2 commits into THUDM:main from aoshen02:fix/glm47-ep-size-mismatch
