support mtp_shared_weights #41

Merged
Jintao-Huang merged 4 commits into modelscope:main from Jintao-Huang:support_mtp_shared_weights
Apr 20, 2026
Conversation

@Jintao-Huang (Collaborator)

#29


@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces support for shared weights in Multi-Token Prediction (MTP) by decoupling the number of physical layers from the number of unroll steps. Key changes include updating ModelConfig to handle weight sharing logic, modifying gpt_model.py to use unroll steps for loss calculation and state chunking, and updating the MTP layer and patcher to support dynamic layer indexing for rotary embeddings. Feedback suggests refining the initialization logic to prevent accidental activation of MTP when layers are zero and explicitly declaring mtp_unroll_steps in the configuration dataclass.
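To make the decoupling concrete, here is a minimal sketch of the shared-weights unroll (illustrative only; SharedWeightMTP and its parameter names are assumptions, not the actual mcore_bridge classes): one physical layer is applied repeatedly for the configured number of unroll steps, so the logical depth no longer matches the parameter count.

import torch
import torch.nn as nn

class SharedWeightMTP(nn.Module):
    """One physical MTP layer reused for several unroll steps."""

    def __init__(self, layer: nn.Module, unroll_steps: int):
        super().__init__()
        self.layer = layer                # a single set of weights
        self.unroll_steps = unroll_steps  # logical depth, may exceed 1

    def forward(self, hidden: torch.Tensor) -> list:
        outputs = []
        for _ in range(self.unroll_steps):
            hidden = self.layer(hidden)   # same weights applied at every step
            outputs.append(hidden)        # one prediction per unroll step
        return outputs

This is also why, as the summary above notes, gpt_model.py scales the loss by mtp_unroll_steps rather than by the physical layer count.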

Comment threads (2): src/mcore_bridge/config/model_config.py
@MDR-EX1000

Any related update to MS-SWIFT? I suggest that mtp_unroll_steps be preserved in the model metadata (i.e., config.json).
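For illustration, preserving the field at export time could look like the following (the helper and exact keys are hypothetical, not an existing MS-SWIFT or mcore_bridge API):

import json

def export_mtp_metadata(config, path="config.json"):
    # Hypothetical sketch: keep both the physical and logical MTP settings
    # in config.json so a reload can reconstruct the shared-weights unroll.
    with open(path) as f:
        data = json.load(f)
    data["mtp_num_layers"] = config.mtp_num_layers          # 1 when weights are shared
    data["mtp_unroll_steps"] = config.mtp_unroll_steps      # logical unroll count
    data["mtp_shared_weights"] = config.mtp_shared_weights
    with open(path, "w") as f:
        json.dump(data, f, indent=2)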

@Jintao-Huang (Collaborator, Author)

#41

@Jintao-Huang (Collaborator, Author)

/gemini review


@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces support for Multi-Token Prediction (MTP) with shared weights by adding mtp_shared_weights and mtp_unroll_steps to the configuration. The logic in gpt_model.py and patcher.py has been updated to use mtp_unroll_steps for loop iterations and loss scaling, allowing the model to unroll more steps than the number of physical layers when weights are shared. Feedback highlights the need to explicitly declare mtp_unroll_steps as a field in the ModelConfig dataclass and to improve the robustness of its initialization to avoid potential TypeError when mtp_num_layers is missing.
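The explicit declaration the review asks for might look like this (a sketch only; the real ModelConfig has many more fields, and the defaults shown here are assumptions):

from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:
    mtp_num_layers: Optional[int] = None  # physical MTP layers
    mtp_shared_weights: bool = False      # reuse one layer across unroll steps
    mtp_unroll_steps: int = 0             # declared up front rather than
                                          # created dynamically in __post_init__

Declaring the field keeps the dataclass self-documenting and lets type checkers see mtp_unroll_steps without inspecting __post_init__.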

Comment thread: src/mcore_bridge/config/model_config.py
Comment on lines +311 to +316
if self.mtp_shared_weights:
    assert self.mtp_num_layers is not None
    self.mtp_unroll_steps = self.mtp_num_layers
    self.mtp_num_layers = 1
else:
    self.mtp_unroll_steps = self.mtp_num_layers

Severity: medium

The initialization logic for mtp_unroll_steps should be more robust. If mtp_num_layers is None and mtp_shared_weights is False, mtp_unroll_steps will be assigned None, which will cause a TypeError in gpt_model.py (line 423) when calculating 1 + self.config.mtp_unroll_steps. Additionally, it is safer to ensure mtp_num_layers is positive when weight sharing is enabled.

Suggested change
-if self.mtp_shared_weights:
-    assert self.mtp_num_layers is not None
-    self.mtp_unroll_steps = self.mtp_num_layers
-    self.mtp_num_layers = 1
-else:
-    self.mtp_unroll_steps = self.mtp_num_layers
+if self.mtp_shared_weights:
+    assert self.mtp_num_layers is not None and self.mtp_num_layers > 0, \
+        "mtp_num_layers must be > 0 when mtp_shared_weights is True"
+    self.mtp_unroll_steps = self.mtp_num_layers
+    self.mtp_num_layers = 1
+else:
+    self.mtp_unroll_steps = self.mtp_num_layers or 0
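For reference, the suggested logic resolves as follows (a standalone sketch of the same branches, with assumed input values):

def resolve_mtp(mtp_num_layers, mtp_shared_weights):
    # Mirrors the suggested change: collapse to one physical layer when
    # sharing, and fall back to 0 unroll steps when MTP is disabled.
    if mtp_shared_weights:
        assert mtp_num_layers is not None and mtp_num_layers > 0
        return 1, mtp_num_layers                  # (physical layers, unroll steps)
    return mtp_num_layers, mtp_num_layers or 0

assert resolve_mtp(3, True) == (1, 3)             # one shared layer, unrolled 3x
assert resolve_mtp(None, False) == (None, 0)      # no MTP, and no TypeError downstream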

@Jintao-Huang merged commit c8b877c into modelscope:main on Apr 20, 2026
1 check passed