fix(megatron): handle MambaMixer conv1d refactor in importer/exporter #1730
AAnoosheh wants to merge 1 commit into
Walkthrough
Adds dual-layout support for Mamba layer conv1d parameters in both the export and import paths.
Changes
Mamba conv1d dual-layout support
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~5 minutes
Pre-merge checks: ✅ 5 passed | ❌ 1 failed (1 warning)
MambaMixer in some Megatron-LM branches replaced the nn.Conv1d submodule (self.conv1d) with raw nn.Parameters (self.conv1d_weight / self.conv1d_bias). Add fallback rules and hasattr branches in the import and export paths so both the old and new MambaMixer APIs are handled without breaking existing Megatron-LM versions.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
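A minimal sketch of the hasattr branch described in the commit message. It uses plain-Python stand-ins instead of real Megatron-LM or PyTorch classes, and the helper name get_conv1d_params is hypothetical; the PR's actual diff is not shown here and may be structured differently.

```python
from types import SimpleNamespace


def get_conv1d_params(mixer):
    """Return (weight, bias) for either MambaMixer layout (hypothetical helper)."""
    if hasattr(mixer, "conv1d"):
        # Old layout: an nn.Conv1d-like submodule owns the weight and bias.
        return mixer.conv1d.weight, mixer.conv1d.bias
    # New layout: raw parameters stored directly on the mixer.
    return mixer.conv1d_weight, mixer.conv1d_bias


# Stand-ins for the two layouts; lists take the place of tensors.
old_mixer = SimpleNamespace(conv1d=SimpleNamespace(weight=[1.0, 2.0], bias=[0.5]))
new_mixer = SimpleNamespace(conv1d_weight=[1.0, 2.0], conv1d_bias=[0.5])

old_params = get_conv1d_params(old_mixer)
new_params = get_conv1d_params(new_mixer)
```

Both calls return the same pair, so code downstream of the branch would not need to know which layout it was given.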
c0b2975 to f699894
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1730 +/- ##
==========================================
- Coverage 77.09% 76.50% -0.60%
==========================================
Files 511 511
Lines 56176 56701 +525
==========================================
+ Hits 43310 43377 +67
- Misses 12866 13324 +458
Summary
- MambaMixer in some Megatron-LM branches replaced the nn.Conv1d submodule (self.conv1d) with raw nn.Parameters (self.conv1d_weight / self.conv1d_bias)
- Add a hasattr(layer.mixer, "conv1d") fallback in both the import path (megatron_importer.py) and the export path (unified_export_megatron.py)
- Add conv1d_weight / conv1d_bias rules to mcore_nemotron.py for both import and export mappings

Backward compatible: old Megatron-LM versions with nn.Conv1d hit the existing if branch unchanged. The else branch only fires when conv1d is absent.

Test plan
hasattr)

🤖 Generated with Claude Code
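The backward-compatibility claim can be illustrated as a round trip between the two layouts. This is a sketch under stated assumptions: the key suffixes, the prefix, and the function names export_conv1d and import_conv1d are hypothetical, and plain-Python objects stand in for the real modules. The actual key names come from the mapping rules in mcore_nemotron.py, which are not reproduced here.

```python
from types import SimpleNamespace

# Hypothetical key suffixes, chosen only for illustration.
WEIGHT_KEY = "conv1d.weight"
BIAS_KEY = "conv1d.bias"


def export_conv1d(mixer, prefix):
    """Emit the same state-dict keys whichever layout the mixer uses."""
    if hasattr(mixer, "conv1d"):
        weight, bias = mixer.conv1d.weight, mixer.conv1d.bias
    else:
        weight, bias = mixer.conv1d_weight, mixer.conv1d_bias
    return {prefix + WEIGHT_KEY: weight, prefix + BIAS_KEY: bias}


def import_conv1d(mixer, state_dict, prefix):
    """Write the values back into whichever layout the mixer uses."""
    weight = state_dict[prefix + WEIGHT_KEY]
    bias = state_dict[prefix + BIAS_KEY]
    if hasattr(mixer, "conv1d"):
        mixer.conv1d.weight, mixer.conv1d.bias = weight, bias
    else:
        mixer.conv1d_weight, mixer.conv1d_bias = weight, bias


# Export from an old-layout mixer, then import into a new-layout one.
old_mixer = SimpleNamespace(conv1d=SimpleNamespace(weight=[1.0, 2.0], bias=[0.5]))
new_mixer = SimpleNamespace(conv1d_weight=None, conv1d_bias=None)

exported = export_conv1d(old_mixer, "layers.0.mixer.")
import_conv1d(new_mixer, exported, "layers.0.mixer.")
re_exported = export_conv1d(new_mixer, "layers.0.mixer.")
```

Because both branches produce and consume the same keys, a checkpoint exported from one layout can be loaded into the other in this sketch.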
Summary by CodeRabbit
Release Notes
conv1d_weight and conv1d_bias, ensuring consistent state-dict key generation across import/export.