Add Olmo2 architecture adapter tests#1387

Open
RecreationalMath wants to merge 2 commits into TransformerLensOrg:dev from RecreationalMath:olmo2-adapter-test
@RecreationalMath

Contributor

Description

Adds a unit test suite for Olmo2ArchitectureAdapter under tests/unit/model_bridge/supported_architectures/, following the unit-test guide and modeling its structure on the post-#1365 olmoe suite. It needs no model downloads or real checkpoints: the suite builds tiny programmatic TransformerBridgeConfig objects and, for the behavioral tests, uses small synthetic tensors and a fake attention module, so it runs on CPU in seconds.

The suite (32 tests) covers:

  • Anti-drift config: supports_fold_ln=False. Olmo2 is post-norm, so its norms run after attention and MLP rather than before them. Folding such a norm into the adjacent weights would corrupt them, so the LN fold must stay disabled.
  • Weight conversions: exact key set. The adapter uses the base _qkvo_weight_conversions() helper, so per-key rearrange patterns are left to base-class tests per the guide's anti-pattern 4.
  • Component-mapping structure, bridge types, and HF module paths.
  • Post-norm block wiring: ln1 maps to post_attention_layernorm and ln2 maps to post_feedforward_layernorm. This is the central architecture-specific decision, and the test catches regressions introduced when porting from the pre-norm Llama family.
  • Q/K-norm submodule structure under attention with the correct HF names.
  • hook_alias_overrides: hook_resid_mid points at mlp.hook_in (the true post-attn pre-mlp residual under post-norm), overriding BlockBridge's default ln2.hook_in. The test also asserts that the override differs from the default, so the check is load-bearing and cannot pass vacuously.
  • GQA forward hook shapes via a fake attention module with Q/K-norm wired so the bridge takes its Q/K-norm code path.
  • setup_component_testing rotary wiring on template and bridge-model attentions, plus the observable eager-forcing effect on the HF model and per-layer self_attn configs.
  • Architecture guards (no norm weights, no biases).
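
The reasoning behind supports_fold_ln=False can be checked numerically. The following is a minimal sketch in plain Python, not code from the suite or from TransformerLens: rms_norm, matvec, and the toy values are all hypothetical stand-ins. It assumes an RMS-style norm with a learned scale gamma. In the pre-norm case the scale multiplies the input of the next weight matrix, so it folds into that matrix exactly. In the post-norm case the scale is applied after a nonlinear normalization of the layer's output, so pushing it into the preceding weights changes the result.

```python
import math

def rms_norm(x, eps=1e-6):
    """Scale-free RMS normalization (the learned scale is applied separately)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def matvec(W, x):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def scale(gamma, x):
    """Apply the learned per-channel norm scale."""
    return [g * v for g, v in zip(gamma, x)]

def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

# Toy values, chosen only for illustration.
x = [1.0, -2.0, 0.5]
gamma = [0.5, 2.0, 1.5]
W = [[0.2, -0.1, 0.4],
     [0.7, 0.3, -0.5],
     [-0.6, 0.9, 0.1]]

# Pre-norm: y = W @ (gamma * norm(x)). gamma scales W's input,
# so it folds into the columns of W with no change in output.
W_cols_folded = [[w * g for w, g in zip(row, gamma)] for row in W]
pre_reference = matvec(W, scale(gamma, rms_norm(x)))
pre_folded = matvec(W_cols_folded, rms_norm(x))
pre_error = max_abs_diff(pre_reference, pre_folded)

# Post-norm: y = gamma * norm(W @ x). Folding gamma into the rows of W
# changes the RMS that the norm divides by, so the output differs.
W_rows_folded = [[w * g for w in row] for row, g in zip(W, gamma)]
post_reference = scale(gamma, rms_norm(matvec(W, x)))
post_folded = rms_norm(matvec(W_rows_folded, x))
post_error = max_abs_diff(post_reference, post_folded)

print(f"pre-norm fold error:  {pre_error:.2e}")
print(f"post-norm fold error: {post_error:.2e}")
```

The pre-norm error is at floating-point noise level, while the post-norm error is large, which is the kind of silent corruption the anti-drift assertion is meant to rule out.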

Contributes to #1302 (Olmo2 checkbox).

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@RecreationalMath

Contributor Author

Heads up: I omitted test_attn_output_shape from TestOlmo2GQAHookShapes here. The guide's What-to-test table lists generic (batch, seq, d_model) output-shape assertions under Skip for Behavioral hook shapes ("shared bridge's contract, not yours"). The four sibling adapter test files I previously shipped (mixtral, olmoe, gpt_oss, smollm3) still carry the equivalent assertion. I'll send a follow-up cleanup PR removing it from those four.
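
To illustrate the distinction, here is a rough sketch of the kind of shape check that stays in scope. It is not the suite's code: the tiny config, the expected_hook_shapes helper, and the assumed (batch, seq, heads, d_head) hook layout are all hypothetical and are used only to show why the GQA shapes are adapter-specific while the final (batch, seq, d_model) output shape is not.

```python
from types import SimpleNamespace

# Hypothetical tiny config; the field names are assumptions for this sketch.
cfg = SimpleNamespace(d_model=64, n_heads=8, n_key_value_heads=2)

def expected_hook_shapes(cfg, batch, seq):
    """Expected per-hook shapes under GQA, assuming a (batch, seq, heads, d_head) layout."""
    d_head = cfg.d_model // cfg.n_heads
    return {
        # Queries use the full head count.
        "hook_q": (batch, seq, cfg.n_heads, d_head),
        # Keys and values use the smaller key/value head count.
        "hook_k": (batch, seq, cfg.n_key_value_heads, d_head),
        "hook_v": (batch, seq, cfg.n_key_value_heads, d_head),
    }

shapes = expected_hook_shapes(cfg, batch=2, seq=5)

# Adapter-specific: q and k/v head counts must differ when GQA is configured.
assert shapes["hook_q"][2] == cfg.n_heads
assert shapes["hook_k"][2] == cfg.n_key_value_heads
assert shapes["hook_q"] != shapes["hook_k"]
```

A generic output-shape assertion would pass for any architecture with the same d_model, so it says nothing about whether the Olmo2 adapter wired GQA correctly.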