Add Olmo2 architecture adapter tests by RecreationalMath · Pull Request #1387 · TransformerLensOrg/TransformerLens

RecreationalMath · 2026-06-14T05:19:50Z

Description

Adds a unit test suite for Olmo2ArchitectureAdapter under tests/unit/model_bridge/supported_architectures/. It follows the unit-test guide and models its structure on the post-#1365 olmoe suite. It needs no model downloads or real checkpoints. The behavioral tests use tiny programmatic TransformerBridgeConfig objects, small synthetic tensors, and a fake attention module, so the suite runs on CPU in seconds.

The suite (32 tests) covers:

Anti-drift config: supports_fold_ln=False. Olmo2 is post-norm, which rules out the LN fold, because folding a norm that runs after attention or the MLP into neighboring weights would corrupt them.
Weight conversions: asserts the exact key set. The adapter uses the base _qkvo_weight_conversions() helper, so per-key rearrange patterns are left to the base-class tests, per the guide's anti-pattern 4.
Component-mapping structure, bridge types, and HF module paths.
Post-norm block wiring: ln1 maps to post_attention_layernorm and ln2 maps to post_feedforward_layernorm. This is the central architecture-specific decision, and the test catches regressions introduced by porting the pre-norm Llama-family wiring.
Q/K-norm submodule structure under attention, with the correct HF names.
hook_alias_overrides: hook_resid_mid points at mlp.hook_in (the true post-attention, pre-MLP residual under post-norm), overriding BlockBridge's default of ln2.hook_in. The test includes a load-bearing assertion that the override differs from the default.
GQA forward hook shapes, via a fake attention module with Q/K-norm wired so the bridge takes its Q/K-norm code path.
setup_component_testing: rotary wiring on the template and bridge-model attentions, plus the observable eager-forcing effect on the HF model and its per-layer self_attn configs.
Architecture guards (no norm weights, no biases).
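The LN-fold constraint can be illustrated with a toy, pure-Python sketch. This is not TransformerLens's fold_ln code, and the weights and helpers are made up for illustration. When a norm's output feeds a linear layer, as in pre-norm blocks, its learned weight g folds exactly into that layer's columns. When the norm instead runs on a sublayer's output, as Olmo2's post-norms do, no downstream linear consumes only the normalized vector. Pushing g back into the preceding linear then changes the result, because RMSNorm rescales whatever it receives.

```python
import math

def rms_norm(x, eps=1e-6):
    # Weightless RMSNorm: x / sqrt(mean(x^2) + eps)
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale for v in x]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def scale_columns(W, g):
    # W @ diag(g): column j of W multiplied by g[j]
    return [[w * gj for w, gj in zip(row, g)] for row in W]

def scale_rows(W, g):
    # diag(g) @ W: row i of W multiplied by g[i]
    return [[w * gi for w in row] for row, gi in zip(W, g)]

g = [2.0, 0.5, 3.0]  # toy learned norm weight
W = [[1.0, -1.0, 0.5],
     [0.0, 2.0, 1.0],
     [1.5, 0.0, -2.0]]  # toy linear layer
x = [0.3, -1.2, 2.0]

# Pre-norm: norm -> linear. g folds exactly into the linear's columns.
pre_unfolded = matvec(W, [gi * v for gi, v in zip(g, rms_norm(x))])
pre_folded = matvec(scale_columns(W, g), rms_norm(x))

# Post-norm: linear -> norm -> residual add. Moving g into the linear's rows
# changes the output, because RMSNorm rescales whatever it receives.
post_unfolded = [gi * v for gi, v in zip(g, rms_norm(matvec(W, x)))]
post_folded = rms_norm(matvec(scale_rows(W, g), x))

print(all(math.isclose(a, b) for a, b in zip(pre_unfolded, pre_folded)))    # True
print(all(math.isclose(a, b) for a, b in zip(post_unfolded, post_folded)))  # False
```

The pre-norm fold is exact because W(g ⊙ n) = (W diag(g)) n. The post-norm case fails because RMS(g ⊙ z) is not g ⊙ RMS(z) unless g is constant.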
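Asserting the exact key set, rather than a subset, is what makes the weight-conversion test anti-drift: it fails when a conversion is silently added or dropped. A minimal sketch of the pattern, using a hypothetical key_set_diff helper and hypothetical key names that are not the adapter's real keys:

```python
def key_set_diff(actual, expected):
    """Return (missing, unexpected) keys; both empty means an exact match."""
    actual, expected = set(actual), set(expected)
    return sorted(expected - actual), sorted(actual - expected)

# Hypothetical key names, for illustration only.
EXPECTED = {
    "blocks.{i}.attn.q",
    "blocks.{i}.attn.k",
    "blocks.{i}.attn.v",
    "blocks.{i}.attn.o",
}

# Simulated drift: someone adds an extra conversion.
drifted = EXPECTED | {"blocks.{i}.attn.qkv"}

subset_check_passes = EXPECTED <= drifted           # a subset check misses the drift
missing, unexpected = key_set_diff(drifted, EXPECTED)
print(subset_check_passes, missing, unexpected)
```

The subset check still passes, while the exact comparison reports the extra key in unexpected.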
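The hook_resid_mid override can be motivated with a toy post-norm block. It is written in pure Python on a single vector, with weightless RMSNorm and stand-in attention and MLP functions, all of them hypothetical. The block follows the HF Olmo2 decoder-layer ordering, where each norm runs on a sublayer's output before the residual add. As a result, the MLP reads the residual stream directly, and ln2's input is the MLP output rather than the residual.

```python
import math

def rms_norm(x, eps=1e-6):
    # Weightless RMSNorm (learned weights omitted in this sketch)
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale for v in x]

def toy_olmo2_block(x, attn, mlp, seen):
    """Post-norm block on one vector, recording what each hook point would see."""
    seen["attn.hook_in"] = x
    attn_out = attn(x)
    seen["ln1.hook_in"] = attn_out     # ln1 ~ post_attention_layernorm: input is attention output
    resid_mid = [r + a for r, a in zip(x, rms_norm(attn_out))]
    seen["mlp.hook_in"] = resid_mid    # MLP reads the residual directly (no pre-norm)
    mlp_out = mlp(resid_mid)
    seen["ln2.hook_in"] = mlp_out      # ln2 ~ post_feedforward_layernorm: input is MLP output
    resid_post = [r + m for r, m in zip(resid_mid, rms_norm(mlp_out))]
    return resid_mid, resid_post

seen = {}
resid_mid, resid_post = toy_olmo2_block(
    [0.5, -1.0, 2.0],
    attn=lambda v: [2.0 * t for t in v],     # toy stand-in for self-attention
    mlp=lambda v: [t * t + 1.0 for t in v],  # toy stand-in for the MLP
    seen=seen,
)
```

In this toy, mlp.hook_in carries resid_mid and ln2.hook_in does not. A pre-norm default alias of ln2.hook_in would therefore expose the MLP output under the name hook_resid_mid.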
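The GQA shape expectations reduce to plain arithmetic. This sketch assumes the per-head hook layout HookedTransformer uses, (batch, pos, head_index, d_head). TinyGQAConfig and the helpers are hypothetical. The Q/K-norm widths assume, as in the HF Olmo2 attention, that q_norm and k_norm run over the full projection before the head split.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TinyGQAConfig:
    # Hypothetical toy sizes, not the values used in the PR's tests.
    d_model: int = 16
    n_heads: int = 4
    n_key_value_heads: int = 2

    @property
    def d_head(self) -> int:
        return self.d_model // self.n_heads

    @property
    def group_size(self) -> int:
        # Number of query heads sharing one K/V head.
        if self.n_heads % self.n_key_value_heads != 0:
            raise ValueError("n_heads must be a multiple of n_key_value_heads")
        return self.n_heads // self.n_key_value_heads

def expected_hook_shapes(cfg, batch, seq):
    return {
        "hook_q": (batch, seq, cfg.n_heads, cfg.d_head),
        "hook_k": (batch, seq, cfg.n_key_value_heads, cfg.d_head),
        "hook_v": (batch, seq, cfg.n_key_value_heads, cfg.d_head),
        # Assumed Q/K-norm widths (norm applied before the head split).
        "q_norm_width": cfg.n_heads * cfg.d_head,
        "k_norm_width": cfg.n_key_value_heads * cfg.d_head,
    }

def kv_head_for_query_head(cfg, q_head):
    # Standard GQA grouping: consecutive query heads share a K/V head.
    return q_head // cfg.group_size

cfg = TinyGQAConfig()
shapes = expected_hook_shapes(cfg, batch=2, seq=3)
```

With n_key_value_heads smaller than n_heads, the K and V hooks have fewer heads than Q. A shape test that only checked (batch, seq, d_model) would not catch a mix-up between the two.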
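The eager-forcing check can be sketched with SimpleNamespace stand-ins. The sketch assumes the effect is visible as HF's _attn_implementation attribute being "eager" on both the top-level config and each layer's self_attn.config. make_fake_hf_model, force_eager_attention, and attn_implementations are hypothetical helpers, not the adapter's code. Checking both levels matters when per-layer configs are separate objects, since changing only the top-level config would not reach them.

```python
from types import SimpleNamespace

def make_fake_hf_model(n_layers, attn_impl="sdpa"):
    # Hypothetical stand-in exposing only the attributes this sketch touches.
    # Each layer gets its own config object, so top-level changes do not propagate.
    layers = [
        SimpleNamespace(
            self_attn=SimpleNamespace(config=SimpleNamespace(_attn_implementation=attn_impl))
        )
        for _ in range(n_layers)
    ]
    return SimpleNamespace(
        config=SimpleNamespace(_attn_implementation=attn_impl),
        model=SimpleNamespace(layers=layers),
    )

def force_eager_attention(hf_model):
    # Hypothetical helper mirroring the effect the PR's test observes.
    hf_model.config._attn_implementation = "eager"
    for layer in hf_model.model.layers:
        layer.self_attn.config._attn_implementation = "eager"

def attn_implementations(hf_model):
    # Top-level setting first, then every per-layer setting.
    return [hf_model.config._attn_implementation] + [
        layer.self_attn.config._attn_implementation for layer in hf_model.model.layers
    ]

fake_model = make_fake_hf_model(n_layers=3)
before = attn_implementations(fake_model)
force_eager_attention(fake_model)
after = attn_implementations(fake_model)
```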

Contributes to #1302 (Olmo2 checkbox).

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility


RecreationalMath · 2026-06-14T05:27:05Z

Heads up: I omitted test_attn_output_shape from TestOlmo2GQAHookShapes here. The guide's What-to-test table lists generic (batch, seq, d_model) output-shape assertions under Skip for behavioral hook shapes ("shared bridge's contract, not yours"). The four sibling adapter test files I shipped earlier (mixtral, olmoe, gpt_oss, smollm3) still carry the equivalent assertion, so I'll send a follow-up cleanup PR removing it from those four.

RecreationalMath added 2 commits June 14, 2026 10:24

Add Olmo2 architecture adapter tests

339bafd

Drop test_attn_output_shape per the unit-test guide (shared bridge contract)

2b20b7b

RecreationalMath wants to merge 2 commits into TransformerLensOrg:dev from RecreationalMath:olmo2-adapter-test
