Feat/gemma4 adapters by huseyincavusbi · Pull Request #1385 · TransformerLensOrg/TransformerLens

huseyincavusbi · 2026-06-13T12:07:24Z

Description

This PR adds TransformerBridge support for the Gemma 4 model family (E2B, E4B, 26B-A4B, and 31B) through a single unified Gemma4ArchitectureAdapter.

Key Implementation Details

Unified Adapter (gemma4.py): Dynamically handles all 4 variants by evaluating initialization configuration flags:
- MoE Blocks: MoE submodules are instantiated only when enable_moe_block=True (the 26B-A4B variant).
- KV-Sharing: The K/V submodules (k_proj, v_proj, k_norm, v_norm) are optional and are omitted on KV-sharing layers when num_kv_shared_layers > 0 (E2B/E4B).
- PLE Embeddings: Exposed only when hidden_size_per_layer_input > 0.
- Weight Processing: Maps and converts Gemma 4's joint QKV layout, dual RoPE configurations, alternating sliding/full attention mechanisms, logit softcapping, and RMSNorm.
- Includes 45 dedicated unit tests verifying config attributes, MoE behavior, and weight conversions.
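The flag-driven gating in the list above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `Gemma4Flags` and `optional_components` are hypothetical names, only the three config fields come from the PR description, and the numeric values are invented.

```python
from dataclasses import dataclass


@dataclass
class Gemma4Flags:
    """Hypothetical stand-in for the config fields named in the PR description."""
    enable_moe_block: bool = False
    num_kv_shared_layers: int = 0
    hidden_size_per_layer_input: int = 0


def optional_components(cfg: Gemma4Flags) -> dict:
    """Report which optional parts a given config would switch on."""
    return {
        "moe_block": cfg.enable_moe_block,
        "kv_sharing": cfg.num_kv_shared_layers > 0,
        "per_layer_embeddings": cfg.hidden_size_per_layer_input > 0,
    }


# Illustrative values only; these are not real Gemma 4 hyperparameters.
dense = Gemma4Flags()
moe = Gemma4Flags(enable_moe_block=True)
small = Gemma4Flags(num_kv_shared_layers=4, hidden_size_per_layer_input=256)
```

Keeping the decision in one function of the config is one way to let a single adapter class cover every variant without per-model subclasses.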
Shared-Library Updates (3 files, fully opt-in, zero regressions on existing adapter tests):
1. position_embeddings_attention.py: Applies V norm after the reshape (Gemma 4 is the first architecture featuring per-head value normalization). Delegates to Hugging Face's original attention implementation on KV-sharing layers, where the K/V submodules are omitted. Caches the computed KV states in shared_kv_states post-RoPE so that KV-sharing layers can reuse them.
2. bridge.py: Introduces a use_native_generate opt-in flag. It works around a current issue in the Hugging Face transformers dev version where eager attention causes a KV-cache dimension mismatch during generation. Setting this flag (scoped strictly to this adapter) delegates generation to HF's native generate(), which uses SDPA.
3. main_benchmark.py: Fixes pad_token_id assignment when eos_token_id is a list (Gemma 4 uses [1, 106]) by taking the first element.
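The pad_token_id fix in item 3 amounts to something like the sketch below. `resolve_pad_token_id` is a hypothetical helper name, not the function in main_benchmark.py, and returning None for an empty list is an assumption made here for safety.

```python
from typing import List, Optional, Union


def resolve_pad_token_id(eos_token_id: Union[int, List[int], None]) -> Optional[int]:
    """Pick a single pad token id from an eos_token_id that may be an int or a list."""
    if isinstance(eos_token_id, (list, tuple)):
        # Take the first element, as the PR describes.
        # Assumption: an empty list yields None rather than raising.
        return eos_token_id[0] if eos_token_id else None
    return eos_token_id


pad_id = resolve_pad_token_id([1, 106])  # the list form the PR reports for Gemma 4
```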

Verification & Performance

All models have been validated.

Fixes #1297

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

…ensOrg#1316)
* Add Direct Logit Attribution tool for TransformerBridge
* Resolve review feedback and add Direct Logit Attribution tests
Resolved review feedback from @jlarson4, added tests covering reconstruction invariants on a distilgpt2 bridge in compatibility mode, arguments, asserting sum(scores) == logit_diff - (b_U[correct] - b_U[wrong]) against the model's real logits, plus labels/shape and batch-averaging checks.
Added additional hardening:
- Fix a latent direction-shape bug: replace the fragile answer_tokens.numel()==1 branch with a robust reshape so single-prompt, single-token inputs are handled correctly
- Detect hybrid blocks via bridge.layer_types() instead of substring matching named_modules(), the codebase's own semantic mechanism
- Import get_act_name from transformer_lens.utilities to avoid the transformer_lens.utils DeprecationWarning; drop the invalid return_type kwarg to run_with_cache
- Register the analysis subpackage in tools/__init__.py
Closes TransformerLensOrg#1263.

…merLensOrg#1369)
* Add Direct Logit Attribution tool (TransformerLensOrg#1263)
Add transformer_lens/tools/analysis/direct_logit_attribution.py, a single-call DLA analysis that decomposes a logit (or logit difference) into per-component, per-layer (logit-lens), or per-head contributions. Wraps the existing ActivationCache primitives (decompose_resid / accumulated_resid / stack_head_results / logit_attrs) and works with both HookedTransformer and TransformerBridge, since they share the cache API. Returns a DirectLogitAttribution dataclass (attribution tensor + aligned labels, plus a top(k) helper). Adds integration tests asserting the exact DLA correctness invariant on both systems: the complete decomposition reconstructs the model's real logit up to the unembedding bias b_U.
Closes TransformerLensOrg#1263
* Resolving conflicts between 1316 and 1369
* format fixes
---------
Co-authored-by: Azra Bano <azrabano23@gmail.com>
Co-authored-by: Jonah Larson <jonahalarson@comcast.net>
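The reconstruction invariant these tests assert can be checked by hand on a toy model. The sketch below is plain Python and does not use the TransformerLens API. It assumes the final residual stream is exactly the sum of the component outputs and that no final LayerNorm is applied, which a real model would need handled separately. All names and numbers are made up for illustration.

```python
def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def column(matrix, j):
    """Column j of a row-major nested-list matrix."""
    return [row[j] for row in matrix]


# Toy model: d_model = 2, d_vocab = 3, three residual-stream components.
components = {
    "embed": [1.0, 0.0],
    "attn_0": [0.5, -1.0],
    "mlp_0": [-0.25, 2.0],
}
W_U = [  # shape (d_model, d_vocab)
    [1.0, 0.0, 2.0],
    [0.0, 1.0, -1.0],
]
b_U = [0.5, -0.5, 0.0]
correct, wrong = 0, 1

# Full forward pass: sum the components, then unembed (no LayerNorm, by assumption).
resid = [sum(vec[i] for vec in components.values()) for i in range(2)]
logits = [dot(resid, column(W_U, v)) + b_U[v] for v in range(3)]
logit_diff = logits[correct] - logits[wrong]

# Direct logit attribution: project each component onto the logit-difference direction.
direction = [c - w for c, w in zip(column(W_U, correct), column(W_U, wrong))]
scores = {name: dot(vec, direction) for name, vec in components.items()}
bias_diff = b_U[correct] - b_U[wrong]

# The invariant from the commit message: sum(scores) == logit_diff - bias_diff.
residual_error = sum(scores.values()) - (logit_diff - bias_diff)
```

Because the unembedding is linear, the per-component scores sum to the logit difference minus the bias term, so `residual_error` is zero here.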

The Restricted Loss section called loss_fn(all_logits, labels), but all_logits had been rearranged earlier into a (p, p, d_vocab) grid for the logit periodicity analysis. loss_fn's 3-D branch assumes (batch, pos, d_vocab) and takes logits[:, -1], producing a (p, p) tensor that crashes the gather against the p*p labels (TransformerLensOrg#543). Use original_logits instead, which is recomputed just above and is the same full-dataset loss the cell intends to print. Also clear the stored RuntimeError output from the cell.

Breaking: removes the public eps_attr constructor argument and the config.eps_attr attribute. The field was never read (its consumer was deleted when NormalizationBridge moved to direct HF delegation), so no model behavior changes, but it is an API removal.

…utes
- Unwrap text_config for Gemma4ForConditionalGeneration models
- Read PLE, KV sharing, layer_types, softcapping from text_cfg
- Add NotImplementedError guard for MoE variants (26B-A4B)
- Update tests to exercise text_config path
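Unwrapping text_config, as this commit describes, commonly looks like the sketch below. `unwrap_text_config` is a hypothetical helper name, and the `SimpleNamespace` objects stand in for Hugging Face config objects. The field values are illustrative only.

```python
from types import SimpleNamespace


def unwrap_text_config(hf_config):
    """Return the nested text config when one exists, otherwise the config itself."""
    text_cfg = getattr(hf_config, "text_config", None)
    return text_cfg if text_cfg is not None else hf_config


# Stand-in for a multimodal config that nests its language-model settings.
multimodal_cfg = SimpleNamespace(
    architectures=["Gemma4ForConditionalGeneration"],
    text_config=SimpleNamespace(num_kv_shared_layers=4, hidden_size_per_layer_input=256),
)
# Stand-in for a text-only config that keeps the same fields at the top level.
text_only_cfg = SimpleNamespace(num_kv_shared_layers=0, hidden_size_per_layer_input=0)

text_cfg = unwrap_text_config(multimodal_cfg)
```

Reading every attribute through the unwrapped object means the same adapter code path can serve both the text-only and the multimodal checkpoints.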

danra and others added 30 commits June 8, 2026 09:14

Merge pull request TransformerLensOrg#1370 from danra/patch-1

9deb6bf

Fix broken link in README

Merge remote-tracking branch 'origin/main' into dev

75095b1

Add stop_strings and stopping_criteria support to TransformerBridge.g…

a5f1193

…enerate (TransformerLensOrg#1374)

Remove extra checks from Phi adapter setup_component_testing (Transfo…

d37642d

…rmerLensOrg#1373)

Add phi tests (TransformerLensOrg#1372)

35ab438

* Add Phi adapter tests
* Add comment about setup component test
* Delete redundant config literal tests

Fixed SVD interpreter test (TransformerLensOrg#1375)

34dc38a

* Fixed SVD interpreter test
* Format SVD interpreter fixture test

feat: Initial Gemma4 architecture adapter (V norm, softcap, PLE/KV co…

c19a062

…nfig)

feat: Register Gemma4ArchitectureAdapter in factory and __init__

5d5564d

feat: Add final_rms and eps_attr to Gemma4 adapter config

b1c2a3d

fix: Use setattr for custom config fields to pass mypy

39565bc

fix: Register Gemma4ForConditionalGeneration alias

eaf190c

fix: Dynamic text prefix for text-only vs multimodal Gemma4 variants

cadfe52

fix: Add Gemma4 to model_registry and add unit tests

79d8de4

Remove dead v_norm weight conversion (with_scale=False has no learnab…

eb9f214

…le weight)

Add full Gemma4 MoE support with optional submodules for 26B-A4B

7fb469e

Make k_proj, v_proj, k_norm, v_norm optional for KV-sharing layers

decefd8

fix: AutoModel returns Gemma4Model directly, correct text_prefix

2700965

fix: revert text_prefix — AutoModelForCausalLM needs model. prefix

74b6168

fix: check cfg.architecture instead of cfg.architectures for prefix d…

7033b91

…etection

fix: delegate to original attention on KV-sharing layers

b54e2cd

fix: store computed KV in shared_kv_states for Gemma4 KV-sharing

d5ce541

fix: add Gemma4ForConditionalGeneration to MULTIMODAL_ARCHITECTURES

ee60b1c

fix: add use_native_generate opt-in flag for hf_generate delegation

6587691

feat: use_native_generate and prepare_loading for Gemma4 adapter

3237784

fix: handle list eos_token_id when setting pad_token_id

6a21267

huseyincavusbi added 4 commits June 13, 2026 14:30

fix: apply V norm in post-reshape attention phase for Gemma4

0732f27

fix: restore Gemma3nForConditionalGeneration in MULTIMODAL_ARCHITECTURES

a8d2c4e

fix: remove dead eps_attr, resolve conflict marker, fix mypy

a30f390

feat: add multimodal vision support to Gemma4 adapter

7eed605

huseyincavusbi marked this pull request as draft June 14, 2026 10:49

Feat/gemma4 adapters #1385
huseyincavusbi wants to merge 34 commits into TransformerLensOrg:main from huseyincavusbi:feat/gemma4-adapters

9 participants
