Gemma4 31b rope fix and ci #19627

Open
Gasoonjia wants to merge 3 commits into gemma4-chat-template from gemma4-31b-rope-fix-and-ci

Conversation

Gasoonjia (Contributor) commented May 18, 2026

Summary

Currently, materialize_runtime_buffers in model.py zeroes out ALL meta buffers, including each layer's inv_freq (RoPE frequencies). The follow-up attn.inv_freq.to(device) does not restore them: it only moves the already-zeroed tensors, so it is effectively a no-op. As a result, RoPE produces cos=1, sin=0 for every position, so the model has NO positional information, which introduces the period-N echo cycle pattern.
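
To make the failure mode concrete, here is a minimal pure-Python sketch, assuming the standard RoPE formulation in which the rotation angle is position * inv_freq. The helper name rope_cos_sin and the values head_dim=8 and theta=10000.0 are illustrative only and are not taken from model.py.

```python
import math

def rope_cos_sin(position, inv_freq):
    """Return (cos, sin) lists for one position under standard RoPE."""
    angles = [position * f for f in inv_freq]
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]

head_dim, theta = 8, 10000.0  # illustrative values, not the model's
good = [1.0 / theta ** (i / head_dim) for i in range(0, head_dim, 2)]
zeroed = [0.0] * len(good)  # what a zero-filled inv_freq buffer looks like

for pos in (0, 1, 7, 100):
    cos, sin = rope_cos_sin(pos, zeroed)
    # Every position collapses to the identity rotation.
    assert cos == [1.0] * 4 and sin == [0.0] * 4

# With real frequencies, different positions get different rotations.
assert rope_cos_sin(1, good) != rope_cos_sin(7, good)
```

With zeroed frequencies every angle is 0, so queries and keys are left unrotated regardless of position, which matches the cos=1, sin=0 symptom described above.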

This PR fixes the issue by recomputing inv_freq per layer with real values (using the layer's head_dim, partial_rotary, rope_theta, and is_sliding flag) in materialize_runtime_buffers.
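
A rough sketch of what a per-layer recomputation could look like, in pure Python rather than torch so it runs anywhere. This is not the code from model.py: LayerRopeConfig and compute_inv_freq are hypothetical names, partial_rotary is assumed to be the fraction of head_dim that gets rotated, and the example values are made up. How is_sliding enters the real formula is not shown in this PR, so the sketch treats rope_theta as already resolved per layer and leaves the flag out.

```python
from dataclasses import dataclass

@dataclass
class LayerRopeConfig:  # hypothetical container, not from model.py
    head_dim: int
    partial_rotary: float  # assumed: fraction of head_dim that is rotated
    rope_theta: float      # assumed: already resolved for this layer

def compute_inv_freq(cfg):
    """Standard RoPE inverse frequencies over the rotated dimensions."""
    rotary_dim = int(cfg.head_dim * cfg.partial_rotary)
    return [
        1.0 / cfg.rope_theta ** (i / rotary_dim)
        for i in range(0, rotary_dim, 2)
    ]

# Illustrative per-layer settings (not the real Gemma4 31B values).
layers = [
    LayerRopeConfig(head_dim=256, partial_rotary=1.0, rope_theta=10000.0),
    LayerRopeConfig(head_dim=512, partial_rotary=0.25, rope_theta=1000000.0),
]

for cfg in layers:
    inv_freq = compute_inv_freq(cfg)
    assert inv_freq[0] == 1.0             # lowest index is always 1 / theta**0
    assert all(f > 0.0 for f in inv_freq)  # never the all-zero buffer
```

The point is that each layer's buffer is rebuilt from its own parameters instead of being zero-filled and then moved to the device.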

Test plan

Add an e2e CI job for the gemma4-31b model and check its output.

pytorch-bot Bot commented May 18, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19627

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

❌ 1 New Failure, 3 Unclassified Failures

As of commit 86ba97b with merge base 54f1f28 (image):

NEW FAILURE - The following job has failed:

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the CLA Signed label May 18, 2026
Gasoonjia and others added 2 commits May 17, 2026 23:55
…mma4_31b CI

- model.py: strip explanatory comment from materialize_runtime_buffers RoPE
  inv_freq block (keep hand-rolled formula as-is).
- inference.py: revert all hf_validator + quant_compile_validator additions
  (--use-hf-api / --compare / --compare-quant / --prompts-file flags and
  their helpers); keep --bf16 HF checkpoint load path and existing
  prequantized / gguf flows.
- .github/workflows/cuda.yml: add SocialLocalMobile/gemma-4-31B-it-HQQ-INT4
  matrix entry (prequant tile-packed only) to export-model-cuda-artifact
  and test-model-cuda-e2e; pin to linux.aws.a100 like qwen3_5_moe.
- .ci/scripts/export_model_artifact.sh: add gemma4_31b export branch
  mirroring qwen3_5_moe pattern.
- .ci/scripts/test_model_e2e.sh: add gemma4_31b runner args + tokenizer
  handling.
Gasoonjia force-pushed the gemma4-31b-rope-fix-and-ci branch from d0214b5 to 7cffb3d May 18, 2026 06:57
Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
