models: fix Qwen3.5 dense/MoE load when MTP block is absent (trunk-only GGUF)#25024

Closed
rohithj7 wants to merge 3 commits into ggml-org:master from rohithj7:master

@rohithj7 rohithj7 commented Jun 26, 2026

Overview

Fixes loading of Qwen3.5 dense (Qwen3_5ForCausalLM) and MoE (Qwen3_5MoeForCausalLM) GGUFs that fail at load time with:

llama_model_load: error loading model: missing tensor 'blk.<N>.attn_norm.weight'

where <N> == num_hidden_layers (the first index past the trunk).

The converter writes block_count = num_hidden_layers + mtp_num_hidden_layers and a nextn_predict_layers key whenever config.json declares mtp_num_hidden_layers, even when the checkpoint contains no mtp.* weights. The runtime then derives n_layer_all = block_count and unconditionally constructs the trailing MTP/NextN block, marking blk.<N>.attn_norm.weight (and the other MTP tensors) as required. For a trunk-only GGUF this block is never present, so load aborts.
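The layer arithmetic behind the error can be sketched as follows. This is an illustration only: `hf_config`, `block_count()` and `first_mtp_tensor()` are hypothetical names, not converter or runtime code. The values used in the usage note (32 trunk layers plus 1 declared MTP layer) are an assumption chosen to match the Qwen3.5-4B report, where the load expects 33 blocks and the model has 32.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the two config.json fields discussed above.
struct hf_config {
    uint32_t num_hidden_layers;      // trunk layers
    uint32_t mtp_num_hidden_layers;  // declared MTP layers (weights may be missing)
};

// block_count as the converter is described to write it:
// trunk layers plus declared MTP layers.
inline uint32_t block_count(const hf_config & cfg) {
    return cfg.num_hidden_layers + cfg.mtp_num_hidden_layers;
}

// Name of the first tensor the loader looks for past the trunk,
// i.e. the one reported as missing for a trunk-only GGUF.
inline std::string first_mtp_tensor(const hf_config & cfg) {
    assert(cfg.mtp_num_hidden_layers > 0 && "no MTP block declared");
    return "blk." + std::to_string(cfg.num_hidden_layers) + ".attn_norm.weight";
}
```

With `hf_config{32, 1}`, `block_count()` gives 33 and `first_mtp_tensor()` gives `blk.32.attn_norm.weight`, which is the tensor named in the Qwen3.5-4B failure.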

src/models/step35.cpp already handles this: it probes for the defining MTP tensor and, when absent, marks the MTP block tensors TENSOR_NOT_REQUIRED ("trunk-only"). This PR ports that same trunk_only handling to src/models/qwen35.cpp and src/models/qwen35moe.cpp, which previously hardcoded the MTP block tensors as required.
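The probe-then-relax pattern described above can be sketched in isolation. This is not the code from step35.cpp or from this PR: the flag values, the `tensor_names` set and all function names are hypothetical, and `attn_norm.weight` is used as the probe tensor only because it is the one named in the error message (the actual defining tensor may differ).

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical flag values; the real loader defines its own constants.
enum tensor_flags : uint32_t {
    TENSOR_REQUIRED     = 0,
    TENSOR_NOT_REQUIRED = 1,
};

// Stand-in for the GGUF tensor table: the set of tensor names in the file.
using tensor_names = std::set<std::string>;

// Name of the tensor used to probe for the MTP block (assumed, see lead-in).
inline std::string mtp_probe_name(uint32_t n_trunk, uint32_t n_layer_all) {
    assert(n_layer_all > n_trunk && "metadata declares no MTP block to probe");
    return "blk." + std::to_string(n_trunk) + ".attn_norm.weight";
}

// A file is "trunk-only" when the probe tensor is absent.
inline bool is_trunk_only(const tensor_names & names, uint32_t n_trunk, uint32_t n_layer_all) {
    return names.count(mtp_probe_name(n_trunk, n_layer_all)) == 0;
}

// Flags applied to every tensor of the MTP block.
inline uint32_t mtp_tensor_flags(bool trunk_only) {
    return trunk_only ? TENSOR_NOT_REQUIRED : TENSOR_REQUIRED;
}

// True if creating `name` under `flags` would not abort the load.
inline bool can_load(const tensor_names & names, const std::string & name, uint32_t flags) {
    return names.count(name) > 0 || (flags & TENSOR_NOT_REQUIRED) != 0;
}
```

In this sketch a file holding only `blk.0` through `blk.31` is detected as trunk-only, so its MTP tensors become optional, while a file that does contain `blk.32.attn_norm.weight` keeps them required.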

After the change:

  • Trunk-only GGUFs load and run normal inference (the MTP block is never executed in the main graph; n_layer() excludes nextn layers).
  • GGUFs that actually bundle the MTP block are unchanged - the tensors are still required and the speculative (graph_mtp) path keeps working.

Closes #24737.
Closes #24211.

Additional information

The same failure family is reported in #24737 (Qwen3.5-4B, blk.32), #24211 (Nex N2 Pro / Qwen3.5 397B MoE, blk.60), and the Qwen3.5-122B MoE GGUF discussion (blk.48). The MTP-in-GGUF mapping and runtime were added in #20533 / #22673; the step35 trunk-only fix landed in #24340 but was not ported to the qwen35 loaders.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES - I used an AI assistant to help me understand the issue and identify what needed to change, and to get a more thorough understanding of the relevant code. It helped me realize that step35 already had this change so I had to replicate that for qwen3.5. I made the changes myself. Further, I used AI to write this PR description.

@rohithj7 rohithj7 requested a review from CISC as a code owner June 26, 2026 00:03
@github-actions github-actions Bot added the model Model specific label Jun 26, 2026
CISC commented Jun 26, 2026

Member

> The converter writes block_count = num_hidden_layers + mtp_num_hidden_layers and a nextn_predict_layers key whenever config.json declares mtp_num_hidden_layers, even when the checkpoint contains no mtp.* weights. The runtime then derives n_layer_all = block_count and unconditionally constructs the trailing MTP/NextN block, marking blk.<N>.attn_norm.weight (and the other MTP tensors) as required. For a trunk-only GGUF this block is never present, so load aborts.

First of all, a model's config.json should not declare MTP layers if it does not have any; this is a model bug. Failing to load such a GGUF is perfectly valid (it can be fixed by editing the config or using --no-mtp at conversion, or alternatively by updating the GGUF with gguf-set-metadata).

Secondly, allowing this probably leads to other subtle issues, as hparams.n_layer_all is now incorrect. In fact, the correct fix is to remove this handling from step35.
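The metadata-side view in this comment can be illustrated with a small consistency check. This is a sketch under stated assumptions, not llama.cpp code: `layer_metadata`, `mtp_blocks_present()` and `trunk_only_metadata()` are hypothetical, `attn_norm.weight` stands in for the MTP block's tensors, and the "corrected" values (block_count reduced to the trunk count, nextn_predict_layers set to 0) are an inference from how the converter is described to compute block_count, not something stated in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical view of the two GGUF metadata values discussed in the thread.
struct layer_metadata {
    uint32_t block_count;           // trunk + declared MTP blocks
    uint32_t nextn_predict_layers;  // declared MTP blocks
};

// True when every declared MTP block has its attn_norm tensor in the file.
// With zero declared MTP blocks the loop is empty and the result is true.
inline bool mtp_blocks_present(const layer_metadata & md, const std::set<std::string> & names) {
    assert(md.nextn_predict_layers <= md.block_count);
    const uint32_t n_trunk = md.block_count - md.nextn_predict_layers;
    for (uint32_t il = n_trunk; il < md.block_count; ++il) {
        if (names.count("blk." + std::to_string(il) + ".attn_norm.weight") == 0) {
            return false;
        }
    }
    return true;
}

// Metadata that would describe the same file as a plain trunk-only model
// (assumed target values, see lead-in).
inline layer_metadata trunk_only_metadata(const layer_metadata & md) {
    assert(md.nextn_predict_layers <= md.block_count);
    return { md.block_count - md.nextn_predict_layers, 0 };
}
```

The design point is that the file's metadata and tensor table agree again after the correction, so the layer count derived from block_count matches what is actually in the file and no tensor has to be made optional.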

@CISC CISC closed this Jun 26, 2026
Development

Successfully merging this pull request may close these issues.

  • Eval bug: Qwen3.5-4B: GGUF conversion/load expects 33 blocks, model only has 32
  • Eval bug: Missing layer error when running a quant of Nex N2 Pro