
perf: use fused gdn gating for qwen3.5 prefill.#1301

Closed
yingxudeng wants to merge 1 commit into jd-opensource:main from yingxudeng:perf/qwen3-gdn-prefill-fusion-2

Conversation

@yingxudeng (Collaborator)

No description provided.

@gemini-code-assist (Bot, Contributor) left a comment


Code Review

This pull request replaces manual PyTorch operations with a fused GDN gating kernel in the prefill path of the Qwen3 Gated Delta Net and adds a new test case for large batch sizes. Review feedback suggests optimizing tensor reshaping by using .reshape() instead of the .contiguous().view() pattern and removing redundant .squeeze(0) and .contiguous() calls on kernel outputs to improve performance in the prefill hot path.

Comment on lines +398 to +399
gdn_params.a = a.contiguous().view({-1, a.size(-1)});
gdn_params.b = b.contiguous().view({-1, b.size(-1)});

Severity: high

Prefer .reshape() over the .contiguous().view() pattern. reshape() is more idiomatic in PyTorch: it returns a view when the tensor is already contiguous and only copies when necessary, avoiding redundant work and extra allocations when the inputs a and b are already contiguous (see the sketch after the suggested change below).

Suggested change
- gdn_params.a = a.contiguous().view({-1, a.size(-1)});
- gdn_params.b = b.contiguous().view({-1, b.size(-1)});
+ gdn_params.a = a.reshape({-1, a.size(-1)});
+ gdn_params.b = b.reshape({-1, b.size(-1)});
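
To make the reshape()-vs-.contiguous().view() point concrete, here is a minimal LibTorch sketch (tensor names and shapes are illustrative, not taken from this PR): reshape() aliases the storage of an already-contiguous tensor and only copies when it has to.

#include <torch/torch.h>
#include <iostream>

int main() {
  // Contiguous input: reshape() returns a view over the same storage.
  auto a = torch::randn({4, 8, 16});
  auto flat = a.reshape({-1, a.size(-1)});
  std::cout << (flat.data_ptr() == a.data_ptr()) << "\n";  // 1: view, no copy

  // Non-contiguous input (e.g. after transpose): reshape() copies,
  // which is the same work .contiguous().view() would do anyway.
  auto t = a.transpose(0, 1);
  auto flat_t = t.reshape({-1, t.size(-1)});
  std::cout << (flat_t.data_ptr() == t.data_ptr()) << "\n";  // 0: copied
  return 0;
}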

Comment on lines +404 to +405
g = g.squeeze(0).contiguous().view({batch_size, seq_len, a.size(-1)});
beta = beta.squeeze(0).contiguous().view({batch_size, seq_len, b.size(-1)});

Severity: high

The .squeeze(0) and .contiguous() calls are redundant here. Tensors returned by custom kernels are typically contiguous, and view() can reshape the kernel's output directly (whether it is [total_tokens, hidden] or [1, total_tokens, hidden]) into the target [batch, seq, hidden] shape. Removing these calls avoids unnecessary work in the prefill hot path (see the sketch after the suggested change below).

Suggested change
- g = g.squeeze(0).contiguous().view({batch_size, seq_len, a.size(-1)});
- beta = beta.squeeze(0).contiguous().view({batch_size, seq_len, b.size(-1)});
+ g = g.view({batch_size, seq_len, a.size(-1)});
+ beta = beta.view({batch_size, seq_len, b.size(-1)});
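
As a sanity check on the claim above, here is a minimal LibTorch sketch (shapes and names are hypothetical, not from this PR) showing that a contiguous kernel output can be view()-ed straight to {batch, seq, hidden}, with or without a leading unit dimension:

#include <torch/torch.h>

int main() {
  const int64_t batch = 2, seq = 5, hidden = 16;

  // Two plausible contiguous kernel output shapes.
  auto out_2d = torch::randn({batch * seq, hidden});     // [total_tokens, hidden]
  auto out_3d = torch::randn({1, batch * seq, hidden});  // [1, total_tokens, hidden]

  // view() reshapes both directly; no squeeze(0) or contiguous() needed.
  auto g1 = out_2d.view({batch, seq, hidden});
  auto g2 = out_3d.view({batch, seq, hidden});

  TORCH_CHECK(g1.sizes() == g2.sizes());
  return 0;
}

Note that this only holds if the kernel output really is contiguous, which is what the comment assumes; if a kernel could return a non-contiguous tensor, reshape() would be the safer drop-in.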

@yingxudeng (Collaborator, Author)

yingxudeng commented Apr 16, 2026

Before the fused operator:
[image]
After the fused operator:
[image]

After switching to the fused operator, performance degraded slightly, so this PR will not be merged for now.

@yingxudeng (Collaborator, Author)

[image]
Modified the code so that the matching kernel for bs3520 is compiled and actually hit; performance improved but is still worse than the original. Therefore, not merging for now.

@yingxudeng force-pushed the perf/qwen3-gdn-prefill-fusion-2 branch from e7994c6 to 05a5e86 on April 16, 2026 18:01
@yingxudeng closed this on Apr 20, 2026
