
bugfix: fix the accuracy error of NPU xattention.#1303

Merged
LMX-xin merged 1 commit into jd-opensource:main from LMX-xin:feat/xllm_npu_xattention
Apr 20, 2026

Conversation

@LMX-xin (Collaborator) commented Apr 17, 2026

No description provided.

XuZhang99 previously approved these changes Apr 17, 2026

@gemini-code-assist (Bot, Contributor) left a comment


Code Review

This pull request updates the xllm_ops subproject and refactors the NPU beam search implementation in rec_worker_impl.cpp to manually initialize beam tensors during the first round. A logic error was identified where out_token_index and out_beam_count_prefix_sums are incorrectly zeroed, which breaks functionality for batch sizes greater than one. The reviewer provided a code block to correctly calculate base indices for proper KV cache selection and output attribution.
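The reviewer's code block is not reproduced on this page. As a rough illustration of the idea only, the sketch below assumes beams are stored in a flattened `[batch_size * beam_width]` layout and that each request starts with the same number of beams. The helper names `first_round_base_index` and `beam_count_prefix_sums` are hypothetical and do not come from `rec_worker_impl.cpp`; the actual tensor semantics in xllm may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): for every flattened beam slot,
// return the offset of the first slot that belongs to the same request.
// Assumes a [batch_size * beam_width] layout.
std::vector<int32_t> first_round_base_index(int32_t batch_size,
                                            int32_t beam_width) {
  assert(batch_size >= 0 && beam_width > 0);
  std::vector<int32_t> base(static_cast<std::size_t>(batch_size) * beam_width);
  for (int32_t b = 0; b < batch_size; ++b) {
    for (int32_t k = 0; k < beam_width; ++k) {
      // Every beam of request b points at that request's own block.
      base[static_cast<std::size_t>(b) * beam_width + k] = b * beam_width;
    }
  }
  return base;
}

// Hypothetical helper (not from the PR): exclusive prefix sums of the
// per-request beam counts, assuming each request holds beam_width beams.
// The result has batch_size + 1 entries; entry b is the start of request b.
std::vector<int32_t> beam_count_prefix_sums(int32_t batch_size,
                                            int32_t beam_width) {
  assert(batch_size >= 0 && beam_width > 0);
  std::vector<int32_t> sums(static_cast<std::size_t>(batch_size) + 1, 0);
  for (int32_t b = 0; b < batch_size; ++b) {
    sums[b + 1] = sums[b] + beam_width;
  }
  return sums;
}
```

Under these assumptions, `batch_size = 2` and `beam_width = 3` give base indices `{0, 0, 0, 3, 3, 3}` and prefix sums `{0, 3, 6}`. Filling both with zeros would make the second request's beams resolve to the first request's slots, which is consistent with the review's point that zeroing only works for a batch size of one.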

Comment thread xllm/core/runtime/rec_worker_impl.cpp
@LMX-xin force-pushed the feat/xllm_npu_xattention branch from 4e5a6de to 512ef71 on April 20, 2026 03:28
@LMX-xin merged commit 724c8ed into jd-opensource:main on Apr 20, 2026
15 of 29 checks passed
maojunx99 pushed a commit to maojunx99/xllm that referenced this pull request Apr 21, 2026

4 participants