
feat: add onerec 3b performance optimization and support old model. #1286

Merged: DragonFive merged 4 commits into jd-opensource:main from DragonFive:feat/onerec-rec-upstreamize-range2 on Apr 20, 2026

feat: add onerec 3b performance optimization and support old model.#1286
DragonFive merged 4 commits intojd-opensource:mainfrom
DragonFive:feat/onerec-rec-upstreamize-range2

Conversation

@DragonFive (Collaborator)

Summary

  • align OneRec 3B NPU runtime with xllm_rec
  • keep legacy and 3B scaling paths gated in the NPU runtime
  • update xllm_atb_layers submodule to merged master commit 96d3deb2
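
A minimal sketch of what "gated" means here, with illustrative names (OneRecVariant, BlockLayerArgs, and run_block_layer are assumptions for this sketch, not the runtime's actual identifiers):

```cpp
// Hypothetical sketch of gating the legacy vs. 3B scaling paths inside
// one NPU runtime. Names are illustrative, not the real API.
enum class OneRecVariant { kLegacy, kOneRec3B };

struct BlockLayerArgs {
  OneRecVariant variant = OneRecVariant::kLegacy;
};

void run_block_layer(const BlockLayerArgs& args) {
  switch (args.variant) {
    case OneRecVariant::kOneRec3B:
      // 3B path: e.g. ACLNN-based attention linear, fused expert weights.
      break;
    case OneRecVariant::kLegacy:
      // Legacy path: keep old checkpoints and kernels working unchanged.
      break;
  }
}
```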

Notes

  • the branch intentionally keeps two commits: one for the 3B runtime alignment, one for the legacy/3B gating and submodule linkage
  • the xllm_atb_layers changes are already merged into git_code master

@DragonFive force-pushed the feat/onerec-rec-upstreamize-range2 branch from dcbb93e to b11c793 on April 15, 2026 at 09:01
@gemini-code-assist (Bot) left a comment


Code Review

This pull request enhances the NPU OneRec block layer by introducing ACLNN-based attention linear support, a decoder prefill-only execution mode, and improved handling for fused FFN and MoE expert weights. It also adds a compatibility path for specific checkpoint prefixes and updates the MoE attention mask logic. Review feedback highlights a critical memory safety issue where a local tensor is referenced via a dangling pointer in the variant pack. Furthermore, multiple style guide violations were noted, specifically regarding the use of auto for simple types, missing vector reserve() calls, and the need for parameter annotations on constant arguments.
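
The critical finding is the classic pattern sketched below: a tensor created on the stack is registered in the variant pack by pointer and then destroyed when the function returns, leaving the pack with a dangling reference. This is a minimal sketch under assumed names (Tensor and VariantPack here are illustrative stand-ins, not the actual ATB types):

```cpp
#include <vector>

// Illustrative stand-ins for the real ATB types.
struct Tensor { /* device buffer handle, dims, ... */ };
struct VariantPack { std::vector<Tensor*> in_tensors; };

// BUG: `mask` dies when build_pack_buggy returns, so the pointer
// stored in the pack dangles by the time the op executes.
void build_pack_buggy(VariantPack& pack) {
  Tensor mask;                       // local, stack-allocated
  pack.in_tensors.push_back(&mask);  // dangling after return
}

// FIX: give the tensor a lifetime at least as long as the pack's,
// e.g. by storing it on the owning layer object.
struct Layer {
  Tensor mask_;  // outlives every execute() call that uses the pack
  void build_pack_fixed(VariantPack& pack) {
    pack.in_tensors.push_back(&mask_);
  }
};
```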

Review comment threads:
  • xllm/core/layers/npu/npu_onerec_block_layer_impl.cpp (outdated)
  • xllm/core/layers/npu/npu_onerec_block_layer_impl.cpp
  • xllm/core/layers/npu/npu_onerec_block_layer_impl.cpp (outdated)
  • xllm/core/layers/npu/npu_onerec_block_layer_impl.cpp (outdated)
  • xllm/models/rec/npu/onerec_npu_impl.h (outdated)
  • xllm/core/layers/npu/npu_onerec_block_layer_impl.cpp (outdated)
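
One reviewed change is the compatibility path for older checkpoint prefixes. A common way to build that kind of shim is a prefix remap during weight loading; the sketch below is illustrative only (the prefix strings and the remap_weight_name helper are assumptions, not the PR's actual code):

```cpp
#include <string>
#include <string_view>

// Hypothetical remap: accept weights saved under a legacy prefix by
// rewriting names into the layout the current runtime expects.
std::string remap_weight_name(std::string_view name) {
  constexpr std::string_view kLegacyPrefix = "decoder.block.";  // assumed
  constexpr std::string_view kCurrentPrefix = "model.layers.";  // assumed
  if (name.substr(0, kLegacyPrefix.size()) == kLegacyPrefix) {
    return std::string(kCurrentPrefix) +
           std::string(name.substr(kLegacyPrefix.size()));
  }
  return std::string(name);  // already in the current layout
}
```
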
@DragonFive changed the title from "bugfix: align onerec 3b and legacy npu runtime with xllm_rec." to "feat: add onerec 3b performance optimization and support old model." on Apr 15, 2026
@DragonFive (Collaborator, Author)

/gemini review

@gemini-code-assist (Bot) left a comment


Code Review

This pull request updates the OneRec NPU implementation to support ACLNN-based attention linear operations, introduces a prefill-only execution path for decoders, and enhances weight loading and merging for MoE layers, including support for fused expert weights and dynamic quantization. Feedback highlights a critical thread-safety concern regarding the storage of request-specific data as member variables in the layer implementation, which could lead to data corruption in concurrent environments. Additionally, several style guide violations were noted, specifically the improper use of 'auto' for simple types and the use of plain 'int' instead of fixed-width integers.
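
The thread-safety concern is the pattern sketched below: caching per-request data in layer member variables means two concurrent requests running through the same layer object overwrite each other's state between calls. Threading the request-specific data through the call keeps the layer stateless. The class and parameter names are illustrative, and the torch dependency is assumed from the project's libtorch usage:

```cpp
#include <torch/torch.h>  // assumed: the project's libtorch dependency

// RISKY: request-specific state lives in a member, so concurrent
// requests sharing this layer race between prepare() and forward().
class BlockLayerStateful {
 public:
  void prepare(torch::Tensor seq_lens) { seq_lens_ = std::move(seq_lens); }
  torch::Tensor forward(const torch::Tensor& x) {
    return x;  // ... would consume seq_lens_ here ...
  }
 private:
  torch::Tensor seq_lens_;  // per-request data on a shared object
};

// SAFER: pass per-request data through the call; the layer then holds
// only weights, which are read-only at inference time.
class BlockLayerStateless {
 public:
  torch::Tensor forward(const torch::Tensor& x,
                        const torch::Tensor& seq_lens) {
    (void)seq_lens;  // ... consumed directly, no shared mutable state ...
    return x;
  }
};
```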

Review comment threads:
  • xllm/core/layers/npu/npu_onerec_block_layer_impl.h
  • xllm/core/layers/npu/npu_onerec_block_layer_impl.cpp (outdated)
  • xllm/core/layers/npu/npu_onerec_block_layer_impl.cpp (outdated)
  • xllm/core/layers/npu/npu_onerec_block_layer_impl.cpp (outdated)
  • xllm/core/layers/npu/npu_onerec_block_layer_impl.cpp (outdated)
  • xllm/models/rec/npu/onerec.h (outdated)
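
The recurring style points across both reviews (spelled-out types instead of auto for simple values, fixed-width integers instead of plain int, and reserving vector capacity before a fill loop) all reduce to the shape below; this is an illustrative composite, not code from the diff:

```cpp
#include <cstdint>
#include <vector>

std::vector<int64_t> make_expert_ids(int32_t num_experts) {
  // Style points from the reviews, applied together:
  //   - fixed-width integer types (int32_t/int64_t) instead of plain int
  //   - no `auto` where the type is simple and spelling it out is clearer
  //   - reserve() before a push_back loop to avoid reallocations
  std::vector<int64_t> ids;
  ids.reserve(static_cast<size_t>(num_experts));
  for (int32_t i = 0; i < num_experts; ++i) {
    ids.push_back(i);
  }
  return ids;
}
```
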
@DragonFive merged commit 0f5c74f into jd-opensource:main on Apr 20, 2026 (24 of 31 checks passed)