bugfix: fix DeepSeek V3 crash and V3.2 prefix-cache OOM. by DongheJin · Pull Request #1258 · jd-opensource/xllm

DongheJin · 2026-04-10T10:17:33Z

No description provided.

gemini-code-assist

Code Review

This pull request updates the xllm_atb_layers subproject and extends the NPU format allocation logic in worker_impl.cpp to cover the deepseek_v3_mtp model type. One review comment flags a critical correctness issue: llm_engine.cpp needs a matching update to prevent shape mismatches. In addition, using auto for a std::string violates the repository style guide, and the accompanying code comment must be updated to stay accurate.

gemini-code-assist · 2026-04-10T10:18:48Z

+        // Keep runtime allocation format consistent with capacity estimation in
+        // llm_engine: only deepseek_v3 uses FRACTAL_NZ with prefix cache.
+        const auto& model_type = context_.get_model_args().model_type();
        aclFormat npu_format_type =
-            context_.get_model_args().model_type() == "deepseek_v3" &&
-                    FLAGS_enable_prefix_cache
+            ((model_type == "deepseek_v3" || model_type == "deepseek_v3_mtp") &&
+             FLAGS_enable_prefix_cache)
                ? ACL_FORMAT_FRACTAL_NZ
                : ACL_FORMAT_ND;


This change introduces a critical correctness issue and a style violation, and it leaves a code comment out of date:

Correctness (Critical): By adding deepseek_v3_mtp to the FRACTAL_NZ format logic here without updating the corresponding capacity estimation logic in llm_engine.cpp (line 456), you create a mismatch between the allocated tensor shape and the NPU format. This will likely lead to a shape mismatch error or crash during npu_format_cast. You must update llm_engine.cpp to include deepseek_v3_mtp in its format check as well.

Style (High): Per Rule 89 of the Repository Style Guide, auto should not be used for std::string.

Documentation: The comment on lines 291-292 is now inaccurate as it claims only deepseek_v3 uses this format.
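The correctness point above is easiest to enforce if llm_engine.cpp and worker_impl.cpp consult one predicate instead of two hand-maintained copies of the same condition. The following is a minimal sketch of that idea, not code from this PR. It relies on stand-ins: `KvCacheFormat` replaces `aclFormat` so it builds without the Ascend toolkit, and `kv_cache_uses_fractal_nz`, `select_kv_cache_format`, and `allocation_format` are hypothetical names that do not appear in xllm as far as this review shows.

```cpp
#include <cassert>
#include <string>

// Stand-in for CANN's aclFormat (hypothetical); only the two formats
// discussed in this review are modeled.
enum class KvCacheFormat { kND, kFractalNZ };

// Hypothetical single source of truth: if capacity estimation
// (llm_engine.cpp) and allocation (worker_impl.cpp) both call this,
// adding a model type such as deepseek_v3_mtp updates both together.
inline bool kv_cache_uses_fractal_nz(const std::string& model_type,
                                     bool enable_prefix_cache) {
  return enable_prefix_cache &&
         (model_type == "deepseek_v3" || model_type == "deepseek_v3_mtp");
}

// Maps the predicate onto the stand-in format enum.
inline KvCacheFormat select_kv_cache_format(const std::string& model_type,
                                            bool enable_prefix_cache) {
  return kv_cache_uses_fractal_nz(model_type, enable_prefix_cache)
             ? KvCacheFormat::kFractalNZ
             : KvCacheFormat::kND;
}

// Hypothetical allocation-side guard: in debug builds, fail fast if the
// format about to be allocated differs from the one capacity estimation
// assumed, rather than failing later inside a format cast.
inline KvCacheFormat allocation_format(const std::string& model_type,
                                       bool enable_prefix_cache,
                                       KvCacheFormat estimated_format) {
  const KvCacheFormat format =
      select_kv_cache_format(model_type, enable_prefix_cache);
  (void)estimated_format;  // Only read by assert; silences NDEBUG warnings.
  assert(format == estimated_format &&
         "KV cache format differs from capacity estimation");
  return format;
}
```

In this arrangement the engine side would compute its estimate with `select_kv_cache_format(...)`, and the worker side would pass that value to the guard. How the estimate actually reaches the worker in xllm is not shown in this PR and is left open here.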

Suggested change

        // Keep runtime allocation format consistent with capacity estimation in
        // llm_engine: deepseek_v3 and deepseek_v3_mtp use FRACTAL_NZ with prefix cache.
        const std::string& model_type = context_.get_model_args().model_type();
        aclFormat npu_format_type =
            ((model_type == "deepseek_v3" || model_type == "deepseek_v3_mtp") &&
             FLAGS_enable_prefix_cache)
                ? ACL_FORMAT_FRACTAL_NZ
                : ACL_FORMAT_ND;

References

Rule 89: Do not use auto for simple/primitive types (including std::string).
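Rule 89 is a readability rule: with `const auto&`, the compiler deduces the same `const std::string&` that the suggestion spells out, so the fix does not change behavior. The sketch below checks this at compile time. It uses a hypothetical `ModelArgs` stand-in whose `model_type()` is assumed to return `const std::string&`, because the real signature is not shown in this PR.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for xllm's ModelArgs; assumes model_type()
// returns a const reference, which may differ from the real signature.
struct ModelArgs {
  std::string model_type_ = "deepseek_v3_mtp";
  const std::string& model_type() const { return model_type_; }
};

inline void check_bindings(const ModelArgs& args) {
  const auto& deduced = args.model_type();         // Discouraged by Rule 89.
  const std::string& spelled = args.model_type();  // Preferred spelling.
  // Both declarations have the same type...
  static_assert(std::is_same_v<decltype(deduced), const std::string&>);
  static_assert(std::is_same_v<decltype(spelled), const std::string&>);
  // ...and refer to the same object, so only readability changes.
  assert(&deduced == &spelled);
  (void)deduced;
  (void)spelled;
}
```

If `model_type()` instead returned `std::string` by value, both forms would still be safe, because binding a `const` reference to the returned temporary extends its lifetime.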

DongheJin requested review from JimHsiung, RobbieLeung, XuZhang99, liutongxuan, walsonyang and yq33victor as code owners April 10, 2026 10:17

gemini-code-assist Bot reviewed Apr 10, 2026


DongheJin force-pushed the bugfix/dsv3_main branch from 44656e1 to 57f48b0 on April 23, 2026 10:33

bugfix: fix DeepSeek V3 crash and V3.2 prefix-cache OOM.

8a03f17

DongheJin force-pushed the bugfix/dsv3_main branch from 57f48b0 to 8a03f17 on April 23, 2026 11:45

DongheJin wants to merge 1 commit into jd-opensource:main from DongheJin:bugfix/dsv3_main
