bugfix: fix DeepSeek V3 crash and V3.2 prefix-cache OOM.#1258

Open
DongheJin wants to merge 1 commit into jd-opensource:main from DongheJin:bugfix/dsv3_main

Conversation

@DongheJin
Collaborator

No description provided.

Contributor

@gemini-code-assist Bot left a comment


Code Review

This pull request updates the xllm_atb_layers subproject and modifies the NPU format allocation logic in worker_impl.cpp to include the deepseek_v3_mtp model type. A review comment identifies a critical correctness issue where a corresponding update is needed in llm_engine.cpp to prevent shape mismatches. Additionally, the use of auto for a string type violates the repository style guide, and the associated comment requires updating for accuracy.

Comment thread: xllm/core/runtime/worker_impl.cpp (Outdated)
Comment on lines 291 to 298

```diff
   // Keep runtime allocation format consistent with capacity estimation in
   // llm_engine: only deepseek_v3 uses FRACTAL_NZ with prefix cache.
   const auto& model_type = context_.get_model_args().model_type();
   aclFormat npu_format_type =
-      context_.get_model_args().model_type() == "deepseek_v3" &&
-          FLAGS_enable_prefix_cache
+      ((model_type == "deepseek_v3" || model_type == "deepseek_v3_mtp") &&
+       FLAGS_enable_prefix_cache)
           ? ACL_FORMAT_FRACTAL_NZ
           : ACL_FORMAT_ND;
```
Contributor


critical

This change introduces a critical correctness issue and a style violation:

  1. Correctness (Critical): By adding deepseek_v3_mtp to the FRACTAL_NZ format logic here without updating the corresponding capacity estimation logic in llm_engine.cpp (line 456), you create a mismatch between the allocated tensor shape and the NPU format. This will likely lead to a shape mismatch error or crash during npu_format_cast. You must update llm_engine.cpp to include deepseek_v3_mtp in its format check as well.
  2. Style (High): Per Rule 89 of the Repository Style Guide, auto should not be used for std::string.
  3. Documentation: The comment on lines 291-292 is now inaccurate as it claims only deepseek_v3 uses this format.
Suggested change:

```diff
   // Keep runtime allocation format consistent with capacity estimation in
-  // llm_engine: only deepseek_v3 uses FRACTAL_NZ with prefix cache.
-  const auto& model_type = context_.get_model_args().model_type();
+  // llm_engine: deepseek_v3 and deepseek_v3_mtp use FRACTAL_NZ with prefix
+  // cache.
+  const std::string& model_type = context_.get_model_args().model_type();
   aclFormat npu_format_type =
       ((model_type == "deepseek_v3" || model_type == "deepseek_v3_mtp") &&
        FLAGS_enable_prefix_cache)
           ? ACL_FORMAT_FRACTAL_NZ
           : ACL_FORMAT_ND;
```
References
  1. Rule 89: Do not use auto for simple/primitive types (including std::string). (link)

