bugfix: fix DeepSeek V3 crash and V3.2 prefix-cache OOM. #1258
Open
DongheJin wants to merge 1 commit into jd-opensource:main from
Conversation
Contributor
Code Review
This pull request updates the xllm_atb_layers subproject and modifies the NPU format allocation logic in worker_impl.cpp to include the deepseek_v3_mtp model type. A review comment identifies a critical correctness issue where a corresponding update is needed in llm_engine.cpp to prevent shape mismatches. Additionally, the use of auto for a string type violates the repository style guide, and the associated comment requires updating for accuracy.
Comment on lines 291 to 298
  // Keep runtime allocation format consistent with capacity estimation in
  // llm_engine: only deepseek_v3 uses FRACTAL_NZ with prefix cache.
+ const auto& model_type = context_.get_model_args().model_type();
  aclFormat npu_format_type =
-     context_.get_model_args().model_type() == "deepseek_v3" &&
-             FLAGS_enable_prefix_cache
+     ((model_type == "deepseek_v3" || model_type == "deepseek_v3_mtp") &&
+      FLAGS_enable_prefix_cache)
          ? ACL_FORMAT_FRACTAL_NZ
          : ACL_FORMAT_ND;
Contributor
This change introduces a critical correctness issue and a style violation:
- Correctness (Critical): By adding deepseek_v3_mtp to the FRACTAL_NZ format logic here without updating the corresponding capacity estimation logic in llm_engine.cpp (line 456), you create a mismatch between the allocated tensor shape and the NPU format. This will likely lead to a shape mismatch error or crash during npu_format_cast. You must update llm_engine.cpp to include deepseek_v3_mtp in its format check as well.
- Style (High): Per Rule 89 of the Repository Style Guide, auto should not be used for std::string.
- Documentation: The comment on lines 291-292 is now inaccurate as it claims only deepseek_v3 uses this format.
Suggested change
- // Keep runtime allocation format consistent with capacity estimation in
- // llm_engine: only deepseek_v3 uses FRACTAL_NZ with prefix cache.
- const auto& model_type = context_.get_model_args().model_type();
- aclFormat npu_format_type =
-     ((model_type == "deepseek_v3" || model_type == "deepseek_v3_mtp") &&
-      FLAGS_enable_prefix_cache)
-         ? ACL_FORMAT_FRACTAL_NZ
-         : ACL_FORMAT_ND;
+ // Keep runtime allocation format consistent with capacity estimation in
+ // llm_engine: deepseek_v3 and deepseek_v3_mtp use FRACTAL_NZ with prefix cache.
+ const std::string& model_type = context_.get_model_args().model_type();
+ aclFormat npu_format_type =
+     ((model_type == "deepseek_v3" || model_type == "deepseek_v3_mtp") &&
+      FLAGS_enable_prefix_cache)
+         ? ACL_FORMAT_FRACTAL_NZ
+         : ACL_FORMAT_ND;
References
- Rule 89: Do not use auto for simple/primitive types (including std::string).
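One possible way to keep the allocation path and the capacity-estimation path from drifting apart is to route both through a single helper. The following is only a sketch under stated assumptions: the helper name select_kv_cache_format and the KvCacheFormat enum are hypothetical and do not exist in the repository, and the enum merely stands in for aclFormat so that the snippet compiles without the Ascend headers. The condition itself is copied from the suggested change above.

```cpp
#include <string>

// Stand-in for aclFormat (ACL_FORMAT_ND / ACL_FORMAT_FRACTAL_NZ) so this
// sketch builds without the Ascend headers. Hypothetical, for illustration.
enum class KvCacheFormat { kND, kFractalNZ };

// Hypothetical helper: one place that decides the cache format, so that the
// runtime allocation and the capacity estimation cannot disagree.
inline KvCacheFormat select_kv_cache_format(const std::string& model_type,
                                            bool enable_prefix_cache) {
  // Same condition as in the suggested change: both deepseek_v3 and
  // deepseek_v3_mtp use FRACTAL_NZ, and only when prefix cache is enabled.
  const bool is_deepseek_v3_family =
      model_type == "deepseek_v3" || model_type == "deepseek_v3_mtp";
  return (is_deepseek_v3_family && enable_prefix_cache)
             ? KvCacheFormat::kFractalNZ
             : KvCacheFormat::kND;
}
```

If something like this were adopted, worker_impl.cpp and llm_engine.cpp would each call the helper with the model type and FLAGS_enable_prefix_cache instead of repeating the string comparison, so adding a further model type would require touching only one condition.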
44656e1 to 57f48b0
57f48b0 to 8a03f17
No description provided.