perf: Qwen Image Optimize. #1242
shan-chen-feng wants to merge 2 commits into jd-opensource:release/v0.9.0 from
Code Review
This pull request substantially refactors the attention mechanism in the Qwen image editing pipeline, particularly for NPU environments, so that communication and computation can overlap under sequence parallelism. Key changes: the dit_sp_communication_overlap flag changes from an integer to a boolean; management and application of Rotary Positional Embeddings (RoPE) are centralized in QwenImageEditPlusPipelineImpl; and the attention processor is refactored into a base class with two derived implementations, one for standard processing and another (QwenDoubleStreamAttnProcessorCMO2_0Impl) designed specifically for communication-computation overlap using all_to_all_4D operations and npu_fusion_attention. The processor is now chosen dynamically based on the new boolean flag. Additionally, the AttentionImpl class now explicitly tracks the query and key-value head counts and head dimensions, and its output projection structure has been simplified. There are no review comments to address.
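For readers unfamiliar with the all-to-all step in sequence parallelism, the layout change can be sketched in a single process. This is only an illustration under assumptions: it assumes all_to_all_4D follows the common pattern of scattering attention heads while gathering the sequence dimension (and the inverse after attention), which the summary above does not spell out. The function names are hypothetical, the batch dimension is omitted, each `(s, h)` label stands in for one head-dimension vector, and neither the overlap scheduling nor npu_fusion_attention is modeled.

```python
# Single-process simulation of a sequence-parallel all-to-all layout change.
# Hypothetical helpers for illustration; this is NOT the PR's all_to_all_4D code.

def shard_sequence(full, world_size):
    """Split a [seq][head] grid into contiguous per-rank sequence chunks."""
    seq_len = len(full)
    assert seq_len % world_size == 0, "sequence must divide evenly across ranks"
    chunk = seq_len // world_size
    return [full[r * chunk:(r + 1) * chunk] for r in range(world_size)]


def all_to_all_seq_to_head(shards):
    """Scatter heads, gather sequence: [S/P][H] per rank -> [S][H/P] per rank."""
    world_size = len(shards)
    num_heads = len(shards[0][0])
    assert num_heads % world_size == 0, "heads must divide evenly across ranks"
    heads_per_rank = num_heads // world_size
    out = []
    for dst in range(world_size):
        head_slice = slice(dst * heads_per_rank, (dst + 1) * heads_per_rank)
        rows = []
        # Receiving in rank order restores the original sequence order.
        for src in range(world_size):
            for row in shards[src]:
                rows.append(row[head_slice])
        out.append(rows)
    return out


def all_to_all_head_to_seq(shards):
    """Inverse step: [S][H/P] per rank -> [S/P][H] per rank."""
    world_size = len(shards)
    seq_len = len(shards[0])
    chunk = seq_len // world_size
    out = []
    for dst in range(world_size):
        rows = []
        for s in range(dst * chunk, (dst + 1) * chunk):
            row = []
            # Concatenating in rank order restores the original head order.
            for src in range(world_size):
                row.extend(shards[src][s])
            rows.append(row)
        out.append(rows)
    return out


SEQ_LEN, NUM_HEADS, WORLD_SIZE = 8, 4, 2
full = [[(s, h) for h in range(NUM_HEADS)] for s in range(SEQ_LEN)]

seq_sharded = shard_sequence(full, WORLD_SIZE)       # each rank: 4 tokens, 4 heads
head_sharded = all_to_all_seq_to_head(seq_sharded)   # each rank: 8 tokens, 2 heads
restored = all_to_all_head_to_seq(head_sharded)      # back to 4 tokens, 4 heads
```

After the first exchange every rank sees the full sequence for its subset of heads, so attention can run locally per head; the second exchange returns the result to the sequence-sharded layout. In a real multi-device setting these exchanges are collective communication calls, which is the cost that a communication-computation overlap scheme aims to hide.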
No description provided.