[DEV] Add MoE inference operators and EP communication primitives #1308

**Target version**
main

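**Feature description**
Add the low-level InfiniCore capabilities needed to bring MoE inference to InfiniLM, covering the router, token alignment, expert computation, and EP (expert parallel) communication paths for Qwen3-MoE-style models.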

This work includes:

1. New InfiniCore/InfiniOP operators for MoE inference:
   - moe_align
   - moe_fused_dense
   - moe_fused_gate
   - moe_sum
   - moe_topk_softmax
   - moe_topk_sigmoid
   - prepare_moe_input

2. New communication primitives required for MoE EP:
   - allgather / allgatherv / allgatherv_many
   - reduce_scatter / reduce_scatterv / reduce_scatterv_many

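3. For each new operator, add the NVIDIA implementation, the InfiniCore C++ wrapper, the InfiniOP C API exposure, and a graph recording wrapper.

4. Hook the new operators into the xmake NVIDIA build configuration.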
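
For reviewers unfamiliar with the router and alignment steps, here is a minimal pure-Python reference sketch of what operators named like `moe_topk_softmax` and `moe_align` conventionally compute. It is an assumption based on common MoE implementations, not the InfiniCore API: the function names `topk_softmax` and `align_by_expert`, their signatures, the renormalization flag, and the output layout are all hypothetical.

```python
import math


def topk_softmax(logits, k, renormalize=True):
    """Softmax over one token's expert logits, then keep the k largest.

    Hypothetical reference helper; not the InfiniCore signature.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Descending probability; ties go to the lower expert index.
    ids = sorted(range(len(probs)), key=lambda e: (-probs[e], e))[:k]
    weights = [probs[e] for e in ids]
    if renormalize:
        s = sum(weights)
        weights = [w / s for w in weights]
    return ids, weights


def align_by_expert(topk_ids, num_experts):
    """Group flattened (token, slot) positions by expert id.

    Returns (sorted_positions, offsets): the positions routed to expert e
    are sorted_positions[offsets[e]:offsets[e + 1]].
    Assumed layout for illustration only (no block-size padding).
    """
    counts = [0] * num_experts
    for row in topk_ids:
        for e in row:
            counts[e] += 1
    offsets = [0] * (num_experts + 1)
    for e in range(num_experts):
        offsets[e + 1] = offsets[e] + counts[e]
    cursor = list(offsets[:-1])  # next free slot per expert
    sorted_positions = [0] * offsets[-1]
    for t, row in enumerate(topk_ids):
        for s, e in enumerate(row):
            sorted_positions[cursor[e]] = t * len(row) + s
            cursor[e] += 1
    return sorted_positions, offsets


if __name__ == "__main__":
    ids, weights = topk_softmax([2.0, 1.0, 0.0, -1.0], k=2)
    print(ids, weights)
    print(align_by_expert([[2, 0], [0, 1], [2, 1]], num_experts=3))
```

A fused dense expert kernel can then walk each expert's contiguous slice of `sorted_positions` instead of scattering over tokens, which is the usual motivation for an alignment step.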

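Expected outcome:
- InfiniLM can call the MoE router, align, fused dense, and EP communication capabilities through InfiniCore.
- The current Qwen3-MoE correctness inference path is supported.
- Low-level interfaces are reserved for later DeepEP, CUTLASS grouped GEMM, and FP8/W4A8 MoE paths.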
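
As a reference for the EP communication path, the following single-process sketch simulates the conventional semantics of variable-length all-gather and reduce-scatter, with each "rank" modelled as a Python list. It assumes MPI-style meanings for `allgatherv` and `reduce_scatterv`; the function signatures are hypothetical, and the `_many` variants are not modelled because their semantics are not described here.

```python
def allgatherv(send_bufs):
    """Every rank receives the concatenation of all ranks' buffers.

    send_bufs[r] is rank r's contribution; lengths may differ per rank.
    Returns one gathered list per rank.
    """
    gathered = [x for buf in send_bufs for x in buf]
    return [list(gathered) for _ in send_bufs]


def reduce_scatterv(send_bufs, recv_counts):
    """Element-wise sum across ranks, then rank r keeps recv_counts[r] items.

    Every rank must contribute a buffer of length sum(recv_counts).
    """
    total = sum(recv_counts)
    if any(len(buf) != total for buf in send_bufs):
        raise ValueError("each send buffer must have sum(recv_counts) elements")
    reduced = [sum(column) for column in zip(*send_bufs)]
    out, start = [], 0
    for count in recv_counts:
        out.append(reduced[start:start + count])
        start += count
    return out


if __name__ == "__main__":
    # Two ranks holding 2 and 1 tokens respectively.
    print(allgatherv([[1, 2], [3]]))
    # Each rank holds partial expert outputs for all 3 tokens.
    print(reduce_scatterv([[1, 2, 3], [10, 20, 30]], recv_counts=[2, 1]))
```

In an EP layout, the gather step would typically distribute tokens to every expert-holding rank before expert computation, and the reduce-scatter step would sum partial expert outputs and return each rank its own tokens. The variable-length forms matter because ranks rarely hold the same number of tokens.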
