Enable gpt-oss moe and mxfp4 support in OpenVINO backend #231

Open
zhaixuejun1993 wants to merge 2 commits into ravi9:dev_backend_openvino from zhaixuejun1993:xuejun/enable_gpt-oss_all-v1

@zhaixuejun1993 (Collaborator)

This pull request adds support for the MXFP4 quantization format to the OpenVINO backend in GGML, with special handling for 3D MoE (Mixture-of-Experts) weights and necessary updates throughout the quantization, tensor extraction, and operator support logic. Additionally, it cleans up and simplifies the handling of certain MoE-related operations for improved compatibility and numerical stability.

MXFP4 Quantization Support

  • Added MXFP4 (GGML_TYPE_MXFP4) to the list of supported quantization types throughout the OpenVINO backend, including tensor creation, device support checks, and buffer allocation logic. [1] [2] [3] [4]
  • Implemented MXFP4-specific extraction, packing, and conversion logic in ggml-quants.cpp, including new routines for extracting data, handling scales as E8M0, and creating the appropriate OpenVINO nodes. [1] [2] [3]
  • Added special handling for 3D MXFP4 MoE weights in process_weight_tensor and related extraction/layout code, ensuring correct tensor shapes and memory layout for OpenVINO. [1] [2] [3] [4] [5]
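To make the MXFP4 layout concrete, here is a minimal dequantization sketch. It follows the OCP Microscaling definition of MXFP4: blocks of 32 E2M1 elements that share one E8M0 scale. The `Mxfp4Block` struct, the function names, and the nibble ordering (low nibbles hold the first half of the block, high nibbles the second half) are assumptions made for illustration; the extraction routines this PR adds to ggml-quants.cpp may be organized differently.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical block layout for this sketch: one E8M0 scale byte followed by
// 32 E2M1 codes packed two per byte.
constexpr std::size_t kBlockSize = 32;

struct Mxfp4Block {
    uint8_t scale_e8m0;
    std::array<uint8_t, kBlockSize / 2> packed;
};

// E2M1: 1 sign bit, 2 exponent bits, 1 mantissa bit, so only eight magnitudes exist.
inline float decode_e2m1(uint8_t code) {
    assert(code < 16);  // a code is a single nibble
    static constexpr float kMagnitude[8] = {0.0f, 0.5f, 1.0f, 1.5f,
                                            2.0f, 3.0f, 4.0f, 6.0f};
    const float mag = kMagnitude[code & 0x7];
    return (code & 0x8) ? -mag : mag;
}

// E8M0: exponent-only scale with bias 127; 0xFF is reserved for NaN.
inline float decode_e8m0(uint8_t e) {
    if (e == 0xFF) {
        return std::numeric_limits<float>::quiet_NaN();
    }
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}

// Expand every block to 32 floats: value = e2m1(code) * 2^(scale - 127).
inline std::vector<float> dequantize_mxfp4(const std::vector<Mxfp4Block>& blocks) {
    std::vector<float> out(blocks.size() * kBlockSize);
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        const float scale = decode_e8m0(blocks[b].scale_e8m0);
        float* dst = out.data() + b * kBlockSize;
        for (std::size_t j = 0; j < kBlockSize / 2; ++j) {
            const uint8_t byte = blocks[b].packed[j];
            dst[j] = decode_e2m1(byte & 0x0F) * scale;               // low nibble
            dst[j + kBlockSize / 2] = decode_e2m1(byte >> 4) * scale; // high nibble
        }
    }
    return out;
}
```

Under these assumptions, a 3D MoE weight is just a longer run of such blocks, one contiguous group per expert, which is why the shape and layout handling matters more than the per-block math.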

Operator and Shape Handling Improvements

  • Updated operator support checks and device compatibility logic to allow 3D MXFP4 tensors for MoE use cases, and to reject unsupported 3D quantized tensors for other types. [1] [2]
  • Refined buffer allocation and tensor setting logic to recognize 3D MXFP4 tensors as a supported weight shape, in addition to 2D weights. [1] [2]
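The shape rule described above can be sketched as a small predicate. Everything here is hypothetical and stands in for the backend's real checks: the `WeightType` enum, the `WeightDesc` struct, and `weight_layout_supported` are illustration-only names, and treating non-quantized weights as always accepted is an assumption rather than something the PR states.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the backend's tensor type and shape metadata.
enum class WeightType { F32, F16, Q4_0, Q8_0, MXFP4 };

struct WeightDesc {
    WeightType type;
    std::array<int64_t, 4> ne;  // ggml-style dims, ne[0] is the innermost
};

inline bool is_quantized(WeightType t) {
    return t != WeightType::F32 && t != WeightType::F16;
}

// Rank after dropping trailing dimensions of size 1.
inline int effective_rank(const WeightDesc& w) {
    int rank = 4;
    while (rank > 1 && w.ne[rank - 1] == 1) {
        assert(w.ne[rank - 1] > 0);
        --rank;
    }
    return rank;
}

// Quantized weights are accepted when 2D; 3D is accepted only for MXFP4,
// which is the MoE expert-weight case. Other 3D quantized types are rejected.
inline bool weight_layout_supported(const WeightDesc& w) {
    if (!is_quantized(w.type)) {
        return true;  // assumption: this sketch only constrains quantized weights
    }
    const int rank = effective_rank(w);
    if (rank <= 2) {
        return true;
    }
    return rank == 3 && w.type == WeightType::MXFP4;
}
```

With this rule, an MXFP4 tensor shaped like a stack of expert matrices passes, while a 3D Q4_0 tensor does not, so it would be left to another backend.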

MoE Operation Handling Cleanup

  • Removed or simplified several MoE-specific operation exclusions in is_op_unsupported_case, enabling more MoE operations to run on the OpenVINO backend and improving parity with CPU execution for numerically sensitive paths. [1] [2] [3] [4]

Codebase Maintenance

  • Added necessary OpenVINO includes for new element types (float4_e2m1, float8_e8m0) to support MXFP4 quantization.

These changes collectively enable efficient and correct use of MXFP4 quantized weights, especially for advanced architectures like MoE, while also improving the backend's flexibility and maintainability.
