Enable gpt-oss moe and mxfp4 support in OpenVINO backend #231
Open
zhaixuejun1993 wants to merge 2 commits into
This pull request adds support for the MXFP4 quantization format to the OpenVINO backend in GGML, with special handling for 3D MoE (Mixture-of-Experts) weights and necessary updates throughout the quantization, tensor extraction, and operator support logic. Additionally, it cleans up and simplifies the handling of certain MoE-related operations for improved compatibility and numerical stability.
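For readers unfamiliar with the format, the following is a minimal sketch of how one MXFP4 block can be decoded: 32 four-bit E2M1 elements share a single E8M0 scale, which is a power of two. It is not the code from this PR. The `MxBlock` struct, the helper names, and the nibble order (low nibble first) are assumptions made for illustration, and the actual GGML block layout may order elements differently.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// E2M1 magnitudes indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
static constexpr std::array<float, 8> kE2M1Magnitudes = {
    0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Decode one 4-bit E2M1 code into a float.
inline float decode_e2m1(uint8_t code) {
    const float mag = kE2M1Magnitudes[code & 0x7];
    return (code & 0x8) ? -mag : mag;
}

// Decode an E8M0 scale: an unsigned exponent with bias 127 and no mantissa.
// 0xFF is treated as NaN here, following the OCP MX convention.
inline float decode_e8m0(uint8_t e) {
    if (e == 0xFF) {
        return std::nanf("");
    }
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}

// Hypothetical block layout (an assumption, not the GGML struct):
// one scale byte followed by 16 bytes holding 32 packed 4-bit elements.
struct MxBlock {
    uint8_t scale;
    std::array<uint8_t, 16> qs;
};

// Dequantize one block to 32 floats.
// Assumed packing: low nibble is element 2*j, high nibble is element 2*j + 1.
std::vector<float> dequantize_block(const MxBlock & block) {
    std::vector<float> out(32);
    const float d = decode_e8m0(block.scale);
    for (std::size_t j = 0; j < block.qs.size(); ++j) {
        out[2 * j]     = decode_e2m1(block.qs[j] & 0x0F) * d;
        out[2 * j + 1] = decode_e2m1(block.qs[j] >> 4) * d;
    }
    return out;
}
```

For a 3D MoE weight, the same per-block decode would be applied to each expert's slice in turn. How the PR itself iterates over experts is not shown here.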
MXFP4 Quantization Support
- Added MXFP4 (`GGML_TYPE_MXFP4`) to the list of supported quantization types throughout the OpenVINO backend, including tensor creation, device support checks, and buffer allocation logic. [1] [2] [3] [4]
- Changes in `ggml-quants.cpp`, including new routines for extracting data, handling scales as E8M0, and creating the appropriate OpenVINO nodes. [1] [2] [3]
- Updates to `process_weight_tensor` and related extraction/layout code, ensuring correct tensor shapes and memory layout for OpenVINO. [1] [2] [3] [4] [5]

Operator and Shape Handling Improvements
MoE Operation Handling Cleanup
- Changes to `is_op_unsupported_case`, enabling more MoE operations to run on the OpenVINO backend and improving parity with CPU execution for numerically sensitive paths. [1] [2] [3] [4]

Codebase Maintenance
- Types (`float4_e2m1`, `float8_e8m0`) to support MXFP4 quantization.

These changes collectively enable efficient and correct use of MXFP4 quantized weights, especially for advanced architectures like MoE, while also improving the backend's flexibility and maintainability.

## Overview
Additional information
Requirements