Enable gpt-oss moe and mxfp4 support in OpenVINO backend by zhaixuejun1993 · Pull Request #231 · ravi9/llama.cpp

zhaixuejun1993 · 2026-06-25T02:17:08Z

This pull request adds support for the MXFP4 quantization format to the OpenVINO backend in GGML, with special handling for 3D MoE (Mixture-of-Experts) weights and necessary updates throughout the quantization, tensor extraction, and operator support logic. Additionally, it cleans up and simplifies the handling of certain MoE-related operations for improved compatibility and numerical stability.

MXFP4 Quantization Support

Added MXFP4 (GGML_TYPE_MXFP4) to the list of supported quantization types throughout the OpenVINO backend, including tensor creation, device support checks, and buffer allocation logic. [1] [2] [3] [4]
Implemented MXFP4-specific extraction, packing, and conversion logic in ggml-quants.cpp, including new routines for extracting data, handling scales as E8M0, and creating the appropriate OpenVINO nodes. [1] [2] [3]
Added special handling for 3D MXFP4 MoE weights in process_weight_tensor and related extraction/layout code, ensuring correct tensor shapes and memory layout for OpenVINO. [1] [2] [3] [4] [5]
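The extraction step can be illustrated with a minimal C++ sketch. It assumes ggml's `block_mxfp4` layout: 32 weights per block, one E8M0 exponent byte, and 16 bytes of 4-bit E2M1 codes, where byte `j` holds element `j` in its low nibble and element `j + 16` in its high nibble. It also assumes the E2M1 value table from the OCP Microscaling (MX) specification. `BlockMxfp4`, `dequantize_mxfp4` and `split_mxfp4` are hypothetical names. The sequential low-nibble-first packing produced by `split_mxfp4` is an assumed target layout, not necessarily the one the PR's `ggml-quants.cpp` emits.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed ggml-style MXFP4 block: 32 weights share one E8M0 scale byte.
// qs[j] holds element j (low nibble) and element j+16 (high nibble).
constexpr int kBlock = 32;
struct BlockMxfp4 {
    uint8_t e;               // E8M0 shared scale: biased exponent, bias 127
    uint8_t qs[kBlock / 2];  // 32 x 4-bit E2M1 codes
};

// E2M1 code -> float (bit 3 = sign, bits 0-2 index the magnitude), per the OCP MX spec.
inline float e2m1_to_float(uint8_t code) {
    assert(code < 16);  // must be a 4-bit code
    static const float kMag[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};
    const float v = kMag[code & 0x7];
    return (code & 0x8) ? -v : v;
}

// E8M0 byte -> 2^(e - 127). The spec reserves 0xFF for NaN; not handled here.
inline float e8m0_to_float(uint8_t e) {
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}

// Reference dequantization of a contiguous run of blocks.
std::vector<float> dequantize_mxfp4(const std::vector<BlockMxfp4>& blocks) {
    std::vector<float> out(blocks.size() * kBlock);
    for (size_t b = 0; b < blocks.size(); ++b) {
        const float d = e8m0_to_float(blocks[b].e);
        for (int j = 0; j < kBlock / 2; ++j) {
            out[b * kBlock + j]              = e2m1_to_float(blocks[b].qs[j] & 0x0F) * d;
            out[b * kBlock + j + kBlock / 2] = e2m1_to_float(blocks[b].qs[j] >> 4) * d;
        }
    }
    return out;
}

// Hypothetical extraction: one raw E8M0 byte per block into `scales`, and the
// codes repacked in sequential order (element 2i low nibble, 2i+1 high nibble).
void split_mxfp4(const std::vector<BlockMxfp4>& blocks,
                 std::vector<uint8_t>& scales, std::vector<uint8_t>& packed) {
    scales.resize(blocks.size());
    packed.assign(blocks.size() * kBlock / 2, 0);
    for (size_t b = 0; b < blocks.size(); ++b) {
        scales[b] = blocks[b].e;
        // Unpack ggml's split-half nibble order into 32 codes.
        uint8_t codes[kBlock];
        for (int j = 0; j < kBlock / 2; ++j) {
            codes[j]              = blocks[b].qs[j] & 0x0F;
            codes[j + kBlock / 2] = blocks[b].qs[j] >> 4;
        }
        // Repack consecutive pairs into bytes.
        for (int i = 0; i < kBlock / 2; ++i) {
            packed[b * kBlock / 2 + i] =
                static_cast<uint8_t>(codes[2 * i] | (codes[2 * i + 1] << 4));
        }
    }
}
```

For a 3D MoE weight with ggml shape `ne = {n_in, n_out, n_expert}`, the blocks run contiguously along `n_in`. Assuming `n_in` is a multiple of 32, the two outputs can then be viewed as `[n_expert, n_out, n_in/32]` scales and `[n_expert, n_out, n_in/2]` packed bytes. Keeping the scales as raw E8M0 bytes rather than expanding them to f32 preserves the compact format, which is presumably why the PR adds the float4_e2m1 and float8_e8m0 element-type includes.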

Operator and Shape Handling Improvements

Updated operator support checks and device compatibility logic to allow 3D MXFP4 tensors for MoE use cases, and to reject unsupported 3D quantized tensors for other types. [1] [2]
Refined buffer allocation and tensor-setting logic so that 3D MXFP4 weights, not only 2D weights, are recognized as a supported weight shape. [1] [2]
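The shape rule described in the two items above can be sketched as a small predicate. `WType`, `n_dims_of` and `quantized_weight_supported` are hypothetical names, and the list of quantized types is illustrative. The rank computation mirrors how `ggml_n_dims` counts dimensions (one plus the highest index with `ne[i] > 1`). The backend's real checks operate on `ggml_tensor` and cover more cases.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ggml_type; only quantized vs. not matters here.
enum class WType { F32, F16, Q4_0, Q8_0, MXFP4 };

inline bool is_quantized(WType t) {
    return t == WType::Q4_0 || t == WType::Q8_0 || t == WType::MXFP4;
}

// Effective rank in the style of ggml_n_dims: 1 + highest index with ne[i] > 1.
inline int n_dims_of(const std::array<int64_t, 4>& ne) {
    for (int64_t d : ne) assert(d >= 1);  // ggml dimensions are always >= 1
    for (int i = 3; i >= 1; --i) {
        if (ne[i] > 1) return i + 1;
    }
    return 1;
}

// Sketch of the rule for quantized weights:
//   rank 1 or 2 -> supported for any quantized type listed above
//   rank 3      -> supported only for MXFP4 (stacked MoE expert weights)
//   rank 4      -> rejected
bool quantized_weight_supported(WType t, const std::array<int64_t, 4>& ne) {
    if (!is_quantized(t)) return false;  // this predicate only covers quantized weights
    const int rank = n_dims_of(ne);
    if (rank <= 2) return true;
    if (rank == 3) return t == WType::MXFP4;
    return false;
}
```

Under this rule a stacked expert weight such as `{64, 32, 8, 1}` passes only when its type is MXFP4, while the same shape stored as Q4_0 would be rejected.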

MoE Operation Handling Cleanup

Removed or simplified several MoE-specific operation exclusions in is_op_unsupported_case, enabling more MoE operations to run on the OpenVINO backend and improving parity with CPU execution for numerically sensitive paths. [1] [2] [3] [4]

Codebase Maintenance

Added necessary OpenVINO includes for new element types (float4_e2m1, float8_e8m0) to support MXFP4 quantization.

Together, these changes let the OpenVINO backend load and run MXFP4-quantized weights, including the 3D expert weights used by MoE models such as gpt-oss, while simplifying the backend's MoE operator handling.

Additional information

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure:


zhaixuejun1993 requested review from cavusmustafa and wine99 as code owners June 25, 2026 02:17

zhaixuejun1993 mentioned this pull request Jun 25, 2026

Enable GPT-OSS MOE and MXFP4 support on OpenVINO backend #228

Closed

zhaixuejun1993 added 2 commits June 25, 2026 10:20

OpenVINO backend: 1) enable gpt-oss moe on OV bk; 2) enable mxfp4 support · ac7d478

OpenVINO backend: disable TOPK_MOE op test · 66ddab5

zhaixuejun1993 wants to merge 2 commits into ravi9:dev_backend_openvino from zhaixuejun1993:xuejun/enable_gpt-oss_all-v1
