Draft
135 commits
4bc1eb1
Add interface is_model_splitted() to check whether the c-graph is split or not
zhaixuejun1993 Mar 6, 2026
c49ec28
Infer and propagate dynamic-dimension indices for all tensors in the …
zhaixuejun1993 Mar 17, 2026
76eb69e
Only do this for fallback sub graph
zhaixuejun1993 Mar 19, 2026
7e6caef
Move dynamic dims compute in graph mismatch
zhaixuejun1993 Mar 23, 2026
d306b0b
ggml-openvino: fix tensor data handling for PERMUTE/VIEW ops in split…
zhaixuejun1993 Mar 19, 2026
01088c2
ggml-openvino:add comments
zhaixuejun1993 Mar 19, 2026
126d758
ggml-openvino: override VIEW op_case to 0 for split model inputs
zhaixuejun1993 Mar 19, 2026
32f9cb7
openvino backend: Handle unsupported VIEW shape-mismatch in OpenVINO …
zhaixuejun1993 Mar 19, 2026
f812f78
Enable additional mul_mat tests and add tensor data saving function (…
zhaixuejun1993 Mar 23, 2026
865f121
ggml-openvino: fix CONT/TRANSPOSE mapping and improve dynamic-dimensi…
zhaixuejun1993 Mar 26, 2026
ca3a176
OpenVINO: add NORM/TANH support and rework SOFT_MAX translation
zhaixuejun1993 Mar 28, 2026
a73a6dc
ggml-openvino: extend VIEW handling
zhaixuejun1993 Mar 30, 2026
bfa4c53
Enable -fa off (#118)
wine99 Apr 2, 2026
9c922b1
Enable --context-shift
wine99 Apr 10, 2026
59f0e3c
Fix llm param compute error for normal softmax not the softmax in att…
zhaixuejun1993 Apr 13, 2026
c8e9ce4
OpenVINO backend: fix error for attention size compute in llm param
zhaixuejun1993 Apr 13, 2026
9f355ed
use tensor->extra in infer_request i/o
wine99 Apr 27, 2026
dc5ed75
OpenVINO backend: refactor the compute_llm_params() func add get_atte…
zhaixuejun1993 Apr 29, 2026
e2ce59c
OpenVINO backend: clean unused code
zhaixuejun1993 Apr 29, 2026
130ef39
1to1 match op update (#146)
cavusmustafa May 6, 2026
13ddbf3
initial gemma4 support
May 5, 2026
7597773
removed hardcoded names for kv cache slicing
cavusmustafa May 5, 2026
a1baa1a
OpenVINO backend: Add new attention pattern for llm parameters compute
zhaixuejun1993 May 6, 2026
dad8acd
flash attn Q shape static conversion
cavusmustafa May 4, 2026
760e86d
Remove slice in permute translation when n_seq is 1
cavusmustafa May 4, 2026
5a39967
return optional in extract_layer_from_name
wine99 May 7, 2026
d289bbd
OpenVINO backend: refactor VIEW related operation (#148)
zhaixuejun1993 May 7, 2026
e0caf43
OpenVINO backend: Add ops l2_norm & pad
zhaixuejun1993 May 6, 2026
8c83092
OpenVINO backend does not support CPY with non-contiguous data or mis…
zhaixuejun1993 May 7, 2026
a08546f
add op SSM_CONV GATED_DELTA_NET
wine99 May 7, 2026
0d0fb42
OpenVINO backend: fix error for bf16 in OV gpu plugin
zhaixuejun1993 May 7, 2026
c064e87
reverted static Q input shape for attention layer
cavusmustafa May 7, 2026
d44fa9c
OpenVINO backend: remove hardcode name inp_tokens, which ignore some …
zhaixuejun1993 May 8, 2026
6afe652
Disable remote tensor due to bug in ov gpu
wine99 May 12, 2026
d2279ae
Disable n_token > 1 GATED_DELTA_NET on gpu
wine99 May 12, 2026
e6a7a9e
OpenVINO backend: fix the view op dynamic handling issue in gemma4 & …
zhaixuejun1993 May 13, 2026
418c5e5
OpenVINO backend: clean code
zhaixuejun1993 May 13, 2026
c6efcb6
OpenVINO backend: enable view + norm/rms_norm
zhaixuejun1993 May 9, 2026
eafd08e
OpenVINO backend: concat op
zhaixuejun1993 May 9, 2026
89858ec
OpenVINO backend: argsort op
zhaixuejun1993 May 9, 2026
e25ed8f
OpenVINO backend: enable unary + view & GGML_UNARY_OP_SOFTPLUS
zhaixuejun1993 May 11, 2026
996f0c7
Fix issue for test-backend-ops in TOPK_MOE, which compare VIEW ops re…
zhaixuejun1993 May 11, 2026
12863b8
OpenVINO backend: enable sum_rows
zhaixuejun1993 May 11, 2026
404d6b3
OpenVINO backend: enable clamp
zhaixuejun1993 May 11, 2026
41c35a3
OpenVINO backend: enable DIV
zhaixuejun1993 May 11, 2026
03e835c
OpenVINO backend: enable GGML_OP_MUL_MAT_ID
zhaixuejun1993 May 11, 2026
08438be
OpenVINO backend: disable MUL_MAT_ID_FUSION case with large mem needed
zhaixuejun1993 May 11, 2026
904c608
OpenVINO backend: Disable GGML_OP_ARGSORT, cause test_backend-ops failed
zhaixuejun1993 May 13, 2026
d2ca0f8
OpenVINO backend: fix issue in mul_mat_id
zhaixuejun1993 May 14, 2026
2aa3b2d
OpenVINO backend: Disable DIV with broadcast on GPU
zhaixuejun1993 May 14, 2026
59e3d64
OpenVINO backend: update DIV
zhaixuejun1993 May 15, 2026
4472ce0
use ov internal op GatedDeltaNet
wine99 May 19, 2026
4bbb85f
OpenVINO backend: enable llama arch test qwen3next
zhaixuejun1993 May 19, 2026
3032423
OpenVINO backend: enable RMS_NORM + VIEW & remove op_case 2 for rope
zhaixuejun1993 May 7, 2026
c4bd20f
OpenVINO backend: fix error
zhaixuejun1993 May 7, 2026
2c2541c
suggested changes, need review
wine99 May 7, 2026
d11e198
suggested changes, need review
wine99 May 7, 2026
c4f2ec7
OpenVINO backend: clean unused code & fix build warning
zhaixuejun1993 May 20, 2026
46bddb1
OpenVINO backend: enable minicpm3 for arch test
zhaixuejun1993 May 20, 2026
bb38483
Disable GDN op (#177)
wine99 May 21, 2026
645df27
disable gated_delta_net
wine99 May 22, 2026
08b4fd6
update stateful_kv_size correctly in mismatch case
wine99 May 19, 2026
d2c7549
OpenVINO backend: enable arch test for qwen3vl
May 19, 2026
e05da27
OpenVINO backend: enable cohere2 for arch test
zhaixuejun1993 May 20, 2026
c3c4dba
OpenVINO backend: enable t5 for arch test
zhaixuejun1993 May 20, 2026
a32aeb5
OpenVINO backend: enable jamba for arch test
zhaixuejun1993 May 21, 2026
a0155c4
OpenVINO backend: remove warning for tmp
zhaixuejun1993 May 21, 2026
b1f6fb4
OpenVINO backend: enable kimi-linear for arch test
zhaixuejun1993 May 21, 2026
603c7dc
Remove unused
zhaixuejun1993 May 25, 2026
21bab71
Fix gpt-oss accuracy issue
yangwang201911 May 22, 2026
f49b026
OpenVINO backend: enable arctic for arch test
zhaixuejun1993 May 24, 2026
65ec35a
OpenVINO backend: enable grok for arch test
zhaixuejun1993 May 25, 2026
292b156
Gemma4 initial npu support (#179)
cavusmustafa May 26, 2026
c832153
ggml-openvino: add GGML_OPENVINO_ENABLE_CACHE env var to control deco…
zhaixuejun1993 May 26, 2026
af2a8e1
Revert "Gemma4 initial npu support (#179)"
wine99 May 26, 2026
a16cfb4
OpenVINO backend: disable debug log print
zhaixuejun1993 May 26, 2026
36c5cd5
Update TBB discovery. Delegated to OpenVINOs own config.
ravi9 May 26, 2026
6df01a7
OpenVINO backend: GGML_OPENVINO_ENABLE_CACHE YES -> 1
zhaixuejun1993 May 27, 2026
2ab4121
OpenVINO backend: fallback FLASH_ATTN_EXT in gemma3n to CPU backend
zhaixuejun1993 May 28, 2026
6b1c5aa
Add raw ov infer profiling metric
virajwad May 28, 2026
d194391
Add OV raw infer time metric to static compute path
Copilot May 28, 2026
f1a5340
Modify precision of static profiling
virajwad May 28, 2026
88f22fd
update to OV 2026.2, add OV windows CI
ravi9 May 29, 2026
ccb1b23
fix editorconfig-checks
ravi9 May 29, 2026
df50c52
Initial gemma4 npu support
cavusmustafa May 21, 2026
b397e94
temp. fix for gemma4 accuracy bug on npu
cavusmustafa May 21, 2026
41ce1c7
Remove hardcoded names for npu-fold handling
cavusmustafa May 21, 2026
7baa213
revert static n tokens for cont translation as it is not needed
cavusmustafa May 21, 2026
5fa8e5e
removed unused variable
cavusmustafa May 25, 2026
b9cba9d
test-llama-archs fix
cavusmustafa May 28, 2026
e8324ac
Fix gemma4 flash_attn fallback
cavusmustafa May 28, 2026
10a2cfd
support im2col
mostafafaheem May 28, 2026
38e9d59
fix code style
mostafafaheem May 29, 2026
9c0ca74
disable add_rope_sin_cos optimization
wine99 Jun 1, 2026
bbc4319
stateless broadcast and rope optimizations
cavusmustafa May 22, 2026
e3bdd6b
Enable manual gqa attn by default for stateless gpu
cavusmustafa May 22, 2026
bc11c32
manual gqa: fixed static batch
Jun 1, 2026
d551f5b
gemma4 llama-bench ctx update fix
cavusmustafa Jun 2, 2026
2e33f25
Update OV win CI
ravi9 Jun 2, 2026
699fd7d
stateful rope fusion temp. fix
cavusmustafa Jun 2, 2026
d05ce54
OpenVINO backend: Consolidate supported ops
mostafafaheem Jun 4, 2026
b32c04e
Exclude unsupported GGML_OP_SUB cases
mostafafaheem Jun 4, 2026
1d9fa46
Exclude unsupported TOPK_MOE cases
mostafafaheem Jun 4, 2026
71ba113
OpenVINO Backend: MUL_MAT enhancements
mostafafaheem Jun 4, 2026
1c64362
Update OV CI
ravi9 Jun 5, 2026
f7bbe7c
support f16 mask input for npu
wine99 Jun 5, 2026
efbc565
Make GGML_OPENVINO_* env vars usage uniform
ravi9 Jun 9, 2026
34b2bee
OpenVINO backend: Enhance envvar handling
mostafafaheem Jun 8, 2026
b3f21ea
more cleanup
mostafafaheem Jun 9, 2026
e68a103
move ggml_openvino_env_flag to appropriate place
mostafafaheem Jun 9, 2026
65e2ecc
OpenVINO backend: add REPEAT translator, Q5_1 weights, and GLU view-i…
cavusmustafa Jun 9, 2026
8ea91dd
Merge pull request #208 from mostafafaheem/envvar_cleanup
ravi9 Jun 9, 2026
341b615
ggml-openvino: fix -Werror=cast-qual in extract_q5_1_data
cavusmustafa Jun 9, 2026
4c878bd
Merge pull request #209 from cavusmustafa/op_translations_q51
ravi9 Jun 9, 2026
971816c
Update openvino.Dockerfile
ravi9 Jun 10, 2026
835121d
ggml-openvino: centralize env var access via *getenv_str/getenv_int h…
ravi9 Jun 10, 2026
3365e31
OpenVINO backend: Enable GGML_OP_ADD_ID
zhaixuejun1993 Jun 10, 2026
906a48d
Merge pull request #210 from zhaixuejun1993/xuejun/add_op_add_id
zhaixuejun1993 Jun 10, 2026
dd5c58d
Update openvino backend clang-format
wine99 Jun 12, 2026
a9045e0
clang-format
wine99 Jun 12, 2026
fb924cb
Update OPENVINO.md (#211)
ravi9 Jun 12, 2026
ba6c06d
Merge branch 'master' into dev_backend_openvino
ravi9 Jun 12, 2026
1d3035b
Merge branch 'master' into dev_backend_openvino
ravi9 Jun 13, 2026
90ae917
OpenVINO backend: fix accuracy issue for op CONCAT with i64 precision
zhaixuejun1993 Jun 15, 2026
383d163
Merge pull request #214 from zhaixuejun1993/xuejun/fix-error-op-concat
zhaixuejun1993 Jun 15, 2026
00e80a9
Remove strict concurrency for gpu-openvino-low-perf
ravi9 Jun 15, 2026
65d4041
Update openvino CI keynames; add ccache-clear
ravi9 Jun 16, 2026
ce52f0a
Apply suggestions from code review
ravi9 Jun 16, 2026
3481530
Fix formatting
ravi9 Jun 16, 2026
b7b94ec
ggml-openvino: add Gemma-4 26B MoE support
cavusmustafa Jun 12, 2026
f349771
ggml-openvino: tie GET_ROWS batched-gather indices to data batch dim
cavusmustafa Jun 12, 2026
a51a1e2
ggml-openvino: keep MoE token dim dynamic (gemma4 decode + GPU prefill)
cavusmustafa Jun 15, 2026
c8538a2
ggml-openvino: add GGML_OPENVINO_GPU_FULL_MOE to keep MoE on one OV s…
cavusmustafa Jun 15, 2026
5f01724
ggml-openvino: auto-enable full-MoE GPU path + dodge GPU rms_fusion bug
cavusmustafa Jun 15, 2026
0886e0f
ggml-openvino: fix op-test regressions (MUL_MAT_ID large-tmp cap + q4…
cavusmustafa Jun 18, 2026
109 changes: 63 additions & 46 deletions .devops/openvino.Dockerfile
@@ -1,17 +1,17 @@
-ARG OPENVINO_VERSION_MAJOR=2026.0
-ARG OPENVINO_VERSION_FULL=2026.0.0.20965.c6d6a13a886
+ARG OPENVINO_VERSION_MAJOR=2026.2
+ARG OPENVINO_VERSION_FULL=2026.2.0.21903.52ddc073857
 ARG UBUNTU_VERSION=24.04

 # Intel GPU driver versions. https://github.com/intel/compute-runtime/releases
-ARG IGC_VERSION=v2.30.1
-ARG IGC_VERSION_FULL=2_2.30.1+20950
-ARG COMPUTE_RUNTIME_VERSION=26.09.37435.1
-ARG COMPUTE_RUNTIME_VERSION_FULL=26.09.37435.1-0
-ARG IGDGMM_VERSION=22.9.0
+ARG IGC_VERSION=v2.34.4
+ARG IGC_VERSION_FULL=2_2.34.4+21428
+ARG COMPUTE_RUNTIME_VERSION=26.18.38308.1
+ARG COMPUTE_RUNTIME_VERSION_FULL=26.18.38308.1-0
+ARG IGDGMM_VERSION=22.10.0

 # Intel NPU driver versions. https://github.com/intel/linux-npu-driver/releases
-ARG NPU_DRIVER_VERSION=v1.32.0
-ARG NPU_DRIVER_FULL=v1.32.0.20260402-23905121947
+ARG NPU_DRIVER_VERSION=v1.33.0
+ARG NPU_DRIVER_FULL=v1.33.0.20260529-26625960453
 ARG LIBZE1_VERSION=1.27.0-1~24.04~ppa2

 # Optional proxy build arguments
@@ -46,13 +46,18 @@ RUN apt-get update && \
 intel-opencl-icd && \
 rm -rf /var/lib/apt/lists/*

-# Install OpenVINO for Ubuntu 24.04
+# OpenVINO toolkit and GPU/NPU drivers are cached via BuildKit cache mounts to avoid re-downloading on rebuilds.
+# Install OpenVINO for Ubuntu 24.04.
 ARG OPENVINO_VERSION_MAJOR
 ARG OPENVINO_VERSION_FULL
-RUN mkdir -p /opt/intel && \
-wget https://storage.openvinotoolkit.org/repositories/openvino/packages/${OPENVINO_VERSION_MAJOR}/linux/openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \
-tar -xf openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \
-mv openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64 /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \
+RUN --mount=type=cache,target=/var/cache/openvino,sharing=locked \
+mkdir -p /opt/intel && \
+TGZ=/var/cache/openvino/openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz && \
+if [ ! -f "$TGZ" ]; then \
+wget -O "$TGZ" https://storage.openvinotoolkit.org/repositories/openvino/packages/${OPENVINO_VERSION_MAJOR}/linux/openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64.tgz; \
+fi && \
+tar -xf "$TGZ" -C /opt/intel/ && \
+mv /opt/intel/openvino_toolkit_ubuntu24_${OPENVINO_VERSION_FULL}_x86_64 /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \
 cd /opt/intel/openvino_${OPENVINO_VERSION_MAJOR} && \
 echo "Y" | ./install_dependencies/install_openvino_dependencies.sh && \
 cd - && \
@@ -68,14 +73,14 @@ COPY . .
 RUN bash -c "source ${OpenVINO_DIR}/setupvars.sh && \
 cmake -B build/ReleaseOV -G Ninja \
 -DCMAKE_BUILD_TYPE=Release \
+-DLLAMA_BUILD_TESTS=OFF \
 -DGGML_OPENVINO=ON && \
-cmake --build build/ReleaseOV -j$(nproc)"
+cmake --build build/ReleaseOV --parallel "

-# Copy all necessary libraries
+# Copy all necessary libraries (build outputs + OpenVINO runtime libs)
 RUN mkdir -p /app/lib && \
-find build/ReleaseOV -name '*.so*' -exec cp {} /app/lib \; && \
-find ${OpenVINO_DIR}/runtime/lib/intel64 -name '*.so*' -exec cp -P {} /app/lib \; 2>/dev/null || \
-find ${OpenVINO_DIR}/lib/intel64 -name '*.so*' -exec cp -P {} /app/lib \;
+find build/ReleaseOV -name '*.so*' -exec cp -P {} /app/lib \; && \
+find "${OpenVINO_DIR}/runtime/lib/intel64" -name '*.so*' -exec cp -P {} /app/lib \;

 # Create runtime directories and copy binaries
 RUN mkdir -p /app/full \
@@ -120,33 +125,41 @@ ARG IGC_VERSION_FULL
 ARG COMPUTE_RUNTIME_VERSION
 ARG COMPUTE_RUNTIME_VERSION_FULL
 ARG IGDGMM_VERSION
-RUN mkdir /tmp/neo/ && cd /tmp/neo/ \
-&& wget https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb \
-&& wget https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-opencl-${IGC_VERSION_FULL}_amd64.deb \
-&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
-&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
-&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
-&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
-&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libigdgmm12_${IGDGMM_VERSION}_amd64.deb \
-&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
-&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
-&& dpkg --install *.deb \
-&& rm -rf /tmp/neo/
+RUN --mount=type=cache,target=/var/cache/intel-gpu,sharing=locked \
+set -eux; \
+cd /var/cache/intel-gpu; \
+for url in \
+https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb \
+https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-opencl-${IGC_VERSION_FULL}_amd64.deb \
+https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
+https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
+https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libigdgmm12_${IGDGMM_VERSION}_amd64.deb \
+https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb ; do \
+f=$(basename "$url"); \
+[ -f "$f" ] || wget -q -O "$f" "$url"; \
+done; \
+apt-get update; \
+apt-get install -y --no-install-recommends ./*.deb; \
+rm -rf /var/lib/apt/lists/*

 # Install NPU drivers
 ARG NPU_DRIVER_VERSION
 ARG NPU_DRIVER_FULL
 ARG LIBZE1_VERSION
-RUN mkdir /tmp/npu/ && cd /tmp/npu/ \
-&& wget https://github.com/intel/linux-npu-driver/releases/download/${NPU_DRIVER_VERSION}/linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz \
-&& tar -xf linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz \
-&& dpkg --install *.deb \
-&& rm -rf /tmp/npu/
-
-RUN cd /tmp \
-&& wget https://snapshot.ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/20260324T100000Z/pool/main/l/level-zero-loader/libze1_${LIBZE1_VERSION}_amd64.deb \
-&& dpkg --install libze1_${LIBZE1_VERSION}_amd64.deb \
-&& rm libze1_${LIBZE1_VERSION}_amd64.deb
+RUN --mount=type=cache,target=/var/cache/intel-npu,sharing=locked \
+set -eux; \
+TGZ=/var/cache/intel-npu/linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz; \
+if [ ! -f "$TGZ" ]; then \
+wget -q -O "$TGZ" https://github.com/intel/linux-npu-driver/releases/download/${NPU_DRIVER_VERSION}/linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz; \
+fi; \
+DEB=/var/cache/intel-npu/libze1_${LIBZE1_VERSION}_amd64.deb; \
+if [ ! -f "$DEB" ]; then \
+wget -q -O "$DEB" https://snapshot.ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/20260324T100000Z/pool/main/l/level-zero-loader/libze1_${LIBZE1_VERSION}_amd64.deb; \
+fi; \
+mkdir /tmp/npu/ && cd /tmp/npu/ && tar -xf "$TGZ" && cp "$DEB" .; \
+apt-get update; \
+apt-get install -y --no-install-recommends ./*.deb; \
+rm -rf /tmp/npu/ /var/lib/apt/lists/*

 COPY --from=build /app/lib/ /app/

@@ -166,22 +179,26 @@ RUN apt-get update && \
 python3 \
 python3-venv \
 python3-pip && \
-python3 -m venv /ov-venv && \
-/ov-venv/bin/pip install --no-cache-dir --upgrade pip setuptools wheel && \
-/ov-venv/bin/pip install --no-cache-dir -r requirements.txt && \
+python3 -m venv /openvino-venv && \
+/openvino-venv/bin/pip install --no-cache-dir --upgrade pip setuptools wheel && \
+/openvino-venv/bin/pip install --no-cache-dir -r requirements.txt && \
 apt-get autoremove -y && \
 apt-get clean && \
 rm -rf /tmp/* /var/tmp/* && \
 find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
 find /var/cache -type f -delete

-ENTRYPOINT ["/bin/bash", "-c", "source /ov-venv/bin/activate && exec /app/tools.sh \"$@\"", "--"]
+# Activate the venv
+ENV VIRTUAL_ENV=/openvino-venv \
+PATH=/openvino-venv/bin:$PATH
+
+ENTRYPOINT ["/app/tools.sh"]


 ### Light, CLI only
 FROM base AS light

-COPY --from=build /app/full/llama-cli /app/
+COPY --from=build /app/full/llama-cli /app/full/llama-completion /app/

 WORKDIR /app
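
The cache-mount steps in this Dockerfile share one pattern: look for the archive in the cache directory and call `wget` only when it is missing. A minimal sketch of that pattern as a standalone shell function is shown here. `fetch_cached` and the `FETCH` override are hypothetical names introduced for illustration and are not part of the PR. Unlike the Dockerfile, the sketch downloads to a temporary `.part` name and renames it on success, on the assumption that an interrupted download left in a persistent cache should not count as a hit.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): fetch_cached CACHE_DIR URL
# Prints the path of the cached file, downloading it only if it is absent.
# FETCH is an assumption for illustration: it defaults to `wget -q -O`
# and is called as: $FETCH <output-file> <url>
fetch_cached() {
    cache_dir=$1
    url=$2
    file="$cache_dir/$(basename "$url")"
    mkdir -p "$cache_dir"
    if [ ! -f "$file" ]; then
        # Download under a temporary name, then rename, so a partial
        # transfer is never mistaken for a complete cached file.
        ${FETCH:-wget -q -O} "$file.part" "$url" && mv "$file.part" "$file" || return 1
    fi
    printf '%s\n' "$file"
}
```

Calling `fetch_cached /var/cache/intel-gpu "$url"` twice with the same URL should trigger at most one download; the second call only prints the cached path.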

24 changes: 24 additions & 0 deletions .github/actions/windows-setup-openvino/action.yml
@@ -0,0 +1,24 @@
+name: "Windows - Setup OpenVINO Toolkit"
+description: "Setup OpenVINO Toolkit for Windows"
+inputs:
+path:
+description: "Installation path"
+required: true
+version_major:
+description: "OpenVINO major version (e.g., 2026.2)"
+required: true
+version_full:
+description: "OpenVINO full version"
+required: true
+
+runs:
+using: "composite"
+steps:
+- name: Download and extract OpenVINO Runtime
+shell: powershell
+run: |
+$url = "https://storage.openvinotoolkit.org/repositories/openvino/packages/${{ inputs.version_major }}/windows/openvino_toolkit_windows_${{ inputs.version_full }}_x86_64.zip"
+$out = "openvino.zip"
+Invoke-WebRequest -Uri $url -OutFile $out
+Expand-Archive -Path $out -DestinationPath ${{ inputs.path }} -Force
+Remove-Item $out
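
Both the Dockerfile and this action build the package URL from the same two version strings. The sketch below shows that construction in shell, using only the URL layouts that appear in this diff. `openvino_package_url` is a hypothetical helper, and the check that the full version starts with the major version is an added assumption, not something the PR enforces.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): openvino_package_url MAJOR FULL OS
# Prints the download URL for OS = linux | windows, following the URL
# layouts used in .devops/openvino.Dockerfile and the Windows setup action.
openvino_package_url() {
    major=$1
    full=$2
    os=$3
    # Assumed sanity check: FULL must begin with "MAJOR." (e.g. 2026.2.0...).
    case "$full" in
        "$major".*) ;;
        *) echo "version mismatch: $full does not start with $major." >&2; return 1 ;;
    esac
    base="https://storage.openvinotoolkit.org/repositories/openvino/packages/$major"
    case "$os" in
        linux)   printf '%s\n' "$base/linux/openvino_toolkit_ubuntu24_${full}_x86_64.tgz" ;;
        windows) printf '%s\n' "$base/windows/openvino_toolkit_windows_${full}_x86_64.zip" ;;
        *) echo "unsupported os: $os" >&2; return 1 ;;
    esac
}
```

For example, `openvino_package_url 2026.2 2026.2.0.21903.52ddc073857 windows` prints the same `.zip` URL that the PowerShell step assembles from `inputs.version_major` and `inputs.version_full`.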
32 changes: 30 additions & 2 deletions .github/workflows/build-cache.yml
@@ -68,8 +68,8 @@ jobs:

 env:
 # Sync versions in build.yml, build-self-hosted.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
-OPENVINO_VERSION_MAJOR: "2026.0"
-OPENVINO_VERSION_FULL: "2026.0.0.20965.c6d6a13a886"
+OPENVINO_VERSION_MAJOR: "2026.2"
+OPENVINO_VERSION_FULL: "2026.2.0.21903.52ddc073857"

 steps:
 - name: Clone
@@ -91,6 +91,34 @@ jobs:
 version_major: ${{ env.OPENVINO_VERSION_MAJOR }}
 version_full: ${{ env.OPENVINO_VERSION_FULL }}

+windows-2022-openvino-cache:
+runs-on: windows-2022
+
+env:
+# Sync versions in build.yml, build-self-hosted.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
+OPENVINO_VERSION_MAJOR: "2026.2"
+OPENVINO_VERSION_FULL: "2026.2.0.21903.52ddc073857"
+
+steps:
+- name: Clone
+id: checkout
+uses: actions/checkout@v6
+
+- name: Setup Cache
+uses: actions/cache@v5
+id: cache-openvino
+with:
+path: ./openvino_toolkit
+key: cache-gha-openvino-toolkit-v${{ env.OPENVINO_VERSION_FULL }}-${{ runner.os }}
+
+- name: Setup OpenVINO Toolkit
+if: steps.cache-openvino.outputs.cache-hit != 'true'
+uses: ./.github/actions/windows-setup-openvino
+with:
+path: ./openvino_toolkit
+version_major: ${{ env.OPENVINO_VERSION_MAJOR }}
+version_full: ${{ env.OPENVINO_VERSION_FULL }}
+
 windows-2022-rocm-cache:
 runs-on: windows-2022

89 changes: 81 additions & 8 deletions .github/workflows/build-openvino.yml
@@ -37,14 +37,10 @@ jobs:
 ubuntu-24-openvino:
 runs-on: [self-hosted, Linux, Intel, OpenVINO]

-concurrency:
-group: openvino-gpu-${{ github.head_ref || github.ref }}
-cancel-in-progress: false
-
 env:
 # Sync versions in build-openvino.yml, build-self-hosted.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
-OPENVINO_VERSION_MAJOR: "2026.0"
-OPENVINO_VERSION_FULL: "2026.0.0.20965.c6d6a13a886"
+OPENVINO_VERSION_MAJOR: "2026.2"
+OPENVINO_VERSION_FULL: "2026.2.0.21903.52ddc073857"

 steps:
 - name: Clone
@@ -78,7 +74,7 @@ jobs:
 cmake -B build/ReleaseOV -G Ninja \
 -DCMAKE_BUILD_TYPE=Release \
 -DGGML_OPENVINO=ON
-time cmake --build build/ReleaseOV --config Release -j $(nproc)
+time cmake --build build/ReleaseOV --config Release --parallel

 - name: Test (CPU)
 id: cmake_test_cpu
@@ -93,4 +89,81 @@ jobs:
 run: |
 cd ${{ github.workspace }}
 export GGML_OPENVINO_DEVICE=GPU
-ctest --test-dir build/ReleaseOV -L main -E "test-llama-archs" --verbose --timeout 2000
+ctest --test-dir build/ReleaseOV -L main -E "test-llama-archs" --verbose --timeout 3000
+
+openvino-windows-2022:
+runs-on: windows-2022
+
+env:
+# Sync versions in build-openvino.yml, build-self-hosted.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
+OPENVINO_VERSION_MAJOR: "2026.2"
+OPENVINO_VERSION_FULL: "2026.2.0.21903.52ddc073857"
+
+steps:
+- name: Clone
+id: checkout
+uses: actions/checkout@v6
+
+- name: ccache
+uses: ggml-org/ccache-action@v1.2.21
+with:
+key: openvino-windows-2022
+variant: ccache
+evict-old-files: 1d
+save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
+
+- name: Setup Cache
+uses: actions/cache@v5
+id: cache-openvino
+with:
+path: ./openvino_toolkit
+key: cache-gha-openvino-toolkit-v${{ env.OPENVINO_VERSION_FULL }}-${{ runner.os }}
+
+- name: Setup OpenVINO Toolkit
+if: steps.cache-openvino.outputs.cache-hit != 'true'
+uses: ./.github/actions/windows-setup-openvino
+with:
+path: ./openvino_toolkit
+version_major: ${{ env.OPENVINO_VERSION_MAJOR }}
+version_full: ${{ env.OPENVINO_VERSION_FULL }}
+
+- name: Install OpenCL using vcpkg
+shell: powershell
+run: |
+git clone https://github.com/microsoft/vcpkg C:\vcpkg
+C:\vcpkg\bootstrap-vcpkg.bat
+C:\vcpkg\vcpkg install opencl
+
+- name: Build
+id: cmake_build
+shell: cmd
+run: |
+REM Find extracted OpenVINO folder dynamically
+for /d %%i in (openvino_toolkit\*) do set OPENVINO_ROOT=%%i
+
+if not exist "%OPENVINO_ROOT%\runtime\cmake\OpenVINOConfig.cmake" (
+echo ERROR: OpenVINOConfig.cmake not found
+exit /b 1
+)
+
+call "%OPENVINO_ROOT%\setupvars.bat"
+
+cmake -B build\ReleaseOV -G "Visual Studio 17 2022" ^
+-A x64 ^
+-DCMAKE_BUILD_TYPE=Release ^
+-DGGML_OPENVINO=ON ^
+-DCMAKE_TOOLCHAIN_FILE=C:\vcpkg\scripts\buildsystems\vcpkg.cmake
+
+cmake --build build\ReleaseOV --config Release -- /m
+
+- name: Test (CPU)
+id: cmake_test_cpu
+shell: cmd
+# TODO: fix and re-enable the `test-llama-archs` test below
+run: |
+REM Find extracted OpenVINO folder dynamically
+for /d %%i in (openvino_toolkit\*) do set OPENVINO_ROOT=%%i
+call "%OPENVINO_ROOT%\setupvars.bat"
+
+cd build
+ctest --test-dir ReleaseOV -L main -E "test-llama-archs" -C Release --verbose --timeout 3000
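
The "Sync versions in ..." comments ask for the same `OPENVINO_VERSION_FULL` value in several workflow files and the Dockerfile. One way to check that by script is sketched below. `check_openvino_versions` is a hypothetical helper, not part of the PR, and the `sed` pattern assumes the two assignment forms seen in this diff (`OPENVINO_VERSION_FULL: "..."` in YAML and `ARG OPENVINO_VERSION_FULL=...` in the Dockerfile).

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): check_openvino_versions FILE...
# Succeeds and prints the version when every assignment of
# OPENVINO_VERSION_FULL in the given files carries the same value.
check_openvino_versions() {
    # Match only assignments (name followed by ':' or '='), so references
    # such as ${OPENVINO_VERSION_FULL} or a bare `ARG` line are skipped.
    versions=$(sed -n 's/.*OPENVINO_VERSION_FULL[:=] *"\{0,1\}\([0-9][^" ]*\).*/\1/p' "$@" | sort -u)
    count=$(printf '%s\n' "$versions" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one OPENVINO_VERSION_FULL value, found $count:" >&2
        printf '%s\n' "$versions" >&2
        return 1
    fi
    printf '%s\n' "$versions"
}
```

A possible invocation, using the file names listed in the sync comment, is `check_openvino_versions .devops/openvino.Dockerfile .github/workflows/build-openvino.yml .github/workflows/build-cache.yml`.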
8 changes: 2 additions & 6 deletions .github/workflows/build-self-hosted.yml
@@ -264,14 +264,10 @@ jobs:
 gpu-openvino-low-perf:
 runs-on: [self-hosted, Linux, Intel, OpenVINO]

-concurrency:
-group: openvino-gpu-${{ github.head_ref || github.ref }}
-cancel-in-progress: false
-
 env:
 # Sync versions in build.yml, build-self-hosted.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
-OPENVINO_VERSION_MAJOR: "2026.0"
-OPENVINO_VERSION_FULL: "2026.0.0.20965.c6d6a13a886"
+OPENVINO_VERSION_MAJOR: "2026.2"
+OPENVINO_VERSION_FULL: "2026.2.0.21903.52ddc073857"

 steps:
 - name: Clone