Draft
88 commits
a8f15fb
Add interface is_model_splitted() to check whether the c-graph is split
zhaixuejun1993 Mar 6, 2026
c8c3bd4
Infer and propagate dynamic-dimension indices for all tensors in the …
zhaixuejun1993 Mar 17, 2026
6c855e7
Only do this for fallback sub graph
zhaixuejun1993 Mar 19, 2026
c7af12b
Move dynamic dims compute in graph mismatch
zhaixuejun1993 Mar 23, 2026
2a118eb
ggml-openvino: fix tensor data handling for PERMUTE/VIEW ops in split…
zhaixuejun1993 Mar 19, 2026
54fe67e
ggml-openvino: add comments
zhaixuejun1993 Mar 19, 2026
74ba8fd
ggml-openvino: override VIEW op_case to 0 for split model inputs
zhaixuejun1993 Mar 19, 2026
5ec12bd
openvino backend: Handle unsupported VIEW shape-mismatch in OpenVINO …
zhaixuejun1993 Mar 19, 2026
6f3e20f
Enable additional mul_mat tests and add tensor data saving function (…
zhaixuejun1993 Mar 23, 2026
713bcb0
ggml-openvino: fix CONT/TRANSPOSE mapping and improve dynamic-dimensi…
zhaixuejun1993 Mar 26, 2026
4fbc557
OpenVINO: add NORM/TANH support and rework SOFT_MAX translation
zhaixuejun1993 Mar 28, 2026
015b607
ggml-openvino: extend VIEW handling
zhaixuejun1993 Mar 30, 2026
9e0f352
Enable -fa off (#118)
wine99 Apr 2, 2026
8f05691
Enable --context-shift
wine99 Apr 10, 2026
4c9b609
Fix llm param compute error for normal softmax not the softmax in att…
zhaixuejun1993 Apr 13, 2026
1ba5fd8
OpenVINO backend: fix error for attention size compute in llm param
zhaixuejun1993 Apr 13, 2026
644dbea
use tensor->extra in infer_request i/o
wine99 Apr 27, 2026
a979e24
OpenVINO backend: refactor the compute_llm_params() func add get_atte…
zhaixuejun1993 Apr 29, 2026
3f433c5
OpenVINO backend: clean unused code
zhaixuejun1993 Apr 29, 2026
3bc7e76
1to1 match op update (#146)
cavusmustafa May 6, 2026
19c79fd
initial gemma4 support
May 5, 2026
7897870
removed hardcoded names for kv cache slicing
cavusmustafa May 5, 2026
329c4b5
OpenVINO backend: Add new attention pattern for llm parameters compute
zhaixuejun1993 May 6, 2026
f1e32c5
flash attn Q shape static conversion
cavusmustafa May 4, 2026
33a2160
Remove slice in permute translation when n_seq is 1
cavusmustafa May 4, 2026
05c0385
return optional in extract_layer_from_name
wine99 May 7, 2026
bdc858d
OpenVINO backend: refactor VIEW related operation (#148)
zhaixuejun1993 May 7, 2026
51114e5
OpenVINO backend: Add ops l2_norm & pad
zhaixuejun1993 May 6, 2026
05ff7d0
OpenVINO backend does not support CPY with non-contiguous data or mis…
zhaixuejun1993 May 7, 2026
322bb87
add op SSM_CONV GATED_DELTA_NET
wine99 May 7, 2026
8cae14e
OpenVINO backend: fix error for bf16 in OV gpu plugin
zhaixuejun1993 May 7, 2026
f80474c
reverted static Q input shape for attention layer
cavusmustafa May 7, 2026
b61ffd4
OpenVINO backend: remove hardcode name inp_tokens, which ignore some …
zhaixuejun1993 May 8, 2026
8ba38ca
Disable remote tensor due to bug in ov gpu
wine99 May 12, 2026
edc0630
Disable n_token > 1 GATED_DELTA_NET on gpu
wine99 May 12, 2026
9331bb3
OpenVINO backend: fix the view op dynamic handling issue in gemma4 & …
zhaixuejun1993 May 13, 2026
fd0ac6d
OpenVINO backend: clean code
zhaixuejun1993 May 13, 2026
f9c343c
OpenVINO backend: enable view + norm/rms_norm
zhaixuejun1993 May 9, 2026
ebccf37
OpenVINO backend: concat op
zhaixuejun1993 May 9, 2026
0a08624
OpenVINO backend: argsort op
zhaixuejun1993 May 9, 2026
42241a2
OpenVINO backend: enable unary + view & GGML_UNARY_OP_SOFTPLUS
zhaixuejun1993 May 11, 2026
6ed8f78
Fix issue for test-backend-ops in TOPK_MOE, which compare VIEW ops re…
zhaixuejun1993 May 11, 2026
b75e927
OpenVINO backend: enable sum_rows
zhaixuejun1993 May 11, 2026
2f32361
OpenVINO backend: enable clamp
zhaixuejun1993 May 11, 2026
ba3754a
OpenVINO backend: enable DIV
zhaixuejun1993 May 11, 2026
f27b978
OpenVINO backend: enable GGML_OP_MUL_MAT_ID
zhaixuejun1993 May 11, 2026
13b71f0
OpenVINO backend: disable MUL_MAT_ID_FUSION case with large mem needed
zhaixuejun1993 May 11, 2026
9384961
OpenVINO backend: Disable GGML_OP_ARGSORT, cause test_backend-ops failed
zhaixuejun1993 May 13, 2026
833111b
OpenVINO backend: fix issue in mul_mat_id
zhaixuejun1993 May 14, 2026
7f48bc7
OpenVINO backend: Disable DIV with broadcast on GPU
zhaixuejun1993 May 14, 2026
24f2bde
OpenVINO backend: update DIV
zhaixuejun1993 May 15, 2026
952d10a
use ov internal op GatedDeltaNet
wine99 May 19, 2026
5c7fc91
OpenVINO backend: enable llama arch test qwen3next
zhaixuejun1993 May 19, 2026
af9d8c5
OpenVINO backend: enable RMS_NORM + VIEW & remove op_case 2 for rope
zhaixuejun1993 May 7, 2026
4b86839
OpenVINO backend: fix error
zhaixuejun1993 May 7, 2026
2443297
suggested changes, need review
wine99 May 7, 2026
f825020
suggested changes, need review
wine99 May 7, 2026
b86472a
OpenVINO backend: clean unused code & fix build warning
zhaixuejun1993 May 20, 2026
4f247b1
OpenVINO backend: enable minicpm3 for arch test
zhaixuejun1993 May 20, 2026
d81ede3
Disable GDN op (#177)
wine99 May 21, 2026
40e0d19
disable gated_delta_net
wine99 May 22, 2026
9e589ee
update stateful_kv_size correctly in mismatch case
wine99 May 19, 2026
4c8db1e
OpenVINO backend: enable arch test for qwen3vl
May 19, 2026
3884cdc
OpenVINO backend: enable cohere2 for arch test
zhaixuejun1993 May 20, 2026
da48690
Merge pull request #180 from zhaixuejun1993/xuejun/arch-test-qw3vl-co…
zhaixuejun1993 May 25, 2026
c2c5fe7
OpenVINO backend: enable t5 for arch test
zhaixuejun1993 May 20, 2026
58e411d
Merge pull request #181 from zhaixuejun1993/xuejun/arch-test-t5
zhaixuejun1993 May 25, 2026
e2e143d
OpenVINO backend: enable jamba for arch test
zhaixuejun1993 May 21, 2026
0e80117
OpenVINO backend: remove warning for tmp
zhaixuejun1993 May 21, 2026
1564679
OpenVINO backend: enable kimi-linear for arch test
zhaixuejun1993 May 21, 2026
2e7bb2f
Remove unused
zhaixuejun1993 May 25, 2026
24393b2
Merge pull request #182 from zhaixuejun1993/xuejun/arch-test-jamba
zhaixuejun1993 May 25, 2026
25cd873
Fix gpt-oss accuracy issue
yangwang201911 May 22, 2026
5dd95ea
OpenVINO backend: enable arctic for arch test
zhaixuejun1993 May 24, 2026
6665562
OpenVINO backend: enable grok for arch test
zhaixuejun1993 May 25, 2026
48ef5fe
Merge pull request #183 from zhaixuejun1993/xuejun/arch-test-gpt-oss
zhaixuejun1993 May 25, 2026
0d29a9c
Gemma4 initial npu support (#179)
cavusmustafa May 26, 2026
23b4ae2
ggml-openvino: add GGML_OPENVINO_ENABLE_CACHE env var to control deco…
zhaixuejun1993 May 26, 2026
7466c28
Merge pull request #185 from zhaixuejun1993/xuejun/enable_cache_model…
zhaixuejun1993 May 26, 2026
5f868d1
OpenVINO backend: disable debug log print
zhaixuejun1993 May 26, 2026
fff8cd7
Revert "Gemma4 initial npu support (#179)"
wine99 May 26, 2026
f84c065
Merge pull request #187 from zhaixuejun1993/xuejun/disable_debug_log
zhaixuejun1993 May 26, 2026
b2bcc3b
Update TBB discovery. Delegated to OpenVINOs own config.
ravi9 May 26, 2026
c933520
OpenVINO backend: GGML_OPENVINO_ENABLE_CACHE YES -> 1
zhaixuejun1993 May 27, 2026
5d51822
Merge pull request #191 from zhaixuejun1993/xuejun/cache-modify
zhaixuejun1993 May 27, 2026
9ae7e83
OpenVINO backend: 1) ensure unique node names for OpenVINO; 2) add or…
zhaixuejun1993 May 27, 2026
cec9de2
OpenVINO backend: enable fallback for openVINO to CPU backend
zhaixuejun1993 May 27, 2026
800a338
OpenVINO backend: fprintf -> GGML_LOG_INFO
zhaixuejun1993 May 27, 2026
4 changes: 3 additions & 1 deletion ggml/include/ggml.h
@@ -694,7 +694,9 @@ extern "C" {

         void * extra; // extra things e.g. for ggml-cuda.cu

-        char padding[8];
+        char padding[16];
+        // add a struct ggml_tensor * named org_src, initialized to NULL, for keeping track of original source tensors in case of in-place operations
+        struct ggml_tensor * org_src;
     };

     static const size_t GGML_TENSOR_SIZE = sizeof(struct ggml_tensor);
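
Review note: a minimal sketch of how this hunk changes the per-tensor size. The structs below are hypothetical reduced stand-ins (`tensor_before`, `tensor_after`, and the 64-byte `head` member are not from ggml); only the trailing members mirror the diff. On a typical 64-bit target the growth is 16 bytes (8 bytes of extra padding plus an 8-byte pointer), which would keep the size a multiple of 16 if ggml requires that. That alignment requirement is an assumption here, since this diff does not show it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the struct tail before the change.
// `head` replaces every member that precedes `extra` in the real struct.
struct tensor_before {
    char   head[64];
    void * extra;
    char   padding[8];
};

// Hypothetical stand-in for the struct tail after the change.
struct tensor_after {
    char           head[64];
    void *         extra;
    char           padding[16];
    tensor_after * org_src;  // mirrors the new `org_src` member
};

// Bytes added per tensor: the padding growth plus the new pointer.
constexpr std::size_t tensor_size_growth() {
    return sizeof(tensor_after) - sizeof(tensor_before);
}
```

Because `GGML_TENSOR_SIZE` is `sizeof(struct ggml_tensor)`, any code that sizes buffers from it picks up the larger value automatically after a rebuild.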
23 changes: 23 additions & 0 deletions ggml/src/ggml-backend.cpp
@@ -1242,6 +1242,28 @@ void ggml_backend_sched_split_graph(ggml_backend_sched_t sched, struct ggml_cgra
         GGML_ASSERT(*cur_backend_id != -1);
     }

+    // OpenVINO currently uses ggml tensor names as graph indices. Some models (e.g. gpt-oss and
+    // llama4) can contain duplicate ggml tensor names, so we append node ids here to keep names
+    // unique. This is a temporary workaround and will be further optimized away in the future.
+    {
+        bool has_openvino_backend = false;
+        for (int i = 0; i < sched->n_backends; i++) {
+            if (strcmp(ggml_backend_name(sched->backends[i]), "OPENVINO") == 0) {
+                has_openvino_backend = true;
+                break;
+            }
+        }
+
+        if (has_openvino_backend) {
+            for (int i = 0; i < graph->n_nodes; i++) {
+                struct ggml_tensor * node = graph->nodes[i];
+                char new_name[128];
+                snprintf(new_name, sizeof(new_name), "%s#%d", node->name, i);
+                ggml_format_name(node, "%s", new_name);
+            }
+        }
+    }
+
     // pass 5: split graph, find tensors that need to be copied
     {
         int i_split = 0;
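
Review note: the renaming loop formats into a 128-byte scratch buffer and then writes the result back into the tensor's fixed-size name field, so a long original name can lose part of its `#<index>` suffix to truncation, and two nodes could end up with the same name again. The sketch below shows one possible way to guard against that by truncating the base name rather than the suffix. `unique_node_name` and `kMaxName` are hypothetical names, and `kMaxName = 64` is assumed to match `GGML_MAX_NAME`; this is not the PR's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Assumed to match GGML_MAX_NAME (name buffer size, including the NUL terminator).
constexpr std::size_t kMaxName = 64;

// Hypothetical helper, not part of ggml: returns `name` with "#<index>" appended,
// shortened so that the result fits in kMaxName - 1 characters with the suffix intact.
std::string unique_node_name(const std::string & name, int index) {
    assert(index >= 0);

    char suffix[16];
    const int n = std::snprintf(suffix, sizeof(suffix), "#%d", index);

    // Reserve room for the suffix first, then cut the base name if needed.
    const std::size_t room = kMaxName - 1 - static_cast<std::size_t>(n);
    std::string base = name;
    if (base.size() > room) {
        base.resize(room);
    }
    return base + suffix;
}
```

A caller in the spirit of the loop above would pass the node's current name and its graph index, then store the returned string as the node name.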
@@ -1360,6 +1382,7 @@ void ggml_backend_sched_split_graph(ggml_backend_sched_t sched, struct ggml_cgra
                             ggml_set_input(tensor_copy);
                             ggml_set_output(tensor_copy); // prevent ggml-alloc from overwriting the tensor
                         }
+                        tensor_copy->org_src = src;
                         tensor_id_copy(src_id, cur_backend_id, c) = tensor_copy;
                         SET_CAUSE(tensor_copy, "4.cpy");
                     }
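
Review note: this hunk records, on each scheduler-made copy, the tensor it was copied from. A minimal sketch of how a consumer might follow that link back to the original is given below. `tensor_stub` and `resolve_original` are hypothetical and stand in for `struct ggml_tensor` and whatever lookup the OpenVINO backend actually performs, which this diff does not show.

```cpp
#include <cassert>

// Hypothetical reduced tensor: only the fields needed for the illustration.
struct tensor_stub {
    const char *  name;
    tensor_stub * org_src = nullptr;  // mirrors the new `org_src` member
};

// Follow org_src links from a copy back to the tensor it originated from.
// A tensor that is not a copy (org_src == nullptr) resolves to itself.
const tensor_stub * resolve_original(const tensor_stub * t) {
    while (t != nullptr && t->org_src != nullptr) {
        t = t->org_src;
    }
    return t;
}
```

The loop also handles a copy of a copy, although the hunk above only ever sets a single link per `tensor_copy`.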
6 changes: 2 additions & 4 deletions ggml/src/ggml-openvino/CMakeLists.txt
@@ -1,8 +1,6 @@
-find_package(OpenVINO REQUIRED)
+find_package(OpenVINO REQUIRED COMPONENTS Runtime Threading)
 find_package(OpenCL REQUIRED)

-include("${OpenVINO_DIR}/../3rdparty/tbb/lib/cmake/TBB/TBBConfig.cmake")
-
 file(GLOB_RECURSE GGML_HEADERS_OPENVINO "*.h" "*.hpp")
 file(GLOB_RECURSE GGML_SOURCES_OPENVINO "*.cpp")

@@ -11,7 +9,7 @@ ggml_add_backend_library(ggml-openvino
     ${GGML_HEADERS_OPENVINO}
 )

-target_link_libraries(ggml-openvino PRIVATE openvino::runtime TBB::tbb OpenCL::OpenCL)
+target_link_libraries(ggml-openvino PRIVATE openvino::runtime openvino::threading OpenCL::OpenCL)

 if (GGML_OPENVINO)
     if (CMAKE_SYSTEM_PROCESSOR STREQUAL "aarch64")