Pull requests: ggml-org/llama.cpp
ggml : fix tensor-parallel + -ncmoe crash on MoE models
ggml
changes relating to the ggml tensor library for machine learning
#25028
opened Jun 26, 2026 by
liminfei-amd
Contributor
1 task done
SYCL: add oneMKL GEMM flash attention for XMX-accelerated prompt proc…
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#25025
opened Jun 26, 2026 by
johnkarlhill
models: fix Qwen3.5 dense/MoE load when MTP block is absent (trunk-only GGUF)
model
Model specific
#25024
opened Jun 26, 2026 by
rohithj7
1 task done
2
ci : add windows-openvino to check-release
devops
improvements to build systems and github actions
#25022
opened Jun 25, 2026 by
CISC
Member
mtmd: add more validations
mtmd
Related to multimodal functionality (video/image/audio)
#25013
opened Jun 25, 2026 by
ngxson
Collaborator
vulkan: use flops instead of weight tensor size for submission heuristic
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#25005
opened Jun 25, 2026 by
0cc4m
Contributor
recurrent : support equal splits for recurrent-state rollback
model
Model specific
testing
Everything test related
#25004
opened Jun 25, 2026 by
arielgindi
llama : error clearly when a non-causal model is used for generation
server
#24998
opened Jun 25, 2026 by
liminfei-amd
Contributor
1 task done
gguf: add upper bound check for general.alignment (CWE-1284)
ggml
changes relating to the ggml tensor library for machine learning
#24997
opened Jun 25, 2026 by
hhy569
cuda: sanitize invalid Blackwell sharedMemPerBlockOptin
CUDA
Related to the CUDA backend
ggml
changes relating to the ggml tensor library for machine learning
#24991
opened Jun 25, 2026 by
wgu9
arg: detect console width dynamically for CLI help wrapping
#24989
opened Jun 24, 2026 by
tanishqtayade
gguf : reject non-u32 general.alignment
ggml
changes relating to the ggml tensor library for machine learning
testing
Everything test related
#24988
opened Jun 24, 2026 by
Adel-Ayoub
tests: add mixed quant KV FlashAttention cases
testing
Everything test related
#24981
opened Jun 24, 2026 by
ravel7524
Contributor
openvino: Update to OV 2026.2.1, self-contained release packages, operator improvements
devops
improvements to build systems and github actions
documentation
Improvements or additions to documentation
ggml
changes relating to the ggml tensor library for machine learning
OpenVINO
#24974
opened Jun 24, 2026 by
ravi9
Contributor
ggml: add Rockchip NPU (RKNPU2) backend for RK3588
build
Compilation issues
ggml
changes relating to the ggml tensor library for machine learning
vibe-coded
Created with heavy use of LLM assistants, requires human verification
#24972
opened Jun 24, 2026 by
alexinthesky
2 of 5 tasks
vulkan: disable MMVQ on AMD UMA devices
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#24966
opened Jun 24, 2026 by
winstonma
Contributor
Improve Server OAI Responses API streaming compatibility
examples
server
#24957
opened Jun 23, 2026 by
boondocklabs
Contributor
server : create context checkpoint on slot restore
examples
server
#24956
opened Jun 23, 2026 by
julio50
bench: Fix misc. bug #24951 - Standard Deviation issues
examples
#24953
opened Jun 23, 2026 by
surfidaho
refactor(server): move speculative init to speculative.cpp
examples
server
#24952
opened Jun 23, 2026 by
wadealexc
cli : move to HTTP-based implementation
examples
server
#24948
opened Jun 23, 2026 by
ngxson
Collaborator
cuda : prevent integer truncation and overflow errors when using KQ mask strides in flash_attn_mask_to_KV_max kernel
CUDA
Related to the CUDA backend
ggml
changes relating to the ggml tensor library for machine learning
#24945
opened Jun 23, 2026 by
fairydreaming
Collaborator
server : disable embeddings/pooling on the speculative draft/MTP context
examples
server
#24942
opened Jun 23, 2026 by
liminfei-amd
Contributor
1 task done