ggml : process data in smaller chunks in CUDA ggml_top_k() implementation to reduce temporary buffers memory usage #24776

Open
fairydreaming wants to merge 4 commits into ggml-org:master from fairydreaming:chunked-top-k

Commits

Commits on Jun 19, 2026