ggml : process data in smaller chunks in CUDA ggml_top_k() implementation to reduce temporary buffers memory usage #24776

Open

fairydreaming wants to merge 4 commits into ggml-org:master from fairydreaming:chunked-top-k

Commits on Jun 18, 2026

ggml : process data in smaller chunks in CUDA ggml_top_k() implementation to reduce temporary buffers memory usage
sszymczy committed

Commits on Jun 19, 2026

Commits on Jun 25, 2026

ggml : use chunked processing in both CUDA CUB top-k and argsort implementations
sszymczy committed