cuda : prevent integer truncation and overflow errors when using KQ mask strides in flash_attn_mask_to_KV_max kernel by fairydreaming · Pull Request #24945 · ggml-org/llama.cpp