vulkan: use flops instead of weight tensor size for submission heuristic by 0cc4m · Pull Request #25005 · ggml-org/llama.cpp

0cc4m · 2026-06-25T13:02:22Z

Overview

My guess is that the long-running DeviceLost errors on AMD and Intel, which are caused by submission timeouts, come from the submission batching logic treating only matmuls as special, by taking the combined size of their weight matrices into account. But Flash Attention and convolutions are also very heavy operations that should be considered here. Weight matrix size does not apply to them in the same way, so in this PR I'm trying to use FLOPs instead of weight matrix size for submission estimation. That should also be easier to extend to other operators that might be added in the future.
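To illustrate the idea, here is a minimal sketch of FLOPs-based batching. It is not the code from this PR: `OpKind`, `OpDesc`, `estimate_flops`, and `plan_submits` are hypothetical names, only matmuls are modelled (using the usual 2*m*n*k estimate), and every other op is counted as zero work.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical op descriptor for illustration; this is not a ggml type.
enum class OpKind { MatMul, Other };

struct OpDesc {
    OpKind   kind;
    uint64_t m, n, k; // matmul dimensions: (m x k) * (k x n)
};

// A dense matmul needs roughly 2*m*n*k FLOPs
// (one multiply and one add per inner-product term).
// Assumption: all other ops are treated as free in this sketch.
inline uint64_t estimate_flops(const OpDesc & op) {
    switch (op.kind) {
        case OpKind::MatMul: return 2 * op.m * op.n * op.k;
        default:             return 0;
    }
}

// Returns the indices of the ops after which a submission would be triggered:
// whenever the accumulated FLOPs reach the threshold, and after the last op.
inline std::vector<size_t> plan_submits(const std::vector<OpDesc> & ops,
                                        uint64_t flops_per_submit) {
    std::vector<size_t> submit_after;
    uint64_t pending = 0;
    for (size_t i = 0; i < ops.size(); ++i) {
        pending += estimate_flops(ops[i]);
        const bool last = (i + 1 == ops.size());
        if (pending >= flops_per_submit || last) {
            submit_after.push_back(i);
            pending = 0;
        }
    }
    return submit_after;
}
```

Because the threshold is expressed in FLOPs, supporting another heavy operator (for example Flash Attention or a convolution) would only need an extra case in `estimate_flops`. The batching loop itself would stay unchanged.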

This also fixes a bug: previously, uint64_t mul_mat_bytes_per_submit = std::min(uint64_t(100*1000*1000), ctx->last_total_mul_mat_bytes / 40u); would always submit each operator separately on the first run, because ctx->last_total_mul_mat_bytes starts out as 0, so std::min returned 0 rather than the 100 MB cap.
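The first-run behaviour can be reproduced in isolation. In the sketch below, `old_bytes_per_submit` wraps the quoted expression in a hypothetical free function. `guarded_per_submit` is one possible way to avoid a zero threshold, shown as an assumption and not necessarily how this PR fixes it.

```cpp
#include <algorithm>
#include <cstdint>

// The expression quoted above, wrapped in a hypothetical helper so it can be
// evaluated without a backend context.
inline uint64_t old_bytes_per_submit(uint64_t last_total_mul_mat_bytes) {
    return std::min(uint64_t(100*1000*1000), last_total_mul_mat_bytes / 40u);
}

// Hypothetical guard: when there is no total from a previous run yet,
// fall back to the cap instead of a threshold of 0.
inline uint64_t guarded_per_submit(uint64_t last_total, uint64_t cap) {
    if (last_total == 0) {
        return cap;
    }
    return std::min(cap, last_total / 40u);
}
```

With a previous total of 0, `old_bytes_per_submit` yields 0, so any matmul immediately reaches the threshold. With a large previous total, the 100 MB cap applies as intended.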

I'm still trying to reproduce the DeviceLost error on one of my devices.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES, Claude was used for assistance, code was reviewed and tested by me.

0cc4m added 2 commits June 25, 2026 13:44

vulkan: extract flops calculation into function

492adff

use flops instead of matmul src0 tensor size for submission threshold

cb2a425

0cc4m requested a review from a team as a code owner June 25, 2026 13:02

use unsigned ints

bf05250

github-actions bot added the labels Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Jun 25, 2026

jeffbolznv approved these changes Jun 25, 2026

0cc4m wants to merge 3 commits into master from 0cc4m/vulkan-submission-threshold-flops
