
Misc. bug: Optimize GPU token gen perf when context depth is passed (llama-bench) #188

Description

@virajwad

Name and Version

llama-bench tool
e.g. prompt processing 0 (PP0), token generation 128 (TG128), context depth 8192 (D8192), 3 iterations (NITER3)
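
For reference, a minimal sketch of how that configuration might be passed to llama-bench. It assumes the shorthand maps onto the `-p` (prompt tokens), `-n` (generated tokens), `-d` (context depth) and `-r` (repetitions) options. The model path is a placeholder, not taken from this report, and this is not necessarily the exact command used to produce the graphs.

```shell
#!/bin/sh
# Assumption: MODEL is a placeholder path; substitute the model actually benchmarked.
MODEL="${MODEL:-model.gguf}"

N_PROMPT=0     # PP0: no prompt-processing test
N_GEN=128      # TG128: generate 128 tokens
N_DEPTH=8192   # D8192: context already holds 8192 tokens before timing
REPS=3         # NITER3: three repetitions

CMD="llama-bench -m $MODEL -p $N_PROMPT -n $N_GEN -d $N_DEPTH -r $REPS"

# Print the command so it can be pasted into the report.
echo "$CMD"

# Run it only when the tool and the model file are both available.
if command -v llama-bench >/dev/null 2>&1 && [ -f "$MODEL" ]; then
    $CMD
fi
```

Running the same command with `-d 0` would give a zero-depth baseline to compare against the D8192 numbers.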

Operating systems

No response

Which llama.cpp modules do you know to be affected?

No response

Command line

Problem description & steps to reproduce

See the token generation graphs showing performance at context depth 8192 (D8192).

First Bad Commit

No response

Relevant log output

Logs

Metadata

Projects

Status
In progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
