perf: add llm decode metadata update fast path. by RobbieLeung · Pull Request #1294 · jd-opensource/xllm

RobbieLeung · 2026-04-16T03:40:07Z

- add a decode-only fused metadata update kernel for ordinary LLM CUDA graph execution
- reuse persistent kv seq len delta buffers and keep block_tables on the legacy copy path
- add decode fast-path coverage and fallback equivalence tests
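
To illustrate the idea behind the list above, here is a minimal host-side C++ sketch, not the code from this PR. It contrasts a legacy-style update (one copy per metadata field) with a fused update (one pass that writes every field), then checks that both leave the persistent buffers in the same state, which is the kind of property a fallback equivalence test would assert. The struct names, field names, and function names are all hypothetical; the actual fast path is a CUDA kernel, where fusing would replace several separate device copies with a single launch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-step decode inputs (illustrative names, not xllm's types).
struct DecodeStepInputs {
  std::vector<int32_t> tokens;       // one new token per sequence
  std::vector<int32_t> positions;    // position of that token
  std::vector<int32_t> kv_seq_lens;  // KV length per sequence for this step
};

// Hypothetical persistent buffers, allocated once for the largest batch so
// that a captured graph can keep reading from the same storage.
struct PersistentBuffers {
  explicit PersistentBuffers(std::size_t max_batch)
      : tokens(max_batch, 0),
        positions(max_batch, 0),
        kv_seq_lens(max_batch, 0) {}
  std::vector<int32_t> tokens;
  std::vector<int32_t> positions;
  std::vector<int32_t> kv_seq_lens;
};

// Returns true when the inputs are consistent and fit in the buffers.
bool inputs_fit(const DecodeStepInputs& in, const PersistentBuffers& out) {
  const std::size_t batch = in.tokens.size();
  return in.positions.size() == batch && in.kv_seq_lens.size() == batch &&
         batch <= out.tokens.size();
}

// Legacy-style path: one separate copy per field.
void update_metadata_legacy(const DecodeStepInputs& in,
                            PersistentBuffers& out) {
  std::copy(in.tokens.begin(), in.tokens.end(), out.tokens.begin());
  std::copy(in.positions.begin(), in.positions.end(), out.positions.begin());
  std::copy(in.kv_seq_lens.begin(), in.kv_seq_lens.end(),
            out.kv_seq_lens.begin());
}

// Fused path: a single pass that writes every field for each sequence.
// On a GPU, the loop body would correspond to the work of one thread.
void update_metadata_fused(const DecodeStepInputs& in,
                           PersistentBuffers& out) {
  const std::size_t batch = in.tokens.size();
  for (std::size_t i = 0; i < batch; ++i) {
    out.tokens[i] = in.tokens[i];
    out.positions[i] = in.positions[i];
    out.kv_seq_lens[i] = in.kv_seq_lens[i];
  }
}

// Equivalence check: both paths must produce identical buffer contents.
bool fused_matches_legacy(const DecodeStepInputs& in, std::size_t max_batch) {
  PersistentBuffers legacy(max_batch);
  PersistentBuffers fused(max_batch);
  if (!inputs_fit(in, legacy)) {
    return false;
  }
  update_metadata_legacy(in, legacy);
  update_metadata_fused(in, fused);
  return legacy.tokens == fused.tokens &&
         legacy.positions == fused.positions &&
         legacy.kv_seq_lens == fused.kv_seq_lens;
}
```

Calling `fused_matches_legacy` with a small batch and a larger `max_batch` checks both the written prefix and the untouched tail of each buffer. The sketch deliberately leaves out block tables, which the description says stay on the legacy copy path.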

gemini-code-assist

Code Review

This PR implements a CUDA-based fast path for updating LLM decode metadata, replacing standard copy operations to optimize persistent buffer management in the CUDA graph executor. Feedback primarily addresses style guide compliance, including renaming functions to snake_case, replacing magic numbers with named constants, annotating constant arguments, and removing an unused include.

zhang-minchao · 2026-04-16T03:46:59Z

Please add a timeline comparison from before and after enabling llm_decode_metadata_update.cu.

RobbieLeung · 2026-04-16T06:21:22Z

> Please add a timeline comparison from before and after enabling llm_decode_metadata_update.cu.

I had an AI write a benchmark; the speedup is roughly 2-2.6x.

zhang-minchao · 2026-04-16T08:04:29Z

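> > Please add a timeline comparison from before and after enabling llm_decode_metadata_update.cu.

> I had an AI write a benchmark; the speedup is roughly 2-2.6x.

What I meant is that you could attach GPU profiling timeline screenshots, which would show the effect more clearly.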

RobbieLeung requested review from DongheJin, JimHsiung, XuZhang99, liutongxuan, walsonyang and yq33victor as code owners April 16, 2026 03:40

gemini-code-assist Bot reviewed Apr 16, 2026

zhang-minchao reviewed Apr 16, 2026

Comment thread xllm/core/runtime/cuda_graph_executor_impl.cpp Outdated

RobbieLeung force-pushed the feat/fast_graph branch from 6e86fe3 to c0b88d1 Compare April 16, 2026 06:24

zhang-minchao approved these changes Apr 18, 2026

liutongxuan approved these changes Apr 23, 2026

RobbieLeung merged commit be14834 into jd-opensource:main Apr 28, 2026
31 of 41 checks passed

RobbieLeung merged 1 commit into jd-opensource:main from RobbieLeung:feat/fast_graph

3 participants
