docs: add SGLang HiCache L3 observability note #2603

Open
xzh25 wants to merge 1 commit into kvcache-ai:main from xzh25:codex/sglang-hicache-l3-observability

xzh25 commented Jun 24, 2026

Summary

Adds a conservative documentation note for the AI Studio A800 SGLang HiCache + Mooncake Store L3 experiment.

The note records:

  • the tested platform/model/workload scope
  • one verified L3 read-back case with exists_hit=3095 and get_success=3095
  • prior Store write-back counters
  • the negative performance result, where Store reload did not beat no-store in this constrained setup
  • follow-up work needed before claiming a production optimization
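
As an illustration only, the read-back counters above can be sanity-checked with a small helper. This is a minimal sketch: the function name `readback_ratio` is hypothetical and not part of SGLang or Mooncake, and it assumes (from the counter names alone) that `exists_hit` counts keys found in the Store and `get_success` counts keys then fetched successfully.

```python
def readback_ratio(exists_hit: int, get_success: int) -> float:
    """Return the fraction of existence hits that also completed a get.

    Hypothetical helper: the meaning of the two counters is assumed
    from their names, not taken from SGLang or Mooncake documentation.
    """
    if exists_hit < 0 or get_success < 0:
        raise ValueError("counters must be non-negative")
    if exists_hit == 0:
        # No existence hits recorded, so there is nothing to read back.
        return 0.0
    return get_success / exists_hit


# Counter values reported in the summary above.
ratio = readback_ratio(exists_hit=3095, get_success=3095)
print(f"read-back ratio: {ratio:.3f}")
```

A ratio of 1.0 only says that every key reported as present was also fetched. It says nothing about latency, which is why the negative performance result is listed separately.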

Validation

gemini-code-assist bot left a comment

Contributor

Code Review

This pull request adds a new documentation page detailing SGLang HiCache + Mooncake L3 observability on an AI Studio A800 runtime, along with updating the index file to link to it. The reviewer pointed out a typo in the model name ('Qwen3-0.6B') and suggested correcting it to a valid model version.

## Scope

- Platform: Baidu AI Studio A800 runtime.
- Model: Qwen3-0.6B.

Contributor

medium

The model name Qwen3-0.6B appears to be a typo, as there is no Qwen3 model series currently released. Please correct this to the actual model used in the experiment, such as Qwen1.5-0.6B or Qwen2-0.5B.

Suggested change
- Model: Qwen3-0.6B.
- Model: Qwen1.5-0.6B.

github-actions bot added the documentation and run-ci labels Jun 24, 2026

ykwd commented Jun 25, 2026

Collaborator

Thanks for putting this together and for documenting the experiment carefully. I think this is useful as a personal small-scale experiment and can provide some reference value. However, I’m not sure it is a good fit to merge into the official docs at this point. I’d suggest keeping this as an external note or discussion for now.


## Scope

- Platform: Baidu AI Studio A800 runtime.

Collaborator

Could you make the example more broadly applicable for wider use?

|----------|----------|---------------|
| [PD Disaggregation Performance](../sglang-benchmark-results-v1) | SGLang PD disaggregation with Mooncake Transfer Engine | 1P1D PD disaggregation achieves approximately **30% lower ITL** while maintaining comparable throughput against two regular instances. |
| [HiCache with Mooncake Backend Benchmark](../sglang-hicache-benchmark-results-v1) | SGLang HiCache using Mooncake Store as L3 storage | Mooncake-backed HiCache improves prefill performance in multi-turn workloads by maintaining higher KV cache hit rates as conversation rounds grow. |
| [AI Studio A800 L3 Observability](../sglang-hicache-l3-aistudio-observability) | SGLang HiCache with Mooncake Store on a single AI Studio A800 runtime | L3 write-back and one read-back case were observed, but Store reload did not beat no-store in this constrained setup. |

Collaborator

Should be L3 Observability?
