docs: add SGLang HiCache L3 observability note #2603
Code Review
This pull request adds a new documentation page detailing SGLang HiCache + Mooncake L3 observability on an AI Studio A800 runtime, along with updating the index file to link to it. The reviewer pointed out a typo in the model name ('Qwen3-0.6B') and suggested correcting it to a valid model version.
## Scope

- Platform: Baidu AI Studio A800 runtime.
- Model: Qwen3-0.6B.
Thanks for putting this together and for documenting the experiment carefully. I think this is useful as a personal small-scale experiment and can provide some reference value. However, I'm not sure it is a good fit to merge into the official docs at this point. I'd suggest keeping this as an external note or discussion for now.
## Scope

- Platform: Baidu AI Studio A800 runtime.
Could you make the example more broadly applicable for wider use?
|----------|----------|---------------|
| [PD Disaggregation Performance](../sglang-benchmark-results-v1) | SGLang PD disaggregation with Mooncake Transfer Engine | 1P1D PD disaggregation achieves approximately **30% lower ITL** while maintaining comparable throughput against two regular instances. |
| [HiCache with Mooncake Backend Benchmark](../sglang-hicache-benchmark-results-v1) | SGLang HiCache using Mooncake Store as L3 storage | Mooncake-backed HiCache improves prefill performance in multi-turn workloads by maintaining higher KV cache hit rates as conversation rounds grow. |
| [AI Studio A800 L3 Observability](../sglang-hicache-l3-aistudio-observability) | SGLang HiCache with Mooncake Store on a single AI Studio A800 runtime | L3 write-back and one read-back case were observed, but Store reload did not beat no-store in this constrained setup. |
Should be L3 Observability?
Summary
Adds a conservative documentation note for the AI Studio A800 SGLang HiCache + Mooncake Store L3 experiment.
The note records:
exists_hit=3095 and get_success=3095
Validation
kvcache-ai/Mooncake:main