From ea718a0bb1ac709436e233c394be8e0976be7f4b Mon Sep 17 00:00:00 2001
From: Ke Yang <yangke@approaching.ai>
Date: Sun, 28 Jun 2026 03:16:23 +0000
Subject: [PATCH 1/9] shrink citation

---
 README.md | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/README.md b/README.md
index 981d9d2a54..d5138a2a01 100644
--- a/README.md
+++ b/README.md
@@ -324,6 +324,24 @@ The above presents two samples from our trace dataset. The trace includes the ti
 <h2 id="citation">📑 Citation</h2>
 Please kindly cite our papers if you find the papers or the traces are useful:
 
+```bibtex
+@inproceedings{qin2025mooncake,
+  author    = {Ruoyu Qin and Zheming Li and Weiran He and Jialei Cui and Feng Ren and Mingxing Zhang and Yongwei Wu and Weimin Zheng and Xinran Xu},
+  title     = {Mooncake: Trading More Storage for Less Computation {\textemdash} A {KVCache-centric} Architecture for Serving {LLM} Chatbot},
+  booktitle = {23rd USENIX Conference on File and Storage Technologies (FAST 25)},
+  year      = {2025},
+  isbn      = {978-1-939133-45-8},
+  address   = {Santa Clara, CA},
+  pages     = {155--170},
+  url       = {https://www.usenix.org/conference/fast25/presentation/qin},
+  publisher = {USENIX Association},
+  month     = {feb},
+}
+```
+
+<details>
+<summary>Highlights</summary>
+
 ```bibtex
 @misc{ren2026tentdeclarativeslicespraying,
   title     = {TENT: A Declarative Slice Spraying Engine for Performant and Resilient Data Movement in Disaggregated LLM Serving},
@@ -356,18 +374,7 @@ Please kindly cite our papers if you find the papers or the traces are useful:
   keywords  = {Machine learning system, LLM serving, KVCache},
 }
 
-@inproceedings{qin2025mooncake,
-  author    = {Ruoyu Qin and Zheming Li and Weiran He and Jialei Cui and Feng Ren and Mingxing Zhang and Yongwei Wu and Weimin Zheng and Xinran Xu},
-  title     = {Mooncake: Trading More Storage for Less Computation {\textemdash} A {KVCache-centric} Architecture for Serving {LLM} Chatbot},
-  booktitle = {23rd USENIX Conference on File and Storage Technologies (FAST 25)},
-  year      = {2025},
-  isbn      = {978-1-939133-45-8},
-  address   = {Santa Clara, CA},
-  pages     = {155--170},
-  url       = {https://www.usenix.org/conference/fast25/presentation/qin},
-  publisher = {USENIX Association},
-  month     = {feb},
-}
+
 
 @article{qin2024mooncake_arxiv,
   title  = {Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving},

From 40ea55f809cfd8dad8a738d6b5c1c26d9e06a7cd Mon Sep 17 00:00:00 2001
From: Ke Yang <yangke@approaching.ai>
Date: Sun, 28 Jun 2026 03:48:28 +0000
Subject: [PATCH 2/9] update the trace docs

---
 FAST25-release/README.md | 17 +++++++++++++++++
 README.md                | 26 +++++---------------------
 2 files changed, 22 insertions(+), 21 deletions(-)
 create mode 100644 FAST25-release/README.md
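
Note on the trace sample moved by this patch: the snippet shows two JSON objects back to back, so a plain `json.loads` on the whole text would fail. The sketch below is one possible way to read such records and estimate how many blocks repeat across requests. It is an illustration only, not the project's hit-rate simulator: the helper names `iter_records` and `block_reuse_ratio` are hypothetical, and it assumes an unbounded cache in which a repeated `hash_ids` entry counts as reuse.

```python
import json


def iter_records(text):
    """Yield each JSON object from a string of back-to-back objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        record, pos = decoder.raw_decode(text, pos)
        yield record


def block_reuse_ratio(records):
    """Fraction of block ids already seen in an earlier request.

    Assumption: the cache is unbounded, so this is an optimistic
    estimate rather than a capacity-aware simulation.
    """
    seen = set()
    reused = 0
    total = 0
    for record in records:
        ids = record["hash_ids"]
        reused += sum(1 for block_id in ids if block_id in seen)
        total += len(ids)
        seen.update(ids)
    return reused / total if total else 0.0


# The two sample records shown in FAST25-release/README.md.
SAMPLE = """
{
    "timestamp": 27482,
    "input_length": 6955,
    "output_length": 52,
    "hash_ids": [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 2353, 2354]
}
{
    "timestamp": 30535,
    "input_length": 6472,
    "output_length": 26,
    "hash_ids": [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 2366]
}
"""

records = list(iter_records(SAMPLE))
ratio = block_reuse_ratio(records)
print(f"{len(records)} records, block reuse ratio {ratio:.3f}")
```

For the two sample records, 12 of the 27 block ids repeat, all of them in the second request. A bounded cache with an eviction policy would report a lower figure.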

diff --git a/FAST25-release/README.md b/FAST25-release/README.md
new file mode 100644
index 0000000000..7ef9b3ba93
--- /dev/null
+++ b/FAST25-release/README.md
@@ -0,0 +1,17 @@
+```json
+{
+    "timestamp": 27482,
+    "input_length": 6955,
+    "output_length": 52,
+    "hash_ids": [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 2353, 2354]
+}
+{
+    "timestamp": 30535,
+    "input_length": 6472,
+    "output_length": 26,
+    "hash_ids": [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 2366]
+}
+```
+The above presents two samples from our trace dataset. The trace includes the timing of request arrivals, the number of input tokens, the number of output tokens, and the remapped block hash. To protect our customers' privacy, we applied several mechanisms to remove user-related information while preserving the dataset's utility for simulated evaluation. More descriptions of the trace (e.g., up to 50% cache hit ratio) can be found in Section 4 of the technical report.
+
+**_Update[Feb 21, 2025]: The updated [traces](traces) used in our FAST'25 paper have been released! Please refer to the paper's appendix (found [here](Mooncake-FAST25.pdf)) for more details._**
\ No newline at end of file
diff --git a/README.md b/README.md
index d5138a2a01..1eebbc1681 100644
--- a/README.md
+++ b/README.md
@@ -301,25 +301,11 @@ Install them without cloning the repository via the [Claude Code plugin marketpl
 
 The `--sparse .claude-plugin` flag fetches only the marketplace catalog, and each plugin is published as a `git-subdir` source, so installing one fetches only that single skill directory — never the whole repo. If you are already working inside a Mooncake checkout, the skills under `.claude/skills/` load automatically with no setup.
 
-<h2 id="trace">📦 Open Source Trace</h2>
-
-```json
-{
-    "timestamp": 27482,
-    "input_length": 6955,
-    "output_length": 52,
-    "hash_ids": [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 2353, 2354]
-}
-{
-    "timestamp": 30535,
-    "input_length": 6472,
-    "output_length": 26,
-    "hash_ids": [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 2366]
-}
-```
-The above presents two samples from our trace dataset. The trace includes the timing of request arrivals, the number of input tokens, the number of output tokens, and the remapped block hash. To protect our customers' privacy, we applied several mechanisms to remove user-related information while preserving the dataset's utility for simulated evaluation. More descriptions of the trace (e.g., up to 50% cache hit ratio) can be found in Section 4 of the technical report.
+<h2 id="trace">📦 Open Source Traces and Tools </h2>
+
+We open-source anonymized request traces containing request arrival times, input and output token counts, and remapped block hashes. These traces are designed to support reproducible simulation and evaluation of caching behavior while preserving user privacy. The released traces and related details are available in [FAST25-release](FAST25-release).
 
-**_Update[Feb 21, 2025]: The updated [traces](FAST25-release/traces) used in our FAST'25 paper have been released! Please refer to the paper's appendix (found [here](FAST25-release/Mooncake-FAST25.pdf)) for more details._**
+Together with the released traces, we also provide two KV cache analysis tools: a [KV Cache Size Calculator](https://kvcache.ai/tools/kv-cache-size-calculator/) for calculating cache capacity across popular LLM model families, and a [KV Cache Hit Rate Simulator](https://kvcache.ai/tools/kv-cache-hit-rate-simulator/) for analyzing KV cache hit rates and planning cache capacity under different workloads and models. These tools help users better understand KV cache storage costs and caching effectiveness when analyzing or reproducing serving workloads. The tools are open-sourced at [here](https://github.com/kvcache-ai/kvcache-blog).
 
 <h2 id="citation">📑 Citation</h2>
 Please kindly cite our papers if you find the papers or the traces are useful:
@@ -340,7 +326,7 @@ Please kindly cite our papers if you find the papers or the traces are useful:
 ```
 
 <details>
-<summary>Highlights</summary>
+<summary>More</summary>
 
 ```bibtex
 @misc{ren2026tentdeclarativeslicespraying,
@@ -374,8 +360,6 @@ Please kindly cite our papers if you find the papers or the traces are useful:
   keywords  = {Machine learning system, LLM serving, KVCache},
 }
 
-
-
 @article{qin2024mooncake_arxiv,
   title  = {Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving},
   author = {Ruoyu Qin and Zheming Li and Weiran He and Mingxing Zhang and Yongwei Wu and Weimin Zheng and Xinran Xu},

From 478511a054e7fcfad4aebdd3a54679719c29fc81 Mon Sep 17 00:00:00 2001
From: Ke Yang <yangke@approaching.ai>
Date: Sun, 28 Jun 2026 03:57:01 +0000
Subject: [PATCH 3/9] shrink skill part

Co-authored-by: Cursor <cursoragent@cursor.com>
---
 README.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 1eebbc1681..5e0b3986ef 100644
--- a/README.md
+++ b/README.md
@@ -280,9 +280,12 @@ sudo make install # optional, make it ready to be used by vLLM/SGLang
 
 For custom accelerator backends, Docker deployment, NVMe-oF, EFA, CXL, Redis / HTTP metadata, Rust bindings, or other advanced build options, see the [Build Guide](https://kvcache-ai.github.io/Mooncake/getting_started/build.html).
 
-### Skills for AI coding assistants
+### Skills for AI Assistants
 
-Mooncake ships a set of **built-in skills** under [`.claude/skills`](.claude/skills) — reusable, task-focused playbooks that an AI coding assistant (such as Claude Code) invokes automatically when your request matches, or that you can run as a slash command:
+Mooncake ships a set of **built-in skills** under [`.claude/skills`](.claude/skills) — reusable, task-focused playbooks that an AI coding assistant (such as Claude Code) invokes automatically when your request matches, or that you can run as a slash command.
+
+<details>
+<summary>More</summary>
 
 | Skill | Description |
 |-------|-------------|
@@ -301,6 +304,8 @@ Install them without cloning the repository via the [Claude Code plugin marketpl
 
 The `--sparse .claude-plugin` flag fetches only the marketplace catalog, and each plugin is published as a `git-subdir` source, so installing one fetches only that single skill directory — never the whole repo. If you are already working inside a Mooncake checkout, the skills under `.claude/skills/` load automatically with no setup.
 
+</details>
+
 <h2 id="trace">📦 Open Source Traces and Tools </h2>
 
 We open-source anonymized request traces containing request arrival times, input and output token counts, and remapped block hashes. These traces are designed to support reproducible simulation and evaluation of caching behavior while preserving user privacy. The released traces and related details are available in [FAST25-release](FAST25-release).
@@ -367,3 +372,5 @@ Please kindly cite our papers if you find the papers or the traces are useful:
   url    = {https://arxiv.org/abs/2407.00079},
 }
 ```
+
+</details>

From 949ffa3bf274117cba55663cc7932a85bc2d1e89 Mon Sep 17 00:00:00 2001
From: Ke Yang <yangke@approaching.ai>
Date: Sun, 28 Jun 2026 03:58:29 +0000
Subject: [PATCH 4/9] minor update

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 5e0b3986ef..a3d8596268 100644
--- a/README.md
+++ b/README.md
@@ -285,7 +285,7 @@ For custom accelerator backends, Docker deployment, NVMe-oF, EFA, CXL, Redis / H
 Mooncake ships a set of **built-in skills** under [`.claude/skills`](.claude/skills) — reusable, task-focused playbooks that an AI coding assistant (such as Claude Code) invokes automatically when your request matches, or that you can run as a slash command.
 
 <details>
-<summary>More</summary>
+<summary>Details</summary>
 
 | Skill | Description |
 |-------|-------------|

From 699dd5223217d876555e81df105165fb68dbdee2 Mon Sep 17 00:00:00 2001
From: Ke Yang <yangke@approaching.ai>
Date: Sun, 28 Jun 2026 05:04:24 +0000
Subject: [PATCH 5/9] Shrink quick start

---
 README.md                                  | 55 +++-------------------
 docs/source/getting_started/quick-start.md | 35 +++++++++++---
 2 files changed, 35 insertions(+), 55 deletions(-)

diff --git a/README.md b/README.md
index a3d8596268..ca664d722b 100644
--- a/README.md
+++ b/README.md
@@ -217,24 +217,9 @@ The following hardware partners and cloud platforms are supported by the Mooncak
 
 For complete protocol behavior, SDK requirements, and vendor-specific configuration, see the [supported protocols](https://kvcache-ai.github.io/Mooncake/getting_started/supported-protocols.html), [build guide](https://kvcache-ai.github.io/Mooncake/getting_started/build.html), and [Transfer Engine design docs](https://kvcache-ai.github.io/Mooncake/design/transfer-engine/index.html).
 
-<h2 id="quick-start">🚀 Quick Start</h2>
+<h2 id="quick-start">🚀 Getting Started</h2>
 
-### Before using Mooncake
-
-Mooncake is designed and optimized for high-speed RDMA networks. Though Mooncake supports TCP-only data transfer, we **strongly** recommend users to evaluate the functionality and performance of Mooncake with RDMA network support.
-
-The following need to be installed before running any component of Mooncake:
-- RDMA Driver & SDK, such as Mellanox OFED.
-- Python 3.10, virtual environment is recommended.
-- CUDA 12.1 and above, including NVIDIA GPUDirect Storage Support, if the package is built with `-DUSE_CUDA` (disabled by default). *You may install them from [here](https://developer.nvidia.com/cuda-downloads)*.
-- Cambricon Neuware, if the package is built with `-DUSE_MLU`. By default Mooncake looks for Neuware under `NEUWARE_HOME` or `/usr/local/neuware`.
-- Hygon DTK SDK, if the package is built with `-DUSE_HYGON`. By default Mooncake looks for DTK under `DTK_HOME` or `/opt/dtk`.
-- Iluvatar CoreX SDK, if the package is built with `-DUSE_COREX`. By default Mooncake looks for CoreX under `COREX_HOME` or `/usr/local/corex`.
-
-### Use Python package
-The simplest way to use Mooncake Transfer Engine is using `pip`:
-
-**For CUDA-enabled systems:**
+Install Mooncake using `pip`. The `mooncake-transfer-engine` package includes Mooncake Transfer Engine, Mooncake Store, Mooncake EP and PG:
 
 - CUDA < 13.0
 ```bash
@@ -245,40 +230,12 @@ pip install mooncake-transfer-engine
 pip install mooncake-transfer-engine-cuda13
 ```
 
-**For non-CUDA systems:**
-```bash
-pip install mooncake-transfer-engine-non-cuda
-```
-
-**For NPU systems:**
-```bash
-pip install mooncake-transfer-engine-npu
-```
-
-> [!IMPORTANT]
-> - The CUDA version (`mooncake-transfer-engine`) includes Mooncake-EP and GPU topology detection, requiring CUDA 12.1+.
-> - The non-CUDA version (`mooncake-transfer-engine-non-cuda`) is for environments without CUDA dependencies, but it still needs system runtime libraries such as `libcurl4`, `libibverbs1`, `rdma-core`, `librdmacm1`, `libnuma1`, and `liburing2` on Ubuntu. In a fresh environment, run `sudo apt-get update` before installing them.
-> - MLU support is currently available through source builds with `-DUSE_MLU=ON`; there is no dedicated prebuilt MLU wheel yet.
-> - If users encounter problems such as missing `lib*.so`, first install the corresponding system runtime libraries. If the issue persists, uninstall the package and build the binaries manually.
-
-### Build From Source
-
-For the default source build, use the automatic dependency script and standard CMake flow:
+In addition to CUDA, Mooncake also supports other accelerator backends, along with flexible installation and deployment options. See the guides below for details:
 
-```bash
-git clone https://github.com/kvcache-ai/Mooncake.git
-cd Mooncake
-
-sudo bash dependencies.sh
-
-mkdir build
-cd build
-cmake ..
-make -j
-sudo make install # optional, make it ready to be used by vLLM/SGLang
-```
+- [Quick Start](https://kvcache-ai.github.io/Mooncake/getting_started/quick-start.html)
+- [Build from Source](https://kvcache-ai.github.io/Mooncake/getting_started/build.html)
+- [Deployment Guide](https://kvcache-ai.github.io/Mooncake/deployment/mooncake-store-deployment-guide.html)
 
-For custom accelerator backends, Docker deployment, NVMe-oF, EFA, CXL, Redis / HTTP metadata, Rust bindings, or other advanced build options, see the [Build Guide](https://kvcache-ai.github.io/Mooncake/getting_started/build.html).
 
 ### Skills for AI Assistants
 
diff --git a/docs/source/getting_started/quick-start.md b/docs/source/getting_started/quick-start.md
index cfed298dd8..ac5724d342 100644
--- a/docs/source/getting_started/quick-start.md
+++ b/docs/source/getting_started/quick-start.md
@@ -2,27 +2,50 @@
 
 This document describes how to quickly start using Mooncake Transfer Engine and Mooncake Store.
 
+## Before using Mooncake
+
+Install the following before running any Mooncake component:
+- RDMA Driver & SDK, such as Mellanox OFED.
+- Python 3.10; a virtual environment is recommended.
+- CUDA 12.1 or later, including NVIDIA GPUDirect Storage support, if the package is built with `-DUSE_CUDA` (disabled by default). *You can install them from the [CUDA downloads page](https://developer.nvidia.com/cuda-downloads)*.
+- Cambricon Neuware, if the package is built with `-DUSE_MLU`. By default Mooncake looks for Neuware under `NEUWARE_HOME` or `/usr/local/neuware`.
+- Hygon DTK SDK, if the package is built with `-DUSE_HYGON`. By default Mooncake looks for DTK under `DTK_HOME` or `/opt/dtk`.
+- Iluvatar CoreX SDK, if the package is built with `-DUSE_COREX`. By default Mooncake looks for CoreX under `COREX_HOME` or `/usr/local/corex`.
+
 ## Installation
 
 Install the Mooncake Transfer Engine package from PyPI, which includes both Mooncake Transfer Engine and Mooncake Store Python bindings:
 
 **For CUDA-enabled systems:**
+
+- CUDA < 13.0
 ```bash
 pip install mooncake-transfer-engine numpy pyzmq
 ```
-📦 **Package Details**: [https://pypi.org/project/mooncake-transfer-engine/](https://pypi.org/project/mooncake-transfer-engine/)
+
+- CUDA >= 13.0
+```bash
+pip install mooncake-transfer-engine-cuda13 numpy pyzmq
+```
 
 **For non-CUDA systems:**
 ```bash
 pip install mooncake-transfer-engine-non-cuda numpy pyzmq
 ```
 
-📦 **Package Details**: [https://pypi.org/project/mooncake-transfer-engine-non-cuda/](https://pypi.org/project/mooncake-transfer-engine-non-cuda/)
+**For NPU systems:**
+```bash
+pip install mooncake-transfer-engine-npu
+```
 
-> **Note**: The CUDA version includes Mooncake-EP and GPU topology detection, requiring CUDA 12.1+. The non-CUDA version is for environments without CUDA dependencies, but it still requires the system runtime libraries used by the transfer stack. On Ubuntu, install them with:
-> ```bash
-> sudo apt-get update && sudo apt-get install -y libcurl4 libibverbs1 rdma-core librdmacm1 libnuma1 liburing2
-> ```
+> **Important**:
+> - The CUDA version (`mooncake-transfer-engine`) includes Mooncake-EP and GPU topology detection, and requires CUDA 12.1+.
+> - The non-CUDA version (`mooncake-transfer-engine-non-cuda`) is for environments without CUDA dependencies, but it still needs system runtime libraries such as `libcurl4`, `libibverbs1`, `rdma-core`, `librdmacm1`, `libnuma1`, and `liburing2` on Ubuntu. In a fresh environment, run `sudo apt-get update` before installing them:
+>   ```bash
+>   sudo apt-get update && sudo apt-get install -y libcurl4 libibverbs1 rdma-core librdmacm1 libnuma1 liburing2
+>   ```
+> - MLU support is currently available through source builds with `-DUSE_MLU=ON`; there is no dedicated prebuilt MLU wheel yet.
+> - If you encounter errors such as a missing `lib*.so`, first install the corresponding system runtime libraries. If the issue persists, uninstall the package and build the binaries from source.
 
 ## Transfer Engine Quick Start
 

From f9a01e76c160095e021f0c7fb6b7bd8fb157aaa8 Mon Sep 17 00:00:00 2001
From: Ke Yang <yangke@approaching.ai>
Date: Sun, 28 Jun 2026 05:54:13 +0000
Subject: [PATCH 6/9] Shrink overview

---
 README.md | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index ca664d722b..2a5a02ba78 100644
--- a/README.md
+++ b/README.md
@@ -27,8 +27,7 @@
 <br/>
 
 Mooncake is the serving platform for  <a href="https://kimi.ai/"><img src="image/kimi.png" alt="icon" style="height: 16px; vertical-align: middle;"> Kimi</a>, a leading LLM service provided by <a href="https://www.moonshot.cn/"><img src="image/moonshot.jpg" alt="icon" style="height: 16px; vertical-align: middle;"> Moonshot AI</a>.
-Now both the Transfer Engine and Mooncake Store are open-sourced!
-This repository also hosts its technical report and the open-sourced traces.
+Under real workloads, Mooncake’s innovative architecture enables Kimi to handle 75% more requests while adhering to SLOs.
 
 <h2 id="updates">🔄 Updates</h2>
 
@@ -74,19 +73,17 @@ This repository also hosts its technical report and the open-sourced traces.
 
 <h2 id="overview">🎉 Overview</h2>
 
-Mooncake features a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated KVCache pool.
-
-![architecture](image/architecture.png)
-
-The core of Mooncake is its KVCache-centric scheduler, which balances maximizing overall effective throughput while meeting latency-related Service Level Objectives (SLOs). Unlike traditional studies that assume all requests will be processed, Mooncake faces challenges in highly overloaded scenarios. To mitigate these, we developed a prediction-based early rejection policy. Experiments show that Mooncake excels in long-context scenarios. Compared to the baseline method, Mooncake can achieve up to a 525% increase in throughput in certain simulated scenarios while adhering to SLOs. Under real workloads, Mooncake’s innovative architecture enables <a href="https://kimi.ai/">Kimi</a> to handle 75% more requests.
-
-<h2 id="show-cases">🔥 Show Cases</h2>
-
 <!-- ![components](image/components.png) -->
 <div align="center">
   <img src=image/components.png width=74% />
 </div>
 
+Mooncake is an infrastructure project for large-scale LLM inference and training. It features a KV cache-centric disaggregated architecture that separates prefill and decode clusters, while leveraging otherwise underutilized CPU, DRAM, and SSD resources in GPU clusters to build a disaggregated KV cache pool.
+
+Mooncake includes a high-performance Transfer Engine for low-latency data movement across heterogeneous networks and accelerators; Mooncake Store for distributed KV cache and model-weight management; and Mooncake EP & PG for elastic MoE serving. Deeply integrated with ecosystems such as SGLang and vLLM, Mooncake helps LLM systems improve cache reuse, reduce serving latency, and scale efficiently across multi-node clusters.
+
+<h2 id="show-cases">🔥 Show Cases</h2>
+
 ### Transfer Engine (TE)
 
 The core of Mooncake is the Transfer Engine (TE), a high-performance data transfer framework. TE offers a unified interface for batched data movement across diverse storage, network, and accelerator environments. By supporting multiple transport protocols, topology-aware routing, multi-NIC bandwidth aggregation, and automatic failover, TE delivers low-latency, scalable, and robust data transmission for distributed AI workloads. See the [Transfer Engine guide](https://kvcache-ai.github.io/Mooncake/design/transfer-engine/index.html) for details.

From 0c93d8bbcadc2c1dd21a2030a3910c7bbeb59cc4 Mon Sep 17 00:00:00 2001
From: ykwd <oneday117@qq.com>
Date: Sun, 28 Jun 2026 14:14:21 +0800
Subject: [PATCH 7/9] Update README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 2a5a02ba78..db26a12dad 100644
--- a/README.md
+++ b/README.md
@@ -264,7 +264,7 @@ The `--sparse .claude-plugin` flag fetches only the marketplace catalog, and eac
 
 We open-source anonymized request traces containing request arrival times, input and output token counts, and remapped block hashes. These traces are designed to support reproducible simulation and evaluation of caching behavior while preserving user privacy. The released traces and related details are available in [FAST25-release](FAST25-release).
 
-Together with the released traces, we also provide two KV cache analysis tools: a [KV Cache Size Calculator](https://kvcache.ai/tools/kv-cache-size-calculator/) for calculating cache capacity across popular LLM model families, and a [KV Cache Hit Rate Simulator](https://kvcache.ai/tools/kv-cache-hit-rate-simulator/) for analyzing KV cache hit rates and planning cache capacity under different workloads and models. These tools help users better understand KV cache storage costs and caching effectiveness when analyzing or reproducing serving workloads. The tools are open-sourced at [here](https://github.com/kvcache-ai/kvcache-blog).
+Together with the released traces, we also provide two KV cache analysis tools: a [KV Cache Size Calculator](https://kvcache.ai/tools/kv-cache-size-calculator/) for calculating cache capacity across popular LLM model families, and a [KV Cache Hit Rate Simulator](https://kvcache.ai/tools/kv-cache-hit-rate-simulator/) for analyzing KV cache hit rates and planning cache capacity under different workloads and models. These tools help users better understand KV cache storage costs and caching effectiveness when analyzing or reproducing serving workloads. The tools are open-sourced [here](https://github.com/kvcache-ai/kvcache-blog).
 
 <h2 id="citation">📑 Citation</h2>
 Please kindly cite our papers if you find the papers or the traces are useful:

From cfa5584f3efb9a9e88ab0f711504d5d0560c2d5d Mon Sep 17 00:00:00 2001
From: ykwd <oneday117@qq.com>
Date: Sun, 28 Jun 2026 14:14:33 +0800
Subject: [PATCH 8/9] Update FAST25-release/README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 FAST25-release/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/FAST25-release/README.md b/FAST25-release/README.md
index 7ef9b3ba93..395186de64 100644
--- a/FAST25-release/README.md
+++ b/FAST25-release/README.md
@@ -14,4 +14,4 @@
 ```
 The above presents two samples from our trace dataset. The trace includes the timing of request arrivals, the number of input tokens, the number of output tokens, and the remapped block hash. To protect our customers' privacy, we applied several mechanisms to remove user-related information while preserving the dataset's utility for simulated evaluation. More descriptions of the trace (e.g., up to 50% cache hit ratio) can be found in Section 4 of the technical report.
 
-**_Update[Feb 21, 2025]: The updated [traces](traces) used in our FAST'25 paper have been released! Please refer to the paper's appendix (found [here](Mooncake-FAST25.pdf)) for more details._**
\ No newline at end of file
+**_Update [Feb 21, 2025]: The updated [traces](./traces) used in our FAST'25 paper have been released! Please refer to the paper's appendix (found [here](Mooncake-FAST25.pdf)) for more details._**
\ No newline at end of file

From 9d470edc9480b9365b0960cbb856e9b9603617bf Mon Sep 17 00:00:00 2001
From: Ke Yang <yangke@approaching.ai>
Date: Sun, 28 Jun 2026 09:02:00 +0000
Subject: [PATCH 9/9] shrink supported hardware

---
 README.md | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/README.md b/README.md
index db26a12dad..ccc04ceb9e 100644
--- a/README.md
+++ b/README.md
@@ -190,9 +190,7 @@ Mooncake integrates with [vLLM](https://github.com/vllm-project/vllm) to acceler
 
 <h2 id="supported-hardware">🖥️ Supported Hardware</h2>
 
-Mooncake supports hardware backends across accelerator vendors, cloud fabrics, and standard datacenter interconnects.
-
-The following hardware partners and cloud platforms are supported by the Mooncake, covering GPUs, specialized AI accelerators, and cloud-native interconnects:
+Mooncake supports hardware backends across accelerator vendors, cloud fabrics, and standard datacenter interconnects, as listed below. See the [supported protocols](https://kvcache-ai.github.io/Mooncake/getting_started/supported-protocols.html) and [Transfer Engine design docs](https://kvcache-ai.github.io/Mooncake/design/transfer-engine/index.html) for details.
 
 <div align="center">
   <table>
@@ -212,8 +210,6 @@ The following hardware partners and cloud platforms are supported by the Mooncak
   </table>
 </div>
 
-For complete protocol behavior, SDK requirements, and vendor-specific configuration, see the [supported protocols](https://kvcache-ai.github.io/Mooncake/getting_started/supported-protocols.html), [build guide](https://kvcache-ai.github.io/Mooncake/getting_started/build.html), and [Transfer Engine design docs](https://kvcache-ai.github.io/Mooncake/design/transfer-engine/index.html).
-
 <h2 id="quick-start">🚀 Getting Started</h2>
 
 Install Mooncake using `pip`. The `mooncake-transfer-engine` package includes Mooncake Transfer Engine, Mooncake Store, Mooncake EP and PG: