Skip to content

[Feature]Prefetch SSD-Only Objects to DRAM on Exist#2646

Draft
huangdong2022 wants to merge 2 commits into
kvcache-ai:mainfrom
huangdong2022:main_prefetch_pr
Draft

[Feature]Prefetch SSD-Only Objects to DRAM on Exist#2646
huangdong2022 wants to merge 2 commits into
kvcache-ai:mainfrom
huangdong2022:main_prefetch_pr

Conversation

@huangdong2022

@huangdong2022 huangdong2022 commented Jun 27, 2026

Copy link
Copy Markdown

Description

Implements SSD prefetch-on-exist for Mooncake Store (RFC #2213): when is_exist / batch_is_exist is called with ExistOptions.prefetch_to_memory=true, asynchronously promote SSD-only keys (LOCAL_DISK, no MEMORY) back to DRAM, so later get() can hit DRAM instead of SSD.

Core changes:

  • Dedicated prefetch RPC path (GetReplicaListForPrefetch, BatchGetReplicaListForPrefetch, RegisterPrefetchTask) — no lease/sketch/promotion-on-hit queue side effects.
  • Client triggerSsdPrefetch: chunked batch query (128 keys/chunk), pipelined register+promote, bounded prefetch_pool_ (4 threads), PrefetchThrottle (dedup TTL + DRAM-pressure cooldown).
  • Cross-node holder delegation via prefetch_offload_object RPC.
  • Get-side optional wait (ssd_get_wait_ms, default 10ms) with [GET-SRC] / [PREFETCH-OUTCOME] logging.
  • NotifyPromotionSuccess(from_prefetch=true) grants normal KV lease.
  • Bug fixes: tenant-scoped staging key in PrefetchKeys; BatchOffload commits local index before NotifyOffloadSuccess.

Python/C API: ExistOptions.prefetch_to_memory; setup() adds ssd_prefetch_* / ssd_get_wait_ms.

Related: RFC #2213, PR #2071. Validated with vLLM-Ascend KV pool (HBM/DRAM/SSD).

Module

  • Transfer Engine (mooncake-transfer-engine)
  • Mooncake Store (mooncake-store)
  • Mooncake EP (mooncake-ep)
  • Mooncake PG (mooncake-pg)
  • Integration (mooncake-integration)
  • P2P Store (mooncake-p2p-store)
  • Python Wheel (mooncake-wheel)
  • Common (mooncake-common)
  • Mooncake RL (mooncake-rl)
  • CI/CD
  • Docs
  • Other

Type of Change

  • Bug fix
  • New feature
  • Refactor
  • Breaking change
  • Documentation update
  • Performance improvement
  • Other

How Has This Been Tested?

Test commands:

# Mooncake Python integration test (requires running master + SSD offload env)
export MOONCAKE_OFFLOAD_FILE_STORAGE_PATH=/path/to/offload
export MOONCAKE_OFFLOAD_BUCKET_KEYS_LIMIT=10
export MOONCAKE_OFFLOAD_BUCKET_SIZE_LIMIT_BYTES=10485760
python -m unittest mooncake-wheel.tests.test_prefetch_on_exist.TestPrefetchOnExist -v

# Optional: cross-node case (opt-in)
# export MC_TEST_CROSS_NODE=1 NODE_A_HOSTNAME=... NODE_B_HOSTNAME=...

Manual integration (vLLM-Ascend + Mooncake master, SSD offload enabled):

  • End-to-end prefix-cache workload (80×32K): cold run → warm SSD → re-run; TTFT and Prefill improved on re-run.
  • GPQA accuracy run: no INVALID_KEY / get failures after B10 fix.

Test results:

  • Unit tests pass
  • Integration tests pass (if applicable)
  • Manual testing done (describe below)

Highlights:

  • test_prefetch_on_exist: is_exist / batch_is_exist with prefetch_to_memory=true promotes LOCAL_DISK-only keys to MEMORY; post-prefetch get does not hit SSD offload RPC path.
  • B10 fix: concurrent get INVALID_KEY eliminated under offload+prefetch load.
  • vLLM-Ascend re-run after SSD warm-up: TTFT −220ms (−3.2%), Prefill +162 t/s (+3.4%) on 80×32K workload.

Checklist

  • I have performed a self-review of my own code
  • I have formatted my code using ./scripts/code_format.sh
  • I have run pre-commit run --all-files and all hooks pass
  • I have updated the documentation (if applicable)
  • I have added tests to prove my changes are effective
  • For changes >500 LOC: I have filed an RFC issue

AI Assistance Disclosure

  • No AI tools were used
  • AI tools were used (specify below)

AI tools (Cursor/Claude) assisted with design doc, log analysis, test updates, and PR description drafting. All changes reviewed by the submitter.

Signed-off-by: h30027576 <huangdong51@huawei.com>
Signed-off-by: h30027576 <huangdong51@huawei.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the SSD Prefetch-on-Exist feature, enabling asynchronous, best-effort promotion of SSD-only keys to DRAM during the cache-probing phase to reduce TTFT. It introduces throttling mechanisms, bounded thread pools, get-side waiting, cross-node delegation, and corresponding Python bindings and tests. The review feedback identifies several critical improvement opportunities: a bug where failed prefetch task registration leaves throttle states stuck in a triggered state, a duplicate header include, a flaw in the throttle reservation logic that prematurely erases active entries, and a performance concern regarding high-frequency synchronous RPC polling of the master service during get-side waiting.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +2153 to +2159
auto register_result = client->RegisterPrefetchTask(local_keys[i]);
if (!register_result) {
VLOG(1) << "SSD prefetch: RegisterPrefetchTask failed for"
<< "key=" << local_keys[i]
<< ", error=" << register_result.error();
continue;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If RegisterPrefetchTask fails, the key is skipped and never passed to file_storage->PrefetchKeys. As a result, on_key_done is never invoked for this key, leaving its throttle state stuck in kTriggered until the TTL expires. This prevents subsequent prefetch attempts and causes any get-side waiting threads to block until they timeout.\n\nPlease invoke on_key_done(local_keys[i], false) before continuing the loop.

        auto register_result = client->RegisterPrefetchTask(local_keys[i]);
        if (!register_result) {
            VLOG(1) << "SSD prefetch: RegisterPrefetchTask failed for"
                    << "key=" << local_keys[i]
                    << ", error=" << register_result.error();
            if (on_key_done) {
                on_key_done(local_keys[i], false);
            }
            continue;
        }

#include "storage_backend.h"
#include "pinned_buffer_pool.h"

#include <functional>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The header is included twice in this file (at line 3 and line 10). Please remove the duplicate include at line 10.

Comment on lines +149 to +158
for (auto it = entries_.begin(); it != entries_.end();) {
const int64_t last_ms = it->second.completed_ms >= 0
? it->second.completed_ms
: it->second.trigger_ms;
if (now - last_ms > ttl_ms) {
it = entries_.erase(it);
} else {
++it;
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In PrefetchThrottle::reserve, entries are erased from entries_ solely based on the elapsed time since trigger_ms (if completed_ms is -1) exceeding ttl_ms.\n\nIf a prefetch task takes longer than ttl_ms to complete (or if ttl_ms is configured to be very short), an active entry in the kTriggered or kInFlight state can be prematurely erased. This would:\n1. Allow duplicate prefetch tasks to be triggered for the same key (defeating the deduplication logic).\n2. Cause any concurrent waitForCompletion calls for that key to fail immediately because the entry is no longer found in entries_.\n\nConsider only erasing entries that are in a terminal state (such as kCompleted, kFailed, or kAlreadyResident).

        for (auto it = entries_.begin(); it != entries_.end();) {
            const int64_t last_ms = it->second.completed_ms >= 0
                                        ? it->second.completed_ms
                                        : it->second.trigger_ms;
            const bool is_terminal = it->second.state == State::kCompleted ||
                                     it->second.state == State::kFailed ||
                                     it->second.state == State::kAlreadyResident;
            if (is_terminal && now - last_ms > ttl_ms) {
                it = entries_.erase(it);
            } else {
                ++it;
            }
        }

Comment on lines +5404 to +5420
} else {
prefetch_wait_mode = "master";
const int64_t deadline =
PrefetchThrottle::NowMs() + ssd_get_wait_ms_;
while (PrefetchThrottle::NowMs() < deadline) {
if (auto qr = TryRefreshBestMemoryReplica(
client_.get(), key, local_endpoints)) {
refreshed_qr.emplace(std::move(*qr));
best_replica = SelectBestReplica(refreshed_qr->replicas,
local_endpoints);
prefetch_done_ms = PrefetchThrottle::NowMs();
break;
}
std::this_thread::sleep_for(
std::chrono::milliseconds(kPollMs));
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In the master wait mode, the client polls the master via TryRefreshBestMemoryReplica (which performs a synchronous master RPC) every 1 ms (kPollMs = 1) up to ssd_get_wait_ms_ (default 10 ms) per key.\n\nSince this loop runs sequentially for each key in the batch, if a batch contains multiple SSD-only keys that are not yet promoted, this can lead to:\n1. A massive flood of synchronous RPC queries to the master, potentially overwhelming the master service under high concurrency.\n2. Significant accumulation of latency on the client side (e.g., 10 keys * 10 ms = 100 ms delay).\n\nConsider increasing the poll interval for the master query path (e.g., to 2-5 ms) or batching/limiting the master queries to avoid overwhelming the master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant