server : create context checkpoint on slot restore #24956

Open
julio50 wants to merge 1 commit into ggml-org:master from julio50:server-checkpoint-on-slot-restore

julio50 commented Jun 23, 2026

A restored slot had no context checkpoint, so the next request with cache_prompt found no reuse anchor and reprocessed the entire restored prefix (cache_n=0), defeating the purpose of /slots restore. This change creates a checkpoint covering the restored span so that the restored KV cache is reused.

Tested on a transformer model (gemma-4-12B, 15924-token prefix): a restore followed by an identical prompt went from prompt_n=15924/15924 (full reprocess) to prompt_n=1/15924 (reused).
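To make the before/after numbers concrete, here is a minimal toy model of the reuse decision described above. It is a sketch, not the server's code: `Slot`, `restore`, and `tokens_to_process` are hypothetical names, and it assumes two things that this description implies but does not spell out: reuse is only possible up to a checkpoint position, and at least one prompt token is always evaluated so that logits are available for sampling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Slot:
    # Tokens whose KV entries currently live in the slot (hypothetical model).
    tokens: list[int] = field(default_factory=list)
    # Token positions that can serve as reuse anchors (hypothetical model).
    checkpoints: list[int] = field(default_factory=list)


def restore(slot: Slot, saved_tokens: list[int], create_checkpoint: bool) -> None:
    """Load saved tokens into the slot, optionally anchoring a checkpoint."""
    slot.tokens = list(saved_tokens)
    # With the change, one checkpoint covers the whole restored span.
    slot.checkpoints = [len(saved_tokens)] if create_checkpoint else []


def common_prefix_len(a: list[int], b: list[int]) -> int:
    """Length of the longest shared prefix of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def tokens_to_process(slot: Slot, prompt: list[int]) -> int:
    """Number of prompt tokens that must be evaluated (prompt_n in this model)."""
    shared = common_prefix_len(slot.tokens, prompt)
    # Assumption: reuse stops at the latest checkpoint inside the shared prefix.
    anchor = max((c for c in slot.checkpoints if c <= shared), default=0)
    # Assumption: keep at least one token to evaluate so logits exist.
    reused = max(0, min(anchor, len(prompt) - 1))
    return len(prompt) - reused


prefix = list(range(15924))  # stand-in for the 15924-token prefix

before = Slot()
restore(before, prefix, create_checkpoint=False)
after = Slot()
restore(after, prefix, create_checkpoint=True)

print(tokens_to_process(before, prefix), tokens_to_process(after, prefix))
```

Under these assumptions the restored slot without a checkpoint evaluates all 15924 tokens, while the slot with a checkpoint evaluates only the final token, matching the prompt_n values reported in the test.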

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
julio50 requested a review from a team as a code owner June 23, 2026 20:33