perf(lexical/search): wire the decoded posting-list cache by mosuka · Pull Request #770 · mosuka/laurus

mosuka · 2026-06-02T13:19:49Z

Closes #612 (audit task LS-35, umbrella #534).

Problem

CacheManager::posting_cache was declared #[allow(dead_code)] and the enable_posting_cache config flag was unused. Every SegmentReader::postings call therefore re-opened the segment's .post file, re-decoded the varint posting list, re-applied deletions, and rebuilt the skip table. On cloud/remote storage the read alone dominates the cost.
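To make the repeated work concrete, here is a minimal sketch of what a varint posting-list decode typically looks like. It assumes a LEB128-style encoding with a leading entry count and delta-encoded doc IDs; the actual `.post` layout in laurus is not described in this PR, and `read_varint` and `decode_postings` are hypothetical names.

```rust
/// Decode one LEB128-style varint from `buf`, advancing `*pos`.
/// Returns `None` on truncated input or when the value exceeds 64 bits.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        // The low 7 bits carry payload; the high bit marks continuation.
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Decode an assumed layout: a varint count followed by `count`
/// varint deltas, each added to the previous doc ID.
fn decode_postings(buf: &[u8]) -> Option<Vec<u64>> {
    let mut pos = 0;
    let count = read_varint(buf, &mut pos)? as usize;
    // Cap the reservation so a corrupt count cannot over-allocate.
    let mut doc_ids = Vec::with_capacity(count.min(buf.len()));
    let mut prev = 0u64;
    for _ in 0..count {
        let delta = read_varint(buf, &mut pos)?;
        prev = prev.checked_add(delta)?;
        doc_ids.push(prev);
    }
    Some(doc_ids)
}
```

Under this assumed layout, the bytes `[3, 5, 0xAC, 0x02, 1]` decode to doc IDs 5, 305, and 306. Every byte is touched on every call, which is the cost the cache avoids on a hit.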

Change

New posting_cache.rs — PostingCache: a per-SegmentReader byte-budget LRU over Arc<DecodedPostingList>. It wraps lru::LruCache::unbounded() with a cur_bytes/max_bytes accountant — put evicts least-recently-used entries until under budget, a single list larger than the whole budget is not cached (avoids thrash), and a budget of 0 disables it. Byte-budget (not entry-count) because posting lists range from a few bytes to many megabytes.
SegmentReader caches the decoded, deletion-filtered list per (field, term). A segment is immutable for a reader snapshot (deletions are fixed: has_deletions is set at open, the bitmap is load-once), so the filtered list is always consistent; a commit builds fresh segment readers with empty caches. A hit clones the list to build the existing owned-data iterator (from_decoded_soa) — a Vec memcpy, cheaper than the varint decode it replaces, so never a regression vs today — and skips the storage read entirely.
Disabled by default in SegmentReader::open (budget 0), so merge-engine and test readers — which read each term once — are byte-for-byte unchanged (no key allocation, no clone). InvertedIndexReader::new enables it per query reader, gated by enable_posting_cache (default true) and budgeted by max_cache_memory — no new config fields.
Removed the dead CacheManager::posting_cache field.
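The byte-budget policy described above can be sketched as follows. This is not the PR's code: the real `PostingCache` wraps `lru::LruCache::unbounded()`, whereas this sketch swaps in a std-only `HashMap` plus `VecDeque` recency queue so it stays self-contained. The `DecodedPostingList` fields, `size_bytes`, and `bytes_used` are assumptions made for illustration.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Hypothetical stand-in for the PR's `DecodedPostingList`.
pub struct DecodedPostingList {
    pub doc_ids: Vec<u64>,
}

impl DecodedPostingList {
    /// Approximate payload size used for budget accounting (assumption).
    fn size_bytes(&self) -> usize {
        self.doc_ids.len() * std::mem::size_of::<u64>()
    }
}

type Key = (String, String); // (field, term)

pub struct PostingCache {
    map: HashMap<Key, Arc<DecodedPostingList>>,
    order: VecDeque<Key>, // front = least recently used
    cur_bytes: usize,
    max_bytes: usize,
}

impl PostingCache {
    pub fn new(max_bytes: usize) -> Self {
        Self { map: HashMap::new(), order: VecDeque::new(), cur_bytes: 0, max_bytes }
    }

    pub fn bytes_used(&self) -> usize {
        self.cur_bytes
    }

    /// Look up a list and mark it most recently used on a hit.
    pub fn get(&mut self, key: &Key) -> Option<Arc<DecodedPostingList>> {
        let hit = self.map.get(key).cloned()?;
        if let Some(i) = self.order.iter().position(|k| k == key) {
            if let Some(k) = self.order.remove(i) {
                self.order.push_back(k);
            }
        }
        Some(hit)
    }

    pub fn put(&mut self, key: Key, list: Arc<DecodedPostingList>) {
        let size = list.size_bytes();
        // A budget of 0 disables the cache; a list larger than the
        // whole budget is skipped rather than flushing everything else.
        if self.max_bytes == 0 || size > self.max_bytes {
            return;
        }
        // Replacing an existing entry releases its bytes first.
        if let Some(old) = self.map.remove(&key) {
            self.cur_bytes -= old.size_bytes();
            self.order.retain(|k| k != &key);
        }
        // Evict least-recently-used entries until the new list fits.
        while self.cur_bytes + size > self.max_bytes {
            let Some(victim) = self.order.pop_front() else { break };
            if let Some(old) = self.map.remove(&victim) {
                self.cur_bytes -= old.size_bytes();
            }
        }
        self.cur_bytes += size;
        self.order.push_back(key.clone());
        self.map.insert(key, list);
    }
}
```

The linear scans over the recency queue keep the sketch short; a production cache would use a structure with constant-time reordering, which is what the `lru` crate provides.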

A zero-copy Arc-sharing iterator (no clone on hit) is a larger hot-path refactor, noted as a follow-up.

Scope / API

CacheManager::posting_cache and enable_posting_cache are internal (not exposed in any binding); no public API change. The previously-dead enable_posting_cache flag is now honoured.
Same documents returned — only the decode/read is memoised.
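A rough illustration of the memoisation, assuming a much simplified reader: `Reader`, `decode_from_storage`, and the `decodes` counter are hypothetical and stand in for the storage read, varint decode, and deletion filtering. An unbounded `HashMap` replaces the byte-budget LRU to keep the sketch short.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical reader; `decodes` counts how often the slow path runs.
struct Reader {
    /// `None` models a reader opened with the cache disabled.
    cache: Option<HashMap<(String, String), Arc<Vec<u64>>>>,
    decodes: usize,
}

impl Reader {
    fn new(enable_posting_cache: bool) -> Self {
        Self {
            cache: enable_posting_cache.then(HashMap::new),
            decodes: 0,
        }
    }

    /// Stand-in for the expensive path. It fabricates doc IDs from the
    /// term length so the example is deterministic.
    fn decode_from_storage(&mut self, _field: &str, term: &str) -> Vec<u64> {
        self.decodes += 1;
        (0..term.len() as u64).collect()
    }

    fn postings(&mut self, field: &str, term: &str) -> Vec<u64> {
        // Disabled: no key allocation and no clone, just the old path.
        if self.cache.is_none() {
            return self.decode_from_storage(field, term);
        }
        let key = (field.to_string(), term.to_string());
        if let Some(hit) = self.cache.as_ref().and_then(|c| c.get(&key)) {
            // Hit: copy the decoded list into an owned Vec for the caller.
            return hit.to_vec();
        }
        // Miss: decode once, share the result with the cache.
        let list = Arc::new(self.decode_from_storage(field, term));
        if let Some(cache) = self.cache.as_mut() {
            cache.insert(key, Arc::clone(&list));
        }
        list.to_vec()
    }
}
```

With the cache enabled, two identical `postings` calls return equal lists and run the slow path once; with it disabled, the slow path runs on every call.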

Verification

cargo build (full workspace + bindings) ✅
cargo clippy --all-targets -- -D warnings — zero warnings ✅
cargo fmt --check — clean ✅
cargo test -p laurus --lib — 1102 passed / 0 failed (+6: 5 unit + 1 integration); cargo test --workspace — exit 0, 51 binaries ✅
markdownlint-cli2 — 0 errors; docs (en + ja) updated ✅

Tests: PostingCache unit tests (hit/miss, byte-budget LRU eviction, oversized-entry skip, disabled no-op), and a reader integration test (a repeated postings(field, term) is a cache hit via stats; a delete + commit excludes the doc in the new snapshot — the cache never serves a stale, pre-deletion list).

🤖 Generated with Claude Code

`CacheManager::posting_cache` was declared `#[allow(dead_code)]` and the `enable_posting_cache` config flag was unused, so every `SegmentReader::postings` re-opened the segment's `.post` file, re-decoded the varint posting list, re-applied deletions, and rebuilt the skip table on every query; the read alone dominates on cloud/remote storage.

Wire a real posting cache:

- New `posting_cache.rs`: `PostingCache`, a per-`SegmentReader` byte-budget LRU over `Arc<DecodedPostingList>`. It wraps `lru::LruCache::unbounded()` with a `cur_bytes`/`max_bytes` accountant: `put` evicts the least-recently-used entries until under budget, a single list larger than the whole budget is not cached, and a budget of 0 disables it.
- `SegmentReader` caches the decoded, deletion-filtered list per `(field, term)`. A segment is immutable for a reader snapshot (deletions are fixed), so the filtered list is always consistent; a commit builds fresh readers with empty caches. A hit clones the list to build the existing owned-data iterator (a Vec memcpy, cheaper than the decode it replaces) and skips the storage read.
- The cache is disabled by default in `SegmentReader::open` (budget 0), so merge-engine and test readers, which read each term once, are byte-for-byte unchanged. `InvertedIndexReader::new` enables it per query reader, gated by `enable_posting_cache` and budgeted by `max_cache_memory` (no new config).
- Removed the dead `CacheManager::posting_cache` field.

Tests: `PostingCache` unit tests (hit/miss, byte-budget eviction, oversized skip, disabled) and a reader test (a repeated `postings` is a cache hit; a delete + commit excludes the doc in the new snapshot, so no stale list is served).

Docs (en/ja) updated. No public API change (internal; `enable_posting_cache` is now honoured).

Closes #612

mosuka merged commit acf5737 into main Jun 2, 2026
22 checks passed

mosuka deleted the perf/612-posting-cache branch June 2, 2026 14:22
