
perf(lexical/search): cache parsed DSL queries #769

Merged: mosuka merged 1 commit into main from perf/590-parsed-query-cache on Jun 2, 2026

mosuka (Owner) commented on Jun 2, 2026

Closes #590 (audit task LS-13, umbrella #534).

Problem

InvertedIndexSearcher::search re-parsed every LexicalSearchQuery::Dsl(string) on each call, so the pest grammar ran and the analyzer re-tokenised the query terms every time. Autocomplete and popular-query workloads, which repeat the same strings, paid that cost on every call.

Change

A snapshot-scoped LRU ParsedQueryCache (dsl string → Arc<dyn Query>), mirroring the QueryFilterCache pattern from #578:

  • parsed_query_cache.rs: Mutex<LruCache<String, Arc<dyn Query>>> (a Mutex because LruCache::get takes &mut self), with hit/miss stats; capacity 0 disables it.
  • The cache lives on InvertedIndexSearcher, which the store rebuilds on every commit() / optimize() / refresh(). The analyzer and default_fields are fixed for that searcher, so the DSL string alone keys the cache; a schema/analyzer change yields a fresh, empty cache (no manual invalidation). On a hit the parsed query is reused via clone_box, which is only a refcount bump for boolean clause subtrees (see #413) and far cheaper than re-parsing.
  • Only search() uses the cache. count() parses via into_query with a different parser config (no default fields), so it stays uncached to avoid cross-contamination.
  • New parsed_query_cache_capacity config (default 1024, 0 = disabled) threaded through InvertedIndexConfig and the LexicalIndexConfig builder.
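
To make the shape of the cache concrete, here is a minimal, std-only sketch of the idea described above. It is not the code from this PR: `TinyLru` is a hypothetical stand-in for `LruCache`, the `Query` trait is reduced to a single hypothetical `describe` method, and `get_or_parse`, `stats`, and `TermQuery` are illustrative names. Leaving the stats at zero when the cache is disabled is also an assumption.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical, minimal stand-in for the crate's `Query` trait.
pub trait Query: Send + Sync {
    fn describe(&self) -> String;
}

/// Hypothetical query type used only to exercise the sketch.
pub struct TermQuery(pub String);

impl Query for TermQuery {
    fn describe(&self) -> String {
        format!("term:{}", self.0)
    }
}

/// Tiny LRU for illustration only; the PR uses `LruCache`.
struct TinyLru {
    capacity: usize, // always > 0 here; capacity 0 never builds a TinyLru
    map: HashMap<String, Arc<dyn Query>>,
    order: VecDeque<String>, // front = least recently used
}

impl TinyLru {
    // `&mut self` because a lookup reorders entries, hence the Mutex below.
    fn get(&mut self, key: &str) -> Option<Arc<dyn Query>> {
        let hit = self.map.get(key).cloned()?;
        self.order.retain(|k| k != key);
        self.order.push_back(key.to_owned());
        Some(hit)
    }

    fn put(&mut self, key: String, value: Arc<dyn Query>) {
        if self.map.insert(key.clone(), value).is_some() {
            // Key already present: drop its old position in the order.
            self.order.retain(|k| k != &key);
        } else if self.map.len() > self.capacity {
            // New key pushed us over capacity: evict the least recently used.
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.order.push_back(key);
    }
}

pub struct ParsedQueryCache {
    inner: Option<Mutex<TinyLru>>, // None when capacity == 0 (disabled)
    hits: AtomicU64,
    misses: AtomicU64,
}

impl ParsedQueryCache {
    pub fn new(capacity: usize) -> Self {
        let inner = (capacity > 0).then(|| {
            Mutex::new(TinyLru { capacity, map: HashMap::new(), order: VecDeque::new() })
        });
        Self { inner, hits: AtomicU64::new(0), misses: AtomicU64::new(0) }
    }

    /// Returns the cached query for `dsl`, or runs `parse` and stores the result.
    pub fn get_or_parse<E, F>(&self, dsl: &str, parse: F) -> Result<Arc<dyn Query>, E>
    where
        F: FnOnce(&str) -> Result<Arc<dyn Query>, E>,
    {
        let Some(inner) = &self.inner else {
            return parse(dsl); // disabled: always parse, record nothing
        };
        // The guard is a temporary, so the lock is released after this line.
        let cached = inner.lock().unwrap().get(dsl);
        if let Some(hit) = cached {
            self.hits.fetch_add(1, Ordering::Relaxed);
            return Ok(hit);
        }
        self.misses.fetch_add(1, Ordering::Relaxed);
        // Parse outside the lock; two racing threads may both parse once.
        let parsed = parse(dsl)?;
        inner.lock().unwrap().put(dsl.to_owned(), Arc::clone(&parsed));
        Ok(parsed)
    }

    /// (hits, misses)
    pub fn stats(&self) -> (u64, u64) {
        (self.hits.load(Ordering::Relaxed), self.misses.load(Ordering::Relaxed))
    }
}
```

In this sketch the lock is never held while parsing, so a slow parse does not block other lookups; the trade-off is that two threads missing on the same string at the same time may each parse it once.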

Scope / API

  • The existing public API is unchanged. parsed_query_cache_capacity is a new, additive config option that is enabled by default (capacity 1024; set it to 0 to opt out). ParsedQueryCache and the parsed_query_cache_stats accessor are internal and are not exposed in any binding.
  • Membership/score semantics are unchanged — the same DSL parses to the same query; only repeated parsing is avoided.
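
As a rough illustration of the default-on, zero-to-disable setting, the sketch below models only the one field named in this PR. The struct layout, the builder type `LexicalIndexConfigBuilder`, its `build` method, and the `cache_enabled` helper are hypothetical and are not taken from the crate; only the field name `parsed_query_cache_capacity` and the default of 1024 come from the description above.

```rust
/// Sketch of the relevant part of the config; the real struct has more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InvertedIndexConfig {
    /// Maximum number of parsed DSL queries kept per searcher; 0 disables the cache.
    pub parsed_query_cache_capacity: usize,
}

impl Default for InvertedIndexConfig {
    fn default() -> Self {
        // Default stated in the PR description.
        Self { parsed_query_cache_capacity: 1024 }
    }
}

impl InvertedIndexConfig {
    /// Hypothetical helper: the cache is active only for a non-zero capacity.
    pub fn cache_enabled(&self) -> bool {
        self.parsed_query_cache_capacity > 0
    }
}

/// Hypothetical builder, standing in for the `LexicalIndexConfig` builder.
#[derive(Debug, Default)]
pub struct LexicalIndexConfigBuilder {
    inverted: InvertedIndexConfig,
}

impl LexicalIndexConfigBuilder {
    /// Threads the capacity through to the inverted-index config.
    pub fn parsed_query_cache_capacity(mut self, capacity: usize) -> Self {
        self.inverted.parsed_query_cache_capacity = capacity;
        self
    }

    pub fn build(self) -> InvertedIndexConfig {
        self.inverted
    }
}
```

Under these assumptions, callers who never touch the setting get a 1024-entry cache, and `LexicalIndexConfigBuilder::default().parsed_query_cache_capacity(0).build()` turns it off.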

Verification

  • cargo build (full workspace + bindings) ✅
  • cargo clippy --all-targets -- -D warnings — zero warnings ✅
  • cargo fmt --check — clean ✅
  • cargo test -p laurus --lib — 1096 passed / 0 failed (+6: 4 unit + 2 integration); cargo test --workspace — exit 0, 51 binaries ✅
  • markdownlint-cli2 — 0 errors; docs (en + ja) updated ✅

New tests: a repeated DSL search is parsed once and returns identical results (cache hit asserted via stats); with the cache disabled (capacity 0) results are unchanged.
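
The snapshot scoping described under Change, and the kind of check the new tests make, could be sketched roughly as follows. Everything here is a simplified assumption: `Store`, `Searcher`, the `generation` counter, and the whitespace "parser" are hypothetical, and a plain `HashMap` stands in for the LRU.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical parsed form; the real cache stores `Arc<dyn Query>`.
type Parsed = Arc<Vec<String>>;

/// One searcher per index snapshot; its cache lives and dies with it.
struct Searcher {
    generation: u64,
    cache: Mutex<HashMap<String, Parsed>>, // unbounded stand-in for the LRU
    stats: Mutex<(u64, u64)>,              // (hits, misses)
}

impl Searcher {
    fn new(generation: u64) -> Self {
        Self {
            generation,
            cache: Mutex::new(HashMap::new()),
            stats: Mutex::new((0, 0)),
        }
    }

    /// Returns the cached parse for `dsl`, or parses and caches it.
    fn parse(&self, dsl: &str) -> Parsed {
        let mut cache = self.cache.lock().unwrap();
        let mut stats = self.stats.lock().unwrap();
        if let Some(hit) = cache.get(dsl) {
            stats.0 += 1;
            return Arc::clone(hit); // refcount bump only, no re-parse
        }
        stats.1 += 1;
        // Toy "parser": lowercase whitespace-separated terms.
        let parsed: Parsed =
            Arc::new(dsl.split_whitespace().map(str::to_lowercase).collect());
        cache.insert(dsl.to_owned(), Arc::clone(&parsed));
        parsed
    }
}

struct Store {
    searcher: Arc<Searcher>,
}

impl Store {
    fn new() -> Self {
        Self { searcher: Arc::new(Searcher::new(0)) }
    }

    /// Publishing a new snapshot rebuilds the searcher, so the old cache is
    /// dropped with it and no explicit invalidation is needed.
    fn commit(&mut self) {
        let next = self.searcher.generation + 1;
        self.searcher = Arc::new(Searcher::new(next));
    }
}
```

In this sketch, two identical calls on the same searcher yield one miss and one hit and return the same `Arc`; after `commit()` the new searcher starts with empty stats and parses the string again.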

🤖 Generated with Claude Code

`InvertedIndexSearcher::search` re-parsed every `LexicalSearchQuery::Dsl(string)`
on each call — the pest grammar ran and the analyzer re-tokenised the query
terms. Autocomplete / popular-query workloads repeat the same strings and paid
that cost per call.

Add a snapshot-scoped LRU `ParsedQueryCache` (`dsl string -> Arc<dyn Query>`),
mirroring the QueryFilterCache pattern from #578:

- `parsed_query_cache.rs`: `Mutex<LruCache<String, Arc<dyn Query>>>` (a `Mutex`
  because `LruCache::get` takes `&mut self`), with hit/miss stats; capacity 0
  disables it.
- The cache lives on `InvertedIndexSearcher`, which the store rebuilds on every
  commit/optimize/refresh. The analyzer and `default_fields` are fixed for that
  searcher, so the DSL string alone keys the cache; a schema/analyzer change
  yields a fresh, empty cache (no manual invalidation). On a hit the parsed
  query is reused via `clone_box` (a refcount bump for boolean clause subtrees).
- Only `search()` uses the cache; `count()` parses via `into_query` with a
  different parser config (no default fields), so it stays uncached to avoid
  cross-contamination.
- New `parsed_query_cache_capacity` config (default 1024, 0 = disabled) threaded
  through `InvertedIndexConfig` and the `LexicalIndexConfig` builder.

Tests: cache hit on a repeated DSL search (parsed once, identical results), and
correct results with the cache disabled. Docs (en/ja) updated. The existing
public API is unchanged (new additive config, on by default; the cache and its
stats accessor are internal).

Closes #590

mosuka merged commit 1322db0 into main on Jun 2, 2026 (22 checks passed)
mosuka deleted the perf/590-parsed-query-cache branch on June 2, 2026 12:32