perf(lexical/search): cache parsed DSL queries by mosuka · Pull Request #769 · mosuka/laurus

mosuka · 2026-06-02T12:11:34Z

Closes #590 (audit task LS-13, umbrella #534).

Problem

InvertedIndexSearcher::search re-parsed every LexicalSearchQuery::Dsl(string) on each call — the pest grammar ran and the analyzer re-tokenised the query terms. Autocomplete / popular-query workloads repeat the same strings and paid that cost per call.

Change

A snapshot-scoped LRU ParsedQueryCache (dsl string → Arc<dyn Query>), mirroring the QueryFilterCache pattern from #578:

parsed_query_cache.rs: Mutex<LruCache<String, Arc<dyn Query>>> (a Mutex because LruCache::get takes &mut self), with hit/miss stats; capacity 0 disables it.
The cache lives on InvertedIndexSearcher, which the store rebuilds on every commit() / optimize() / refresh(). The analyzer and default_fields are fixed for that searcher, so the DSL string alone keys the cache; a schema/analyzer change yields a fresh, empty cache (no manual invalidation). On a hit the parsed query is reused via clone_box (a refcount bump for boolean clause subtrees, see #413), which is far cheaper than re-parsing.
Only search() uses the cache. count() parses via into_query with a different parser config (no default fields), so it stays uncached to avoid cross-contamination.
New parsed_query_cache_capacity config (default 1024, 0 = disabled) threaded through InvertedIndexConfig and the LexicalIndexConfig builder.
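
To make the shape of the cache concrete, here is a minimal sketch of the pattern described above. It is not the code in parsed_query_cache.rs: it uses only the standard library (a HashMap plus a VecDeque recency queue standing in for LruCache), and the Query trait, TermQuery, get_or_parse, and stats names are assumptions made for illustration.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Stand-in for the crate's query trait (hypothetical, for illustration only).
pub trait Query: Send + Sync {
    fn describe(&self) -> String;
}

/// Toy parsed query; a real parser produces much richer query types.
pub struct TermQuery(pub String);

impl Query for TermQuery {
    fn describe(&self) -> String {
        format!("term:{}", self.0)
    }
}

/// Map plus recency queue (least recently used key at the front).
struct Inner {
    map: HashMap<String, Arc<dyn Query>>,
    order: VecDeque<String>,
}

pub struct ParsedQueryCache {
    capacity: usize,
    inner: Mutex<Inner>,
    hits: AtomicU64,
    misses: AtomicU64,
}

impl ParsedQueryCache {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            inner: Mutex::new(Inner { map: HashMap::new(), order: VecDeque::new() }),
            hits: AtomicU64::new(0),
            misses: AtomicU64::new(0),
        }
    }

    /// Returns the cached query for `dsl`, or parses and stores it on a miss.
    pub fn get_or_parse<F>(&self, dsl: &str, parse: F) -> Arc<dyn Query>
    where
        F: FnOnce(&str) -> Arc<dyn Query>,
    {
        if self.capacity == 0 {
            // Capacity 0 disables caching: always parse, record nothing.
            return parse(dsl);
        }
        // A lock is needed because even a lookup updates the recency order.
        let mut inner = self.inner.lock().unwrap();
        if let Some(query) = inner.map.get(dsl).cloned() {
            // Hit: move the key to the most-recently-used position.
            inner.order.retain(|k| k.as_str() != dsl);
            inner.order.push_back(dsl.to_string());
            self.hits.fetch_add(1, Ordering::Relaxed);
            return query;
        }
        self.misses.fetch_add(1, Ordering::Relaxed);
        // For brevity this sketch parses while holding the lock.
        let query = parse(dsl);
        if inner.map.len() >= self.capacity {
            // Evict the least recently used entry.
            if let Some(oldest) = inner.order.pop_front() {
                inner.map.remove(&oldest);
            }
        }
        inner.map.insert(dsl.to_string(), Arc::clone(&query));
        inner.order.push_back(dsl.to_string());
        query
    }

    /// Returns (hits, misses) since construction.
    pub fn stats(&self) -> (u64, u64) {
        (self.hits.load(Ordering::Relaxed), self.misses.load(Ordering::Relaxed))
    }
}
```

Because a cache like this would be owned by the searcher, dropping the searcher on commit/optimize/refresh drops every entry with it, which is why no explicit invalidation path appears in the sketch.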

Scope / API

No existing public API changes: parsed_query_cache_capacity is a new, additive config option (the cache is enabled by default at capacity 1024; set it to 0 to opt out); ParsedQueryCache and the parsed_query_cache_stats accessor are internal. Not exposed in any binding.
Membership/score semantics are unchanged — the same DSL parses to the same query; only repeated parsing is avoided.
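
As a rough illustration of how the new option could be threaded through, the sketch below uses a trimmed-down config with only the one field. The default of 1024 and the "0 = disabled" rule come from this PR; the builder type, its method name, and the cache_enabled helper are assumptions, not the crate's actual API.

```rust
/// Hypothetical, trimmed-down config; the real InvertedIndexConfig has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct InvertedIndexConfig {
    /// Maximum number of parsed DSL queries kept per searcher; 0 disables the cache.
    pub parsed_query_cache_capacity: usize,
}

impl Default for InvertedIndexConfig {
    fn default() -> Self {
        // Default stated in the PR description.
        Self { parsed_query_cache_capacity: 1024 }
    }
}

/// Hypothetical builder; the method name below is an assumption.
#[derive(Debug, Default)]
pub struct LexicalIndexConfigBuilder {
    inverted: InvertedIndexConfig,
}

impl LexicalIndexConfigBuilder {
    /// Overrides the parsed-query cache capacity (0 turns the cache off).
    pub fn parsed_query_cache_capacity(mut self, capacity: usize) -> Self {
        self.inverted.parsed_query_cache_capacity = capacity;
        self
    }

    pub fn build(self) -> InvertedIndexConfig {
        self.inverted
    }
}

/// Helper for the sketch: the cache is active only for a non-zero capacity.
pub fn cache_enabled(config: &InvertedIndexConfig) -> bool {
    config.parsed_query_cache_capacity > 0
}
```

Callers who never touch the option get the cache at its default size, while passing 0 restores parse-on-every-call behaviour.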

Verification

cargo build (full workspace + bindings) ✅
cargo clippy --all-targets -- -D warnings — zero warnings ✅
cargo fmt --check — clean ✅
cargo test -p laurus --lib — 1096 passed / 0 failed (+6: 4 unit + 2 integration); cargo test --workspace — exit 0, 51 binaries ✅
markdownlint-cli2 — 0 errors; docs (en + ja) updated ✅

New tests: a repeated DSL search is parsed once and returns identical results (cache hit asserted via stats); with the cache disabled (capacity 0) results are unchanged.
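
The two properties those tests check can be illustrated with a toy model. This is not the test code from the PR: ToySearcher, its whitespace "parser", and the parse_calls counter are hypothetical, and the sketch models only the "0 = disabled" rule, not LRU eviction.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Toy searcher (hypothetical): "parsing" lowercases and splits the DSL into terms,
/// and a search returns the ids of documents that contain every term.
pub struct ToySearcher {
    docs: Vec<(u32, String)>,
    cache_enabled: bool,
    cache: Mutex<HashMap<String, Arc<Vec<String>>>>,
    parse_calls: AtomicUsize,
}

impl ToySearcher {
    pub fn new(docs: Vec<(u32, String)>, cache_capacity: usize) -> Self {
        Self {
            docs,
            // Only the "0 = disabled" rule is modelled; eviction is omitted.
            cache_enabled: cache_capacity > 0,
            cache: Mutex::new(HashMap::new()),
            parse_calls: AtomicUsize::new(0),
        }
    }

    /// Stand-in for the expensive grammar + analyzer pass; counts invocations.
    fn parse(&self, dsl: &str) -> Arc<Vec<String>> {
        self.parse_calls.fetch_add(1, Ordering::Relaxed);
        Arc::new(dsl.split_whitespace().map(|t| t.to_lowercase()).collect())
    }

    pub fn search(&self, dsl: &str) -> Vec<u32> {
        let terms = if self.cache_enabled {
            let mut cache = self.cache.lock().unwrap();
            let cached = cache.get(dsl).cloned();
            match cached {
                // Hit: reuse the parsed form (an Arc refcount bump).
                Some(hit) => hit,
                // Miss: parse once and remember the result.
                None => {
                    let parsed = self.parse(dsl);
                    cache.insert(dsl.to_string(), Arc::clone(&parsed));
                    parsed
                }
            }
        } else {
            self.parse(dsl)
        };
        self.docs
            .iter()
            .filter(|(_, text)| {
                let text = text.to_lowercase();
                terms.iter().all(|t| text.contains(t.as_str()))
            })
            .map(|(id, _)| *id)
            .collect()
    }

    /// Number of times the parser actually ran.
    pub fn parse_calls(&self) -> usize {
        self.parse_calls.load(Ordering::Relaxed)
    }
}
```

With the cache on, two identical searches should trigger one parse and return the same ids; with capacity 0, both searches parse but the results are the same.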

🤖 Generated with Claude Code

`InvertedIndexSearcher::search` re-parsed every `LexicalSearchQuery::Dsl(string)` on each call — the pest grammar ran and the analyzer re-tokenised the query terms. Autocomplete / popular-query workloads repeat the same strings and paid that cost per call.

Add a snapshot-scoped LRU `ParsedQueryCache` (`dsl string -> Arc<dyn Query>`), mirroring the QueryFilterCache pattern from #578:

- `parsed_query_cache.rs`: `Mutex<LruCache<String, Arc<dyn Query>>>` (a `Mutex` because `LruCache::get` takes `&mut self`), with hit/miss stats; capacity 0 disables it.
- The cache lives on `InvertedIndexSearcher`, which the store rebuilds on every commit/optimize/refresh. The analyzer and `default_fields` are fixed for that searcher, so the DSL string alone keys the cache; a schema/analyzer change yields a fresh, empty cache (no manual invalidation). On a hit the parsed query is reused via `clone_box` (a refcount bump for boolean clause subtrees).
- Only `search()` uses the cache; `count()` parses via `into_query` with a different parser config (no default fields), so it stays uncached to avoid cross-contamination.
- New `parsed_query_cache_capacity` config (default 1024, 0 = disabled) threaded through `InvertedIndexConfig` and the `LexicalIndexConfig` builder.

Tests: cache hit on a repeated DSL search (parsed once, identical results), and correct results with the cache disabled. Docs (en/ja) updated.

Public API is unchanged (new opt-in config; the cache and its stats accessor are internal).

Closes #590

mosuka merged commit 1322db0 into main Jun 2, 2026
22 checks passed

mosuka deleted the perf/590-parsed-query-cache branch June 2, 2026 12:32

mosuka mentioned this pull request Jun 2, 2026

perf(lexical/search): LexicalQueryParser allocates Box<dyn Query> even for trivial queries #618

Closed
