perf(lexical/search): cache parsed DSL queries #769
Merged
`InvertedIndexSearcher::search` re-parsed every `LexicalSearchQuery::Dsl(string)` on each call: the pest grammar ran and the analyzer re-tokenised the query terms. Autocomplete / popular-query workloads repeat the same strings and paid that cost per call.

Add a snapshot-scoped LRU `ParsedQueryCache` (`dsl string -> Arc<dyn Query>`), mirroring the `QueryFilterCache` pattern from #578:

- `parsed_query_cache.rs`: `Mutex<LruCache<String, Arc<dyn Query>>>` (a `Mutex` because `LruCache::get` takes `&mut self`), with hit/miss stats; capacity 0 disables it.
- The cache lives on `InvertedIndexSearcher`, which the store rebuilds on every commit/optimize/refresh. The analyzer and `default_fields` are fixed for that searcher, so the DSL string alone keys the cache; a schema/analyzer change yields a fresh, empty cache (no manual invalidation). On a hit the parsed query is reused via `clone_box` (a refcount bump for boolean clause subtrees).
- Only `search()` uses the cache; `count()` parses via `into_query` with a different parser config (no default fields), so it stays uncached to avoid cross-contamination.
- New `parsed_query_cache_capacity` config (default 1024, 0 = disabled) threaded through `InvertedIndexConfig` and the `LexicalIndexConfig` builder.

Tests: cache hit on a repeated DSL search (parsed once, identical results), and correct results with the cache disabled.

Docs (en/ja) updated. Public API is unchanged (new optional config, enabled by default; the cache and its stats accessor are internal).

Closes #590
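As a rough illustration of the cache shape described in the commit message, here is a minimal std-only sketch. It is not the code from `parsed_query_cache.rs`: the PR wraps an `LruCache` in a `Mutex`, whereas this sketch substitutes a small `VecDeque`-based LRU so it compiles without external crates. The `Query` trait, `TermsQuery`, `parse_terms`, `get_or_parse`, and `stats` are all hypothetical names introduced for the example.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the crate's `Query` trait.
pub trait Query: Send + Sync {
    fn describe(&self) -> String;
}

/// Toy query: the lowercased, whitespace-separated terms of a DSL string.
pub struct TermsQuery(pub Vec<String>);

impl Query for TermsQuery {
    fn describe(&self) -> String {
        self.0.join(" AND ")
    }
}

/// Toy "parser" standing in for the pest grammar plus analyzer.
pub fn parse_terms(dsl: &str) -> Arc<dyn Query> {
    let terms = dsl.split_whitespace().map(|t| t.to_lowercase()).collect();
    Arc::new(TermsQuery(terms))
}

#[derive(Default)]
struct Inner {
    // Least recently used entry at the front, most recent at the back.
    entries: VecDeque<(String, Arc<dyn Query>)>,
    hits: u64,
    misses: u64,
}

/// Sketch of a DSL-string -> parsed-query cache with hit/miss stats.
pub struct ParsedQueryCache {
    capacity: usize,
    inner: Mutex<Inner>,
}

impl ParsedQueryCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, inner: Mutex::new(Inner::default()) }
    }

    /// Returns the cached query for `dsl`, or runs `parse` and caches the result.
    /// For simplicity the lock is held while parsing; a real cache may avoid that.
    pub fn get_or_parse<F>(&self, dsl: &str, parse: F) -> Arc<dyn Query>
    where
        F: FnOnce(&str) -> Arc<dyn Query>,
    {
        // Capacity 0 disables the cache: always parse, record no stats.
        if self.capacity == 0 {
            return parse(dsl);
        }
        let mut inner = self.inner.lock().unwrap();
        if let Some(pos) = inner.entries.iter().position(|(k, _)| k == dsl) {
            // Hit: move the entry to the back and hand out a refcount bump.
            let entry = inner.entries.remove(pos).unwrap();
            let query = Arc::clone(&entry.1);
            inner.entries.push_back(entry);
            inner.hits += 1;
            return query;
        }
        // Miss: parse, evict the least recently used entry if full, then insert.
        inner.misses += 1;
        let query = parse(dsl);
        if inner.entries.len() == self.capacity {
            inner.entries.pop_front();
        }
        inner.entries.push_back((dsl.to_string(), Arc::clone(&query)));
        query
    }

    /// Returns `(hits, misses)`.
    pub fn stats(&self) -> (u64, u64) {
        let inner = self.inner.lock().unwrap();
        (inner.hits, inner.misses)
    }
}
```

The linear scan keeps the sketch short; a hash-backed LRU such as the one the PR uses avoids that cost at the default capacity of 1024.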
Closes #590 (audit task LS-13, umbrella #534).
Problem
`InvertedIndexSearcher::search` re-parsed every `LexicalSearchQuery::Dsl(string)` on each call: the pest grammar ran and the analyzer re-tokenised the query terms. Autocomplete / popular-query workloads repeat the same strings and paid that cost per call.
Change
A snapshot-scoped LRU `ParsedQueryCache` (`dsl string → Arc<dyn Query>`), mirroring the `QueryFilterCache` pattern from #578:
- `parsed_query_cache.rs`: `Mutex<LruCache<String, Arc<dyn Query>>>` (a `Mutex` because `LruCache::get` takes `&mut self`), with hit/miss stats; capacity `0` disables it.
- The cache lives on `InvertedIndexSearcher`, which the store rebuilds on every `commit()` / `optimize()` / `refresh()`. The analyzer and `default_fields` are fixed for that searcher, so the DSL string alone keys the cache; a schema/analyzer change yields a fresh, empty cache (no manual invalidation). On a hit the parsed query is reused via `clone_box` (a refcount bump for boolean clause subtrees; see #413, "perf(lexical/query): use Arc<dyn Query> instead of Box<dyn Query> to make clones cheap"), which is far cheaper than re-parsing.
- Only `search()` uses the cache. `count()` parses via `into_query` with a different parser config (no default fields), so it stays uncached to avoid cross-contamination.
- New `parsed_query_cache_capacity` config (default 1024, `0` = disabled) threaded through `InvertedIndexConfig` and the `LexicalIndexConfig` builder.
Scope / API
`parsed_query_cache_capacity` is a new optional config (the cache is enabled by default; `0` disables it); `ParsedQueryCache` and the `parsed_query_cache_stats` accessor are internal. Not exposed in any binding.
Verification
- `cargo build` (full workspace + bindings) ✅
- `cargo clippy --all-targets -- -D warnings`: zero warnings ✅
- `cargo fmt --check`: clean ✅
- `cargo test -p laurus --lib`: 1096 passed / 0 failed (+6: 4 unit + 2 integration); `cargo test --workspace`: exit 0, 51 binaries ✅
- `markdownlint-cli2`: 0 errors; docs (en + ja) updated ✅

New tests: a repeated DSL search is parsed once and returns identical results (cache hit asserted via stats); with the cache disabled (capacity 0) results are unchanged.
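The behaviours the new tests check, together with the snapshot scoping (a rebuilt searcher starts with an empty cache), can be sketched with a toy model. This is only an assumption-laden illustration, not the PR's test code: `Store`, `Searcher`, `parse_calls`, and the substring matching are hypothetical, and a plain `HashMap` without eviction stands in for the LRU.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical searcher over one snapshot; parsed queries are memoised per instance.
pub struct Searcher {
    docs: Vec<String>,
    capacity: usize,
    cache: Mutex<HashMap<String, Arc<Vec<String>>>>,
    parse_calls: AtomicUsize,
}

impl Searcher {
    pub fn new(docs: Vec<String>, capacity: usize) -> Self {
        Self {
            docs,
            capacity,
            cache: Mutex::new(HashMap::new()),
            parse_calls: AtomicUsize::new(0),
        }
    }

    /// Toy "parse": lowercase terms. Counts calls so tests can assert on them.
    fn parse(&self, dsl: &str) -> Arc<Vec<String>> {
        self.parse_calls.fetch_add(1, Ordering::Relaxed);
        Arc::new(dsl.split_whitespace().map(str::to_lowercase).collect())
    }

    fn parsed(&self, dsl: &str) -> Arc<Vec<String>> {
        // Capacity 0 disables caching entirely.
        if self.capacity == 0 {
            return self.parse(dsl);
        }
        let mut cache = self.cache.lock().unwrap();
        if let Some(q) = cache.get(dsl) {
            return Arc::clone(q);
        }
        let q = self.parse(dsl);
        // Simplification: stop inserting when full instead of evicting (not an LRU).
        if cache.len() < self.capacity {
            cache.insert(dsl.to_string(), Arc::clone(&q));
        }
        q
    }

    /// Returns the indices of documents that contain every query term.
    pub fn search(&self, dsl: &str) -> Vec<usize> {
        let terms = self.parsed(dsl);
        self.docs
            .iter()
            .enumerate()
            .filter(|(_, d)| {
                let d = d.to_lowercase();
                terms.iter().all(|t| d.contains(t.as_str()))
            })
            .map(|(i, _)| i)
            .collect()
    }

    pub fn parse_calls(&self) -> usize {
        self.parse_calls.load(Ordering::Relaxed)
    }
}

/// Hypothetical store that rebuilds its searcher (and thus the cache) on commit.
pub struct Store {
    docs: Vec<String>,
    capacity: usize,
    pub searcher: Searcher,
}

impl Store {
    pub fn new(capacity: usize) -> Self {
        Self { docs: Vec::new(), capacity, searcher: Searcher::new(Vec::new(), capacity) }
    }

    pub fn commit(&mut self, doc: &str) {
        self.docs.push(doc.to_string());
        // A fresh searcher per snapshot means a fresh, empty cache.
        self.searcher = Searcher::new(self.docs.clone(), self.capacity);
    }
}
```

Because the cache is owned by the searcher, nothing has to invalidate it explicitly; dropping the old searcher on commit discards every cached parse along with it.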
🤖 Generated with Claude Code