9 changes: 9 additions & 0 deletions docs/ja/src/laurus/faceting.md
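@@ -77,3 +77,12 @@ Category
 - **E-commerce**: Filter by category, brand, price range, rating
 - **Document search**: Filter by author, department, date range, document type
 - **Content management**: Filter by tags, topics, content status
+
+## Performance
+
+Facet counts are read from each field's **DocValues** column rather than from the stored document.
+For each collected hit, the collector reads only the facet field's value through a per-field DocValues
+lookup, so when every faceted field has a DocValues column (always true by default, because every
+stored field is written to DocValues at index time), it does not decode or clone the whole stored-fields blob.
+A field without DocValues transparently falls back to the stored document, so results are identical on
+either path; only the read path changes.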
10 changes: 10 additions & 0 deletions docs/src/laurus/faceting.md
@@ -77,3 +77,13 @@ Each node in this tree corresponds to a `FacetCount` with its `children` populat
 - **E-commerce**: Filter by category, brand, price range, rating
 - **Document search**: Filter by author, department, date range, document type
 - **Content management**: Filter by tags, topics, content status
+
+## Performance
+
+Facet counts are read from each field's **DocValues** column, not from the
+stored document. For every collected hit, the collector reads only the facet
+field's value via the per-field DocValues lookup, so it never decodes or clones
+the whole stored-fields blob when every faceted field has a DocValues column
+(which is the default, because every stored field is written to DocValues at
+index time). A field that lacks DocValues transparently falls back to the stored
+document, so results are identical either way; only the read path changes.
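
The read-path choice described in the Performance section can be sketched as follows. This is a minimal illustration under stated assumptions, not the laurus implementation: `SketchReader`, `Doc`, `facet_value`, and `count_facets` are hypothetical names, field values are simplified to `String`, and only the method names `has_doc_values`, `get_doc_value`, and `document` echo the reader methods referenced in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a stored document: field name -> value.
type Doc = HashMap<String, String>;

/// Hypothetical reader used only for this sketch. The method names echo the
/// ones in this PR, but the signatures are simplified.
struct SketchReader {
    documents: Vec<Doc>,
    /// Names of the fields that have a DocValues column.
    doc_value_fields: Vec<String>,
}

impl SketchReader {
    fn has_doc_values(&self, field: &str) -> bool {
        self.doc_value_fields.iter().any(|f| f == field)
    }

    /// Fast path: clone a single field's value.
    fn get_doc_value(&self, field: &str, doc_id: u64) -> Option<String> {
        self.documents.get(doc_id as usize)?.get(field).cloned()
    }

    /// Slow path: clone the whole document, every field included.
    fn document(&self, doc_id: u64) -> Option<Doc> {
        self.documents.get(doc_id as usize).cloned()
    }
}

/// Read one facet value, preferring DocValues and falling back to the
/// stored document when the field has no DocValues column.
fn facet_value(reader: &SketchReader, field: &str, doc_id: u64) -> Option<String> {
    if reader.has_doc_values(field) {
        reader.get_doc_value(field, doc_id)
    } else {
        reader.document(doc_id)?.get(field).cloned()
    }
}

/// Count facet values for one field over the collected hits.
fn count_facets(reader: &SketchReader, field: &str, hits: &[u64]) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for &doc_id in hits {
        if let Some(value) = facet_value(reader, field, doc_id) {
            *counts.entry(value).or_insert(0) += 1;
        }
    }
    counts
}
```

Given the same hits, a reader that reports DocValues for the faceted field and one that does not should return equal counts from `count_facets`. In this sketch the only difference is whether the whole document is cloned per hit.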
39 changes: 37 additions & 2 deletions laurus/benches/facet_bench.rs
@@ -68,6 +68,7 @@ use std::sync::Arc;

 use criterion::{BatchSize, BenchmarkId, Criterion, Throughput, criterion_group, criterion_main};

+use laurus::lexical::core::field::FieldValue;
 use laurus::lexical::index::structures::bkd_tree::BKDTree;
 use laurus::lexical::reader::{FieldStats, LexicalIndexReader, PostingIterator, ReaderTermInfo};
 use laurus::lexical::search::features::facet::{FacetCollector, FacetConfig};
@@ -146,15 +147,43 @@ impl LexicalIndexReader for MockFacetReader {
     fn as_any(&self) -> &dyn Any {
         self
     }
+
+    // DocValues are available for every stored field (mirrors the real
+    // writer, which stores every field into DocValues). This lets
+    // `collect_doc` take the #597 fast path and read facet values directly
+    // instead of decoding + cloning the whole document.
+    fn has_doc_values(&self, _field: &str) -> bool {
+        true
+    }
+
+    fn get_doc_value(&self, field: &str, doc_id: u64) -> LaurusResult<Option<FieldValue>> {
+        Ok(self
+            .documents
+            .get(doc_id as usize)
+            .and_then(|d| d.get(field).cloned()))
+    }
 }

+/// A non-facet stored-field payload (title + body) so each document is
+/// "fat". The pre-#597 path decodes and clones *every* field via
+/// `reader.document()`, while the #597 path reads only the facet field's
+/// DocValue; this payload models the asymmetry between a whole-document
+/// decode and a single-field DocValues read.
+fn payload(i: usize) -> String {
+    format!("doc {i}: {}", "lorem ipsum dolor sit amet ".repeat(10))
+}
+
 /// Build `n` documents with a single flat facet field (`field_a`),
 /// drawing values from a 50-value pool deterministically.
 fn build_flat_documents(n: usize) -> Vec<Document> {
     (0..n)
         .map(|i| {
             let value = format!("value_{}", i % FACET_VALUES_PER_FIELD);
-            Document::builder().add_text("field_a", value).build()
+            Document::builder()
+                .add_text("field_a", value)
+                .add_text("title", format!("Title {i}"))
+                .add_text("body", payload(i))
+                .build()
         })
         .collect()
 }
Expand All @@ -172,6 +201,8 @@ fn build_multi_field_documents(n: usize) -> Vec<Document> {
.add_text("field_a", v_a)
.add_text("field_b", v_b)
.add_text("field_c", v_c)
.add_text("title", format!("Title {i}"))
.add_text("body", payload(i))
.build()
})
.collect()
Expand All @@ -192,7 +223,11 @@ fn build_hierarchical_documents(n: usize) -> Vec<Document> {
let state = (leaf_idx / 25) % 5;
let city = leaf_idx % 5;
let path = format!("r{region}/c{country}/s{state}/v{city}");
Document::builder().add_text("hier_field", path).build()
Document::builder()
.add_text("hier_field", path)
.add_text("title", format!("Title {i}"))
.add_text("body", payload(i))
.build()
})
.collect()
}
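
To give a rough sense of the asymmetry the `payload` helper is meant to model, the sketch below adds up the text bytes of one flat benchmark document and compares that with the facet value alone. `flat_document_bytes` is a hypothetical helper introduced only for illustration. It counts UTF-8 string lengths rather than the stored-fields encoding (which this diff does not show), and it takes the pool size as a parameter instead of reading `FACET_VALUES_PER_FIELD`.

```rust
/// Same body as the benchmark's `payload` helper in this diff.
fn payload(i: usize) -> String {
    format!("doc {i}: {}", "lorem ipsum dolor sit amet ".repeat(10))
}

/// Hypothetical helper: text bytes in one flat benchmark document, returned
/// as (all three fields, facet field only). Encoding overhead is ignored.
fn flat_document_bytes(i: usize, facet_values_per_field: usize) -> (usize, usize) {
    let facet = format!("value_{}", i % facet_values_per_field);
    let title = format!("Title {i}");
    let body = payload(i);
    (facet.len() + title.len() + body.len(), facet.len())
}
```

For `i = 7` and a 50-value pool this gives 291 text bytes for the whole document versus 7 for the facet value, under the counting assumption stated above. It is an estimate of data volume only, not a measurement of decode cost.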