Cost-based join planner, bitmap index engine, two-pass boot indexing #1968
Draft
mjf wants to merge 4 commits into
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
`indexPagePass1` creates indexable objects for early UI interactivity (page queries, tag autocomplete, Lua/style execution), but _Pass-2_ already creates full anchor records. This removes the duplicate anchors that likely bloated the _Indexed DB_ size.

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
after Pass-2. Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
Introduces a full cost-based query planner with join support, a roaring bitmap index engine for sub-millisecond predicate pushdown, and a restructured two-pass boot sequence that makes the UI interactive before full indexing completes.
**Query Planner & Execution**

- Cost-based join planner supporting hash join, nested-loop, and sort-merge strategies with automatic method selection based on cardinality estimates, NDV (number of distinct values), and MCV (most common values) statistics. Todo: the "real" SQL-like JOIN clause syntax (coming soon).
- HalfXor sketch for approximate NDV estimation on large collections without materializing full sets.
- Multi-source FROM clauses with cross-join detection, predicate extraction, and join-order enumeration.
- EXPLAIN / EXPLAIN ANALYZE / EXPLAIN VERBOSE support with Postgres-style plan output including per-node timing, row counts, cost estimates, pushdown annotations, and Result Columns summary.
- Predicate dispatch framework: WHERE predicates are classified and routed to capable scan engines (bitmap, KV-range, augmenter, or fallback compute) based on declared engine capabilities.
- Bind-predicate analysis to decompose conjunctive WHERE clauses into inline KV filters (executed without Lua overhead) and residual predicates requiring full expression evaluation.
- Query engine contract interface allowing pluggable scan backends with declared capability sets.
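As a rough illustration of the automatic method selection described above, the sketch below picks among `hash`, `loop`, and `merge` from row-count and NDV estimates. The `TableStats` shape, the `chooseJoin` name, and the cost formulas are assumptions made for this example (textbook-style approximations), not the planner's actual cost model.

```typescript
// Hypothetical per-source statistics. The PR persists row count, NDV and MCV,
// but this exact shape is an assumption for the sketch.
interface TableStats {
  rows: number; // estimated row count
  ndv: number; // estimated number of distinct join-key values
}

type JoinMethod = "hash" | "loop" | "merge";

interface JoinChoice {
  method: JoinMethod;
  cost: number;
  estimatedRows: number;
}

// Classic equi-join cardinality estimate: |L| * |R| / max(ndv(L), ndv(R)).
function estimateJoinRows(left: TableStats, right: TableStats): number {
  const maxNdv = Math.max(left.ndv, right.ndv, 1);
  return (left.rows * right.rows) / maxNdv;
}

function chooseJoin(left: TableStats, right: TableStats): JoinChoice {
  const small = Math.min(left.rows, right.rows);
  const large = Math.max(left.rows, right.rows);
  const sortCost = (n: number): number => (n > 1 ? n * Math.log2(n) : n);

  const candidates: { method: JoinMethod; cost: number }[] = [
    // Nested loop: compare every pair of rows.
    { method: "loop", cost: left.rows * right.rows },
    // Hash join: build on the smaller side (weighted x2, an arbitrary
    // constant), then probe once per row of the larger side.
    { method: "hash", cost: 2 * small + large },
    // Sort-merge: sort both inputs, then one linear merge pass.
    {
      method: "merge",
      cost: sortCost(left.rows) + sortCost(right.rows) + left.rows + right.rows,
    },
  ];

  // Keep the cheapest candidate; ties keep the earlier entry.
  const best = candidates.reduce((a, b) => (b.cost < a.cost ? b : a));
  return {
    method: best.method,
    cost: best.cost,
    estimatedRows: estimateJoinRows(left, right),
  };
}
```

With these made-up formulas, a 10,000-row source joined to a 200-row source comes out as a hash join, while two single-row sources fall back to a nested loop.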
**Bitmap Index Engine**

- Roaring bitmap implementation (array, bitset, and run containers) with AND/OR/NOT/XOR operations for set-based filtering.
- Bitmap index over the object store: per-column dictionaries encode attribute values into integer IDs; bitmaps track which rows carry each value.
- Value codec layer supporting typed encoding of strings, numbers, booleans, and null for dictionary key ordering.
- Bitmap query evaluator translating pushed-down predicates into bitmap intersections and producing row-ID sets for the scan layer.
- Persisted bitmap state with incremental updates on page index/delete events.
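To make the dictionary-plus-bitmap idea above concrete, here is a minimal sketch of an equality index evaluated by bitmap intersection. It deliberately uses one flat `Uint32Array` bitset per value instead of roaring containers, and the `Bitset` and `TinyBitmapIndex` names, the key encoding, and the method signatures are all hypothetical.

```typescript
// Flat bitset standing in for a roaring bitmap (assumption: no containers).
class Bitset {
  private words: Uint32Array;
  private capacity: number;
  constructor(capacity: number) {
    this.capacity = capacity;
    this.words = new Uint32Array(Math.ceil(capacity / 32));
  }
  add(rowId: number): void {
    this.words[rowId >>> 5] |= 1 << (rowId & 31);
  }
  // Intersection; both operands are assumed to share the same capacity.
  and(other: Bitset): Bitset {
    const out = new Bitset(this.capacity);
    for (let i = 0; i < out.words.length; i++) {
      out.words[i] = this.words[i] & other.words[i];
    }
    return out;
  }
  toRowIds(): number[] {
    const ids: number[] = [];
    for (let w = 0; w < this.words.length; w++) {
      for (let b = 0; b < 32; b++) {
        if ((this.words[w] >>> b) & 1) ids.push(w * 32 + b);
      }
    }
    return ids;
  }
}

type Value = string | number | boolean | null;

class TinyBitmapIndex {
  // column -> (typed value key -> integer value ID)
  private dictionaries = new Map<string, Map<string, number>>();
  // "column:valueId" -> rows carrying that value
  private bitmaps = new Map<string, Bitset>();
  private capacity: number;
  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Typed key so that 1 and "1" get distinct dictionary entries.
  private encode(value: Value): string {
    return `${value === null ? "null" : typeof value}:${String(value)}`;
  }

  index(rowId: number, row: Record<string, Value>): void {
    for (const column of Object.keys(row)) {
      let dict = this.dictionaries.get(column);
      if (!dict) {
        dict = new Map<string, number>();
        this.dictionaries.set(column, dict);
      }
      const key = this.encode(row[column]);
      let id = dict.get(key);
      if (id === undefined) {
        id = dict.size;
        dict.set(key, id);
      }
      const bitmapKey = `${column}:${id}`;
      let bitmap = this.bitmaps.get(bitmapKey);
      if (!bitmap) {
        bitmap = new Bitset(this.capacity);
        this.bitmaps.set(bitmapKey, bitmap);
      }
      bitmap.add(rowId);
    }
  }

  // Row IDs where every `column = value` predicate holds (a conjunction).
  query(predicates: Record<string, Value>): number[] {
    let result: Bitset | undefined;
    for (const column of Object.keys(predicates)) {
      const id = this.dictionaries.get(column)?.get(this.encode(predicates[column]));
      const bitmap = id === undefined ? undefined : this.bitmaps.get(`${column}:${id}`);
      if (!bitmap) return []; // value never indexed, so nothing can match
      result = result ? result.and(bitmap) : bitmap;
    }
    return result ? result.toRowIds() : [];
  }
}
```

A conjunction such as `tag = "page" and author = "ann"` then costs one dictionary lookup and one bitmap AND per predicate, independent of how many attributes each row carries.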
**Object Index & Boot Sequence**

- Two-pass boot indexing: Pass-1 indexes Space Lua, Space Style, page metadata, and tags so Lua scripts and custom styles can load before the full index is built. Pass-2 runs the complete indexer set in the background; the UI remains interactive during this phase.
- `ObjectIndex` rewritten to coordinate bitmap engines, augmenter engines, KV scan, and the new planner through a unified query dispatch layer.
- Augmenter engine providing virtual columns (e.g. `lastAccessed`, `lastRun`) backed by a persistent KV cache with synchronous overlay during query execution.
- Collection-level statistics (row count, NDV, MCV) persisted alongside bitmap state and exposed to the planner for cost estimation.
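The two-pass boot described above can be sketched as follows. This is a simplified, synchronous model: the `Page` shape, the three toy indexers, and the `boot`/`runPass` functions are hypothetical stand-ins, and the `defer` callback only models "run this in the background later". It also shows the per-page try/catch that keeps one failing page from aborting a pass.

```typescript
interface Page {
  name: string;
  text: string;
}
// An indexer returns the object keys it wants written to the index.
type Indexer = (page: Page) => string[];

const matches = (text: string, re: RegExp): string[] => text.match(re) ?? [];

// Hypothetical indexers: metadata and tags are cheap, links stand in for
// "everything else" that only the full pass needs.
const metaIndexer: Indexer = (page) => [`page:${page.name}`];
const tagIndexer: Indexer = (page) =>
  matches(page.text, /#[\w-]+/g).map((t) => `tag:${t.slice(1)}`);
const linkIndexer: Indexer = (page) => {
  if (page.text.includes("[[") && !page.text.includes("]]")) {
    throw new Error(`unterminated link in ${page.name}`);
  }
  return matches(page.text, /\[\[[^\]]+\]\]/g).map((l) => `link:${l.slice(2, -2)}`);
};

// Runs the given indexers over all pages; returns the names of failed pages.
function runPass(pages: Page[], indexers: Indexer[], index: Set<string>): string[] {
  const failed: string[] = [];
  for (const page of pages) {
    try {
      for (const indexer of indexers) {
        for (const key of indexer(page)) index.add(key);
      }
    } catch (_err) {
      failed.push(page.name); // isolate the failure, keep indexing other pages
    }
  }
  return failed;
}

interface BootReport {
  pass1Failed: string[];
  pass2Failed: string[];
}

function boot(
  pages: Page[],
  index: Set<string>,
  defer: (task: () => void) => void,
): BootReport {
  const report: BootReport = {
    // Pass-1: only what the UI needs to become interactive.
    pass1Failed: runPass(pages, [metaIndexer, tagIndexer], index),
    pass2Failed: [],
  };
  // Pass-2: the complete indexer set, scheduled for later.
  defer(() => {
    report.pass2Failed = runPass(pages, [metaIndexer, tagIndexer, linkIndexer], index);
  });
  return report;
}
```

In this model, tag and page queries can be answered as soon as `boot` returns, while link objects only appear once the deferred task has run.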
**SLIQ Language Extensions**

- Wildcard SELECT (`select *`, `select source.*`, `select *.column`) with proper null-column preservation and GROUP BY integration.
- SELECT output-name aliasing with automatic deduplication for unnamed expression fields.
- ORDER BY projected column keys (referencing SELECT output names).
- Intra-aggregate ORDER BY for ordered-set aggregates (`quantile`, `percentile_cont`, `percentile_disc`).
- FILTER clause on aggregate calls.
- GROUP BY wildcard forms (`group by *`, `group by source.*`).
- LEADING join-order hint and WITH clause engine hints (`hash`/`loop`/`merge`).
- DISTINCT / ALL modifiers on SELECT clause.
- NULLS FIRST / NULLS LAST in ORDER BY.
- `materialized` keyword in FROM clause to force early materialization of a source.
- Extended parser to handle all new syntax nodes and clause types.
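For readers unfamiliar with NULLS FIRST / NULLS LAST, the sketch below shows one way such ordering can be expressed as a comparator. The `OrderKey` shape and `compareRows` name are hypothetical, keys are restricted to numbers for brevity, and the default null placement (last for ascending, first for descending) follows the Postgres convention; whether SLIQ uses the same default is an assumption.

```typescript
type NullsOrder = "first" | "last";

interface OrderKey<T> {
  key: (row: T) => number | null; // extracts the sort key; null means "no value"
  desc?: boolean;
  nulls?: NullsOrder;
}

function compareRows<T>(keys: OrderKey<T>[]): (a: T, b: T) => number {
  return (a, b) => {
    for (const k of keys) {
      const x = k.key(a);
      const y = k.key(b);
      // Assumed default: nulls sort as if larger than any non-null value.
      const nulls: NullsOrder = k.nulls ?? (k.desc ? "first" : "last");
      if (x === null || y === null) {
        if (x === null && y === null) continue; // tie on this key
        const xGoesFirst = (x === null) === (nulls === "first");
        return xGoesFirst ? -1 : 1; // null placement ignores asc/desc
      }
      if (x !== y) {
        const cmp = x < y ? -1 : 1;
        return k.desc ? -cmp : cmp;
      }
    }
    return 0;
  };
}
```

Note that null placement is decided before the ascending/descending flip, so `desc` combined with `nulls: "last"` still puts nulls at the end.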
**Aggregate Functions**

- Expanded aggregate library: `min`/`max` with record-level comparison for wildcard arguments, product, covariance (`covar_pop`, `covar_samp`), correlation (`corr`), `quantile`, `percentile_cont`, `percentile_disc`, `mode`, `first`.
- Numeric coercion helper (`numericValue`) consolidating null-guard and type-unwrap logic across all numeric aggregates.
- Wildcard null-record semantics: rows whose source projection is all-null are treated as null input to the aggregate.
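The difference between `percentile_cont` and `percentile_disc` is easy to get wrong, so here is a small sketch using the Postgres definitions (linear interpolation versus the first value whose cumulative position reaches the fraction). The `numericValue` below is only a guess at what such a coercion helper might do, not the PR's implementation, and the function names and signatures are hypothetical. The fraction is assumed to lie in [0, 1].

```typescript
// Hypothetical stand-in for a numeric coercion helper: non-numeric and
// missing inputs become null so aggregates can skip them uniformly.
function numericValue(v: unknown): number | null {
  return typeof v === "number" && !Number.isNaN(v) ? v : null;
}

// Collects the usable numeric inputs in ascending order.
function sortedNumbers(values: unknown[]): number[] {
  const out: number[] = [];
  for (const v of values) {
    const n = numericValue(v);
    if (n !== null) out.push(n);
  }
  return out.sort((a, b) => a - b);
}

// Continuous percentile: interpolates linearly between neighbouring values.
function percentileCont(values: unknown[], fraction: number): number | null {
  const xs = sortedNumbers(values);
  if (xs.length === 0) return null;
  const pos = fraction * (xs.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo);
}

// Discrete percentile: always returns one of the input values, namely the
// first one whose 1-based position i satisfies i / N >= fraction.
function percentileDisc(values: unknown[], fraction: number): number | null {
  const xs = sortedNumbers(values);
  if (xs.length === 0) return null;
  const idx = Math.max(0, Math.ceil(fraction * xs.length) - 1);
  return xs[idx];
}
```

For the inputs `10, 20, 30, 40` the median is 25 under `percentile_cont` and 20 under `percentile_disc`; null and non-numeric inputs are skipped in both cases.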
**Bug Fixes**

- Fixed SELECT nil-column loss: top-level SELECT table constructors now route through a null-preserving evaluator so that explicitly selected columns with nil values are stored as `SLIQ_NULL` rather than being omitted by Lua table-constructor semantics. This prevents columns from disappearing and rows from being spuriously deduplicated under the default DISTINCT behaviour.
- Added a null-guard in `lintObjects` to avoid a crash when page meta is unavailable.
- Added error handling (try/catch) in `indexPage` and `indexPagePass1` to prevent single-page indexing failures from aborting the full reindex.

**Documentation**

- New SLIQ reference page documenting the full query language.
- "Index Maintenance" page documenting bitmap lifecycle and rebuild procedures.
- Updated Builtin Tags to reflect new virtual columns from augmenters.
- Added `queryPlanner` configuration schema for tuning planner knobs.

Todo: The documentation is fairly incomplete and will be the subject of the next dedicated (sic!) development cycle. Therefore the new `SLIQ.md` is not yet referenced from the rest of the documentation.