Cost-based join planner, bitmap index engine, two-pass boot indexing #1968

Draft
mjf wants to merge 4 commits into silverbulletmd:main from mjf:liq-cost-based-join-planner

mjf commented May 6, 2026

Contributor

Introduces a full cost-based query planner with join support, a roaring bitmap index engine for sub-millisecond predicate pushdown, and a restructured two-pass boot sequence that makes the UI interactive before full indexing completes.

Query Planner & Execution

  • Cost-based join planner supporting hash join, nested-loop, and sort-merge strategies with automatic method selection based on cardinality estimates, NDV (number of distinct values), and MCV (most common values) statistics.

    Todo: The "real" SQL-like JOIN clause syntax (coming soon).

  • HalfXor sketch for approximate NDV estimation on large collections without materializing full sets.

  • Multi-source FROM clauses with cross-join detection, predicate extraction, and join-order enumeration.

  • EXPLAIN / EXPLAIN ANALYZE / EXPLAIN VERBOSE support with Postgres-style plan output including per-node timing, row counts, cost estimates, pushdown annotations, and Result Columns summary.

  • Predicate dispatch framework: WHERE predicates are classified and routed to capable scan engines (bitmap, KV-range, augmenter, or fallback compute) based on declared engine capabilities.

  • Bind-predicate analysis to decompose conjunctive WHERE clauses into inline KV filters (executed without Lua overhead) and residual predicates requiring full expression evaluation.

  • Query engine contract interface allowing pluggable scan backends with declared capability sets.
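
The capability-based routing described above can be pictured with a small TypeScript sketch. Everything here is hypothetical: the `Engine` shape, the operator names, and the sample capability rules are assumptions made for illustration and are not taken from this PR's code. The sketch only shows each predicate going to the first engine that declares support for it, with fallback compute catching the rest.

```typescript
// Hypothetical types: not the PR's actual engine contract.
type Op = "eq" | "range" | "regex";

interface Predicate {
  column: string;
  op: Op;
}

interface Engine {
  name: string;
  // Declared capability set: which predicates this engine can evaluate.
  supports(p: Predicate): boolean;
}

// Illustrative capability rules, ordered by preference.
const engines: Engine[] = [
  { name: "bitmap", supports: (p) => p.op === "eq" && p.column !== "lastAccessed" },
  { name: "kv-range", supports: (p) => p.op === "range" && p.column === "ref" },
  { name: "augmenter", supports: (p) => p.column === "lastAccessed" && p.op !== "regex" },
];

// Route each predicate to the first capable engine; anything no engine
// claims is left to the fallback compute path.
function dispatch(preds: Predicate[]): Map<string, Predicate[]> {
  const plan = new Map<string, Predicate[]>();
  for (const p of preds) {
    const engine = engines.find((e) => e.supports(p));
    const key = engine ? engine.name : "compute";
    const bucket = plan.get(key) ?? [];
    bucket.push(p);
    plan.set(key, bucket);
  }
  return plan;
}
```

With these made-up rules, an equality test on `tag` would land on the bitmap engine, a range test on `lastAccessed` on the augmenter, and a regex match would fall through to compute.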

Bitmap Index Engine

  • Roaring bitmap implementation (array, bitset, and run containers) with AND/OR/NOT/XOR operations for set-based filtering.

  • Bitmap index over the object store: per-column dictionaries encode attribute values into integer IDs; bitmaps track which rows carry each value.

  • Value codec layer supporting typed encoding of strings, numbers, booleans, and null for dictionary key ordering.

  • Bitmap query evaluator translating pushed-down predicates into bitmap intersections and producing row-ID sets for the scan layer.

  • Persisted bitmap state with incremental updates on page index/delete events.
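
As a rough illustration of the dictionary-plus-bitmap layout, the sketch below uses a plain `Set<number>` as a stand-in for a roaring bitmap. The `ColumnIndex` class and `intersect` helper are hypothetical names, not the PR's API; the point is only how an equality predicate becomes a row-ID set and how a conjunction becomes an intersection.

```typescript
// Set<number> stands in for a roaring bitmap; a real implementation
// would use compressed array/bitset/run containers instead.
class ColumnIndex {
  private dict = new Map<string, number>(); // value -> dictionary ID
  private bitmaps: Set<number>[] = [];      // dictionary ID -> row IDs

  // Record that `rowId` carries `value` in this column.
  add(rowId: number, value: string): void {
    let id = this.dict.get(value);
    if (id === undefined) {
      id = this.bitmaps.length;
      this.dict.set(value, id);
      this.bitmaps.push(new Set<number>());
    }
    this.bitmaps[id].add(rowId);
  }

  // Rows where column = value; an unknown value matches no rows.
  eq(value: string): Set<number> {
    const id = this.dict.get(value);
    return id === undefined ? new Set<number>() : this.bitmaps[id];
  }
}

// AND of two row sets: iterate the smaller one, probe the larger.
function intersect(a: Set<number>, b: Set<number>): Set<number> {
  const small = a.size <= b.size ? a : b;
  const large = small === a ? b : a;
  const out = new Set<number>();
  small.forEach((row) => {
    if (large.has(row)) out.add(row);
  });
  return out;
}
```

A predicate such as `tag = "task" and status = "open"` would then be evaluated as `intersect(tagIndex.eq("task"), statusIndex.eq("open"))`, and only the surviving row IDs need to be fetched by the scan layer.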

Object Index & Boot Sequence

  • Two-pass boot indexing: Pass-1 indexes Space Lua, Space Style, page metadata, and tags so Lua scripts and custom styles can load before the full index is built. Pass-2 runs the complete indexer set in the background; the UI remains interactive during this phase!

  • ObjectIndex rewritten to coordinate bitmap engines, augmenter engines, KV scan, and the new planner through a unified query dispatch layer.

  • Augmenter engine providing virtual columns (e.g. lastAccessed, lastRun) backed by a persistent KV cache with synchronous overlay during query execution.

  • Collection-level statistics (row count, NDV, MCV) persisted alongside bitmap state and exposed to the planner for cost estimation.
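
To make the link between collected statistics and join planning concrete, here is a minimal sketch. The cost formulas and constants are textbook-style assumptions chosen for illustration, not the planner's actual model, and MCV handling is left out for brevity. `collectStats`, `estimateJoinRows`, and `chooseJoin` are hypothetical names.

```typescript
// All formulas below are illustrative assumptions, not the PR's cost model.
interface ColumnStats {
  rows: number; // row count
  ndv: number;  // number of distinct values
}

// Exact statistics for a small column; a real system would sample or sketch.
function collectStats(values: string[]): ColumnStats {
  return { rows: values.length, ndv: new Set(values).size };
}

// Textbook equi-join estimate: |L| * |R| / max(ndv(L), ndv(R)).
function estimateJoinRows(l: ColumnStats, r: ColumnStats): number {
  return (l.rows * r.rows) / Math.max(l.ndv, r.ndv, 1);
}

type JoinMethod = "hash" | "loop" | "merge";

// n * log2(n) comparisons unless the input is already ordered.
function sortCost(rows: number, sorted: boolean): number {
  return sorted || rows < 2 ? 0 : rows * Math.log2(rows);
}

// Pick the cheapest method under the assumed cost formulas.
function chooseJoin(l: ColumnStats, r: ColumnStats, inputsSorted = false): JoinMethod {
  const build = Math.min(l.rows, r.rows);
  const probe = Math.max(l.rows, r.rows);
  const costs: [JoinMethod, number][] = [
    ["loop", l.rows * r.rows],
    ["hash", 2 * build + probe],
    ["merge", sortCost(l.rows, inputsSorted) + sortCost(r.rows, inputsSorted) + l.rows + r.rows],
  ];
  costs.sort((a, b) => a[1] - b[1]);
  return costs[0][0];
}
```

Under these assumed formulas, tiny inputs favour nested-loop, large unsorted inputs favour hash join, and inputs already ordered on the join key favour sort-merge.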

SLIQ Language Extensions

  • Wildcard SELECT (select *, select source.*, select *.column) with proper null-column preservation and GROUP BY integration.

  • SELECT output-name aliasing with automatic deduplication for unnamed expression fields.

  • ORDER BY projected column keys (referencing SELECT output names).

  • Intra-aggregate ORDER BY for ordered-set aggregates (quantile, percentile_cont, percentile_disc).

  • FILTER clause on aggregate calls.

  • GROUP BY wildcard forms (group by *, group by source.*).

  • LEADING join-order hint and WITH clause engine hints (hash/loop/merge).

  • DISTINCT / ALL modifiers on SELECT clause.

  • NULLS FIRST / NULLS LAST in ORDER BY.

  • materialized keyword in FROM clause to force early materialization of a source.

  • Extended parser to handle all new syntax nodes and clause types.
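
The NULLS FIRST / NULLS LAST behaviour can be sketched as a comparator in which null placement is decided independently of the sort direction. This is a generic illustration in TypeScript; `orderBy` is a hypothetical helper, and which placement SLIQ uses by default is not stated here.

```typescript
type Nullable = number | null;

// Builds a comparator for a single ORDER BY key. Null placement is
// handled first, so it does not flip when the direction is reversed.
function orderBy(desc: boolean, nullsFirst: boolean): (a: Nullable, b: Nullable) => number {
  return (a, b) => {
    if (a === null && b === null) return 0;
    if (a === null) return nullsFirst ? -1 : 1;
    if (b === null) return nullsFirst ? 1 : -1;
    return desc ? b - a : a - b;
  };
}
```

For example, sorting `[3, null, 1]` with `orderBy(false, false)` keeps the null at the end, while `orderBy(true, true)` moves it to the front of a descending result.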

Aggregate Functions

  • Expanded aggregate library: min/max with record-level comparison for wildcard arguments, product, covariance (covar_pop, covar_samp), correlation (corr), quantile, percentile_cont, percentile_disc, mode, first.

  • Numeric coercion helper (numericValue) consolidating null-guard and type-unwrap logic across all numeric aggregates.

  • Wildcard null-record semantics: rows whose source projection is all-null are treated as null input to the aggregate.
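
The difference between the two percentile aggregates is easy to miss, so here is a small sketch of the usual SQL definitions, assuming the PR follows them: `percentile_cont` interpolates between neighbouring values, while `percentile_disc` returns an actual input value. The function names are illustrative, and both expect an ascending, non-empty array and a fraction between 0 and 1.

```typescript
// Continuous percentile: linear interpolation at position p * (n - 1).
function percentileCont(sorted: number[], p: number): number {
  const pos = p * (sorted.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Discrete percentile: first value whose cumulative share reaches p.
function percentileDisc(sorted: number[], p: number): number {
  const idx = Math.max(0, Math.ceil(p * sorted.length) - 1);
  return sorted[idx];
}
```

On `[1, 2, 3, 4]` with a fraction of 0.5, the continuous form yields 2.5 and the discrete form yields 2.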

Bug Fixes

  • Fixed SELECT nil-column loss: top-level SELECT table constructors now route through a null-preserving evaluator so that explicitly selected columns with nil values are stored as SLIQ_NULL rather than being omitted by Lua table-constructor semantics. This prevents columns from disappearing and avoids spurious row deduplication under the default DISTINCT behaviour.

  • Added null-guard in lintObjects to avoid crash when page meta is unavailable.

  • Added try/catch error handling in indexPage and indexPagePass1 so that a single-page indexing failure no longer aborts the full reindex.
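
The per-page error isolation from the last fix follows a common pattern, sketched below in simplified synchronous form. `reindexAll` and `indexOne` are hypothetical stand-ins; the real indexers are not shown in this description and may well be asynchronous.

```typescript
// Hypothetical stand-in for the real indexing loop; it only shows the
// error isolation, not the actual indexing work.
function reindexAll(pages: string[], indexOne: (page: string) => void): string[] {
  const failed: string[] = [];
  for (const page of pages) {
    try {
      indexOne(page);
    } catch (e) {
      // Log and continue so one bad page cannot abort the whole reindex.
      console.error(`Indexing failed for ${page}:`, e);
      failed.push(page);
    }
  }
  return failed;
}
```

Returning the list of failed pages is a design choice of this sketch: it lets a caller report or retry them after the rest of the space has been indexed.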

Documentation

  • New SLIQ reference page documenting the full query language.

  • "Index Maintenance" page documenting bitmap lifecycle and rebuild procedures.

  • Updated Builtin Tags to reflect new virtual columns from augmenters.

  • Added queryPlanner configuration schema for tuning planner knobs.

Todo: The documentation is fairly incomplete and will be the subject of the next dedicated (sic!) development cycle. Therefore, the new SLIQ.md is not yet referenced from the rest of the documentation.

mjf added 4 commits May 6, 2026 14:03
Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
`indexPagePass1` creates indexable objects for early UI interactivity
(page queries, tag autocomplete, Lua/style execution), but _Pass-2_
already creates full anchor records. This removes the duplicate anchors,
which likely bloated the _IndexedDB_ size.

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>
after Pass-2.

Signed-off-by: Matouš Jan Fialka <mjf@mjf.cz>