feat(cubesql): merge view joins on a shared cube member into a single CubeScan by paveltiunov · Pull Request #10977 · cube-js/cube

paveltiunov · 2026-05-31T22:19:27Z

Summary

When the Tesseract SQL planner is enabled, the SQL API now merges a join of
views on a dimension that resolves to the same underlying cube member into a
single multi-fact CubeScan, instead of erroring out. The planner stitches the
fact groups with a FULL OUTER JOIN over the shared key; the requested SQL join
type (INNER/LEFT/RIGHT/FULL) is reconstructed with set filters on the
join-key dimension(s).

Motivating query:

SELECT c.customer_city, measure(o.revenue), measure(c.avg_age)
FROM customers_view c
LEFT JOIN orders_view o ON o.customer_city = c.customer_city
GROUP BY 1

How it works

The rewrite is a local CubeSQL e-graph rewrite, structured around a new
intermediate MultiFactJoinWrapper node that carries the join key (as
underlying cube members) so joins and filters can compose before the query is
finalized at the aggregate:

shared-member-join-to-wrapper — Join(CubeScan, CubeScan) →
MultiFactJoinWrapper(CubeScan)
shared-member-join-extend-wrapper — Join(MultiFactJoinWrapper(CubeScan), CubeScan) →
MultiFactJoinWrapper(CubeScan), enabling joins of 3+ views (left-deep)
shared-time-member-cross-join-to-wrapper — Filter(DATE_TRUNC = DATE_TRUNC, CrossJoin(CubeScan, CubeScan)) →
MultiFactJoinWrapper(CubeScan), supporting joins written directly on
DATE_TRUNC (which the planner lowers to a filtered cross join, i.e. INNER)
multi-fact-join-wrapper-filter-push-down — Filter(MultiFactJoinWrapper) →
MultiFactJoinWrapper(Filter), pushing WHERE/ON filters into the merged scan
aggregate-multi-fact-join-wrapper — unwraps the wrapper only when the
GROUP BY exactly matches the recorded join key

Time dimensions

Both shapes of the common time multi-fact pattern are supported:

Join on the raw shared time column + GROUP BY DATE_TRUNC('day', …) (grouped
column emitted as a timeDimensions entry with granularity).
Join directly on DATE_TRUNC('day', a.ts) = DATE_TRUNC('day', b.ts) (INNER),
requiring both truncated columns to resolve to the same underlying time member
at the same granularity.

Guards

The merge only happens when:

The Tesseract SQL planner is enabled (CUBEJS_TESSERACT_SQL_PLANNER).
Both sides of the join condition resolve to the same underlying cube member,
and the join key is composed only of dimensions.
Every join-key column on a side resolves to a single cube/view.
The query is grouped by exactly the join key. Ungrouped joins (SELECT *) and
group-by-a-different-dimension queries fall back to standard join handling
(and error as before).

Tests

compile::test::test_cube_join_views:

2-way LEFT/INNER/RIGHT/FULL join semantics
3-way and 4-way FULL joins; 3-way LEFT pinning per-pass presence filters
WHERE filter pushed into the merged scan; ON-clause filter on a single fact
Join on a shared time column + GROUP BY DATE_TRUNC (LEFT); join on
DATE_TRUNC + GROUP BY DATE_TRUNC (INNER)
Negative: Tesseract disabled, ungrouped SELECT *, group-by non-join-key, join on a measure

Docs

docs-mintlify/docs/data-modeling/multi-fact-views.mdx and the SQL API joins
reference document the behavior, requirements, N-way joins, time-dimension
joins, filtering, and the join-type semantics table.

codecov · 2026-06-01T20:34:56Z

Codecov Report

❌ Patch coverage is 95.40317% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.54%. Comparing base (0d6393b) to head (379a2a1).
⚠️ Report is 8 commits behind head on master.

Files with missing lines	Patch %	Lines
...besql/cubesql/src/compile/rewrite/rules/members.rs	93.04%	43 Missing ⚠️
...l/cubesql/src/compile/test/test_cube_join_views.rs	97.42%	18 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #10977      +/-   ##
==========================================
+ Coverage   83.32%   83.54%   +0.21%     
==========================================
  Files         255      256       +1     
  Lines       77261    78588    +1327     
==========================================
+ Hits        64379    65655    +1276     
- Misses      12882    12933      +51

Flag	Coverage Δ
cubesql	`83.54% <95.40%> (+0.21%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

claude · 2026-06-03T01:17:28Z

Claude finished @paveltiunov's task in 2m 14s —— View job

Claude Code Review

Verdict: LGTM with a few nits. No correctness blockers found. 1 low (test gap), 2 nits (cleanliness/efficiency).

Full review (expand)

Read PR context
Read changed files (members.rs push_down_cube_join + test_cube_join_views.rs)
Check existing review threads (none)
Post review

Summary

The change cleanly generalizes push_down_cube_join to also match dimension equi-joins whose left/right columns resolve to the same underlying cube member (via alias_member). The merge gate (join key fully within dimensions, every column pair resolving to the same underlying member) is appropriately conservative — joins on measures/segments or on different underlying members fall through to the existing non-merged handling, so risk of regression in unrelated query shapes is low. Re-introducing INNER/LEFT/RIGHT semantics on top of the multi-fact FULL OUTER stitch by injecting set filters on the required-side join keys is a sensible approach and matches the prior PR refactor away from a measure-based presence marker.

Findings

[Low] Test coverage gap for RIGHT JOIN and FULL JOIN.
test_cube_join_views.rs covers the un-grouped LEFT, grouped LEFT, grouped INNER, and a negative (measure key). The semantics table in the PR description lists FULL (no filters) and RIGHT (right-side set filter), but neither is asserted. A 4th case for RIGHT JOIN (and optionally a FULL JOIN asserting filters: Some(vec![])) would lock in the behavior described in the table — members.rs:2947-2955 is otherwise the only spec for it.

[Nit] Dangling egraph mutations on no-op outcome.
In push_down_cube_join (members.rs:2966-2986), egraph.add(...) for the filter chain and the subst.insert(left_filters_var, acc) run before the nested 4-deep var_iter! loops at members.rs:2989-3034. If those loops yield zero combinations, the function returns false while leaving orphan filter e-nodes in the egraph and a stale subst mutation. egg dedups identical adds so it's not a correctness or blow-up risk, but it's tidier to construct the filter chain inside the innermost loop right before return true, or after the alias_to_cube validation passes. Today's pattern relies on egg's idempotency and on the outer loops effectively always being non-empty (because is_proper_cube_join_condition / the shared-member detection has already validated members exist).

[Nit] Composite-key cube name only carries the last column.
members.rs:2910-2914 overwrites left_cube_name / right_cube_name each loop iteration via name.split('.').next(). For a single-column join (the only realistic case today this rewrite fires on) this is fine, but for a hypothetical composite key spanning columns from different cubes on one side, the code would silently pick whichever cube the last column belongs to. Worth either asserting all columns share a cube prefix (return None otherwise) or adding a comment that single-cube-per-side is the assumed precondition.

Other notes

The verbose crate::compile::rewrite::FilterMember{Member,Op,Values} paths at members.rs:2969-2977 are inconsistent with the rest of the file's import style. Optional cleanup — add to the existing use block at the top.
The shared-member path does not gate on member_name_to_expr.is_some() the way is_proper_cube_join_condition does (members.rs:3229-3237). In practice the e-class analysis is populated by the time joins are considered, but it's worth a quick check or a comment noting why the guard isn't needed here.
Comments are thorough and the rationale (FULL OUTER stitch + set filters to recover join semantics) is well captured at the code level.

· [branch](https://github.com/cube-js/cube/tree/cursor/view-join-shared-member-375b)

…eScan Generalize the push-down-cube-join rewrite so that a join between two CubeScans (typically views) on a dimension that resolves to the same underlying cube member is merged into a single CubeScan, just like the existing __cubeJoinField cube-to-cube join. A view dimension keeps its original cube.dimension path in alias_member, which is used to detect that both sides of the equi-join reference the same shared key. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

Mirror the motivating query exactly: SELECT c.customer_city, measure(o.revenue), measure(c.avg_age) FROM customers_view c LEFT JOIN orders_view o ON o.customer_city = c.customer_city GROUP BY 1 and assert it merges into a single grouped multi-fact CubeScan. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

When merging a join between two views on a shared cube member, the downstream multi-fact query is rendered as a FULL OUTER JOIN over the shared key. To recover the requested join semantics, the rewrite now adds a measure 'set' filter on each side that must be present: - INNER: both sides required - LEFT: left side required - RIGHT: right side required - FULL: no extra filter Branch presence is detected via a measure of the side (the grouping key is COALESCEd across sides downstream, so it cannot tell sides apart). Covered with left/inner group-by tests. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

…e filter Detect side presence with the side's join-key dimension instead of an arbitrary measure. The join key is always available and is the actual shared-key marker, avoiding the nullable-measure caveat and the case where a side has no selected measure. - LEFT: left join key must be set - RIGHT: right join key must be set - INNER: both join keys must be set - FULL: no extra filter Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

…n dimensions Make the merge gate explicit: the entire join key must resolve to dimensions (or time dimensions) on both sides and to the same underlying cube member. A join key that touches a measure/segment/etc. is rejected and the join falls back to normal (non-merged) handling. Add a negative test that joining two views on measures is not merged. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

- Add #[cfg(test)] to the test_cube_join_views module so it is not compiled into non-test builds (fixes unresolved pretty_assertions and unused-import errors under clippy -D warnings and the native builds). - Remove the unused right_filters_var from push_down_cube_join. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

…shift) Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

claude · 2026-06-05T23:52:12Z

Claude finished @cursor[bot]'s task in 46s —— View job

Claude Code Review

Verdict: LGTM. No correctness blockers. The merge gate ("join key fully within dimensions, same underlying member on both sides") is conservative, so non-view/unrelated joins fall through unchanged. The set-filter trick to recover INNER/LEFT/RIGHT on top of the FULL OUTER multi-fact stitch is consistent with the prior PR refactor away from a measure-based presence marker.

Notes (expand) — 1 low, 2 nits; all re-affirmed from the prior review pass

[Low] No test for RIGHT or FULL join semantics.
test_cube_join_views.rs covers ungrouped/grouped LEFT, grouped INNER, and a negative (measure key). The semantics table in the PR description lists FULL (no filters) and RIGHT (right-side set). Both branches in members.rs:2947-2955 are otherwise unexercised — a quick RIGHT case and a FULL case (asserting filters: Some(vec![])) would lock the table in.

[Nit] Egraph/subst mutation runs before the alias_to_cube validation.
members.rs:2966-2986 adds filter e-nodes and calls subst.insert(left_filters_var, acc) before the nested var_iter! loops at members.rs:2989-3034 decide whether to return true. If those loops yield zero combinations (rare in practice — is_proper_cube_join_condition/shared-member detection already validated the members), the function returns false having left orphan filter nodes and a stale subst entry. egg dedupes identical adds, so this isn't a correctness or blow-up risk, but constructing the filter chain inside the innermost loop right before return true (or after the alias_to_cube iteration produces ≥1 pair) would be tidier.

[Nit] Composite-key cube name only carries the last column.
members.rs:2910-2914 overwrites left_cube_name/right_cube_name each loop iteration via name.split('.').next(). Fine for single-column joins (the only case the rewrite realistically fires on today), but for a hypothetical composite key whose columns span different cubes on one side, the code silently picks whichever cube the last column belongs to. Either assert all columns share a cube prefix (return None otherwise) or add a comment that single-cube-per-side is the assumed precondition.

Style: the verbose crate::compile::rewrite::FilterMember* paths at members.rs:2969-2977 are inconsistent with the file's import style — fold them into the existing use block when convenient.

· [branch](https://github.com/cube-js/cube/tree/cursor/view-join-shared-member-375b)

…join key Move the shared-member view-join merge out of push_down_cube_join (which runs on the always-ungrouped raw join) into a new rule that matches an Aggregate over the join. The merge now only fires when: - the query is grouped (an Aggregate sits over the join), and - the GROUP BY is exactly the shared join key. This rejects ungrouped queries (e.g. SELECT * over the join) and queries that group by a non-join-key dimension, both of which would otherwise produce an incorrect multi-fact pushdown. push_down_cube_join is restored to its original __cubeJoinField-only behavior. Tests: grouped left/inner merge (with join-key set filters); ungrouped, group-by-mismatch, and measure-key joins are not merged. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

claude · 2026-06-06T06:28:29Z

Claude finished @cursor[bot]'s task in 1m 31s —— View job

Claude Code Review

Verdict: LGTM. No correctness blockers. The latest commit (6306a2f) moves the merge out of push_down_cube_join into a new aggregate-gated rule, so the rewrite now only fires when there's an Aggregate over the join and the GROUP BY is exactly the shared join key — a much tighter gate than before. Ungrouped joins and group-by-non-key cases that would have produced an incorrect multi-fact pushdown are now rejected (the new negative tests pin this down). push_down_cube_join is back to its original __cubeJoinField-only shape.

Findings: 0 high · 0 medium · 1 low · 2 nits — all left as inline comments. Re-affirms two prior threads (composite-key cube name, egraph mutation before final validation) against the new location; the RIGHT/FULL test gap is unchanged.

Notes (expand)

Read PR context and re-review since last pass
Read members.rs diff (new push_down_aggregate_shared_member_join rule) and test_cube_join_views.rs
Check existing review threads (none open)
Post review

[Low] No RIGHT or FULL join test. Covered: grouped LEFT, grouped INNER, ungrouped/group-by-mismatch/measure-key (negatives). The semantics table lists FULL (no filters) and RIGHT (right-side set) but neither is asserted — members.rs:3122-3134 is otherwise unexercised. Inlined on the test file.

[Nit] Composite-key cube name keeps only the last column's cube. members.rs:3045-3046 overwrites left_cube_name/right_cube_name each loop iteration. Single-column shared keys (the realistic case) are unaffected; composite keys spanning multiple cubes on one side would silently use the last column's cube in out_join_hints. Inlined.

[Nit] Egraph/subst mutation runs before the iteration that decides success. members.rs:3138-3157 adds filter e-nodes and inserts into subst before the outer var_iter! loops decide whether to return true. egg dedupes, so it's not a correctness/blow-up risk in practice, but constructing the filter chain right before return true would be tidier. Inlined.

Style: the verbose crate::compile::rewrite::FilterMember* paths at members.rs:3141-3149 are inconsistent with the file's import style — fold into the existing use block when convenient.

· · [branch](https://github.com/cube-js/cube/tree/cursor/view-join-shared-member-375b)

The shared-member view-join merge produces a multi-fact query that is only handled correctly by the Tesseract SQL planner (FULL OUTER stitch over the shared key). Add an `enable_tesseract_sql_planner` config option (read from CUBEJS_TESSERACT_SQL_PLANNER) and only fire the rewrite when it is enabled. Add a test that the merge does not happen when Tesseract is disabled. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

claude · 2026-06-06T19:47:02Z

Claude finished @cursor[bot]'s task in 1m 6s —— View job

Claude Code Review

Verdict: LGTM. New commit 93e642b adds a config gate on the Tesseract SQL planner — the merge only fires when CUBEJS_TESSERACT_SQL_PLANNER=true, matching nativeSqlPlanner in packages/cubejs-backend-shared/src/env.ts:318. Default is false, so non-Tesseract deployments are unaffected. A negative test pins down that the merge doesn't fire when the gate is off.

Findings since last review: 0 high · 0 medium · 0 low · 0 nits new — 3 prior unresolved threads (1 low, 2 nits) still apply against the new line numbers; not re-posting.

Details (expand)

Reviewed 93e642b diff (members.rs gate + new config field + test refactor)
Confirmed CUBEJS_TESSERACT_SQL_PLANNER matches the existing JS env var (packages/cubejs-backend-shared/src/env.ts:318)
Re-checked the 3 prior unresolved threads against current line numbers — still applicable, not duplicating

Re-affirmed prior threads (still open, no new comment):

members.rs:3053-3054 — composite-key cube name only carries the last column
members.rs:3146-3166 — egraph/subst mutation runs before the iteration that decides success
test_cube_join_views.rs:283 — no RIGHT or FULL join test

On the new commit specifically: the gate is read once at rule-construction time (self.config_obj.enable_tesseract_sql_planner() outside the closure, captured by move), so it's not re-evaluated per egraph application — appropriate, since ConfigObj is process-wide. Default of false in both ConfigObjImpl::new() and Config::test() matches the JS-side default. CompilationError::Rewrite is the expected error class when the rewrite doesn't fire and no other rule can plan the join, so the new negative test's assertion is the right shape.

· branch ·

Document that joining two views on a dimension that resolves to the same underlying cube member (grouped by that key) triggers a multi-fact query in the SQL API, including the join-type semantics (inner/left/right/full) and the Tesseract requirement. Add the behavior to the multi-fact views page and a cross-referencing section in the SQL API joins reference. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

claude · 2026-06-06T21:01:57Z

Claude finished @cursor[bot]'s task in 48s —— View job

Claude Code Review

Verdict: LGTM. Latest commit (3b61886) is docs-only — adds docs-mintlify/docs/data-modeling/multi-fact-views.mdx and a "Joining views on a shared dimension" section in joins.mdx. The Rust changes are unchanged since the previous pass. No new correctness blockers.

Findings on this turn: 0 high · 0 medium · 0 low · 0 nits new. Re-affirmed 3 prior unresolved threads (1 low, 2 nits) — still applicable, not duplicating.

Details (expand)

Diffed 93e642b..HEAD (docs-only): multi-fact-views.mdx new, joins.mdx cross-link section
Verified docs.json nav registers docs/data-modeling/multi-fact-views (line 139)
Re-checked the 3 prior unresolved Claude threads against current code — line numbers and concerns unchanged
Confirmed authoring location matches docs-mintlify/CLAUDE.md conventions (no body H1, title/description frontmatter, root-relative internal links, language-fenced code blocks, <Warning>/<CodeGroup> used correctly)

Docs review (multi-fact-views.mdx + joins.mdx):

Frontmatter, headings, callouts, and CodeGroup tabs follow docs-mintlify/CLAUDE.md exactly.
The semantics table in multi-fact-views.mdx:356-361 matches the implementation table in members.rs:3122-3134 (FULL → no filters; INNER → both sides; LEFT → left; RIGHT → right).
Gating conditions in multi-fact-views.mdx:339-349 faithfully describe the three implementation gates (Tesseract enabled, dimension-only join key, GROUP BY = join key).
Cross-links resolve: ref-tesseract-env → /reference/configuration/environment-variables#cubejs_tesseract_sql_planner, ref-multi-fact-views → /docs/data-modeling/multi-fact-views. Both targets exist.
One very minor nit (not posting): the example at multi-fact-views.mdx:324-332 joins on name while the modeled customers cube doesn't expose name through the customers_view/orders_view/returns_view shape introduced earlier — the example uses different view names than the modeled customer_overview view. Readers can follow the intent, so not worth blocking on; could be clarified with one sentence noting these are separate single-fact views.

Re-affirmed prior threads (still open, no new comment):

members.rs:3054 — composite-key cube name only carries the last column
members.rs:3165 — egraph/subst mutation runs before the iteration that decides success
test_cube_join_views.rs:283 — no RIGHT or FULL join test

· branch ·

…e/view A CubeScan can expose members from multiple cubes/views, so enforce that every join-key column on each side resolves to the same cube/view. A mismatch would make the merged join hint ambiguous, so such joins are no longer merged. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

claude · 2026-06-06T21:15:44Z

Claude finished @cursor[bot]'s task in 46s —— View job

Claude Code Review

Verdict: LGTM. The new commit 4872fcb addresses the prior composite-key cube-name nit by validating that every join-key column on a side resolves to the same cube/view and rejecting the merge otherwise (members.rs:3053-3068). No new correctness blockers.

Findings on this turn: 0 high · 0 medium · 0 low · 0 nits new. Resolved 1 prior thread (composite-key); 2 prior threads still apply (1 low, 1 nit) — not re-posting.

Details (expand)

Diffed 3b61886..HEAD — single Rust change in members.rs (composite-key safety)
Verified the fix: pre-existing left_cube_name/right_cube_name are now compared against each column's cube prefix; mismatch sets all_match = false and the merge is rejected. Single-column shared-key path is unchanged.
Resolved the now-obsolete composite-key thread
Re-checked the 2 remaining prior Claude threads against current line numbers — still applicable

Re-affirmed prior threads (still open, no new comment):

members.rs:3180 — egraph/subst mutation runs before the iteration that decides success (cleanliness nit, not a correctness issue; egg dedupes)
test_cube_join_views.rs:283 — no test asserts RIGHT (right-side set) or FULL (no filters) join semantics

· branch ·

…l merge; test RIGHT/FULL Address review nits: - Construct the join-semantics set filters and mutate subst inside the innermost iteration, right before returning true, so a false return never leaves a stale subst entry or orphan filter e-nodes. - Add RIGHT JOIN and FULL JOIN tests to lock in the join-type table (right join key set filter; no filter for full). (The composite-key single-cube-per-side check was already added in a prior commit.) Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

claude · 2026-06-07T04:22:41Z

Claude finished @cursor[bot]'s task in 50s —— View job

Claude Code Review

Verdict: LGTM. New commit 45db195 resolves both prior open threads: presence-filter construction and subst.insert are now inside the innermost var_iter! loop right before return true (members.rs:3202-3231), so a false return never leaves orphan filter e-nodes or a stale subst entry; and the test file gains explicit RIGHT and FULL coverage that pins the join-type table (test_cube_join_views.rs:288-368). All three prior Claude threads are now resolved or addressed.

Findings on this turn: 0 high · 0 medium · 0 low · 0 nits new.

Details (expand)

Diffed 4872fcb..HEAD (single commit 45db195)
Verified the members.rs refactor moves the filter-chain construction inside the alias_to_cube × join_hints loop, after out_join_hints is built and immediately before return true. presence_members is computed up front (cheap, no egraph mutation) and consumed only on the success path. The borrow shape works because presence_members is iterated by reference (for name in &presence_members) and cloned per entry.
Verified the new RIGHT test asserts filters: Some(vec![set_filter("orders_view.customer_city")]) and FULL asserts filters: None — matches the table in the PR description and members.rs:3122-3134.
Resolved prior Claude thread on egraph mutation (PRRT_kwDOCOH19M6Hio-Y) and prior Claude thread on RIGHT/FULL test gap (PRRT_kwDOCOH19M6Hio-x). The composite-key thread had already been resolved with 4872fcb.
Skimmed the new tests against the helper shape used by the LEFT/INNER tests — same plan_view_join + find_cube_scan().request assertion, same setup. No drift.

Minor observation, not blocking: the FULL test asserts filters: None, which depends on subst[left_filters_var] resolving to an empty/None CubeScanFilters in the unmerged input — i.e. the test exercises both "skip the filter-chain block" and "the pass-through filters happened to be empty". If a future caller produces a FULL join with pre-existing filters, the assertion shape (None vs Some(vec![]) vs Some(vec![<pre-existing>])) is something to keep in mind, but it's not a current bug.

· [branch](https://github.com/cube-js/cube/tree/cursor/view-join-shared-member-375b)

…ush-down Introduce a MultiFactJoinWrapper intermediate egraph node so the shared-member view-join rewrite is no longer a single aggregate-bound rule. The rewrite now splits into: - shared-member-join-to-wrapper: Join(CubeScan, CubeScan) -> wrapper(CubeScan) - shared-member-join-extend-wrapper: Join(wrapper(CubeScan), CubeScan) -> wrapper(CubeScan), enabling joins of 3+ views - multi-fact-join-wrapper-filter-push-down: Filter(wrapper) -> wrapper(Filter), pushing WHERE/ON filters into the merged scan - aggregate-multi-fact-join-wrapper: unwrap only when GROUP BY matches the recorded join key The wrapper records the join key (as underlying cube members) so the finalize rule can verify the GROUP BY, while joins and filters compose beforehand. Adds tests for 3-way and 4-way FULL joins, a WHERE filter, and an ON-clause filter, in addition to the existing 2-way LEFT/INNER/RIGHT/FULL coverage. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

claude · 2026-06-07T18:43:56Z

Claude finished @cursor[bot]'s task in 2m 46s —— View job

Claude Code Review

Verdict: LGTM. New commits cee814e (MultiFactJoinWrapper for N-way view joins + filter push-down) and 1aca9d9 (docs) are a clean generalization of the previous aggregate-gated rule into a three-step pipeline: (1) shared-member-join-to-wrapper produces a MultiFactJoinWrapper, (2) shared-member-join-extend-wrapper chains additional view joins onto the wrapper, (3) aggregate-multi-fact-join-wrapper unwraps only when the GROUP BY matches the recorded join key. The wrapper carries cost-1 in BestCubePlan, so when finalize doesn't fire the original join tree is still preferred — no risk of an un-eliminated wrapper escaping into a final plan. The new filter push-down rule handles WHERE cleanly. No correctness blockers.

Findings on this turn: 0 high · 0 medium · 3 low · 1 nit — left inline.

Details (expand)

Diffed 45db195..HEAD (cee814e + 1aca9d9)
Reviewed the new rules (shared-member-join-to-wrapper, shared-member-join-extend-wrapper, multi-fact-join-wrapper-filter-push-down, aggregate-multi-fact-join-wrapper) and the renamed merge_shared_member_join / new finalize_shared_member_join transforms
Verified MultiFactJoinWrapper cost is 1 (cost.rs:130) — same band as JoinCheck* intermediates, so un-eliminated wrappers lose to the original join tree at extraction
Cross-checked existing review threads via gh list-review-threads — the 3 prior Claude threads are all resolved against earlier commits; no duplicates posted
Spot-checked the new tests (3-way / 4-way FULL, LEFT + WHERE, LEFT + ON-clause filter) against the rewrite logic
Skimmed the docs delta — multi-fact-views.mdx adds an N-way example and filter notes; joins.mdx cross-link unchanged

Inline findings:

[low] extend-wrapper only matches wrapper-on-left — right-associative c JOIN (o JOIN r) won't merge (members.rs:517-588).
[low] finalize_shared_member_join rejects non-Expr::Column GROUP BY — e.g. GROUP BY DATE_TRUNC('day', c.day) over a time-dim key silently won't unwrap; worth a comment or negative test (members.rs:3316-3387).
[low] No INNER/LEFT/RIGHT coverage for 3+ view joins — the new N-way tests are FULL-only, so the per-pass presence-filter accumulation through extend-wrapper isn't pinned (test_cube_join_views.rs:443-585).
[nit] Duplicate #[allow(clippy::too_many_arguments)] at members.rs:3044-3045.

Other notes (not blocking, not posted inline):

Order-of-application in mixed wrapper+filter chains is safe because egg keeps the bare-wrapper form in the same e-class even after multi-fact-join-wrapper-filter-push-down adds the filter-inside-wrapper variant — extend-wrapper can still match the bare form.
join_member_names is sorted+deduped before insertion into the wrapper, so the same N-way merge reached through different rule orders produces the same join_members payload — important for e-class dedup.
The new dimension_member_name resolver inside finalize_shared_member_join uses find_member_by_column on the merged scan; relies on the e-class analysis being populated by then, which it is in practice.

· branch ·

- Remove duplicate #[allow(clippy::too_many_arguments)] on merge_shared_member_join - Document the left-deep-only assumption of shared-member-join-extend-wrapper (right-associative a JOIN (b JOIN c) is not chained) - Document that finalize only accepts plain-column GROUP BY (wrapped exprs like DATE_TRUNC fall back to standard handling) - Add a 3-way LEFT join test pinning per-pass presence-filter accumulation through the extend-wrapper rule Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

claude · 2026-06-07T19:42:29Z

Claude finished @cursor[bot]'s task in 42s —— View job

Claude Code Review

Verdict: LGTM. New commit 144bb60 addresses all four prior nits from the last pass: the duplicate #[allow(clippy::too_many_arguments)] is removed, shared-member-join-extend-wrapper now documents that only the wrapper-on-left shape is matched (right-associative trees fall back), finalize_shared_member_join documents that wrapped GROUP BY expressions intentionally aren't recognized as the join key, and a new 3-way LEFT-join test pins the per-pass presence-filter accumulation through extend-wrapper.

Findings on this turn: 0 high · 0 medium · 0 low · 0 nits new. All 4 prior open Claude threads resolved.

Details (expand)

Diffed 1aca9d9..HEAD (single commit 144bb60, +57/-1 across members.rs and test_cube_join_views.rs)
Verified the duplicate attribute is gone at members.rs:3050
Verified the new comment at members.rs:516-521 accurately describes the wrapper-on-left limitation
Verified the new comment at members.rs:3375-3378 documents the Expr::Column-only GROUP BY gate
Verified the new test_group_by_left_join_three_views_presence_filters test asserts both per-pass set filters appear (orders_view.customer_city + customers_view.customer_city), pinning the union-across-passes behavior the extend-wrapper path produces via repeated subst.insert(left_filters_var, ...)
Resolved the four prior Claude threads (PRRT_kwDOCOH19M6HrfrK, …HrfsS, …Hrft8, …Hrfvo) — concerns no longer applicable
Cross-checked remaining open threads — none from Claude remain unresolved

Nothing further to flag. The PR is in a good state.

· branch ·

A join written directly on DATE_TRUNC (ON DATE_TRUNC(g, a.ts) = DATE_TRUNC(g, b.ts)) is lowered by the planner to Filter(<eq>, CrossJoin(...)) rather than a column equi-join, so it never reached the shared-member merge. Add a shared-time-member-cross-join-to-wrapper rule that recognizes this shape, resolves both truncated columns to the same underlying time member at the same granularity, and merges into an INNER multi-fact CubeScan (both keys marked present). Grouping by DATE_TRUNC already worked via referenced-column collapse; the finalize comment is corrected accordingly. Adds time dimensions to the test views and tests for join-on-raw-time + GROUP BY DATE_TRUNC (LEFT) and join-on-DATE_TRUNC + GROUP BY DATE_TRUNC (INNER). Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

claude · 2026-06-07T20:07:16Z

Claude Code is working…

I'll analyze this and get back to you.

View job run

Add a shared customer_state dimension and tests for joining two views on a composite key (customer_city + customer_state) and grouping by both, plus a negative test that a partial GROUP BY (only one of the two join keys) does not merge. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

claude · 2026-06-07T20:09:57Z

Claude finished @cursor[bot]'s task in 3m 39s —— View job

Claude Code Review

Verdict: LGTM. Two new commits since the last pass: 3d11f94 adds a shared-time-member-cross-join-to-wrapper rule that merges joins written directly on DATE_TRUNC(...)=DATE_TRUNC(...) (the SQL planner lowers them to Filter(<eq>, CrossJoin(...))), and e5e24bf extends test coverage to composite-key joins (multiple shared dimensions). No correctness blockers. The new rule mirrors the structure of merge_shared_member_join (granularity match, both columns resolve to the same underlying time member, INNER set-filter on both sides, mutations inside the innermost loop right before return true), and the composite-key tests pin both the positive case (GROUP BY covering the full key) and a partial-GROUP-BY negative.

Findings on this turn: 0 high · 0 medium · 2 low · 0 nits new. All 7 prior Claude threads resolved or no longer applicable.

Details (expand)

Diffed 144bb60..HEAD (3d11f94 + e5e24bf): +309/-6 in members.rs, +187 in test_cube_join_views.rs, +29 in multi-fact-views.mdx
Read the new merge_shared_time_cross_join and the 4 new tests
Verified all 7 prior Claude review threads are resolved (per gh list-review-threads)
Cross-checked the new rule against existing patterns and engine/df/optimizers/filter_push_down.rs:114 upstream lowering note
Posted findings as inline comments

Inline findings:

[low] No guard that JOIN granularity matches GROUP BY granularity (members.rs:3483) — ON DATE_TRUNC('month',...)=DATE_TRUNC('month',...) paired with GROUP BY DATE_TRUNC('day',...) would merge and stitch at day-grain, diverging from the user-written month-join semantics whenever rows fall on different days in the same month. Either reject the merge when grain differs or document the assumption.
[low] Test name says INNER but SQL uses LEFT JOIN (test_cube_join_views.rs:741) — test_inner_join_on_date_trunc_group_by_date_trunc asserts both-sides set filters (INNER) but the SQL is LEFT JOIN. The merge works because DataFusion lowers non-equi LEFT JOINs to Filter(CrossJoin) upstream (already noted in filter_push_down.rs:114), which strips LEFT-outer semantics before this rule sees the plan. Pre-existing upstream behavior, not introduced here — but the test reads misleadingly. Suggest using INNER JOIN in the SQL or adding a comment that LEFT is intentional to exercise the upstream-lowering path.

Other notes (not blocking, not posted inline):

join_member_names for the time-cross-join case is built from one entry (resolve_underlying(&left_key)) and then sort()/dedup()'d — the sort/dedup pair is a no-op for a one-element Vec. Mirroring merge_shared_member_join's shape so kept consistent, but worth knowing it isn't doing work.
dimension_member_name accepts both Member::Dimension and Member::TimeDimension. The rule is named "shared-time-member" and documented for time dimensions, but technically a DATE_TRUNC over a plain dimension that the meta also exposes would match the gate. In practice this is an execution-layer no-op (DATE_TRUNC on a non-temporal column fails at runtime), so it's not a security or correctness concern, just an over-broad gate.
The new rule mutates the egraph and inserts into subst only inside the innermost loop just before return true — same shape as the prior pass's resolution. Good.
Composite-key positive test (test_left_join_on_multiple_dimensions_group_by_both) asserts presence filters in reverse declaration order (customer_state then customer_city) — that's an artifact of the acc = [filter, acc] prepending loop in merge_shared_member_join. Stable for a given members order, so the assertion is fine.

· branch ·

claude · 2026-06-07T20:13:12Z

+            ..Default::default()
+        }
+    )
+}


[low] Test name says INNER but the SQL uses LEFT JOIN.

The test asserts INNER semantics (set_filter("orders_view.created_at") AND set_filter("customers_view.created_at") — both sides marked present), but the SQL uses LEFT JOIN. The merge happens because DataFusion lowers a non-equi LEFT JOIN to Filter(<eq>, CrossJoin) upstream (see the existing comment at engine/df/optimizers/filter_push_down.rs:114 — "Current DataFusion version plans complex joins as Filter(CrossJoin)"), which silently strips the LEFT-outer semantics before this rule sees the plan.

It's a pre-existing upstream behavior, not introduced here, but as written the test reads as "writing LEFT JOIN on DATE_TRUNC returns INNER results" without flagging that. Suggest either using INNER JOIN in the SQL (matching the rule's docstring on members.rs:594-599 "Merge an INNER join expressed as a date-truncated equality"), or adding a comment to the test that the LEFT keyword is intentional to exercise the upstream-lowering path and the loss-of-LEFT is expected.

…sion The planner turns 'ON a.dim = b.dim AND DATE_TRUNC(g, a.ts) = DATE_TRUNC(g, b.ts)' into Filter(<trunc eq>, Join(a.dim = b.dim, ...)). The column join becomes a MultiFactJoinWrapper; add a multi-fact-join-wrapper-absorb-time-key rule that folds the truncated time member into the wrapper's recorded join key (marking both time columns present, since a post-join equality is INNER on that key) so a query grouped by both the time dimension and the dimension merges into one multi-fact CubeScan. Adds a test for the mixed DATE_TRUNC + dimension join. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

claude · 2026-06-07T21:40:03Z

Claude finished @cursor[bot]'s task in 2m 6s —— View job

Claude Code Review

Verdict: LGTM. New commit 3614128 adds multi-fact-join-wrapper-absorb-time-key, which folds a DATE_TRUNC(g, a.ts) = DATE_TRUNC(g, b.ts) filter sitting on top of an existing MultiFactJoinWrapper into the wrapper's join key — covering the composite shape ON a.dim = b.dim AND DATE_TRUNC(a.ts) = DATE_TRUNC(b.ts) (the planner makes the column eq the join and keeps the trunc eq as a top filter). The fold checks both granularities match and both columns resolve to the same underlying time member; both time columns are marked present (INNER on the absorbed key, which matches SQL LEFT JOIN + WHERE = INNER semantics on the join key). Test asserts GROUP BY on the time dim + dim, with the absorbed time key driving the time_dimensions entry. No correctness blockers.

Findings on this turn: 0 high · 0 medium · 0 low · 0 nits new. Posted no new inline comments.

Details (expand)

Diffed e5e24bf..HEAD — single commit 3614128: +211 in members.rs (one new rule + transform), +55 in test_cube_join_views.rs (one positive test), +12 in multi-fact-views.mdx
Read the new absorb_time_key_into_wrapper transform — same shape as merge_shared_time_cross_join (granularity equality, underlying-member equality, INNER-style both-sides set filter, mutations only inside the success path right before return true)
Verified join_members is unioned with prev_join_members then sorted/deduped, so the finalize GROUP BY check sees both the column key and the time key as required
Spot-checked the test: GROUP BY on DATE_TRUNC('day', c.created_at) + c.customer_city resolves via referenced_expr to the underlying columns/members, and the merged scan emits created_at as time_dimensions[day] + customer_city as dimensions
Cross-checked review threads — 7 prior Claude threads resolved; 2 prior open Claude threads still applicable (see below), not re-posting
Confirmed the noted-as-resolved comment at members.rs:3882-3888 clarifies why Expr::Column-only check on GROUP BY isn't actually restrictive (referenced_expr collapses DATE_TRUNC(g, c.ts) down to its inner column reference)

Re-affirmed prior open threads (still apply, no new comment):

members.rs:3556 — no guard that JOIN granularity matches GROUP BY granularity. The same concern applies to the new absorb-time-key rule: if the absorbed DATE_TRUNC(month, ...) pairs with GROUP BY DATE_TRUNC(day, ...), the merge happens at day-grain. Worth covering both rules with one fix.
test_cube_join_views.rs:741 — test_inner_join_on_date_trunc_group_by_date_trunc uses LEFT JOIN but asserts INNER semantics (relies on upstream LEFT→Filter(CrossJoin) lowering).

Other notes (not blocking, not posted inline):

The new test's SQL uses JOIN (INNER) — good, no naming ambiguity here.
presence_members = [left_key, right_key] always inserts two filter members. If left_key == right_key (degenerate case of both sides resolving to the same member-name string — only possible if both columns truly refer to the same cube member on the same side, which the gate already rejects), this would silently produce a duplicate filter entry. Not reachable today; same pattern as the sibling rules.
The rule has no guard that the wrapper's prev_join_members is non-empty. By construction (the wrapper can only have been produced by one of the three creation rules, all of which insert at least one key), it always is — but worth knowing the assumption isn't asserted.

· branch ·

…P BY Per review: the multi-fact stitch happens at the GROUP BY grain, so the join key's granularity must equal the GROUP BY granularity for a time member. Previously only the underlying member name was recorded, so joining on DATE_TRUNC('month', ...) while grouping by DATE_TRUNC('day', ...) merged and stitched at day grain, diverging from the written join. Store join_members as (underlying member, Option<granularity>) on the wrapper (None for plain dimensions, Some(grain) for DATE_TRUNC time keys), and at finalize extract each GROUP BY expression's granularity from its original_expr (rather than referenced_expr, which drops the grain) and require the full (member, granularity) sets to match. Replaces the raw-time-join test with a granularity-mismatch negative test. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

claude · 2026-06-08T00:00:54Z

Claude finished @cursor[bot]'s task in 2m 15s —— View job

Claude Code Review

Verdict: Request changes — 1 medium. Latest commit 379a2a1 correctly tightens the JOIN/GROUP BY granularity gate (resolved the prior open thread on this), but in doing so it silently breaks the documented "raw time column join + GROUP BY DATE_TRUNC" shape. The test that pinned that behavior was deleted in the same commit and replaced with a granularity-mismatch negative — so no test catches the regression — while multi-fact-views.mdx:374-381 and the PR description still claim it works. Either restore the case in the rule (treat a (name, None) join key against a (name, Some(grain)) GROUP BY as compatible when the member is a time dim) or update the docs/PR description to reflect the tightened gate. Posted inline at members.rs:3378.

Findings on this turn: 0 high · 1 medium · 0 low · 0 nits. Resolved 1 prior thread (Hr11I). 1 prior open Claude thread still applies (Hr18_ — test_inner_join_on_date_trunc_group_by_date_trunc SQL still uses LEFT JOIN).

Details (expand)

Diffed 3614128..HEAD (single commit 379a2a1 — granularity gate)
Verified the rule change: join_members is now Vec<(String, Option<String>)>; plain dim joins record (name, None), time-key joins record (name, Some(grain)). Finalize extracts (col, None) from Expr::Column or (col, Some(grain)) from Expr::ScalarFunction { fun: DateTrunc, .. } and compares exact set equality.
Confirmed the regression by code-tracing: LEFT JOIN o ON o.created_at = c.created_at records (created_at, None); GROUP BY DATE_TRUNC('day', c.created_at) extracts (created_at, Some("day")); sets differ → no merge.
Confirmed no test pins the doc'd raw-time-column + DATE_TRUNC-GROUP-BY shape anymore (the previous one was renamed/replaced into the negative).
Confirmed the doc still claims this shape works (multi-fact-views.mdx:374-381).
Resolved the prior open thread on granularity guard (PRRT_kwDOCOH19M6Hr11I).
Re-checked the remaining prior open thread (Hr18_, test name says INNER but SQL uses LEFT) against current line numbers — still applies at test_cube_join_views.rs:692-734; not re-posting.

Other notes (not blocking):

The new finalize code uses original_expr (which preserves the wrapped expression) rather than referenced_expr (which collapses it) — good fix, the reason it can recognize DATE_TRUNC shape at all.
Vec::sort() on Vec<(String, Option<String>)> uses tuple ordering — (a, None) sorts before (a, Some(_)), so dedup still works correctly even when the same member appears with both grains.
parse_granularity(scalar, false) returns None for unrecognized literals (e.g. 'qtr'), so the rule conservatively bails instead of silently matching.

· branch ·

claude · 2026-06-08T00:02:53Z

+            // already recorded on the left wrapper (for chained 3+ view joins).
+            let mut join_member_names: Vec<(String, Option<String>)> = shared_left_keys
+                .iter()
+                .map(|k| (resolve_underlying(k), None))


[medium] Granularity-strict matching silently breaks the documented "raw time column + GROUP BY DATE_TRUNC" shape.

merge_shared_member_join records every shared key as (name, None) (line 3378). finalize_shared_member_join now requires the GROUP BY pair to equal the join key exactly — a DATE_TRUNC('day', c.created_at) group-by extracts (created_at, Some("day")), which does not match (created_at, None). So a query like

LEFT JOIN o ON o.created_at = c.created_at GROUP BY DATE_TRUNC('day', c.created_at)

no longer merges. That's exactly the first shape documented as supported in docs-mintlify/docs/data-modeling/multi-fact-views.mdx:374-381 ("Join on the raw time column, group by DATE_TRUNC") and was previously pinned by test_left_join_raw_time_group_by_date_trunc — which was deleted in 379a2a1 and replaced with a granularity-mismatch negative, so no test catches the regression.

If the strict (member, grain) match is the intended design (the GROUP BY grain must match the join grain — and a raw column join is "no grain" so it never matches a truncated GROUP BY), then the docs example needs to be removed/rewritten and the PR description's "Both shapes of the common time multi-fact pattern are supported" claim updated. Otherwise the resolution rule should treat a (name, None) join key on a time member as compatible with a (name, Some(grain)) GROUP BY (the user joined at the row grain and is free to group by any truncation), restoring the deleted test.

Either way worth aligning the code with the docs before this merges.

github-actions Bot added the rust Pull requests that update Rust code label May 31, 2026

vercel Bot deployed to Preview May 31, 2026 22:21 View deployment

paveltiunov marked this pull request as ready for review June 3, 2026 01:17

paveltiunov requested a review from a team as a code owner June 3, 2026 01:17

MazterQyou approved these changes Jun 3, 2026

View reviewed changes

cursoragent and others added 8 commits June 5, 2026 23:46

test(cubesql): cover view join merge on shared cube member

f60313e

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

chore: re-trigger CI (flaky Windows native + concurrency-canceled red…

9a682af

…shift) Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

cursor Bot force-pushed the cursor/view-join-shared-member-375b branch from 21aa80a to 9a682af Compare June 5, 2026 23:51

vercel Bot deployed to Preview June 5, 2026 23:54 View deployment

claude Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread rust/cubesql/cubesql/src/compile/rewrite/rules/members.rs Outdated

claude Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread rust/cubesql/cubesql/src/compile/rewrite/rules/members.rs Outdated

claude Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread rust/cubesql/cubesql/src/compile/test/test_cube_join_views.rs

cursor Bot requested a review from keydunov as a code owner June 6, 2026 21:01

cursoragent and others added 2 commits June 7, 2026 18:42

docs: document N-way view joins and filter support in the SQL API

1aca9d9

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>