Skip to content

feat(cubesql): merge view joins on a shared cube member into a single CubeScan#10977

Open
paveltiunov wants to merge 20 commits into
masterfrom
cursor/view-join-shared-member-375b
Open

feat(cubesql): merge view joins on a shared cube member into a single CubeScan#10977
paveltiunov wants to merge 20 commits into
masterfrom
cursor/view-join-shared-member-375b

Conversation

@paveltiunov

@paveltiunov paveltiunov commented May 31, 2026

Copy link
Copy Markdown
Member

Summary

When the Tesseract SQL planner is enabled, the SQL API now merges a join of
views on a dimension that resolves to the same underlying cube member into a
single multi-fact CubeScan, instead of erroring out. The planner stitches the
fact groups with a FULL OUTER JOIN over the shared key; the requested SQL join
type (INNER/LEFT/RIGHT/FULL) is reconstructed with set filters on the
join-key dimension(s).

Motivating query:

SELECT c.customer_city, measure(o.revenue), measure(c.avg_age)
FROM customers_view c
LEFT JOIN orders_view o ON o.customer_city = c.customer_city
GROUP BY 1

How it works

The rewrite is a local CubeSQL e-graph rewrite, structured around a new
intermediate MultiFactJoinWrapper node that carries the join key (as
underlying cube members) so joins and filters can compose before the query is
finalized at the aggregate:

  • shared-member-join-to-wrapperJoin(CubeScan, CubeScan)
    MultiFactJoinWrapper(CubeScan)
  • shared-member-join-extend-wrapperJoin(MultiFactJoinWrapper(CubeScan), CubeScan)
    MultiFactJoinWrapper(CubeScan), enabling joins of 3+ views (left-deep)
  • shared-time-member-cross-join-to-wrapperFilter(DATE_TRUNC = DATE_TRUNC, CrossJoin(CubeScan, CubeScan))
    MultiFactJoinWrapper(CubeScan), supporting joins written directly on
    DATE_TRUNC (which the planner lowers to a filtered cross join, i.e. INNER)
  • multi-fact-join-wrapper-filter-push-downFilter(MultiFactJoinWrapper)
    MultiFactJoinWrapper(Filter), pushing WHERE/ON filters into the merged scan
  • aggregate-multi-fact-join-wrapper — unwraps the wrapper only when the
    GROUP BY exactly matches the recorded join key

Time dimensions

Both shapes of the common time multi-fact pattern are supported:

  • Join on the raw shared time column + GROUP BY DATE_TRUNC('day', …) (grouped
    column emitted as a timeDimensions entry with granularity).
  • Join directly on DATE_TRUNC('day', a.ts) = DATE_TRUNC('day', b.ts) (INNER),
    requiring both truncated columns to resolve to the same underlying time member
    at the same granularity.

Guards

The merge only happens when:

  • The Tesseract SQL planner is enabled (CUBEJS_TESSERACT_SQL_PLANNER).
  • Both sides of the join condition resolve to the same underlying cube member,
    and the join key is composed only of dimensions.
  • Every join-key column on a side resolves to a single cube/view.
  • The query is grouped by exactly the join key. Ungrouped joins (SELECT *) and
    group-by-a-different-dimension queries fall back to standard join handling
    (and error as before).

Tests

compile::test::test_cube_join_views:

  • 2-way LEFT/INNER/RIGHT/FULL join semantics
  • 3-way and 4-way FULL joins; 3-way LEFT pinning per-pass presence filters
  • WHERE filter pushed into the merged scan; ON-clause filter on a single fact
  • Join on a shared time column + GROUP BY DATE_TRUNC (LEFT); join on
    DATE_TRUNC + GROUP BY DATE_TRUNC (INNER)
  • Negative: Tesseract disabled, ungrouped SELECT *, group-by non-join-key, join on a measure

Docs

docs-mintlify/docs/data-modeling/multi-fact-views.mdx and the SQL API joins
reference document the behavior, requirements, N-way joins, time-dimension
joins, filtering, and the join-type semantics table.

Open in Web Open in Cursor 

@github-actions github-actions Bot added the rust Pull requests that update Rust code label May 31, 2026
@codecov

codecov Bot commented Jun 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 95.40317% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.54%. Comparing base (0d6393b) to head (379a2a1).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
...besql/cubesql/src/compile/rewrite/rules/members.rs 93.04% 43 Missing ⚠️
...l/cubesql/src/compile/test/test_cube_join_views.rs 97.42% 18 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10977      +/-   ##
==========================================
+ Coverage   83.32%   83.54%   +0.21%     
==========================================
  Files         255      256       +1     
  Lines       77261    78588    +1327     
==========================================
+ Hits        64379    65655    +1276     
- Misses      12882    12933      +51     
Flag Coverage Δ
cubesql 83.54% <95.40%> (+0.21%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@paveltiunov paveltiunov marked this pull request as ready for review June 3, 2026 01:17
@paveltiunov paveltiunov requested a review from a team as a code owner June 3, 2026 01:17
@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Claude finished @paveltiunov's task in 2m 14s —— View job


Claude Code Review

Verdict: LGTM with a few nits. No correctness blockers found. 1 low (test gap), 2 nits (cleanliness/efficiency).

Full review (expand)
  • Read PR context
  • Read changed files (members.rs push_down_cube_join + test_cube_join_views.rs)
  • Check existing review threads (none)
  • Post review

Summary

The change cleanly generalizes push_down_cube_join to also match dimension equi-joins whose left/right columns resolve to the same underlying cube member (via alias_member). The merge gate (join key fully within dimensions, every column pair resolving to the same underlying member) is appropriately conservative — joins on measures/segments or on different underlying members fall through to the existing non-merged handling, so risk of regression in unrelated query shapes is low. Re-introducing INNER/LEFT/RIGHT semantics on top of the multi-fact FULL OUTER stitch by injecting set filters on the required-side join keys is a sensible approach and matches the prior PR refactor away from a measure-based presence marker.

Findings

[Low] Test coverage gap for RIGHT JOIN and FULL JOIN.
test_cube_join_views.rs covers the un-grouped LEFT, grouped LEFT, grouped INNER, and a negative (measure key). The semantics table in the PR description lists FULL (no filters) and RIGHT (right-side set filter), but neither is asserted. A 4th case for RIGHT JOIN (and optionally a FULL JOIN asserting filters: Some(vec![])) would lock in the behavior described in the table — members.rs:2947-2955 is otherwise the only spec for it.

[Nit] Dangling egraph mutations on no-op outcome.
In push_down_cube_join (members.rs:2966-2986), egraph.add(...) for the filter chain and the subst.insert(left_filters_var, acc) run before the nested 4-deep var_iter! loops at members.rs:2989-3034. If those loops yield zero combinations, the function returns false while leaving orphan filter e-nodes in the egraph and a stale subst mutation. egg dedups identical adds so it's not a correctness or blow-up risk, but it's tidier to construct the filter chain inside the innermost loop right before return true, or after the alias_to_cube validation passes. Today's pattern relies on egg's idempotency and on the outer loops effectively always being non-empty (because is_proper_cube_join_condition / the shared-member detection has already validated members exist).

[Nit] Composite-key cube name only carries the last column.
members.rs:2910-2914 overwrites left_cube_name / right_cube_name each loop iteration via name.split('.').next(). For a single-column join (the only realistic case today this rewrite fires on) this is fine, but for a hypothetical composite key spanning columns from different cubes on one side, the code would silently pick whichever cube the last column belongs to. Worth either asserting all columns share a cube prefix (return None otherwise) or adding a comment that single-cube-per-side is the assumed precondition.

Other notes

  • The verbose crate::compile::rewrite::FilterMember{Member,Op,Values} paths at members.rs:2969-2977 are inconsistent with the rest of the file's import style. Optional cleanup — add to the existing use block at the top.
  • The shared-member path does not gate on member_name_to_expr.is_some() the way is_proper_cube_join_condition does (members.rs:3229-3237). In practice the e-class analysis is populated by the time joins are considered, but it's worth a quick check or a comment noting why the guard isn't needed here.
  • Comments are thorough and the rationale (FULL OUTER stitch + set filters to recover join semantics) is well captured at the code level.
· [branch](https://github.com/cube-js/cube/tree/cursor/view-join-shared-member-375b)

cursoragent and others added 8 commits June 5, 2026 23:46
…eScan

Generalize the push-down-cube-join rewrite so that a join between two
CubeScans (typically views) on a dimension that resolves to the same
underlying cube member is merged into a single CubeScan, just like the
existing __cubeJoinField cube-to-cube join. A view dimension keeps its
original cube.dimension path in alias_member, which is used to detect
that both sides of the equi-join reference the same shared key.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Mirror the motivating query exactly:
  SELECT c.customer_city, measure(o.revenue), measure(c.avg_age)
  FROM customers_view c
  LEFT JOIN orders_view o ON o.customer_city = c.customer_city
  GROUP BY 1
and assert it merges into a single grouped multi-fact CubeScan.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
When merging a join between two views on a shared cube member, the
downstream multi-fact query is rendered as a FULL OUTER JOIN over the
shared key. To recover the requested join semantics, the rewrite now
adds a measure 'set' filter on each side that must be present:
- INNER: both sides required
- LEFT:  left side required
- RIGHT: right side required
- FULL:  no extra filter

Branch presence is detected via a measure of the side (the grouping key
is COALESCEd across sides downstream, so it cannot tell sides apart).
Covered with left/inner group-by tests.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
…e filter

Detect side presence with the side's join-key dimension instead of an
arbitrary measure. The join key is always available and is the actual
shared-key marker, avoiding the nullable-measure caveat and the case
where a side has no selected measure.

- LEFT:  left join key must be set
- RIGHT: right join key must be set
- INNER: both join keys must be set
- FULL:  no extra filter

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
…n dimensions

Make the merge gate explicit: the entire join key must resolve to
dimensions (or time dimensions) on both sides and to the same underlying
cube member. A join key that touches a measure/segment/etc. is rejected
and the join falls back to normal (non-merged) handling. Add a negative
test that joining two views on measures is not merged.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
- Add #[cfg(test)] to the test_cube_join_views module so it is not
  compiled into non-test builds (fixes unresolved pretty_assertions and
  unused-import errors under clippy -D warnings and the native builds).
- Remove the unused right_filters_var from push_down_cube_join.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
…shift)

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@cursor cursor Bot force-pushed the cursor/view-join-shared-member-375b branch from 21aa80a to 9a682af Compare June 5, 2026 23:51
@claude

claude Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cursor[bot]'s task in 46s —— View job


Claude Code Review

Verdict: LGTM. No correctness blockers. The merge gate ("join key fully within dimensions, same underlying member on both sides") is conservative, so non-view/unrelated joins fall through unchanged. The set-filter trick to recover INNER/LEFT/RIGHT on top of the FULL OUTER multi-fact stitch is consistent with the prior PR refactor away from a measure-based presence marker.

Notes (expand) — 1 low, 2 nits; all re-affirmed from the prior review pass

[Low] No test for RIGHT or FULL join semantics.
test_cube_join_views.rs covers ungrouped/grouped LEFT, grouped INNER, and a negative (measure key). The semantics table in the PR description lists FULL (no filters) and RIGHT (right-side set). Both branches in members.rs:2947-2955 are otherwise unexercised — a quick RIGHT case and a FULL case (asserting filters: Some(vec![])) would lock the table in.

[Nit] Egraph/subst mutation runs before the alias_to_cube validation.
members.rs:2966-2986 adds filter e-nodes and calls subst.insert(left_filters_var, acc) before the nested var_iter! loops at members.rs:2989-3034 decide whether to return true. If those loops yield zero combinations (rare in practice — is_proper_cube_join_condition/shared-member detection already validated the members), the function returns false having left orphan filter nodes and a stale subst entry. egg dedupes identical adds, so this isn't a correctness or blow-up risk, but constructing the filter chain inside the innermost loop right before return true (or after the alias_to_cube iteration produces ≥1 pair) would be tidier.

[Nit] Composite-key cube name only carries the last column.
members.rs:2910-2914 overwrites left_cube_name/right_cube_name each loop iteration via name.split('.').next(). Fine for single-column joins (the only case the rewrite realistically fires on today), but for a hypothetical composite key whose columns span different cubes on one side, the code silently picks whichever cube the last column belongs to. Either assert all columns share a cube prefix (return None otherwise) or add a comment that single-cube-per-side is the assumed precondition.

Style: the verbose crate::compile::rewrite::FilterMember* paths at members.rs:2969-2977 are inconsistent with the file's import style — fold them into the existing use block when convenient.

· [branch](https://github.com/cube-js/cube/tree/cursor/view-join-shared-member-375b)

…join key

Move the shared-member view-join merge out of push_down_cube_join (which
runs on the always-ungrouped raw join) into a new rule that matches an
Aggregate over the join. The merge now only fires when:
- the query is grouped (an Aggregate sits over the join), and
- the GROUP BY is exactly the shared join key.

This rejects ungrouped queries (e.g. SELECT * over the join) and queries
that group by a non-join-key dimension, both of which would otherwise
produce an incorrect multi-fact pushdown. push_down_cube_join is restored
to its original __cubeJoinField-only behavior.

Tests: grouped left/inner merge (with join-key set filters); ungrouped,
group-by-mismatch, and measure-key joins are not merged.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@claude

claude Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cursor[bot]'s task in 1m 31s —— View job


Claude Code Review

Verdict: LGTM. No correctness blockers. The latest commit (6306a2f) moves the merge out of push_down_cube_join into a new aggregate-gated rule, so the rewrite now only fires when there's an Aggregate over the join and the GROUP BY is exactly the shared join key — a much tighter gate than before. Ungrouped joins and group-by-non-key cases that would have produced an incorrect multi-fact pushdown are now rejected (the new negative tests pin this down). push_down_cube_join is back to its original __cubeJoinField-only shape.

Findings: 0 high · 0 medium · 1 low · 2 nits — all left as inline comments. Re-affirms two prior threads (composite-key cube name, egraph mutation before final validation) against the new location; the RIGHT/FULL test gap is unchanged.

Notes (expand)
  • Read PR context and re-review since last pass
  • Read members.rs diff (new push_down_aggregate_shared_member_join rule) and test_cube_join_views.rs
  • Check existing review threads (none open)
  • Post review

[Low] No RIGHT or FULL join test. Covered: grouped LEFT, grouped INNER, ungrouped/group-by-mismatch/measure-key (negatives). The semantics table lists FULL (no filters) and RIGHT (right-side set) but neither is asserted — members.rs:3122-3134 is otherwise unexercised. Inlined on the test file.

[Nit] Composite-key cube name keeps only the last column's cube. members.rs:3045-3046 overwrites left_cube_name/right_cube_name each loop iteration. Single-column shared keys (the realistic case) are unaffected; composite keys spanning multiple cubes on one side would silently use the last column's cube in out_join_hints. Inlined.

[Nit] Egraph/subst mutation runs before the iteration that decides success. members.rs:3138-3157 adds filter e-nodes and inserts into subst before the outer var_iter! loops decide whether to return true. egg dedupes, so it's not a correctness/blow-up risk in practice, but constructing the filter chain right before return true would be tidier. Inlined.

Style: the verbose crate::compile::rewrite::FilterMember* paths at members.rs:3141-3149 are inconsistent with the file's import style — fold into the existing use block when convenient.

· · [branch](https://github.com/cube-js/cube/tree/cursor/view-join-shared-member-375b)

Comment thread rust/cubesql/cubesql/src/compile/rewrite/rules/members.rs Outdated
Comment thread rust/cubesql/cubesql/src/compile/rewrite/rules/members.rs Outdated
Comment thread rust/cubesql/cubesql/src/compile/test/test_cube_join_views.rs
The shared-member view-join merge produces a multi-fact query that is
only handled correctly by the Tesseract SQL planner (FULL OUTER stitch
over the shared key). Add an `enable_tesseract_sql_planner` config
option (read from CUBEJS_TESSERACT_SQL_PLANNER) and only fire the rewrite
when it is enabled. Add a test that the merge does not happen when
Tesseract is disabled.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@claude

claude Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cursor[bot]'s task in 1m 6s —— View job


Claude Code Review

Verdict: LGTM. New commit 93e642b adds a config gate on the Tesseract SQL planner — the merge only fires when CUBEJS_TESSERACT_SQL_PLANNER=true, matching nativeSqlPlanner in packages/cubejs-backend-shared/src/env.ts:318. Default is false, so non-Tesseract deployments are unaffected. A negative test pins down that the merge doesn't fire when the gate is off.

Findings since last review: 0 high · 0 medium · 0 low · 0 nits new — 3 prior unresolved threads (1 low, 2 nits) still apply against the new line numbers; not re-posting.

Details (expand)
  • Reviewed 93e642b diff (members.rs gate + new config field + test refactor)
  • Confirmed CUBEJS_TESSERACT_SQL_PLANNER matches the existing JS env var (packages/cubejs-backend-shared/src/env.ts:318)
  • Re-checked the 3 prior unresolved threads against current line numbers — still applicable, not duplicating

Re-affirmed prior threads (still open, no new comment):

  • members.rs:3053-3054 — composite-key cube name only carries the last column
  • members.rs:3146-3166 — egraph/subst mutation runs before the iteration that decides success
  • test_cube_join_views.rs:283 — no RIGHT or FULL join test

On the new commit specifically: the gate is read once at rule-construction time (self.config_obj.enable_tesseract_sql_planner() outside the closure, captured by move), so it's not re-evaluated per egraph application — appropriate, since ConfigObj is process-wide. Default of false in both ConfigObjImpl::new() and Config::test() matches the JS-side default. CompilationError::Rewrite is the expected error class when the rewrite doesn't fire and no other rule can plan the join, so the new negative test's assertion is the right shape.

· branch ·

Document that joining two views on a dimension that resolves to the same
underlying cube member (grouped by that key) triggers a multi-fact query
in the SQL API, including the join-type semantics (inner/left/right/full)
and the Tesseract requirement. Add the behavior to the multi-fact views
page and a cross-referencing section in the SQL API joins reference.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@cursor cursor Bot requested a review from keydunov as a code owner June 6, 2026 21:01
@claude

claude Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cursor[bot]'s task in 48s —— View job


Claude Code Review

Verdict: LGTM. Latest commit (3b61886) is docs-only — adds docs-mintlify/docs/data-modeling/multi-fact-views.mdx and a "Joining views on a shared dimension" section in joins.mdx. The Rust changes are unchanged since the previous pass. No new correctness blockers.

Findings on this turn: 0 high · 0 medium · 0 low · 0 nits new. Re-affirmed 3 prior unresolved threads (1 low, 2 nits) — still applicable, not duplicating.

Details (expand)
  • Diffed 93e642b..HEAD (docs-only): multi-fact-views.mdx new, joins.mdx cross-link section
  • Verified docs.json nav registers docs/data-modeling/multi-fact-views (line 139)
  • Re-checked the 3 prior unresolved Claude threads against current code — line numbers and concerns unchanged
  • Confirmed authoring location matches docs-mintlify/CLAUDE.md conventions (no body H1, title/description frontmatter, root-relative internal links, language-fenced code blocks, <Warning>/<CodeGroup> used correctly)

Docs review (multi-fact-views.mdx + joins.mdx):

  • Frontmatter, headings, callouts, and CodeGroup tabs follow docs-mintlify/CLAUDE.md exactly.
  • The semantics table in multi-fact-views.mdx:356-361 matches the implementation table in members.rs:3122-3134 (FULL → no filters; INNER → both sides; LEFT → left; RIGHT → right).
  • Gating conditions in multi-fact-views.mdx:339-349 faithfully describe the three implementation gates (Tesseract enabled, dimension-only join key, GROUP BY = join key).
  • Cross-links resolve: ref-tesseract-env/reference/configuration/environment-variables#cubejs_tesseract_sql_planner, ref-multi-fact-views/docs/data-modeling/multi-fact-views. Both targets exist.
  • One very minor nit (not posting): the example at multi-fact-views.mdx:324-332 joins on name while the modeled customers cube doesn't expose name through the customers_view/orders_view/returns_view shape introduced earlier — the example uses different view names than the modeled customer_overview view. Readers can follow the intent, so not worth blocking on; could be clarified with one sentence noting these are separate single-fact views.

Re-affirmed prior threads (still open, no new comment):

  • members.rs:3054 — composite-key cube name only carries the last column
  • members.rs:3165 — egraph/subst mutation runs before the iteration that decides success
  • test_cube_join_views.rs:283 — no RIGHT or FULL join test

· branch ·

…e/view

A CubeScan can expose members from multiple cubes/views, so enforce that
every join-key column on each side resolves to the same cube/view. A
mismatch would make the merged join hint ambiguous, so such joins are no
longer merged.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@claude

claude Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cursor[bot]'s task in 46s —— View job


Claude Code Review

Verdict: LGTM. The new commit 4872fcb addresses the prior composite-key cube-name nit by validating that every join-key column on a side resolves to the same cube/view and rejecting the merge otherwise (members.rs:3053-3068). No new correctness blockers.

Findings on this turn: 0 high · 0 medium · 0 low · 0 nits new. Resolved 1 prior thread (composite-key); 2 prior threads still apply (1 low, 1 nit) — not re-posting.

Details (expand)
  • Diffed 3b61886..HEAD — single Rust change in members.rs (composite-key safety)
  • Verified the fix: pre-existing left_cube_name/right_cube_name are now compared against each column's cube prefix; mismatch sets all_match = false and the merge is rejected. Single-column shared-key path is unchanged.
  • Resolved the now-obsolete composite-key thread
  • Re-checked the 2 remaining prior Claude threads against current line numbers — still applicable

Re-affirmed prior threads (still open, no new comment):

  • members.rs:3180 — egraph/subst mutation runs before the iteration that decides success (cleanliness nit, not a correctness issue; egg dedupes)
  • test_cube_join_views.rs:283 — no test asserts RIGHT (right-side set) or FULL (no filters) join semantics

· branch ·

…l merge; test RIGHT/FULL

Address review nits:
- Construct the join-semantics set filters and mutate subst inside the
  innermost iteration, right before returning true, so a false return
  never leaves a stale subst entry or orphan filter e-nodes.
- Add RIGHT JOIN and FULL JOIN tests to lock in the join-type table
  (right join key set filter; no filter for full).

(The composite-key single-cube-per-side check was already added in a
prior commit.)

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@claude

claude Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cursor[bot]'s task in 50s —— View job


Claude Code Review

Verdict: LGTM. New commit 45db195 resolves both prior open threads: presence-filter construction and subst.insert are now inside the innermost var_iter! loop right before return true (members.rs:3202-3231), so a false return never leaves orphan filter e-nodes or a stale subst entry; and the test file gains explicit RIGHT and FULL coverage that pins the join-type table (test_cube_join_views.rs:288-368). All three prior Claude threads are now resolved or addressed.

Findings on this turn: 0 high · 0 medium · 0 low · 0 nits new.

Details (expand)
  • Diffed 4872fcb..HEAD (single commit 45db195)
  • Verified the members.rs refactor moves the filter-chain construction inside the alias_to_cube × join_hints loop, after out_join_hints is built and immediately before return true. presence_members is computed up front (cheap, no egraph mutation) and consumed only on the success path. The borrow shape works because presence_members is iterated by reference (for name in &presence_members) and cloned per entry.
  • Verified the new RIGHT test asserts filters: Some(vec![set_filter("orders_view.customer_city")]) and FULL asserts filters: None — matches the table in the PR description and members.rs:3122-3134.
  • Resolved prior Claude thread on egraph mutation (PRRT_kwDOCOH19M6Hio-Y) and prior Claude thread on RIGHT/FULL test gap (PRRT_kwDOCOH19M6Hio-x). The composite-key thread had already been resolved with 4872fcb.
  • Skimmed the new tests against the helper shape used by the LEFT/INNER tests — same plan_view_join + find_cube_scan().request assertion, same setup. No drift.

Minor observation, not blocking: the FULL test asserts filters: None, which depends on subst[left_filters_var] resolving to an empty/None CubeScanFilters in the unmerged input — i.e. the test exercises both "skip the filter-chain block" and "the pass-through filters happened to be empty". If a future caller produces a FULL join with pre-existing filters, the assertion shape (None vs Some(vec![]) vs Some(vec![<pre-existing>])) is something to keep in mind, but it's not a current bug.

· [branch](https://github.com/cube-js/cube/tree/cursor/view-join-shared-member-375b)

cursoragent and others added 2 commits June 7, 2026 18:42
…ush-down

Introduce a MultiFactJoinWrapper intermediate egraph node so the shared-member
view-join rewrite is no longer a single aggregate-bound rule. The rewrite now
splits into:
  - shared-member-join-to-wrapper: Join(CubeScan, CubeScan) -> wrapper(CubeScan)
  - shared-member-join-extend-wrapper: Join(wrapper(CubeScan), CubeScan) ->
    wrapper(CubeScan), enabling joins of 3+ views
  - multi-fact-join-wrapper-filter-push-down: Filter(wrapper) ->
    wrapper(Filter), pushing WHERE/ON filters into the merged scan
  - aggregate-multi-fact-join-wrapper: unwrap only when GROUP BY matches the
    recorded join key

The wrapper records the join key (as underlying cube members) so the finalize
rule can verify the GROUP BY, while joins and filters compose beforehand.

Adds tests for 3-way and 4-way FULL joins, a WHERE filter, and an ON-clause
filter, in addition to the existing 2-way LEFT/INNER/RIGHT/FULL coverage.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@claude

claude Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cursor[bot]'s task in 2m 46s —— View job


Claude Code Review

Verdict: LGTM. New commits cee814e (MultiFactJoinWrapper for N-way view joins + filter push-down) and 1aca9d9 (docs) are a clean generalization of the previous aggregate-gated rule into a three-step pipeline: (1) shared-member-join-to-wrapper produces a MultiFactJoinWrapper, (2) shared-member-join-extend-wrapper chains additional view joins onto the wrapper, (3) aggregate-multi-fact-join-wrapper unwraps only when the GROUP BY matches the recorded join key. The wrapper carries cost-1 in BestCubePlan, so when finalize doesn't fire the original join tree is still preferred — no risk of an un-eliminated wrapper escaping into a final plan. The new filter push-down rule handles WHERE cleanly. No correctness blockers.

Findings on this turn: 0 high · 0 medium · 3 low · 1 nit — left inline.

Details (expand)
  • Diffed 45db195..HEAD (cee814e + 1aca9d9)
  • Reviewed the new rules (shared-member-join-to-wrapper, shared-member-join-extend-wrapper, multi-fact-join-wrapper-filter-push-down, aggregate-multi-fact-join-wrapper) and the renamed merge_shared_member_join / new finalize_shared_member_join transforms
  • Verified MultiFactJoinWrapper cost is 1 (cost.rs:130) — same band as JoinCheck* intermediates, so un-eliminated wrappers lose to the original join tree at extraction
  • Cross-checked existing review threads via gh list-review-threads — the 3 prior Claude threads are all resolved against earlier commits; no duplicates posted
  • Spot-checked the new tests (3-way / 4-way FULL, LEFT + WHERE, LEFT + ON-clause filter) against the rewrite logic
  • Skimmed the docs delta — multi-fact-views.mdx adds an N-way example and filter notes; joins.mdx cross-link unchanged

Inline findings:

  • [low] extend-wrapper only matches wrapper-on-left — right-associative c JOIN (o JOIN r) won't merge (members.rs:517-588).
  • [low] finalize_shared_member_join rejects non-Expr::Column GROUP BY — e.g. GROUP BY DATE_TRUNC('day', c.day) over a time-dim key silently won't unwrap; worth a comment or negative test (members.rs:3316-3387).
  • [low] No INNER/LEFT/RIGHT coverage for 3+ view joins — the new N-way tests are FULL-only, so the per-pass presence-filter accumulation through extend-wrapper isn't pinned (test_cube_join_views.rs:443-585).
  • [nit] Duplicate #[allow(clippy::too_many_arguments)] at members.rs:3044-3045.

Other notes (not blocking, not posted inline):

  • Order-of-application in mixed wrapper+filter chains is safe because egg keeps the bare-wrapper form in the same e-class even after multi-fact-join-wrapper-filter-push-down adds the filter-inside-wrapper variant — extend-wrapper can still match the bare form.
  • join_member_names is sorted+deduped before insertion into the wrapper, so the same N-way merge reached through different rule orders produces the same join_members payload — important for e-class dedup.
  • The new dimension_member_name resolver inside finalize_shared_member_join uses find_member_by_column on the merged scan; relies on the e-class analysis being populated by then, which it is in practice.

· branch ·

Comment thread rust/cubesql/cubesql/src/compile/rewrite/rules/members.rs Outdated
Comment thread rust/cubesql/cubesql/src/compile/rewrite/rules/members.rs
Comment thread rust/cubesql/cubesql/src/compile/rewrite/rules/members.rs Outdated
Comment thread rust/cubesql/cubesql/src/compile/test/test_cube_join_views.rs
- Remove duplicate #[allow(clippy::too_many_arguments)] on merge_shared_member_join
- Document the left-deep-only assumption of shared-member-join-extend-wrapper
  (right-associative a JOIN (b JOIN c) is not chained)
- Document that finalize only accepts plain-column GROUP BY (wrapped exprs like
  DATE_TRUNC fall back to standard handling)
- Add a 3-way LEFT join test pinning per-pass presence-filter accumulation
  through the extend-wrapper rule

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@claude

claude Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cursor[bot]'s task in 42s —— View job


Claude Code Review

Verdict: LGTM. New commit 144bb60 addresses all four prior nits from the last pass: the duplicate #[allow(clippy::too_many_arguments)] is removed, shared-member-join-extend-wrapper now documents that only the wrapper-on-left shape is matched (right-associative trees fall back), finalize_shared_member_join documents that wrapped GROUP BY expressions intentionally aren't recognized as the join key, and a new 3-way LEFT-join test pins the per-pass presence-filter accumulation through extend-wrapper.

Findings on this turn: 0 high · 0 medium · 0 low · 0 nits new. All 4 prior open Claude threads resolved.

Details (expand)
  • Diffed 1aca9d9..HEAD (single commit 144bb60, +57/-1 across members.rs and test_cube_join_views.rs)
  • Verified the duplicate attribute is gone at members.rs:3050
  • Verified the new comment at members.rs:516-521 accurately describes the wrapper-on-left limitation
  • Verified the new comment at members.rs:3375-3378 documents the Expr::Column-only GROUP BY gate
  • Verified the new test_group_by_left_join_three_views_presence_filters test asserts both per-pass set filters appear (orders_view.customer_city + customers_view.customer_city), pinning the union-across-passes behavior the extend-wrapper path produces via repeated subst.insert(left_filters_var, ...)
  • Resolved the four prior Claude threads (PRRT_kwDOCOH19M6HrfrK, …HrfsS, …Hrft8, …Hrfvo) — concerns no longer applicable
  • Cross-checked remaining open threads — none from Claude remain unresolved

Nothing further to flag. The PR is in a good state.

· branch ·

A join written directly on DATE_TRUNC (ON DATE_TRUNC(g, a.ts) = DATE_TRUNC(g, b.ts))
is lowered by the planner to Filter(<eq>, CrossJoin(...)) rather than a column
equi-join, so it never reached the shared-member merge. Add a
shared-time-member-cross-join-to-wrapper rule that recognizes this shape,
resolves both truncated columns to the same underlying time member at the same
granularity, and merges into an INNER multi-fact CubeScan (both keys marked
present). Grouping by DATE_TRUNC already worked via referenced-column collapse;
the finalize comment is corrected accordingly.

Adds time dimensions to the test views and tests for join-on-raw-time +
GROUP BY DATE_TRUNC (LEFT) and join-on-DATE_TRUNC + GROUP BY DATE_TRUNC (INNER).

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@claude

claude Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Claude Code is working…

I'll analyze this and get back to you.

View job run

Add a shared customer_state dimension and tests for joining two views on a
composite key (customer_city + customer_state) and grouping by both, plus a
negative test that a partial GROUP BY (only one of the two join keys) does not
merge.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@claude

claude Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cursor[bot]'s task in 3m 39s —— View job


Claude Code Review

Verdict: LGTM. Two new commits since the last pass: 3d11f94 adds a shared-time-member-cross-join-to-wrapper rule that merges joins written directly on DATE_TRUNC(...)=DATE_TRUNC(...) (the SQL planner lowers them to Filter(<eq>, CrossJoin(...))), and e5e24bf extends test coverage to composite-key joins (multiple shared dimensions). No correctness blockers. The new rule mirrors the structure of merge_shared_member_join (granularity match, both columns resolve to the same underlying time member, INNER set-filter on both sides, mutations inside the innermost loop right before return true), and the composite-key tests pin both the positive case (GROUP BY covering the full key) and a partial-GROUP-BY negative.

Findings on this turn: 0 high · 0 medium · 2 low · 0 nits new. All 7 prior Claude threads resolved or no longer applicable.

Details (expand)
  • Diffed 144bb60..HEAD (3d11f94 + e5e24bf): +309/-6 in members.rs, +187 in test_cube_join_views.rs, +29 in multi-fact-views.mdx
  • Read the new merge_shared_time_cross_join and the 4 new tests
  • Verified all 7 prior Claude review threads are resolved (per gh list-review-threads)
  • Cross-checked the new rule against existing patterns and engine/df/optimizers/filter_push_down.rs:114 upstream lowering note
  • Posted findings as inline comments

Inline findings:

  • [low] No guard that JOIN granularity matches GROUP BY granularity (members.rs:3483) — ON DATE_TRUNC('month',...)=DATE_TRUNC('month',...) paired with GROUP BY DATE_TRUNC('day',...) would merge and stitch at day-grain, diverging from the user-written month-join semantics whenever rows fall on different days in the same month. Either reject the merge when grain differs or document the assumption.
  • [low] Test name says INNER but SQL uses LEFT JOIN (test_cube_join_views.rs:741) — test_inner_join_on_date_trunc_group_by_date_trunc asserts both-sides set filters (INNER) but the SQL is LEFT JOIN. The merge works because DataFusion lowers non-equi LEFT JOINs to Filter(CrossJoin) upstream (already noted in filter_push_down.rs:114), which strips LEFT-outer semantics before this rule sees the plan. Pre-existing upstream behavior, not introduced here — but the test reads misleadingly. Suggest using INNER JOIN in the SQL or adding a comment that LEFT is intentional to exercise the upstream-lowering path.

Other notes (not blocking, not posted inline):

  • join_member_names for the time-cross-join case is built from one entry (resolve_underlying(&left_key)) and then sort()/dedup()'d — the sort/dedup pair is a no-op for a one-element Vec. Mirroring merge_shared_member_join's shape so kept consistent, but worth knowing it isn't doing work.
  • dimension_member_name accepts both Member::Dimension and Member::TimeDimension. The rule is named "shared-time-member" and documented for time dimensions, but technically a DATE_TRUNC over a plain dimension that the meta also exposes would match the gate. In practice this is an execution-layer no-op (DATE_TRUNC on a non-temporal column fails at runtime), so it's not a security or correctness concern, just an over-broad gate.
  • The new rule mutates the egraph and inserts into subst only inside the innermost loop just before return true — same shape as the prior pass's resolution. Good.
  • Composite-key positive test (test_left_join_on_multiple_dimensions_group_by_both) asserts presence filters in reverse declaration order (customer_state then customer_city) — that's an artifact of the acc = [filter, acc] prepending loop in merge_shared_member_join. Stable for a given members order, so the assertion is fine.

· branch ·

Comment thread rust/cubesql/cubesql/src/compile/rewrite/rules/members.rs
..Default::default()
}
)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] Test name says INNER but the SQL uses LEFT JOIN.

The test asserts INNER semantics (set_filter("orders_view.created_at") AND set_filter("customers_view.created_at") — both sides marked present), but the SQL uses LEFT JOIN. The merge happens because DataFusion lowers a non-equi LEFT JOIN to Filter(<eq>, CrossJoin) upstream (see the existing comment at engine/df/optimizers/filter_push_down.rs:114 — "Current DataFusion version plans complex joins as Filter(CrossJoin)"), which silently strips the LEFT-outer semantics before this rule sees the plan.

It's a pre-existing upstream behavior, not introduced here, but as written the test reads as "writing LEFT JOIN on DATE_TRUNC returns INNER results" without flagging that. Suggest either using INNER JOIN in the SQL (matching the rule's docstring on members.rs:594-599 "Merge an INNER join expressed as a date-truncated equality"), or adding a comment to the test that the LEFT keyword is intentional to exercise the upstream-lowering path and the loss-of-LEFT is expected.

…sion

The planner turns 'ON a.dim = b.dim AND DATE_TRUNC(g, a.ts) = DATE_TRUNC(g, b.ts)'
into Filter(<trunc eq>, Join(a.dim = b.dim, ...)). The column join becomes a
MultiFactJoinWrapper; add a multi-fact-join-wrapper-absorb-time-key rule that
folds the truncated time member into the wrapper's recorded join key (marking
both time columns present, since a post-join equality is INNER on that key) so
a query grouped by both the time dimension and the dimension merges into one
multi-fact CubeScan.

Adds a test for the mixed DATE_TRUNC + dimension join.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@claude

claude Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cursor[bot]'s task in 2m 6s —— View job


Claude Code Review

Verdict: LGTM. New commit 3614128 adds multi-fact-join-wrapper-absorb-time-key, which folds a DATE_TRUNC(g, a.ts) = DATE_TRUNC(g, b.ts) filter sitting on top of an existing MultiFactJoinWrapper into the wrapper's join key — covering the composite shape ON a.dim = b.dim AND DATE_TRUNC(a.ts) = DATE_TRUNC(b.ts) (the planner makes the column eq the join and keeps the trunc eq as a top filter). The fold checks both granularities match and both columns resolve to the same underlying time member; both time columns are marked present (INNER on the absorbed key, which matches SQL LEFT JOIN + WHERE = INNER semantics on the join key). Test asserts GROUP BY on the time dim + dim, with the absorbed time key driving the time_dimensions entry. No correctness blockers.

Findings on this turn: 0 high · 0 medium · 0 low · 0 nits new. Posted no new inline comments.

Details (expand)
  • Diffed e5e24bf..HEAD — single commit 3614128: +211 in members.rs (one new rule + transform), +55 in test_cube_join_views.rs (one positive test), +12 in multi-fact-views.mdx
  • Read the new absorb_time_key_into_wrapper transform — same shape as merge_shared_time_cross_join (granularity equality, underlying-member equality, INNER-style both-sides set filter, mutations only inside the success path right before return true)
  • Verified join_members is unioned with prev_join_members then sorted/deduped, so the finalize GROUP BY check sees both the column key and the time key as required
  • Spot-checked the test: GROUP BY on DATE_TRUNC('day', c.created_at) + c.customer_city resolves via referenced_expr to the underlying columns/members, and the merged scan emits created_at as time_dimensions[day] + customer_city as dimensions
  • Cross-checked review threads — 7 prior Claude threads resolved; 2 prior open Claude threads still applicable (see below), not re-posting
  • Confirmed the noted-as-resolved comment at members.rs:3882-3888 clarifies why Expr::Column-only check on GROUP BY isn't actually restrictive (referenced_expr collapses DATE_TRUNC(g, c.ts) down to its inner column reference)

Re-affirmed prior open threads (still apply, no new comment):

  • members.rs:3556 — no guard that JOIN granularity matches GROUP BY granularity. The same concern applies to the new absorb-time-key rule: if the absorbed DATE_TRUNC(month, ...) pairs with GROUP BY DATE_TRUNC(day, ...), the merge happens at day-grain. Worth covering both rules with one fix.
  • test_cube_join_views.rs:741test_inner_join_on_date_trunc_group_by_date_trunc uses LEFT JOIN but asserts INNER semantics (relies on upstream LEFT→Filter(CrossJoin) lowering).

Other notes (not blocking, not posted inline):

  • The new test's SQL uses JOIN (INNER) — good, no naming ambiguity here.
  • presence_members = [left_key, right_key] always inserts two filter members. If left_key == right_key (degenerate case of both sides resolving to the same member-name string — only possible if both columns truly refer to the same cube member on the same side, which the gate already rejects), this would silently produce a duplicate filter entry. Not reachable today; same pattern as the sibling rules.
  • The rule has no guard that the wrapper's prev_join_members is non-empty. By construction (the wrapper can only have been produced by one of the three creation rules, all of which insert at least one key), it always is — but worth knowing the assumption isn't asserted.

· branch ·

…P BY

Per review: the multi-fact stitch happens at the GROUP BY grain, so the join
key's granularity must equal the GROUP BY granularity for a time member.
Previously only the underlying member name was recorded, so joining on
DATE_TRUNC('month', ...) while grouping by DATE_TRUNC('day', ...) merged and
stitched at day grain, diverging from the written join.

Store join_members as (underlying member, Option<granularity>) on the wrapper
(None for plain dimensions, Some(grain) for DATE_TRUNC time keys), and at
finalize extract each GROUP BY expression's granularity from its original_expr
(rather than referenced_expr, which drops the grain) and require the full
(member, granularity) sets to match.

Replaces the raw-time-join test with a granularity-mismatch negative test.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@claude

claude Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cursor[bot]'s task in 2m 15s —— View job


Claude Code Review

Verdict: Request changes — 1 medium. Latest commit 379a2a1 correctly tightens the JOIN/GROUP BY granularity gate (resolved the prior open thread on this), but in doing so it silently breaks the documented "raw time column join + GROUP BY DATE_TRUNC" shape. The test that pinned that behavior was deleted in the same commit and replaced with a granularity-mismatch negative — so no test catches the regression — while multi-fact-views.mdx:374-381 and the PR description still claim it works. Either restore the case in the rule (treat a (name, None) join key against a (name, Some(grain)) GROUP BY as compatible when the member is a time dim) or update the docs/PR description to reflect the tightened gate. Posted inline at members.rs:3378.

Findings on this turn: 0 high · 1 medium · 0 low · 0 nits. Resolved 1 prior thread (Hr11I). 1 prior open Claude thread still applies (Hr18_test_inner_join_on_date_trunc_group_by_date_trunc SQL still uses LEFT JOIN).

Details (expand)
  • Diffed 3614128..HEAD (single commit 379a2a1 — granularity gate)
  • Verified the rule change: join_members is now Vec<(String, Option<String>)>; plain dim joins record (name, None), time-key joins record (name, Some(grain)). Finalize extracts (col, None) from Expr::Column or (col, Some(grain)) from Expr::ScalarFunction { fun: DateTrunc, .. } and compares exact set equality.
  • Confirmed the regression by code-tracing: LEFT JOIN o ON o.created_at = c.created_at records (created_at, None); GROUP BY DATE_TRUNC('day', c.created_at) extracts (created_at, Some("day")); sets differ → no merge.
  • Confirmed no test pins the doc'd raw-time-column + DATE_TRUNC-GROUP-BY shape anymore (the previous one was renamed/replaced into the negative).
  • Confirmed the doc still claims this shape works (multi-fact-views.mdx:374-381).
  • Resolved the prior open thread on granularity guard (PRRT_kwDOCOH19M6Hr11I).
  • Re-checked the remaining prior open thread (Hr18_, test name says INNER but SQL uses LEFT) against current line numbers — still applies at test_cube_join_views.rs:692-734; not re-posting.

Other notes (not blocking):

  • The new finalize code uses original_expr (which preserves the wrapped expression) rather than referenced_expr (which collapses it) — good fix, the reason it can recognize DATE_TRUNC shape at all.
  • Vec::sort() on Vec<(String, Option<String>)> uses tuple ordering — (a, None) sorts before (a, Some(_)), so dedup still works correctly even when the same member appears with both grains.
  • parse_granularity(scalar, false) returns None for unrecognized literals (e.g. 'qtr'), so the rule conservatively bails instead of silently matching.

· branch ·

// already recorded on the left wrapper (for chained 3+ view joins).
let mut join_member_names: Vec<(String, Option<String>)> = shared_left_keys
.iter()
.map(|k| (resolve_underlying(k), None))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[medium] Granularity-strict matching silently breaks the documented "raw time column + GROUP BY DATE_TRUNC" shape.

merge_shared_member_join records every shared key as (name, None) (line 3378). finalize_shared_member_join now requires the GROUP BY pair to equal the join key exactly — a DATE_TRUNC('day', c.created_at) group-by extracts (created_at, Some("day")), which does not match (created_at, None). So a query like

LEFT JOIN o ON o.created_at = c.created_at
GROUP BY DATE_TRUNC('day', c.created_at)

no longer merges. That's exactly the first shape documented as supported in docs-mintlify/docs/data-modeling/multi-fact-views.mdx:374-381 ("Join on the raw time column, group by DATE_TRUNC") and was previously pinned by test_left_join_raw_time_group_by_date_trunc — which was deleted in 379a2a1 and replaced with a granularity-mismatch negative, so no test catches the regression.

If the strict (member, grain) match is the intended design (the GROUP BY grain must match the join grain — and a raw column join is "no grain" so it never matches a truncated GROUP BY), then the docs example needs to be removed/rewritten and the PR description's "Both shapes of the common time multi-fact pattern are supported" claim updated. Otherwise the resolution rule should treat a (name, None) join key on a time member as compatible with a (name, Some(grain)) GROUP BY (the user joined at the row grain and is free to group by any truncation), restoring the deleted test.

Either way worth aligning the code with the docs before this merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants