[Spark] Support CHANGES (Auto-CDF) reads in AUTO v2 enable mode#7031
Draft
gengliangwang wants to merge 2 commits into
A `SELECT ... CHANGES FROM VERSION ... TO VERSION ...` query routes through `TableCatalog.loadChangelog` -> `ChangelogSupport.loadChangelog`. That path resolved the table via `loadTable`, which in AUTO mode returns the V1 connector (`DeltaTableV2`). Auto-CDF is only implemented by the V2 connector, so the read failed with `DELTA_CHANGELOG_REQUIRES_V2_TABLE` and only worked when the session forced STRICT mode.

This adds `DeltaV2Mode.shouldRouteChangelogToV2()` (true for AUTO and STRICT) and updates `ChangelogSupport.loadChangelog` to re-resolve a V1 `DeltaTableV2` as a `DeltaV2Table` for the CHANGES read when the mode permits. AUTO keeps general batch reads/writes on the V1 connector and only opts into V2 for the V2-supported CHANGES operation; NONE still rejects with the existing error.

Tests: added `testAutoModeRoutesChangesToV2` (CHANGES succeeds under AUTO without STRICT) and `testNoneModeRejectsChanges` (NONE still rejected) to `DeltaChangelogCatalogIntegrationTest`.

Co-authored-by: Isaac
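The routing decision above boils down to a small pure function of the mode. The following is a minimal, self-contained sketch assuming simplified stand-ins: `ChangelogRoutingSketch`, `V2Mode`, `V1Table`, `V2Table`, `loadTable`, and `resolveForChanges` are hypothetical names, not the Delta classes. Only the predicate names mirror the PR, and here they are modeled as plain functions of a mode rather than methods on `DeltaV2Mode`.

```scala
object ChangelogRoutingSketch {
  // Hypothetical stand-in for the three DeltaV2Mode settings.
  sealed trait V2Mode
  object V2Mode {
    case object NONE extends V2Mode
    case object AUTO extends V2Mode
    case object STRICT extends V2Mode
  }

  // Only STRICT makes loadTable hand back the V2 connector.
  def shouldCatalogReturnV2Tables(mode: V2Mode): Boolean = mode == V2Mode.STRICT

  // AUTO and STRICT both allow a CHANGES read to use the V2 connector.
  def shouldRouteChangelogToV2(mode: V2Mode): Boolean = mode != V2Mode.NONE

  // Hypothetical stand-ins for the two connectors' table types.
  sealed trait LoadedTable
  final case class V1Table(name: String) extends LoadedTable // plays DeltaTableV2
  final case class V2Table(name: String) extends LoadedTable // plays DeltaV2Table

  // What the general loadTable path returns for a given mode.
  def loadTable(name: String, mode: V2Mode): LoadedTable =
    if (shouldCatalogReturnV2Tables(mode)) V2Table(name) else V1Table(name)

  // Resolution for a CHANGES read: keep a V2 table, re-resolve a V1 table as V2
  // when the mode permits, otherwise fail with the existing error class name.
  def resolveForChanges(name: String, mode: V2Mode): Either[String, V2Table] =
    loadTable(name, mode) match {
      case t: V2Table => Right(t)
      case V1Table(n) if shouldRouteChangelogToV2(mode) => Right(V2Table(n))
      case _: V1Table => Left("DELTA_CHANGELOG_REQUIRES_V2_TABLE")
    }
}
```

Under `AUTO`, `loadTable` still yields a `V1Table`, so ordinary reads stay on V1; only the CHANGES path upgrades to a `V2Table`. Under `NONE`, the same path returns the error, which is the behavior `testNoneModeRejectsChanges` checks.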
…logTable

Extract the V1 `DeltaTableV2` -> V2 `DeltaV2Table` re-resolution out of the `ChangelogSupport` trait and into `DeltaCatalog`, where all V1/V2 connector construction already lives. The trait now delegates via the abstract `asV2ChangelogTable` method instead of reaching into `DeltaTableV2` internals (`catalogTable`/`path`), keeping the changelog trait thin and agnostic of how connectors are constructed.

The `DeltaCatalog` method is intentionally not marked `override`: it satisfies the abstract declaration only in the Spark 4.2 `ChangelogSupport` shim; the 4.0/4.1 shims use an empty trait, where it is an unused helper.

Co-authored-by: Isaac
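Why omitting `override` works across shims can be shown with a small sketch. It assumes a hypothetical layout: in the real build only one shim is compiled per Spark version, but here both are placed side by side in one file, and `ShimOverrideSketch`, `Spark42Shim`, `Spark40Shim`, `Catalog42`, `Catalog40`, `reResolveAsV2`, and the `String`-based signatures are all illustrative. Only `ChangelogSupport`, `loadChangelog`, and `asV2ChangelogTable` come from the commit message.

```scala
object ShimOverrideSketch {
  // Spark 4.2 shim: declares the abstract hook that loadChangelog delegates to.
  object Spark42Shim {
    trait ChangelogSupport {
      def asV2ChangelogTable(tableName: String): Option[String]

      def loadChangelog(tableName: String): Either[String, String] =
        asV2ChangelogTable(tableName).toRight("DELTA_CHANGELOG_REQUIRES_V2_TABLE")
    }
  }

  // Spark 4.0/4.1 shim: an empty trait, so there is nothing to implement.
  object Spark40Shim {
    trait ChangelogSupport
  }

  // Hypothetical re-resolution logic shared by both catalog variants.
  private def reResolveAsV2(tableName: String, routeToV2: Boolean): Option[String] =
    if (routeToV2) Some(s"DeltaV2Table($tableName)") else None

  class Catalog42(routeToV2: Boolean) extends Spark42Shim.ChangelogSupport {
    // No `override`: Scala lets a concrete member implement an abstract one without it.
    def asV2ChangelogTable(tableName: String): Option[String] =
      reResolveAsV2(tableName, routeToV2)
  }

  class Catalog40(routeToV2: Boolean) extends Spark40Shim.ChangelogSupport {
    // Same definition; nothing is overridden here, so adding `override`
    // would be a compile error ("overrides nothing").
    def asV2ChangelogTable(tableName: String): Option[String] =
      reResolveAsV2(tableName, routeToV2)
  }
}
```

Because the definition carries no `override`, the identical source line compiles against both the abstract 4.2 declaration and the empty 4.0/4.1 trait.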
Description
A `SELECT ... CHANGES FROM VERSION ... TO VERSION ...` query routes through `TableCatalog.loadChangelog` → `ChangelogSupport.loadChangelog` (spark-4.2 shim). That path resolves the table via `loadTable`, which in `AUTO` mode returns the V1 connector (`DeltaTableV2`), since `DeltaV2Mode.shouldCatalogReturnV2Tables()` is only `true` for `STRICT`. Auto-CDF is implemented only by the V2 connector, so the read failed with `DELTA_CHANGELOG_REQUIRES_V2_TABLE`. As a result, CHANGES queries only worked when the session forced `STRICT` mode.

Changes
- `DeltaV2Mode.shouldRouteChangelogToV2()`: new predicate, `true` for `AUTO` and `STRICT`, `false` for `NONE`. Intentionally distinct from `shouldCatalogReturnV2Tables()`: `AUTO` keeps general batch reads/writes on the V1 connector (full feature support) and only opts into V2 for V2-supported operations such as CHANGES.
- `ChangelogSupport.loadChangelog` (spark-4.2): when `loadTable` returns a V1 `DeltaTableV2` but the mode routes CHANGES to V2, re-resolve the same table as a `DeltaV2Table` (via the catalog table or path). `NONE` mode still throws the existing error.
- `delta-error-classes.json` / `DeltaErrors.scala`: updated the error message and docs to reflect that `AUTO` (now) or `STRICT` enables V2 CHANGES reads.

Testing
Added to `DeltaChangelogCatalogIntegrationTest`:
- `testAutoModeRoutesChangesToV2`: `CHANGES FROM VERSION 1 TO VERSION 3` succeeds under `AUTO` without forcing `STRICT`.
- `testNoneModeRejectsChanges`: `NONE` is still rejected with `DELTA_CHANGELOG_REQUIRES_V2_TABLE`.

Verified with `-DsparkVersion=4.2` (the Auto-CDF code only exists in the spark-4.2 shim, per SPARK-56685): Java, Scala, scalastyle, and Checkstyle are clean, and the full `DeltaChangelogCatalogIntegrationTest` suite passes 19/19 (including the pre-existing STRICT-mode tests).

This pull request and its description were written by Isaac.