[Spark] support identity creation via sparksql #7062
Open
mwc360 wants to merge 4 commits into
Contributor
@newfront @timothyw553 could you trigger CI and help find a reviewer, please? This solves an important issue, part of the roadmap: Linux Foundation Delta Lake Roadmap (view)
Collaborator
hi @mwc360 the CI is failing, could you take a look?
Contributor
Author
@timothyw553 - scalastyle issue is fixed, can you retrigger CI?
Contributor
Author
@timothyw553 - sorry, missed something. Just tested that it compiles successfully. pls trigger CI again.
Collaborator
triggered
Contributor
Author
@timothyw553 - fyi, everything is passing
Resolves #7061
Which Delta project/connector is this regarding?
Description
Adds support for the Spark 4.0 SQL DDL identity-column syntax (GENERATED ALWAYS | BY DEFAULT AS IDENTITY) in Delta.
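As an illustration only (this is not the PR's own example), a statement using the identity-column clause might look like the sketch below. The table name `demo_events`, the column names, and the start/step values are hypothetical; the block only builds and prints the DDL string, and running it against Delta assumes a SparkSession configured with the Delta extensions.

```scala
object IdentityDdlExample {
  // Hypothetical table and column names, chosen only for illustration.
  val createStmt: String =
    """CREATE TABLE demo_events (
      |  id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
      |  payload STRING
      |) USING delta""".stripMargin

  def main(args: Array[String]): Unit = {
    // With a SparkSession configured for Delta, this would be run as:
    //   spark.sql(createStmt)
    println(createStmt)
  }
}
```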
Previously the syntax parsed, but Spark refused to dispatch to the catalog because the capability was not advertised. Even if it had dispatched, Spark's V2 Column.identityColumnSpec() was dropped by the default CatalogV2Util.v2ColumnsToStructType conversion before reaching Delta's create path, so the resulting tables had no identity metadata.
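The conversion problem described above can be sketched with a small standalone model. Everything here is a simplified stand-in: `IdentitySpec`, `V2Column`, `Field`, and both converter functions are hypothetical and are not the Spark or Delta classes. The `delta.identity.*` key names are assumed to match the identity-column metadata keys in the Delta protocol, and values are stored as strings purely to keep the model small.

```scala
// Hypothetical model types standing in for Spark's V2 Column,
// IdentityColumnSpec, and StructField. Not the real classes.
final case class IdentitySpec(start: Long, step: Long, allowExplicitInsert: Boolean)
final case class V2Column(name: String, dataType: String, identity: Option[IdentitySpec])
final case class Field(name: String, dataType: String, metadata: Map[String, String] = Map.empty)

object IdentityConversionSketch {
  // Copies only name and type, so any identity spec on the column is lost.
  def naiveToFields(cols: Seq[V2Column]): Seq[Field] =
    cols.map(c => Field(c.name, c.dataType))

  // Identity-aware variant: copies the spec into per-field metadata
  // (key names are an assumption based on the Delta protocol).
  def identityAwareToFields(cols: Seq[V2Column]): Seq[Field] =
    cols.map { c =>
      val meta = c.identity match {
        case Some(spec) => Map(
          "delta.identity.start" -> spec.start.toString,
          "delta.identity.step" -> spec.step.toString,
          "delta.identity.allowExplicitInsert" -> spec.allowExplicitInsert.toString)
        case None => Map.empty[String, String]
      }
      Field(c.name, c.dataType, meta)
    }

  def main(args: Array[String]): Unit = {
    val cols = Seq(
      V2Column("id", "bigint", Some(IdentitySpec(1L, 1L, allowExplicitInsert = false))),
      V2Column("name", "string", None))
    println(naiveToFields(cols))         // identity info is gone
    println(identityAwareToFields(cols)) // identity info carried in metadata
  }
}
```

In this model the only difference between the two paths is whether the optional spec is consulted during conversion, which is why a table created through the naive path ends up looking like an ordinary table.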
How it works
Spark 4.0 stores identity info on the V2 Column object (Column.identityColumnSpec()), not in StructField metadata. The default fallback in StagingTableCatalog converts Column[] → StructType via CatalogV2Util.v2ColumnsToStructType, which drops IdentityColumnSpec. Overriding the Column[] overloads in AbstractDeltaCatalog and using a Delta-aware converter preserves the identity info in Delta's metadata keys before it reaches createDeltaTable. From there, the existing identity codepath (ColumnWithDefaultExprUtils.isIdentityColumn, IdentityColumnsTableFeature auto-enable, write-time generation, admission checks) takes over unchanged.
How was this patch tested?
A new test suite plus regression testing of existing identity suites.
Does this PR introduce any user-facing changes?
No changes to existing behavior; this is new functionality.