
[Spark] support identity creation via sparksql #7062

Open
mwc360 wants to merge 4 commits into delta-io:master from mwc360:mcole_identity_sql


mwc360 (Contributor) commented Jun 19, 2026

Resolves #7061

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Adds support for the Spark 4.0 SQL DDL identity-column syntax in Delta, e.g.:

```sql
CREATE TABLE t (
   id1 BIGINT GENERATED ALWAYS AS IDENTITY,
   id2 BIGINT GENERATED ALWAYS AS IDENTITY (START WITH -1 INCREMENT BY 1),
   id3 BIGINT GENERATED BY DEFAULT AS IDENTITY,
   id4 BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH -1 INCREMENT BY 1)
)
```

Previously, the syntax parsed, but Spark refused to dispatch the statement to the catalog because the required capability was not advertised. Even if it had dispatched, Spark's V2 Column.identityColumnSpec() was dropped by the default CatalogV2Util.v2ColumnsToStructType conversion before reaching Delta's create path, so the resulting tables had no identity metadata.

How it works

Spark 4.0 stores identity information on the V2 Column object (Column.identityColumnSpec()), not in StructField metadata. The default fallback in StagingTableCatalog converts Column[] → StructType via CatalogV2Util.v2ColumnsToStructType, which drops the IdentityColumnSpec. By overriding the Column[] overloads in AbstractDeltaCatalog and using a Delta-aware converter, the identity information is preserved in Delta's metadata keys before it reaches createDeltaTable. From there, the existing identity code path (ColumnWithDefaultExprUtils.isIdentityColumn, IdentityColumnsTableFeature auto-enablement, write-time value generation, admission checks) takes over unchanged.
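To make the data flow concrete, here is a minimal, language-neutral model in Python (not Delta's Scala code and not the PR's implementation). IdentitySpec, V2Column, Field, default_columns_to_struct and delta_aware_columns_to_struct are hypothetical names. Only the metadata key names (delta.identity.start, delta.identity.step, delta.identity.allowExplicitInsert) come from the Delta protocol; how the PR's converter actually populates them is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# HYPOTHETICAL stand-ins for Spark's V2 Column / IdentityColumnSpec and for
# StructField. They are not the real Spark or Delta classes; they only model
# where identity info lives and whether it survives the conversion.


@dataclass(frozen=True)
class IdentitySpec:
    start: int = 1                       # START WITH (defaults to 1)
    step: int = 1                        # INCREMENT BY (defaults to 1)
    allow_explicit_insert: bool = False  # True for GENERATED BY DEFAULT


@dataclass(frozen=True)
class V2Column:
    name: str
    data_type: str
    identity: Optional[IdentitySpec] = None  # identity lives on the column object


@dataclass(frozen=True)
class Field:
    name: str
    data_type: str
    metadata: Dict[str, object] = field(default_factory=dict)


def default_columns_to_struct(columns: List[V2Column]) -> List[Field]:
    """Models the lossy fallback: the identity spec is never copied."""
    return [Field(c.name, c.data_type) for c in columns]


def delta_aware_columns_to_struct(columns: List[V2Column]) -> List[Field]:
    """Models a Delta-aware converter that copies identity info into metadata.

    Key names follow the Delta protocol's identity-column metadata keys;
    this sketch does not set a high-water mark.
    """
    fields = []
    for c in columns:
        meta: Dict[str, object] = {}
        if c.identity is not None:
            meta["delta.identity.start"] = c.identity.start
            meta["delta.identity.step"] = c.identity.step
            meta["delta.identity.allowExplicitInsert"] = c.identity.allow_explicit_insert
        fields.append(Field(c.name, c.data_type, meta))
    return fields


# The four columns from the CREATE TABLE example in the description.
cols = [
    V2Column("id1", "BIGINT", IdentitySpec()),
    V2Column("id2", "BIGINT", IdentitySpec(start=-1, step=1)),
    V2Column("id3", "BIGINT", IdentitySpec(allow_explicit_insert=True)),
    V2Column("id4", "BIGINT", IdentitySpec(start=-1, step=1, allow_explicit_insert=True)),
]

for f in delta_aware_columns_to_struct(cols):
    print(f.name, f.metadata)
```

Swapping in default_columns_to_struct leaves every metadata map empty, which corresponds to the "no identity metadata" symptom: nothing downstream can recognize the columns as identity columns.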

How was this patch tested?

A new test suite plus regression testing of existing identity suites.

Does this PR introduce any user-facing changes?

No changes to existing behavior; this adds new functionality.

felipepessoto (Contributor)

@newfront @timothyw553, could you trigger CI and help find a reviewer, please? This solves an important issue that is part of the roadmap: Linux Foundation Delta Lake Roadmap.

timothyw553 (Collaborator)

Hi @mwc360, the CI is failing. Could you take a look?

mwc360 (Contributor, Author) commented Jun 22, 2026

@timothyw553, the scalastyle issue is fixed. Can you retrigger CI?

mwc360 (Contributor, Author) commented Jun 22, 2026

@timothyw553, sorry, I missed something. I just confirmed that it compiles successfully. Please trigger CI again.

timothyw553 (Collaborator)

Triggered.

mwc360 (Contributor, Author) commented Jun 22, 2026

@timothyw553, FYI, everything is passing.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request][Spark] SQL support for identity columns in table DDL

3 participants