[spark] Support catalog-qualified CREATE TABLE LIKE by kerwin-zk · Pull Request #7924 · apache/paimon

kerwin-zk · 2026-05-21T08:37:50Z

Purpose

This allows CREATE TABLE LIKE to resolve source and target tables through Paimon catalogs when either side uses a catalog-qualified identifier.

Examples:

-- target and source are both catalog-qualified
CREATE TABLE paimon.default.target_tbl LIKE paimon.default.source_tbl;

-- only the target table is catalog-qualified
CREATE TABLE paimon.default.target_tbl LIKE source_tbl;

-- only the source table is catalog-qualified
CREATE TABLE target_tbl LIKE paimon.default.source_tbl;

Tests

CI

Copilot

Pull request overview

Adds Spark SQL parsing and test coverage to support CREATE TABLE LIKE when either the source or target table uses a catalog-qualified identifier (e.g. paimon.db.tbl), ensuring the command is correctly rewritten to Paimon’s CreateTableLike handling on Spark ≥ 3.4.

Changes:

Extend the Paimon SQL extensions grammar + AST builder to parse CREATE TABLE LIKE and remap catalog-qualified identifiers into Spark’s CreateTableLikeCommand.
Update the extensions parser to detect catalog-qualified CREATE TABLE LIKE statements and run them through the extensions pipeline (so rewrite rules apply).
Add a new UT suite covering catalog-qualified target/source, IF NOT EXISTS, clause passthrough (USING, TBLPROPERTIES), and unsupported Hive storage syntax.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.


| File | Description |
| --- | --- |
| paimon-spark/paimon-spark-ut/src/test/scala/org/apache/paimon/spark/sql/CatalogQualifiedCreateTableLikeTest.scala | Adds regression/behavior tests for catalog-qualified `CREATE TABLE LIKE` scenarios on Spark ≥ 3.4. |
| paimon-spark/paimon-spark-common/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/PaimonSqlExtensionsAstBuilder.scala | Builds Spark’s `CreateTableLikeCommand` from a placeholder parse, then patches in catalog-qualified `TableIdentifier`s. |
| paimon-spark/paimon-spark-common/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/AbstractPaimonSparkSqlExtensionsParser.scala | Detects catalog-qualified `CREATE TABLE LIKE` and routes it through the extensions parser + rewrite rules. |
| paimon-spark/paimon-spark-common/src/main/antlr4/org.apache.spark.sql.catalyst.parser.extensions/PaimonSqlExtensions.g4 | Extends the extensions grammar to recognize `CREATE TABLE LIKE` with relevant clauses. |
| paimon-spark/paimon-spark-4.0/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/AbstractPaimonSparkSqlExtensionsParser.scala | Keeps Spark 4.0’s parser wrapper behavior consistent with the common module changes. |


JingsongLi

Review

The approach of intercepting catalog-qualified CREATE TABLE LIKE in the extensions parser and delegating to Spark's parser for clause handling (via synthetic SQL with dummy names) is clever and avoids duplicating Spark's complex clause resolution logic.

Issues

1. Double parsing on every SQL statement that isn't a Paimon command.

isCatalogCreateTableLike tokenizes and then fully parses any non-Paimon-command SQL that contains CREATE TABLE, LIKE, and `.` tokens. This is a hot path: every DML/query statement goes through parsePlan. The maybeCreateTableLike heuristic helps, but it still triggers for common patterns like:

CREATE TABLE foo AS SELECT * FROM catalog.db.table WHERE col LIKE '%x%'

This matches all of the heuristic's conditions: it starts with CREATE TABLE, contains a LIKE token, and contains a `.` token. isParsedCatalogCreateTableLike then runs a full parse, which fails with PaimonParseException (caught, returning false). The failure path is cheap, but the token scan plus exception-based control flow on every CTAS with a qualified source and a LIKE predicate is unnecessary overhead.

Consider tightening maybeCreateTableLike: e.g., check that LIKE appears after the second identifier (not inside a WHERE clause). Or check that the token immediately before LIKE is an identifier/dot (not a string literal).
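
One way the tightened check could look is sketched below. This is not the PR's code: the regex tokenizer, the object name CreateTableLikeHeuristic, and the method maybeCatalogCreateTableLike are all hypothetical stand-ins (the real parser would work on ANTLR tokens). The sketch only accepts a statement when LIKE directly follows the target identifier, so a LIKE inside a WHERE clause never reaches the full parse. It also assumes that three or more parts mean catalog-qualified.

```scala
object CreateTableLikeHeuristic {

  // Naive tokenizer, for illustration only: quoted strings and backquoted
  // identifiers become single tokens, and each dot is its own token.
  private val Token =
    """'(?:[^'\\]|\\.)*'|`[^`]*`|\.|[A-Za-z_][A-Za-z0-9_]*|\S""".r

  def tokenize(sql: String): Vector[String] = Token.findAllIn(sql).toVector

  private def isIdentifier(t: String): Boolean =
    t.startsWith("`") || t.matches("[A-Za-z_][A-Za-z0-9_]*")

  // Consumes `part (. part)*` starting at `start`.
  // Returns (index of the token after the identifier, number of parts).
  private def skipMultipart(t: Vector[String], start: Int): Option[(Int, Int)] =
    if (start >= t.length || !isIdentifier(t(start))) None
    else {
      var next = start + 1
      var parts = 1
      while (next + 1 < t.length && t(next) == "." && isIdentifier(t(next + 1))) {
        next += 2
        parts += 1
      }
      Some((next, parts))
    }

  // True only for CREATE TABLE [IF NOT EXISTS] <target> LIKE <source> ...
  // where at least one side has 3+ parts (assumed to mean catalog-qualified).
  def maybeCatalogCreateTableLike(sql: String): Boolean = {
    val t = tokenize(sql)
    def kw(i: Int, word: String): Boolean =
      i < t.length && t(i).equalsIgnoreCase(word)

    if (!(kw(0, "CREATE") && kw(1, "TABLE"))) false
    else {
      val targetStart =
        if (kw(2, "IF") && kw(3, "NOT") && kw(4, "EXISTS")) 5 else 2
      skipMultipart(t, targetStart) match {
        case Some((afterTarget, targetParts)) if kw(afterTarget, "LIKE") =>
          skipMultipart(t, afterTarget + 1) match {
            case Some((_, sourceParts)) => targetParts >= 3 || sourceParts >= 3
            case None => false
          }
        case _ => false
      }
    }
  }
}
```

With this shape, the CTAS example above is rejected by the cheap scan because the token after the target identifier is AS rather than LIKE, so no exception is thrown and caught on that path.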

2. isCatalogIdentifier assumes exactly 3 parts = catalog-qualified.

private def isCatalogIdentifier(identifier: MultipartIdentifierContext): Boolean = {
  identifier.parts.size() == 3
}

This breaks for tables in nested namespaces (e.g., catalog.ns1.ns2.table = 4 parts) and for tables in the default namespace where only 2 parts (catalog.table) might be used. More importantly, toTableIdentifier handles arbitrary lengths:

case parts =>
  TableIdentifier(parts.last, Some(parts.slice(1, parts.length - 1).mkString(".")), Some(parts.head))

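So isCatalogIdentifier should be parts.size() >= 3 to also catch the nested-namespace cases. A 2-part name is still ambiguous between catalog.table and db.table and would need separate handling.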
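
A minimal sketch of the relaxed check, paired with the arbitrary-length mapping quoted above. QualifiedTable is a hypothetical stand-in for Spark's TableIdentifier so that the snippet compiles without Spark on the classpath, and the 1-part and 2-part cases are assumptions, since the review only quotes the final case.

```scala
object MultipartNames {

  // Hypothetical stand-in for Spark's TableIdentifier(table, database, catalog).
  final case class QualifiedTable(
      table: String,
      database: Option[String],
      catalog: Option[String])

  // Relaxed check: 3 or more parts are treated as catalog-qualified,
  // which also covers nested namespaces such as catalog.ns1.ns2.table.
  def isCatalogQualified(parts: Seq[String]): Boolean = parts.size >= 3

  def toQualifiedTable(parts: Seq[String]): QualifiedTable = parts match {
    case Seq() =>
      throw new IllegalArgumentException("identifier has no parts")
    case Seq(table) =>
      QualifiedTable(table, None, None)
    case Seq(db, table) =>
      // Assumption: a 2-part name is read as db.table, not catalog.table.
      QualifiedTable(table, Some(db), None)
    case _ =>
      // Same shape as the quoted case: head is the catalog, last is the
      // table, and everything in between is joined into the namespace.
      QualifiedTable(
        parts.last,
        Some(parts.slice(1, parts.length - 1).mkString(".")),
        Some(parts.head))
  }
}
```

For example, Seq("paimon", "ns1", "ns2", "t") maps to table t, namespace ns1.ns2, and catalog paimon, which the == 3 check would never have let through.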

3. The grammar adds many tokens/rules that are unused except for passthrough.

The PR adds rowFormat, createFileFormat, storageHandler, fileFormat, locationSpec, propertyList, property, propertyKey, propertyValue, stringLit, plus 20+ new keywords. These are only needed so the ANTLR parser can successfully parse the full CREATE TABLE LIKE statement — but the visitor never visits them (only createTableLikeClausesText extracts them as raw text).

This works but bloats the grammar significantly. An alternative: use a greedy catch-all rule for the trailing clauses (everything after source=multipartIdentifier), since you just extract it as text anyway.

4. sparkCreateTableLikeCommand uses delegate parser with synthetic identifiers — fragile.

s"CREATE TABLE$ifNotExists __paimon_create_like_target LIKE __paimon_create_like_source${createTableLikeClausesText(ctx)}"

If the clauses contain __paimon_create_like_target or __paimon_create_like_source as string values (e.g., in TBLPROPERTIES), this could theoretically break. More practically, if Spark's parser adds new clauses or changes syntax in future versions, the reconstructed SQL may not parse correctly. The version guard (< "3.4") helps, but each new Spark version may need testing.
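
If the theoretical collision is a concern, a cheap guard could run before the synthetic statement is handed to the delegate parser. The sketch below is an assumption-laden illustration, not the PR's implementation: buildSyntheticSql and its Either return type are hypothetical, and whether a collision can actually cause harm depends on how the identifiers are patched afterwards, which is not shown here.

```scala
import java.util.Locale

object SyntheticCreateTableLike {

  // Placeholder names taken from the snippet quoted above.
  val TargetPlaceholder = "__paimon_create_like_target"
  val SourcePlaceholder = "__paimon_create_like_source"

  // Rebuilds the statement with placeholder table names.
  // Returns Left(reason) instead of a statement when the raw clause text
  // already mentions a placeholder, so the caller can fall back or fail.
  def buildSyntheticSql(
      ifNotExists: Boolean,
      clausesText: String): Either[String, String] = {
    val lowered = clausesText.toLowerCase(Locale.ROOT)
    if (lowered.contains(TargetPlaceholder) || lowered.contains(SourcePlaceholder)) {
      Left("clauses mention a reserved placeholder name")
    } else {
      val exists = if (ifNotExists) " IF NOT EXISTS" else ""
      val trimmed = clausesText.trim
      val clauses = if (trimmed.isEmpty) "" else " " + trimmed
      Right(s"CREATE TABLE$exists $TargetPlaceholder LIKE $SourcePlaceholder$clauses")
    }
  }
}
```

Returning Either keeps the decision with the caller: it can reject the statement with a clear message or pass the original SQL to Spark's parser unchanged.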

5. Duplicate AbstractPaimonSparkSqlExtensionsParser.scala in spark-4.0 and spark-common.

The diff shows identical changes in both paimon-spark-4.0 and paimon-spark-common. Is there a way to avoid this duplication? If the Spark 4.0 version must diverge, at least add a comment explaining why both files exist.

Minor

nonReserved list update looks correct — all new keywords can still be used as identifiers
The test coverage is solid: qualified target, qualified source, both qualified, IF NOT EXISTS, TBLPROPERTIES override, STORED AS rejection
The applyParserRules refactoring is a nice cleanup

Summary

The PR works for the intended use case. Main concerns are the detection heuristic overhead on non-matching SQL and the rigid parts.size() == 3 check for catalog identification.

YannByron · 2026-05-25T05:39:37Z

+1

Zouxxyy · 2026-05-25T06:35:40Z


  private lazy val substitutor = new VariableSubstitution()
  private lazy val astBuilder = new PaimonSqlExtensionsAstBuilder(delegate)
+  private val nonReservedIdentifierTokenTypes = Set(


What are these used for?

kerwin-zk force-pushed the support-catalog-qualified-create-table-like branch 5 times, most recently from eceaf35 to d0202fe · May 21, 2026 13:23

YannByron requested a review from Copilot May 22, 2026 08:46

Copilot started reviewing on behalf of YannByron May 22, 2026 08:47

Copilot AI reviewed May 22, 2026

[spark] Support catalog-qualified CREATE TABLE LIKE

f720582

kerwin-zk force-pushed the support-catalog-qualified-create-table-like branch from d0202fe to f720582 · May 22, 2026 13:55

JingsongLi reviewed May 23, 2026

[spark] Support catalog-qualified CREATE TABLE LIKE

f688995

kerwin-zk force-pushed the support-catalog-qualified-create-table-like branch from 2ec62ab to f688995 · May 23, 2026 16:15

[spark] Fix Spark 4.0 parser compatibility

c931a19

Zouxxyy reviewed May 25, 2026

kerwin-zk wants to merge 3 commits into apache:master from kerwin-zk:support-catalog-qualified-create-table-like

5 participants