Skip to content

[Spark] Run DeltaSinkSuite against both path-based and name-based access#7081

Open
PorridgeSwim wants to merge 2 commits into
delta-io:masterfrom
PorridgeSwim:nameBasedDeltaSinkSuite
Open

[Spark] Run DeltaSinkSuite against both path-based and name-based access#7081
PorridgeSwim wants to merge 2 commits into
delta-io:masterfrom
PorridgeSwim:nameBasedDeltaSinkSuite

Conversation

@PorridgeSwim

@PorridgeSwim PorridgeSwim commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Test-only change that makes DeltaSinkSuite run against both path-based and name-based table access, rather than migrating it wholesale to name-based.

An earlier revision of this PR replaced path-based access with name-based access, which dropped the original path-based coverage. This revision parameterizes the suite instead:

  • A useDsv2 flag selects how the target is addressed:
    • useDsv2 = false (default) — path-based: writeStream...start(path), spark.read.format("delta").load(path), DeltaLog.forTable(spark, path). This preserves the suite's original coverage.
    • useDsv2 = true — name-based: writeStream...toTable(name), spark.read.table(name), DeltaLog.forTable(spark, TableIdentifier(name)). This exercises the name-based write seam, which is the only path available to the DSv2 (Kernel) sink — toTable rejects path identifiers.
  • A withSinkTarget helper plus a few small companion helpers (startStream, readTarget, deltaLogForTarget, deltaTableForTarget) interpret the target identifier per mode, so each test body is written once and runs both ways.
  • DeltaSinkSuite (and its existing CatalogManagedBatch1/2 subclasses) keep useDsv2 = false; a new DeltaSinkNameBasedSuite overrides it to true so both modes actually run in CI.

Two tests genuinely behave differently between the two access paths and branch internally on useDsv2:

  • "throw exception ... write in batch with different partitioning" — path-based throws AnalysisException ("Partition columns do not match"); name-based throws IllegalArgumentException ("The provided partitioning or clustering columns do not match the existing table's").
  • "can't write out with all columns being partition columns" — path-based starts the stream and then surfaces a StreamingQueryException; name-based rejects the all-columns partitioning up front at toTable with an AnalysisException.

A few tests remain intrinsically path-specific (e.g. path not specified, incompatible schema merging ...) and are unchanged.

Motivation (same as #7058 for the type-widening tests): unblock running these tests against the DSv2 (Kernel) connector, which has no path-based streaming write, while retaining the existing path-based coverage.

How was this patch tested?

Test-only change. Both DeltaSinkSuite (path-based) and DeltaSinkNameBasedSuite (name-based) pass.

Does this PR introduce any user-facing changes?

No.

@PorridgeSwim PorridgeSwim self-assigned this Jun 23, 2026

@xzhseh xzhseh left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the effort!

.trigger(org.apache.spark.sql.streaming.Trigger.Once)
.option("checkpointLocation", chkDir)
.start(tableDir)
withTable("test_delta_sink") {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this test use withTable and not withSinkTarget like the rest?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@PorridgeSwim PorridgeSwim force-pushed the nameBasedDeltaSinkSuite branch from bb9fff5 to 5acab6c Compare June 24, 2026 18:06
@PorridgeSwim PorridgeSwim requested a review from BrooksWalls June 24, 2026 18:10
/**
* Run a sink test against a name-based (catalog) target table.
*/
protected def withSinkTarget(f: (String, File) => Unit): Unit = {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned we are losing test coverage for path-based tables.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preserved the path-based tests

Add a useDsv2 flag so the suite keeps its original path-based coverage
(useDsv2 = false) while also exercising the name-based write seam that
the DSv2 sink requires (useDsv2 = true, via DeltaSinkNameBasedSuite).

Co-authored-by: Isaac
@PorridgeSwim PorridgeSwim changed the title [Spark] Migrate DeltaSinkSuite from path-based to name-based access [Spark] Run DeltaSinkSuite against both path-based and name-based access Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants