[AURON #2238] Add support for auron.never.convert.reason in Iceberg scan scenarios by guixiaowen · Pull Request #2237 · apache/auron

guixiaowen · 2026-05-06T03:47:51Z

Which issue does this PR close?

Closes #2238.

Rationale for this change

In issue #1419, the reasons for fallback (i.e., why a conversion was not applied) are recorded using the property auron.never.convert.reason.

In issue #1471, these reasons can be observed in the Spark UI, helping users understand why the physical execution plan was not converted.

However, in the current Iceberg fallback scenarios, the property auron.never.convert.reason is not recorded.

This PR fills that gap by ensuring that auron.never.convert.reason is also recorded in Iceberg-related fallback cases.

What changes are included in this PR?

When the conversion of an Iceberg scan fails, the failure message is recorded in auron.never.convert.reason so that it can be displayed to the user.

Are there any user-facing changes?

How was this patch tested?

Unit tests (UT).


Copilot

Pull request overview

This PR aims to ensure Iceberg fallback scenarios populate the Spark plan tag auron.never.convert.reason, making fallback reasons visible (e.g., in Spark UI) similarly to other scan conversions.

Changes:

- Updated AuronConvertProvider to make isEnabled depend on the current SparkPlan, and adjusted Iceberg/Hudi/Paimon providers accordingly.
- Added exception-based tagging in AuronConverters.convertSparkPlan to set auron.never.convert.reason when conversion is rejected via assertions/exceptions.
- Added Iceberg integration tests asserting auron.never.convert.reason is present for disabled Iceberg scan and unsupported metadata-column scenarios.
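
The exception-based tagging summarized above can be illustrated with a minimal, Spark-free sketch. `PlanNode`, `convert`, and `convertOrTag` are hypothetical stand-ins introduced only for illustration; the sketch also assumes that only `AssertionError` is caught, which the overview does not confirm.

```scala
import scala.collection.mutable

object NeverConvertTaggingSketch {
  // Hypothetical stand-in for a SparkPlan node that can carry string tags.
  final class PlanNode(val name: String) {
    val tags: mutable.Map[String, String] = mutable.Map.empty
  }

  val NeverConvertReason = "auron.never.convert.reason"

  // Hypothetical converter: rejects the node through an assertion with a readable message.
  def convert(node: PlanNode, scanEnabled: Boolean): PlanNode = {
    assert(scanEnabled, "Conversion disabled: auron.enable.iceberg.scan=false.")
    new PlanNode(s"Native${node.name}")
  }

  // Runs the conversion; on rejection, keeps the original node and records the reason on it.
  def convertOrTag(node: PlanNode, scanEnabled: Boolean): PlanNode =
    try convert(node, scanEnabled)
    catch {
      case e: AssertionError =>
        // Predef.assert always sets a message, so getMessage is non-null in this sketch.
        node.tags(NeverConvertReason) = e.getMessage.replaceFirst("^assertion failed: ?", "")
        node
    }
}
```

In this model a rejected node is returned unchanged with the reason attached, while a successful conversion returns a new node with no tags.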

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.


File	Description
thirdparty/auron-paimon/src/main/scala/org/apache/spark/sql/hive/auron/paimon/PaimonConvertProvider.scala	Updates provider enablement to be plan-type aware.
thirdparty/auron-iceberg/src/test/scala/org/apache/auron/iceberg/AuronIcebergIntegrationSuite.scala	Adds integration tests validating `auron.never.convert.reason` for Iceberg fallback cases.
thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergScanSupport.scala	Converts multiple “return None” fallbacks into `assert`-based failures with messages intended for tagging.
thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergConvertProvider.scala	Updates `isEnabled` signature and uses assertions to drive fallback reason tagging.
thirdparty/auron-iceberg/pom.xml	Adds `scala-library` dependency (provided scope).
thirdparty/auron-hudi/src/main/scala/org/apache/spark/sql/auron/hudi/HudiConvertProvider.scala	Updates provider enablement to be plan-type aware.
spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConvertProvider.scala	Changes `isEnabled` API to accept `SparkPlan`.
spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala	Uses `isEnabled(exec)` and adds try/catch-based never-convert reason tagging in generic conversion path.


guixiaowen · 2026-05-07T16:31:08Z

+  override def isEnabled(exec: SparkPlan): Boolean = {
+    exec match {
+      case _: BatchScanExec =>
+        val enabled = SparkAuronConfiguration.ENABLE_ICEBERG_SCAN.get()
+        assert(enabled, "Conversion disabled: auron.enable.iceberg.scan=false.")
+        assert(
+          sparkCompatible,
          s"Supported Spark versions: 3.4 to 4.0 (Iceberg ${icebergVersionOrUnknown}).")
-      return false
+        enabled
+      case _ => false


There is no issue here. Currently, for the Iceberg scenario, only BatchScanExec is supported. If more scenarios are supported in the future, they can be added here.
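
A minimal sketch of the plan-type-aware enablement described in this reply, using hypothetical stand-in types (`Plan`, `ConvertProvider`, `IcebergLikeProvider`) rather than the real Spark and Auron classes:

```scala
object PlanAwareProviderSketch {
  // Hypothetical stand-ins for Spark physical plan nodes.
  sealed trait Plan
  case object BatchScan extends Plan
  case object Project extends Plan

  // Simplified model of a provider whose enablement depends on the plan node.
  trait ConvertProvider {
    def isEnabled(exec: Plan): Boolean
  }

  final class IcebergLikeProvider(scanEnabled: Boolean) extends ConvertProvider {
    override def isEnabled(exec: Plan): Boolean = exec match {
      case BatchScan =>
        // Throws with a user-facing reason when the scan conversion is switched off.
        assert(scanEnabled, "Conversion disabled: auron.enable.iceberg.scan=false.")
        true
      case _ => false // Any other node type is not handled by this provider.
    }
  }
}
```

Supporting another node type later would mean adding one more `case` to the match, which is the extension point the reply refers to.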

guixiaowen · 2026-05-07T16:34:17Z

  def plan(exec: BatchScanExec): Option[IcebergScanPlan] = {
    val scan = exec.scan
    val scanClassName = scan.getClass.getName
    // Only handle Iceberg scans; other sources must stay on Spark's path.
-    if (!scanClassName.startsWith("org.apache.iceberg.spark.source.")) {
-      return None
-    }
+    assert(scanClassName.startsWith("org.apache.iceberg.spark.source."), "Not iceberg scans.")

    // Changelog scan carries row-level changes; not supported by native COW-only path.
-    if (scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan") {
-      return None
-    }
+    assert(
+      !(scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan"),
+      "Not iceberg cow table.")



If it returns None, it will not be possible to determine why the conversion failed. Therefore, it is recommended to keep the current behavior as it is.

// Before
assert(!AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan.isInstance(scan), "Not iceberg scans.")
// After
assert(AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan.isInstance(scan), "Not iceberg scans.")

Original logic: !isInstance → asserts failure when it's NOT an Iceberg scan (already confusing)

New logic: isInstance → asserts failure when it IS an Iceberg scan

Error message unchanged: "Not iceberg scans." contradicts the assertion condition

Code comment states: "Only handle Iceberg scans", but both old and new logic are confusing
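
As a reference point for reading the two variants, Scala's `assert(cond, msg)` throws only when `cond` is false. The sketch below checks that behaviour in isolation; `isIcebergScan` is a hypothetical flag standing in for the `isInstance(scan)` call.

```scala
import scala.util.Try

object AssertPolaritySketch {
  // Returns the failure message if the assertion throws, or None if it passes.
  def check(isIcebergScan: Boolean): Option[String] =
    Try(assert(isIcebergScan, "Not iceberg scans.")).failed.toOption.map(_.getMessage)
}
```

Under this reading, `assert(isInstance(scan), "Not iceberg scans.")` fails for non-Iceberg scans, and the negated form fails for Iceberg scans. Separately, `Predef.assert` is marked `@elidable` in Scala 2, so a build compiled with `-Xdisable-assertions` would skip such checks altogether; whether that applies to this project's build is not established in this thread.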

guixiaowen · 2026-05-07T16:36:02Z

+    assert(
+      !(unsupportedMetadataColumns.nonEmpty),
+      "Has per-row materialization (for example _pos).")


If it returns None, it will not be possible to determine why the conversion failed. Therefore, it is recommended to keep the current behavior as it is.

+        try {
+          extConvertProviders.find(h => h.isEnabled(exec) && h.isSupported(exec)) match {
+            case Some(provider) => tryConvert(exec, provider.convert)
+            case None =>
+              Shims.get.convertMoreSparkPlan(exec) match {
+                case Some(exec) =>
                  exec.setTagValue(convertibleTag, true)
                  exec.setTagValue(convertStrategyTag, AlwaysConvert)
                  exec
-                } else {
-                  addNeverConvertReasonTag(exec)
-                }
-            }
+                case None =>
+                  if (Shims.get.isNative(exec)) { // for QueryStageInput and CustomShuffleReader
+                    exec.setTagValue(convertibleTag, true)
+                    exec.setTagValue(convertStrategyTag, AlwaysConvert)
+                    exec
+                  } else {
+                    addNeverConvertReasonTag(exec)
+                  }
+              }
+          }


+            exec.setTagValue(convertToNonNativeTag, true)
+            exec.setTagValue(convertibleTag, false)
+            exec.setTagValue(convertStrategyTag, NeverConvert)
+            exec.setTagValue(
+              neverConvertReasonTag,
+              s"${e.getMessage.replaceFirst("^assertion failed: ?", "")}")


+    assert(
+      !(unsupportedMetadataColumns.nonEmpty),
+      "Has per-row materialization (for example _pos).")


+        val neverConvertReasonTag: TreeNodeTag[String] = TreeNodeTag("auron.never.convert.reason")
+        assert(collectFirst(df.queryExecution.executedPlan) { case batchScanExec: BatchScanExec =>
+          batchScanExec.getTagValue(neverConvertReasonTag)
+        }.get.get.equals("Conversion disabled: auron.enable.iceberg.scan=false."))
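
The `.get.get` chain above throws a bare `NoSuchElementException` if the scan node or its tag is missing, which hides what actually went wrong. One possible alternative, sketched here with plain collections and hypothetical node types instead of the real plan classes, is to flatten the nested `Option` and compare with `contains`:

```scala
object TagLookupSketch {
  // Hypothetical stand-ins for plan nodes; only the scan node carries an optional reason tag.
  sealed trait Node
  final case class BatchScan(neverConvertReason: Option[String]) extends Node
  final case class Other(name: String) extends Node

  // Reason recorded on the first scan node, or None if there is no scan or no tag.
  def firstScanReason(plan: Seq[Node]): Option[String] =
    plan.collectFirst { case s: BatchScan => s.neverConvertReason }.flatten
}
```

A test could then assert `firstScanReason(plan).contains(expected)`, which fails as an ordinary assertion when the tag is absent.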



guixiaowen · 2026-05-10T07:12:05Z

@slfan1989 @yew1eb Thank you very much for reviewing the code. The conflict has been resolved.


slfan1989

@guixiaowen Thank you very much for your contribution, but I have some questions about this PR.

slfan1989 · 2026-05-11T02:02:17Z

  def plan(exec: BatchScanExec): Option[IcebergScanPlan] = {
    val scan = exec.scan
    val scanClassName = scan.getClass.getName
    // Only handle Iceberg scans; other sources must stay on Spark's path.
-    if (!scanClassName.startsWith("org.apache.iceberg.spark.source.")) {
-      return None
-    }
+    assert(scanClassName.startsWith("org.apache.iceberg.spark.source."), "Not iceberg scans.")

    // Changelog scan carries row-level changes; not supported by native COW-only path.
-    if (scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan") {
-      return None
-    }
+    assert(
+      !(scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan"),
+      "Not iceberg cow table.")



// Before
assert(!AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan.isInstance(scan), "Not iceberg scans.")
// After
assert(AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan.isInstance(scan), "Not iceberg scans.")

Original logic: !isInstance → asserts failure when it's NOT an Iceberg scan (already confusing)

New logic: isInstance → asserts failure when it IS an Iceberg scan

Error message unchanged: "Not iceberg scans." contradicts the assertion condition

Code comment states: "Only handle Iceberg scans", but both old and new logic are confusing

slfan1989 · 2026-05-11T02:05:21Z

-    if (!AuronIcebergSourceUtil.getClassOfSparkInputPartition().isInstance(partition)) {
-      return None
-    }
+    assert(


// Line 51: no parentheses
AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan.isInstance(scan)
// Line 195: with parentheses
AuronIcebergSourceUtil.getClassOfSparkInputPartition().isInstance(partition)

Problem: Inconsistent style - one method call has parentheses, the other doesn't.


slfan1989 · 2026-05-13T01:46:57Z

@guixiaowen I've quickly reviewed it and have no further comments. Could we improve and complete the PR description?

guixiaowen · 2026-05-15T14:57:11Z

@cxzl25 Could you please help review this code? Thanks a lot.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

+            exec.setTagValue(convertStrategyTag, NeverConvert)
+            exec.setTagValue(
+              neverConvertReasonTag,
+              s"${e.getMessage.replaceFirst("^assertion failed: ?", "")}")
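
Two edge cases are worth keeping in mind for the message handling above: `Throwable.getMessage` can return null, and a message-less `assert(cond)` produces the text "assertion failed" with no colon, which the pattern `^assertion failed: ?` does not match. A minimal sketch of a more tolerant helper follows; the name `reasonOf` and the fallback to the exception class name are assumptions for illustration, not part of the PR.

```scala
object ReasonMessageSketch {
  // Matches the prefix added by Predef.assert, with or without a trailing message.
  private val AssertPrefix = "^assertion failed:? ?"

  // Derives a display reason from any Throwable, tolerating a null or empty message.
  def reasonOf(e: Throwable): String =
    Option(e.getMessage)
      .map(_.replaceFirst(AssertPrefix, "").trim)
      .filter(_.nonEmpty)
      .getOrElse(e.getClass.getName)
}
```

With such a helper the tag value is never null or empty, at the cost of showing a class name when no message is available.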


+    exec match {
+      case _: BatchScanExec =>
+        val enabled = SparkAuronConfiguration.ENABLE_ICEBERG_SCAN.get()
+        assert(enabled, "Conversion disabled: auron.enable.iceberg.scan=false.")


-    }
+    assert(
+      fileSchema.fields.forall(field => NativeConverters.isTypeSupported(field.dataType)),
+      "Has iceberg data file payload.")


github-actions Bot added spark build thirdparty-iceberg labels May 6, 2026

guixiaowen changed the title ~~test~~ [AURON #2238] Add support for auron.never.convert.reason in Iceberg scan scenarios May 6, 2026

github-actions Bot added the thirdparty-paimon label May 6, 2026

[AURON apache#2238] Add support for auron.never.convert.reason in Iceberg scan scenarios

7f35ead

guixiaowen force-pushed the foriceberg_addnever_convert_reason branch from 844591e to 7f35ead Compare May 6, 2026 16:48

slfan1989 requested a review from Copilot May 6, 2026 23:09

Copilot started reviewing on behalf of slfan1989 May 6, 2026 23:10

Copilot AI reviewed May 6, 2026


slfan1989 approved these changes May 8, 2026


yew1eb approved these changes May 8, 2026


Merge branch 'master' into foriceberg_addnever_convert_reason

d915fda

# Conflicts:
#   thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergScanSupport.scala
#   thirdparty/auron-iceberg/src/test/scala/org/apache/auron/iceberg/AuronIcebergIntegrationSuite.scala

[AURON apache#2238] Add support for auron.never.convert.reason in Iceberg scan scenarios

266518e

guixiaowen force-pushed the foriceberg_addnever_convert_reason branch from 9ecc53b to 266518e Compare May 10, 2026 13:38

slfan1989 requested changes May 11, 2026


guihuawen added 2 commits May 11, 2026 13:08

[AURON apache#2238] Add support for auron.never.convert.reason in Iceberg scan scenarios

5d281c7

[AURON apache#2238] Add support for auron.never.convert.reason in Iceberg scan scenarios

93bce39

slfan1989 approved these changes May 14, 2026


slfan1989 requested a review from Copilot May 15, 2026 15:18

Copilot started reviewing on behalf of slfan1989 May 15, 2026 15:18

Copilot AI reviewed May 15, 2026


yew1eb approved these changes May 16, 2026


guixiaowen wants to merge 5 commits into apache:master from guixiaowen:foriceberg_addnever_convert_reason
