Skip to content

[AURON #2238] Add support for auron.never.convert.reason in Iceberg scan scenarios #2237

Open
guixiaowen wants to merge 5 commits into
apache:masterfrom
guixiaowen:foriceberg_addnever_convert_reason
Open

[AURON #2238] Add support for auron.never.convert.reason in Iceberg scan scenarios #2237
guixiaowen wants to merge 5 commits into
apache:masterfrom
guixiaowen:foriceberg_addnever_convert_reason

Conversation

@guixiaowen
Copy link
Copy Markdown
Contributor

@guixiaowen guixiaowen commented May 6, 2026

Which issue does this PR close?

Closes #2238

Rationale for this change

In issue #1419, the reasons for fallback (i.e., why a conversion was not applied) are recorded using the property auron.never.convert.reason.

In issue #1471, these reasons can be observed in the Spark UI, helping users understand why the physical execution plan was not converted.

However, in the current Iceberg fallback scenarios, the property auron.never.convert.reason is not recorded.

The purpose of this issue is to fill this gap by ensuring that auron.never.convert.reason is properly recorded in Iceberg-related fallback cases as well.

What changes are included in this PR?

After an Iceberg scenario call fails, the error information will be recorded in auron.never.convert.reason for easy display.

Are there any user-facing changes?

How was this patch tested?

UT

@guixiaowen guixiaowen changed the title test [AURON #2238] Add support for auron.never.convert.reason in Iceberg scan scenarios May 6, 2026
@guixiaowen guixiaowen force-pushed the foriceberg_addnever_convert_reason branch from 844591e to 7f35ead Compare May 6, 2026 16:48
@slfan1989 slfan1989 requested a review from Copilot May 6, 2026 23:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to ensure Iceberg fallback scenarios populate the Spark plan tag auron.never.convert.reason, making fallback reasons visible (e.g., in Spark UI) similarly to other scan conversions.

Changes:

  • Updated AuronConvertProvider to make isEnabled depend on the current SparkPlan, and adjusted Iceberg/Hudi/Paimon providers accordingly.
  • Added exception-based tagging in AuronConverters.convertSparkPlan to set auron.never.convert.reason when conversion is rejected via assertions/exceptions.
  • Added Iceberg integration tests asserting auron.never.convert.reason is present for disabled Iceberg scan and unsupported metadata-column scenarios.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
thirdparty/auron-paimon/src/main/scala/org/apache/spark/sql/hive/auron/paimon/PaimonConvertProvider.scala Updates provider enablement to be plan-type aware.
thirdparty/auron-iceberg/src/test/scala/org/apache/auron/iceberg/AuronIcebergIntegrationSuite.scala Adds integration tests validating auron.never.convert.reason for Iceberg fallback cases.
thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergScanSupport.scala Converts multiple “return None” fallbacks into assert-based failures with messages intended for tagging.
thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergConvertProvider.scala Updates isEnabled signature and uses assertions to drive fallback reason tagging.
thirdparty/auron-iceberg/pom.xml Adds scala-library dependency (provided scope).
thirdparty/auron-hudi/src/main/scala/org/apache/spark/sql/auron/hudi/HudiConvertProvider.scala Updates provider enablement to be plan-type aware.
spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConvertProvider.scala Changes isEnabled API to accept SparkPlan.
spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronConverters.scala Uses isEnabled(exec) and adds try/catch-based never-convert reason tagging in generic conversion path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +31 to +40
override def isEnabled(exec: SparkPlan): Boolean = {
exec match {
case _: BatchScanExec =>
val enabled = SparkAuronConfiguration.ENABLE_ICEBERG_SCAN.get()
assert(enabled, "Conversion disabled: auron.enable.iceberg.scan=false.")
assert(
sparkCompatible,
s"Supported Spark versions: 3.4 to 4.0 (Iceberg ${icebergVersionOrUnknown}).")
return false
enabled
case _ => false
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no issue here. Currently, for the Iceberg scenario, only BatchScanExec is supported. If more scenarios are supported in the future, they can be added here.

Comment on lines 46 to 56
def plan(exec: BatchScanExec): Option[IcebergScanPlan] = {
val scan = exec.scan
val scanClassName = scan.getClass.getName
// Only handle Iceberg scans; other sources must stay on Spark's path.
if (!scanClassName.startsWith("org.apache.iceberg.spark.source.")) {
return None
}
assert(scanClassName.startsWith("org.apache.iceberg.spark.source."), "Not iceberg scans.")

// Changelog scan carries row-level changes; not supported by native COW-only path.
if (scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan") {
return None
}
assert(
!(scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan"),
"Not iceberg cow table.")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it returns None, it will not be possible to determine why the conversion failed. Therefore, it is recommended to keep the current behavior as it is.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Before
assert(!AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan.isInstance(scan), 
       "Not iceberg scans.")

// After
assert(AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan.isInstance(scan),
       "Not iceberg scans.")
  • Original logic: !isInstance → asserts failure when it's NOT an Iceberg scan (already confusing)
  • New logic: isInstance → asserts failure when it IS an Iceberg scan
  • Error message unchanged: "Not iceberg scans." contradicts the assertion condition
  • Code comment states: "Only handle Iceberg scans", but both old and new logic are confusing

Comment on lines +61 to +63
assert(
!(unsupportedMetadataColumns.nonEmpty),
"Has per-row materialization (for example _pos).")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it returns None, it will not be possible to determine why the conversion failed. Therefore, it is recommended to keep the current behavior as it is.

Comment on lines +242 to +260
try {
extConvertProviders.find(h => h.isEnabled(exec) && h.isSupported(exec)) match {
case Some(provider) => tryConvert(exec, provider.convert)
case None =>
Shims.get.convertMoreSparkPlan(exec) match {
case Some(exec) =>
exec.setTagValue(convertibleTag, true)
exec.setTagValue(convertStrategyTag, AlwaysConvert)
exec
} else {
addNeverConvertReasonTag(exec)
}
}
case None =>
if (Shims.get.isNative(exec)) { // for QueryStageInput and CustomShuffleReader
exec.setTagValue(convertibleTag, true)
exec.setTagValue(convertStrategyTag, AlwaysConvert)
exec
} else {
addNeverConvertReasonTag(exec)
}
}
}
Comment on lines +263 to +268
exec.setTagValue(convertToNonNativeTag, true)
exec.setTagValue(convertibleTag, false)
exec.setTagValue(convertStrategyTag, NeverConvert)
exec.setTagValue(
neverConvertReasonTag,
s"${e.getMessage.replaceFirst("^assertion failed: ?", "")}")
Comment on lines +61 to +63
assert(
!(unsupportedMetadataColumns.nonEmpty),
"Has per-row materialization (for example _pos).")
Comment on lines +43 to +46
val neverConvertReasonTag: TreeNodeTag[String] = TreeNodeTag("auron.never.convert.reason")
assert(collectFirst(df.queryExecution.executedPlan) { case batchScanExec: BatchScanExec =>
batchScanExec.getTagValue(neverConvertReasonTag)
}.get.get.equals("Conversion disabled: auron.enable.iceberg.scan=false."))
# Conflicts:
#	thirdparty/auron-iceberg/src/main/scala/org/apache/spark/sql/auron/iceberg/IcebergScanSupport.scala
#	thirdparty/auron-iceberg/src/test/scala/org/apache/auron/iceberg/AuronIcebergIntegrationSuite.scala
@guixiaowen
Copy link
Copy Markdown
Contributor Author

@slfan1989 fan @yew1eb y Thank you very much for reviewing the code. The conflict has been resolved.

@guixiaowen guixiaowen force-pushed the foriceberg_addnever_convert_reason branch from 9ecc53b to 266518e Compare May 10, 2026 13:38
Copy link
Copy Markdown
Contributor

@slfan1989 slfan1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guixiaowen Thank you very much for your contribution, but I have some questions about this PR.

Comment on lines 46 to 56
def plan(exec: BatchScanExec): Option[IcebergScanPlan] = {
val scan = exec.scan
val scanClassName = scan.getClass.getName
// Only handle Iceberg scans; other sources must stay on Spark's path.
if (!scanClassName.startsWith("org.apache.iceberg.spark.source.")) {
return None
}
assert(scanClassName.startsWith("org.apache.iceberg.spark.source."), "Not iceberg scans.")

// Changelog scan carries row-level changes; not supported by native COW-only path.
if (scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan") {
return None
}
assert(
!(scanClassName == "org.apache.iceberg.spark.source.SparkChangelogScan"),
"Not iceberg cow table.")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Before
assert(!AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan.isInstance(scan), 
       "Not iceberg scans.")

// After
assert(AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan.isInstance(scan),
       "Not iceberg scans.")
  • Original logic: !isInstance → asserts failure when it's NOT an Iceberg scan (already confusing)
  • New logic: isInstance → asserts failure when it IS an Iceberg scan
  • Error message unchanged: "Not iceberg scans." contradicts the assertion condition
  • Code comment states: "Only handle Iceberg scans", but both old and new logic are confusing

if (!AuronIcebergSourceUtil.getClassOfSparkInputPartition().isInstance(partition)) {
return None
}
assert(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Line 51: no parentheses
AuronIcebergSourceUtil.getClassOfSparkBatchQueryScan.isInstance(scan)

// Line 195: with parentheses
AuronIcebergSourceUtil.getClassOfSparkInputPartition().isInstance(partition)

Problem: Inconsistent style - one method call has parentheses, the other doesn't.

@slfan1989
Copy link
Copy Markdown
Contributor

@guixiaowen I've quickly reviewed it and have no further comments. Counld we improve and complete the PR description?

@guixiaowen
Copy link
Copy Markdown
Contributor Author

@cxzl25 Could you please help review this code? Thanks a lot.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

exec.setTagValue(convertStrategyTag, NeverConvert)
exec.setTagValue(
neverConvertReasonTag,
s"${e.getMessage.replaceFirst("^assertion failed: ?", "")}")
exec match {
case _: BatchScanExec =>
val enabled = SparkAuronConfiguration.ENABLE_ICEBERG_SCAN.get()
assert(enabled, "Conversion disabled: auron.enable.iceberg.scan=false.")
}
assert(
fileSchema.fields.forall(field => NativeConverters.isTypeSupported(field.dataType)),
"Has iceberg data file payload.")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for auron.never.convert.reason in Iceberg scan scenarios

4 participants