[AURON #1724] Support binary input for Spark substring function #2262
lyne7-sc wants to merge 7 commits into
ShreyeshArangath left a comment
Mostly LGTM, left a few comments
@ShreyeshArangath Thanks for your review! I’ve updated the implementation accordingly.
ShreyeshArangath left a comment
Great work! LGTM, just one question
yew1eb left a comment
The Rust implementation looks correct. One issue on the Scala side.
```diff
@@ -1025,10 +1025,10 @@ object NativeConverters extends Logging {
       if pos.asInstanceOf[Int] > 0 && len.asInstanceOf[Int] >= 0 =>
       val longPos = pos.asInstanceOf[Int].toLong
```
The guard `pos > 0 && len >= 0` was inherited from the old `Substr` mapping and is now too restrictive. `spark_substring` correctly handles `pos == 0` (treated as 1) and negative `pos` (counted from the end), as the Rust unit tests already cover. Cases with `pos <= 0` will still fall back to Spark instead of using the native path. Consider relaxing the guard to just `len >= 0`, or dropping it entirely since the implementation handles all integer values.
Good catch! That makes sense to me. I’ve relaxed the guard so these cases can go through the native path.
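To make the effect of the guard change concrete, here is a minimal sketch that uses plain `Any` values in place of Catalyst literals. `SubstringGuardSketch`, `oldGuard` and `relaxedGuard` are hypothetical names, not Auron's `NativeConverters` code. The thread does not say which relaxed form was adopted, so this shows only the "drop the range check, keep the `Int` check" option.

```scala
object SubstringGuardSketch {
  // Old guard, as quoted in the review: only strictly positive `pos` and
  // non-negative `len` literals were sent to the native path.
  def oldGuard(pos: Any, len: Any): Boolean = (pos, len) match {
    case (p: Int, l: Int) => p > 0 && l >= 0
    case _                => false
  }

  // Hypothetical relaxed guard: any pair of Int literals goes native, relying on
  // the native substring to handle zero/negative `pos` (and negative `len`) itself.
  def relaxedGuard(pos: Any, len: Any): Boolean = (pos, len) match {
    case (_: Int, _: Int) => true
    case _                => false
  }
}
```

Under this assumption, calls such as `substring(col, 0, 3)` or `substring(col, -3, 2)` would take the native path rather than falling back to Spark, while non-integer arguments would still be rejected.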
Which issue does this PR close?
Closes #1724
Rationale for this change
Spark `substring` supports both string and binary inputs. Auron previously mapped it to DataFusion's `Substr`, which only handled string-compatible behavior, so the Spark `string / binary substring function` suite case had to be excluded.

What changes are included in this PR?
- Add `Spark_Substring` ext function support for `Utf8` and `Binary` inputs.
- Route `Substring` conversion through `Spark_Substring` and preserve the input data type.
- Re-enable the `string / binary substring function` test case.

Are there any user-facing changes?
Yes. Spark SQL `substring` now supports binary input in native execution, matching Spark behavior.

How was this patch tested?
Unit tests for `spark_substring`.
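The sketch below is a self-contained Scala illustration of the position rules discussed in the review: 1-based `pos`, `pos == 0` treated as 1, and negative `pos` counted from the end. It applies them to both string and binary values and keeps the output the same type as the input. The rules follow our reading of Spark's `UTF8String.substringSQL` and `ByteArray.subStringSQL`, not Auron's Rust `spark_substring`. `SubstrValue`, `Utf8Value`, `BinaryValue` and `SparkSubstringSketch` are hypothetical names.

```scala
// Hypothetical value model: a Spark substring input is either a STRING
// (represented here as a JVM String) or a BINARY (an Array[Byte]).
sealed trait SubstrValue
final case class Utf8Value(s: String) extends SubstrValue
final case class BinaryValue(bytes: Array[Byte]) extends SubstrValue

object SparkSubstringSketch {
  // Map Spark's (pos, len) onto a half-open [start, end) range over `n` elements.
  // Long arithmetic avoids Int overflow for extreme pos/len values.
  private def resolveRange(n: Int, pos: Int, len: Int): (Int, Int) = {
    val start: Long =
      if (pos > 0) pos.toLong - 1        // 1-based position
      else if (pos < 0) n.toLong + pos   // counted from the end
      else 0L                            // pos == 0 behaves like pos == 1
    // `end` is computed from the unclamped start, so a start before the first
    // element still uses up part of `len` (as in Spark's substringSQL).
    val end: Long = math.min(start + len, n.toLong)
    val clampedStart: Long = math.max(start, 0L)
    if (clampedStart >= end) (0, 0) else (clampedStart.toInt, end.toInt)
  }

  // The output variant always matches the input variant (type preserved).
  def substring(v: SubstrValue, pos: Int, len: Int): SubstrValue = v match {
    case Utf8Value(s) =>
      // Spark counts characters, so index by Unicode code points, not UTF-16 units.
      val cps = s.codePoints().toArray()
      val (start, end) = resolveRange(cps.length, pos, len)
      Utf8Value(new String(cps, start, end - start))
    case BinaryValue(bytes) =>
      val (start, end) = resolveRange(bytes.length, pos, len)
      BinaryValue(bytes.slice(start, end))
  }
}
```

For example, `SparkSubstringSketch.substring(Utf8Value("Spark SQL"), -3, 2)` yields `Utf8Value("SQ")`, and a `BinaryValue` input always comes back as a `BinaryValue`.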