Skip to content

test(sqllogictest): stabilize parquet output_rows_skew with WITH ORDER#21898

Merged
alamb merged 3 commits intoapache:mainfrom
RatulDawar:fix/explain-analyze-parquet-skew-with-order
Apr 30, 2026
Merged

test(sqllogictest): stabilize parquet output_rows_skew with WITH ORDER#21898
alamb merged 3 commits intoapache:mainfrom
RatulDawar:fix/explain-analyze-parquet-skew-with-order

Conversation

@RatulDawar
Copy link
Copy Markdown
Contributor

Summary

Adds WITH ORDER (x) to CREATE EXTERNAL TABLE skew_parquet / skew_parquet_single in explain_analyze.slt so FileScanConfig preserves scan ordering (preserve_order), keeping per-partition output_rows stable under dynamic file scheduling (PR #21351).

Related

Testing

  • cargo test -p datafusion-sqllogictest --test sqllogictests -- explain_analyze (recommended before merge)

Sqllogictest-only change.

Made with Cursor

CREATE EXTERNAL TABLE skew_parquet now uses WITH ORDER (x) so FileScanConfig
preserves scan ordering and per-partition row counts stay deterministic with
dynamic file scheduling.

Refs discussion after PR apache#21351 / apache#21866.

Made-with: Cursor
@github-actions github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label Apr 28, 2026
@2010YOUY01
Copy link
Copy Markdown
Contributor

Those tests are already stable, and I think the with order statement has no effect.

We have to fix the skipped one. The comments explains the root cause, though I'm not sure about the solution yet.

@RatulDawar
Copy link
Copy Markdown
Contributor Author

Yes, I think I worded myself incorrectly. I wanted to improved the coverage as mentioned in the comment by @alamb.
For I checked the code and saw that dynamic file stream task scheduling worked only when preserve_order is false.
So I was thinking that to make the test deterministic we can change tests to trigger preserve order.

Copy link
Copy Markdown
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, this makes sense. Thank you.

It's not obvious why this fix works, let's add more comments.

Comment thread datafusion/sqllogictest/test_files/explain_analyze.slt
Co-authored-by: Yongting You <2010youy01@gmail.com>
@RatulDawar
Copy link
Copy Markdown
Contributor Author

@2010YOUY01 made the comment changes

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 29, 2026

CI failure is a new check added in

Working on resolving it

@rluvaton
Copy link
Copy Markdown
Member

aligned with main since the fix for the detect breaking changes is now resolved, sorry for the trouble

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 30, 2026

Thanks @rluvaton

@alamb alamb added this pull request to the merge queue Apr 30, 2026
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 30, 2026

Thanks @RatulDawar and @2010YOUY01

Merged via the queue into apache:main with commit c2824b5 Apr 30, 2026
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants