[FLINK-39715] [table-planner] Fix IndexOutOfBoundsException in FlinkExpandConversionRule for global aggregate after sort #28218

Open
arvindKandpal-ksolves wants to merge 1 commit into apache:master from arvindKandpal-ksolves:FLINK-39715


@arvindKandpal-ksolves

What is the purpose of the change

This pull request fixes a query planner crash (IndexOutOfBoundsException) that occurs in batch mode when a global aggregate function (e.g., MAX, MIN, COUNT) is applied on top of a table that was previously sorted with ORDER BY (FLINK-39715).

The root cause was that FlinkExpandConversionRule.satisfyCollation unconditionally created a BatchPhysicalSort with the original collation trait, whose field indices refer to the input of the sort, without checking that those indices are still within the bounds of the new node's row type (where the field count shrinks to 1 because of the global aggregation). This PR adds an explicit bounds check on the required collation's field indices before attempting to satisfy the collation trait.
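The failure mode described above can be illustrated with a minimal, planner-free sketch. It is not Flink or Calcite code: the class `StaleCollationSketch`, the method `fieldForSortKey`, and the field names are hypothetical stand-ins, and plain lists of field names play the role of row types.

```java
import java.util.List;

// Hypothetical illustration only; none of these names exist in Flink.
final class StaleCollationSketch {

    // Stand-in for resolving the field that a collation key points at.
    // List.get throws IndexOutOfBoundsException when the index is out of range.
    static String fieldForSortKey(List<String> rowTypeFields, int sortKeyIndex) {
        return rowTypeFields.get(sortKeyIndex);
    }

    public static void main(String[] args) {
        // Assumed row type of the sorted input: three fields.
        List<String> sortedInputFields = List.of("id", "name", "score");
        // Assumed row type after a global aggregate: a single field.
        List<String> aggregateFields = List.of("max_score");

        int sortKeyIndex = 2; // collation key chosen against the sorted input

        // Valid against the row type the collation was created for.
        System.out.println(fieldForSortKey(sortedInputFields, sortKeyIndex));

        // The same index applied to the aggregate's row type is out of bounds.
        try {
            fieldForSortKey(aggregateFields, sortKeyIndex);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("collation index " + sortKeyIndex
                    + " does not exist in a row type with "
                    + aggregateFields.size() + " field(s)");
        }
    }
}
```

The point of the sketch is only that a collation index is meaningful relative to one specific row type; reusing it against a narrower row type fails unless it is validated first.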

Brief change log

  • Modified FlinkExpandConversionRule.satisfyCollation to validate that all field indices in requiredCollation are strictly less than the current node's getRowType.getFieldCount. Returns null if the validation fails.
  • Modified FlinkExpandConversionRule.satisfyTraitsBySelf to gracefully return and skip conversion if satisfyCollation returns null.
  • Added ExpandConversionRuleFixTest.java under org.apache.flink.table.planner.plan.rules.physical to verify the fix and protect against future regressions.
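The first two items in the change log can be sketched roughly as follows. This is an assumption-laden illustration in plain Java, not the actual patch: the real rule operates on Calcite collation traits and relational nodes, whereas here `indicesFit`, `satisfyCollationSketch`, and `convertOrSkip` are hypothetical names, a list of integers stands in for the required collation's field indices, and a string stands in for the created sort node.

```java
import java.util.List;

// Hypothetical illustration of the guard-and-skip pattern; not Flink code.
final class CollationBoundsSketch {

    // True only if every required index is strictly less than the field count.
    static boolean indicesFit(List<Integer> requiredFieldIndices, int fieldCount) {
        for (int index : requiredFieldIndices) {
            if (index >= fieldCount) {
                return false;
            }
        }
        return true;
    }

    // Mirrors the described contract: return null when validation fails,
    // otherwise return a stand-in for the sort node that would be created.
    static String satisfyCollationSketch(List<Integer> requiredFieldIndices, int fieldCount) {
        if (!indicesFit(requiredFieldIndices, fieldCount)) {
            return null;
        }
        return "Sort(keys=" + requiredFieldIndices + ")";
    }

    // Mirrors the caller side: a null result means the conversion is skipped.
    static String convertOrSkip(List<Integer> requiredFieldIndices, int fieldCount) {
        String sorted = satisfyCollationSketch(requiredFieldIndices, fieldCount);
        return sorted == null ? "skipped" : sorted;
    }

    public static void main(String[] args) {
        // Index 2 against a one-field row type: conversion is skipped.
        System.out.println(convertOrSkip(List.of(2), 1));
        // Index 0 against a one-field row type: a sort can be created.
        System.out.println(convertOrSkip(List.of(0), 1));
    }
}
```

Returning null and letting the caller skip keeps the check in one place; whether the real rule signals failure the same way internally is described in the change log above, but the surrounding details here are assumed.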

Verifying this change

This change added tests and can be verified as follows:

  • Added a new unit test ExpandConversionRuleFixTest#testOrderByWithGlobalAggregate which reproduces the exact batch mode pipeline (ORDER BY followed by global aggregate) and asserts that the planner optimizes the execution plan successfully without throwing an exception.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

@flinkbot
Collaborator

flinkbot commented May 21, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

…pandConversionRule for global aggregate after sort
