[fix](regression) fix stale expected output for one_level_nestedtypes_with_s3data #62488

Open
Mryange wants to merge 1 commit into apache:master from Mryange:update-out-dev-4.14

Conversation

@Mryange
Contributor

Mryange commented Apr 14, 2026

What problem does this PR solve?

Issue Number: N/A

Problem Summary:

The regression test datatype_p0/nested_types/base_cases/one_level_nestedtypes_with_s3data was failing with a CHAR result mismatch on the order_qt_sql_s3 tag.

Root-cause analysis:

  1. The S3 source files (one_level_array.parquet, .orc, .csv) in oss://doris-regression-hk/regression/datalake/ have not changed since 2024-07-27 (confirmed via ossutil stat). The parquet c_bool column is list<bool> (verified with pyarrow). The column type (array<boolean>) in the test plugin has also never changed.
  2. On 2025-09-12, commit 074d88b (PR #55896, "[fix](nested-type) fix cases from s3") added WHERE k1 IS NOT NULL to the query and regenerated the .out file. At that time, Doris had a bug in reading parquet list<bool> values inside nested arrays, which produced incorrect boolean values (e.g. [0, 0, 1, 0, ...] instead of the correct [0, 0, 0, 1, 1, ...]).
  3. On 2025-12-16, commit 0031179b1e6 (PR #58785, "[fix](parquet)fix parquet topn lazy mat complex data error result") refactored ColumnChunkReader to use IN_COLLECTION/OFFSET_INDEX template parameters, giving nested-array columns a distinct and correct read path. This fix incidentally corrected boolean array reading for columns like c_bool.
  4. After PR #58785 landed, Doris reads parquet list<bool> correctly (matching pyarrow), but the .out file was never updated, so the test fails.
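
The discrepancy described in step 2 can be pinpointed with an element-wise comparison. The sketch below is illustrative only: `first_mismatch` is a hypothetical helper, not part of the Doris regression framework, and the two lists hold only the prefixes quoted above (the elided elements are not reproduced).

```python
from itertools import zip_longest
from typing import Optional, Sequence


def first_mismatch(stale: Sequence[int], reference: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing element, or None if both match.

    A length difference counts as a mismatch at the first missing position.
    """
    for index, (a, b) in enumerate(zip_longest(stale, reference)):
        if a != b:
            return index
    return None


# Prefixes quoted in the PR description; the real rows are longer.
stale_prefix = [0, 0, 1, 0]          # value recorded in the old .out file
reference_prefix = [0, 0, 0, 1, 1]   # value reported as correct (matches pyarrow)

print(first_mismatch(stale_prefix, reference_prefix))  # prints 2
```

Running the same comparison against a reference decode of the unchanged S3 file is one way to confirm that the golden file, not the data, is what drifted.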

Fix: force-regenerate one_level_nestedtypes_with_s3data.out using the current correct Doris behavior against the unchanged S3 data.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Apr 14, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@github-actions
Contributor

Possible file(s) that should be tracked in LFS detected: 🚨

The following file(s) exceed the file size limit of 1048576 bytes, as set in the .yml configuration files:

  • regression-test/data/datatype_p0/nested_types/base_cases/one_level_nestedtypes_with_s3data.out

Consider using git-lfs to manage large files.
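
A size check of this kind can be reproduced locally before pushing. The sketch below is an assumption about how such a check might look, not the bot's actual implementation; `files_over_limit` is a hypothetical helper, and only the 1048576-byte limit comes from the message above.

```python
import os
from typing import Iterable, List

# Limit quoted in the bot comment (1 MiB).
LFS_LIMIT_BYTES = 1048576


def files_over_limit(paths: Iterable[str], limit: int = LFS_LIMIT_BYTES) -> List[str]:
    """Return the paths whose on-disk size is strictly greater than the limit."""
    return [path for path in paths if os.path.getsize(path) > limit]
```

For example, calling `files_over_limit` on a list containing the regenerated `.out` path would return that path if the file is larger than the limit.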

github-actions bot added the lfs-detected! label (Warning Label for use when LFS is detected in the commits of a Pull Request) on Apr 14, 2026
@Mryange
Contributor Author

Mryange commented Apr 14, 2026

run buildall

@Mryange
Contributor Author

Mryange commented Apr 14, 2026

/review

Contributor

github-actions bot left a comment

No blocking issues found.

Critical checkpoints:

  • Goal of the task: Refresh the stale golden file for regression-test/suites/datatype_p0/nested_types/base_cases/one_level_nestedtypes_with_s3data.groovy after earlier engine-side behavior changes. The PR does that by updating the consumed expected-output file regression-test/data/datatype_p0/nested_types/base_cases/one_level_nestedtypes_with_s3data.out.
  • Is the change small, clear, and focused: Yes. The PR only updates the generated regression result file for the affected suite.
  • Concurrency: Not applicable. No concurrent code path or locking change is involved.
  • Lifecycle management: Not applicable. No lifecycle-sensitive objects, static initialization, or ownership changes are involved.
  • Configuration: None added or changed.
  • Compatibility: No FE/BE protocol, storage format, or rolling-upgrade compatibility surface is changed.
  • Parallel code paths: The relevant regression suite already covers the affected S3 nested-type read path; no missing sister-path code change is visible in this PR because this PR is only updating expected results.
  • Special conditional checks: Not applicable.
  • Test coverage: The existing regression suite already exercises this path. No new suite logic was needed for an expected-result refresh.
  • Test result modification: Yes. The updated .out file is the artifact consumed by the regression framework for this suite. I did not find formatting or framework-path issues in the checked-in result file layout.
  • Observability: Not applicable for an expected-output refresh.
  • Transaction and persistence: Not applicable.
  • Data writes and modifications: Not applicable beyond regression golden data.
  • FE/BE variable passing: None.
  • Performance: Not applicable for this PR.
  • Other issues: No blocking issue found in the submitted change itself.

Residual risk:

  • I did not independently rerun the S3-backed regression in this runner, so correctness is inferred from the existing suite wiring and the PR rationale rather than reproduced end to end in this review session.
