[core] Support manifest sort feature when commit by discivigour · Pull Request #7842 · apache/paimon

discivigour · 2026-05-13T11:41:48Z

Purpose

Add the manifest-sort.enabled option, which enables manifest sorting.

Tests

ManifestFileMetaTest.testManifestSortWithOverlappingPartitions()

leaves12138 · 2026-05-19T02:47:43Z

+    /**
+     * Compares the value at field {@code k} of two {@link BinaryRow}s according to {@code type}.
+     */
+    static int compareField(BinaryRow a, BinaryRow b, int k, DataType type) {


Why not use CodeGenUtils.newRecordComparator?

Nice point.

leaves12138 · 2026-05-19T02:57:25Z

+                }
+            }
+
+            if (!addedToExisting) {


Do not use the boolean addedToExisting.

Just write:
List earliestRun = runs.poll();
if (earliestRun == null) {
    // do something
} else if (compare(xxx) > 0) {
    // do something
} else {
    // do something
}

That makes the code cleaner.

leaves12138 · 2026-05-19T03:03:02Z

+                                last.partitionStats().maxValues(),
+                                sortFieldIndex,
+                                sortFieldType)
+                        >= 0) {


If the two bounds are equal, the files within one run overlap.

I designed it this way so that the minimum number of sorted runs is built, which reduces the sorting burden.
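To make the trade-off in this thread concrete, here is a minimal sketch of a greedy run builder that follows the three-branch structure suggested above. It is not the PR's code: `SortedRunSketch`, `Range`, `buildRuns`, and the `allowTouching` flag are hypothetical names, and partition values are simplified to plain ints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class SortedRunSketch {

    /** Hypothetical stand-in for a manifest file's [min, max] range on the sort field. */
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Greedily packs ranges into runs. With allowTouching = true a file joins a run when
     * file.min >= last.max; with false it needs file.min > last.max (closed intervals).
     */
    static List<List<Range>> buildRuns(List<Range> input, boolean allowTouching) {
        List<Range> files = new ArrayList<>(input);
        files.sort(Comparator.comparingInt((Range r) -> r.min).thenComparingInt(r -> r.max));

        // Runs ordered by the max bound of their last file, smallest first.
        PriorityQueue<List<Range>> runs =
                new PriorityQueue<>(
                        Comparator.comparingInt(
                                (List<Range> run) -> run.get(run.size() - 1).max));

        for (Range file : files) {
            List<Range> earliestRun = runs.poll();
            if (earliestRun == null) {
                runs.add(newRun(file));
            } else if (fits(file, earliestRun, allowTouching)) {
                earliestRun.add(file);
                runs.add(earliestRun);
            } else {
                // Even the run that ends earliest overlaps this file, so open a new run.
                runs.add(earliestRun);
                runs.add(newRun(file));
            }
        }
        return new ArrayList<>(runs);
    }

    private static boolean fits(Range file, List<Range> run, boolean allowTouching) {
        int cmp = Integer.compare(file.min, run.get(run.size() - 1).max);
        return allowTouching ? cmp >= 0 : cmp > 0;
    }

    private static List<Range> newRun(Range file) {
        List<Range> run = new ArrayList<>();
        run.add(file);
        return run;
    }

    public static void main(String[] args) {
        List<Range> files = Arrays.asList(new Range(1, 3), new Range(3, 5), new Range(6, 8));
        System.out.println(buildRuns(files, true).size());  // 1 run, [1,3] and [3,5] share value 3
        System.out.println(buildRuns(files, false).size()); // 2 runs, no shared endpoints
    }
}
```

With `>=` the three files fit into one run whose neighbours share the endpoint 3. With `>` the same input needs two runs, and no two files in a run share a partition value.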

leaves12138

Thanks for the update. I took another pass over the latest revision, and I think there are still a few issues that should be fixed before merging.

1. The boundary condition for interval overlap still looks wrong.

In ManifestFileSorter.buildLevelSortedRuns, a file is appended to an existing run when file.min >= last.max. In splitIntoSections, a new section is also started when file.min >= sectionMaxBound.

However, partition stats represent closed intervals. For example, [1, 3] and [3, 5] still overlap at partition value 3. A sorted run is documented as containing non-overlapping intervals, so this case should not be placed into the same run. Similarly, sections are used as overlap-connected rewrite units, so this case should not be split into different sections either. I think both checks should use > 0, not >= 0, and we should add a test for the max == min boundary case.

2. manifest-sort.enabled currently bypasses the original manifest compaction trigger/gate.

ManifestFileMerger.merge directly enters ManifestFileSorter.trySortRewrite and returns from that path when manifest sort is enabled. This means the original manifest.full-compaction-threshold-size and manifest.merge-min-count behavior is no longer applied in the same way. Inside classifyManifests, files are classified only by fileSize < targetSize or delete-range overlap, so small manifests / delete manifests can trigger sort rewrite more aggressively than the existing merge logic.

If this is intentional, I think the new semantics should be documented very clearly. Otherwise, the sort path should preserve the existing full/minor compaction gates, especially manifest.merge-min-count for minor manifest merging.

3. The partial rewrite path for manifest-sort.max-rewrite-size can break the output order.

When the first section exceeds the rewrite budget, rewriteSections splits it into rewriteFiles and remainingFiles, rewrites the first part, and appends the remaining section to the end of the sections list. If there are later sections with larger key ranges, the remaining part of the current section will be emitted after them, which can produce an order like 0..10, 20..30, 10..20.

To keep the manifest list sorted, I think we should either skip the whole section once the budget is exceeded, or keep the remaining section at the current position/order instead of appending it to the tail. This also needs a regression test.

4. Test coverage is still missing some important edge cases.

The new tests cover large overlapping ranges and delete elimination, but I do not see coverage for boundary-touching intervals (max == min), manifest.merge-min-count / full threshold behavior under manifest-sort.enabled, manifest-sort.max-rewrite-size preserving global output order, or null partition values. The switch to RecordComparator is a good improvement, but a null partition test would make this safer.
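The ordering problem in the partial rewrite path can be reproduced with a small model. The sketch below is illustrative only and is not the PR's rewriteSections: `PartialRewriteSketch`, `FileStub`, `emit`, and the `appendRemainderToTail` flag are hypothetical, key ranges are plain ints, and a "rewrite" is modelled as merging the chosen files into one range label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PartialRewriteSketch {

    /** Hypothetical stand-in for a manifest file: a key range plus a size in bytes. */
    static final class FileStub {
        final int min;
        final int max;
        final long size;

        FileStub(int min, int max, long size) {
            this.min = min;
            this.max = max;
            this.size = size;
        }

        String label() {
            return min + ".." + max;
        }
    }

    /**
     * Returns the output order as range labels. Sections are assumed to be sorted by key
     * range, and the files inside a section by min.
     */
    static List<String> emit(
            List<List<FileStub>> sections, long budget, boolean appendRemainderToTail) {
        List<String> output = new ArrayList<>();
        List<FileStub> deferred = new ArrayList<>();
        long used = 0;
        boolean reachedLimit = false;

        for (List<FileStub> section : sections) {
            if (reachedLimit) {
                // Budget exhausted: later sections pass through untouched.
                for (FileStub f : section) {
                    output.add(f.label());
                }
                continue;
            }
            // Take the longest prefix of the section that still fits the budget.
            int split = 0;
            while (split < section.size() && used + section.get(split).size <= budget) {
                used += section.get(split).size;
                split++;
            }
            if (split > 0) {
                output.add(mergedLabel(section.subList(0, split)));
            }
            List<FileStub> remaining = section.subList(split, section.size());
            if (!remaining.isEmpty()) {
                reachedLimit = true;
                if (appendRemainderToTail) {
                    deferred.addAll(remaining);
                } else {
                    // Keep the remainder at its current position.
                    for (FileStub f : remaining) {
                        output.add(f.label());
                    }
                }
            }
        }
        for (FileStub f : deferred) {
            output.add(f.label());
        }
        return output;
    }

    private static String mergedLabel(List<FileStub> files) {
        int min = files.get(0).min;
        int max = files.get(0).max;
        for (FileStub f : files) {
            min = Math.min(min, f.min);
            max = Math.max(max, f.max);
        }
        return min + ".." + max;
    }

    public static void main(String[] args) {
        List<List<FileStub>> sections =
                Arrays.asList(
                        Arrays.asList(
                                new FileStub(0, 5, 10),
                                new FileStub(5, 10, 10),
                                new FileStub(10, 20, 100)),
                        Arrays.asList(new FileStub(20, 30, 10)));
        System.out.println(emit(sections, 20, true));  // [0..10, 20..30, 10..20]
        System.out.println(emit(sections, 20, false)); // [0..10, 10..20, 20..30]
    }
}
```

Appending the remainder to the tail reproduces the 0..10, 20..30, 10..20 order described in point 3. Emitting it in place keeps the list sorted at the cost of leaving that remainder unrewritten for this round.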

discivigour · 2026-05-19T08:21:48Z

Thanks for your comment.

1. I designed it this way so that the minimum number of sorted runs is built, which reduces the sorting burden.
2. I have introduced manifestFullCompactionThresholdSize to reduce the cases where a single delete entry causes large-scale file rewriting.
3. I don't think it is a problem.
4. I will add more tests.

leaves12138

LGTM.
Please check these two questions:
1. If a sorted run contains overlapping files, is that still correct?
2. If a delete arrives, how is it handled?
Nothing else from me.

discivigour · 2026-05-20T02:23:12Z

1. If the file endpoints in a SortedRun overlap, it does not affect the subsequent reconstruction of the LSM tree. In addition, when users filter by partition, they will read only 1-2 more files.
2. If a delete entry arrives, we first check whether manifestFullCompactionThresholdSize has been reached. If it has, all deletes are eliminated. If it has not, the metas related to the delete entry's partition are retained and do not participate in sorting, which prevents the order of adds and deletes from being disrupted.
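The delete handling described in this reply could look roughly like the following. This is a sketch under stated assumptions rather than the PR's classification logic: `DeleteHandlingSketch`, `ManifestStub`, `Plan`, and `classify` are hypothetical, each manifest is simplified to a single partition, and the trigger is assumed to sum the sizes of manifests that carry delete entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DeleteHandlingSketch {

    /** Hypothetical stand-in for a manifest file; real manifests can span several partitions. */
    static final class ManifestStub {
        final String name;
        final String partition;
        final long fileSize;
        final boolean hasDelete;

        ManifestStub(String name, String partition, long fileSize, boolean hasDelete) {
            this.name = name;
            this.partition = partition;
            this.fileSize = fileSize;
            this.hasDelete = hasDelete;
        }
    }

    /** Which files take part in sorting and which are left in place. */
    static final class Plan {
        final boolean triggerFullCompact;
        final List<ManifestStub> toSort = new ArrayList<>();
        final List<ManifestStub> retained = new ArrayList<>();

        Plan(boolean triggerFullCompact) {
            this.triggerFullCompact = triggerFullCompact;
        }
    }

    static Plan classify(List<ManifestStub> input, long fullCompactionThresholdSize) {
        // Assumption: only manifests that carry delete entries count toward the trigger.
        long totalDeltaFileSize = 0;
        Set<String> deletePartitions = new HashSet<>();
        for (ManifestStub file : input) {
            if (file.hasDelete) {
                totalDeltaFileSize += file.fileSize;
                deletePartitions.add(file.partition);
            }
        }

        Plan plan = new Plan(totalDeltaFileSize >= fullCompactionThresholdSize);
        for (ManifestStub file : input) {
            if (!plan.triggerFullCompact && deletePartitions.contains(file.partition)) {
                // Below the threshold: keep ADD and DELETE for this partition in their order.
                plan.retained.add(file);
            } else {
                // Full compaction (deletes get eliminated) or a partition without deletes.
                plan.toSort.add(file);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        List<ManifestStub> input =
                Arrays.asList(
                        new ManifestStub("m1", "a", 10, false),
                        new ManifestStub("m2", "a", 5, true),
                        new ManifestStub("m3", "b", 10, false));
        Plan minor = classify(input, 100);
        System.out.println(minor.retained.size() + " retained, " + minor.toSort.size() + " sorted");
        Plan full = classify(input, 5);
        System.out.println(full.retained.size() + " retained, " + full.toSort.size() + " sorted");
    }
}
```

Below the threshold only the manifest for partition b is sorted, while both manifests for partition a stay where they are. Once the threshold is reached, every manifest takes part in the rewrite.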

leaves12138 · 2026-05-20T05:00:27Z

+                totalDeltaFileSize += file.fileSize();
+            }
+        }
+        boolean removeAllDelete = totalDeltaFileSize >= sizeTrigger;


Rename to triggerFullCompact

leaves12138 · 2026-05-20T05:08:54Z

+        Map<ManifestFileMeta, Boolean> defaultCompactionManifests = new LinkedHashMap<>();
+        List<ManifestFileMeta> lsmFiles = new LinkedList<>(input);
+        Set<FileEntry.Identifier> deleteEntries =
+                FileEntry.readDeletedEntries(manifestFile, input, manifestReadParallelism);


Why read the deletes every time? If it is not a full compaction, do we still need to read all the deletes?

Split full compaction and minor compaction; do not mix them. Refer to ManifestFileMerger.merge.

Thanks, I will split full compaction and minor compaction.

JingsongLi

Review: [core] Support manifest sort feature when commit

Overall this is a substantial feature that models manifest files like an LSM tree and applies sorted compaction by partition field. The design is well-structured with clear separation of concerns (ManifestFileSorter, ManifestPickStrategy, ManifestAdjacentSortedRun). Below are findings that I believe warrant discussion.

Correctness Concerns

1. Modifying sections list during iteration (ManifestFileSorter.rewriteSections)

for (int i = 0; i < sections.size(); i++) {
    ...
    if (!remainingFiles.isEmpty()) {
        Section remainingSection = new Section(remainingFiles, remainingSize, remainingHasDefault);
        sections.add(remainingSection);  // <-- adds to the list being iterated
    }
    reachedLimit = true;
}

This works because sections.size() is re-evaluated each iteration, but it is fragile and difficult to reason about. If a future change introduces another path that adds to sections, this could loop indefinitely. Consider collecting remaining sections separately and processing them after the main loop.

2. Boundary equality semantics inconsistency

In buildLevelSortedRuns, min == max (boundary equality) means files are non-overlapping and placed in the same SortedRun:

// >= 0 means non-overlapping
if (fieldComparator.compare(file.min, earliestRun.last.max) >= 0) { ... }

But in splitIntoSections, min == sectionMaxBound causes a new section to be created:

// >= 0 means separate sections
if (fieldComparator.compare(file.min, sectionMaxBound) >= 0) { ... }

The two algorithms disagree on what "boundary equality" means. This inconsistency could cause a single SortedRun's files to be split across multiple sections. While this may not cause data loss, it could reduce compaction effectiveness and merits a clear comment explaining the deliberate divergence.
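As an illustration of how the boundary rule changes section splitting, the sketch below sweeps intervals sorted by min and starts a new section only when the next file lies beyond the current bound. It is a simplified model, not the PR's splitIntoSections: `SectionSplitSketch`, `Interval`, and the `closedIntervals` flag are hypothetical, and partition values are plain ints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SectionSplitSketch {

    /** Hypothetical stand-in for a manifest file's [min, max] partition range. */
    static final class Interval {
        final int min;
        final int max;

        Interval(int min, int max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Groups overlap-connected intervals into sections. With closedIntervals = true, touching
     * endpoints count as overlap (file.min > bound starts a new section); with false, the
     * check is file.min >= bound, as in the snippet quoted above.
     */
    static List<List<Interval>> splitIntoSections(List<Interval> input, boolean closedIntervals) {
        List<Interval> files = new ArrayList<>(input);
        files.sort(Comparator.comparingInt((Interval f) -> f.min));

        List<List<Interval>> sections = new ArrayList<>();
        List<Interval> current = null;
        int sectionMaxBound = 0;

        for (Interval file : files) {
            boolean beyondBound =
                    closedIntervals ? file.min > sectionMaxBound : file.min >= sectionMaxBound;
            if (current == null || beyondBound) {
                current = new ArrayList<>();
                sections.add(current);
                sectionMaxBound = file.max;
            } else {
                sectionMaxBound = Math.max(sectionMaxBound, file.max);
            }
            current.add(file);
        }
        return sections;
    }

    public static void main(String[] args) {
        List<Interval> files =
                Arrays.asList(new Interval(1, 3), new Interval(3, 5), new Interval(7, 9));
        System.out.println(splitIntoSections(files, true).size());  // 2 sections
        System.out.println(splitIntoSections(files, false).size()); // 3 sections
    }
}
```

Under the closed-interval rule, [1, 3] and [3, 5] stay in one section because both contain the value 3. Under the `>=` rule they are split. Whichever rule is chosen, run building and section splitting read more consistently if they share one predicate.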

3. ManifestAdjacentSortedRun mutable level field used in equals/hashCode

setLevel() mutates the object after construction, and level participates in equals()/hashCode(). These objects are placed into a HashSet<ManifestAdjacentSortedRun> pickedSet. If setLevel() were ever called after insertion into the set, lookups would silently break. Current code assigns levels before building the set, so it works today, but this is a latent bug waiting to happen. Consider making level part of a builder/factory pattern or removing it from equals/hashCode.
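The hazard is easy to reproduce outside Paimon. In the sketch below, `MutableRun` and `StableRun` are hypothetical stand-ins for ManifestAdjacentSortedRun: the first includes the mutable level in equals/hashCode, the second keys only on an immutable id.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class HashSetPitfallSketch {

    /** Level participates in equals/hashCode, mirroring the concern raised above. */
    static final class MutableRun {
        private final String id;
        private int level;

        MutableRun(String id, int level) {
            this.id = id;
            this.level = level;
        }

        void setLevel(int level) {
            this.level = level;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof MutableRun)) {
                return false;
            }
            MutableRun other = (MutableRun) o;
            return id.equals(other.id) && level == other.level;
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, level);
        }
    }

    /** Identity depends only on the immutable id, so changing the level is harmless. */
    static final class StableRun {
        private final String id;
        private int level;

        StableRun(String id, int level) {
            this.id = id;
            this.level = level;
        }

        void setLevel(int level) {
            this.level = level;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof StableRun && id.equals(((StableRun) o).id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    /** Returns true when the set can no longer find the element it still holds. */
    static boolean lostAfterMutation() {
        Set<MutableRun> pickedSet = new HashSet<>();
        MutableRun run = new MutableRun("run-0", 0);
        pickedSet.add(run);
        run.setLevel(1); // hash code changes while the object sits in the set
        return !pickedSet.contains(run);
    }

    /** Returns true when the lookup still succeeds after the level changes. */
    static boolean foundAfterMutation() {
        Set<StableRun> pickedSet = new HashSet<>();
        StableRun run = new StableRun("run-0", 0);
        pickedSet.add(run);
        run.setLevel(1);
        return pickedSet.contains(run);
    }

    public static void main(String[] args) {
        System.out.println(lostAfterMutation());  // true
        System.out.println(foundAfterMutation()); // true
    }
}
```

Dropping level from equals/hashCode is one option. Making level final and assigning it at construction is another, and either removes the dependency on call order.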

Design Observations

4. Reusing maxSizeAmplificationPercent and sortedRunSizeRatio from data-file compaction

These options (compaction.max-size-amplification-percent, compaction.size-ratio) were designed for data-file universal compaction. Reusing them for manifest file LSM compaction couples two unrelated subsystems. The optimal ratio for data files (e.g., default 200%) is likely not appropriate for manifest files, which have very different I/O characteristics. Consider introducing dedicated options (e.g., manifest-sort.size-amplification-percent, manifest-sort.size-ratio) with appropriate defaults.

5. ManifestFileMerger.merge() API change

The signature change from explicit parameters to CoreOptions makes the API less explicit and harder to unit-test in isolation. The workaround in compactManifestOnce():

Options compactOptions = Options.fromMap(options.toMap());
compactOptions.set(CoreOptions.MANIFEST_MERGE_MIN_COUNT, 1);
compactOptions.set(CoreOptions.MANIFEST_FULL_COMPACTION_FILE_SIZE, MemorySize.ofBytes(1));

copies the entire options map just to override two values, which is heavier than the previous approach of passing parameters directly. An overload that preserves the old signature (calling into the new one) would avoid this.

6. Memory pressure with large manifest sets

sortAndRewriteFull and sortAndRewriteMinor read all entries from all picked files into memory for sorting. While maxRewriteSize bounds total file size, manifest files can contain a very large number of entries. For tables with millions of data files, this could cause significant GC pressure or OOM. Consider a streaming merge-sort or spill-to-disk approach for very large sections, or at minimum document the memory implication in the option description.
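One way to picture the streaming alternative: if each picked input is already sorted, a k-way merge needs to hold only one pending element per input rather than all entries at once. The sketch below is a generic illustration with hypothetical names (`KWayMergeSketch`, `Head`, `merge`). It does not use Paimon's reader or writer classes, and it assumes the inputs are individually sorted, which unsorted manifests would first need a sort or spill step to achieve.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

final class KWayMergeSketch {

    /** The next pending element of one input, plus the iterator it came from. */
    private static final class Head<T> {
        final T value;
        final Iterator<T> source;

        Head(T value, Iterator<T> source) {
            this.value = value;
            this.source = source;
        }
    }

    /** Lazily merges already-sorted inputs; memory use is one element per input. */
    static <T> Iterator<T> merge(List<Iterator<T>> sortedInputs, Comparator<? super T> order) {
        PriorityQueue<Head<T>> heap =
                new PriorityQueue<>((a, b) -> order.compare(a.value, b.value));
        for (Iterator<T> input : sortedInputs) {
            if (input.hasNext()) {
                heap.add(new Head<>(input.next(), input));
            }
        }
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !heap.isEmpty();
            }

            @Override
            public T next() {
                Head<T> head = heap.poll();
                if (head == null) {
                    throw new NoSuchElementException();
                }
                // Refill the heap from the input that just produced the smallest element.
                if (head.source.hasNext()) {
                    heap.add(new Head<>(head.source.next(), head.source));
                }
                return head.value;
            }
        };
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> inputs = new ArrayList<>();
        inputs.add(Arrays.asList(1, 4, 7).iterator());
        inputs.add(Arrays.asList(2, 5).iterator());
        inputs.add(Arrays.asList(3, 6, 8).iterator());

        Iterator<Integer> merged = merge(inputs, Comparator.<Integer>naturalOrder());
        List<Integer> out = new ArrayList<>();
        while (merged.hasNext()) {
            out.add(merged.next());
        }
        System.out.println(out); // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```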

Minor Issues

7. resolveSortField will throw IndexOutOfBoundsException if called with an empty partitionType and null sortPartitionField. The caller guards against this (partitionType.getFieldCount() > 0), but the method itself is package-private and could be called from tests or future code without that guard. A defensive check would be safer.

8. ManifestPickStrategy.MAX_LEVEL = 4 is hardcoded. The number of levels directly affects compaction aggressiveness. Consider making this configurable or at least documenting why 4 is the right constant.

9. Visibility changes in ManifestFileMerger (computeDeletePartitions, FullCompactionReadResult from private to package-private) should be documented as intentional cross-class sharing rather than left implicit.
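For item 7, a defensive check could look like the sketch below. The signature is hypothetical: the real resolveSortField works on Paimon's partition type, whereas this version takes a plain list of field names. The fallback to the first partition field when no sort field is configured is an assumption inferred from the IndexOutOfBoundsException described above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ResolveSortFieldSketch {

    /** Returns the index of the partition field to sort by, failing fast on bad input. */
    static int resolveSortField(List<String> partitionFields, String sortPartitionField) {
        if (partitionFields.isEmpty()) {
            throw new IllegalArgumentException(
                    "Manifest sort requires at least one partition field.");
        }
        if (sortPartitionField == null) {
            // Assumption: default to the first partition field when none is configured.
            return 0;
        }
        int index = partitionFields.indexOf(sortPartitionField);
        if (index < 0) {
            throw new IllegalArgumentException(
                    "Unknown partition field for manifest sort: " + sortPartitionField);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(resolveSortField(Arrays.asList("dt", "hr"), null)); // 0
        System.out.println(resolveSortField(Arrays.asList("dt", "hr"), "hr")); // 1
        try {
            resolveSortField(Collections.<String>emptyList(), null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```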

Test Coverage

The tests cover the key scenarios (overlapping partitions, DELETE elimination, multi-field partitions, schema validation). However, I'd suggest adding:

A test that verifies behavior when maxRewriteSize is hit mid-section (the partial rewrite path).
A test for the minor compaction path where ADD+DELETE pairs span different sections to verify no data loss.
A test with a single-file input to verify no unnecessary rewrites.

Overall the feature is well-thought-out. The main actionable items are the boundary equality inconsistency (#2), the mutable-hashCode issue (#3), and the option reuse concern (#4). Nice work on the LSM-based approach for manifest organization.

discivigour force-pushed the j/manifest5 branch from f3af919 to 8a6e759 on May 18, 2026 12:55

leaves12138 reviewed May 19, 2026

umi added 26 commits May 19, 2026 13:21

proto

28dfa8e

batch externalSort fix add manifest sort to compact job addTest review mvMorax fix spi proto proto fix fix # Conflicts: # paimon-core/src/main/java/org/apache/paimon/operation/ManifestFileMerger.java

fix

d8f515f

addTest

987d3d2

spotless

c499423

fix

d6c3863

fix

256f0a2

fix

448fc50

rm

f32d3b6

fx

39230db

fix

179276c

# Conflicts: # paimon-core/src/test/java/org/apache/paimon/schema/SchemaValidationTest.java # Conflicts: # paimon-core/src/test/java/org/apache/paimon/schema/SchemaValidationTest.java

fix

643e0f2

fix

eff2865

fix

cd36036

rmMinorComp

e472063

test

42fbcc7

fix

4f889c9

rmTrigger

a718aa0

jili

0604be9

fix

dd85e5a

rmPrint

3ca0c5b

simplied

f272a28

fix

16ad162

rmOpenFileCost

c04c378

spotless

4fd0d05

fix

d2aca05

fmt

115e2e6

discivigour force-pushed the j/manifest5 branch from 67cb0e4 to ac1ba13 on May 19, 2026 05:22

leaves12138 requested changes May 19, 2026

umi added 5 commits May 19, 2026 16:23

deleteTrigger

d95fa08

addSmall

586da76

test

fd262ee

doc

30907ab

comment

b1fc5eb

discivigour marked this pull request as ready for review May 19, 2026 10:15

umi added 2 commits May 19, 2026 19:45

modifyTests

91e22a1

fmt

63760a7

leaves12138 reviewed May 19, 2026

leaves12138 reviewed May 20, 2026

umi added 12 commits May 20, 2026 15:34

index

6a92402

split

e885973

fix

4504646

comment

b76c897

splitSortAndRewriteSection

0cb6b2c

fix

ad21fa4

fix

781f678

fix

3845743

refactor

18e9578

fix

01e0af4

static

9236ce4

fix

2f76713

JingsongLi mentioned this pull request May 22, 2026

feat(core): support sort manifest entries by partition #7866

Open

minorDelete

0b36890

JingsongLi reviewed May 23, 2026
