[core] introduce Placeholder for Blob File Format by steFaiz · Pull Request #7889 · apache/paimon

steFaiz · 2026-05-18T11:16:56Z

Purpose

This is the first part of #7881
Including:

Bump Blob File Format to V2, introducing a PlaceHolder Blob.
Introduce a fallbackReader for blob to skip placeholders. This is a two-level abstraction:
a. At first, all data files will be divided according to max_seq_num
b. within each group, create a sequential reader to logically concat files and fill missing gaps. For example: If the full row range of normal files is [0, 100], but some group only have one file with range [20, 80], the output is: [0, 19] -> filled with placeholders; [20, 80] -> records from files; [81, 100] -> filled with placeholders.
c. create readers for each group, and read the blob from the max group whose value is NOT a placeholder.

The mechanism can be illustrated as below:

Tests

ITCase and Unit tests

JingsongLi · 2026-05-20T09:34:41Z

+     * The placeholder blob, mainly for blob update in data-evolution. It should never be exposed to
+     * users.
+     */
+    Blob PLACE_HOLDER =


This is strange, maybe just use NULL as place holder?

Thanks for your advise! But in #7125 we supports storing nulls in blob file. I'm not clear how to distinguish placeholders and native NULLs if so.

From the semantics, NULLs are exposed to users, users know that they store some nulls. But placeholders are fully internal used, users should never be aware about them. If users set some rows as nulls, we may fallback those rows to earlier versions, this is not expected in our design.

Could you please give me some advise?

Perhaps you can consider using row number in blob to determine how to merge? You can just return valid blobs with row number.

The row number is actually the primary key.

I understand that you not only need this class for reading, but also for writing. If you skip these elements, the changes will be significant.

I thin you can just introduce a BlobPlaceHolder implements Blob, Serializable for this, use instance of is better.

Thanks! I'll modify my code!

JingsongLi

Review Comments for PR #7889

Overall this is a well-designed change with thorough tests. The sequence-group fallback mechanism for blob placeholders is a solid approach. A few observations:

Issues

1. Typo in comment (DataEvolutionSplitRead.java)

} else if (bunch instanceof BlobFileBunch) {
    // for blob funch, fallback on placeholders

→ "funch" should be "bunch"

2. Potential resource leak in BlobFallbackRecordReader.readBatch()

for (int i = 0; i < groupReaders.size(); i++) {
    RecordIterator<InternalRow> iterator = groupReaders.get(i).readBatch();
    if (iterator == null) {
        return null;  // ← iterators[0..i-1] are not released
    }
    iterators[i] = iterator;
}

If the k-th reader returns null, the already-obtained iterators [0, k-1] will never have releaseBatch() called. Should release them before returning null.

3. Memory pressure from ForceSingleBatchReader wrapping all group readers

Each sequence group is fully materialized into memory via ForceSingleBatchReader. When there are many sequence groups with large row ranges, this could cause significant memory pressure, especially when blob-as-descriptor is disabled and actual blob data is loaded. The TODO comment acknowledges this - just want to confirm this is acceptable for the first iteration.

4. Singleton placeholder row reuse in BlobSequenceGroupRecordReader

private InternalRow placeHolderRow() {
    if (placeholderRow == null) {
        GenericRow row = new GenericRow(readRowType.getFieldCount());
        row.setField(blobIndex, BlobPlaceholder.INSTANCE);
        placeholderRow = row;
    }
    return placeholderRow;
}

This returns the same mutable GenericRow instance for all placeholder positions. It works here because ForceSingleBatchReader copies the data, but it's fragile - a future caller that holds references to returned rows would see aliased data. Consider adding a comment noting this intentional reuse.

5. BlobFileBunch doesn't validate schemaId across files (by design?)

VectorFileBunch.add() enforces file.schemaId() == files.get(0).schemaId(), but BlobFileBunch.add() only checks writeCols. This seems intentional since blob files from different sequences naturally have different schemas, but worth confirming.

6. DataEvolutionFileReader contract relaxation

- checkArgument(readers != null && readers.length > 1, "Readers should be more than 1");
+ checkArgument(readers != null && readers.length >= 1, "should not pass empty readers.");

This relaxes the precondition for all callers of DataEvolutionFileReader, not just the blob path. Is there a case where a single reader is passed from non-blob paths? If not, consider keeping > 1 for non-blob scenarios or documenting when single-reader is expected.

Minor / Style

In BlobFallbackRecordReaderTest, the ReadResult.add() method silently treats placeholder rows differently (counts them vs collecting rowIds). This is fine for testing but the test would be more explicit if assertions on placeholder count were always paired with total row count assertions.
The fixedBlobBytes helper in BlobUpdateTest allocates 2 * 1024 * 124 bytes (≈248KB). Was 2 * 1024 * 1024 (2MB) intended? Or is this intentionally smaller to keep tests fast?

Positive

The separation of SpecialFieldBunch into BlobFileBunch and VectorFileBunch is a good refactoring - they have fundamentally different semantics now.
The backward-compatible version 1 reading test (testReadLegacyVersionOneBlobFile) is a nice addition.
The BlobSequenceGroupRecordReader javadoc with ASCII art is very helpful for understanding the complex layout.
Python-side changes properly mirror the Java changes with appropriate error handling for unsupported placeholder in read_arrow_batch.

steFaiz · 2026-05-25T07:00:51Z

@JingsongLi Thanks! 1 & 2 are addressed. Left some explaination for the rest:

Memory pressure from ForceSingleBatchReader wrapping all group readers

Yes, this is a problem, and I've mark this as TODO in my original implementation. I think it can be solved by:

introducing a discard method to discard iterator's next record. In blob, we can just move to the next blob.
reading all blobs as descriptor. But this may break the optimization introduced in [format] read blobs as bytes when blob-as-descriptor is false #6989

I'm pleasant to continuously optimize this, but I think can be introduced in future PRs.

Singleton placeholder row reuse in BlobSequenceGroupRecordReader

This is safe, because Placeholder will never be read, all method callings will throw an error

BlobFileBunch doesn't validate schemaId across files (by design?)

This is intentionally fixed in #7618

DataEvolutionFileReader contract relaxation

This is for only-blob projection situations. If only one blob field is selected, the datafiles num in each blob bunch is still
'> 1' because of update. So the code path will still go through the createUnionReader.
Refactoring the whole code is too noisy in this PR. And this relaxation do not affect correctness of efficiency. It could be optimized in separated PRs.

leaves12138

Thanks for the update. I went through the latest patch again, focusing on the placeholder encoding/decoding path and the data-evolution fallback reader.

The previous major concerns look addressed or acceptable for this iteration:

BlobFormatWriter now encodes null and placeholder distinctly, and BlobFormatReader maps placeholders back to the internal sentinel without exposing blob data accessors.
BlobFallbackRecordReader now keeps blob sequence groups aligned and releases the normal returned iterators through releaseBatch().
The fallback behavior handles row-range pushdown and overlapping sequence groups in the added tests.
The DataEvolutionFileReader one-reader relaxation makes sense for blob-only projections.

I also ran a local compile check on the patched core modules:

mvn -pl paimon-core -am -DskipTests -Dspotless.check.skip=true -Drat.skip=true -Dcheckstyle.skip=true -Dscala-2.12 -DskipITs -Dfast compile

It passes locally.

I only have two non-blocking follow-ups:

The known memory/IO overhead in BlobFallbackRecordReader is still worth optimizing later, because stale rows are fully advanced even when a newer non-placeholder row has already been found.
There are still a few small wording/typo issues in comments/messages (latestFistRowId, "offset the that row", "could not be null"), but they do not affect correctness.

LGTM from my side.

JingsongLi · 2026-05-26T08:22:44Z

 public class BlobFormatWriter implements FileAwareFormatWriter {

-    public static final byte VERSION = 1;
+    public static final byte VERSION = 2;


It is better to keep current version. This change seems tolerable.

An old version BlobReader will throws an unexpected error when reading file with placeholders.
The BlobFileMeta check nulls by -1:

public boolean isNull(int i) { return blobLengths[i] == -1; }

So in BlobFormatReader it goes to the normal read code path:

if (fileMeta.isNull(currentPosition)) { blob = null; } else { // offsets and length are invalid. long offset = fileMeta.blobOffset(currentPosition) + 4; long length = fileMeta.blobLength(currentPosition) - 16; if (in != null) { blob = Blob.fromData(readInlineBlob(in, offset, length)); } else { blob = Blob.fromFile(fileIO, filePathString, offset, length); } }

The error msg is:

Invalid blob at index 2: offset=3, length=-18 java.io.IOException: Invalid blob at index 2: offset=3, length=-18

I think we should tell users that they are reading V2 data files rather than just this vague error.

As long as it can throw exceptions without producing incorrect data, it's okay

Upgrading all engines to read Blob is a huge undertaking, do not upgrade versions easily

JingsongLi · 2026-05-26T08:27:47Z

+            // for blob bunch, fallback on placeholders
+            int blobIndex = findBlobFieldIndex(readRowType);
+            checkArgument(blobIndex >= 0, "Blob bunch read type should contain a blob field.");
+            return new BlobFallbackRecordReader(


Is there a fast bunch.files().size() == 1 channel here? When there is no need to merge, it should be a simple code path.

Thanks! I've add a shortcut that if all blob files belong to a same max_seq group, just use the concat reader to sequenatially read them

JingsongLi

Thanks @steFaiz for update. Left two comments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

JingsongLi · 2026-05-26T14:08:54Z

+1

steFaiz marked this pull request as draft May 18, 2026 11:17

steFaiz changed the title ~~[core] introduce Placeholder for Blob File Format~~ [wip][core] introduce Placeholder for Blob File Format May 18, 2026

steFaiz marked this pull request as ready for review May 19, 2026 06:19

steFaiz changed the title ~~[wip][core] introduce Placeholder for Blob File Format~~ [core] introduce Placeholder for Blob File Format May 19, 2026

JingsongLi reviewed May 20, 2026

View reviewed changes

steFaiz force-pushed the placeholder_blob branch 2 times, most recently from 355a05b to b2cc640 Compare May 22, 2026 15:34

JingsongLi reviewed May 23, 2026

View reviewed changes

leaves12138 approved these changes May 26, 2026

View reviewed changes

JingsongLi reviewed May 26, 2026

View reviewed changes

JingsongLi requested changes May 26, 2026

View reviewed changes

steFaiz added 3 commits May 26, 2026 20:00

[core] introduce Placeholder for Blob File Format

b430b24

fix tests

84f12d0

do not bump blob file version

573f2d1

steFaiz force-pushed the placeholder_blob branch from ae4578d to 573f2d1 Compare May 26, 2026 12:18

[core] Fix test compilation: inline shaded parquet constants

6046757

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

JingsongLi merged commit 5ce4e01 into apache:master May 26, 2026
15 of 17 checks passed

Conversation

steFaiz commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Tests

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JingsongLi left a comment

Choose a reason for hiding this comment

Review Comments for PR #7889

Issues

Minor / Style

Positive

Uh oh!

steFaiz commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

leaves12138 left a comment

Choose a reason for hiding this comment

Uh oh!

JingsongLi May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JingsongLi May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JingsongLi left a comment

Choose a reason for hiding this comment

Uh oh!

JingsongLi commented May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

steFaiz commented May 18, 2026 •

edited

Loading

steFaiz commented May 25, 2026 •

edited

Loading

JingsongLi May 26, 2026 •

edited

Loading

JingsongLi May 26, 2026 •

edited

Loading