[Spark] Support writing Log Compaction Files by felipepessoto · Pull Request #7080 · delta-io/delta

felipepessoto · 2026-06-23T22:34:32Z

Which Delta project/connector is this regarding?

Spark

Description

This PR adds write support for Log Compaction Files to the Spark connector. Delta already supports reading compaction files during snapshot construction (spark.databricks.delta.deltaLog.minorCompaction.useForReads, useCompactedDeltasForLogSegment), but there was no way to produce them. This PR is the writer half and addresses the Spark portion of #2072.

A log compaction file <x>.<y>.compacted.json aggregates the reconciled actions of commits [x, y] into a single file, using the same action-reconciliation rules as a checkpoint but with commitInfo stripped. Readers that understand these files can construct a snapshot from checkpoint + a few compactions + a few commits instead of checkpoint + many commits, which reduces the number of individual commit files read during snapshot construction. This could also make a higher checkpoint interval viable in the future, though this PR does not change the default checkpoint interval. Per the protocol, compaction files are optional and require no protocol or table-feature upgrade; readers that don't understand them simply ignore them.
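As a rough illustration of the naming scheme, the sketch below builds and parses compaction file names. It assumes versions are zero-padded to 20 digits, as for commit files; the helper names are hypothetical and not part of this PR.

```python
import re
from typing import Optional, Tuple

# Assumption: both versions are zero-padded to 20 digits, like commit files.
_COMPACTED_RE = re.compile(r"^(\d{20})\.(\d{20})\.compacted\.json$")


def compaction_file_name(start: int, end: int) -> str:
    """Build the file name for a compaction covering commits [start, end]."""
    if start < 0 or end < start:
        raise ValueError(f"invalid range [{start}, {end}]")
    return f"{start:020d}.{end:020d}.compacted.json"


def parse_compaction_file_name(name: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) for a compaction file name, or None for other files."""
    match = _COMPACTED_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A reader that does not recognize the `.compacted.json` suffix would simply see `parse_compaction_file_name` return `None` and skip the file, which is the "ignored by older readers" behavior described above.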

Once this is merged, I plan to send a separate PR to address the "Increase the Checkpoint Interval to 20 without affecting the performance of Read/Write operations" item of #2072.

What this PR adds

LogCompaction: a small writer that reads commits [startVersion, endVersion], reconciles them via InMemoryLogReplay, and writes <start>.<end>.compacted.json.
- Path resolution: commit-file paths are resolved through DeltaCommitFileProvider, so it works for both filesystem and coordinated-commits / catalog-managed path layouts.
- Tombstone retention: a compaction file incrementally replaces its commit range rather than being a full snapshot like a checkpoint, so the reconciliation retains tombstones that suppress earlier (pre-window) state. These are RemoveFile tombstones (already retained by InMemoryLogReplay) and removed = true DomainMetadata tombstones. For the latter, compact opts into a new InMemoryLogReplay(retainDomainMetadataTombstones = true) mode, since the default (checkpoint/snapshot) reconciliation drops domain tombstones per the protocol. Without this, a domain removed inside the window whose add was in an earlier checkpoint/commit would wrongly reappear as active in a compaction-backed snapshot.
- Idempotency: if the target <start>.<end>.compacted.json already exists, compact returns early without re-reading or rewriting, so concurrent writers reaching the same interval boundary don't redundantly recompute the same file. The write itself uses overwrite = false, so the residual race where two writers both pass the existence check is resolved atomically: the loser receives a FileAlreadyExistsException, which is caught and treated as a successful no-op rather than clobbering the existing file. This is safe because the reconciled content for a range is deterministic, so both writers would produce equivalent files.
- Driver-memory guard: reconciliation runs in a single in-memory log replay on the driver, unlike distributed checkpoint/snapshot reconstruction. To bound driver memory, a window whose combined commit-file size exceeds deltaLog.minorCompaction.maxWindowSizeBytes (default 1 GiB) is skipped. The size is measured at no extra cost from the snapshot's already-listed FileStatus entries.
- Telemetry: each compact call emits a delta.logCompaction.stats event (LogCompactionMetrics: start/end version, status, duration, commits/actions reconciled, resulting file size, window size, and skip reason) for observability, mirroring how checkpointing records metrics.
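To make the tombstone-retention argument concrete, here is a deliberately simplified model of the reconciliation. It is not InMemoryLogReplay: actions are plain dicts, files are keyed by path only, and the function names are hypothetical. It only shows why dropping a removed = true DomainMetadata action from a compaction lets an earlier add resurface.

```python
from typing import Dict, List, Set


def reconcile(commits: List[List[dict]], retain_domain_tombstones: bool = False) -> List[dict]:
    """Last-writer-wins replay over commits given oldest first (simplified model)."""
    files: Dict[str, dict] = {}    # path -> latest add/remove action
    domains: Dict[str, dict] = {}  # domain -> latest domainMetadata action
    for actions in commits:
        for action in actions:
            kind = action["type"]
            if kind in ("add", "remove"):
                files[action["path"]] = action  # remove tombstones are always kept
            elif kind == "domainMetadata":
                domains[action["domain"]] = action
            # commitInfo and other action types are ignored in this sketch
    kept_domains = [
        d for d in domains.values()
        if retain_domain_tombstones or not d["removed"]
    ]
    return list(files.values()) + kept_domains


def active_domains(base: Set[str], compacted: List[dict]) -> Set[str]:
    """Apply a compaction's domain actions on top of pre-window state."""
    active = set(base)
    for action in compacted:
        if action["type"] == "domainMetadata":
            if action["removed"]:
                active.discard(action["domain"])
            else:
                active.add(action["domain"])
    return active


# Domain "d" was added before the window; the window removes it.
window = [[{"type": "domainMetadata", "domain": "d", "removed": True}]]
wrong = active_domains({"d"}, reconcile(window))  # tombstone dropped
right = active_domains({"d"}, reconcile(window, retain_domain_tombstones=True))
```

With the default mode the tombstone is dropped, so `wrong` still contains `"d"`; with retention enabled `right` is empty, matching what replaying the raw commits would give.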
LogCompactionHook: a post-commit hook that, when enabled, produces a compaction file after a commit whose version is a multiple of the configured interval.
- Window: it compacts the fixed window [max(committedVersion - interval + 1, lastCheckpointVersion + 1), committedVersion], and is skipped when a checkpoint was just written, because a checkpoint already subsumes those commits.
- Why fixed windows: fixed, non-overlapping windows are required so the reader's greedy selection can chain them; an ever-growing range would produce overlapping files where the reader only ever uses the smallest one.
- Tiling: to get cleanly tiling, reader-usable files, set the checkpoint interval to a multiple of (and larger than) the compaction interval.
- Backfill: per the protocol, a compaction may only be produced for published (backfilled) versions. On catalog-managed / coordinated-commits tables the hook therefore synchronously backfills the window's commits before compacting via Snapshot.ensureCommitFilesBackfilled, the same mechanism checkpoint writing uses. This is a no-op on filesystem-based tables.
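The window formula above can be sketched as a small pure function. The function name is hypothetical, `last_checkpoint_version = -1` is used here to mean "no checkpoint yet", and skipping version 0 is an assumption made to reflect that no compaction is produced before a full window exists; the actual hook may express these conditions differently.

```python
from typing import Optional, Tuple


def compaction_window(
    committed_version: int,
    interval: int,
    last_checkpoint_version: int,
    checkpoint_just_written: bool,
) -> Optional[Tuple[int, int]]:
    """Return the [start, end] window to compact after a commit, or None to skip."""
    if interval < 2:
        raise ValueError("compaction interval must be >= 2")
    # Assumption: version 0 never triggers a compaction (no full window yet).
    if committed_version <= 0 or committed_version % interval != 0:
        return None
    # A checkpoint written for this commit already subsumes the window.
    if checkpoint_just_written:
        return None
    start = max(committed_version - interval + 1, last_checkpoint_version + 1)
    return (start, committed_version)
```

For example, with an interval of 5 and no checkpoint, versions 5 and 10 yield the windows [1, 5] and [6, 10]; with a checkpoint at version 10, version 15 yields [11, 15].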
Configuration:
- delta.logCompactionInterval (table property, int, default 5, must be >= 2) — the compaction interval, persisted per-table like delta.checkpointInterval.
- spark.databricks.delta.deltaLog.minorCompaction.useForWrites (session SQLConf, internal, default true) — whether the post-commit hook produces compaction files. Named to mirror the existing read config.
- spark.databricks.delta.deltaLog.minorCompaction.maxWindowSizeBytes (session SQLConf, internal, default 1 GiB) — driver-memory guard: if the combined size of a window's commit files exceeds this, that window is skipped (reconciliation happens in a single in-memory log replay on the driver). A non-positive value disables the guard.
- Read side (existing, unchanged): spark.databricks.delta.deltaLog.minorCompaction.useForReads (internal, default true) — whether snapshots are built using compaction files.
- The default delta.checkpointInterval is unchanged (10). With the default compaction interval of 5 (a divisor of, and smaller than, the checkpoint interval), the hook produces one mid-cadence compaction (e.g. [1,5]) between checkpoints — so a compaction-aware reader reads checkpoint + 1 compaction + a few commits instead of checkpoint + up to 9 commits, without changing checkpoint frequency for anyone.
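The effect of the defaults on a reader can be illustrated with a simplified model of greedy file selection. This is a sketch only: the function name is hypothetical, it is not the logic of useCompactedDeltasForLogSegment, and when several compactions share a start version it takes the one with the smallest end, mirroring the note above about overlapping files.

```python
from typing import Dict, List, Tuple


def select_log_files(
    checkpoint_version: int,
    compactions: List[Tuple[int, int]],
    target_version: int,
) -> List[Tuple[str, int, int]]:
    """Pick checkpoint + compactions + commits to reach target_version (simplified)."""
    by_start: Dict[int, int] = {}
    for start, end in sorted(compactions):
        if end <= target_version:
            by_start.setdefault(start, end)  # first (smallest end) wins per start
    selected = [("checkpoint", checkpoint_version, checkpoint_version)]
    version = checkpoint_version + 1
    while version <= target_version:
        if version in by_start:
            end = by_start[version]
            selected.append(("compaction", version, end))
            version = end + 1
        else:
            selected.append(("commit", version, version))
            version += 1
    return selected


# Checkpoint at 10, one compaction [11, 15], snapshot at version 18.
with_compaction = select_log_files(10, [(11, 15)], 18)
without_compaction = select_log_files(10, [], 18)
```

Here `with_compaction` contains the checkpoint, the [11, 15] compaction, and commits 16 to 18 (5 files), whereas `without_compaction` contains the checkpoint and 8 individual commits.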
Metadata cleanup: expired compaction files are now cleaned up.
- Expiry rule: a compaction whose start version is at or before the latest checkpoint version (read from _last_checkpoint) and that is older than the retention cutoff is deleted. The age gate preserves anything still inside the retention / time-travel window, in the spirit of the protocol's metadata cleanup section.
- Gating: this runs inside the existing cleanUpExpiredLogs flow, after the same enableExpiredLogCleanup / metadataCleanupAllowed gates as the existing delta/checkpoint/checksum cleanup, so it inherits the same policy (including for catalog-managed tables) and does not introduce a new cleanup entry point.
- Isolation: the deletion itself is a separate pass, intentionally not fed through BufferingLogDeletionIterator, so it can never influence the retention decisions for actual commit/checkpoint/checksum files.
- Listing: expired compaction files are peeled off the same _delta_log listing that drives commit/checkpoint cleanup (no second listing). This peeling is best-effort and self-healing: if the listing iterator stops early at a non-expired segment, a trailing expired compaction file is simply collected in a later cleanup round. Such a file is derived and optional, already subsumed by the checkpoint, and never selected by readers, so a delayed deletion never affects correctness or commit-file retention.
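The expiry rule can be written as a single predicate. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes "older than the retention cutoff" is judged by the file's modification time in milliseconds.

```python
def is_expired_compaction(
    start_version: int,
    modification_time_ms: int,
    latest_checkpoint_version: int,
    retention_cutoff_ms: int,
) -> bool:
    """True when a compaction file is eligible for deletion under the rule above."""
    # Version gate: the checkpoint must already cover the start of the range.
    covered_by_checkpoint = start_version <= latest_checkpoint_version
    # Age gate (assumption: based on modification time): keep anything still
    # inside the retention / time-travel window.
    older_than_cutoff = modification_time_ms < retention_cutoff_ms
    return covered_by_checkpoint and older_than_cutoff
```

Both gates must hold: a file whose start version equals the checkpoint version is eligible once it is old enough, while a file starting after the checkpoint is kept regardless of age.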

Known limitations / follow-ups

Enabled-by-default test interference. Because the hook is on by default and the default compaction interval (5) is smaller than the checkpoint interval (10), suites that make precise _delta_log file-set assertions across ≥5 commits can now see compaction files. The affected suites disable the write hook in their base (DeltaRetentionSuiteBase, DeltaLogMinorCompactionSuite); a full CI run may surface additional suites needing the same one-line opt-out. (The global delta.checkpointInterval default is intentionally not changed by this PR.)
Asynchronous compaction — intentionally declined. Compaction runs synchronously on the committing thread, exactly like CheckpointHook. Moving it to a background thread to avoid the boundary-commit latency spike would be a deliberate new design, revisited only if that latency becomes a concern.

How was this patch tested?

New and existing unit tests (Spark):

LogCompactionSuite (new):
- the feature is enabled by default and can be disabled via deltaLog.minorCompaction.useForWrites;
- the hook produces the expected non-overlapping fixed windows at the configured interval;
- produced files are used for snapshot construction and yield identical state/checksum vs. reading raw commits;
- a checkpoint subsumes the compaction at its boundary and bounds the next window;
- no compaction is produced before a full window exists;
- the hook is registered on every transaction;
- LogCompaction.compact writes a reconciled file without commitInfo;
- LogCompaction.compact is a no-op when the target compaction file already exists (idempotency / skip-if-exists);
- completed vs. skipped compactions emit the expected delta.logCompaction.stats telemetry (LogCompactionMetrics);
- a compaction over commits containing non-trivial action types (deletion vectors, row-tracking baseRowId, domain metadata, and a SetTransaction identifier) preserves them and yields a snapshot identical to one built from the raw commits;
- a compaction whose window contains a DomainMetadata removal whose add precedes the window retains the removal tombstone, so the compaction-backed snapshot agrees with the raw-commit snapshot (the domain stays removed);
- a window whose combined commit-file size exceeds deltaLog.minorCompaction.maxWindowSizeBytes is skipped (no file written, windowTooLarge telemetry) while a window under the cap is compacted.
LogCompactionWithCatalogOwnedSuite (new): on a catalog-owned table with batched backfill, the hook backfills the window's commits and then produces the same compaction windows it would on a filesystem table ([1,5], [6,10]), each covering only published (backfilled) versions; the snapshot uses them and the table reads correctly.
DeltaRetentionSuite (new test): old compaction files (start version ≤ the latest checkpoint version, including the boundary where start version equals the checkpoint version) are cleaned up while newer ones are retained; also verified under DeltaRetentionWithCatalogOwnedBatch1Suite (catalog-owned).

A full CI run may surface additional suites making precise _delta_log assertions that need the one-line write-hook opt-out.

Does this PR introduce any user-facing changes?

Yes. By default, Delta now writes optional log compaction files (<x>.<y>.compacted.json) to _delta_log between checkpoints (session config …deltaLog.minorCompaction.useForWrites, default true; table property delta.logCompactionInterval, default 5). The default delta.checkpointInterval is unchanged (10). These files require no protocol or table-feature change and are ignored by readers that don't support them; compaction can be turned off via deltaLog.minorCompaction.useForWrites = false.
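For users who want to adjust or opt out of the new behavior, a minimal PySpark-style sketch follows. The config key and table property are the ones named in this PR; the helper functions and the table name in the usage note are hypothetical, and the table name is interpolated into SQL without escaping, so it should only come from trusted input.

```python
# Session SQLConf named in this PR (internal, default true).
USE_FOR_WRITES = "spark.databricks.delta.deltaLog.minorCompaction.useForWrites"


def interval_ddl(table: str, interval: int) -> str:
    """Build the ALTER TABLE statement that sets delta.logCompactionInterval."""
    if interval < 2:
        raise ValueError("delta.logCompactionInterval must be >= 2")
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('delta.logCompactionInterval' = '{interval}')"
    )


def configure_log_compaction(spark, table: str, interval: int, write_enabled: bool = True) -> None:
    """Apply the session flag and the per-table interval using a SparkSession."""
    spark.conf.set(USE_FOR_WRITES, str(write_enabled).lower())
    spark.sql(interval_ddl(table, interval))
```

With an active session, `configure_log_compaction(spark, "my_db.events", 5, write_enabled=False)` would turn the write hook off for that session while leaving the table property in place.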

Delta already supports reading log compaction files (`<x>.<y>.compacted.json`); this adds the write side for Spark. A compaction file aggregates the reconciled actions of a commit range into one file (same action-reconciliation rules as a checkpoint, with `commitInfo` stripped), letting a compaction-aware reader build a snapshot from `checkpoint + a few compactions + a few commits` instead of `checkpoint + many commits`. They are optional and require no protocol or table-feature upgrade; readers that don't understand them ignore them.

What this adds:
- LogCompaction: reconciles commits [start, end] via InMemoryLogReplay and writes <start>.<end>.compacted.json. Commit paths are resolved through DeltaCommitFileProvider, so it works for filesystem and coordinated-commits / catalog-managed layouts. Idempotent: skips when the target exists (fs.exists fast-path) and writes with overwrite = false, treating a concurrent FileAlreadyExistsException as a successful no-op (content is deterministic). Bounds driver memory by skipping windows whose combined commit-file size exceeds deltaLog.minorCompaction.maxWindowSizeBytes. Emits a delta.logCompaction.stats event (LogCompactionMetrics) for each call.
- LogCompactionHook: post-commit hook that, when enabled, produces a compaction over the fixed window [max(committedVersion - interval + 1, lastCheckpointVersion + 1), committedVersion] at each interval boundary, and is skipped when a checkpoint was just written. Fixed, non-overlapping windows are required so the reader's greedy selection can chain them. Per the protocol, only published (backfilled) versions are compacted, so on coordinated-commits / catalog-managed tables it first calls Snapshot.ensureCommitFilesBackfilled (the same mechanism checkpoint writing uses). Runs synchronously like CheckpointHook.
- InMemoryLogReplay: add retainDomainMetadataTombstones (default false, so snapshot/checkpoint/checksum behavior is unchanged). A compaction is an incremental replacement for its range, so unlike a checkpoint it must retain removed=true DomainMetadata tombstones to suppress an add that precedes the window; LogCompaction opts in.
- Metadata cleanup: expired compaction files are deleted inside the existing cleanUpExpiredLogs flow, under the same gates as delta/checkpoint/checksum cleanup. A file is removed when its start version is at or before the latest checkpoint and it is older than the retention cutoff. Deletion is a separate pass, never fed through BufferingLogDeletionIterator, so it cannot perturb commit-file retention; expired files are peeled off the same _delta_log listing to avoid a second listing (best-effort and self-healing).
- Config: delta.logCompactionInterval (table property, default 5) and session SQLConfs deltaLog.minorCompaction.{useForWrites (default true), maxWindowSizeBytes (default 1 GiB)}; the existing read-side deltaLog.minorCompaction.useForReads is unchanged. The global delta.checkpointInterval default is left unchanged.

Tests: LogCompactionSuite (windows, snapshot equivalence vs raw commits, idempotency, telemetry, non-trivial actions incl. DVs/row-tracking/domain metadata/SetTransaction, domain-tombstone retention, size guard), LogCompactionWithCatalogOwnedSuite (backfill-then-compact end to end), and a DeltaRetentionSuite cleanup test. The read path remains covered by DeltaLogMinorCompactionSuite.

Signed-off-by: Felipe Fujiy Pessoto <felipepessoto@hotmail.com>
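The skip-if-exists plus overwrite = false pattern described in the LogCompaction item can be sketched on a local filesystem as follows. This is an analogy, not the PR's code: it uses Python's exclusive-create mode in place of the log store's non-overwriting write, and the function name is hypothetical.

```python
import os


def write_compaction_once(path: str, content: str) -> bool:
    """Write content to path unless it already exists.

    Returns True if this call created the file, False if another writer
    (or an earlier call) already produced it.
    """
    # Fast path: avoid recomputing or rewriting when the file is already there.
    if os.path.exists(path):
        return False
    try:
        # Mode "x" fails if the file exists, so a writer that loses the race
        # after the existence check cannot clobber the winner's file.
        with open(path, "x", encoding="utf-8") as handle:
            handle.write(content)
        return True
    except FileExistsError:
        # Content for a given range is deterministic, so losing the race
        # is treated as success rather than an error.
        return False
```

The existence check only saves work; the exclusive create is what resolves the race between two writers that both pass the check.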

felipepessoto · 2026-06-23T22:39:45Z

@timothyw553 could you please trigger CI?

Do you know who would be the best person to review it? @prakharjain09 / @ryan-johnson-databricks?

TIA


felipepessoto · 2026-06-25T04:58:23Z

@timothyw553 I fixed some tests that relied on an exact count of delta log files. Could you trigger it again, please?

felipepessoto force-pushed the log_compaction_files branch from c6c544f to 8191bbd Compare June 23, 2026 22:37

felipepessoto force-pushed the log_compaction_files branch from 8191bbd to e5aac7c Compare June 23, 2026 22:38

Disable default-on log compaction write hook in suites with precise _delta_log assertions

Signed-off-by: Felipe Pessoto <felipepessoto@hotmail.com>

0e52297

felipepessoto wants to merge 2 commits into delta-io:master from felipepessoto:log_compaction_files
