Skip to content

ct/l0/gc: collect stranded L0 objects after all cloud topics deleted#30200

Merged
oleiman merged 1 commit intodevfrom
ct/noticket/l0-gc-no-cloud-topics
Apr 24, 2026
Merged

ct/l0/gc: collect stranded L0 objects after all cloud topics deleted#30200
oleiman merged 1 commit intodevfrom
ct/noticket/l0-gc-no-cloud-topics

Conversation

@oleiman
Copy link
Copy Markdown
Member

@oleiman oleiman commented Apr 16, 2026

max_gc_eligible_epoch returned std::nullopt when the partition
snapshot was empty, causing try_to_collect to report
no_collectible_epoch after listing. L0 objects left over from
previously deleted topics stayed in the bucket forever.

Fall through with snap_revision (controller last applied offset)
as the watermark: this is the zero-iteration case of the existing
join, which already starts at snap_revision and walks it down per
partition.

Safety rests on invariants from the existing algorithm:

  • get_partitions ensures that the topic table snapshot is in sync
    with the controller stm (i.e. snap revision)
  • L0 objects never have an epoch less than any constinuent topic's
    initial revision ID.

So with an empty topic table at snapshot revision N, any new data
must have an epoch > N. Therefore any L0 objects w/ epoch <= N is
safe to GC.

Includes a ducktape regression test:

  • pause GC
  • accumulate some L0 objects
  • delete the only cloud topic
  • resume GC
  • assert that orphaned objects drain.

Without the fixt the test times out w/ stuck objects.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

Bug Fixes

  • Fix an issue with Clout Topics garbage collection where deleting all cloud topics could leave stale objects in the bucket, forever.

@oleiman oleiman self-assigned this Apr 16, 2026
Copilot AI review requested due to automatic review settings April 16, 2026 20:45
@oleiman oleiman force-pushed the ct/noticket/l0-gc-no-cloud-topics branch from 030cb4e to ed04e68 Compare April 16, 2026 20:46
@oleiman oleiman added the claude-review Adding this label to a PR will trigger a workflow to review the code using claude. label Apr 16, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an L0 GC edge case where deleting all cloud topics could leave previously-created L0 objects stranded in object storage by ensuring the GC watermark still advances when the partition snapshot is empty.

Changes:

  • Update max_gc_eligible_epoch to fall through to snap_revision when there are no cloud topic partitions.
  • Adjust unit tests to validate the new empty-snapshot watermark behavior.
  • Add a ducktape regression test that deletes the last cloud topic and verifies L0 objects drain after GC resumes.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tests/rptest/tests/cloud_topics/l0_gc_test.py Adds an integration regression test for GC progress after all cloud topics are deleted.
src/v/cloud_topics/level_zero/gc/tests/level_zero_gc_tests.cc Updates unit tests to expect snap_revision as the watermark for empty snapshots.
src/v/cloud_topics/level_zero/gc/level_zero_gc.cc Changes empty-snapshot behavior to return snap_revision instead of nullopt, enabling stranded-object collection.

Comment thread tests/rptest/tests/cloud_topics/l0_gc_test.py Outdated
Comment thread tests/rptest/tests/cloud_topics/l0_gc_test.py Outdated
Comment thread src/v/cloud_topics/level_zero/gc/level_zero_gc.cc
Comment thread tests/rptest/tests/cloud_topics/l0_gc_test.py
Comment thread tests/rptest/tests/cloud_topics/l0_gc_test.py
@claude
Copy link
Copy Markdown

claude Bot commented Apr 16, 2026

Code Review

Overview

Fixes a garbage-collection hole in cloud-topics L0 GC: when every cloud topic is deleted, epoch_source_impl::max_gc_eligible_epoch returned std::nullopt, causing try_to_collect to fall into the no_collectible_epoch branch and never drain the leftover L0 objects. The fix falls through to snap_revision as the watermark — the natural zero-iteration case of the existing min-join — so stranded objects age past the grace period and are collected.

Correctness

  • Fix is logically sound. The non-empty path already initializes result = snap_revision and then min-reduces per partition; the zero-partition case trivially returns snap_revision. Using it as the watermark is consistent with the existing semantics.
  • Safety argument holds. topic_table consistency with controller_stm is the same invariant the min-join already relies on. The residual edge case — a topic created after snap_revision whose L0 object carries a stale checkpoint epoch — is covered by the 12h grace-period age check (cloud_topics_short_term_gc_minimum_object_age, default 12h) applied at level_zero_gc.cc:1202.
  • Probe updated (set_min_partition_gc_epoch) to keep the metric fresh on the empty path, matching the non-empty path.

Tests

  • Unit tests (EmptySnapshot, EmptySnapshotNonEmptyReport) are updated to assert the new behavior and explicitly set snap_revision. Good.
  • Integration test (CloudTopicsL0GCAllTopicsDeletedTest) reproduces the scenario end-to-end. The assertion (count L0 objects → 0) is sound. See inline comments for two tightening suggestions: (1) explicitly wait for topic deletion to propagate before gc_start() so the empty-snapshot path is definitely exercised on the first post-resume round, and (2) consider asserting a probe metric to make the regression guard more specific.

Code quality / style

  • Matches coding conventions (snake_case, vlog, co_return, modern Python typing).
  • The added comment (level_zero_gc.cc:359-363) explains the non-obvious why — exactly what the codebase guidelines ask for.
  • Minor: no debug vlog on the empty-snapshot branch (see inline comment); worth considering for post-deploy debuggability.

Performance / security

  • No perf impact — empty-snapshot case skips the health-report query entirely.
  • No security concerns; safety relies only on the pre-existing grace-period invariant.

Risk assessment

Low. Change is localized to one branch of one function; the watermark value used (snap_revision) is already the initial value in the corresponding non-empty branch. Backport to v26.1.x is appropriately flagged.

Overall: LGTM modulo the test-tightening suggestions.

@oleiman oleiman force-pushed the ct/noticket/l0-gc-no-cloud-topics branch from ed04e68 to d133c87 Compare April 16, 2026 20:59
@oleiman oleiman requested a review from Copilot April 16, 2026 21:00
@oleiman oleiman added claude-review Adding this label to a PR will trigger a workflow to review the code using claude. and removed claude-review Adding this label to a PR will trigger a workflow to review the code using claude. labels Apr 16, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a Cloud Topics L0 GC edge case where deleting all cloud topics could leave previously-created L0 objects stranded in object storage indefinitely by ensuring an empty partition snapshot still yields a GC watermark.

Changes:

  • Update epoch_source::max_gc_eligible_epoch to fall through to snap_revision when the partition snapshot is empty (instead of returning std::nullopt).
  • Adjust Level Zero GC unit tests to reflect the new empty-snapshot behavior.
  • Add a ducktape regression test that pauses GC, accumulates L0 objects, deletes the only cloud topic, resumes GC, and asserts the bucket drains.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/v/cloud_topics/level_zero/gc/level_zero_gc.cc Uses snap_revision as the watermark when there are no cloud topic partitions, enabling collection of stranded L0 objects.
src/v/cloud_topics/level_zero/gc/tests/level_zero_gc_tests.cc Updates unit test expectations for empty-snapshot cases to assert the watermark equals snap_revision.
tests/rptest/tests/cloud_topics/l0_gc_test.py Adds an integration regression test covering the “all topics deleted but L0 objects remain” scenario.

"epoch {}",
result);
if (probe_) {
probe_->set_min_partition_gc_epoch(result);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is inherently contradictory, but it's an interesting point.

Comment thread src/v/cloud_topics/level_zero/gc/level_zero_gc.cc
Comment thread tests/rptest/tests/cloud_topics/l0_gc_test.py
Comment thread src/v/cloud_topics/level_zero/gc/level_zero_gc.cc
Comment thread tests/rptest/tests/cloud_topics/l0_gc_test.py
@claude
Copy link
Copy Markdown

claude Bot commented Apr 16, 2026

Review

Nice, targeted fix. The bug analysis is clear and the fix is minimal: when get_partitions returns an empty set, fall through to snap_revision as the watermark instead of std::nullopt. This is a natural extension of the existing min-join semantics (which already starts at snap_revision and walks down) — the empty-partition case is just the zero-iteration form.

Correctness

The safety argument holds:

  1. snap_revision is the controller-stm last-applied offset, wrapped as cluster_epoch. Since topic deletion itself advances the controller offset, every L0 object stamped before the deletion must have epoch ≤ snap_revision. Using it as the watermark cannot incorrectly spare stranded objects nor over-delete in-use objects under normal operation.
  2. Downstream gating is preserved: do_try_to_collect still filters by object.last_modified > max_gc_birthday (grace period), so a topic created between the health-report snapshot and the partition snapshot with an L0 object stamped against a stale checkpoint epoch is protected by the 12h default grace period.
  3. The topic_table/controller_stm consistency invariant (same node, same applied prefix) that the min-join already relies on is sufficient here too.

Tests

  • Unit-test changes correctly capture the new semantics: EmptySnapshot now asserts snap_revision passthrough, and EmptySnapshotNonEmptyReport confirms stale report entries don't bring the watermark down when no partition is in the snapshot.
  • The ducktape regression test (test_gc_completes_after_all_topics_deleted) is well-structured — pause → produce → delete → resume → assert drain — and will reliably fail without the fix.

Minor suggestions (non-blocking)

  1. Code comment: The comment at the empty-partition branch explains what; consider also noting why it's safe (controller offset advances on delete, so all stranded objects have epoch ≤ snap_revision). Right now that reasoning only lives in the PR description.
  2. Probe DRY: probe_->set_min_partition_gc_epoch(result) is called in both branches. Could be lifted out below the if-else.
  3. Test assertion strength: Adding an assertion that vectorized_cloud_topics_l0_gc_objects_deleted_total actually increased would catch a regression where drain happens via an unrelated path (e.g. a delete-triggered purge) rather than the GC collector, which is the specific code path this PR fixes.
  4. Test timeout: The 120s drain budget is tight relative to grace_period(10s) + gc_backoff(10s) + reconciliation + list/delete RTTs. 180s might be safer on CI.

Backport

Marked for v26.1.x — agreed, this is a user-visible correctness bug where deleted topics leave bytes stranded in cloud storage indefinitely. Worth backporting.

Overall: LGTM.

@oleiman oleiman requested review from dotnwat and nvartolomei April 16, 2026 21:50
@vbotbuildovich
Copy link
Copy Markdown
Collaborator

vbotbuildovich commented Apr 16, 2026

CI test results

test results on build#83276
test_status test_class test_method test_arguments test_kind job_url passed reason test_history
FLAKY(PASS) DataMigrationsApiTest test_migrated_topic_data_integrity {"params": {"cancellation": {"dir": "out", "stage": "executed"}, "include_groups": true, "transfer_leadership": true, "use_alias": true}} integration https://buildkite.com/redpanda/redpanda/builds/83276#019d9830-9ab1-4680-b52e-798a47e6feb8 10/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0008, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DataMigrationsApiTest&test_method=test_migrated_topic_data_integrity
FLAKY(PASS) PartitionMovementUpgradeTest test_basic_upgrade null integration https://buildkite.com/redpanda/redpanda/builds/83276#019d9830-9ab3-4441-b7c7-8a021d19484f 10/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=PartitionMovementUpgradeTest&test_method=test_basic_upgrade
FLAKY(PASS) WriteCachingFailureInjectionE2ETest test_crash_all {"use_transactions": false} integration https://buildkite.com/redpanda/redpanda/builds/83276#019d982e-28aa-4d56-bd96-adff57b2886d 9/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0689, p0=0.5104, reject_threshold=0.0100. adj_baseline=0.1928, p1=0.3978, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all
test results on build#83599
test_status test_class test_method test_arguments test_kind job_url passed reason test_history
FLAKY(PASS) ShadowLinkingReplicationTests test_with_restart {"storage_mode": "cloud"} integration https://buildkite.com/redpanda/redpanda/builds/83599#019dbbcf-c93e-421a-ae34-be8b9160be2f 10/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0151, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_with_restart
FLAKY(PASS) NodePostRestartProbeTest post_restart_probe_test null integration https://buildkite.com/redpanda/redpanda/builds/83599#019dbbd1-a587-468b-9cef-5bd726c50621 19/21 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0060, p0=0.1138, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3917, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodePostRestartProbeTest&test_method=post_restart_probe_test

max_gc_eligible_epoch returned std::nullopt when the partition
snapshot was empty, causing try_to_collect to report
no_collectible_epoch after listing. L0 objects left over from
previously deleted topics stayed in the bucket forever.

Fall through with snap_revision (controller last applied offset)
as the watermark: this is the zero-iteration case of the existing
join, which already starts at snap_revision and walks it down per
partition.

Safety rests on invariants from the existing algorithm:

- get_partitions ensures that the topic table snapshot is in sync
  with the controller stm (i.e. snap revision)
- L0 objects never have an epoch less than any constinuent topic's
  initial revision ID.

So with an empty topic table at snapshot revision N, any _new_ data
must have an epoch > N. Therefore any L0 objects w/ epoch <= N is
safe to GC.

Includes a ducktape regression test:

- pause GC
- accumulate some L0 objects
- delete the only cloud topic
- resume GC
- assert that orphaned objects drain.

Without the fixt the test times out w/ stuck objects.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman oleiman force-pushed the ct/noticket/l0-gc-no-cloud-topics branch from d133c87 to 0eee6a8 Compare April 23, 2026 19:15
Copy link
Copy Markdown
Contributor

@nvartolomei nvartolomei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a nit but lgtm

# L0 objects exist to be collected.
self.produce_some(topics=[topic.name])

wait_until(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be direct assert, right? If produce succeeded then the objects must be already there.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah fair. get_objects_from_si should make a direct call to the object store. test isn't trying to make any claims about the produce path though, so I'm inclined to use wait_until rather than assume this synchronous chain between produce -> upload -> s3 <- LIST, even if it's technically expected.

@oleiman oleiman merged commit 54aeb5e into dev Apr 24, 2026
22 checks passed
@oleiman oleiman deleted the ct/noticket/l0-gc-no-cloud-topics branch April 24, 2026 00:06
@vbotbuildovich
Copy link
Copy Markdown
Collaborator

/backport v26.1.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/redpanda claude-review Adding this label to a PR will trigger a workflow to review the code using claude.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants