Skip to content

AI JAM: SR-native iceberg checks#30184

Draft
oleiman wants to merge 8 commits into
devfrom
ai-jam/sr-inline-iceberg-checks
Draft

AI JAM: SR-native iceberg checks#30184
oleiman wants to merge 8 commits into
devfrom
ai-jam/sr-inline-iceberg-checks

Conversation

@oleiman
Copy link
Copy Markdown
Member

@oleiman oleiman commented Apr 15, 2026

This PR introduces an opt-in Schema Registry validation path that enforces Iceberg schema-evolution rules when a subject schema is registered with the redpanda.iceberg.compatible=true metadata property.

Changes:

  • Add iceberg::simulate_evolution to validate an Iceberg schema lineage across multiple versions.
  • Add Schema Registry-side check_iceberg_compatibility(...) and invoke it during schema registration when the metadata flag is enabled.
  • Add unit/integration tests and Bazel targets covering both the Iceberg evolution simulator and SR integration behavior.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

@oleiman
Copy link
Copy Markdown
Member Author

oleiman commented Apr 16, 2026

/ci-repeat 1

@oleiman oleiman force-pushed the ai-jam/sr-inline-iceberg-checks branch from 60d5b83 to d886789 Compare April 16, 2026 03:09
@oleiman
Copy link
Copy Markdown
Member Author

oleiman commented Apr 16, 2026

/ci-repeat 1

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an opt-in Schema Registry validation path that enforces Iceberg schema-evolution rules when a subject schema is registered with the redpanda.iceberg.compatible=true metadata property.

Changes:

  • Add iceberg::simulate_evolution to validate an Iceberg schema lineage across multiple versions.
  • Add Schema Registry-side check_iceberg_compatibility(...) and invoke it during schema registration when the metadata flag is enabled.
  • Add unit/integration tests and Bazel targets covering both the Iceberg evolution simulator and SR integration behavior.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/rptest/tests/schema_registry_test.py Adds an end-to-end rptest verifying flag-controlled Iceberg evolution enforcement and metadata override behavior.
src/v/pandaproxy/schema_registry/test/iceberg_compat_test.cc Adds SR-level unit tests for the Iceberg compatibility check function (skip/pass/fail scenarios).
src/v/pandaproxy/schema_registry/test/BUILD Registers the new iceberg_compat btest target.
src/v/pandaproxy/schema_registry/iceberg_compat.h Declares the metadata key constant and the SR compatibility-check API.
src/v/pandaproxy/schema_registry/iceberg_compat.cc Implements SR-side lineage validation by converting Avro → Iceberg types and running simulate_evolution.
src/v/pandaproxy/schema_registry/handlers.cc Wires the Iceberg evolution check into POST /subjects/{subject}/versions (non-import mode).
src/v/pandaproxy/schema_registry/BUILD Adds the iceberg_compat library target and links it into the SR server.
src/v/iceberg/tests/simulate_evolution_test.cc Adds focused gtests for simulate_evolution behavior across common evolution scenarios.
src/v/iceberg/tests/BUILD Registers the new simulate_evolution_test gtest target.
src/v/iceberg/simulate_evolution.h Declares the evolution simulator API and result type.
src/v/iceberg/simulate_evolution.cc Implements schema evolution replay (subset-write fast path + merge/evolve + fresh-ID assignment).
src/v/iceberg/compatibility.cc Makes promotion-policy visitor overloads const to satisfy visitor usage requirements.
src/v/iceberg/BUILD Adds the new simulate_evolution library target.

Comment thread src/v/pandaproxy/schema_registry/handlers.cc
@oleiman oleiman added the claude-review Adding this label to a PR will trigger a workflow to review the code using claude. label Apr 16, 2026
Comment thread src/v/pandaproxy/schema_registry/iceberg_compat.cc
Comment thread src/v/pandaproxy/schema_registry/iceberg_compat.cc
Comment thread src/v/pandaproxy/schema_registry/iceberg_compat.cc Outdated
Comment thread src/v/pandaproxy/schema_registry/iceberg_compat.cc Outdated
Comment thread src/v/iceberg/simulate_evolution.cc
Comment thread src/v/iceberg/simulate_evolution.cc
Comment thread src/v/pandaproxy/schema_registry/handlers.cc
Comment thread tests/rptest/tests/schema_registry_test.py
@claude
Copy link
Copy Markdown

claude Bot commented Apr 16, 2026

PR Review: SR-native iceberg checks

Summary

This PR adds inline iceberg schema evolution validation to the Schema Registry's post_subject_versions handler. When a candidate schema carries the redpanda.iceberg.compatible metadata flag, the system fetches all existing versions (including soft-deleted), converts each Avro schema to an iceberg struct_type, and replays the full evolution simulation to validate compatibility.

New components:

  • iceberg::simulate_evolution — replays a sequence of struct_types through the existing iceberg compatibility/evolution machinery, assigning field IDs and validating each step
  • pandaproxy::schema_registry::check_iceberg_compatibility — orchestrates the store lookups, Avro-to-iceberg conversion, and simulation call
  • Good test coverage at all three levels: unit tests for simulate_evolution, integration tests for check_iceberg_compatibility, and an rptest for the full HTTP path

What looks good

  • Clean layering: The iceberg simulation logic is properly separated from the SR integration layer, with clear interfaces between them
  • Correct use of include_deleted::yes: Iceberg evolution must consider the full lineage since field IDs assigned to deleted columns can't be reused with different types
  • Metadata inheritance: Correctly relies on the existing make_canonical_schema_with_metadata to handle flag inheritance from previous versions — the compat check just reads the resolved metadata
  • Thorough unit tests: The simulate_evolution_test.cc covers single version, additions, promotions, drops, reintroductions (same and different type), nested structs, lists, maps, multi-step evolution, and required field rejection
  • The const additions to compatibility.cc visitor operators are a nice cleanup

Issues to address

Performance (medium priority):
The current implementation replays the entire version history on every schema registration. For subjects with many versions, this means O(N) store lookups, O(N) Avro→iceberg conversions, and the simulation cost itself. This should at minimum be acknowledged with a TODO/tracking issue. Ideally, the accumulated iceberg struct_type would be cached so new registrations only need to evolve one step forward.

Redundant work in handler (low priority):
The iceberg check runs even when matched == true (the schema already exists as a version of this subject). In that case, nothing new is being registered and the check is wasted work. Consider gating with if (!matched && mode != mode::import).

Handler coverage:
Should this check also apply to PUT endpoints for registering at specific versions? If users can bypass this path, the iceberg invariant could be silently violated.

Error message usability (low priority):
The error message exposes internal 0-indexed step numbers ("failed at step 3") which are opaque to users. Translating to actual schema version numbers would improve debuggability.

Minor/style nits

  • Copyright year 2024 on new simulate_evolution files (should be 2025 to match the BSL files in this PR)
  • schema_to_iceberg uses an out-parameter + optional-error pattern rather than the checked<T, E> pattern used elsewhere in the iceberg module
  • Double string copy in schema_to_iceberg (linearize_to_string()std::string{...})
  • The subtle try_fill_field_ids == no + schema_changed == no → incompatible case in simulate_evolution.cc:119-125 deserves a comment explaining when this happens in practice

PR description

The PR body is just "." — this is a significant feature addition and would benefit from a proper description covering the motivation, design decisions (e.g., why replay full history, why opt-in via metadata flag), and any known limitations.

Overall this is a well-structured change with good test coverage. The main concern is the O(N) replay cost on every registration, which should at least be tracked for future optimization.

@vbotbuildovich
Copy link
Copy Markdown
Collaborator

CI test results

test results on build#83214
test_status test_class test_method test_arguments test_kind job_url passed reason test_history
FLAKY(PASS) EndToEndHydrationTimeoutTest test_hydration_completes_when_consumer_killed null integration https://buildkite.com/redpanda/redpanda/builds/83214#019d9453-2ca2-422b-8280-88150070cff1 10/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0040, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=EndToEndHydrationTimeoutTest&test_method=test_hydration_completes_when_consumer_killed

oleiman added 8 commits April 16, 2026 15:21
Add simulate_evolution(), which will replay SR schema version history
through iceberg evolution primitives to validate compatibility. For now
the implementation is a stub that returns an error; the real logic
comes in a subsequent commit.
Comprehensive GTest suite for simulate_evolution() covering 19 test
cases: single version passthrough, additive field evolution, type
promotions (int->long, float->double), field drops and reintroduction,
nested struct/list/map evolution, field ID assignment, multi-step
evolution, error conditions (incompatible types, type narrowing, new
required fields, empty sequence), and middle-step failure reporting.

Tests compile and run against the current stub implementation; most will
fail until simulate_evolution is implemented.
Replays iceberg schema evolution across a sequence of struct_types,
starting from the first as the initial table schema. Uses evolve_schema
for compatibility checking at each step, building a merged schema that
preserves removed fields and assigns monotonically increasing field IDs.
Replace the evolve_schema-only approach with the three-step pattern
used in the datalake layer: (1) try_fill_field_ids to check if the
writer is a compatible subset, (2) merge_struct_types to incorporate
new writer fields, (3) evolve_schema to validate the merge.

This fixes the false rejection of writers that use a narrower type
than the accumulated table schema (e.g. int writing to a long
column). Also detect no-op merges that indicate an incompatible
writer, and fix missing const on three primitive_type_promotion_-
policy_visitor overloads that made float->double and decimal/fixed
promotions silently unreachable via the constexpr visitor instance.
Add the iceberg_compat module skeleton: a header with the
check_iceberg_compatibility() function signature and a stub .cc that
unconditionally returns nullopt (check skipped). The BUILD target
declares implementation_deps on iceberg libraries that will be needed
once the stub is filled in.
Implement check_iceberg_compatibility which validates that registering a
new schema version for a subject would produce a valid iceberg schema
evolution history. The function checks the "redpanda.iceberg.compatible"
metadata flag on the candidate schema; if set, it fetches all existing
versions, converts each to iceberg struct_type via type_to_iceberg, and
runs simulate_evolution on the full sequence including the candidate.

Tests cover: flag not set (skip), first version (pass), compatible field
addition, incompatible type change, drop-and-reintroduce same type vs
different type, and error message content.
After the existing SR compatibility check, call
check_iceberg_compatibility to validate that the new schema is
compatible with iceberg schema evolution rules. The check is a no-op
when the subject does not have the iceberg metadata flag set.
Exercises the redpanda.iceberg.compatible metadata flag end-to-end
through the SR HTTP API: compatible addition passes, incompatible type
change is rejected with 409, and explicit empty metadata overrides
inheritance to skip the check.
@oleiman oleiman force-pushed the ai-jam/sr-inline-iceberg-checks branch from d886789 to d1d19af Compare April 16, 2026 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/build area/redpanda claude-review Adding this label to a PR will trigger a workflow to review the code using claude.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants