AI JAM: SR-native iceberg checks #30184
/ci-repeat 1
60d5b83 to d886789
/ci-repeat 1
Pull request overview
This PR introduces an opt-in Schema Registry validation path that enforces Iceberg schema-evolution rules when a subject schema is registered with the redpanda.iceberg.compatible=true metadata property.
Changes:
- Add iceberg::simulate_evolution to validate an Iceberg schema lineage across multiple versions.
- Add Schema Registry-side check_iceberg_compatibility(...) and invoke it during schema registration when the metadata flag is enabled.
- Add unit/integration tests and Bazel targets covering both the Iceberg evolution simulator and SR integration behavior.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/rptest/tests/schema_registry_test.py | Adds an end-to-end rptest verifying flag-controlled Iceberg evolution enforcement and metadata override behavior. |
| src/v/pandaproxy/schema_registry/test/iceberg_compat_test.cc | Adds SR-level unit tests for the Iceberg compatibility check function (skip/pass/fail scenarios). |
| src/v/pandaproxy/schema_registry/test/BUILD | Registers the new iceberg_compat btest target. |
| src/v/pandaproxy/schema_registry/iceberg_compat.h | Declares the metadata key constant and the SR compatibility-check API. |
| src/v/pandaproxy/schema_registry/iceberg_compat.cc | Implements SR-side lineage validation by converting Avro → Iceberg types and running simulate_evolution. |
| src/v/pandaproxy/schema_registry/handlers.cc | Wires the Iceberg evolution check into POST /subjects/{subject}/versions (non-import mode). |
| src/v/pandaproxy/schema_registry/BUILD | Adds the iceberg_compat library target and links it into the SR server. |
| src/v/iceberg/tests/simulate_evolution_test.cc | Adds focused gtests for simulate_evolution behavior across common evolution scenarios. |
| src/v/iceberg/tests/BUILD | Registers the new simulate_evolution_test gtest target. |
| src/v/iceberg/simulate_evolution.h | Declares the evolution simulator API and result type. |
| src/v/iceberg/simulate_evolution.cc | Implements schema evolution replay (subset-write fast path + merge/evolve + fresh-ID assignment). |
| src/v/iceberg/compatibility.cc | Makes promotion-policy visitor overloads const to satisfy visitor usage requirements. |
| src/v/iceberg/BUILD | Adds the new simulate_evolution library target. |
PR Review: SR-native iceberg checks
Summary
This PR adds inline iceberg schema evolution validation to the Schema Registry's
New components:
What looks good
Issues to address
Performance (medium priority):
Redundant work in handler (low priority):
Handler coverage:
Error message usability (low priority):
Minor/style nits
PR description
The PR body is just "." This is a significant feature addition and would benefit from a proper description covering the motivation, design decisions (e.g., why replay full history, why opt-in via metadata flag), and any known limitations.
Overall this is a well-structured change with good test coverage. The main concern is the O(N) replay cost on every registration, which should at least be tracked for future optimization.
CI test results
test results on build#83214
Add simulate_evolution(), which will replay SR schema version history through iceberg evolution primitives to validate compatibility. For now the implementation is a stub that returns an error; the real logic comes in a subsequent commit.
Comprehensive GTest suite for simulate_evolution() covering 19 test cases: single version passthrough, additive field evolution, type promotions (int->long, float->double), field drops and reintroduction, nested struct/list/map evolution, field ID assignment, multi-step evolution, error conditions (incompatible types, type narrowing, new required fields, empty sequence), and middle-step failure reporting. Tests compile and run against the current stub implementation; most will fail until simulate_evolution is implemented.
Replays iceberg schema evolution across a sequence of struct_types, starting from the first as the initial table schema. Uses evolve_schema for compatibility checking at each step, building a merged schema that preserves removed fields and assigns monotonically increasing field IDs.
Replace the evolve_schema-only approach with the three-step pattern used in the datalake layer: (1) try_fill_field_ids to check if the writer is a compatible subset, (2) merge_struct_types to incorporate new writer fields, (3) evolve_schema to validate the merge. This fixes the false rejection of writers that use a narrower type than the accumulated table schema (e.g. int writing to a long column). Also detect no-op merges that indicate an incompatible writer, and fix missing const on three primitive_type_promotion_policy_visitor overloads that made float->double and decimal/fixed promotions silently unreachable via the constexpr visitor instance.
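The three-step replay described above can be sketched as a toy model. This is a simplified illustration, not the code in simulate_evolution.cc: schemas are flat dicts of field name to type name, only int->long and float->double promotions are modeled, and the names Table, is_compatible_subset, merge, and validate are hypothetical stand-ins for try_fill_field_ids, merge_struct_types, and evolve_schema.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Assumption: only these two primitive promotions are modeled.
PROMOTIONS = {("int", "long"), ("float", "double")}


def can_promote(src: str, dst: str) -> bool:
    """True if a value of type src can be stored in a column of type dst."""
    return src == dst or (src, dst) in PROMOTIONS


@dataclass
class Column:
    field_id: int
    type: str


@dataclass
class Table:
    columns: dict[str, Column] = field(default_factory=dict)
    last_id: int = 0

    def add(self, name: str, type_: str) -> None:
        # Field IDs only ever increase; dropped fields keep their ID.
        self.last_id += 1
        self.columns[name] = Column(self.last_id, type_)


def is_compatible_subset(table: Table, writer: dict[str, str]) -> bool:
    """Step 1: every writer field exists and fits the existing column type."""
    return all(
        name in table.columns and can_promote(t, table.columns[name].type)
        for name, t in writer.items()
    )


def merge(table: Table, writer: dict[str, str]) -> dict[str, str]:
    """Step 2: overlay writer fields; fields absent from the writer are kept."""
    merged = {name: col.type for name, col in table.columns.items()}
    for name, t in writer.items():
        current = merged.get(name)
        # Keep the table type when the writer is merely narrower.
        if current is None or not can_promote(t, current):
            merged[name] = t
    return merged


def validate(table: Table, merged: dict[str, str], step: int) -> None:
    """Step 3: each existing column must be promotable to its merged type."""
    for name, col in table.columns.items():
        if not can_promote(col.type, merged[name]):
            raise ValueError(
                f"version index {step}: field {name!r} cannot change "
                f"from {col.type} to {merged[name]}"
            )


def simulate_evolution(versions: list[dict[str, str]]) -> Table:
    if not versions:
        raise ValueError("empty schema sequence")
    table = Table()
    for name, t in versions[0].items():
        table.add(name, t)
    for step, writer in enumerate(versions[1:], start=1):
        if is_compatible_subset(table, writer):
            continue  # subset-write fast path: table schema is unchanged
        merged = merge(table, writer)
        validate(table, merged, step)
        for name, t in merged.items():
            if name in table.columns:
                table.columns[name].type = t  # promotion keeps the field ID
            else:
                table.add(name, t)
    return table
```

In this model a third version that writes int into a column already widened to long takes the fast path in step 1, which is the case the commit message says was previously rejected.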
Add the iceberg_compat module skeleton: a header with the check_iceberg_compatibility() function signature and a stub .cc that unconditionally returns nullopt (check skipped). The BUILD target declares implementation_deps on iceberg libraries that will be needed once the stub is filled in.
Implement check_iceberg_compatibility which validates that registering a new schema version for a subject would produce a valid iceberg schema evolution history. The function checks the "redpanda.iceberg.compatible" metadata flag on the candidate schema; if set, it fetches all existing versions, converts each to iceberg struct_type via type_to_iceberg, and runs simulate_evolution on the full sequence including the candidate. Tests cover: flag not set (skip), first version (pass), compatible field addition, incompatible type change, drop-and-reintroduce same type vs different type, and error message content.
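The control flow of the check can be sketched roughly as follows. This is a hedged illustration only: the real function is C++ and its exact signature is not shown here. The assumptions are that the flag value is the string "true", that a skipped or passing check returns None while a failure returns an error message, and that to_iceberg and toy_simulate are hypothetical stand-ins for type_to_iceberg and simulate_evolution.

```python
from typing import Callable, Optional, Sequence

# Metadata key named in the commit message.
ICEBERG_COMPAT_KEY = "redpanda.iceberg.compatible"


def to_iceberg(schema: dict) -> dict:
    """Hypothetical stand-in for the Avro -> Iceberg conversion (identity here)."""
    return dict(schema)


def toy_simulate(lineage: Sequence[dict]) -> Optional[str]:
    """Hypothetical stand-in for simulate_evolution.

    Rejects any change to the type of a field seen in an earlier version,
    including a field that was dropped and later reintroduced.
    """
    seen: dict = {}
    for step, schema in enumerate(lineage):
        for name, type_ in schema.items():
            if seen.setdefault(name, type_) != type_:
                return (
                    f"version index {step}: field {name!r} changed "
                    f"from {seen[name]} to {type_}"
                )
    return None


def check_iceberg_compatibility(
    candidate: dict,
    metadata: dict,
    existing: Sequence[dict],
    simulate: Callable[[Sequence[dict]], Optional[str]] = toy_simulate,
) -> Optional[str]:
    # Flag not set on the candidate: the check is skipped entirely.
    if metadata.get(ICEBERG_COMPAT_KEY) != "true":
        return None
    # Replay the full history with the candidate appended as the last version.
    lineage = [to_iceberg(s) for s in existing] + [to_iceberg(candidate)]
    return simulate(lineage)
```

Under these assumptions, a first version with the flag set always passes, because the lineage contains only the candidate.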
After the existing SR compatibility check, call check_iceberg_compatibility to validate that the new schema is compatible with iceberg schema evolution rules. The check is a no-op when the subject does not have the iceberg metadata flag set.
Exercises the redpanda.iceberg.compatible metadata flag end-to-end through the SR HTTP API: compatible addition passes, incompatible type change is rejected with 409, and explicit empty metadata overrides inheritance to skip the check.
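As a rough illustration of what such a request might look like, the sketch below builds (but does not send) a registration request. The endpoint path comes from the file summary above; the body layout, with the flag under metadata.properties and an empty properties object standing for "explicit empty metadata", is an assumption modeled on the common Schema Registry metadata shape and may not match what the rptest actually sends. build_register_request is a hypothetical helper.

```python
import json
import urllib.request
from typing import Optional


def build_register_request(
    base_url: str,
    subject: str,
    avro_schema: dict,
    iceberg_compatible: Optional[bool],
) -> urllib.request.Request:
    """Build a POST /subjects/{subject}/versions request without sending it.

    iceberg_compatible=True  -> set the flag (assumed body layout)
    iceberg_compatible=False -> send explicit empty metadata
    iceberg_compatible=None  -> omit metadata so it can be inherited
    """
    body = {"schema": json.dumps(avro_schema), "schemaType": "AVRO"}
    if iceberg_compatible is not None:
        props = {"redpanda.iceberg.compatible": "true"} if iceberg_compatible else {}
        body["metadata"] = {"properties": props}  # assumption: metadata layout
    return urllib.request.Request(
        url=f"{base_url}/subjects/{subject}/versions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )
```

Per the commit message, a compatible addition sent this way is expected to succeed, while an incompatible type change is rejected with HTTP 409.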
d886789 to d1d19af