datalake: compile-time schema descriptors for table_definition#30170
datalake: compile-time schema descriptors for table_definition#30170andrwng wants to merge 1 commit intoredpanda-data:devfrom
Conversation
4139da5 to
74d8b19
Compare
There was a problem hiding this comment.
Pull request overview
Introduces a compile-time schema descriptor system for datalake table definitions, replacing hand-built runtime schema construction and brittle positional access for the embedded redpanda system struct.
Changes:
- Added
schema_descriptor.hprovidingstruct_desc/field_descdescriptors, compile-timeindex_of, and typedtype_field/value_fieldaccessors. - Updated
table_definition.*to define the schemaless schema via descriptors and to centralize construction of theredpandastruct_value. - Updated
record_translator.ccto use typed accessors (rp_struct_type/rp_struct_value) instead offields[0]indexing.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/v/datalake/table_definition.h | Defines compile-time schema descriptors and adds typed accessors + build_rp_struct declaration. |
| src/v/datalake/table_definition.cc | Builds runtime schema via descriptors and implements centralized build_rp_struct. |
| src/v/datalake/schema_descriptor.h | New compile-time descriptor framework and typed field access helpers. |
| src/v/datalake/record_translator.cc | Replaces brittle positional access with typed accessors for the redpanda nested struct. |
| src/v/datalake/BUILD | Wires the new header and its dependencies into the build. |
fa04ce9 to
06180af
Compare
Replace the hand-rolled schemaless_struct_type() with a compile-time
schema descriptor system. The schema is defined as a type:
```
using schemaless_desc = struct_desc<
field_desc<"redpanda", rp_desc>>;
```
From this single definition:
- schemaless_struct_type() generates the runtime iceberg::struct_type
- index_of<"field_name"> provides compile-time field indices
- type_field/value_field accessors replace brittle fields[0]
- rp_struct_type()/rp_struct_value() provide named access
So code like this:
```
std::unique_ptr<iceberg::struct_value> build_rp_struct(
model::partition_id pid,
kafka::offset o,
std::optional<iobuf> key,
model::timestamp ts,
model::timestamp_type ts_t,
const chunked_vector<model::record_header>& headers) {
auto system_data = std::make_unique<iceberg::struct_value>();
system_data->fields.reserve(6);
system_data->fields.emplace_back(iceberg::int_value(pid));
system_data->fields.emplace_back(iceberg::long_value(o));
// NOTE: Kafka uses milliseconds, Iceberg uses microseconds.
system_data->fields.emplace_back(
iceberg::timestamptz_value(ts.value() * 1000));
... brittle, carefully ordered code ...
```
...becomes:
```
std::unique_ptr<iceberg::struct_value> build_rp_struct(
model::partition_id pid,
kafka::offset o,
std::optional<iobuf> key,
model::timestamp ts,
model::timestamp_type ts_t,
const chunked_vector<model::record_header>& headers) {
return rp_desc::build_value(
val<"partition">(iceberg::int_value(pid)),
val<"offset">(iceberg::long_value(o)),
val<"timestamp">(iceberg::timestamptz_value(ts.value() * 1000)),
```
and the definitions of compile-time schemas becomes more crisply defined
as well.
06180af to
e4024cb
Compare
oleiman
left a comment
There was a problem hiding this comment.
Really cool. The structural ct_string is a nice touch. I need to review with fresh eyes, but my intuition is that if it builds and tests pass it's almost certainly correct?
Yeah that's my intuition too. |
Replace the hand-rolled schemaless_struct_type() with a compile-time schema descriptor system. The schema is defined as a type:
From this single definition:
So code like this:
...becomes:
and the definitions of compile-time schemas becomes more crisply defined
as well.
Backports Required
Release Notes