Skip to content

datalake: compile-time schema descriptors for table_definition#30170

Open
andrwng wants to merge 1 commit intoredpanda-data:devfrom
andrwng:compile-time-iceberg-structs
Open

datalake: compile-time schema descriptors for table_definition#30170
andrwng wants to merge 1 commit intoredpanda-data:devfrom
andrwng:compile-time-iceberg-structs

Conversation

@andrwng
Copy link
Copy Markdown
Contributor

@andrwng andrwng commented Apr 15, 2026

Replace the hand-rolled schemaless_struct_type() with a compile-time schema descriptor system. The schema is defined as a type:

  using schemaless_desc = struct_desc<
    field_desc<"redpanda", rp_desc>>;

From this single definition:

  • schemaless_struct_type() generates the runtime iceberg::struct_type
  • index_of<"field_name"> provides compile-time field indices
  • type_field/value_field accessors replace brittle fields[0]
  • rp_struct_type()/rp_struct_value() provide named access

So code like this:

std::unique_ptr<iceberg::struct_value> build_rp_struct(
  model::partition_id pid,
  kafka::offset o,
  std::optional<iobuf> key,
  model::timestamp ts,
  model::timestamp_type ts_t,
  const chunked_vector<model::record_header>& headers) {
    auto system_data = std::make_unique<iceberg::struct_value>();
    system_data->fields.reserve(6);

    system_data->fields.emplace_back(iceberg::int_value(pid));
    system_data->fields.emplace_back(iceberg::long_value(o));
    // NOTE: Kafka uses milliseconds, Iceberg uses microseconds.
    system_data->fields.emplace_back(
      iceberg::timestamptz_value(ts.value() * 1000));
    ... brittle, carefully ordered code ...

...becomes:

std::unique_ptr<iceberg::struct_value> build_rp_struct(
  model::partition_id pid,
  kafka::offset o,
  std::optional<iobuf> key,
  model::timestamp ts,
  model::timestamp_type ts_t,
  const chunked_vector<model::record_header>& headers) {
    return rp_desc::build_value(
      val<"partition">(iceberg::int_value(pid)),
      val<"offset">(iceberg::long_value(o)),
      val<"timestamp">(iceberg::timestamptz_value(ts.value() * 1000)),

and the definitions of compile-time schemas becomes more crisply defined
as well.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • None

Copilot AI review requested due to automatic review settings April 15, 2026 07:01
@andrwng andrwng force-pushed the compile-time-iceberg-structs branch from 4139da5 to 74d8b19 Compare April 15, 2026 07:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a compile-time schema descriptor system for datalake table definitions, replacing hand-built runtime schema construction and brittle positional access for the embedded redpanda system struct.

Changes:

  • Added schema_descriptor.h providing struct_desc/field_desc descriptors, compile-time index_of, and typed type_field/value_field accessors.
  • Updated table_definition.* to define the schemaless schema via descriptors and to centralize construction of the redpanda struct_value.
  • Updated record_translator.cc to use typed accessors (rp_struct_type / rp_struct_value) instead of fields[0] indexing.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/v/datalake/table_definition.h Defines compile-time schema descriptors and adds typed accessors + build_rp_struct declaration.
src/v/datalake/table_definition.cc Builds runtime schema via descriptors and implements centralized build_rp_struct.
src/v/datalake/schema_descriptor.h New compile-time descriptor framework and typed field access helpers.
src/v/datalake/record_translator.cc Replaces brittle positional access with typed accessors for the redpanda nested struct.
src/v/datalake/BUILD Wires the new header and its dependencies into the build.

Comment thread src/v/datalake/schema_descriptor.h Outdated
Comment thread src/v/datalake/schema_descriptor.h
@andrwng andrwng force-pushed the compile-time-iceberg-structs branch 4 times, most recently from fa04ce9 to 06180af Compare April 15, 2026 07:53
Replace the hand-rolled schemaless_struct_type() with a compile-time
schema descriptor system. The schema is defined as a type:

```
  using schemaless_desc = struct_desc<
    field_desc<"redpanda", rp_desc>>;
```

From this single definition:
- schemaless_struct_type() generates the runtime iceberg::struct_type
- index_of<"field_name"> provides compile-time field indices
- type_field/value_field accessors replace brittle fields[0]
- rp_struct_type()/rp_struct_value() provide named access

So code like this:

```
std::unique_ptr<iceberg::struct_value> build_rp_struct(
  model::partition_id pid,
  kafka::offset o,
  std::optional<iobuf> key,
  model::timestamp ts,
  model::timestamp_type ts_t,
  const chunked_vector<model::record_header>& headers) {
    auto system_data = std::make_unique<iceberg::struct_value>();
    system_data->fields.reserve(6);

    system_data->fields.emplace_back(iceberg::int_value(pid));
    system_data->fields.emplace_back(iceberg::long_value(o));
    // NOTE: Kafka uses milliseconds, Iceberg uses microseconds.
    system_data->fields.emplace_back(
      iceberg::timestamptz_value(ts.value() * 1000));
    ... brittle, carefully ordered code ...
```

...becomes:

```
std::unique_ptr<iceberg::struct_value> build_rp_struct(
  model::partition_id pid,
  kafka::offset o,
  std::optional<iobuf> key,
  model::timestamp ts,
  model::timestamp_type ts_t,
  const chunked_vector<model::record_header>& headers) {
    return rp_desc::build_value(
      val<"partition">(iceberg::int_value(pid)),
      val<"offset">(iceberg::long_value(o)),
      val<"timestamp">(iceberg::timestamptz_value(ts.value() * 1000)),
```

and the definitions of compile-time schemas becomes more crisply defined
as well.
@andrwng andrwng force-pushed the compile-time-iceberg-structs branch from 06180af to e4024cb Compare April 15, 2026 07:55
Copy link
Copy Markdown
Member

@oleiman oleiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool. The structural ct_string is a nice touch. I need to review with fresh eyes, but my intuition is that if it builds and tests pass it's almost certainly correct?

@andrwng
Copy link
Copy Markdown
Contributor Author

andrwng commented Apr 16, 2026

Really cool. The structural ct_string is a nice touch. I need to review with fresh eyes, but my intuition is that if it builds and tests pass it's almost certainly correct?

Yeah that's my intuition too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants