Document the udt (UserDefinedType) data type disposition by sanujbasu · Pull Request #7093 · delta-io/delta

sanujbasu · 2026-06-25T02:58:14Z

Description

Spark writes UserDefinedType (udt) columns into metaData.schemaString, but the protocol's schema serialization type system does not define udt (only primitive / struct / array / map / variant). Such columns are therefore non-conformant today even though they already exist in tables in the wild, and a reader that rejects the unknown type fails to read the entire table.

This adds a note under Schema Serialization Format > Primitive Types, parallel to the existing void note, documenting the disposition rather than gating it behind a table feature:

Existing tables may contain columns of Spark's udt (UserDefinedType) complex type... A reader that does not implement that engine code MUST interpret the column as its physical sqlType; the sqlType is the on-disk Parquet representation.

Why no table feature: udt introduces no new physical representation. It is an engine-specific annotation over an existing physical type: its class and pyClass fields reference JVM and Python deserialization code. A UDT-unaware reader that reads the sqlType reads correct data, so unlike timestampNtz/variant there is nothing for a reader to opt into. This mirrors the void precedent.
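As an illustration of the fallback described above, here is a minimal sketch of how a UDT-unaware reader might rewrite a parsed schemaString so that every udt node is replaced by its sqlType. The function name `to_physical_type`, the `com.example.PointUDT` / `example.PointUDT` class names, and the sample schema are hypothetical and used only for illustration; this is not the delta-kernel-rs implementation.

```python
import json

# Hypothetical udt node. The class names are made up for illustration;
# the type/class/pyClass/sqlType layout follows what Spark serializes.
POINT_UDT = {
    "type": "udt",
    "class": "com.example.PointUDT",
    "pyClass": "example.PointUDT",
    "sqlType": {
        "type": "struct",
        "fields": [
            {"name": "x", "type": "double", "nullable": False, "metadata": {}},
            {"name": "y", "type": "double", "nullable": False, "metadata": {}},
        ],
    },
}

# Hypothetical schemaString with a udt column and an array of udt values.
SCHEMA_STRING = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {"name": "location", "type": POINT_UDT, "nullable": True, "metadata": {}},
        {
            "name": "path",
            "type": {"type": "array", "elementType": POINT_UDT, "containsNull": False},
            "nullable": True,
            "metadata": {},
        },
    ],
})


def to_physical_type(data_type):
    """Return data_type with every udt node replaced by its sqlType."""
    if isinstance(data_type, str):
        return data_type  # primitive type name such as "long" or "double"
    kind = data_type.get("type")
    if kind == "udt":
        # Ignore class/pyClass and fall back to the physical type.
        return to_physical_type(data_type["sqlType"])
    if kind == "struct":
        fields = [{**f, "type": to_physical_type(f["type"])} for f in data_type["fields"]]
        return {**data_type, "fields": fields}
    if kind == "array":
        return {**data_type, "elementType": to_physical_type(data_type["elementType"])}
    if kind == "map":
        return {
            **data_type,
            "keyType": to_physical_type(data_type["keyType"]),
            "valueType": to_physical_type(data_type["valueType"]),
        }
    return data_type  # unknown complex type: left untouched in this sketch


physical_schema = to_physical_type(json.loads(SCHEMA_STRING))
print(json.dumps(physical_schema, indent=2))
```

The rewrite recurses through struct, array, and map types because a udt can appear at any nesting depth, and its own sqlType may in turn contain further udt nodes.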

A companion kernel implementation (DataType::UserDefined read support in delta-kernel-rs) is proposed separately.

How was this patch tested?

Documentation-only change.

Does this PR introduce any user-facing changes?

No behavioral change. Documents the disposition of an existing, previously-undocumented data type that Spark already writes.

Authored with assistance from Claude Code.

Spark writes UserDefinedType (`udt`) columns into `schemaString`, but the protocol's schema type system does not define `udt`, so such columns are non-conformant today even though they already exist in tables in the wild.

Add a note (parallel to the existing `void` note) documenting the disposition: a reader that cannot run the engine's deserialization code MUST read the column as its physical `sqlType`, which is the on-disk Parquet representation.

No table feature is introduced: `udt` adds no new physical representation (it is an annotation over an existing physical type), so a UDT-unaware reader that reads the `sqlType` reads correct data.

Co-authored-by: Isaac

sanujbasu wants to merge 1 commit into delta-io:master from sanujbasu:udt-protocol-note
