From c593bdc2eef18e2eba719cf25e9fc7e57bf4f030 Mon Sep 17 00:00:00 2001 From: Ziya Mukhtarov Date: Tue, 23 Jun 2026 09:13:33 +0000 Subject: [PATCH 1/3] [PROTOCOL RFC] Full Void Type Support --- protocol_rfcs/README.md | 3 +- protocol_rfcs/void-type.md | 75 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 77 insertions(+), 1 deletion(-) create mode 100644 protocol_rfcs/void-type.md diff --git a/protocol_rfcs/README.md b/protocol_rfcs/README.md index 2c6bb51ec7d..489896a025b 100644 --- a/protocol_rfcs/README.md +++ b/protocol_rfcs/README.md @@ -22,7 +22,8 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024, | 2024-04-30 | [collated-string-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/collated-string-type.md) | https://github.com/delta-io/delta/issues/2894 | Collated String Type | | 2025-03-13 | [checkpoint-protection.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/checkpoint-protection.md) | https://github.com/delta-io/delta/issues/4152 | Checkpoint Protection | | 2025-03-18 | [iceberg-writer-compat-v1.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/iceberg-writer-compat-v1.md) | https://github.com/delta-io/delta/issues/4284 | IcebergWriterCompatV1 | -| 2025-11-20 | [materialize-partition-columns.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/materialize-partition-columns.md) | https://github.com/delta-io/delta/issues/5555 | Materialize Partition Columns | +| 2025-11-20 | [materialize-partition-columns.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/materialize-partition-columns.md) | https://github.com/delta-io/delta/issues/5555 | Materialize Partition Columns | +| 2026-06-22 | [void-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/void-type.md) | https://github.com/delta-io/delta/issues/7072 | Void Type | ### Accepted RFCs diff --git a/protocol_rfcs/void-type.md b/protocol_rfcs/void-type.md new file mode 100644 index 00000000000..f9f648bb104 --- /dev/null +++ b/protocol_rfcs/void-type.md @@ -0,0 +1,75 @@ +# Void Type +**Associated Github issue for discussions: https://github.com/delta-io/delta/issues/7072** + +This protocol change adds support for using the `void` data type (also known as `NullType` in Spark, `UnknownType` in Iceberg, and `UNKNOWN` in Parquet) anywhere in a Delta table schema, via a new reader/writer table feature, `voidType`. + +`void` is a data type with a single possible value: `NULL`. A column ends up with this type when the writer has no information about its actual type, typically because every value observed so far has been `NULL` (for example, `CREATE TABLE t AS SELECT NULL AS a`, or schema evolution that adds a column containing only `NULL`s). + +Today, `void` columns are represented by omitting them from data files and reconstructing them as all-`NULL` columns on read (the missing columns mechanism). That representation cannot encode four schema shapes - a table whose columns are all `void`, a `struct` whose fields are all `void`, a `void` nested in an `array`, and a `void` nested in a `map` - because in each case omitting the `void` column(s) would leave the enclosing `struct`, `array`, or `map` (or the table itself) with nothing written to a data file, and therefore nowhere to record whether the enclosing value is `NULL`, empty, or how long it is. Writers must reject writing data in those cases. + +The `voidType` table feature lifts these restrictions by storing the `void` columns those shapes require - the **structural** `void` columns - using the Parquet [`UNKNOWN` logical type](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#unknown-always-null): a column whose values are always `NULL` but which is physically present in the data file, so it can carry the information about its enclosing complex value. Older clients cannot read `UNKNOWN` columns, so the representation is gated behind this reader/writer feature. + +-------- + +> ***New section after the [Variant Shredding](/PROTOCOL.md#variant-shredding) section*** + +# Void Type + +`void` is a primitive data type (see [Primitive Types](/PROTOCOL.md#primitive-types)) with a single possible value, `NULL`. A `void` column can be represented in a data file in one of two ways: + +- **Omitted** - the column is not written to the data file and is reconstructed on read as an all-`NULL` column, following the [rule](/PROTOCOL.md#consistency-between-table-metadata-and-data-files) that a column present in the table schema but absent from a data file is read as `NULL`. +- **Stored as `UNKNOWN`** - the column is written using the Parquet [`UNKNOWN` logical type](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#unknown-always-null). Its values are always `NULL`, but unlike an omitted column it is physically present in the data file. This representation requires the `voidType` table feature. + +A `void` column may be changed to any other data type through supported schema-evolution operations; this does not require the [Type Widening](/PROTOCOL.md#type-widening) table feature, even when the `void` column is stored as `UNKNOWN`. + +## Void columns without the table feature + +When the `voidType` feature is not supported, `void` columns can only be **omitted**. Because a `void` column is never written to a data file, writers must reject **writing data** to a table whose schema contains any of the following shapes, in which omitting the `void` column(s) would leave nowhere to record the nullability or length of an enclosing value: +- a `void` type directly inside an `array` or `map` at any nesting level; +- a `struct` (at any nesting level) whose fields are all `void`; or +- a table whose columns are all `void`. + +These restrictions are stated in terms of the **table schema**, not the schema of any individual data file. A table with such a schema can still be created, altered through metadata-only operations, and read. It can be made writable by evolving its schema - for example, by changing a `void` column to another type - or by enabling the `voidType` feature. + +## Void Type table feature + +The `voidType` table feature lifts the restrictions above by allowing those schema shapes: the `void` columns that cannot be omitted - the **structural** `void` columns - are instead stored using the Parquet `UNKNOWN` logical type. + +To support this feature: +- The table must be on Reader Version 3 and Writer Version 7. +- The feature `voidType` must exist in the table `protocol`'s `readerFeatures` and `writerFeatures`. + +The feature has a dual purpose: + +1. A client that supports `voidType` is guaranteed to correctly read and write `void` columns that rely on the missing columns mechanism. Enabling the feature for a table that only uses such columns is **optional**; a user may choose to enable it so that only clients capable of handling `void` columns correctly interact with the table. +2. A client that supports `voidType` is also guaranteed to correctly read and write structural `void` columns (those stored as `UNKNOWN`). Enabling the feature is **required** to write data for a schema that needs the `UNKNOWN` representation, because clients that do not support the feature cannot read `UNKNOWN` columns. + +### Structural void columns + +A `void` column is **structural** when it must be stored as `UNKNOWN` rather than omitted, because omitting it would leave an enclosing `struct`, `array`, or `map` (or the table) with nothing written to the data file. This arises only in the schema shapes that the missing columns mechanism cannot represent (the shapes listed in [Void columns without the table feature](#void-columns-without-the-table-feature)): +- a `void` directly inside an `array` or `map` (at any nesting level): the `void` element or value is structural and must be stored as `UNKNOWN`. +- a `struct` whose fields are all `void`, or a table whose columns are all `void`: at least one of the `void` columns is structural and must be stored as `UNKNOWN`. + +A `void` column in any other position is never structural: it can be omitted, and does not require the feature. A schema that contains one of the shapes above is said to **require** the `voidType` feature. + +### Writer Requirements for Void Type + +When Void Type is supported (when the `writerFeatures` field of a table's `protocol` action contains `voidType`), writers: +- must store the table's structural `void` columns as `UNKNOWN` (see [Structural void columns](#structural-void-columns)). +- may store any non-structural `void` column either by omission or as an `UNKNOWN` column. + +### Reader Requirements for Void Type + +When Void Type is supported (when the `readerFeatures` field of a table's `protocol` action contains `voidType`), readers: +- must recognize and tolerate a `void` data type anywhere in a Delta table schema. +- must read a `void` column stored as `UNKNOWN` as an all-`NULL` column. +- must read an omitted `void` column as an all-`NULL` [missing column](/PROTOCOL.md#consistency-between-table-metadata-and-data-files). +- must, within a single scan, correctly combine data files that represent the same column differently - omitted, stored as `UNKNOWN`, or (after a type change) stored as a concrete type - into the requested read schema. + +### Removing the Void Type feature + +Because clients that do not support the feature cannot read `UNKNOWN` columns, removing `voidType` requires that the table no longer depend on the `UNKNOWN` representation. In the version that removes `voidType` from the `writerFeatures` and `readerFeatures` fields of the table's `protocol` action, writers: +- must ensure that the table schema does not require the feature - it must contain none of the shapes in [Structural void columns](#structural-void-columns). +- must ensure that no data file reachable by the table (including via time travel within the retained history) contains a column stored as `UNKNOWN`. This may require rewriting existing data files so that every `void` column is represented by omission. + +After the feature is removed, the table reverts to representing `void` columns only by omission, and the shapes that require the feature are again rejected when writing data. From a09d6e49359b71c702596a5b3c6b639d73167383 Mon Sep 17 00:00:00 2001 From: Ziya Mukhtarov Date: Thu, 25 Jun 2026 15:03:00 +0000 Subject: [PATCH 2/3] Address comments --- protocol_rfcs/void-type.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/protocol_rfcs/void-type.md b/protocol_rfcs/void-type.md index f9f648bb104..315c8eda905 100644 --- a/protocol_rfcs/void-type.md +++ b/protocol_rfcs/void-type.md @@ -46,9 +46,9 @@ The feature has a dual purpose: ### Structural void columns -A `void` column is **structural** when it must be stored as `UNKNOWN` rather than omitted, because omitting it would leave an enclosing `struct`, `array`, or `map` (or the table) with nothing written to the data file. This arises only in the schema shapes that the missing columns mechanism cannot represent (the shapes listed in [Void columns without the table feature](#void-columns-without-the-table-feature)): +A `void` column is **structural** when it cannot be omitted and is therefore stored as `UNKNOWN`, because omitting it would leave an enclosing `struct`, `array`, or `map` (or the table) with nothing written to the data file. This arises only in the schema shapes that the missing columns mechanism cannot represent (the shapes listed in [Void columns without the table feature](#void-columns-without-the-table-feature)): - a `void` directly inside an `array` or `map` (at any nesting level): the `void` element or value is structural and must be stored as `UNKNOWN`. -- a `struct` whose fields are all `void`, or a table whose columns are all `void`: at least one of the `void` columns is structural and must be stored as `UNKNOWN`. +- a `struct` whose fields are all `void`, or a table whose columns are all `void`: the writer chooses which `void` column is structural and must store it as `UNKNOWN`. A `void` column in any other position is never structural: it can be omitted, and does not require the feature. A schema that contains one of the shapes above is said to **require** the `voidType` feature. @@ -56,7 +56,7 @@ A `void` column in any other position is never structural: it can be omitted, an When Void Type is supported (when the `writerFeatures` field of a table's `protocol` action contains `voidType`), writers: - must store the table's structural `void` columns as `UNKNOWN` (see [Structural void columns](#structural-void-columns)). -- may store any non-structural `void` column either by omission or as an `UNKNOWN` column. +- should store any non-structural `void` column by omission. ### Reader Requirements for Void Type From a5327eaed6a67ec432f13cc48457c49d73c7ebe8 Mon Sep 17 00:00:00 2001 From: Ziya Mukhtarov Date: Fri, 26 Jun 2026 13:20:05 +0000 Subject: [PATCH 3/3] Address comments --- protocol_rfcs/void-type.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/protocol_rfcs/void-type.md b/protocol_rfcs/void-type.md index 315c8eda905..f1155334ba5 100644 --- a/protocol_rfcs/void-type.md +++ b/protocol_rfcs/void-type.md @@ -1,7 +1,7 @@ # Void Type **Associated Github issue for discussions: https://github.com/delta-io/delta/issues/7072** -This protocol change adds support for using the `void` data type (also known as `NullType` in Spark, `UnknownType` in Iceberg, and `UNKNOWN` in Parquet) anywhere in a Delta table schema, via a new reader/writer table feature, `voidType`. +The `voidType` reader/writer table feature adds support for using the `void` data type (also known as `NullType` in Spark, `UnknownType` in Iceberg, and `UNKNOWN` in Parquet) anywhere in a Delta table schema. `void` is a data type with a single possible value: `NULL`. A column ends up with this type when the writer has no information about its actual type, typically because every value observed so far has been `NULL` (for example, `CREATE TABLE t AS SELECT NULL AS a`, or schema evolution that adds a column containing only `NULL`s). @@ -55,16 +55,15 @@ A `void` column in any other position is never structural: it can be omitted, an ### Writer Requirements for Void Type When Void Type is supported (when the `writerFeatures` field of a table's `protocol` action contains `voidType`), writers: -- must store the table's structural `void` columns as `UNKNOWN` (see [Structural void columns](#structural-void-columns)). -- should store any non-structural `void` column by omission. +- must write the table's structural `void` columns to data files (see [Structural void columns](#structural-void-columns)). +- should omit any non-structural `void` column. ### Reader Requirements for Void Type When Void Type is supported (when the `readerFeatures` field of a table's `protocol` action contains `voidType`), readers: -- must recognize and tolerate a `void` data type anywhere in a Delta table schema. -- must read a `void` column stored as `UNKNOWN` as an all-`NULL` column. -- must read an omitted `void` column as an all-`NULL` [missing column](/PROTOCOL.md#consistency-between-table-metadata-and-data-files). -- must, within a single scan, correctly combine data files that represent the same column differently - omitted, stored as `UNKNOWN`, or (after a type change) stored as a concrete type - into the requested read schema. +- must allow a `void` data type anywhere in a Delta table schema. +- must return only `NULL` values for a `void` column regardless of how it is represented. +- must, within a single scan, correctly combine data files that represent the same column differently - omitted, written as an all-`NULL` column, or (after a type change) written with a concrete type - into the requested read schema. ### Removing the Void Type feature