From a20cce20179b702b516958f93ed2b56870f06170 Mon Sep 17 00:00:00 2001 From: Ayush Raj Date: Tue, 23 Jun 2026 19:09:16 +0000 Subject: [PATCH 1/3] interval type rfc --- protocol_rfcs/interval-types.md | 68 +++++++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) create mode 100644 protocol_rfcs/interval-types.md diff --git a/protocol_rfcs/interval-types.md b/protocol_rfcs/interval-types.md new file mode 100644 index 00000000000..2c83bfd2f60 --- /dev/null +++ b/protocol_rfcs/interval-types.md @@ -0,0 +1,68 @@ +# Interval Types +**Associated GitHub issue for discussions: https://github.com/delta-io/delta/issues/7077** + +This protocol change adds support for interval types. It consists of two changes to the protocol: + +- One new reader/writer table feature +- Two new primitive types (year-month and day-second) + +-------- + +> ***Add a new section in front of the [Primitive Types](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#primitive-types) section.*** + +# Interval Types Table Feature + +This table feature (`intervalTypes`) adds the year-month and day-time interval types from ANSI SQL: + +1. **interval year to month**: A signed number of months, e.g. `INTERVAL '1-6' YEAR TO MONTH` represents 18 (1 year + 6 months = 18 months). +2. **interval day to second**: A signed number of microseconds, e.g. `INTERVAL '1 00:00:01.000000' DAY TO SECOND` represents 86,401,000,000 (1 day + 1 second = 86,400,000,000 + 1,000,000 μs). + +To support this feature: +- The table must be on **Reader Version 3** and **Writer Version 7**. +- The feature `intervalTypes` must be listed in the table `protocol`'s `readerFeatures` and `writerFeatures`. + +## Type Definitions + +In the schema, interval types are serialized as + +- `interval year to month` +- `interval day to second` + +When this table feature is supported: + +- Readers must interpret `interval year to month` and `interval day to second` as signed counts of months and microseconds, respectively. 
+- Writers must serialize an interval field's type in `Metadata.schemaString` as `interval year to month` or `interval day to second`. + +## Partition Value Serialization + +Interval columns can be used as partition columns, so we define their Partition Value Serialization as the ANSI interval literal form described in the Spark SQL guide [1]. For example: + +``` +Interval Year Month: "INTERVAL '1-0' YEAR TO MONTH" +Interval Day Second: "INTERVAL '7 12:34:56.123456' DAY TO SECOND" +``` + +Where `'1-0'` refers to `years-months` and `'7 12:34:56.123456'` refers to `days hours:minutes:seconds.microseconds`. + +## Per-file Statistics + +Interval types do not support per-file statistics or data skipping. Writers must not record `minValues` or `maxValues` statistics for interval columns, and readers must not perform data skipping over interval columns. This is consistent with existing tables that contain interval types, which do not support per-file statistics for these columns. + +## Parquet Format + +We use raw `int32` values to represent year-month intervals and raw `int64` values to represent day-time intervals in Parquet. This allows us to support signed intervals and microsecond precision. + +The choice of underlying Parquet representation involves design trade-offs that are still under discussion; these design decisions are tracked in the associated GitHub issue [2]. 
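As a rough illustration of the partition-value form described under Partition Value Serialization, the sketch below decodes the two literal shapes into the signed month and microsecond counts that the Parquet representation stores. The function names, the regular expressions, and the treatment of a leading `-` inside the quotes are assumptions made for this example; the RFC text does not spell out a grammar for negative values.

```python
import re

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

# Hypothetical patterns for the two literal shapes shown in the RFC example.
# Assumption: an optional '-' right after the opening quote negates the whole value.
_YEAR_MONTH = re.compile(r"INTERVAL '(-?)(\d+)-(\d+)' YEAR TO MONTH")
_DAY_SECOND = re.compile(
    r"INTERVAL '(-?)(\d+) (\d{1,2}):(\d{1,2}):(\d{1,2})(?:\.(\d{1,6}))?' DAY TO SECOND"
)


def parse_year_month(literal: str) -> int:
    """Decode a year-month partition value into a signed count of months."""
    match = _YEAR_MONTH.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a year-month interval literal: {literal!r}")
    sign, years, months = match.groups()
    if int(months) > 11:
        raise ValueError(f"months field out of range: {literal!r}")
    total = int(years) * 12 + int(months)
    total = -total if sign else total
    if not INT32_MIN <= total <= INT32_MAX:
        raise ValueError(f"value does not fit in int32 months: {literal!r}")
    return total


def parse_day_second(literal: str) -> int:
    """Decode a day-second partition value into a signed count of microseconds."""
    match = _DAY_SECOND.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a day-second interval literal: {literal!r}")
    sign, days, hours, minutes, seconds, fraction = match.groups()
    h, m, s = int(hours), int(minutes), int(seconds)
    if h > 23 or m > 59 or s > 59:
        raise ValueError(f"time field out of range: {literal!r}")
    # Right-pad the fraction so that '.5' means 500000 microseconds.
    micros = int((fraction or "").ljust(6, "0"))
    total_seconds = ((int(days) * 24 + h) * 60 + m) * 60 + s
    total = total_seconds * 1_000_000 + micros
    total = -total if sign else total
    if not INT64_MIN <= total <= INT64_MAX:
        raise ValueError(f"value does not fit in int64 microseconds: {literal!r}")
    return total
```

With these definitions, the two example partition values decode to 12 months and 650,096,123,456 microseconds respectively; an engine's real parser would likely need to accept more of the ANSI literal grammar than this sketch does.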
+ +> ***Add new rows to the [Primitive Types](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#primitive-types) table.*** + +| Type Name | Description | +| --- | --- | +| interval year to month | Signed duration in the precision of months | +| interval day to second | Signed duration in the precision of microseconds | + +# References + +[1] https://spark.apache.org/docs/latest/sql-ref-literals.html#interval-literal + +[2] https://github.com/delta-io/delta/issues/7077 From 0c25e37175cd7297bf1e50a14693840ac2e718a7 Mon Sep 17 00:00:00 2001 From: Ayush Raj Date: Thu, 25 Jun 2026 00:28:08 +0000 Subject: [PATCH 2/3] updated rfc --- protocol_rfcs/interval-types.md | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-) diff --git a/protocol_rfcs/interval-types.md b/protocol_rfcs/interval-types.md index 2c83bfd2f60..7cff4ca8067 100644 --- a/protocol_rfcs/interval-types.md +++ b/protocol_rfcs/interval-types.md @@ -12,7 +12,7 @@ This protocol change adds support for interval types. It consists of two changes # Interval Types Table Feature -This table feature (`intervalTypes`) adds the year-month and day-time interval types from ANSI SQL: +This table feature (`intervalTypes`) adds the year-month and day-second interval types from ANSI SQL: 1. **interval year to month**: A signed number of months, e.g. `INTERVAL '1-6' YEAR TO MONTH` represents 18 (1 year + 6 months = 18 months). 2. **interval day to second**: A signed number of microseconds, e.g. `INTERVAL '1 00:00:01.000000' DAY TO SECOND` represents 86,401,000,000 (1 day + 1 second = 86,400,000,000 + 1,000,000 μs). @@ -23,15 +23,25 @@ To support this feature: ## Type Definitions -In the schema, interval types are serialized as +In the schema, interval types are serialized in `Metadata.schemaString` as - `interval year to month` - `interval day to second` -When this table feature is supported: +These are the canonical type-name strings. 
ANSI SQL also permits narrowed spellings that denote the same two types: for year-month, `interval year` and `interval month`; for day-second, `interval day`, `interval hour`, `interval minute`, `interval second`, and any ` to ` range between those fields (e.g. `interval day to minute`, `interval hour to second`). Mixed-family spellings (e.g. `interval month to day`) are not valid. Tables written by existing engines may use these narrowed spellings. -- Readers must interpret `interval year to month` and `interval day to second` as signed counts of months and microseconds, respectively. -- Writers must serialize an interval field's type in `Metadata.schemaString` as `interval year to month` or `interval day to second`. +### Reader Requirements + +When this table feature is supported, readers must: + +- Interpret `interval year to month` as a signed count of months, and `interval day to second` as a signed count of microseconds. +- Accept the narrowed spellings above and normalize each to its family: any year-month spelling is treated as `interval year to month`, and any day-second spelling is treated as `interval day to second`. + +### Writer Requirements + +When this table feature is supported, writers must: + +- Serialize an interval field's type in `Metadata.schemaString` using the canonical `interval year to month` or `interval day to second` form. ## Partition Value Serialization @@ -46,13 +56,15 @@ Where `'1-0'` refers to `years-months` and `'7 12:34:56.123456'` refers to `days ## Per-file Statistics -Interval types do not support per-file statistics or data skipping. Writers must not record `minValues` or `maxValues` statistics for interval columns, and readers must not perform data skipping over interval columns. This is consistent with existing tables that contain interval types, which do not support per-file statistics for these columns. +Interval columns do not support `minValues`/`maxValues` statistics or data skipping. 
Writers must not record `minValues` or `maxValues` for interval columns, and readers must not perform data skipping over interval columns. The per-column `nullCount` and the per-file `numRecords` statistics are unaffected and are still recorded as normal, since they do not require interpreting interval values. This is consistent with existing tables that contain interval types, which do not record `minValues`/`maxValues` for these columns. ## Parquet Format -We use raw `int32` values to represent year-month intervals and raw `int64` values to represent day-time intervals in Parquet. This allows us to support signed intervals and microsecond precision. +We use raw `int32` values to represent year-month intervals and raw `int64` values to represent day-second intervals in Parquet. This allows us to support signed intervals & microsecond precision while matching existing interval types. + +## Feature Interactions -The choice of underlying Parquet representation involves design trade-offs that are still under discussion; these design decisions are tracked in the associated GitHub issue [2]. +Beyond the partition-value and statistics behavior described above, interval types have no special interactions with other table features. > ***Add new rows to the [Primitive Types](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#primitive-types) table.*** From 395c0dd9d03cf2a77731316a24e3e1d7cd511639 Mon Sep 17 00:00:00 2001 From: Ayush Raj Date: Thu, 25 Jun 2026 20:00:44 +0000 Subject: [PATCH 3/3] added more details --- protocol_rfcs/interval-types.md | 47 +++++++++++++++++++++++++++++++-- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/protocol_rfcs/interval-types.md b/protocol_rfcs/interval-types.md index 7cff4ca8067..4ac90d5b251 100644 --- a/protocol_rfcs/interval-types.md +++ b/protocol_rfcs/interval-types.md @@ -30,6 +30,26 @@ In the schema, interval types are serialized in `Metadata.schemaString` as These are the canonical type-name strings. 
ANSI SQL also permits narrowed spellings that denote the same two types: for year-month, `interval year` and `interval month`; for day-second, `interval day`, `interval hour`, `interval minute`, `interval second`, and any ` to ` range between those fields (e.g. `interval day to minute`, `interval hour to second`). Mixed-family spellings (e.g. `interval month to day`) are not valid. Tables written by existing engines may use these narrowed spellings. +Regardless of which spelling is used, the stored value is the same: every year-month spelling stores a signed count of months, and every day-second spelling stores a signed count of microseconds. The spelling affects only how a value is displayed, not how it is stored. + +Interval types are permitted anywhere a primitive type is permitted: as top-level columns, as nested struct fields, as array element types, and as map key or value types. For example: + +``` +{ + "type": "struct", + "fields": [ + { "name": "duration_ym", "type": "interval year to month", "nullable": true, "metadata": {} }, + { "name": "duration_dt", "type": "interval day to second", "nullable": false, "metadata": {} }, + { + "name": "durations", + "type": { "type": "array", "elementType": "interval day to second", "containsNull": true }, + "nullable": true, + "metadata": {} + } + ] +} +``` + ### Reader Requirements When this table feature is supported, readers must: @@ -42,6 +62,7 @@ When this table feature is supported, readers must: When this table feature is supported, writers must: - Serialize an interval field's type in `Metadata.schemaString` using the canonical `interval year to month` or `interval day to second` form. +- Ensure the `intervalTypes` feature is present in the table `protocol`'s `readerFeatures` and `writerFeatures` whenever the table schema contains an interval type. The feature is enabled automatically by the presence of an interval-typed column; there is no separate table property to set (analogous to `timestampNtz`). 
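The writer requirement above can be sketched as a schema walk followed by a feature-list update. This is a minimal illustration, not the Delta implementation: `contains_interval` and `ensure_interval_feature` are hypothetical helper names, the protocol action is modelled as a plain dictionary, and only the canonical type names are checked because writers are required to emit the canonical form. Raising the reader/writer protocol versions is deliberately left out of scope.

```python
import json

CANONICAL_INTERVAL_TYPES = {"interval year to month", "interval day to second"}


def contains_interval(data_type) -> bool:
    """Return True if a parsed Delta schema type contains an interval type anywhere."""
    if isinstance(data_type, str):
        # Primitive types are serialized as bare strings.
        return data_type in CANONICAL_INTERVAL_TYPES
    kind = data_type.get("type")
    if kind == "struct":
        return any(contains_interval(field["type"]) for field in data_type["fields"])
    if kind == "array":
        return contains_interval(data_type["elementType"])
    if kind == "map":
        return contains_interval(data_type["keyType"]) or contains_interval(
            data_type["valueType"]
        )
    return False


def ensure_interval_feature(protocol: dict, schema_string: str) -> dict:
    """Return a copy of `protocol` that lists intervalTypes when the schema needs it."""
    if not contains_interval(json.loads(schema_string)):
        return protocol
    updated = dict(protocol)
    for key in ("readerFeatures", "writerFeatures"):
        features = list(updated.get(key, []))
        if "intervalTypes" not in features:
            features.append("intervalTypes")
        updated[key] = features
    return updated
```

Because the walk recurses through struct fields, array elements, and map keys and values, an interval nested inside an array (as in the `durations` field of the schema example) triggers the feature just like a top-level column.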
## Partition Value Serialization @@ -54,17 +75,32 @@ Interval Day Second: "INTERVAL '7 12:34:56.123456' DAY TO SECOND" Where `'1-0'` refers to `years-months` and `'7 12:34:56.123456'` refers to `days hours:minutes:seconds.microseconds`. +Interval partition values must not be used for partition pruning. Consistent with the data-skipping restriction for interval columns (see [Per-file Statistics](#per-file-statistics)), readers must not eliminate files based on interval partition values. + ## Per-file Statistics Interval columns do not support `minValues`/`maxValues` statistics or data skipping. Writers must not record `minValues` or `maxValues` for interval columns, and readers must not perform data skipping over interval columns. The per-column `nullCount` and the per-file `numRecords` statistics are unaffected and are still recorded as normal, since they do not require interpreting interval values. This is consistent with existing tables that contain interval types, which do not record `minValues`/`maxValues` for these columns. ## Parquet Format -We use raw `int32` values to represent year-month intervals and raw `int64` values to represent day-second intervals in Parquet. This allows us to support signed intervals & microsecond precision while matching existing interval types. +Interval values are stored using a raw Parquet physical type with no logical-type annotation: + +- `interval year to month` is stored as a Parquet `int32` holding the signed count of months. +- `interval day to second` is stored as a Parquet `int64` holding the signed count of microseconds. + +Because no Parquet logical type is written, an interval column is physically indistinguishable from a Parquet `int32`/`int64` (i.e. a Delta `integer`/`long`); the interval semantics are carried solely by the Delta schema in `Metadata.schemaString`. This representation supports signed intervals and microsecond precision while matching the physical layout of existing interval tables. 
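To make the physical representation concrete, the sketch below converts interval fields into the stored integer counts and packs them the way a plain little-endian `int32`/`int64` would be laid out. The helper names are hypothetical, and using Python's `struct` module to stand in for a Parquet writer is an assumption of this example; a real writer would go through a Parquet library rather than pack bytes by hand.

```python
import struct

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
MICROS_PER_SECOND = 1_000_000


def year_month_to_months(years: int, months: int = 0) -> int:
    """Signed month count stored for `interval year to month`."""
    total = years * 12 + months
    if not INT32_MIN <= total <= INT32_MAX:
        raise OverflowError("year-month interval does not fit in int32 months")
    return total


def day_second_to_micros(
    days: int, hours: int = 0, minutes: int = 0, seconds: int = 0, micros: int = 0
) -> int:
    """Signed microsecond count stored for `interval day to second`."""
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    total = total_seconds * MICROS_PER_SECOND + micros
    if not INT64_MIN <= total <= INT64_MAX:
        raise OverflowError("day-second interval does not fit in int64 microseconds")
    return total


def pack_int32(value: int) -> bytes:
    """Little-endian 4-byte layout, the same bytes a Delta `integer` would produce."""
    return struct.pack("<i", value)


def pack_int64(value: int) -> bytes:
    """Little-endian 8-byte layout, the same bytes a Delta `long` would produce."""
    return struct.pack("<q", value)
```

For instance, `year_month_to_months(1, 6)` yields 18 and `day_second_to_micros(1, seconds=1)` yields 86,401,000,000, matching the examples in the feature description. Nothing in the packed bytes marks the value as an interval, which is why the Delta schema is the only place the interval semantics live.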
## Feature Interactions -Beyond the partition-value and statistics behavior described above, interval types have no special interactions with other table features. +Beyond the partition-value and statistics behavior described above, and the restrictions listed in [Error Conditions](#error-conditions), interval types have no special interactions with other table features. + +## Error Conditions + +- **Unrecognized type-name strings.** Type-name matching is case-sensitive. A reader that encounters an interval type-name string that is not one of the recognized canonical or narrowed spellings, including a mixed-family spelling such as `interval month to day`, or a case variant such as `INTERVAL Year To Month`, must reject the schema with an error rather than silently coercing it to a supported type. +- **Feature not present.** A writer must add `intervalTypes` to the table `protocol`'s `readerFeatures` and `writerFeatures` whenever it writes a schema containing an interval type (see [Writer Requirements](#writer-requirements)). A reader that encounters an interval type in the schema while `intervalTypes` is absent from `readerFeatures` must reject the table. +- **Value overflow on write.** An `interval year to month` value must fit in a signed `int32` count of months, and an `interval day to second` value must fit in a signed `int64` count of microseconds. A writer must reject any value that overflows these bounds. +- **Malformed or out-of-range partition values.** When reading, a partition value that is not a valid ANSI interval literal, or whose decoded value does not fit the column's underlying `int32`/`int64` range, must be rejected with an error. +- **IcebergCompat incompatibility.** Apache Iceberg has no interval type. When any of the `icebergCompatV1`, `icebergCompatV2`, or `icebergCompatV3` features is enabled, a writer must reject — at schema validation — any schema containing an interval type. 
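The first error condition, combined with the reader's normalization rule, can be sketched as a case-sensitive lookup against the set of recognized spellings. The names below are hypothetical, and the sketch assumes that a range must run from a more significant field to a less significant one (so `interval second to day` is rejected), which follows ANSI SQL convention but is not stated explicitly in this RFC.

```python
YEAR_MONTH_FIELDS = ("year", "month")
DAY_SECOND_FIELDS = ("day", "hour", "minute", "second")


def _spellings(fields):
    """All single-field and `start to end` spellings within one interval family."""
    names = set()
    for index, start in enumerate(fields):
        names.add(f"interval {start}")
        # Assumption: the end field must be less significant than the start field.
        for end in fields[index + 1:]:
            names.add(f"interval {start} to {end}")
    return names


YEAR_MONTH_SPELLINGS = _spellings(YEAR_MONTH_FIELDS)
DAY_SECOND_SPELLINGS = _spellings(DAY_SECOND_FIELDS)


def normalize_interval_type(name: str) -> str:
    """Map a recognized interval spelling to its canonical family, else raise.

    Matching is exact and case-sensitive; nothing is coerced silently.
    """
    if name in YEAR_MONTH_SPELLINGS:
        return "interval year to month"
    if name in DAY_SECOND_SPELLINGS:
        return "interval day to second"
    raise ValueError(f"unrecognized interval type name: {name!r}")
```

Under these assumptions there are three year-month spellings and ten day-second spellings; a mixed-family name such as `interval month to day` or a case variant such as `INTERVAL Year To Month` falls through to the error branch.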
> ***Add new rows to the [Primitive Types](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#primitive-types) table.*** @@ -73,6 +109,13 @@ Beyond the partition-value and statistics behavior described above, interval typ | interval year to month | Signed duration in the precision of months | | interval day to second | Signed duration in the precision of microseconds | +> ***Add new rows to the [Delta Data Type to Parquet Type Mappings](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#delta-data-type-to-parquet-type-mappings) table.*** + +| Delta Type Name | Parquet Physical Type | Parquet Logical Type | +| --- | --- | --- | +| interval year to month | `int32` | | +| interval day to second | `int64` | | + # References [1] https://spark.apache.org/docs/latest/sql-ref-literals.html#interval-literal