Skip to content
Open
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
123 changes: 123 additions & 0 deletions protocol_rfcs/interval-types.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
# Interval Types
**Associated Github issue for discussions: https://github.com/delta-io/delta/issues/7077**

This protocol change adds support for interval types. It consists of two changes to the protocol:

- One new reader/writer table feature
- Two new primitive types (year-month and day-second)

--------

> ***Add a new section in front of the [Primitive Types](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#primitive-types) section.***

# Interval Types Table Feature

This table feature (`intervalTypes`) adds the year-month and day-second interval types from ANSI SQL:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe link to a definition of interval types. https://docs.databricks.com/aws/en/sql/language-manual/data-types/interval-type would be one option, but maybe there's a more generic "sql reference". I didn't find one in a 20 second search but there should be one somewhere :)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@rajayush143 rajayush143 Jun 29, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, thought I had updated this with a link to spark SQL documentation: https://spark.apache.org/docs/latest/sql-ref-datatypes.html. I'll add that link


1. **interval year to month**: A signed number of months, e.g. `INTERVAL '1-6' YEAR TO MONTH` represents 18 (1 year + 6 months = 18 months).
2. **interval day to second**: A signed number of microseconds, e.g. `INTERVAL '1 00:00:01.000000' DAY TO SECOND` represents 86,401,000,000 (1 day + 1 second = 86,400,000,000 + 1,000,000 μs).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use a more interesting example? here you have 1 day and 1 second. Let's include hour, minute, microsecond, etc so that this example is more representative


To support this feature:
- The table must be on **Reader Version 3** and **Writer Version 7**.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to bold these -- please follow existing style conventions.
i.e. I looked at Clustering feature and the versions are not bolded.

- The feature `intervalTypes` must be listed in the table `protocol`'s `readerFeatures` and `writerFeatures`.

## Type Definitions

In the schema, interval types are serialized in `Metadata.schemaString` as

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case sensitive? insensitive?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added case sensitivity in Error conditions section, but will also mention it here.


- `interval year to month`
- `interval day to second`

These are the canonical type-name strings. ANSI SQL also permits narrowed spellings that denote the same two types: for year-month, `interval year` and `interval month`; for day-second, `interval day`, `interval hour`, `interval minute`, `interval second`, and any `<start> to <end>` range between those fields (e.g. `interval day to minute`, `interval hour to second`). Mixed-family spellings (e.g. `interval month to day`) are not valid. Tables written by existing engines may use these narrowed spellings.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and any to range between those fields -- what does "between those fields mean"? between which fields? I see the following example but I thik the "those fields" part can be clearer.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed-family spellings (e.g. interval month to day) --- not clear what a family is here

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should more crisply define the exact two family types here


Regardless of which spelling is used, the stored value is the same: every year-month spelling stores a signed count of months, and every day-second spelling stores a signed count of microseconds. The spelling affects only how a value is displayed, not how it is stored.

Interval types are permitted anywhere a primitive type is permitted: as top-level columns, as nested struct fields, as array element types, and as map key or value types. For example:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value type? (singular?) I defer to you ..


```
{
"type": "struct",
"fields": [
{ "name": "duration_ym", "type": "interval year to month", "nullable": true, "metadata": {} },
{ "name": "duration_dt", "type": "interval day to second", "nullable": false, "metadata": {} },
{
"name": "durations",
"type": { "type": "array", "elementType": "interval day to second", "containsNull": true },
"nullable": true,
"metadata": {}
}
]
}
```

### Reader Requirements

When this table feature is supported, readers must:

- Interpret `interval year to month` as a signed count of months, and `interval day to second` as a signed count of microseconds.
- Accept the narrowed spellings above and normalize each to its family: any year-month spelling is treated as `interval year to month`, and any day-second spelling is treated as `interval day to second`.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above: I think you should more crisply define each family type. There are exactly two of them.


### Writer Requirements

When this table feature is supported, writers must:

- Serialize an interval field's type in `Metadata.schemaString` using the canonical `interval year to month` or `interval day to second` form.
- Ensure the `intervalTypes` feature is present in the table `protocol`'s `readerFeatures` and `writerFeatures` whenever the table schema contains an interval type. The feature is enabled automatically by the presence of an interval-typed column; there is no separate table property to set (analogous to `timestampNtz`).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. I'm not sure about this. There are existing tables with interval type today, right?

So, customer X is using engine Foo to write to some Delta table. Their pipeline has been running for 2 years.

Then Kernel comes, writes to the table, "ensures that he feature is present", and upgrades the protocol??!!

Then that Foo engine breaks since it is not aware of the intervalTypes table feature??

I think we need to double check this requirement.

@rajayush143 rajayush143 Jun 29, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out! The requirement is: Kernel must not write interval types to any table without the intervalType table feature present, although implementing read support for interval types from tables without the intervalType table feature present is up to the engine/connector. I'll clarify the part that says the feature is enabled automatically to reflect this requirement more accurately on the writer's side.


## Partition Value Serialization

Intervals can be a partition value, so we define Partition Value Serialization as the ANSI literal form for interval types as defined by the Spark SQL guide [1]. We provide an example below:

```
Interval Year Month: "INTERVAL '1-0' YEAR TO MONTH"
Interval Day Second: "INTERVAL '7 12:34:56.123456' DAY TO SECOND"
```

Where `'1-0'` refers to `years-months` and `'7 12:34:56.123456'` refers to `days hours:minutes:seconds.microseconds`.

Interval partition values must not be used for partition pruning. Consistent with the data-skipping restriction for interval columns (see [Per-file Statistics](#per-file-statistics)), readers must not eliminate files based on interval partition values.

## Per-file Statistics

Interval columns do not support `minValues`/`maxValues` statistics or data skipping. Writers must not record `minValues` or `maxValues` for interval columns, and readers must not perform data skipping over interval columns. The per-column `nullCount` and the per-file `numRecords` statistics are unaffected and are still recorded as normal, since they do not require interpreting interval values. This is consistent with existing tables that contain interval types, which do not record `minValues`/`maxValues` for these columns.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Interval columns do not support `minValues`/`maxValues` statistics or data skipping. Writers must not record `minValues` or `maxValues` for interval columns, and readers must not perform data skipping over interval columns. The per-column `nullCount` and the per-file `numRecords` statistics are unaffected and are still recorded as normal, since they do not require interpreting interval values. This is consistent with existing tables that contain interval types, which do not record `minValues`/`maxValues` for these columns.
Interval columns do not support `minValues`/`maxValues` statistics or data skipping. Writers must not record `minValues` or `maxValues` for interval columns, and readers must not perform data skipping over interval columns. The per-column `nullCount` and the per-file `numRecords` statistics are unaffected and are still recorded as normal, since they do not require interpreting interval values.

I don't think we need to note this


## Parquet Format

Interval values are stored using a raw Parquet physical type with no logical-type annotation:

- `interval year to month` is stored as a Parquet `int32` holding the signed count of months.
- `interval day to second` is stored as a Parquet `int64` holding the signed count of microseconds.

Because no Parquet logical type is written, an interval column is physically indistinguishable from a Parquet `int32`/`int64` (i.e. a Delta `integer`/`long`); the interval semantics are carried solely by the Delta schema in `Metadata.schemaString`. This representation supports signed intervals and microsecond precision while matching the physical layout of existing interval tables.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Because no Parquet logical type is written, an interval column is physically indistinguishable from a Parquet `int32`/`int64` (i.e. a Delta `integer`/`long`); the interval semantics are carried solely by the Delta schema in `Metadata.schemaString`. This representation supports signed intervals and microsecond precision while matching the physical layout of existing interval tables.
Because no Parquet logical type is written, an interval column is physically indistinguishable from a Parquet `int32`/`int64` (i.e. a Delta `integer`/`long`); the interval semantics are carried solely by the Delta schema in `Metadata.schemaString`. This representation supports signed intervals and microsecond precision.


## Feature Interactions

Beyond the partition-value and statistics behavior described above, and the restrictions listed in [Error Conditions](#error-conditions), interval types have no special interactions with other table features.

## Error Conditions

- **Unrecognized type-name strings.** Type-name matching is case-sensitive. A reader that encounters an interval type-name string that is not one of the recognized canonical or narrowed spellings, including a mixed-family spelling such as `interval month to day`, or a case variant such as `INTERVAL Year To Month`, must reject the schema with an error rather than silently coercing it to a supported type.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- **Unrecognized type-name strings.** Type-name matching is case-sensitive. A reader that encounters an interval type-name string that is not one of the recognized canonical or narrowed spellings, including a mixed-family spelling such as `interval month to day`, or a case variant such as `INTERVAL Year To Month`, must reject the schema with an error rather than silently coercing it to a supported type.
- **Unrecognized type-name strings.** Type-name matching is case-sensitive. A reader that encounters an interval type-name string that is not one of the recognized canonical or narrowed spellings, including a mixed-family spelling such as `interval month to day` (not one of `year to month` or `day to second`), or a case variant such as `INTERVAL Year To Month`, must reject the schema with an error rather than silently coercing it to a supported type.

- **Feature not present.** A writer must add `intervalTypes` to the table `protocol`'s `readerFeatures` and `writerFeatures` whenever it writes a schema containing an interval type (see [Writer Requirements](#writer-requirements)). A reader that encounters an interval type in the schema while `intervalTypes` is absent from `readerFeatures` must reject the table.
- **Value overflow on write.** An `interval year to month` value must fit in a signed `int32` count of months, and an `interval day to second` value must fit in a signed `int64` count of microseconds. A writer must reject any value that overflows these bounds.
- **Malformed or out-of-range partition values.** When reading, a partition value that is not a valid ANSI interval literal, or whose decoded value does not fit the column's underlying `int32`/`int64` range, must be rejected with an error.
- **IcebergCompat incompatibility.** Apache Iceberg has no interval type. When any of the `icebergCompatV1`, `icebergCompatV2`, or `icebergCompatV3` features is enabled, a writer must reject — at schema validation — any schema containing an interval type.

> ***Add new rows to the [Primitive Types](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#primitive-types) table.***

| Type Name | Description |
| --- | --- |
| interval year to month | Signed duration in the precision of months |
| interval day to second | Signed duration in the precision of microseconds |

> ***Add new rows to the [Delta Data Type to Parquet Type Mappings](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#delta-data-type-to-parquet-type-mappings) table.***

| Delta Type Name | Parquet Physical Type | Parquet Logical Type |
| --- | --- | --- |
| interval year to month | `int32` | |
| interval day to second | `int64` | |

# References

[1] https://spark.apache.org/docs/latest/sql-ref-literals.html#interval-literal

[2] https://github.com/delta-io/delta/issues/7077