[PROTOCOL][DISCUSSION] Empty string for string type partition columns #7091

Open
dengsh12 wants to merge 1 commit into delta-io:master from dengsh12:empty_string

@dengsh12

Collaborator

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (protocol)

Description

@DrakeLin, @Jameson-Crate, and I found a mismatch between Spark and the protocol:

According to the Delta protocol, for string type partition columns, an empty string is serialized as "" in JSON commits and read back as null.

Spark behaves differently: it serializes an empty string as null. If the partition value in a JSON commit is "", Spark reads it back as "".

Example 1 — write INSERT INTO t VALUES ('a', '') (partition col p = ''):

| | partitionValues in JSON commit |
|---|---|
| Protocol | {"p": ""} |
| Spark | {"p": null} |

Example 2 — read commit already has partitionValues: {"p": ""}, then SELECT p:

| | result |
|---|---|
| Protocol | null |
| Spark | '' (empty string) |
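
To make the two examples concrete, here is a minimal Python sketch of the behaviors described above. The function names (`protocol_write`, `protocol_read`, `spark_write`, `spark_read`) are hypothetical and only model what this description states; they are not the actual Spark or Kernel code paths. How each side treats a null input is also an assumption here (passed through as JSON null).

```python
import json
from typing import Optional


def protocol_write(value: Optional[str]) -> Optional[str]:
    # Protocol, as described above: an empty string is serialized as "".
    # Assumption: a null input is passed through as JSON null.
    return value


def protocol_read(raw: Optional[str]) -> Optional[str]:
    # Protocol, as described above: "" in the commit is read back as null.
    return None if raw == "" else raw


def spark_write(value: Optional[str]) -> Optional[str]:
    # Spark, as described above: an empty string is serialized as null.
    return None if value == "" else value


def spark_read(raw: Optional[str]) -> Optional[str]:
    # Spark, as described above: "" in the commit is read back as "".
    return raw


# Example 1: writing partition value p = ''
print(json.dumps({"p": protocol_write("")}))  # {"p": ""}
print(json.dumps({"p": spark_write("")}))     # {"p": null}

# Example 2: reading a commit that already contains {"p": ""}
commit = json.loads('{"p": ""}')
print(repr(protocol_read(commit["p"])))  # None
print(repr(spark_read(commit["p"])))     # ''
```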

I think we have several options:

  • Change the protocol to align with Spark
    • Not breaking, but we lose the ability to distinguish "" from null
  • Change both the protocol and Spark
    • For string partition columns, leave the value unchanged on both write and read
      • Breaking, but it provides the ability to distinguish "" from null

Or maybe other solutions?
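
As a rough illustration of the trade-off between the two options, the sketch below round-trips a value through a JSON commit under each one. All names (`option1_write`, `option2_write`, `round_trip`, and so on) are hypothetical, and the second option encodes my reading of "leave it as it was", namely that "" stays "" and null stays null.

```python
import json
from typing import Callable, Optional

Codec = Callable[[Optional[str]], Optional[str]]


def option1_write(value: Optional[str]) -> Optional[str]:
    # Option 1 (protocol aligned with Spark): "" is written as null.
    return None if value == "" else value


def option1_read(raw: Optional[str]) -> Optional[str]:
    # Option 1: values are read back exactly as stored.
    return raw


def option2_write(value: Optional[str]) -> Optional[str]:
    # Option 2 (assumed reading): the value is written unchanged.
    return value


def option2_read(raw: Optional[str]) -> Optional[str]:
    # Option 2 (assumed reading): the value is read unchanged.
    return raw


def round_trip(write: Codec, read: Codec, value: Optional[str]) -> Optional[str]:
    # Serialize into a commit-like JSON document, then parse and read it back.
    commit = json.dumps({"partitionValues": {"p": write(value)}})
    return read(json.loads(commit)["partitionValues"]["p"])


for name, w, r in [("option 1", option1_write, option1_read),
                   ("option 2", option2_write, option2_read)]:
    print(name, repr(round_trip(w, r, "")), repr(round_trip(w, r, None)))
```

Under the first option, "" and null both come back as null, so they cannot be told apart after a write. Under the second, they stay distinct, but readers and writers that follow today's behavior would interpret the same commit differently.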

How was this patch tested?

Does this PR introduce any user-facing changes?

@dengsh12 dengsh12 requested a review from tdas as a code owner June 25, 2026 00:37