
[Feature Request]: Hive-partitioned external tables #24320

@iamlinjunhong

Description


Is there an existing issue for the same feature request?

  • I have checked the existing issues.

Is your feature request related to a problem?

Yes. Today MatrixOne can read Parquet from S3 via CREATE EXTERNAL TABLE ... URL s3option{...}, but cannot recognize Hive-style
  partitioned layouts. A typical TPC-DS / data lake setup looks like this:

  s3://nova-nt/user/hive/warehouse/training.db/catalog_returns/data/
    cr_returned_date_sk=2450815/part-00000.parquet
    cr_returned_date_sk=2450816/part-00000.parquet
    ...
    cr_returned_date_sk=2452641/part-00000.parquet    # 1,800+ daily partitions

  Problems this creates for users today:

  - No recursive discovery. filepath='.../catalog_returns/data/' returns nothing — users must hand-write glob patterns like
  .../data/*/*.
  - Partition key is invisible. cr_returned_date_sk exists only in the directory name; Spark/Hive strip it from the Parquet file to save
   space. MatrixOne has no way to expose it as a column.
  - No partition pruning. WHERE cr_returned_date_sk = 2450821 ends up scanning all 1,800+ partitions instead of opening one file.
  - Lakehouse interoperability gap. Trino, Spark, StarRocks, and Impala all read this layout natively. MatrixOne users either maintain a
   separate copy of the data or can't participate in a multi-engine lakehouse.

Describe the feature you'd like

Explicit, opt-in Hive-style partitioned external tables. DDL:

CREATE EXTERNAL TABLE catalog_returns (
    cr_item_sk          INT NOT NULL,
    cr_order_number     BIGINT NOT NULL,
    cr_net_loss         DECIMAL(7,2),
    cr_returned_date_sk INT            -- partition column
) URL s3option{
    "endpoint" = "https://storage.yandexcloud.net",
    "bucket" = "nova-nt",
    "filepath" = "user/hive/warehouse/training.db/catalog_returns/data/",
    "format" = "parquet",
    "hive_partitioning" = "true",
    "hive_partition_columns" = "cr_returned_date_sk"
};

Runtime behaviour:

  • Point filepath at the partition root; MatrixOne discovers key=value/ directories automatically (single-level or multi-level).
  • Partition columns behave like regular columns: DESCRIBE, SELECT *, WHERE, GROUP BY, ORDER BY, JOIN all work, even when the Parquet
    files don't physically contain the column (see the sketch after this list).
  • WHERE col = v and WHERE col IN (...) are pushed into directory discovery, so List/Get counts on the object store scale with the
    number of matching partitions rather than the total partition count.
  • Unsupported predicates (range, OR, cast, LIKE '%...%') fall back to row-level filtering; results are always correct.
  • Works identically against local filesystem (INFILE{...}) and S3-compatible object stores (MinIO, AWS S3, Yandex).
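
As a concrete illustration of the second bullet, the Go sketch below shows one way a path-derived partition value could be materialized
as a constant column for each batch read from Parquet. The PartitionValue type and the MaterializeIntColumn helper are hypothetical
names invented for this sketch; MatrixOne's real readers operate on its internal vector types rather than plain slices.

  package hivepart

  import (
      "fmt"
      "strconv"
  )

  // PartitionValue is one value parsed from a key=value path segment.
  type PartitionValue struct {
      Raw    string // textual value from the directory name
      IsNull bool   // true for __HIVE_DEFAULT_PARTITION__
  }

  // MaterializeIntColumn expands a path-derived value into a constant
  // column of length rowCount, so downstream operators see the partition
  // key exactly as if the Parquet file had stored it physically.
  // (Hypothetical helper: the real reader fills MatrixOne vectors.)
  func MaterializeIntColumn(v PartitionValue, rowCount int) ([]int64, []bool, error) {
      vals := make([]int64, rowCount)
      nulls := make([]bool, rowCount)
      if v.IsNull {
          for i := range nulls {
              nulls[i] = true
          }
          return vals, nulls, nil
      }
      n, err := strconv.ParseInt(v.Raw, 10, 64)
      if err != nil {
          return nil, nil, fmt.Errorf("partition value %q is not a valid INT", v.Raw)
      }
      for i := range vals {
          vals[i] = n
      }
      return vals, nulls, nil
  }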

Non-goals (explicit, not for this issue):

  • Apache Iceberg / Hudi / Delta Lake table formats
  • Hive Metastore / AWS Glue / Unity Catalog integration
  • Writes (INSERT / LOAD DATA into partitioned external tables)
  • Schema evolution
  • CSV / JSONLINE hive partitions (Parquet only in P0)

Describe implementation you've considered

A parallel discovery path separate from the existing ReadDir glob flow. Key pieces:

  • DiscoverHivePartitions (new): a recursive list-and-filter walk over key=value/ directories. For a declared N-level partitioning, the
    walk descends N times and then collects files. The resulting S3 request shape is List = matched_partitions + 1 and
    Get = matched_partitions.
  • Conservative three-state matching (MatchTrue / MatchFalse / MatchUnknown), sketched after this list. Only MatchFalse prunes. Unknown
    types (DECIMAL, DATE, TIMESTAMP, BOOL, collation-sensitive VARCHAR mismatches) are kept and evaluated by the row filter, so pruning
    can never produce a false negative.
  • Double filtering for correctness: partition filters live in both the directory pruner and the runtime row filter, so a buggy or
    conservative pruner cannot drop data that SQL semantics requires.
  • Partition column types persisted in CreateSql JSON (a HivePartitionColTypes snapshot taken at DDL time), not looked up from the
    catalog at compile time. This avoids an extra RPC and closes a race window with concurrent DDL.
  • Hard limits: warnPartitionCount = 5000 (WARN log), maxPartitionCount = 50000 (error), maxListCalls = 10000 (error). These protect the
    coordinator from pathological layouts; a guard sketch appears at the end of this section.
  • Compile integration: getHivePartitionFileList replaces ReadDir when hive_partitioning=true. A runFilePathFilters helper evaluates
    __mo_filepath LIKE ... predicates at compile time; unconsumed filters are appended back to row filters so nothing is ever silently
    dropped.
  • Path-value wins when a Parquet file happens to contain a physical column with the same name as a partition column (Hive/Spark
    convention).
  • HIVE_DEFAULT_PARTITION maps to SQL NULL on nullable columns and raises a constraint violation on NOT NULL columns.
  • URL decoding happens in exactly one place (ParseHivePartitionSegment; see the sketch after this list). Directory names containing a
    literal % are currently rejected with an explicit error (a known P0 limitation of the FileService path parser; the fix is deferred
    to a follow-up).
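
The two bullets about three-state matching and single-point URL decoding can be pictured with the self-contained Go sketch below. Only
the names ParseHivePartitionSegment, MatchTrue, MatchFalse, MatchUnknown, and the HIVE_DEFAULT_PARTITION-to-NULL behaviour come from
this issue; the Segment type and the concrete signatures are assumptions made for illustration.

  package hivepart

  import (
      "fmt"
      "net/url"
      "strconv"
      "strings"
  )

  // Hive's marker for a partition whose value was NULL at write time.
  const hiveDefaultPartition = "__HIVE_DEFAULT_PARTITION__"

  // Segment is one decoded key=value directory level.
  type Segment struct {
      Key    string
      Value  string // decoded value; meaningless when IsNull is true
      IsNull bool   // true for __HIVE_DEFAULT_PARTITION__ (SQL NULL)
  }

  // ParseHivePartitionSegment decodes a single directory name such as
  // "cr_returned_date_sk=2450815". Per the design above, URL decoding
  // happens here and nowhere else, and the Hive default-partition
  // marker maps to SQL NULL.
  func ParseHivePartitionSegment(dir string) (Segment, error) {
      key, rawValue, ok := strings.Cut(dir, "=")
      if !ok || key == "" {
          return Segment{}, fmt.Errorf("not a hive partition directory: %q", dir)
      }
      if rawValue == hiveDefaultPartition {
          return Segment{Key: key, IsNull: true}, nil
      }
      value, err := url.PathUnescape(rawValue)
      if err != nil {
          // A literal % that does not form a valid escape sequence is
          // rejected with an explicit error (the known P0 limitation).
          return Segment{}, fmt.Errorf("cannot decode partition value %q: %w", rawValue, err)
      }
      return Segment{Key: key, Value: value}, nil
  }

  // MatchResult is the three-state verdict of the directory pruner.
  type MatchResult int

  const (
      MatchTrue MatchResult = iota
      MatchFalse
      MatchUnknown // keep the partition; defer to the row filter
  )

  // MatchIntEquality evaluates `col = want` for an INT partition column
  // against one decoded segment. Only a definite mismatch returns
  // MatchFalse; anything undecidable returns MatchUnknown, so the
  // pruner can never drop data that the row filter would have kept.
  func MatchIntEquality(seg Segment, want int64) MatchResult {
      if seg.IsNull {
          return MatchFalse // NULL = v is never true under SQL semantics
      }
      got, err := strconv.ParseInt(seg.Value, 10, 64)
      if err != nil {
          return MatchUnknown
      }
      if got == want {
          return MatchTrue
      }
      return MatchFalse
  }

Under WHERE cr_returned_date_sk = 2450821, the walk would keep only directories whose segment evaluates to MatchTrue or MatchUnknown,
which is what lets List/Get counts scale with the number of matching partitions rather than the total partition count.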

Explicitly out of scope for the first iteration: distributed per-file execution across CNs (forced param.Parallel = false), partition
metadata caching between queries, range-predicate directory pruning, and concurrent List calls.
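
To make the hard limits concrete, a discovery-budget check might look like the sketch below. The three constant values are the ones
quoted above; checkDiscoveryBudget itself is a hypothetical helper written for this issue, not code from the repository.

  package hivepart

  import (
      "fmt"
      "log"
  )

  // Limits quoted in this issue; the checking logic is a sketch.
  const (
      warnPartitionCount = 5000  // WARN log past this many partitions
      maxPartitionCount  = 50000 // hard error past this many partitions
      maxListCalls       = 10000 // hard error past this many List calls
  )

  // checkDiscoveryBudget runs after each List call during partition
  // discovery. It degrades gracefully: first a WARN log, then a hard
  // error, so a pathological layout cannot exhaust the coordinator.
  func checkDiscoveryBudget(partitions, listCalls int) error {
      if listCalls > maxListCalls {
          return fmt.Errorf("hive partition discovery exceeded %d List calls", maxListCalls)
      }
      if partitions > maxPartitionCount {
          return fmt.Errorf("hive partition count %d exceeds limit %d", partitions, maxPartitionCount)
      }
      if partitions > warnPartitionCount {
          log.Printf("WARN: hive partition count %d exceeds %d; discovery may be slow", partitions, warnPartitionCount)
      }
      return nil
  }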

Documentation, Adoption, Use Case, Migration Strategy

Additional information

hive_partition_user_guide_en.md

Metadata

Labels

kind/feature, priority/p0 (Critical feature that should be implemented in this version)
