Commit 7b04c0f
catalog: query-aware statistics requests via ScanArgs / ScanResult
Adds an opt-in handshake that lets callers ask a `TableProvider` for
specific stats by name and receive only what the provider can answer
cheaply, instead of the all-or-nothing dense `Statistics` we have today.
## What's new
* `datafusion-common::stats::StatisticsRequest` — enum of stat kinds
that mirror `Statistics` / `ColumnStatistics` (Min, Max, NullCount,
DistinctCount, Sum, ByteSize, RowCount, TotalByteSize). `Hash + Eq`
so it can key a `HashMap`.
* `datafusion-common::stats::StatisticsValue` — `Scalar(Precision<...>)
| Distribution(Arc<dyn Any>) | Sketch(Arc<dyn Any>) | Absent`. Whether
a value is exact or estimated travels in the `Precision` wrapper, not
the variant.
* `ScanArgs::with_statistics_requests` / `statistics_requests()` — the
caller's question.
* `ScanResult::with_statistics` / `statistics()` / `into_parts()` — the
provider's answer, paired 1:1 with the requests slice.
* `PartitionedFile::satisfied_stats` — sparse,
`Arc<HashMap<StatisticsRequest, StatisticsValue>>` for per-file
answers. Memory scales with what was asked, not with table width.
Providers that store stats out-of-band (Delta/Iceberg/Hudi manifests,
Hive Metastore, custom catalogs) can populate this directly without
rebuilding a full dense `Statistics`.
* `FilePruner` learns to consume the sparse map. Internally,
`file_stats_pruning` is now `Box<dyn PruningStatistics + Send + Sync>`
so we can dispatch between the existing `PrunableStatistics` (dense)
and a new `SparseFilePruningStats` adapter (sparse). The sparse
adapter looks up each `StatisticsRequest` directly in the map and
materializes single-row arrays only for the columns the pruning
predicate touches — no densify-then-throw-away.
* `ListingTable::scan_with_args` populates `ScanResult.statistics` when
`args.statistics_requests()` is set and `collect_statistics=true`,
reusing the merged dense `Statistics` it has already computed. When
`collect_statistics=false` it returns `Absent` for everything (the
contract is "answer what's free"). `DistinctCount`/`Sum`/`ByteSize`
are likewise `Absent` for Parquet, because those values are not stored
in the Thrift footer; layered helpers (or richer providers) can fill
the gaps.
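
The request/answer handshake described in the list above can be sketched with stand-in types. This is a minimal illustration, not the code in this commit: `StatRequest`, `StatValue`, the simplified `Precision`, and `answer_requests` are hypothetical names, and carrying the column name inside each request variant is an assumption. It shows the two properties the message calls out: the answer is paired 1:1 with the request slice, and anything the provider cannot answer cheaply comes back as `Absent`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `StatisticsRequest`. Identifying the column
/// by name inside the variant is an assumption made for this sketch.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum StatRequest {
    Min(String),
    Max(String),
    NullCount(String),
    DistinctCount(String),
    RowCount,
}

/// Simplified stand-in for a precision wrapper: exactness travels with
/// the value rather than with the `StatValue` variant.
#[derive(Debug, Clone, PartialEq)]
pub enum Precision<T> {
    Exact(T),
    Inexact(T),
}

/// Hypothetical stand-in for `StatisticsValue`, reduced to two variants.
#[derive(Debug, Clone, PartialEq)]
pub enum StatValue {
    Scalar(Precision<i64>),
    Absent,
}

/// Answer each request in order. The output has the same length and
/// order as `requests`; unknown stats become `StatValue::Absent`.
pub fn answer_requests(
    requests: &[StatRequest],
    known: &HashMap<StatRequest, StatValue>,
) -> Vec<StatValue> {
    requests
        .iter()
        .map(|req| known.get(req).cloned().unwrap_or(StatValue::Absent))
        .collect()
}
```

A caller would build the request slice from what the query needs, pass it to the provider, and then zip the returned vector against the requests, treating `Absent` as "not available for free" rather than as an error.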
## Backwards compat
All additions are opt-in:
* `ScanArgs` / `ScanResult` gain new fields with `Default`-friendly
initializers; existing callers that don't use the new builders see
no change.
* `FilePruner`'s field-type change is internal (private field).
* The only minor source-level break is a new pub field on
`PartitionedFile` (`satisfied_stats`). Callers using
`PartitionedFile::new` / `From<ObjectMeta>` / the existing builders
are unaffected. Direct struct literals — uncommon, none in-tree —
need to add `satisfied_stats: None` (or use the new
`with_satisfied_stats` builder).
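
To illustrate the source-level break, here is a sketch using a hypothetical `PartitionedFileSketch` struct in place of the real `PartitionedFile`. The real struct has other fields, and the `with_satisfied_stats` signature shown here is an assumption. Constructor and builder callers compile unchanged, while a direct struct literal has to name the new field.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, heavily reduced stand-in for `PartitionedFile`.
/// The map key/value types are simplified to `String` / `i64`.
#[derive(Debug, Clone)]
pub struct PartitionedFileSketch {
    pub path: String,
    pub size: u64,
    /// The newly added public field; `None` means "no sparse stats".
    pub satisfied_stats: Option<Arc<HashMap<String, i64>>>,
}

impl PartitionedFileSketch {
    /// Constructor callers are unaffected: the new field defaults to `None`.
    pub fn new(path: impl Into<String>, size: u64) -> Self {
        Self { path: path.into(), size, satisfied_stats: None }
    }

    /// Builder-style setter (signature assumed for this sketch).
    pub fn with_satisfied_stats(mut self, stats: Arc<HashMap<String, i64>>) -> Self {
        self.satisfied_stats = Some(stats);
        self
    }
}

/// A direct struct literal must now spell out the new field.
pub fn literal_after_upgrade() -> PartitionedFileSketch {
    PartitionedFileSketch {
        path: "a.parquet".to_string(),
        size: 1024,
        satisfied_stats: None, // the one line that has to be added
    }
}
```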
## Tests
* `datafusion-common::stats::tests::statistics_request_is_hashable_keyable`
— round-trip a `StatisticsRequest` through a `HashMap`.
* `datafusion-pruning::file_pruner::tests` — three tests demonstrating
end-to-end pruning against a sparse-only `PartitionedFile` (`x > 100`
prunes a `[10, 20]` file, `x > 15` doesn't, no stats at all → no
pruner).
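
The three pruning cases in the test description can be reproduced with a small sketch. It is not the `SparseFilePruningStats` adapter itself: `StatKey`, `SparseStats`, `sparse_range`, and `can_prune_gt` are hypothetical names, values are plain `i64`, and the stored min/max are assumed to be exact bounds. For a predicate `x > literal`, a file can be skipped only when its maximum is known and does not exceed the literal.

```rust
use std::collections::HashMap;

/// Hypothetical sparse-map key: a stat kind plus the column it describes.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum StatKey {
    Min(String),
    Max(String),
}

/// Sparse per-file stats: only the entries somebody asked for are present.
pub type SparseStats = HashMap<StatKey, i64>;

/// Build a sparse map holding just the min and max of one column.
pub fn sparse_range(column: &str, min: i64, max: i64) -> SparseStats {
    let mut stats = HashMap::new();
    stats.insert(StatKey::Min(column.to_string()), min);
    stats.insert(StatKey::Max(column.to_string()), max);
    stats
}

/// Decide whether a file can be skipped for `column > literal`.
/// Returns `None` when the map has no max for the column, in which
/// case the file must be read.
pub fn can_prune_gt(stats: &SparseStats, column: &str, literal: i64) -> Option<bool> {
    let max = stats.get(&StatKey::Max(column.to_string()))?;
    // No row can satisfy `column > literal` if even the max fails it.
    Some(*max <= literal)
}
```

With a file whose `x` values span `[10, 20]`, `can_prune_gt(&file, "x", 100)` says the file can be skipped, `can_prune_gt(&file, "x", 15)` says it cannot, and an empty map yields `None`, matching the "no stats at all" case.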
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent 9a29e33 commit 7b04c0f
5 files changed
Lines changed: 626 additions & 8 deletions
File tree
- datafusion
  - catalog-listing/src
  - catalog/src
  - datasource/src
  - expr-common/src
  - pruning/src