Commit de9b800

Add upgrade guide for partition_statistics signature change
1 parent 6ee8895 commit de9b800

1 file changed

Lines changed: 90 additions & 0 deletions

docs/source/library-user-guide/upgrading/54.0.0.md

@@ -380,3 +380,93 @@ impl Default for MyTreeNode {
    }
}
```

### `ExecutionPlan::partition_statistics` now accepts a `StatisticsContext`

`ExecutionPlan::partition_statistics` now takes an additional
`ctx: &StatisticsContext` parameter that carries pre-computed child statistics
and additional context for statistics computation.

**Before:**

```rust,ignore
fn partition_statistics(&self, partition: Option<usize>) -> Result<Arc<Statistics>> {
    // Leaf node
    Ok(Arc::new(Statistics::new_unknown(&self.schema())))
}
```

**After:**

```rust,ignore
fn partition_statistics(
    &self,
    partition: Option<usize>,
    _ctx: &StatisticsContext,
) -> Result<Arc<Statistics>> {
    // Leaf node: ignore ctx, return own stats
    Ok(Arc::new(Statistics::new_unknown(&self.schema())))
}
```

**Who is affected:**

- Users who implement custom `ExecutionPlan` nodes
- Users who call `partition_statistics` directly

**Migration guide:**

For **implementations**, add the `ctx: &StatisticsContext` parameter. Leaf nodes
have no children, so they can name the parameter `_ctx` and ignore it. Non-leaf
nodes that previously called `self.input.partition_statistics(partition)?` to
obtain child statistics can read `ctx.child_stats()[0]` instead (or
`ctx.child_stats()[i]` for multi-child operators such as joins):

```rust,ignore
// Before (non-leaf):
fn partition_statistics(&self, partition: Option<usize>) -> Result<Arc<Statistics>> {
    let child_stats = self.input.partition_statistics(partition)?;
    // ... transform child_stats ...
}

// After (non-leaf):
fn partition_statistics(
    &self,
    _partition: Option<usize>,
    ctx: &StatisticsContext,
) -> Result<Arc<Statistics>> {
    let child_stats = Arc::clone(&ctx.child_stats()[0]);
    // ... transform child_stats ...
}
```
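
For a multi-child operator, the same pattern extends to one index per input.
The following is a minimal, self-contained sketch rather than DataFusion code:
`Statistics`, `StatisticsContext`, and `CrossJoinLike` are simplified stand-ins
defined here for illustration, and it assumes that `child_stats()` holds one
entry per child, in child order.

```rust
use std::sync::Arc;

// Simplified stand-ins for illustration only; NOT DataFusion's definitions.
#[derive(Debug, PartialEq)]
struct Statistics {
    num_rows: Option<usize>,
}

struct StatisticsContext {
    child_stats: Vec<Arc<Statistics>>,
}

impl StatisticsContext {
    // Assumption: one entry per child, in child order.
    fn child_stats(&self) -> &[Arc<Statistics>] {
        &self.child_stats
    }
}

// Hypothetical two-input operator whose output is every left/right row pair.
struct CrossJoinLike;

impl CrossJoinLike {
    fn partition_statistics(
        &self,
        _partition: Option<usize>,
        ctx: &StatisticsContext,
    ) -> Result<Arc<Statistics>, String> {
        // Index 0 is the left input, index 1 the right input.
        let left = &ctx.child_stats()[0];
        let right = &ctx.child_stats()[1];
        // The row count is known only if both inputs are known and the
        // product does not overflow; otherwise it stays unknown (None).
        let num_rows = match (left.num_rows, right.num_rows) {
            (Some(l), Some(r)) => l.checked_mul(r),
            _ => None,
        };
        Ok(Arc::new(Statistics { num_rows }))
    }
}
```

Note that indexing with `[i]` panics if the context holds fewer entries than
expected, so an operator should only index positions that correspond to its
own children.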

Operators that **merge or repartition** their input (e.g., coalesce, sort
without partition preservation, sort-preserving merge) always need statistics
for the entire child, regardless of which output partition is requested. These
operators should call `compute_statistics` with `None` on the relevant
child instead of using `ctx.child_stats()`:

```rust,ignore
// Operator that merges all input partitions into one:
fn partition_statistics(
    &self,
    _partition: Option<usize>,
    _ctx: &StatisticsContext,
) -> Result<Arc<Statistics>> {
    compute_statistics(self.input.as_ref(), None)
}
```
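
To see why per-partition child statistics fall short for a merging operator,
consider the toy model below. It is a sketch under stated assumptions, not
DataFusion's implementation: every type and the `compute_statistics` function
are simplified stand-ins, and the walker is assumed to request from each child
the same partition that was requested from the parent. `NaiveCoalesce` is a
hypothetical operator included only to show the undercount.

```rust
use std::sync::Arc;

// Simplified stand-ins for illustration only; NOT DataFusion's definitions.
struct Statistics {
    num_rows: Option<usize>,
}

struct StatisticsContext {
    child_stats: Vec<Arc<Statistics>>,
}

type StatsResult = Result<Arc<Statistics>, String>;

trait ExecutionPlan {
    fn children(&self) -> Vec<&dyn ExecutionPlan>;
    fn partition_statistics(&self, partition: Option<usize>, ctx: &StatisticsContext)
        -> StatsResult;
}

// Assumed walker: children are asked for the same partition as the parent.
fn compute_statistics(plan: &dyn ExecutionPlan, partition: Option<usize>) -> StatsResult {
    let child_stats = plan
        .children()
        .into_iter()
        .map(|child| compute_statistics(child, partition))
        .collect::<Result<Vec<_>, String>>()?;
    plan.partition_statistics(partition, &StatisticsContext { child_stats })
}

// Leaf with a known row count per partition.
struct Scan {
    rows_per_partition: Vec<usize>,
}

impl ExecutionPlan for Scan {
    fn children(&self) -> Vec<&dyn ExecutionPlan> {
        vec![]
    }
    fn partition_statistics(&self, partition: Option<usize>, _ctx: &StatisticsContext)
        -> StatsResult {
        let num_rows = match partition {
            // In this toy, an out-of-range partition is reported as unknown.
            Some(i) => self.rows_per_partition.get(i).copied(),
            // `None` means statistics across all partitions.
            None => Some(self.rows_per_partition.iter().sum::<usize>()),
        };
        Ok(Arc::new(Statistics { num_rows }))
    }
}

// Merging operator that follows the guidance: always ask for overall stats.
struct Coalesce {
    input: Arc<dyn ExecutionPlan>,
}

impl ExecutionPlan for Coalesce {
    fn children(&self) -> Vec<&dyn ExecutionPlan> {
        vec![self.input.as_ref()]
    }
    fn partition_statistics(&self, _partition: Option<usize>, _ctx: &StatisticsContext)
        -> StatsResult {
        compute_statistics(self.input.as_ref(), None)
    }
}

// Hypothetical counter-example: reads the per-partition stats from ctx.
struct NaiveCoalesce {
    input: Arc<dyn ExecutionPlan>,
}

impl ExecutionPlan for NaiveCoalesce {
    fn children(&self) -> Vec<&dyn ExecutionPlan> {
        vec![self.input.as_ref()]
    }
    fn partition_statistics(&self, _partition: Option<usize>, ctx: &StatisticsContext)
        -> StatsResult {
        Ok(Arc::clone(&ctx.child_stats[0]))
    }
}
```

With a scan of three partitions holding 10, 20, and 30 rows, requesting
partition `0` from `Coalesce` reports all 60 rows in this model, while
`NaiveCoalesce` reports only the 10 rows of the child's first partition.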

For **callers**, replace direct calls with `compute_statistics`, which walks
the plan tree bottom-up and threads child statistics through the context
automatically:

```rust,ignore
use datafusion_physical_plan::compute_statistics;

// Before:
let stats = plan.partition_statistics(None)?;

// After:
let stats = compute_statistics(plan.as_ref(), None)?;
```
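
Conceptually, a bottom-up walk of this kind can be sketched as follows. This is
a minimal model for intuition, not DataFusion's implementation: the
`Statistics`, `StatisticsContext`, and `ExecutionPlan` definitions are
simplified stand-ins, `Scan` and `Limit` are hypothetical operators, and the
real `compute_statistics` may differ in signature and behavior.

```rust
use std::sync::Arc;

// Simplified stand-ins for illustration only; NOT DataFusion's definitions.
#[derive(Debug, PartialEq)]
struct Statistics {
    num_rows: Option<usize>,
}

struct StatisticsContext {
    child_stats: Vec<Arc<Statistics>>,
}

impl StatisticsContext {
    fn child_stats(&self) -> &[Arc<Statistics>] {
        &self.child_stats
    }
}

trait ExecutionPlan {
    fn children(&self) -> Vec<&dyn ExecutionPlan>;
    fn partition_statistics(
        &self,
        partition: Option<usize>,
        ctx: &StatisticsContext,
    ) -> Result<Arc<Statistics>, String>;
}

// Hypothetical walker: resolve every child first, then hand the results to
// the parent through the context, so the parent never recurses itself.
fn compute_statistics(
    plan: &dyn ExecutionPlan,
    partition: Option<usize>,
) -> Result<Arc<Statistics>, String> {
    let child_stats = plan
        .children()
        .into_iter()
        .map(|child| compute_statistics(child, partition))
        .collect::<Result<Vec<_>, String>>()?;
    plan.partition_statistics(partition, &StatisticsContext { child_stats })
}

// Leaf node: ignores the context and reports its own row count.
struct Scan {
    rows: usize,
}

impl ExecutionPlan for Scan {
    fn children(&self) -> Vec<&dyn ExecutionPlan> {
        vec![]
    }
    fn partition_statistics(
        &self,
        _partition: Option<usize>,
        _ctx: &StatisticsContext,
    ) -> Result<Arc<Statistics>, String> {
        Ok(Arc::new(Statistics { num_rows: Some(self.rows) }))
    }
}

// Non-leaf node: caps the row count of its single child at `fetch`.
struct Limit {
    input: Arc<dyn ExecutionPlan>,
    fetch: usize,
}

impl ExecutionPlan for Limit {
    fn children(&self) -> Vec<&dyn ExecutionPlan> {
        vec![self.input.as_ref()]
    }
    fn partition_statistics(
        &self,
        _partition: Option<usize>,
        ctx: &StatisticsContext,
    ) -> Result<Arc<Statistics>, String> {
        let child = &ctx.child_stats()[0];
        let num_rows = child.num_rows.map(|n| n.min(self.fetch));
        Ok(Arc::new(Statistics { num_rows }))
    }
}
```

In this model, a caller only ever invokes `compute_statistics` on the root of
the plan; each operator receives its children's results through `ctx` and does
not need to call into its inputs.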
