@@ -380,3 +380,93 @@ impl Default for MyTreeNode {
380380 }
381381}
382382```
383+
384+ ### `ExecutionPlan::partition_statistics` now accepts a `StatisticsContext`
385+
386+ `ExecutionPlan::partition_statistics` now takes an extra
387+ `ctx: &StatisticsContext` parameter, which carries pre-computed child statistics
388+ and other context for statistics computation.
389+
390+ **Before:**
391+
392+ ```rust,ignore
393+ fn partition_statistics(&self, partition: Option<usize>) -> Result<Arc<Statistics>> {
394+     // Leaf node
395+     Ok(Arc::new(Statistics::new_unknown(&self.schema())))
396+ }
397+ ```
398+
399+ **After:**
400+
401+ ```rust,ignore
402+ fn partition_statistics(
403+     &self,
404+     partition: Option<usize>,
405+     _ctx: &StatisticsContext,
406+ ) -> Result<Arc<Statistics>> {
407+     // Leaf node: ignore ctx, return own stats
408+     Ok(Arc::new(Statistics::new_unknown(&self.schema())))
409+ }
410+ ```
411+
412+ **Who is affected:**
413+
414+ - Users who implement custom `ExecutionPlan` nodes
415+ - Users who call `partition_statistics` directly
416+
417+ **Migration guide:**
418+
419+ For **implementations**, add the `ctx: &StatisticsContext` parameter. Leaf nodes,
420+ which have no children, can name the parameter `_ctx` and ignore it. Non-leaf
421+ nodes that previously called `self.input.partition_statistics(partition)?` to
422+ obtain child statistics can read `ctx.child_stats()[0]` instead (or
423+ `ctx.child_stats()[i]` for multi-child operators such as joins):
424+
425+ ```rust,ignore
426+ // Before (non-leaf):
427+ fn partition_statistics(&self, partition: Option<usize>) -> Result<Arc<Statistics>> {
428+     let child_stats = self.input.partition_statistics(partition)?;
429+     // ... transform child_stats ...
430+ }
431+
432+ // After (non-leaf):
433+ fn partition_statistics(
434+     &self,
435+     _partition: Option<usize>,
436+     ctx: &StatisticsContext,
437+ ) -> Result<Arc<Statistics>> {
438+     let child_stats = Arc::clone(&ctx.child_stats()[0]);
439+     // ... transform child_stats ...
440+ }
441+ ```
442+
443+ Operators that **merge or repartition** their input (e.g., coalesce, sort
444+ without partition preservation, sort-preserving merge) always need the overall
445+ child statistics, regardless of which output partition is requested. These
446+ operators should call `compute_statistics` with `None` on the relevant
447+ child instead of using `ctx.child_stats()`:
448+
449+ ```rust,ignore
450+ // Operator that merges all input partitions into one:
451+ fn partition_statistics(
452+     &self,
453+     _partition: Option<usize>,
454+     _ctx: &StatisticsContext,
455+ ) -> Result<Arc<Statistics>> {
456+     compute_statistics(self.input.as_ref(), None)
457+ }
458+ ```
459+
460+ For **callers**, replace direct calls with `compute_statistics`, which walks
461+ the plan tree bottom-up and threads child statistics through the context
462+ automatically:
463+
464+ ```rust,ignore
465+ use datafusion_physical_plan::compute_statistics;
466+
467+ // Before:
468+ let stats = plan.partition_statistics(None)?;
469+
470+ // After:
471+ let stats = compute_statistics(plan.as_ref(), None)?;
472+ ```
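
To make the bottom-up walk concrete, here is a minimal, self-contained toy model of the idea. It is not DataFusion code: `ToyStats`, `ToyContext`, `ToyPlan`, `Scan`, `Limit`, and `toy_compute_statistics` are hypothetical names invented for this sketch, and the real `StatisticsContext` and `compute_statistics` may differ in both shape and behavior. In particular, forwarding the same `partition` to every child is an assumption of the sketch.

```rust
use std::sync::Arc;

/// Toy stand-in for `Statistics`: only a row-count estimate.
#[derive(Debug, Clone, PartialEq)]
struct ToyStats {
    num_rows: Option<usize>,
}

/// Toy stand-in for `StatisticsContext`: holds already-computed child statistics.
struct ToyContext {
    child_stats: Vec<Arc<ToyStats>>,
}

impl ToyContext {
    fn child_stats(&self) -> &[Arc<ToyStats>] {
        &self.child_stats
    }
}

/// Toy stand-in for `ExecutionPlan`.
trait ToyPlan {
    fn children(&self) -> Vec<&dyn ToyPlan>;
    fn partition_statistics(&self, partition: Option<usize>, ctx: &ToyContext) -> Arc<ToyStats>;
}

/// Leaf node: knows its own row count and ignores the context.
struct Scan {
    rows: usize,
}

impl ToyPlan for Scan {
    fn children(&self) -> Vec<&dyn ToyPlan> {
        Vec::new()
    }

    fn partition_statistics(&self, _partition: Option<usize>, _ctx: &ToyContext) -> Arc<ToyStats> {
        Arc::new(ToyStats { num_rows: Some(self.rows) })
    }
}

/// Non-leaf node: derives its statistics from the child statistics in the context.
struct Limit {
    input: Box<dyn ToyPlan>,
    fetch: usize,
}

impl ToyPlan for Limit {
    fn children(&self) -> Vec<&dyn ToyPlan> {
        let child: &dyn ToyPlan = self.input.as_ref();
        vec![child]
    }

    fn partition_statistics(&self, _partition: Option<usize>, ctx: &ToyContext) -> Arc<ToyStats> {
        // Read the pre-computed child statistics instead of recursing into the child.
        let child = &ctx.child_stats()[0];
        Arc::new(ToyStats { num_rows: child.num_rows.map(|n| n.min(self.fetch)) })
    }
}

/// Bottom-up walk: compute each child's statistics first, then hand them to the parent.
/// Assumption of this sketch: the same `partition` is forwarded to every child.
fn toy_compute_statistics(plan: &dyn ToyPlan, partition: Option<usize>) -> Arc<ToyStats> {
    let child_stats: Vec<Arc<ToyStats>> = plan
        .children()
        .into_iter()
        .map(|child| toy_compute_statistics(child, partition))
        .collect();
    plan.partition_statistics(partition, &ToyContext { child_stats })
}

/// Usage: a limit of 10 over a 1,000-row scan.
fn limited_scan_rows() -> Option<usize> {
    let plan = Limit { input: Box::new(Scan { rows: 1_000 }), fetch: 10 };
    toy_compute_statistics(&plan, None).num_rows
}
```

In this model each node's `partition_statistics` runs once per walk, because a parent reads its children's results from the context rather than calling into them again.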