Commit 54a0df7

Clarify enable_expression_analyzer scope in config and doc comments
1 parent 25e473a commit 54a0df7

5 files changed

Lines changed: 12 additions & 10 deletions

datafusion/common/src/config.rs

Lines changed: 4 additions & 3 deletions
@@ -963,10 +963,11 @@ config_namespace! {
 /// So if you disable `enable_topk_dynamic_filter_pushdown`, then enable `enable_dynamic_filter_pushdown`, the `enable_topk_dynamic_filter_pushdown` will be overridden.
 pub enable_dynamic_filter_pushdown: bool, default = true

-/// When set to true, the physical planner will use the ExpressionAnalyzer
+/// When set to true, the physical planner uses the ExpressionAnalyzer
 /// framework for expression-level statistics estimation (NDV, selectivity,
-/// min/max, null fraction). When false, existing behavior without
-/// expression-level statistics support is used.
+/// min/max, null fraction) for operators created during logical-to-physical
+/// translation. Optimizer-created operators fall back to built-in estimation.
+/// When false, existing behavior is unchanged.
 pub enable_expression_analyzer: bool, default = false

 /// When set to true, the optimizer will insert filters before a join between

datafusion/physical-expr/src/projection.rs

Lines changed: 3 additions & 3 deletions
@@ -203,9 +203,9 @@ impl ProjectionExprs {
 /// The physical planner injects the registry from `SessionState` when
 /// creating projections. Projections created later by optimizer rules
 /// do not receive the registry and fall back to
-/// `DefaultExpressionAnalyzer`. Propagating the registry to all
-/// operator construction sites requires an operator-level statistics
-/// registry, which is orthogonal to this work.
+/// `DefaultExpressionAnalyzer`. Full coverage requires an operator-level
+/// statistics registry (tracked in
+/// <https://github.com/apache/datafusion/issues/21443>).
 pub fn with_expression_analyzer_registry(
 mut self,
 registry: Arc<ExpressionAnalyzerRegistry>,

datafusion/physical-plan/src/filter.rs

Lines changed: 3 additions & 2 deletions
@@ -187,8 +187,9 @@ impl FilterExecBuilder {
 /// Same limitation as [`ProjectionExprs::with_expression_analyzer_registry`]:
 /// the planner injects this from `SessionState`, but filters created
 /// by optimizer rules (e.g., filter pushdown into unions) fall back to
-/// the default selectivity. An operator-level statistics registry is
-/// needed for full coverage.
+/// the default selectivity. Full coverage requires an operator-level
+/// statistics registry (tracked in
+/// <https://github.com/apache/datafusion/issues/21443>).
 ///
 /// [`ProjectionExprs::with_expression_analyzer_registry`]: datafusion_physical_expr::projection::ProjectionExprs::with_expression_analyzer_registry
 pub fn with_expression_analyzer_registry(
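
The doc comments above describe a scope limitation: operators built by the physical planner receive the registry, while operators built later by optimizer rules do not and fall back to default estimation. The following is a minimal, std-only sketch of that behavior. It is a toy model, not DataFusion code: `ToyRegistry`, `ToyFilterBuilder`, `planner_created`, and `optimizer_created` are hypothetical names invented for illustration, the registry is reduced to a lookup table of selectivities, and the fallback value of 20% is taken from the `default_filter_selectivity` entry visible in the `information_schema.slt` hunk.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for an expression-analyzer registry: maps a
/// predicate string to an estimated selectivity in [0, 1].
#[derive(Debug, Default)]
struct ToyRegistry {
    selectivity_by_predicate: HashMap<String, f64>,
}

/// Hypothetical stand-in for a filter builder (not `FilterExecBuilder`).
#[derive(Debug)]
struct ToyFilterBuilder {
    predicate: String,
    /// Fallback selectivity as a percentage, mirroring `default_filter_selectivity`.
    default_selectivity_percent: u8,
    /// Present only when some caller injected a registry.
    registry: Option<Arc<ToyRegistry>>,
}

impl ToyFilterBuilder {
    fn new(predicate: &str) -> Self {
        Self {
            predicate: predicate.to_string(),
            default_selectivity_percent: 20,
            registry: None,
        }
    }

    /// Same method name as in the diff, but a toy implementation.
    fn with_expression_analyzer_registry(mut self, registry: Arc<ToyRegistry>) -> Self {
        self.registry = Some(registry);
        self
    }

    /// Use the registry's estimate when one was injected and knows the
    /// predicate; otherwise fall back to the default selectivity.
    fn estimated_selectivity(&self) -> f64 {
        self.registry
            .as_ref()
            .and_then(|r| r.selectivity_by_predicate.get(&self.predicate).copied())
            .unwrap_or(f64::from(self.default_selectivity_percent) / 100.0)
    }
}

/// Models the physical planner: injects the registry only when the
/// (modeled) `enable_expression_analyzer` flag is true.
fn planner_created(
    predicate: &str,
    registry: &Arc<ToyRegistry>,
    enable_expression_analyzer: bool,
) -> ToyFilterBuilder {
    let builder = ToyFilterBuilder::new(predicate);
    if enable_expression_analyzer {
        builder.with_expression_analyzer_registry(Arc::clone(registry))
    } else {
        builder
    }
}

/// Models an optimizer rule: it has no registry to pass along.
fn optimizer_created(predicate: &str) -> ToyFilterBuilder {
    ToyFilterBuilder::new(predicate)
}

fn main() {
    let mut registry = ToyRegistry::default();
    registry
        .selectivity_by_predicate
        .insert("a = 1".to_string(), 0.05);
    let registry = Arc::new(registry);

    let planned = planner_created("a = 1", &registry, true);
    let rewritten = optimizer_created("a = 1");

    println!("planner-created:   {}", planned.estimated_selectivity());
    println!("optimizer-created: {}", rewritten.estimated_selectivity());
}
```

In this model the same predicate gets two different estimates depending on which component built the operator, which is the gap the linked issue proposes to close with an operator-level statistics registry.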

datafusion/sqllogictest/test_files/information_schema.slt

Lines changed: 1 addition & 1 deletion
@@ -441,7 +441,7 @@ datafusion.optimizer.default_filter_selectivity 20 The default filter selectivit
 datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown true When set to true, the optimizer will attempt to push down Aggregate dynamic filters into the file scan phase.
 datafusion.optimizer.enable_distinct_aggregation_soft_limit true When set to true, the optimizer will push a limit operation into grouped aggregations which have no aggregate expressions, as a soft limit, emitting groups once the limit is reached, before all rows in the group are read.
 datafusion.optimizer.enable_dynamic_filter_pushdown true When set to true attempts to push down dynamic filters generated by operators (TopK, Join & Aggregate) into the file scan phase. For example, for a query such as `SELECT * FROM t ORDER BY timestamp DESC LIMIT 10`, the optimizer will attempt to push down the current top 10 timestamps that the TopK operator references into the file scans. This means that if we already have 10 timestamps in the year 2025 any files that only have timestamps in the year 2024 can be skipped / pruned at various stages in the scan. The config will suppress `enable_join_dynamic_filter_pushdown`, `enable_topk_dynamic_filter_pushdown` & `enable_aggregate_dynamic_filter_pushdown` So if you disable `enable_topk_dynamic_filter_pushdown`, then enable `enable_dynamic_filter_pushdown`, the `enable_topk_dynamic_filter_pushdown` will be overridden.
-datafusion.optimizer.enable_expression_analyzer false When set to true, the physical planner will use the ExpressionAnalyzer framework for expression-level statistics estimation (NDV, selectivity, min/max, null fraction). When false, existing behavior without expression-level statistics support is used.
+datafusion.optimizer.enable_expression_analyzer false When set to true, the physical planner uses the ExpressionAnalyzer framework for expression-level statistics estimation (NDV, selectivity, min/max, null fraction) for operators created during logical-to-physical translation. Optimizer-created operators fall back to built-in estimation. When false, existing behavior is unchanged.
 datafusion.optimizer.enable_join_dynamic_filter_pushdown true When set to true, the optimizer will attempt to push down Join dynamic filters into the file scan phase.
 datafusion.optimizer.enable_leaf_expression_pushdown true When set to true, the optimizer will extract leaf expressions (such as `get_field`) from filter/sort/join nodes into projections closer to the leaf table scans, and push those projections down towards the leaf nodes.
 datafusion.optimizer.enable_piecewise_merge_join false When set to true, piecewise merge join is enabled. PiecewiseMergeJoin is currently experimental. Physical planner will opt for PiecewiseMergeJoin when there is only one range filter.

docs/source/user-guide/configs.md

Lines changed: 1 addition & 1 deletion
@@ -143,7 +143,7 @@ The following configuration settings are available:
 | datafusion.optimizer.enable_join_dynamic_filter_pushdown | true | When set to true, the optimizer will attempt to push down Join dynamic filters into the file scan phase. |
 | datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown | true | When set to true, the optimizer will attempt to push down Aggregate dynamic filters into the file scan phase. |
 | datafusion.optimizer.enable_dynamic_filter_pushdown | true | When set to true attempts to push down dynamic filters generated by operators (TopK, Join & Aggregate) into the file scan phase. For example, for a query such as `SELECT * FROM t ORDER BY timestamp DESC LIMIT 10`, the optimizer will attempt to push down the current top 10 timestamps that the TopK operator references into the file scans. This means that if we already have 10 timestamps in the year 2025 any files that only have timestamps in the year 2024 can be skipped / pruned at various stages in the scan. The config will suppress `enable_join_dynamic_filter_pushdown`, `enable_topk_dynamic_filter_pushdown` & `enable_aggregate_dynamic_filter_pushdown` So if you disable `enable_topk_dynamic_filter_pushdown`, then enable `enable_dynamic_filter_pushdown`, the `enable_topk_dynamic_filter_pushdown` will be overridden. |
-| datafusion.optimizer.enable_expression_analyzer | false | When set to true, the physical planner will use the ExpressionAnalyzer framework for expression-level statistics estimation (NDV, selectivity, min/max, null fraction). When false, existing behavior without expression-level statistics support is used. |
+| datafusion.optimizer.enable_expression_analyzer | false | When set to true, the physical planner uses the ExpressionAnalyzer framework for expression-level statistics estimation (NDV, selectivity, min/max, null fraction) for operators created during logical-to-physical translation. Optimizer-created operators fall back to built-in estimation. When false, existing behavior is unchanged. |
 | datafusion.optimizer.filter_null_join_keys | false | When set to true, the optimizer will insert filters before a join between a nullable and non-nullable column to filter out nulls on the nullable side. This filter can add additional overhead when the file format does not fully support predicate push down. |
 | datafusion.optimizer.repartition_aggregations | true | Should DataFusion repartition data using the aggregate keys to execute aggregates in parallel using the provided `target_partitions` level |
 | datafusion.optimizer.repartition_file_min_size | 10485760 | Minimum total files size in bytes to perform file scan repartitioning. |
