Skip to content

PartitionPruning Logical Optimization

PartitionPruning is a logical optimization for Dynamic Partition Pruning.

PartitionPruning is a Rule[LogicalPlan] (a rule for logical operators).

PartitionPruning is part of the PartitionPruning batch of the SparkOptimizer.

Executing Rule

  plan: LogicalPlan): LogicalPlan

For Subquery operators that are correlated, apply simply does nothing and gives it back unmodified.

apply does nothing when the spark.sql.optimizer.dynamicPartitionPruning.enabled configuration property is disabled (false).

For all other cases, apply applies prune optimization.

apply is part of the Rule abstraction.

prune Internal Method

  plan: LogicalPlan): LogicalPlan

prune transforms up all logical operators in the given logical query plan.

prune leaves Join operators unmodified when either operators are Filters with DynamicPruningSubquery condition.

prune transforms Join operators of the following "shape":

  1. The join condition is defined and of type EqualTo (=)

  2. Any expressions are attributes of a LogicalRelation over a HadoopFsRelation

  3. The join type is one of Inner, LeftSemi, RightOuter, LeftOuter

More Work Needed

prune needs more love and would benefit from more insight on how it works.

prune is used when PartitionPruning is executed.

Last update: 2020-11-07