EnsureRequirements Physical Optimization
EnsureRequirements is a physical query optimization (aka physical query preparation rule or simply preparation rule) that QueryExecution uses to optimize the physical plan of a structured query by transforming the following physical operators (up the plan tree):
. Removes the redundant one of two adjacent ShuffleExchangeExec.md[ShuffleExchangeExec] physical operators if the child partitioning scheme already guarantees the parent's partitioning
. For other non-ShuffleExchangeExec physical operators, enforces the partition requirements (distribution and ordering) using ensureDistributionAndOrdering
EnsureRequirements is just a catalyst/Rule.md[Catalyst rule] for transforming SparkPlan.md[physical query plans], i.e. Rule[SparkPlan].
EnsureRequirements is part of the preparations batch of physical query plan rules and is executed when QueryExecution is requested for the optimized physical query plan (i.e. in the executedPlan phase of a query execution).
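The removal of a redundant shuffle can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the `Plan`, `Scan`, `Shuffle` and `Aggregate` types and the `outputKeys` and `removeRedundantShuffles` helpers are hypothetical stand-ins, not Spark's classes, and partitioning is reduced to a plain list of hash keys.

```scala
object EnsureRequirementsSketch {
  // Hypothetical, simplified stand-ins for physical operators (not Spark's API).
  sealed trait Plan
  case class Scan(table: String) extends Plan
  // A shuffle that hash-partitions its child's output by the given keys.
  case class Shuffle(keys: Seq[String], child: Plan) extends Plan
  case class Aggregate(keys: Seq[String], child: Plan) extends Plan

  // Hash-partitioning keys of an operator's output, if known.
  def outputKeys(plan: Plan): Option[Seq[String]] = plan match {
    case Shuffle(keys, _)    => Some(keys)
    case Aggregate(_, child) => outputKeys(child) // keeps the child's partitioning
    case Scan(_)             => None
  }

  // Bottom-up rewrite: children are transformed before their parent.
  def removeRedundantShuffles(plan: Plan): Plan = plan match {
    case Shuffle(keys, child) =>
      val newChild = removeRedundantShuffles(child)
      // Drop this shuffle when the child is already partitioned the same way.
      if (outputKeys(newChild).contains(keys)) newChild
      else Shuffle(keys, newChild)
    case Aggregate(keys, child) =>
      Aggregate(keys, removeRedundantShuffles(child))
    case scan: Scan => scan
  }
}
```

In this model, `Shuffle(Seq("id"), Shuffle(Seq("id"), Scan("t")))` collapses to a single shuffle, while two shuffles on different keys are both kept.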
EnsureRequirements takes a SQLConf when created.
val q = ??? // FIXME
val sparkPlan = q.queryExecution.sparkPlan

import org.apache.spark.sql.execution.exchange.EnsureRequirements
val plan = EnsureRequirements(spark.sessionState.conf).apply(sparkPlan)
createPartitioning Internal Method
defaultNumPreShufflePartitions Internal Method
Enforcing Partition Requirements (Distribution and Ordering) of Physical Operator
ensureDistributionAndOrdering(operator: SparkPlan): SparkPlan
ensureDistributionAndOrdering takes the following from the input physical operator:
. SparkPlan.md#requiredChildDistribution[required partition requirements] for the children
. SparkPlan.md#requiredChildOrdering[required sort ordering] for every child
. child physical plans
NOTE: The number of requirements for partitions and their sort ordering has to match the number and the order of the child physical plans.
ensureDistributionAndOrdering matches the operator's required child distributions (requiredChildDistributions) to the children's SparkPlan.md#outputPartitioning[output partitioning] and (in that order):
. If the child satisfies the requested distribution, the child is left unchanged
. For BroadcastDistribution, the child is wrapped in a BroadcastExchangeExec.md[BroadcastExchangeExec] unary operator (for BroadcastHashJoinExec.md[broadcast hash joins])
. Any other pair of child and distribution leads to a ShuffleExchangeExec.md[ShuffleExchangeExec] unary physical operator (with partitioning that satisfies the distribution)
NOTE: ShuffleExchangeExec.md[ShuffleExchangeExec] can appear in the physical plan when the children's output partitioning cannot satisfy the physical operator's required child distribution.
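The three matching steps above can be sketched with a toy model. It is a minimal illustration under assumptions, not Spark's implementation: all types (`Distribution`, `Clustered`, `Plan`, `BroadcastExchange`, `ShuffleExchange`) and helpers (`satisfies`, `ensureDistribution`, `ensureDistributions`) are hypothetical, and a partitioning is modelled as an optional list of hash keys.

```scala
object DistributionSketch {
  // Hypothetical, simplified model of required distributions (not Spark's API).
  sealed trait Distribution
  case object Unspecified extends Distribution
  case object Broadcast extends Distribution
  case class Clustered(keys: Seq[String]) extends Distribution

  // Every operator reports the keys its output is hash-partitioned by, if any.
  sealed trait Plan { def partitionedBy: Option[Seq[String]] }
  case class Scan(table: String, partitionedBy: Option[Seq[String]] = None) extends Plan
  case class BroadcastExchange(child: Plan) extends Plan {
    val partitionedBy: Option[Seq[String]] = None
  }
  case class ShuffleExchange(keys: Seq[String], child: Plan) extends Plan {
    val partitionedBy: Option[Seq[String]] = Some(keys)
  }

  // Does the child's output already meet the required distribution?
  def satisfies(child: Plan, required: Distribution): Boolean = required match {
    case Unspecified     => true
    case Clustered(keys) => child.partitionedBy.contains(keys)
    case Broadcast       => child.isInstanceOf[BroadcastExchange]
  }

  // Steps in order: keep the child, else broadcast, else shuffle.
  def ensureDistribution(child: Plan, required: Distribution): Plan =
    if (satisfies(child, required)) child
    else required match {
      case Broadcast       => BroadcastExchange(child)
      case Clustered(keys) => ShuffleExchange(keys, child)
      case Unspecified     => child // unreachable: Unspecified is always satisfied
    }

  // One required distribution per child, in the same order as the children.
  def ensureDistributions(children: Seq[Plan], required: Seq[Distribution]): Seq[Plan] = {
    require(children.length == required.length, "one required distribution per child")
    children.zip(required).map { case (child, dist) => ensureDistribution(child, dist) }
  }
}
```

For a two-child operator that requires both sides clustered by `id`, a child that is already partitioned by `id` stays as it is, while the other child gets a `ShuffleExchange` on top.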
If the input operator has multiple children and specifies child output distributions, then the children's SparkPlan.md#outputPartitioning[output partitionings] have to be compatible.
If the children's output partitionings are not all compatible, then...FIXME
NOTE: At this point in ensureDistributionAndOrdering the required child distributions are already handled.
ensureDistributionAndOrdering matches the operator's required sort ordering of children (requiredChildOrderings) to the children's output ordering (outputOrdering) and, if the orderings do not match, creates a SortExec.md#creating-instance[SortExec] unary physical operator as the new child.
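The ordering step can be sketched the same way. This is a hedged, self-contained model rather than Spark's code: `Plan`, `Scan`, `Sort`, `orderingSatisfies` and `ensureOrdering` are hypothetical names, an ordering is reduced to a list of column names, and the sketch assumes a required ordering is satisfied when it is a prefix of the ordering the child already provides.

```scala
object OrderingSketch {
  // Hypothetical, simplified operators; an ordering is a list of column names.
  sealed trait Plan { def outputOrdering: Seq[String] }
  case class Scan(table: String, outputOrdering: Seq[String] = Nil) extends Plan
  case class Sort(order: Seq[String], child: Plan) extends Plan {
    val outputOrdering: Seq[String] = order
  }

  // Assumption: the requirement holds when it is a prefix of the provided ordering.
  def orderingSatisfies(provided: Seq[String], required: Seq[String]): Boolean =
    provided.startsWith(required)

  // Leave the child alone when it is already sorted as required, else add a sort.
  def ensureOrdering(child: Plan, required: Seq[String]): Plan =
    if (orderingSatisfies(child.outputOrdering, required)) child
    else Sort(required, child)
}
```

A child sorted by `id, ts` satisfies a requirement on `id` alone, so no sort is added; a requirement on `ts` is not a prefix and leads to a new `Sort` on top of the child.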
In the end, ensureDistributionAndOrdering sets the new children for the input operator.
ensureDistributionAndOrdering is used when EnsureRequirements physical optimization is executed.
reorderJoinPredicates Internal Method
reorderJoinPredicates(plan: SparkPlan): SparkPlan
reorderJoinPredicates is used when...FIXME