DemoteBroadcastHashJoin Logical Optimization¶
DemoteBroadcastHashJoin
is a logical optimization in Adaptive Query Execution to transform Join logical operators (with no join hints).
Quoting What's new in Apache Spark 3.0 - demote broadcast hash join article:
This rule checks whether the nodes involved in the join have a lot of empty partitions. If it's the case, the rule adds a no broadcast hash join hint to prevent the broadcast strategy to be applied.
DemoteBroadcastHashJoin
uses spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin configuration property for the threshold to demote a broadcast hash join.
DemoteBroadcastHashJoin
is a Catalyst rule for transforming logical plans (Rule[LogicalPlan]
).
Creating Instance¶
DemoteBroadcastHashJoin
takes the following to be created:
DemoteBroadcastHashJoin
is created when AdaptiveSparkPlanExec physical operator is requested for the logical optimizer.
Executing Rule¶
apply(
plan: LogicalPlan): LogicalPlan
apply
transforms Join logical operators with no JoinStrategyHint hints.
apply
checks whether to shouldDemote or not for the left first and then for the right side of the join operator. If so, apply
registers NO_BROADCAST_HASH join hint with the join operator.
apply
is part of the Rule abstraction.
shouldDemote Internal Method¶
shouldDemote(
plan: LogicalPlan): Boolean
shouldDemote
supports LogicalQueryStage logical operators with ShuffleQueryStageExec physical operators only. For any other logical operators shouldDemote
is false
.
shouldDemote
makes sure that the result and MapOutputStatistics of the ShuffleQueryStageExec
operator are available. Otherwise, shouldDemote
is false
.
shouldDemote
is true
when the ratio of the non-empty partitions to all the partitions (based on the MapOutputStatistics
) is below spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin configuration property.
shouldDemote
is used when DemoteBroadcastHashJoin
optimization is executed.