DemoteBroadcastHashJoin Logical Optimization

DemoteBroadcastHashJoin is a logical optimization in Adaptive Query Execution that transforms Join logical operators with no join strategy hints.

Quoting What's new in Apache Spark 3.0 - demote broadcast hash join article:

This rule checks whether the nodes involved in the join have a lot of empty partitions. If it's the case, the rule adds a no broadcast hash join hint to prevent the broadcast strategy to be applied.

DemoteBroadcastHashJoin uses the spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin configuration property as the threshold for demoting a broadcast hash join.
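As a rough illustration, the threshold could be set at runtime as follows. The fragment assumes a spark-shell session (where `spark` is already defined) and that Adaptive Query Execution has to be enabled for the rule to run at all; the value 0.2 is only an example, not a recommendation.

```scala
// Assumes a spark-shell session, so `spark` (a SparkSession) already exists.
// The rule only matters when Adaptive Query Execution is enabled.
spark.conf.set("spark.sql.adaptive.enabled", "true")

// Illustrative value: demote when fewer than 20% of the shuffle partitions are non-empty.
spark.conf.set("spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin", "0.2")
```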

DemoteBroadcastHashJoin is a Catalyst rule for transforming logical plans (Rule[LogicalPlan]).

Creating Instance

DemoteBroadcastHashJoin takes no arguments to be created.

DemoteBroadcastHashJoin is created when:

  • AQEOptimizer is requested for the batches

Executing Rule

apply(
  plan: LogicalPlan): LogicalPlan

apply is part of the Rule abstraction.

apply transforms Join logical operators with no JoinStrategyHint hints.

apply checks (using shouldDemote) whether to demote the left side and then the right side of the join operator. For a side that should be demoted, apply registers a NO_BROADCAST_HASH join hint with the join operator.
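The left-then-right check can be sketched with a simplified model. This is not Spark's code: `JoinSide`, `SketchJoin`, `StrategyHint`, `UserHint` and `demote` are hypothetical stand-ins, the `demotable` flag stands for the outcome of shouldDemote, and the sketch assumes that strategy hints are tracked per join side.

```scala
object ApplySketch {
  // Hypothetical stand-ins for join strategy hints.
  sealed trait StrategyHint
  case object NoBroadcastHash extends StrategyHint // models NO_BROADCAST_HASH
  case object UserHint extends StrategyHint        // any hint given in the query

  // One side of a join: `demotable` models the result of shouldDemote for that side.
  final case class JoinSide(demotable: Boolean, hint: Option[StrategyHint] = None)
  final case class SketchJoin(left: JoinSide, right: JoinSide)

  // Adds the no-broadcast hint to a side only if it has no hint yet and should be demoted.
  private def withDemotion(side: JoinSide): JoinSide =
    if (side.hint.isEmpty && side.demotable) side.copy(hint = Some(NoBroadcastHash))
    else side

  // Left side first, then the right side, as described above.
  def demote(join: SketchJoin): SketchJoin =
    SketchJoin(withDemotion(join.left), withDemotion(join.right))
}
```

In this model, a side that already carries a hint is left untouched, so an explicit hint in the query is never overridden.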

shouldDemote

shouldDemote(
  plan: LogicalPlan): Boolean

shouldDemote supports LogicalQueryStage logical operators with ShuffleQueryStageExec physical operators only. For any other logical operator, shouldDemote is false.

shouldDemote makes sure that the result and MapOutputStatistics of the ShuffleQueryStageExec operator are available. Otherwise, shouldDemote is false.

shouldDemote is true when the ratio of non-empty partitions to all partitions (based on the MapOutputStatistics) is below the value of the spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin configuration property.
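The ratio check can be sketched in plain Scala, without Spark on the classpath. `PartitionStats` is a hypothetical stand-in for MapOutputStatistics (per-partition output sizes in bytes), and `None` models statistics that are not available yet. Treating a stage with zero partitions as not demotable is an assumption made here to avoid dividing by zero.

```scala
object ShouldDemoteSketch {
  // Hypothetical stand-in for MapOutputStatistics: bytes written per shuffle partition.
  final case class PartitionStats(bytesByPartitionId: Array[Long])

  // `stats` is None when the shuffle stage has not produced statistics yet.
  // `threshold` models spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin.
  def shouldDemote(stats: Option[PartitionStats], threshold: Double): Boolean =
    stats.exists { s =>
      val total = s.bytesByPartitionId.length
      val nonEmpty = s.bytesByPartitionId.count(_ > 0)
      // Assumption: no partitions at all means nothing to demote.
      total > 0 && nonEmpty.toDouble / total < threshold
    }
}
```

For example, with one non-empty partition out of ten and a threshold of 0.2, the ratio is 0.1 and the sketch returns true; with two non-empty partitions out of four, the ratio is 0.5 and it returns false.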


Last update: 2021-05-02