Skip to content

HashJoin — Hash-Based Join Physical Operators

HashJoin is an abstraction of hash-based join physical operators.

Contract

leftKeys

leftKeys: Seq[Expression]

rightKeys

rightKeys: Seq[Expression]

joinType

joinType: JoinType

buildSide

buildSide: BuildSide

condition

condition: Option[Expression]

left

left: SparkPlan
right: SparkPlan

Implementations

join Method

join(
  streamedIter: Iterator[InternalRow],
  hashed: HashedRelation,
  numOutputRows: SQLMetric): Iterator[InternalRow]

join branches off per joinType to create a join iterator of internal rows (Iterator[InternalRow]) for the input streamedIter and hashed:

join requests TaskContext to add a TaskCompletionListener to update the input avg hash probe SQL metric. The TaskCompletionListener is executed on a task completion (regardless of the task status: success, failure, or cancellation) and uses getAverageProbesPerLookup from the input hashed to set the input avg hash probe.

join createResultProjection.

In the end, for every row in the join iterator of internal rows join increments the input numOutputRows SQL metric and applies the result projection.

join reports a IllegalArgumentException when the joinType is not supported:

[x] JoinType is not supported

join is used when BroadcastHashJoinExec and ShuffledHashJoinExec are executed.


Last update: 2020-08-24