HashJoin — Hash-Based Join Physical Operators¶
HashJoin
is an abstraction of hash-based join physical operators.
Contract¶
leftKeys¶
leftKeys: Seq[Expression]
rightKeys¶
rightKeys: Seq[Expression]
joinType¶
joinType: JoinType
buildSide¶
buildSide: BuildSide
condition¶
condition: Option[Expression]
left¶
left: SparkPlan
right¶
right: SparkPlan
Implementations¶
join Method¶
join(
streamedIter: Iterator[InternalRow],
hashed: HashedRelation,
numOutputRows: SQLMetric): Iterator[InternalRow]
join
branches off per joinType to create a join iterator of internal rows (Iterator[InternalRow]
) for the input streamedIter
and hashed
:
-
outerJoin for a LeftOuter or a RightOuter join
-
existenceJoin for a ExistenceJoin join
join
requests TaskContext
to add a TaskCompletionListener
to update the input avg hash probe SQL metric. The TaskCompletionListener
is executed on a task completion (regardless of the task status: success, failure, or cancellation) and uses getAverageProbesPerLookup from the input hashed
to set the input avg hash probe.
join
createResultProjection.
In the end, for every row in the join iterator of internal rows join
increments the input numOutputRows
SQL metric and applies the result projection.
join
reports a IllegalArgumentException
when the joinType is not supported:
[x] JoinType is not supported
join
is used when BroadcastHashJoinExec and ShuffledHashJoinExec are executed.