HashJoin — Hash-Based Join Physical Operators

HashJoin is an extension of the BaseJoinExec abstraction for hash-based join physical operators with support for Java code generation.

Contract

buildSide

buildSide: BuildSide

Preparing HashedRelation

prepareRelation(
  ctx: CodegenContext): HashedRelationInfo

Used when:

Implementations

join

join(
  streamedIter: Iterator[InternalRow],
  hashed: HashedRelation,
  numOutputRows: SQLMetric): Iterator[InternalRow]

join branches off per JoinType to create a joined rows iterator (from the rows of the input streamedIter and hashed).

join creates a result projection.

In the end, for every row in the joined rows iterator, join increments the input numOutputRows SQL metric and applies the result projection.

join reports an IllegalArgumentException for an unsupported JoinType:

HashJoin should not take [joinType] as the JoinType
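
The flow described above (branch on the join type, build a joined rows iterator, then count and project every output row) can be illustrated with a minimal, self-contained sketch. It is not Spark's implementation: Row, Metric, the Map-based hashed relation, the handful of join types, and the "first column is the join key" rule are all simplified stand-ins assumed for illustration only.

```scala
object HashJoinSketch {
  // Hypothetical stand-in for InternalRow.
  type Row = Seq[Any]

  // Hypothetical, reduced set of join types for this sketch.
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftSemi extends JoinType
  case object LeftAnti extends JoinType
  case object FullOuter extends JoinType // deliberately not handled below

  // Hypothetical stand-in for SQLMetric: a mutable counter.
  final class Metric {
    private var count = 0L
    def add(n: Long): Unit = count += n
    def value: Long = count
  }

  def join(
      joinType: JoinType,
      streamedIter: Iterator[Row],
      hashed: Map[Any, Seq[Row]], // stand-in for HashedRelation: key -> build-side rows
      numOutputRows: Metric,
      resultProj: Row => Row = (row: Row) => row): Iterator[Row] = {
    // Assumption: the join key is the first column of a streamed row.
    def matches(row: Row): Seq[Row] = hashed.getOrElse(row.head, Seq.empty)

    // Branch per join type to create the joined rows iterator.
    val joinedIter: Iterator[Row] = joinType match {
      case Inner =>
        streamedIter.flatMap(s => matches(s).iterator.map(b => s ++ b))
      case LeftSemi =>
        streamedIter.filter(s => matches(s).nonEmpty)
      case LeftAnti =>
        streamedIter.filter(s => matches(s).isEmpty)
      case other =>
        throw new IllegalArgumentException(
          s"HashJoin should not take $other as the JoinType")
    }

    // For every joined row: bump the metric, then apply the result projection.
    joinedIter.map { row =>
      numOutputRows.add(1)
      resultProj(row)
    }
  }
}
```

Because the returned iterator is lazy, the metric in this sketch only advances as rows are consumed, whereas the unsupported-type check fails immediately when join is called.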

join is used when:

Generating Java Code for Anti Join

codegenAnti(
  ctx: CodegenContext,
  input: Seq[ExprCode]): String

codegenAnti...FIXME

codegenAnti is used when:


Last update: 2021-05-08