ColumnarToRowExec Physical Operator

ColumnarToRowExec is a unary physical operator for Columnar Processing.

ColumnarToRowExec supports Whole-Stage Java Code Generation.

Creating Instance

ColumnarToRowExec takes a single child physical operator (SparkPlan) to be created.

ColumnarToRowExec requires that the child physical operator supportsColumnar.

ColumnarToRowExec is created when the ApplyColumnarRulesAndInsertTransitions physical optimization is executed.

Performance Metrics

Key              Name (in web UI)         Description
numInputBatches  number of input batches  Number of input batches
numOutputRows    number of output rows    Number of output rows (across all input batches)

Executing Physical Operator

doExecute(): RDD[InternalRow]

doExecute is part of the SparkPlan abstraction.

doExecute requests the child physical operator to executeColumnar and then uses RDD.mapPartitionsInternal over the batches (Iterator[ColumnarBatch]) to "unpack" them into rows. While doing so, doExecute counts the number of batches and rows (as the numInputBatches and numOutputRows metrics).
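The unpacking step can be pictured with a minimal plain-Scala sketch. It does not use Spark: ToyBatch, Metrics and toRows are hypothetical stand-ins (for ColumnarBatch, the two metrics and the per-partition function respectively), so this only illustrates the idea of flattening batches into rows while counting, not the actual implementation.

```scala
object ColumnarToRowSketch {
  // Hypothetical stand-in for a columnar batch: one Vector per column,
  // all columns assumed to have the same length.
  final case class ToyBatch(columns: Vector[Vector[Any]]) {
    def numRows: Int = columns.headOption.map(_.size).getOrElse(0)
    // Builds row i by picking the i-th value of every column.
    def row(i: Int): Vector[Any] = columns.map(_(i))
  }

  // Hypothetical stand-in for the operator's two metrics.
  final class Metrics {
    var numInputBatches: Long = 0L
    var numOutputRows: Long = 0L
  }

  // Lazily flattens an iterator of batches into an iterator of rows,
  // updating the counters once per batch as it is reached.
  def toRows(batches: Iterator[ToyBatch], metrics: Metrics): Iterator[Vector[Any]] =
    batches.flatMap { batch =>
      metrics.numInputBatches += 1
      metrics.numOutputRows += batch.numRows
      Iterator.tabulate(batch.numRows)(i => batch.row(i))
    }
}
```

In this sketch the counters only advance as the returned iterator is consumed, since both the outer flatMap and the inner row iterator are lazy.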

Generating Java Source Code for Produce Path

doProduce(
  ctx: CodegenContext): String

doProduce is part of the CodegenSupport abstraction.


Input RDDs

inputRDDs(): Seq[RDD[InternalRow]]

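inputRDDs is a single-element sequence with the RDD[ColumnarBatch] that the child physical operator gives when requested to executeColumnar. The RDD is exposed as an RDD[InternalRow] only to satisfy the method signature; its elements are still ColumnarBatches.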

inputRDDs is part of the CodegenSupport abstraction.
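The signature promises RDD[InternalRow] while the elements are ColumnarBatches. On the JVM this is possible because generic type parameters are erased at runtime. The following plain-Scala sketch illustrates the idea under the assumption that an unchecked cast is involved; ToyBatch and ToyRow are hypothetical stand-ins, and Iterator plays the role of the RDD.

```scala
object ErasureSketch {
  // Hypothetical stand-ins; these are not Spark classes.
  final case class ToyBatch(values: Vector[Int])
  final case class ToyRow(value: Int)

  // Declared element type is ToyRow, but the actual elements are ToyBatch.
  // The cast is unchecked: type arguments are erased, so nothing fails here.
  val disguised: Seq[Iterator[ToyRow]] =
    Seq(Iterator(ToyBatch(Vector(1, 2)), ToyBatch(Vector(3))).asInstanceOf[Iterator[ToyRow]])

  // A consumer that knows the real element type casts back before reading.
  val batches: Iterator[ToyBatch] = disguised.head.asInstanceOf[Iterator[ToyBatch]]
  val totalValues: Int = batches.map(_.values.size).sum
}
```

A consumer that read elements of disguised as ToyRow without casting back would fail with a ClassCastException at the point of use, which is why such a trick is only safe when producer and consumer agree on the real element type.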

canCheckLimitNotReached Flag

canCheckLimitNotReached: Boolean

canCheckLimitNotReached is always true.

canCheckLimitNotReached is part of the CodegenSupport abstraction.

genCodeColumnVector Internal Method

genCodeColumnVector(
  ctx: CodegenContext,
  columnVar: String,
  ordinal: String,
  dataType: DataType,
  nullable: Boolean): ExprCode


genCodeColumnVector is used when the ColumnarToRowExec physical operator is requested to generate Java source code for the produce path (doProduce).

Last update: 2020-11-15