
ProjectExec Unary Physical Operator

ProjectExec is a unary physical operator that...FIXME

ProjectExec supports Java code generation (aka codegen).
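To see the operator in a real plan, the following is a minimal sketch, assuming a Spark 3.x application or `spark-shell` with `spark-sql` on the classpath. The query, the application name and the value names (`q`, `project`, `wrapped`) are illustrative only and not part of the original text.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.{ProjectExec, WholeStageCodegenExec}

// Assumption: a local session is acceptable for the demo
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ProjectExecDemo")
  .getOrCreate()
import spark.implicits._

// A simple projection over a range (hypothetical example query)
val q = spark.range(3).select(($"id" * 2).as("doubled"))

// sparkPlan is the physical plan before preparation rules
// (such as whole-stage codegen collapsing) are applied
val project = q.queryExecution.sparkPlan
  .collectFirst { case p: ProjectExec => p }
  .get
println(project.supportCodegen)

// executedPlan is the plan after preparation rules; with whole-stage
// codegen enabled the projection is expected to sit under a WholeStageCodegenExec
val wrapped = q.queryExecution.executedPlan
  .collectFirst { case w: WholeStageCodegenExec => w }
println(wrapped.isDefined)
```

`q.explain()` prints the same plan in text form, where operators that take part in whole-stage code generation are marked with `*(n)`.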

ProjectExec is created when one of the execution planning strategies listed below is executed.

The following is the order in which these execution planning strategies are applied to logical query plans when SparkPlanner or the Hive-specific SparkPlanner is requested to plan a logical query plan into one or more physical query plans:

  1. HiveTableScans
  2. FileSourceStrategy
  3. DataSourceStrategy
  4. InMemoryScans
  5. BasicOperators
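The strategies a session actually uses (and their order) can be inspected at runtime. The following is a sketch only, assuming Spark 3.x, where `sessionState` is exposed on `SparkSession` as an unstable developer API. The exact list differs between Spark versions and between Hive-enabled and regular sessions, so it may not match the list above one-to-one. The name `strategyNames` is illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("PlannerStrategiesDemo")
  .getOrCreate()

// SparkPlanner.strategies returns the execution planning strategies
// in the order in which the planner tries them
val strategyNames = spark.sessionState.planner.strategies.map(_.getClass.getName)
strategyNames.zipWithIndex.foreach { case (name, idx) =>
  println(s"${idx + 1}. $name")
}
```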

=== [[doExecute]] Executing Physical Operator (Generating RDD[InternalRow]) -- doExecute Method

[source, scala]
----
doExecute(): RDD[InternalRow]
----

doExecute is part of the SparkPlan abstraction.

doExecute requests the child physical operator to produce an RDD of internal rows and applies a function to every partition of that RDD (using RDD.mapPartitionsWithIndexInternal).

.RDD.mapPartitionsWithIndexInternal [source, scala]


==== [[doExecute-mapPartitionsWithIndexInternal]] Inside doExecute (RDD.mapPartitionsWithIndexInternal)

Inside the function (that is passed to RDD.mapPartitionsWithIndexInternal), doExecute creates an UnsafeProjection with the following:

. NamedExpressions for the projection (projectList)

. Output attributes of the child physical operator as the input schema

. subexpressionEliminationEnabled flag

doExecute requests the UnsafeProjection to initialize (with the partition index) and maps over the internal rows (of a partition) using the projection.
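The per-partition work can be approximated outside of an RDD. The following is a hedged sketch, not the operator's actual source: it assumes `spark-catalyst` on the classpath, uses the two-argument `UnsafeProjection.create` (so the `subexpressionEliminationEnabled` flag is left out), and stands in a plain `Iterator` for a partition. The attribute `id`, the projection `doubled` and the function name `projectPartition` are made up for illustration.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Literal, Multiply, NamedExpression, UnsafeProjection}
import org.apache.spark.sql.types.LongType

// Hypothetical child output: a single non-nullable long column
val id = AttributeReference("id", LongType, nullable = false)()
val childOutput: Seq[Attribute] = Seq(id)

// Hypothetical projection list: id * 2 AS doubled
val projectList: Seq[NamedExpression] =
  Seq(Alias(Multiply(id, Literal(2L)), "doubled")())

// Same shape as the function given to mapPartitionsWithIndexInternal:
// (partition index, rows of the partition) => projected rows
val projectPartition: (Int, Iterator[InternalRow]) => Iterator[InternalRow] =
  (index, rows) => {
    val projection = UnsafeProjection.create(projectList, childOutput)
    projection.initialize(index)
    rows.map(projection)
  }

// Read each value right away: the projection reuses its output row
val projected = projectPartition(0, Iterator(InternalRow(1L), InternalRow(2L)))
  .map(_.getLong(0))
  .toList
println(projected)
```

The projection is created once per partition rather than once per row, which keeps the (possibly code-generated) projection setup out of the per-row path.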

=== [[creating-instance]] Creating ProjectExec Instance

ProjectExec takes the following when created:

  • [[projectList]] NamedExpressions for the projection
  • [[child]] Child physical operator
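ProjectExec is normally created by the planner, but for exploration it can also be built by hand from the two arguments above. The following is a minimal sketch, assuming Spark 3.x with an active local session; the child plan (taken from `spark.range(3)`), the `doubled` alias and the value names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{Alias, Literal, Multiply, NamedExpression}
import org.apache.spark.sql.execution.ProjectExec

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("CreateProjectExecDemo")
  .getOrCreate()

// Child physical operator: the physical plan of a simple range query
val child = spark.range(3).queryExecution.sparkPlan

// Projection list that refers to the child's only output attribute (id)
val id = child.output.head
val projectList: Seq[NamedExpression] =
  Seq(Alias(Multiply(id, Literal(2L)), "doubled")())

val project = ProjectExec(projectList, child)

// execute() triggers doExecute and gives an RDD[InternalRow];
// values are extracted before collect because rows are reused
val result = project.execute().map(_.getLong(0)).collect()
println(result.mkString(", "))
```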

=== [[doConsume]] Generating Java Source Code for Consume Path in Whole-Stage Code Generation -- doConsume Method

[source, scala]
----
doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String
----


doConsume is part of the CodegenSupport abstraction.
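The Java source that whole-stage code generation produces for a query with a projection (including the part contributed by doConsume) can be printed with the `debug` package. This is a sketch under the assumption of Spark 3.x with whole-stage codegen enabled; the query and the value names are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug.codegenString

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ProjectExecCodegenDemo")
  .getOrCreate()
import spark.implicits._

// Hypothetical query whose physical plan contains a projection
val q = spark.range(3).select(($"id" * 2).as("doubled"))

// codegenString gives the whole-stage codegen subtrees of the plan
// together with the Java source generated for each of them
val generated = codegenString(q.queryExecution.executedPlan)
println(generated)
```

Importing `org.apache.spark.sql.execution.debug._` instead also makes `q.debugCodegen()` available, which prints the same report directly.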

Last update: 2021-02-18