DataSourceV2ScanExec Leaf Physical Operator
Warning
As of this commit, `DataSourceV2ScanExec` is no longer available in Spark 3.0.0, and this page will soon be removed (once `DataSourceV2ScanExecBase` takes over).
`DataSourceV2ScanExec` is a leaf physical operator that represents a `DataSourceV2Relation` logical operator at execution time.

`DataSourceV2ScanExec` supports `ColumnarBatchScan` with vectorized batch decoding.
[[inputRDDs]] `DataSourceV2ScanExec` gives the single input RDD (when the `WholeStageCodegenExec` physical operator is WholeStageCodegenExec.md#doExecute[executed]).
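To see the operator in an actual query, one option is to walk the executed physical plan of a `DataFrame` and pick out the `DataSourceV2ScanExec` nodes. The following is a minimal sketch, assuming Spark 2.4.x on the classpath (the operator is gone in Spark 3.0.0). `V2ScanInspector`, `v2Scans` and `describeV2Scans` are hypothetical names used for illustration only and are not part of Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec

// Hypothetical helper object (not part of Spark). Assumes Spark 2.4.x.
object V2ScanInspector {

  // Collects every DataSourceV2ScanExec found in the executed physical plan.
  // Returns an empty collection when the query reads no Data Source V2 relation.
  def v2Scans(df: DataFrame): Seq[DataSourceV2ScanExec] =
    df.queryExecution.executedPlan.collect {
      case scan: DataSourceV2ScanExec => scan
    }

  // Prints the output attributes and the number of input RDDs of each scan.
  def describeV2Scans(df: DataFrame): Unit =
    v2Scans(df).foreach { scan =>
      println(s"output attributes: ${scan.output.map(_.name).mkString(", ")}")
      println(s"number of input RDDs: ${scan.inputRDDs().size}")
    }
}
```

In `spark-shell`, `V2ScanInspector.describeV2Scans(df)` could be called on any `DataFrame` that was loaded through a Data Source V2 connector. A query over other sources simply yields no matches.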
Creating Instance
`DataSourceV2ScanExec` takes the following to be created:

- [[output]] Output schema (as a collection of `AttributeReferences`)
- [[reader]] FIXME
`DataSourceV2ScanExec` is created ...FIXME
=== [[doExecute]] Executing Physical Operator (Generating RDD[InternalRow]) -- `doExecute` Method

[source, scala]
----
doExecute(): RDD[InternalRow]
----
`doExecute` ...FIXME

`doExecute` is part of the `SparkPlan` abstraction.
=== [[internal-properties]] Internal Properties
[cols="30m,70",options="header",width="100%"]
|===
| Name
| Description

| batchPartitions
a| [[batchPartitions]] Input partitions of ColumnarBatches (`Seq[InputPartition[ColumnarBatch]]`)

| partitions
a| [[partitions]] Input partitions of InternalRows (`Seq[InputPartition[InternalRow]]`)

|===
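The two properties above can be read as alternatives: one holds columnar input partitions, the other row-based ones. The sketch below is a toy model of that choice, not Spark code. It assumes that the columnar partitions are used only when the reader supports columnar scans and has batch reads enabled, which is an assumption about Spark 2.4 behaviour that this page does not state. All types in it (`Reader`, `ColumnarCapableReader`, `RowOnlyReader`, `Partitions`) are simplified, hypothetical stand-ins.

```scala
// Toy model with hypothetical stand-in types; none of these are Spark classes.
object PartitionChoice {

  sealed trait Reader
  // A reader that can produce columnar batches; the flag says whether batch reads are enabled.
  final case class ColumnarCapableReader(batchReadEnabled: Boolean) extends Reader
  // A reader that can only produce rows.
  case object RowOnlyReader extends Reader

  sealed trait Partitions
  case object BatchPartitions extends Partitions // models batchPartitions (ColumnarBatch)
  case object RowPartitions extends Partitions   // models partitions (InternalRow)

  // Assumed rule: columnar partitions only when the reader opts into batch reads,
  // row partitions in every other case.
  def choose(reader: Reader): Partitions = reader match {
    case ColumnarCapableReader(true) => BatchPartitions
    case _                           => RowPartitions
  }
}
```

Modelling the readers as a sealed hierarchy keeps the rule in one pattern match, so adding another kind of reader would force the match to be revisited.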