== [[DataSourceScanExec]] DataSourceScanExec -- Leaf Physical Operators to Scan Over BaseRelation
DataSourceScanExec is the <<contract, contract>> of <<implementations, leaf physical operators>> that represent scans over a <<relation, BaseRelation>>.

NOTE: There are two <<implementations, DataSourceScanExecs>>, i.e. FileSourceScanExec.md[FileSourceScanExec] and RowDataSourceScanExec.md[RowDataSourceScanExec].

DataSourceScanExec supports Java code generation (aka _codegen_).
[[contract]]
[source, scala]
----
package org.apache.spark.sql.execution

trait DataSourceScanExec extends LeafExecNode with CodegenSupport {
  // only required vals and methods that have no implementation
  // the others follow
  def metadata: Map[String, String]
  val relation: BaseRelation
  val tableIdentifier: Option[TableIdentifier]
}
----
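As an illustration, a minimal (hypothetical) implementation of the three contract members could look as follows. The stand-in `TableIdentifier`, `BaseRelation` and `MyScan` definitions are assumptions of this sketch so it compiles without a Spark dependency; they are not Spark code.

```scala
// Stand-in types (assumptions) so the sketch compiles without Spark.
case class TableIdentifier(table: String)
trait BaseRelation

// Simplified stand-in for the DataSourceScanExec contract.
trait DataSourceScanExecLike {
  def metadata: Map[String, String]
  val relation: BaseRelation
  val tableIdentifier: Option[TableIdentifier]
}

// Hypothetical leaf operator that describes its scan through metadata.
object MyScan extends DataSourceScanExecLike {
  def metadata: Map[String, String] =
    Map("PushedFilters" -> "[]", "ReadSchema" -> "struct<>")
  val relation: BaseRelation = new BaseRelation {}
  val tableIdentifier: Option[TableIdentifier] = Some(TableIdentifier("people"))
}
```

The `metadata` entries above are the kind of key-value pairs that end up in the simple text representation of the scan.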
.(Subset of) DataSourceScanExec Contract
[cols="1,2",options="header",width="100%"]
|===
| Property
| Description

| metadata
| [[metadata]] Metadata (as a collection of key-value pairs) that describes the scan when requested for the <<simpleString, simple text representation>>

| relation
| [[relation]] BaseRelation that is used in the <<nodeName, node name>>

| tableIdentifier
| [[tableIdentifier]] Optional TableIdentifier

|===
NOTE: The prefix for variable names of DataSourceScanExec operators in generated Java source code is *scan*.
[[nodeNamePrefix]] The default *node name prefix* is an empty string (and is used in the <<simpleString, simple node description>>).
[[nodeName]]
DataSourceScanExec uses the <<relation, BaseRelation>> and the <<tableIdentifier, TableIdentifier>> as the *node name* in the following format:

----
Scan [relation] [tableIdentifier]
----
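A minimal sketch of that format, with plain strings standing in for the BaseRelation and the optional TableIdentifier (both stand-ins are assumptions of the sketch):

```scala
// Assemble "Scan [relation] [tableIdentifier]"; the trailing trim drops
// the extra space when no table identifier is given.
def nodeName(relation: String, tableIdentifier: Option[String]): String =
  s"Scan $relation ${tableIdentifier.getOrElse("")}".trim
```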
[[implementations]]
.DataSourceScanExecs
[width="100%",cols="1,2",options="header"]
|===
| DataSourceScanExec
| Description

| FileSourceScanExec.md[FileSourceScanExec]
| [[FileSourceScanExec]]

| RowDataSourceScanExec.md[RowDataSourceScanExec]
| [[RowDataSourceScanExec]]
|===
=== [[simpleString]] Simple (Basic) Text Node Description (in Query Plan Tree) -- simpleString Method
[source, scala]
----
simpleString: String
----
NOTE: simpleString is part of catalyst/QueryPlan.md#simpleString[QueryPlan Contract] to give the simple text description of a TreeNode in a query plan tree.
simpleString creates a text representation of every key-value entry in the <<metadata, metadata>>.
Internally, simpleString sorts the <<metadata, metadata>> entries and concatenates every key with its value (separated by `: `). While doing so, simpleString <<redact, redacts sensitive information>> in the values.
simpleString uses Spark Core's `Utils` utility to build a `truncatedString` of the output (schema attributes).
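For illustration, here is a simplified helper in the spirit of that utility; the signature and the `maxFields` cut-off are assumptions of this sketch, not the Spark Core implementation.

```scala
// Render at most maxFields elements and summarize how many were dropped,
// in the spirit of a truncatedString-style utility.
def truncatedString[T](
    seq: Seq[T], start: String, sep: String, end: String, maxFields: Int = 2): String =
  if (seq.length <= maxFields) seq.mkString(start, sep, end)
  else seq.take(maxFields).mkString(start, sep, "") + sep +
    s"... ${seq.length - maxFields} more fields" + end
```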
In the end, simpleString returns a text representation that is made up of the <<nodeNamePrefix, nodeNamePrefix>>, the <<nodeName, nodeName>>, the output (schema attributes) and the <<metadata, metadata>>, in the following format:

----
[nodeNamePrefix][nodeName][[output]][metadata]
----
[source, scala]
----
val scanExec = basicDataSourceScanExec
scala> println(scanExec.simpleString)
Scan $line143.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anon$1@57d94b26 [] PushedFilters: [], ReadSchema: struct<>

def basicDataSourceScanExec = {
  import org.apache.spark.sql.catalyst.expressions.AttributeReference
  val output = Seq.empty[AttributeReference]
  val requiredColumnsIndex = output.indices
  import org.apache.spark.sql.sources.Filter
  val filters, handledFilters = Set.empty[Filter]
  import org.apache.spark.sql.catalyst.InternalRow
  import org.apache.spark.sql.catalyst.expressions.UnsafeRow
  val row: InternalRow = new UnsafeRow(0)
  import org.apache.spark.rdd.RDD
  val rdd: RDD[InternalRow] = sc.parallelize(row :: Nil)

  import org.apache.spark.sql.sources.{BaseRelation, TableScan}
  val baseRelation: BaseRelation = new BaseRelation with TableScan {
    import org.apache.spark.sql.SQLContext
    val sqlContext: SQLContext = spark.sqlContext

    import org.apache.spark.sql.types.StructType
    val schema: StructType = new StructType()

    import org.apache.spark.sql.Row
    def buildScan(): RDD[Row] = ???
  }

  val tableIdentifier = None
  import org.apache.spark.sql.execution.RowDataSourceScanExec
  RowDataSourceScanExec(
    output,
    requiredColumnsIndex,
    filters,
    handledFilters,
    rdd,
    baseRelation,
    tableIdentifier)
}
----
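The steps above (sort the metadata, redact and abbreviate the values, assemble the final format) can be sketched outside Spark as follows. The redaction rule and the 100-character cut-off constant are illustrative assumptions of this sketch, not the actual DataSourceScanExec code.

```scala
object SimpleStringSketch {
  val maxMetadataValueLength = 100  // assumption: mirrors the abbreviation of long values

  // Placeholder redaction rule (illustrative only).
  def redact(text: String): String =
    text.replaceAll("password=[^,\\s]*", "password=*********")

  // Sort the metadata, redact and abbreviate the values, and assemble
  // "[nodeNamePrefix][nodeName][[output]][metadata]".
  def simpleString(
      nodeNamePrefix: String,
      nodeName: String,
      output: Seq[String],
      metadata: Map[String, String]): String = {
    val metadataStr = metadata.toSeq.sortBy(_._1).map { case (key, value) =>
      key + ": " + redact(value).take(maxMetadataValueLength)
    }.mkString(", ")
    s"$nodeNamePrefix$nodeName ${output.mkString("[", ",", "]")} $metadataStr"
  }
}
```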
=== [[verboseString]] verboseString Method
[source, scala]
----
verboseString: String
----
NOTE: verboseString is part of catalyst/QueryPlan.md#verboseString[QueryPlan Contract] to...FIXME.
verboseString simply returns the <<redact, redacted sensitive information>> in the verbose text representation (of the parent QueryPlan).
=== [[treeString]] Text Representation of All Nodes in Tree -- treeString Method

[source, scala]
----
treeString(
  verbose: Boolean,
  addSuffix: Boolean): String
----

treeString simply returns the <<redact, redacted sensitive information>> in the text representation of all nodes in the query plan tree (of the parent TreeNode).

NOTE: treeString is part of the TreeNode abstraction.
=== [[redact]] Redacting Sensitive Information -- redact Internal Method
[source, scala]
----
redact(text: String): String
----
redact...FIXME
NOTE: redact is used when DataSourceScanExec is requested for the <<simpleString, simple>>, <<verboseString, verbose>> and <<treeString, tree>> text representations.
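Since redact is still a FIXME above, here is a hedged sketch of regex-driven redaction in the spirit of Spark's `spark.sql.redaction.string.regex` configuration property; the hard-coded pattern and placeholder are assumptions of this sketch.

```scala
object RedactSketch {
  // Illustrative pattern; in Spark the pattern comes from configuration.
  val redactionPattern = "(?i)url=[^,\\s]+".r
  val placeholder = "*********(redacted)"

  // Replace every match of the redaction pattern with the placeholder.
  def redact(text: String): String =
    redactionPattern.replaceAllIn(text, placeholder)
}
```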