
= LocalTableScanExec Leaf Physical Operator

LocalTableScanExec is a leaf physical operator (i.e. has no children) with `producedAttributes` being `outputSet`.

LocalTableScanExec is created when the BasicOperators execution planning strategy resolves LocalRelation and Spark Structured Streaming's MemoryPlan logical operators.

TIP: Read up on the MemoryPlan logical operator in the Spark Structured Streaming gitbook.

[source, scala]
----
val names = Seq("Jacek", "Agata").toDF("name")
val optimizedPlan = names.queryExecution.optimizedPlan

scala> println(optimizedPlan.numberedTreeString)
00 LocalRelation [name#9]

// Physical plan with LocalTableScanExec operator (shown as LocalTableScan)
scala> names.explain
== Physical Plan ==
LocalTableScan [name#9]

// Going fairly low-level...you've been warned
val plan = names.queryExecution.executedPlan
import org.apache.spark.sql.execution.LocalTableScanExec
val ltse = plan.asInstanceOf[LocalTableScanExec]

val ltseRDD = ltse.execute()
scala> :type ltseRDD
org.apache.spark.rdd.RDD[org.apache.spark.sql.catalyst.InternalRow]

scala> println(ltseRDD.toDebugString)
(2) MapPartitionsRDD[1] at execute at <console>:30 []
 |  ParallelCollectionRDD[0] at execute at <console>:30 []

// no computation on the source dataset has really occurred yet
// Let's trigger a RDD action
scala> ltseRDD.first
res6: org.apache.spark.sql.catalyst.InternalRow = [0,1000000005,6b6563614a]

// Low-level "show"
scala> ltseRDD.foreach(println)
[0,1000000005,6b6563614a]
[0,1000000005,6174616741]

// High-level show
scala> names.show
+-----+
| name|
+-----+
|Jacek|
|Agata|
+-----+
----
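The hex payloads printed by the low-level `foreach` (e.g. `6b6563614a`) are the UTF-8 bytes of the string values with the bytes in reverse order, which appears to be a consequence of how `UnsafeRow` prints its 8-byte words. A quick, Spark-free sketch to decode them (`decodeUnsafeHex` is a hypothetical helper for illustration, not a Spark API):

```scala
// Hypothetical helper: decode the hex payload of a string field as printed
// in an UnsafeRow's toString. The bytes come out in reverse order, so we
// parse the hex pairs into bytes and reverse them before decoding as UTF-8.
def decodeUnsafeHex(hex: String): String = {
  val bytes = hex.grouped(2).map(Integer.parseInt(_, 16).toByte).toArray
  new String(bytes.reverse, "UTF-8")
}

println(decodeUnsafeHex("6b6563614a")) // Jacek
println(decodeUnsafeHex("6174616741")) // Agata
```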

[[metrics]]
.LocalTableScanExec's Performance Metrics
[cols="1,2,2",options="header",width="100%"]
|===
| Key
| Name (in web UI)
| Description

| [[numOutputRows]] numOutputRows
| number of output rows
|
|===


It appears that when no Spark job is used to execute a LocalTableScanExec, the numOutputRows metric is not displayed in the web UI.

[source, scala]
----
val names = Seq("Jacek", "Agata").toDF("name")

// The following query gives no numOutputRows metric in web UI's Details for Query (SQL tab)
scala> names.show
+-----+
| name|
+-----+
|Jacek|
|Agata|
+-----+

// The query gives numOutputRows metric in web UI's Details for Query (SQL tab)
scala> names.groupBy(length($"name")).count.show
+------------+-----+
|length(name)|count|
+------------+-----+
|           5|    2|
+------------+-----+

// The (type-preserving) query does also give numOutputRows metric in web UI's Details for Query (SQL tab)
scala> names.as[String].map(_.toUpperCase).show
+-----+
|value|
+-----+
|JACEK|
|AGATA|
+-----+
----


When executed, LocalTableScanExec...FIXME

.LocalTableScanExec in web UI (Details for Query)
image::images/spark-sql-LocalTableScanExec-webui-query-details.png[align="center"]

[[internal-registries]]
.LocalTableScanExec's Internal Properties
[cols="1,2",options="header",width="100%"]
|===
| Name
| Description

| [[unsafeRows]] unsafeRows
| InternalRows

| [[numParallelism]] numParallelism
|

| [[rdd]] rdd
|
|===

=== [[doExecute]] Executing Physical Operator (Generating RDD[InternalRow]) -- doExecute Method

[source, scala]
----
doExecute(): RDD[InternalRow]
----


doExecute is part of the SparkPlan abstraction.
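doExecute parallelizes the unsafe rows into an RDD with numParallelism slices and bumps the numOutputRows metric for every row emitted. The following is a minimal, Spark-free sketch of that logic under simplified types (`doExecuteModel` and `NumOutputRows` are illustrative names, not Spark APIs; partitions are modeled as plain `Seq`s instead of RDDs):

```scala
// Illustrative stand-in for the numOutputRows SQLMetric.
final class NumOutputRows { var value: Long = 0L }

// Mirrors the parallelism cap: at most defaultParallelism slices,
// never fewer than 1, and never more slices than rows.
def numParallelism(numRows: Int, defaultParallelism: Int): Int =
  math.min(math.max(numRows, 1), defaultParallelism)

// "Executing" the local scan: split the rows into partitions and
// count every row as it is emitted, like doExecute's map over the RDD.
def doExecuteModel[T](rows: Seq[T], defaultParallelism: Int,
                      metric: NumOutputRows): Seq[Seq[T]] = {
  val slices = numParallelism(rows.length, defaultParallelism)
  val sliceSize = math.max(math.ceil(rows.length.toDouble / slices).toInt, 1)
  val partitions = rows.grouped(sliceSize).toSeq
  partitions.map { p => metric.value += p.length; p }
}

val metric = new NumOutputRows
val parts = doExecuteModel(Seq("Jacek", "Agata"), defaultParallelism = 8, metric)
println(parts.length)  // 2 partitions, matching the (2) in toDebugString above
println(metric.value)  // 2 rows counted
```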

=== [[creating-instance]] Creating Instance

LocalTableScanExec takes the following when created:

* Output schema attributes (`output: Seq[Attribute]`)
* `InternalRow`s (`rows: Seq[InternalRow]`)

Last update: 2020-11-07