
LogicalPlan — Logical Relational Operators of Structured Query

LogicalPlan is an extension of the QueryPlan abstraction for logical operators to build a logical query plan (as a tree of logical operators).

A logical query plan is a tree of logical operators, each of which can in turn have (trees of) Catalyst expressions. In other words, there are at least two trees at every level (operator): the tree of operators itself and the expression trees each operator holds.

LogicalPlan is eventually planned (transformed) into a physical operator.

Implementations

  • BinaryNode: logical operators with two child logical operators
  • Command: logical operators that represent non-query commands to be executed by the system (e.g. DDL operations)
  • LeafNode: logical operators with no child operators
  • UnaryNode: logical operators with a single child logical operator

Other Logical Operators
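The "two trees at every level" idea can be sketched with a small, self-contained Scala model. This is a hypothetical simplification, not Spark's API: Expr, Plan, Leaf, Unary, Binary, Relation, Filter and Join are illustrative stand-ins for Catalyst expressions and for LeafNode, UnaryNode and BinaryNode. Each operator is a node in the operator tree and can also hold expression trees of its own.

```scala
// Toy model (NOT Spark's API): simplified stand-ins for Catalyst expressions
// and logical operators, showing the two kinds of trees in a logical plan.

// Expression tree nodes
trait Expr { def children: Seq[Expr] }
case class Col(name: String) extends Expr { def children: Seq[Expr] = Nil }
case class Lit(value: Int) extends Expr { def children: Seq[Expr] = Nil }
case class GreaterThan(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}

// Operator tree nodes: each operator has child operators and its own expressions
trait Plan {
  def children: Seq[Plan]
  def expressions: Seq[Expr]
}
trait Leaf extends Plan { def children: Seq[Plan] = Nil } // like LeafNode
trait Unary extends Plan {                                 // like UnaryNode
  def child: Plan
  def children: Seq[Plan] = Seq(child)
}
trait Binary extends Plan {                                // like BinaryNode
  def left: Plan
  def right: Plan
  def children: Seq[Plan] = Seq(left, right)
}

case class Relation(table: String) extends Leaf { def expressions: Seq[Expr] = Nil }
case class Filter(condition: Expr, child: Plan) extends Unary {
  def expressions: Seq[Expr] = Seq(condition)
}
case class Join(left: Plan, right: Plan) extends Binary { def expressions: Seq[Expr] = Nil }

// Walk only the operator tree
def countOps(p: Plan): Int = 1 + p.children.map(countOps).sum
// Walk a single expression tree
def countExprs(e: Expr): Int = 1 + e.children.map(countExprs).sum
// Walk every expression tree held by every operator in the plan
def countAllExprs(p: Plan): Int =
  p.expressions.map(countExprs).sum + p.children.map(countAllExprs).sum

val plan = Join(Filter(GreaterThan(Col("id"), Lit(2)), Relation("t1")), Relation("t2"))
println(s"operators: ${countOps(plan)}, expressions: ${countAllExprs(plan)}")
```

This prints `operators: 4, expressions: 3`: the operator tree has four nodes (Join, Filter and two Relations), while the only expression tree (the filter condition) has three.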

Statistics Cache

Cached plan statistics (as Statistics) of the LogicalPlan. They are computed and cached in stats, used in stats and verboseStringWithSuffix, and reset in invalidateStatsCache.

Estimated Statistics

stats(
  conf: CatalystConf): Statistics

stats returns the cached plan statistics or computes new statistics (and caches them in the statistics cache).

stats is used when:

  • A LogicalPlan computes Statistics
  • QueryExecution is requested to build a complete text representation
  • JoinSelection checks whether a plan can be broadcast, among other things
  • CostBasedJoinReorder attempts to reorder inner joins
  • LimitPushDown is executed (for FullOuter joins)
  • AggregateEstimation estimates Statistics
  • FilterEstimation estimates child Statistics
  • InnerOuterEstimation estimates Statistics of the left and right sides of a join
  • LeftSemiAntiEstimation estimates Statistics
  • ProjectEstimation estimates Statistics
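The caching behaviour of stats (return the cached statistics, otherwise compute and cache them, and recompute only after invalidateStatsCache) can be illustrated with a toy model. It is a sketch under assumptions, not Spark's implementation: ToyPlan, computeStats and the computations counter are hypothetical, and Statistics here only carries a sizeInBytes field.

```scala
// Toy model (NOT Spark's API) of the stats caching pattern: compute on first
// access, reuse afterwards, and recompute only after the cache is invalidated.
final case class Statistics(sizeInBytes: BigInt)

class ToyPlan(sizes: Seq[BigInt]) {
  // How many times statistics were actually computed (illustration only)
  var computations: Int = 0
  private var statsCache: Option[Statistics] = None

  // Hypothetical stand-in for the (possibly expensive) statistics estimation
  private def computeStats(): Statistics = {
    computations += 1
    Statistics(sizes.sum)
  }

  // Return the cached statistics, or compute and cache them
  def stats: Statistics = statsCache.getOrElse {
    val s = computeStats()
    statsCache = Some(s)
    s
  }

  // Drop the cached statistics so that the next `stats` call recomputes them
  def invalidateStatsCache(): Unit = { statsCache = None }
}

val p = new ToyPlan(Seq(BigInt(100), BigInt(24)))
p.stats                      // computed and cached
p.stats                      // served from the cache
val before = p.computations  // 1
p.invalidateStatsCache()
p.stats                      // recomputed after invalidation
```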

Refreshing Child Logical Operators

refresh(): Unit

refresh calls itself recursively for every child logical operator.
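A minimal sketch of the default recursive refresh, using hypothetical Node, Op and FileRelation types (not Spark's classes). FileRelation plays the role of a leaf operator that overrides refresh to do actual work, much like LogicalRelation does for HadoopFsRelation relations.

```scala
// Toy model (NOT Spark's API): the default refresh just recurses into children.
import scala.collection.mutable.ListBuffer

trait Node {
  def children: Seq[Node]
  // Default behaviour: call refresh on every child operator, recursively
  def refresh(): Unit = children.foreach(_.refresh())
}

// Hypothetical inner operator that only holds child operators
class Op(val children: Seq[Node]) extends Node

// Hypothetical leaf operator that records when its location was refreshed
class FileRelation(path: String, log: ListBuffer[String]) extends Node {
  def children: Seq[Node] = Nil
  override def refresh(): Unit = { log += s"refreshed $path"; () }
}

val log = ListBuffer.empty[String]
val plan = new Op(Seq(new Op(Seq(new FileRelation("/data/a", log))), new FileRelation("/data/b", log)))
plan.refresh()
println(log.mkString(", "))
```

Only the leaves do any work; the inner operators merely propagate the call, so the log ends up with `refreshed /data/a, refreshed /data/b` in depth-first order.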

Note

refresh is overridden only by LogicalRelation (which refreshes the location of HadoopFsRelation relations only).

refresh is used when:

resolveQuoted

resolveQuoted(
  name: String,
  resolver: Resolver): Option[NamedExpression]

resolveQuoted...FIXME

resolveQuoted is used when...FIXME

Resolving Column Attributes to References in Query Plan

resolve(
  schema: StructType,
  resolver: Resolver): Seq[Attribute]
resolve(
  nameParts: Seq[String],
  resolver: Resolver): Option[NamedExpression]
resolve(
  nameParts: Seq[String],
  input: Seq[Attribute],
  resolver: Resolver): Option[NamedExpression]  // a protected method

resolve...FIXME

resolve is used when...FIXME

Accessing Logical Query Plan of Structured Query

To get the logical plan of a structured query, use the QueryExecution of the Dataset (queryExecution.logical).

val q = spark.range(4)
scala> :type q
org.apache.spark.sql.Dataset[Long]

val plan = q.queryExecution.logical
scala> :type plan
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

LogicalPlan goes through execution stages (as a QueryExecution). In order to convert a LogicalPlan to a QueryExecution you should use SessionState and request it to "execute" the plan.

scala> :type spark
org.apache.spark.sql.SparkSession

// You could use Catalyst DSL to create a logical query plan
scala> :type plan
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

val qe = spark.sessionState.executePlan(plan)
scala> :type qe
org.apache.spark.sql.execution.QueryExecution

Maximum Number of Records

maxRows: Option[Long]

maxRows is undefined by default (None).

maxRows is used when LogicalPlan is requested for maxRowsPerPartition.

Maximum Number of Records per Partition

maxRowsPerPartition: Option[Long]

By default, maxRowsPerPartition is the same as maxRows (the maximum number of records).

maxRowsPerPartition is used when LimitPushDown logical optimization is executed.
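The defaults for maxRows and maxRowsPerPartition can be sketched with a toy operator hierarchy. Op, Scan and Limit are hypothetical stand-ins (not Spark's classes), and the rule that a limit never reports more rows than its child is an assumption made for illustration.

```scala
// Toy model (NOT Spark's API) of the maxRows / maxRowsPerPartition defaults.
trait Op {
  def children: Seq[Op]
  // Undefined by default
  def maxRows: Option[Long] = None
  // Defaults to maxRows
  def maxRowsPerPartition: Option[Long] = maxRows
}

case class Scan(table: String) extends Op { def children: Seq[Op] = Nil }

// Hypothetical limit: at most n rows, and never more than the child can produce
case class Limit(n: Long, child: Op) extends Op {
  def children: Seq[Op] = Seq(child)
  override def maxRows: Option[Long] = child.maxRows match {
    case Some(c) => Some(math.min(c, n))
    case None    => Some(n)
  }
}

println(Scan("t").maxRows)                        // None
println(Limit(10, Scan("t")).maxRowsPerPartition) // Some(10)
println(Limit(10, Limit(5, Scan("t"))).maxRows)   // Some(5)
```

Because Limit does not override maxRowsPerPartition, it inherits the default and simply reports its maxRows.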

Executing Logical Plan

A common idiom in Spark SQL to make sure that a logical plan can be analyzed is to request a SparkSession for the SessionState that is in turn requested to "execute" the logical plan (which simply creates a QueryExecution).

scala> :type plan
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

val qe = sparkSession.sessionState.executePlan(plan)
qe.assertAnalyzed()
// the following gives the analyzed logical plan
// no exceptions are expected since analysis went fine
val analyzedPlan = qe.analyzed

Converting Logical Plan to Dataset

Another common idiom in Spark SQL for converting a LogicalPlan into a Dataset is the internal Dataset.ofRows method, which "executes" the logical plan and then creates a Dataset with the resulting QueryExecution and a RowEncoder.

childrenResolved

childrenResolved: Boolean

A logical operator is considered partially resolved when its child operators are resolved (aka children resolved).

resolved

resolved: Boolean

A logical operator is (fully) resolved to a specific schema when all its expressions are resolved and its children are resolved (childrenResolved).

scala> plan.resolved
res2: Boolean = true
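The two resolution checks can be sketched as follows. The types are a hypothetical simplification (not Spark's API), but the two definitions follow the rules stated above: childrenResolved requires every child operator to be resolved, and resolved additionally requires every expression of the operator itself to be resolved.

```scala
// Toy model (NOT Spark's API) of childrenResolved and resolved.
trait Expr { def resolved: Boolean }
case class UnresolvedCol(name: String) extends Expr { def resolved: Boolean = false }
case class ResolvedCol(name: String, dataType: String) extends Expr { def resolved: Boolean = true }

trait Plan {
  def children: Seq[Plan]
  def expressions: Seq[Expr]
  // Partially resolved: every child operator is resolved
  def childrenResolved: Boolean = children.forall(_.resolved)
  // Fully resolved: the operator's own expressions and all its children are resolved
  lazy val resolved: Boolean = expressions.forall(_.resolved) && childrenResolved
}

case class Table(name: String) extends Plan {
  def children: Seq[Plan] = Nil
  def expressions: Seq[Expr] = Nil
}
case class Project(columns: Seq[Expr], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def expressions: Seq[Expr] = columns
}

val unresolved = Project(Seq(UnresolvedCol("id")), Table("t"))
val analyzed   = Project(Seq(ResolvedCol("id", "bigint")), Table("t"))
println((unresolved.childrenResolved, unresolved.resolved)) // (true,false)
println(analyzed.resolved)                                   // true
```

The unresolved projection is partially resolved (its child Table is resolved) but not fully resolved, since its own column reference is not.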

Last update: 2020-11-16