LogicalPlan — Logical Relational Operators of Structured Query¶
LogicalPlan is eventually resolved (transformed) to a physical operator.
- BinaryNode: logical operators with two child logical operators
- UnaryNode: logical operators with a single child logical operator
Other Logical Operators¶
Cached plan statistics (as Statistics) of the LogicalPlan: computed and cached in stats, reset in invalidateStatsCache.
stats(conf: CatalystConf): Statistics
stats returns the cached plan statistics, computing and caching them first when they are not available yet.
stats is used when:
- QueryExecution is requested to build a complete text representation
- JoinSelection checks whether a plan can be broadcast (among other checks)
- CostBasedJoinReorder attempts to reorder inner joins
- LimitPushDown is executed (for FullOuter join)
- Statistics of the left and right sides of a join
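The caching behaviour described for stats (compute on first access, reuse afterwards, reset on invalidation) can be sketched in plain Scala. This is a simplified model under stated assumptions, not Spark's code: `ToyStatistics`, `ToyPlan`, `ToyRelation`, `computeStats` and the `computations` counter are hypothetical names, child operators are not modelled, and the configuration argument of the real stats is left out.

```scala
// Hypothetical stand-in for Spark's Statistics.
final case class ToyStatistics(sizeInBytes: BigInt)

// Toy operator that models only the stats-caching behaviour.
abstract class ToyPlan {
  // Counts how often statistics were actually computed (illustration only).
  var computations: Int = 0

  // Cached plan statistics; None until stats is requested for the first time.
  private var statsCache: Option[ToyStatistics] = None

  // Hypothetical hook: how a concrete operator estimates its statistics.
  protected def computeStats(): ToyStatistics

  // Returns the cached statistics, computing and caching them on a miss.
  final def stats: ToyStatistics = statsCache.getOrElse {
    computations += 1
    val computed = computeStats()
    statsCache = Some(computed)
    computed
  }

  // Drops the cached statistics so that the next stats call recomputes them.
  final def invalidateStatsCache(): Unit = {
    statsCache = None
  }
}

// Toy leaf operator with a fixed size estimate.
final class ToyRelation(bytes: BigInt) extends ToyPlan {
  protected def computeStats(): ToyStatistics = ToyStatistics(bytes)
}
```

Requesting stats twice on a `ToyRelation` computes the statistics once; after invalidateStatsCache, the next request computes them again.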
Refreshing Child Logical Operators¶
refresh calls itself recursively for every child logical operator.
refresh is overridden by LogicalRelation only (which refreshes the location of HadoopFsRelation relations only).
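The recursion can be illustrated with a small plain-Scala model. It is only a sketch: `ToyOperator`, `ToyProject`, `ToyFileRelation` and `refreshCount` are hypothetical names, and the overriding leaf only loosely mirrors the role that LogicalRelation plays.

```scala
// Toy operator tree that models only the recursive refresh behaviour.
abstract class ToyOperator {
  def children: Seq[ToyOperator]

  // Default behaviour: delegate to every child operator.
  def refresh(): Unit = children.foreach(_.refresh())
}

// Inner node with no refresh logic of its own.
final class ToyProject(child: ToyOperator) extends ToyOperator {
  val children: Seq[ToyOperator] = Seq(child)
}

// Leaf that overrides refresh (hypothetical stand-in for a file-based relation).
final class ToyFileRelation extends ToyOperator {
  val children: Seq[ToyOperator] = Seq.empty

  // Counts refresh requests that reached this leaf (illustration only).
  var refreshCount: Int = 0

  // Stand-in for refreshing the file location of the relation.
  override def refresh(): Unit = refreshCount += 1
}
```

Calling refresh on the root of a tree such as `new ToyProject(new ToyProject(leaf))` walks down through both inner nodes and reaches the leaf once.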
refresh is used when:
resolveQuoted(name: String, resolver: Resolver): Option[NamedExpression]
resolveQuoted is used when...FIXME
Resolving Column Attributes to References in Query Plan¶
    resolve(schema: StructType, resolver: Resolver): Seq[Attribute]
    resolve(nameParts: Seq[String], resolver: Resolver): Option[NamedExpression]
    resolve(nameParts: Seq[String], input: Seq[Attribute], resolver: Resolver): Option[NamedExpression]
resolve is used when...FIXME
Accessing Logical Query Plan of Structured Query¶
In order to get the logical plan of a structured query, use the QueryExecution of the Dataset (queryExecution.logical).
    scala> :type q
    org.apache.spark.sql.Dataset[Long]

    val plan = q.queryExecution.logical
    scala> :type plan
    org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    scala> :type spark
    org.apache.spark.sql.SparkSession

    // You could use Catalyst DSL to create a logical query plan
    scala> :type plan
    org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

    val qe = spark.sessionState.executePlan(plan)
    scala> :type qe
    org.apache.spark.sql.execution.QueryExecution
Maximum Number of Records¶
maxRows is undefined by default (None).
maxRows is used when LogicalPlan is requested for maxRowsPerPartition.
Maximum Number of Records per Partition¶
maxRowsPerPartition defaults to the maximum number of records (maxRows).
maxRowsPerPartition is used when LimitPushDown logical optimization is executed.
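How the two defaults relate can be shown with a minimal plain-Scala model. It is a sketch, not Spark's code: `ToyBounded`, `ToyScan` and `ToyLimit` are hypothetical names, and which real operators override these methods is not modelled.

```scala
// Toy operator that models only the two row-count bounds.
trait ToyBounded {
  // Undefined by default.
  def maxRows: Option[Long] = None

  // Falls back to maxRows unless an operator overrides it.
  def maxRowsPerPartition: Option[Long] = maxRows
}

// Uses both defaults, so neither bound is known.
final class ToyScan extends ToyBounded

// Hypothetical operator that knows an upper bound on its output.
final class ToyLimit(limit: Long) extends ToyBounded {
  override def maxRows: Option[Long] = Some(limit)
}
```

In this model an operator that only overrides maxRows gets the same bound for maxRowsPerPartition without further code.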
Executing Logical Plan¶
A common idiom in Spark SQL to make sure that a logical plan can be analyzed is to request a SparkSession for the SessionState, which is in turn requested to "execute" the logical plan (which simply creates a QueryExecution).
    scala> :type plan
    org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

    val qe = sparkSession.sessionState.executePlan(plan)
    qe.assertAnalyzed()
    // the following gives the analyzed logical plan
    // no exceptions are expected since analysis went fine
    val analyzedPlan = qe.analyzed
Converting Logical Plan to Dataset¶
Another common idiom in Spark SQL to convert a LogicalPlan into a Dataset is to use the internal Dataset.ofRows method, which "executes" the logical plan and then creates a Dataset with the QueryExecution and a RowEncoder.
A logical operator is considered partially resolved when its child operators are resolved (aka children resolved).
resolved is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
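The combination of a lazy value with a check over the child operators can be sketched in plain Scala. This is a simplified model: `ToyNode`, `ownExpressionsResolved` and the `evaluations` counter are hypothetical, and the sketch assumes that an operator is resolved when its own expressions and all of its children are resolved.

```scala
// Toy operator that models only the resolution flags.
final class ToyNode(
    ownExpressionsResolved: Boolean,
    val children: Seq[ToyNode] = Seq.empty) {

  // Counts how often the initializer of resolved ran (illustration only).
  var evaluations: Int = 0

  // True when every child operator is resolved.
  def childrenResolved: Boolean = children.forall(_.resolved)

  // Evaluated on first access only; later reads reuse the stored value.
  lazy val resolved: Boolean = {
    evaluations += 1
    ownExpressionsResolved && childrenResolved
  }
}
```

A parent whose own expressions are resolved but whose child is not reports `resolved == false`, and reading resolved again does not rerun the initializer.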
Metadata Output Attributes¶
metadataOutput requests the children for their metadata output attributes.
metadataOutput is used when:
- SubqueryAlias is requested for the metadataOutput
- LogicalPlan is requested for the childAttributes
- DataSourceV2Relation is requested to include metadata columns