TreeNode — Node in Catalyst Tree

TreeNode is an abstraction of named nodes in Catalyst with zero, one or more children.

Contract

children

children: Seq[BaseType]

Zero, one or more child nodes of the node

simpleStringWithNodeId

simpleStringWithNodeId(): String

One-line description of this node with the node identifier

Used when TreeNode is requested to generateTreeString

verboseString

verboseString(
  maxFields: Int): String

One-line verbose description

Used when TreeNode is requested to verboseStringWithSuffix and generateTreeString (with verbose flag enabled)

Implementations

Simple Node Description

simpleString: String

simpleString gives a simple one-line description of a TreeNode.

Internally, simpleString is the nodeName followed by the argString, separated by a single white space.

simpleString is used when TreeNode is requested for argString (of child nodes) and generateTreeString (with verbose flag off).
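A minimal sketch of how such a one-line description could be assembled from a node name and an argument string. SimpleStringNode is a hypothetical stand-in, not Spark's TreeNode, and the trim call (which drops the trailing space when there are no arguments) is an assumption of this sketch.

```scala
// Simplified stand-in for a tree node; not Spark's actual TreeNode.
final case class SimpleStringNode(nodeName: String, argString: String) {
  // Node name, a single space, then the arguments. trim removes the
  // trailing space for nodes without arguments (sketch assumption).
  def simpleString: String = s"$nodeName $argString".trim
}

object SimpleStringDemo {
  val project: String = SimpleStringNode("Project", "[id#0L]").simpleString
  val leaf: String = SimpleStringNode("LocalRelation", "").simpleString
}
```

SimpleStringDemo.project is "Project [id#0L]" and SimpleStringDemo.leaf is "LocalRelation".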

Numbered Text Representation

numberedTreeString: String

numberedTreeString adds line numbers to the text representation of all nodes in the tree (treeString).

numberedTreeString is used primarily for interactive debugging using the apply and p methods.
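The numbering step can be sketched on a plain multi-line string. The two-digit, zero-based prefix used here is an assumption of the sketch, and NumberedTreeStringSketch is a hypothetical name.

```scala
object NumberedTreeStringSketch {
  // Prefixes every line of a multi-line tree string with a
  // two-digit, zero-based line number.
  def numbered(treeString: String): String =
    treeString
      .split("\n")
      .zipWithIndex
      .map { case (line, i) => f"$i%02d $line" }
      .mkString("\n")

  val sample: String = "Project [id#0L]\n+- Range (0, 10, step=1, splits=8)"
  val result: String = numbered(sample)
}
```

Each number identifies one line, and so one node, which is what makes the numbered output convenient for looking up a node by position.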

Getting n-th TreeNode in Tree (for Interactive Debugging)

apply(
  number: Int): TreeNode[_]

apply gives the number-th node of a tree.

apply can be used for interactive debugging.

Internally, apply gets the node at the number position, or null if there is no such node.
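A minimal sketch of a position-based lookup, assuming nodes are numbered in pre-order (the node itself first, then each subtree from left to right) and ignoring inner children. NumberedNode and nodeAt are hypothetical names, not Spark's API.

```scala
// Simplified stand-in for a tree node; not Spark's actual TreeNode.
final case class NumberedNode(name: String, children: Seq[NumberedNode] = Nil) {
  // All nodes in pre-order: this node first, then each subtree left to right.
  def preOrder: Seq[NumberedNode] = this +: children.flatMap(_.preOrder)

  // Node at the given pre-order position, or null when out of range.
  def nodeAt(number: Int): NumberedNode = preOrder.lift(number).orNull
}

object NumberedNodeDemo {
  val tree: NumberedNode =
    NumberedNode("Project", Seq(NumberedNode("Filter", Seq(NumberedNode("Range")))))
  val second: String = tree.nodeAt(1).name
  val missing: NumberedNode = tree.nodeAt(10)
}
```

Returning null rather than an Option keeps the call short at an interactive prompt, at the cost of a possible NullPointerException when the position does not exist.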

Getting n-th BaseType in Tree (for Interactive Debugging)

p(
  number: Int): BaseType

p gives the number-th node of a tree as BaseType.

Note

p can be used for interactive debugging.

BaseType is the base type of a tree and in Spark SQL can be LogicalPlan (logical plan trees), SparkPlan (physical plan trees) or Expression (expression trees).

Text Representation

toString: String

toString simply returns the text representation of all nodes in the tree (treeString).

toString is part of Java's java.lang.Object and gives the string representation of an object, here a TreeNode.

Text Representation of All Nodes in Tree

// Turns `verbose` flag on
treeString: String
treeString(
  verbose: Boolean,
  addSuffix: Boolean = false,
  maxFields: Int = SQLConf.get.maxToStringFields,
  printOperatorId: Boolean = false): String
treeString(
  append: String => Unit,
  verbose: Boolean,
  addSuffix: Boolean,
  maxFields: Int,
  printOperatorId: Boolean): Unit

treeString gives the string representation of all the nodes in the TreeNode.

import org.apache.spark.sql.{functions => f}
val q = spark.range(10).withColumn("rand", f.rand())
val executedPlan = q.queryExecution.executedPlan

val output = executedPlan.treeString(verbose = true)

scala> println(output)
*(1) Project [id#0L, rand(6790207094253656854) AS rand#2]
+- *(1) Range (0, 10, step=1, splits=8)
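The indented layout of the output above can be approximated with a small recursive renderer. This is a sketch only: RenderNode is a hypothetical stand-in, the `+- ` connector is modelled on the sample output, and the `:- ` connector for non-last children is an assumption of the sketch.

```scala
// Simplified stand-in for a tree node; not Spark's actual TreeNode.
final case class RenderNode(label: String, children: Seq[RenderNode] = Nil) {
  // lastChildren holds one flag per ancestor level (ending with this node):
  // true when the node on that level is the last child of its parent.
  def render(lastChildren: Seq[Boolean] = Nil): String = {
    val indent =
      if (lastChildren.isEmpty) ""
      else lastChildren.init.map(last => if (last) "   " else ":  ").mkString +
        (if (lastChildren.last) "+- " else ":- ")
    val self = indent + label + "\n"
    val kids = children.zipWithIndex.map { case (child, i) =>
      child.render(lastChildren :+ (i == children.size - 1))
    }
    self + kids.mkString
  }
}

object RenderDemo {
  val tree: RenderNode =
    RenderNode("Project", Seq(RenderNode("Join", Seq(RenderNode("A"), RenderNode("B")))))
  val text: String = tree.render()
}
```

Threading the per-level flags through the recursion lets each line decide on its own whether to draw a continuation bar or blank space under every ancestor.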

Verbose Description with Suffix

verboseStringWithSuffix: String

verboseStringWithSuffix simply returns the verboseString.

verboseStringWithSuffix is used when TreeNode is requested to generateTreeString (with verbose and addSuffix flags enabled).

Generating Text Representation of Inner and Regular Child Nodes

generateTreeString(
  depth: Int,
  lastChildren: Seq[Boolean],
  append: String => Unit,
  verbose: Boolean,
  prefix: String = "",
  addSuffix: Boolean = false,
  maxFields: Int,
  printNodeId: Boolean): Unit

generateTreeString...FIXME

generateTreeString is used when:

Inner Child Nodes

innerChildren: Seq[TreeNode[_]]

innerChildren returns the inner nodes that should be shown as an inner nested tree of this node.

By default, innerChildren simply returns an empty collection of TreeNodes.

innerChildren is used when TreeNode is requested for generateTreeString, allChildren and getNodeNumbered.

allChildren

allChildren: Set[TreeNode[_]]

NOTE: allChildren is a Scala lazy value which is computed once when accessed and cached afterwards.
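The once-only evaluation of a Scala lazy value can be illustrated as follows. The names and the contents of the set are hypothetical and say nothing about what allChildren actually holds.

```scala
object LazyValSketch {
  // Counts how many times the initializer below runs.
  var evaluations: Int = 0

  // Evaluated on first access only; later reads reuse the cached value.
  lazy val names: Set[String] = {
    evaluations += 1
    Set("first", "second")
  }

  val firstRead: Set[String] = names
  val secondRead: Set[String] = names
}
```

After both reads, LazyValSketch.evaluations is 1 and both reads refer to the same Set instance.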

allChildren...FIXME

allChildren is used when...FIXME

foreach

foreach(f: BaseType => Unit): Unit

foreach applies the input function f to the node itself (this) first and then (recursively) to the children.
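A minimal sketch of this pre-order traversal on a hypothetical stand-in class (VisitNode is not Spark's TreeNode):

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-in for a tree node; not Spark's actual TreeNode.
final case class VisitNode(name: String, children: Seq[VisitNode] = Nil) {
  // Applies f to this node first, then recursively to every child.
  def foreach(f: VisitNode => Unit): Unit = {
    f(this)
    children.foreach(_.foreach(f))
  }
}

object VisitDemo {
  val tree: VisitNode =
    VisitNode("Project", Seq(VisitNode("Join", Seq(VisitNode("Left"), VisitNode("Right")))))

  // Records node names in the order they are visited.
  val visited: ArrayBuffer[String] = ArrayBuffer.empty[String]
  tree.foreach { node => visited += node.name; () }
}
```

The visit order is Project, Join, Left, Right: every parent is seen before its children.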

nodeName

nodeName: String

nodeName returns the simple name of the class with the Exec suffix removed (Exec being the naming convention for the class names of physical operators).

nodeName is used when TreeNode is requested for simpleString and asCode.
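The suffix removal can be sketched with a regular expression anchored at the end of the class name. ProjectExec and Filter below are hypothetical empty classes that exist only to have names to inspect.

```scala
object NodeNameSketch {
  // Hypothetical operator classes, used only for their class names.
  class ProjectExec
  class Filter

  // Removes a trailing "Exec" from a class name; other names are unchanged.
  def stripExec(simpleClassName: String): String =
    simpleClassName.replaceAll("Exec$", "")

  // Simple class name of the object with the trailing "Exec" removed.
  def nodeName(node: AnyRef): String = stripExec(node.getClass.getSimpleName)

  val physical: String = nodeName(new ProjectExec)
  val logical: String = nodeName(new Filter)
}
```

Anchoring the pattern with `$` means a name such as ExecutorInfo, where Exec is not a suffix, is left untouched.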

getTagValue

getTagValue[T](
  tag: TreeNodeTag[T]): Option[T]

getTagValue...FIXME

getTagValue is used when...FIXME

Scala Definition

abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
  self: BaseType =>
  // ...
}

TreeNode is a recursive data structure that can have zero, one or many children that are again TreeNodes.

Tip

Read up on <: type operator in Scala in Upper Type Bounds.
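The combination of an upper type bound and a self-type can be tried out on a much smaller class. The sketch below mirrors only the shape of the definition above. MiniTreeNode, MiniExpr, Lit and Add are hypothetical names, and collectAll is a helper invented for the example.

```scala
// Simplified stand-in mirroring the shape of the TreeNode definition.
abstract class MiniTreeNode[BaseType <: MiniTreeNode[BaseType]] {
  self: BaseType =>

  def children: Seq[BaseType]

  // The self-type lets this node be used where a BaseType is expected,
  // so the result is a Seq[BaseType] without any casts.
  def collectAll: Seq[BaseType] = self +: children.flatMap(_.collectAll)
}

// A hypothetical expression tree in which every node is a MiniExpr.
sealed abstract class MiniExpr extends MiniTreeNode[MiniExpr]

final case class Lit(value: Int) extends MiniExpr {
  def children: Seq[MiniExpr] = Nil
}

final case class Add(left: MiniExpr, right: MiniExpr) extends MiniExpr {
  def children: Seq[MiniExpr] = Seq(left, right)
}

object MiniExprDemo {
  val expr: MiniExpr = Add(Lit(1), Add(Lit(2), Lit(3)))
  val nodes: Seq[MiniExpr] = expr.collectAll
  val literals: Seq[Int] = nodes.collect { case Lit(v) => v }
}
```

The bound guarantees that children of a MiniExpr are MiniExprs again, so generic tree code written once in the base class returns the concrete tree type.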

Scala-specific, TreeNode is an abstract class that is the base class of the Catalyst Expression and QueryPlan abstract classes.

TreeNode therefore allows for building entire trees of TreeNodes, e.g. generic query plans with concrete logical and physical operators that both use Catalyst expressions (which are TreeNodes again).

NOTE: Spark SQL uses TreeNode for query plans and Catalyst expressions that can further be used together to build more advanced trees, e.g. Catalyst expressions can have query plans as subquery expressions.

TreeNode can itself be a node in a tree or a collection of nodes, i.e. itself and the children nodes. Not only does TreeNode come with the methods that you may have used in the Scala Collection API (https://docs.scala-lang.org/overviews/collections/overview.html), e.g. foreach, map, flatMap, collect, collectFirst, but also specialized ones for more advanced tree manipulation, e.g. foreachUp, mapChildren, transform, transformDown, transformUp, withNewChildren.
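As an illustration of what a specialized tree-manipulation method can look like, here is a sketch of a bottom-up rewrite driven by a partial function. RewriteExpr, Num, Plus and rewriteUp are hypothetical names for this example and are not Spark's implementation.

```scala
sealed trait RewriteExpr {
  // Rewrites the children first, then offers the rebuilt node to the rule.
  // Nodes the rule is not defined for are kept unchanged.
  def rewriteUp(rule: PartialFunction[RewriteExpr, RewriteExpr]): RewriteExpr = {
    val withNewChildren: RewriteExpr = this match {
      case Plus(l, r) => Plus(l.rewriteUp(rule), r.rewriteUp(rule))
      case leaf => leaf
    }
    rule.applyOrElse(withNewChildren, (e: RewriteExpr) => e)
  }
}

final case class Num(value: Int) extends RewriteExpr
final case class Plus(left: RewriteExpr, right: RewriteExpr) extends RewriteExpr

object RewriteDemo {
  // Constant folding: a Plus of two numbers becomes a single number.
  val folded: RewriteExpr =
    Plus(Num(1), Plus(Num(2), Num(3))).rewriteUp {
      case Plus(Num(a), Num(b)) => Num(a + b)
    }
}
```

Because children are rewritten before their parent, the inner Plus(Num(2), Num(3)) is folded to Num(5) first, which in turn lets the outer node match the rule and fold to Num(6).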

TreeNode abstract type is a fairly advanced Scala type definition (at least compared with the other Scala types in Spark), so understanding its behaviour even outside Spark might be worthwhile by itself.


Last update: 2020-08-29