
TreeNode — Node in Catalyst Tree

TreeNode is an abstraction of named nodes in Catalyst with zero, one or more children.



children: Seq[BaseType]

Zero, one or more child nodes of the node


simpleStringWithNodeId(): String

One-line description of this node with the node identifier

Used when TreeNode is requested to generateTreeString


verboseString(
  maxFields: Int): String

One-line verbose description

Used when TreeNode is requested to verboseStringWithSuffix and generateTreeString (with verbose flag enabled)


Simple Node Description

simpleString: String

simpleString gives a simple one-line description of a TreeNode.

Internally, simpleString is the nodeName followed by argString separated by a single white space.

simpleString is used when TreeNode is requested for argString (of child nodes) and treeString (with verbose flag off).
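The following is a minimal sketch of that composition, not Spark's implementation. `MiniNode` and its `args` field are hypothetical stand-ins, and it assumes the two parts are the node name and a comma-separated argument string.

```scala
object SimpleStringSketch {
  // Hypothetical stand-in for a Catalyst node; not Spark's TreeNode.
  final case class MiniNode(name: String, args: Seq[String]) {
    // Node name, then the arguments, separated by a single space.
    // trim drops the trailing space when there are no arguments.
    def simpleString: String = s"$name ${args.mkString(", ")}".trim
  }

  val range = MiniNode("Range", Seq("0", "10", "step=1"))
  val leaf = MiniNode("LocalRelation", Seq.empty)
}
```

With these definitions, `SimpleStringSketch.range.simpleString` gives a one-line description, and a node without arguments is described by its name alone.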

Numbered Text Representation

numberedTreeString: String

numberedTreeString adds numbers to the text representation of all the nodes (treeString).

numberedTreeString is used primarily for interactive debugging using the apply and p methods.
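As a rough illustration, numbering can be sketched as prefixing every line of a tree string with its zero-based position. The helper name `numbered` and the two-digit format are assumptions made for this sketch.

```scala
object NumberedTreeStringSketch {
  // Prefix every line of a multi-line tree string with a two-digit, zero-based index.
  def numbered(treeString: String): String =
    treeString
      .split("\n")
      .zipWithIndex
      .map { case (line, i) => f"$i%02d $line" }
      .mkString("\n")

  // A hand-written tree string, one node per line.
  val tree = "Project [id, rand]\n+- Range (0, 10, step=1)"
}
```

The number in front of a line is the position to pass to a lookup method when inspecting a single node.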

Getting n-th TreeNode in Tree (for Interactive Debugging)

apply(
  number: Int): TreeNode[_]

apply gives the number-th tree node in a tree.

apply can be used for interactive debugging.

Internally, apply gets the node at the number position or null.
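A minimal sketch of such a lookup, assuming nodes are numbered in pre-order (a node first, then its children left to right). `MiniNode` is hypothetical and, unlike Spark's TreeNode, has no inner children.

```scala
object NthNodeSketch {
  // Hypothetical minimal tree; not Spark's TreeNode.
  final case class MiniNode(name: String, children: Seq[MiniNode] = Nil) {
    // All nodes in pre-order: this node first, then each subtree left to right.
    def preOrder: Seq[MiniNode] = this +: children.flatMap(_.preOrder)

    // The node at the given pre-order position, or null when out of range.
    def apply(number: Int): MiniNode = preOrder.lift(number).orNull
  }

  val tree = MiniNode("Project", Seq(MiniNode("Filter", Seq(MiniNode("Range")))))
}
```

Here `NthNodeSketch.tree(0)` is the root, `NthNodeSketch.tree(2)` is the Range leaf, and an out-of-range position yields null.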

Getting n-th BaseType in Tree (for Interactive Debugging)

p(
  number: Int): BaseType

p gives the number-th tree node in a tree as BaseType.

p can be used for interactive debugging.

BaseType is the base type of a tree and in Spark SQL can be LogicalPlan (logical plan trees), SparkPlan (physical plan trees) or Expression (expression trees).

Text Representation

toString: String

toString simply returns the text representation of all the nodes in the tree (treeString).

toString is part of Java's java.lang.Object for the string representation of an object, e.g. TreeNode.

Text Representation of All Nodes in Tree

treeString: String // Turns the verbose flag on
treeString(
  verbose: Boolean,
  addSuffix: Boolean = false,
  maxFields: Int = SQLConf.get.maxToStringFields,
  printOperatorId: Boolean = false): String
treeString(
  append: String => Unit,
  verbose: Boolean,
  addSuffix: Boolean,
  maxFields: Int,
  printOperatorId: Boolean): Unit

treeString gives the string representation of all the nodes in the TreeNode.

import org.apache.spark.sql.{functions => f}
val q = spark.range(10).withColumn("rand", f.rand())
val executedPlan = q.queryExecution.executedPlan

val output = executedPlan.treeString(verbose = true)

scala> println(output)
*(1) Project [id#0L, rand(6790207094253656854) AS rand#2]
+- *(1) Range (0, 10, step=1, splits=8)
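
The indentation in such output can be sketched with a small recursive printer. This is a simplified model, not Spark's code: `MiniNode` is hypothetical, and the `:- ` and `+- ` prefixes are modeled after the output above, with `lastChildren` recording for each ancestor level whether that node was the last child.

```scala
object TreeStringSketch {
  final case class MiniNode(name: String, children: Seq[MiniNode] = Nil) {
    def generateTreeString(lastChildren: Seq[Boolean], append: String => Unit): Unit = {
      if (lastChildren.nonEmpty) {
        // One column per ancestor level: blank under a last child, a bar otherwise.
        lastChildren.init.foreach(isLast => append(if (isLast) "   " else ":  "))
        // Connector for this node itself.
        append(if (lastChildren.last) "+- " else ":- ")
      }
      append(name)
      append("\n")
      if (children.nonEmpty) {
        children.init.foreach(_.generateTreeString(lastChildren :+ false, append))
        children.last.generateTreeString(lastChildren :+ true, append)
      }
    }

    // Collects everything the recursive printer appends into one string.
    def treeString: String = {
      val sb = new StringBuilder
      generateTreeString(Nil, s => { sb.append(s); () })
      sb.toString
    }
  }

  val plan = MiniNode("Join", Seq(
    MiniNode("Project", Seq(MiniNode("Range"))),
    MiniNode("Range")))
}
```

Passing an `append` function instead of returning strings lets the caller decide where the text goes, for example a StringBuilder as in this sketch.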

Verbose Description with Suffix

verboseStringWithSuffix: String

verboseStringWithSuffix simply returns verboseString.

verboseStringWithSuffix is used when TreeNode is requested to generateTreeString (with verbose and addSuffix flags enabled).

Text Representation

generateTreeString(
  depth: Int,
  lastChildren: Seq[Boolean],
  append: String => Unit,
  verbose: Boolean,
  prefix: String = "",
  addSuffix: Boolean = false,
  maxFields: Int,
  printNodeId: Boolean): Unit


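generateTreeString is used when TreeNode is requested for the text representation of all nodes in the tree (treeString), among others.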

Inner Child Nodes

innerChildren: Seq[TreeNode[_]]

innerChildren returns the inner nodes that should be shown as an inner nested tree of this node.

innerChildren simply returns an empty collection of TreeNodes.

innerChildren is used when TreeNode is requested to generateTreeString, allChildren and getNodeNumbered.


allChildren: Set[TreeNode[_]]

NOTE: allChildren is a Scala lazy value which is computed once when accessed and cached afterwards.
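
The compute-once behaviour of a lazy value can be sketched as follows. `MiniNode` and the `computations` counter are hypothetical, and combining `children` with `innerChildren` into a set is an assumption made for this sketch.

```scala
object AllChildrenSketch {
  var computations = 0 // counts how often a lazy value body is evaluated

  final case class MiniNode(
      name: String,
      children: Seq[MiniNode] = Nil,
      innerChildren: Seq[MiniNode] = Nil) {
    // Evaluated on first access only; later accesses reuse the cached set.
    lazy val allChildren: Set[MiniNode] = {
      computations += 1
      (children ++ innerChildren).toSet
    }
  }

  val node = MiniNode("Filter", Seq(MiniNode("Range")), Seq(MiniNode("Subquery")))
}
```

Accessing `AllChildrenSketch.node.allChildren` twice increments the counter only once.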


allChildren is used when...FIXME


foreach(f: BaseType => Unit): Unit

foreach applies the input function f to itself (this) first and then (recursively) to the children.
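
A minimal sketch of this pre-order traversal, using a hypothetical `MiniNode` rather than Spark's TreeNode. The helper `visitOrder` exists only to record the order in which nodes are visited.

```scala
object ForeachSketch {
  final case class MiniNode(name: String, children: Seq[MiniNode] = Nil) {
    // Apply f to this node first, then recursively to each child (pre-order).
    def foreach(f: MiniNode => Unit): Unit = {
      f(this)
      children.foreach(_.foreach(f))
    }
  }

  val tree = MiniNode("Project", Seq(
    MiniNode("Filter", Seq(MiniNode("Range"))),
    MiniNode("Scan")))

  // Records node names in the order foreach visits them.
  def visitOrder(root: MiniNode): Seq[String] = {
    val names = scala.collection.mutable.ArrayBuffer.empty[String]
    root.foreach(n => { names += n.name; () })
    names.toSeq
  }
}
```

A parent is always visited before any of its descendants, and siblings are visited left to right.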

Node Name

nodeName: String

nodeName returns the name of the class with the Exec suffix removed (Exec is the naming convention for the class names of physical operators).
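
The suffix removal can be sketched on plain strings. The helper below is hypothetical and takes the class name as an argument, assuming the real method starts from the simple class name of the node.

```scala
object NodeNameSketch {
  // Remove a trailing "Exec" only; occurrences elsewhere in the name are kept.
  def nodeName(simpleClassName: String): String =
    simpleClassName.replaceAll("Exec$", "")
}
```

For example, `ProjectExec` becomes `Project`, while `ExecutedCommandExec` becomes `ExecutedCommand` because only the suffix is dropped.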

nodeName is used when TreeNode is requested for simpleString, among others.


getTagValue[T](
  tag: TreeNodeTag[T]): Option[T]


getTagValue is used when...FIXME

Scala Definition

abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
  self: BaseType =>
  // ...
}

TreeNode is a recursive data structure that can have zero, one or more children that are again TreeNodes.


Read up on the <: type operator in Scala in Upper Type Bounds.
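
To see what the upper type bound and the self-type buy, here is a simplified sketch of the same pattern. `MiniTreeNode`, `Expr`, `Lit`, `Add` and `collectLeaves` are hypothetical names and not part of Spark.

```scala
object FBoundedSketch {
  // Simplified mirror of the declaration above; not Spark's class.
  abstract class MiniTreeNode[BaseType <: MiniTreeNode[BaseType]] {
    self: BaseType => // every node is also a BaseType

    def children: Seq[BaseType]

    // Because children are BaseType, results keep the concrete tree type.
    def collectLeaves: Seq[BaseType] =
      if (children.isEmpty) Seq(self) else children.flatMap(_.collectLeaves)
  }

  // A hypothetical expression tree whose BaseType is Expr.
  sealed abstract class Expr extends MiniTreeNode[Expr]
  final case class Lit(value: Int) extends Expr {
    def children: Seq[Expr] = Nil
  }
  final case class Add(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }

  val leaves: Seq[Expr] = Add(Lit(1), Add(Lit(2), Lit(3))).collectLeaves
}
```

The generic method returns `Seq[Expr]` rather than a sequence of the generic node type, so callers can keep using Expr-specific members without casts.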

Scala-specific, TreeNode is an abstract class that is the base class of Catalyst Expression and QueryPlan abstract classes.

TreeNode therefore allows for building entire trees of TreeNodes, e.g. generic query plans with concrete logical and physical operators that both use Catalyst expressions (which are TreeNodes again).

NOTE: Spark SQL uses TreeNode for query plans and Catalyst expressions that can further be used together to build more advanced trees, e.g. Catalyst expressions can have query plans as subquery expressions.

TreeNode can itself be a node in a tree or a collection of nodes, i.e. itself and the children nodes. Not only does TreeNode come with the methods that you may have used in the Scala Collection API (e.g. map, flatMap, collect, collectFirst, foreach), but also specialized ones for more advanced tree manipulation, e.g. mapChildren, transform, transformDown, transformUp, foreachUp, numberedTreeString, p, asCode, prettyJson.
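
The flavour of the collection-like and tree-manipulation methods can be sketched on a tiny expression tree. This is a simplified model with hypothetical types (`Expr`, `Lit`, `Add`); the method names `collect` and `transformUp` mirror real TreeNode methods, but the bodies are assumptions made for illustration.

```scala
object TransformSketch {
  sealed trait Expr {
    def children: Seq[Expr]
    def withNewChildren(newChildren: Seq[Expr]): Expr

    // Rewrite children first, then this node (bottom-up).
    def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
      val afterChildren =
        if (children.isEmpty) this
        else withNewChildren(children.map(_.transformUp(rule)))
      // Nodes the rule is not defined for are left unchanged.
      rule.applyOrElse(afterChildren, (e: Expr) => e)
    }

    // Results of pf for every node it is defined at, in pre-order.
    def collect[B](pf: PartialFunction[Expr, B]): Seq[B] =
      pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))
  }

  final case class Lit(value: Int) extends Expr {
    def children: Seq[Expr] = Nil
    def withNewChildren(newChildren: Seq[Expr]): Expr = this
  }
  final case class Add(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
    def withNewChildren(newChildren: Seq[Expr]): Expr =
      Add(newChildren(0), newChildren(1))
  }

  val expr: Expr = Add(Lit(1), Add(Lit(2), Lit(3)))

  // Constant folding: replace an Add of two literals with a single literal.
  val folded: Expr = expr.transformUp { case Add(Lit(a), Lit(b)) => Lit(a + b) }

  // Values of all literals in the tree.
  val literals: Seq[Int] = expr.collect { case Lit(v) => v }
}
```

Because the rule is applied bottom-up, the inner addition is folded first, which lets the outer addition match the same rule afterwards.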

The TreeNode abstract type is a fairly advanced Scala type definition (at least compared to the other Scala types in Spark), so understanding its behaviour even outside Spark might be worthwhile by itself.

Last update: 2021-05-08