Catalyst Tree Manipulation Framework

Catalyst is an execution-agnostic framework to represent and manipulate a dataflow graph as trees of relational operators and expressions.
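To make the idea of "manipulating trees" concrete, the following is a minimal sketch in Python rather than Catalyst's own Scala code. The names `Expr`, `Literal`, `Attribute`, `Add`, `transform_up` and `fold_constants` are hypothetical and chosen for illustration only. They do not reproduce Catalyst's classes or signatures. The sketch assumes immutable nodes and a bottom-up rewrite in which a rule is a plain function from a node to a (possibly new) node.

```python
from dataclasses import dataclass
from typing import Callable


class Expr:
    """Base class for nodes of a toy expression tree (hypothetical, not Catalyst's API)."""

    def children(self) -> tuple:
        return ()

    def with_children(self, new_children: tuple) -> "Expr":
        return self

    def transform_up(self, rule: Callable[["Expr"], "Expr"]) -> "Expr":
        # Rewrite the children first, then offer the rebuilt node to the rule.
        new_children = tuple(c.transform_up(rule) for c in self.children())
        node = self.with_children(new_children) if new_children else self
        return rule(node)


@dataclass(frozen=True)
class Literal(Expr):
    value: int


@dataclass(frozen=True)
class Attribute(Expr):
    name: str


@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr

    def children(self) -> tuple:
        return (self.left, self.right)

    def with_children(self, new_children: tuple) -> "Expr":
        return Add(*new_children)


def fold_constants(node: Expr) -> Expr:
    # Rule: collapse Add(Literal, Literal) into one Literal; leave other nodes alone.
    if (
        isinstance(node, Add)
        and isinstance(node.left, Literal)
        and isinstance(node.right, Literal)
    ):
        return Literal(node.left.value + node.right.value)
    return node


# The expression x + (1 + 2), built as a tree.
tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
optimized = tree.transform_up(fold_constants)
print(optimized)
```

Nothing in the sketch says how the expression is eventually evaluated. The tree and its rewrite rules are defined independently of any execution engine, which is the sense in which such a framework can be called execution-agnostic.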


The Catalyst framework was introduced in [SPARK-1251] Support for optimizing and executing structured queries and became part of Apache Spark on 20 March 2014.

Spark 2.0 uses the Catalyst tree manipulation framework to build an extensible query plan optimizer with a number of query optimizations.

Catalyst supports both rule-based and cost-based optimizations.
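The difference between the two kinds of optimization can be sketched with a toy plan model in Python. Everything here is a labeled assumption: the `Relation`, `Filter` and `Join` classes, the `rows` statistic, and the cost function are hypothetical and far simpler than anything in Catalyst. A rule-based rewrite fires whenever its pattern matches, while a cost-based choice compares alternative plans using estimated statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    name: str
    rows: int  # hypothetical row-count statistic


@dataclass(frozen=True)
class Filter:
    condition: str
    child: object


@dataclass(frozen=True)
class Join:
    left: object
    right: object


def prune_trivial_filter(plan):
    # Rule-based: applies whenever the pattern matches; no statistics needed.
    if isinstance(plan, Filter) and plan.condition == "true":
        return plan.child
    return plan


def estimated_rows(plan) -> int:
    # Toy estimate: a filter is assumed to keep every row of its child.
    if isinstance(plan, Relation):
        return plan.rows
    if isinstance(plan, Filter):
        return estimated_rows(plan.child)
    raise TypeError(f"no estimate for {plan!r}")


def build_side_cost(join: Join) -> int:
    # Toy cost model (an assumption): a join costs as much as its right input is large.
    return estimated_rows(join.right)


def choose_join_order(join: Join) -> Join:
    # Cost-based: enumerate alternatives and keep the cheapest one.
    candidates = [join, Join(join.right, join.left)]
    return min(candidates, key=build_side_cost)


orders = Relation("orders", rows=1_000_000)
countries = Relation("countries", rows=200)

plan = Join(countries, Filter("true", orders))
simplified = Join(plan.left, prune_trivial_filter(plan.right))  # rule-based step
best = choose_join_order(simplified)                            # cost-based step
print(best)
```

In this sketch the rule always produces the same rewrite for the same input tree, whereas the join order depends on the row counts supplied with the relations.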

Last update: 2020-07-20