
Catalyst Tree Manipulation Framework

Catalyst is an execution-agnostic framework to represent and manipulate a dataflow graph as trees of relational operators and expressions.
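To make the idea of tree manipulation concrete, here is a minimal sketch of an expression tree with one rewrite rule (constant folding) applied bottom-up. It is a toy model, not Catalyst code: the names `Literal`, `Attribute`, `Add`, `transform_up`, and `fold_constants` are all hypothetical and only loosely mirror the kind of node classes and transform methods a tree framework provides.

```python
from dataclasses import dataclass
from typing import Callable, Union


# Toy expression nodes (hypothetical names, not Catalyst classes).
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Attribute, Add]


def transform_up(node: Expr, rule: Callable[[Expr], Expr]) -> Expr:
    """Rewrite the children first, then apply the rule to the rebuilt node."""
    if isinstance(node, Add):
        node = Add(transform_up(node.left, rule), transform_up(node.right, rule))
    return rule(node)


def fold_constants(node: Expr) -> Expr:
    """Rule: replace the addition of two literals with a single literal."""
    if (
        isinstance(node, Add)
        and isinstance(node.left, Literal)
        and isinstance(node.right, Literal)
    ):
        return Literal(node.left.value + node.right.value)
    return node  # Leave every other node unchanged.


# x + (1 + 2) is rewritten to x + 3.
tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
optimized = transform_up(tree, fold_constants)
print(optimized)
```

The nodes are immutable, so a rule never edits a tree in place; it returns a new node, and the traversal rebuilds the parents around it. That keeps every rule a plain function from tree to tree, which is what makes rules easy to compose.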

Note

The Catalyst framework was introduced in [SPARK-1251] Support for optimizing and executing structured queries and became part of Apache Spark on 20 March 2014.

Spark 2.0 uses the Catalyst tree manipulation framework to build an extensible query plan optimizer that comes with a number of query optimizations.

Catalyst supports both rule-based and cost-based optimizations.
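The difference between the two styles can be sketched briefly. A rule-based optimization rewrites a plan whenever a pattern matches, regardless of the data. A cost-based optimization compares alternative plans using statistics and keeps the one estimated to be cheapest. The example below illustrates only the cost-based half, assuming a deliberately simple cost model; the table names, row counts, and `estimated_cost` function are hypothetical and do not reflect how Catalyst estimates cost.

```python
from itertools import permutations

# Hypothetical table statistics (row counts), assumed for illustration.
ROW_COUNTS = {"orders": 1_000_000, "customers": 10_000, "countries": 200}


def estimated_cost(join_order):
    """Toy cost model for a left-deep join plan.

    Each join reads both of its inputs, and its output is assumed to be
    no larger than the smaller input.
    """
    first, *rest = join_order
    rows = ROW_COUNTS[first]
    cost = 0
    for table in rest:
        cost += rows + ROW_COUNTS[table]     # Rows read by this join.
        rows = min(rows, ROW_COUNTS[table])  # Assumed output size.
    return cost


# Enumerate every join order and keep the cheapest one.
candidates = list(permutations(ROW_COUNTS))
best = min(candidates, key=estimated_cost)
print(best, estimated_cost(best))
```

Under this model the cheapest plans join the smallest table early, because that shrinks the intermediate result feeding the next join. Enumerating every order is only practical for a handful of tables; it is used here to keep the sketch short.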


Last update: 2020-07-20