Skip to content

Catalyst Expressions

Expression is an extension of the TreeNode abstraction for executable expressions (in the Catalyst Tree Manipulation Framework).

Expression is an executable node that can be evaluated and produce a JVM object (for an InternalRow) in the faster code-generated or the slower interpreted modes.

Contract

DataType

dataType: DataType

The DataType of the result of evaluating this expression

ExprCode

doGenCode(
  ctx: CodegenContext,
  ev: ExprCode): ExprCode

Code-generated expression evaluation that generates a Java source code (that is used to evaluate the expression in a more optimized way and skipping eval).

Used when:

Interpreted Expression Evaluation

eval(
  input: InternalRow = null): Any

Interpreted (non-code-generated) expression evaluation that evaluates this expression to a JVM object for a given InternalRow (and skipping generating a corresponding Java code)

eval is a slower "relative" of the code-generated (non-interpreted) expression evaluation

nullable

nullable: Boolean

Implementations

BinaryExpression

LeafExpression

TernaryExpression

Other Expressions

Code-Generated (Non-Interpreted) Expression Evaluation

genCode(
  ctx: CodegenContext): ExprCode

genCode generates a Java source code for code-generated (non-interpreted) expression evaluation (on an input InternalRow.

Similar to doGenCode but supports expression reuse using Subexpression Elimination.

genCode is a faster "relative" of the interpreted (non-code-generated) expression evaluation.

reduceCodeSize

reduceCodeSize(
  ctx: CodegenContext,
  eval: ExprCode): Unit

reduceCodeSize does its work only when all of the following are met:

  1. Length of the generated code is above spark.sql.codegen.methodSplitThreshold

  2. INPUT_ROW (of the input CodegenContext) is defined

  3. currentVars (of the input CodegenContext) is not defined

This needs your help

FIXME When would the above not be met? What's so special about such an expression?

reduceCodeSize sets the value of the input ExprCode to the fresh term name for the value name.

In the end, reduceCodeSize sets the code of the input ExprCode to the following:

[javaType] [newValue] = [funcFullName]([INPUT_ROW]);

The funcFullName is the fresh term name for the name of the current expression node.

deterministic Flag

Expression is deterministic when evaluates to the same result for the same input(s). An expression is deterministic if all the child expressions are.

Note

A deterministic expression is like a pure function in functional programming languages.

val e = $"a".expr

import org.apache.spark.sql.catalyst.expressions.Expression
assert(e.isInstanceOf[Expression])
assert(e.deterministic)

Demo

// evaluating an expression
// Use Literal expression to create an expression from a Scala object
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
val e: Expression = Literal("hello")

import org.apache.spark.sql.catalyst.expressions.EmptyRow
val v: Any = e.eval(EmptyRow)

// Convert to Scala's String
import org.apache.spark.unsafe.types.UTF8String
val s = v.asInstanceOf[UTF8String].toString
assert(s == "hello")
Back to top