SparkSqlParser — Default SQL Parser

SparkSqlParser is a SQL parser that extracts Catalyst expressions, logical plans, and table identifiers from SQL text, using SparkSqlAstBuilder (as the AstBuilder).

SparkSqlParser is the default SQL parser in a SparkSession.

SparkSqlParser supports variable substitution.
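Variable substitution rewrites `${name}` references in the SQL text before the text is parsed. The following is a hedged, self-contained sketch of that idea; `VarSubSketch` is an illustrative name and this is not Spark's actual VariableSubstitution implementation (which also resolves `hiveconf:`, `system:`, and `env:` prefixes from the session configuration):

```scala
import scala.util.matching.Regex

// Sketch of variable substitution: replace ${name} references in a SQL text
// with values from a variable map, leaving unknown references untouched.
// Illustrative only; not Spark's actual VariableSubstitution.
object VarSubSketch {
  private val VarRef: Regex = """\$\{([^}]+)\}""".r

  def substitute(sqlText: String, vars: Map[String, String]): String =
    VarRef.replaceAllIn(sqlText, m =>
      Regex.quoteReplacement(vars.getOrElse(m.group(1), m.matched)))
}
```

For example, `VarSubSketch.substitute("SELECT '${greeting}'", Map("greeting" -> "hello"))` produces `SELECT 'hello'`, which is then handed to the parser as ordinary SQL.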

SparkSqlParser is used to parse table strings into their corresponding table identifiers.
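Conceptually, parsing a table string splits an optionally-qualified name into a table part and a database part. The following is a toy sketch of that idea only; `TableId` and `parseTableId` are illustrative names, and Spark's actual parser additionally handles backquoted identifiers, multi-part catalog names, and more:

```scala
// Toy sketch: turn a table string like "db1.t1" into a table identifier.
// Illustrative only; not Spark's actual parseTableIdentifier implementation.
case class TableId(table: String, database: Option[String])

def parseTableId(s: String): TableId =
  s.split('.') match {
    case Array(db, t) => TableId(t, Some(db)) // qualified: database.table
    case Array(t)     => TableId(t, None)     // unqualified: table only
    case _            => sys.error(s"Cannot parse table identifier: $s")
  }
```

For example, `parseTableId("db1.t1")` yields `TableId("t1", Some("db1"))`, while `parseTableId("t1")` yields `TableId("t1", None)`.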

Creating Instance

SparkSqlParser takes the following to be created:

SparkSqlParser is created when:

  • BaseSessionStateBuilder is requested for a SQL parser

  • expr standard function is used

SparkSqlAstBuilder

SparkSqlParser uses SparkSqlAstBuilder (as AstBuilder).

Accessing SparkSqlParser

SparkSqlParser is available as SessionState.sqlParser.

import org.apache.spark.sql.SparkSession
assert(spark.isInstanceOf[SparkSession])

import org.apache.spark.sql.catalyst.parser.ParserInterface
val p = spark.sessionState.sqlParser
assert(p.isInstanceOf[ParserInterface])

import org.apache.spark.sql.execution.SparkSqlParser
assert(spark.sessionState.sqlParser.isInstanceOf[SparkSqlParser])

Translating SQL Statements to Logical Operators

SparkSqlParser is used in SparkSession.sql to translate a SQL text to a logical operator.
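The delegation can be sketched with a minimal plain-Scala model; `ToySession`, `ToySqlParser`, and `ParsedStatement` are illustrative names standing in for Spark's internals, where the session's ParserInterface produces a LogicalPlan from the SQL text:

```scala
// Minimal sketch of how a session hands a SQL text to its parser to get a
// logical operator. Illustrative only; not Spark's actual classes.
trait LogicalPlan
case class ParsedStatement(sqlText: String) extends LogicalPlan

trait ParserInterface {
  def parsePlan(sqlText: String): LogicalPlan
}

class ToySqlParser extends ParserInterface {
  // A real parser would build an AST (via SparkSqlAstBuilder) and convert it
  // to a logical plan; here we simply wrap the trimmed text.
  def parsePlan(sqlText: String): LogicalPlan = ParsedStatement(sqlText.trim)
}

class ToySession(parser: ParserInterface) {
  // Mirrors the shape of SparkSession.sql: parse first, execute lazily later.
  def sql(sqlText: String): LogicalPlan = parser.parsePlan(sqlText)
}
```

The point of the shape is that SparkSession.sql does not parse SQL itself; it asks the parser configured in its SessionState, which is what makes the parser pluggable per session.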

Translating SQL Statements to Column API

SparkSqlParser is used to translate an expression to the corresponding Column (e.g. in the expr standard function):

scala> expr("token = 'hello'")
16/07/07 18:32:53 INFO SparkSqlParser: Parsing command: token = 'hello'
res0: org.apache.spark.sql.Column = (token = hello)

Variable Substitution

SparkSqlParser creates a VariableSubstitution when created.

Logging

Enable ALL logging level for org.apache.spark.sql.execution.SparkSqlParser logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.SparkSqlParser=ALL

Refer to Logging.


Last update: 2021-02-18