SparkSqlParser: Default SQL Parser
SparkSqlParser is a SQL parser that extracts Catalyst expressions, logical plans and table identifiers from SQL text using SparkSqlAstBuilder (as its AstBuilder).
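A minimal sketch of the first two kinds of parsing, written for spark-shell or any Scala program with Spark on the classpath (an assumption). It asks the session's parser for an unresolved logical plan and an unresolved expression. The query text and the column name id are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}

// Reuses the active session in spark-shell, or starts a local one otherwise
val spark = SparkSession.builder.master("local[*]").appName("parser-demo").getOrCreate()
val parser = spark.sessionState.sqlParser

// SQL query text -> logical plan (not yet analyzed, so column references are unresolved)
val plan: LogicalPlan = parser.parsePlan("SELECT id * 2 AS doubled FROM range(3)")

// SQL expression text -> Catalyst expression (id is still an unresolved attribute)
val e: Expression = parser.parseExpression("id * 2")

println(plan.treeString)
println(e)
```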
SparkSqlParser is the initial SQL parser in a SparkSession.
SparkSqlParser supports variable substitution.
SparkSqlParser is used to parse table strings into their corresponding table identifiers in the following:
- table methods of DataFrameReader and SparkSession
- insertInto and saveAsTable methods of DataFrameWriter
- createExternalTable and refreshTable methods of Catalog (and SessionState)
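The string-to-identifier conversion can be tried directly with the parser's parseTableIdentifier method. This sketch assumes a Spark version whose ParserInterface offers parseTableIdentifier; newer code paths may call other parser methods (such as parseMultipartIdentifier) instead, so which method each API above uses varies by version. The names mydb, mytable and my.table are hypothetical, and nothing is created in the catalog.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

val spark = SparkSession.builder.master("local[*]").getOrCreate()
val parser = spark.sessionState.sqlParser

// "db.table" text -> identifier with both the table and the database part
val qualified: TableIdentifier = parser.parseTableIdentifier("mydb.mytable")

// An unqualified name leaves the database empty (resolved later against the current database)
val unqualified: TableIdentifier = parser.parseTableIdentifier("mytable")

// Backticks make the dot part of the table name rather than a separator
val quoted: TableIdentifier = parser.parseTableIdentifier("`my.table`")

println(Seq(qualified, unqualified, quoted).mkString("\n"))
```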
Creating Instance
SparkSqlParser takes the following to be created:
SparkSqlParser is created when:
- BaseSessionStateBuilder is requested for a SQL parser
- the expr standard function is used
SparkSqlAstBuilder
SparkSqlParser uses SparkSqlAstBuilder (as AstBuilder).
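To see the AstBuilder at work, the sketch below casts the session's parser to SparkSqlParser and parses a SET statement, which SparkSqlAstBuilder turns into a SetCommand. It assumes astBuilder is a public member of SparkSqlParser and that no session extension (spark.sql.extensions) has replaced the default parser; otherwise the cast fails. The statement is only parsed, not executed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.{SparkSqlAstBuilder, SparkSqlParser}
import org.apache.spark.sql.execution.command.SetCommand

val spark = SparkSession.builder.master("local[*]").getOrCreate()

// Assumption: the default parser is in place (no parser injected by extensions)
val parser = spark.sessionState.sqlParser.asInstanceOf[SparkSqlParser]

// The AST builder that turns ANTLR parse trees into Catalyst plans and expressions
val builder = parser.astBuilder

// SET is one of the Spark-specific statements handled by SparkSqlAstBuilder
val setPlan = parser.parsePlan("SET spark.sql.shuffle.partitions=4")
println(setPlan)
```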
Accessing SparkSqlParser
SparkSqlParser is available as SessionState.sqlParser (unless...FIXME(note)).
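// spark is the SparkSession that spark-shell creates
import org.apache.spark.sql.SparkSession
assert(spark.isInstanceOf[SparkSession])

// sqlParser is declared as the generic ParserInterface...
import org.apache.spark.sql.catalyst.parser.ParserInterface
val p = spark.sessionState.sqlParser
assert(p.isInstanceOf[ParserInterface])

// ...but by default (no parser injected by session extensions) it is a SparkSqlParser
import org.apache.spark.sql.execution.SparkSqlParser
assert(spark.sessionState.sqlParser.isInstanceOf[SparkSqlParser])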
Translating SQL Statements to Logical Operators
SparkSession.sql uses SparkSqlParser to translate SQL text into a logical operator.
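A small sketch of that translation, assuming a local SparkSession: the Dataset returned by spark.sql keeps the parsed, still-unresolved logical operator as queryExecution.logical, and analysis resolves it afterwards. The query itself is arbitrary.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.Project

val spark = SparkSession.builder.master("local[*]").getOrCreate()

// spark.sql parses the text into a logical operator and wraps it in a Dataset
val df = spark.sql("SELECT id FROM range(3) WHERE id > 0")

// The operator produced by the parser (column references not resolved yet)
val parsed = df.queryExecution.logical

// The same plan after the analyzer has resolved it
val analyzed = df.queryExecution.analyzed

println(parsed.treeString)
println(analyzed.treeString)
```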
Translating SQL Statements to Column API
SparkSqlParser is used to translate SQL expression text into the corresponding Column in the following:
- expr standard function
- Dataset operators: selectExpr, filter, where
scala> expr("token = 'hello'")
16/07/07 18:32:53 INFO SparkSqlParser: Parsing command: token = 'hello'
res0: org.apache.spark.sql.Column = (token = hello)
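The same parsing happens behind the string-based Dataset operators listed above. A sketch on a made-up five-row Dataset (spark.range(5)), assuming a local SparkSession; each string argument is parsed into a Catalyst expression and used as a Column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder.master("local[*]").getOrCreate()
val ds = spark.range(5) // ids 0, 1, 2, 3, 4

// selectExpr: each string becomes a (possibly aliased) column expression
val doubled = ds.selectExpr("id * 2 AS doubled")

// filter and where accept a SQL condition as a string
val big = ds.filter("id > 1")
val even = ds.where("id % 2 = 0")

// expr parses a standalone expression into a Column usable anywhere
val plusOne = ds.select(expr("id + 1").as("next"))

plusOne.show()
```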
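Variable Substitution
SparkSqlParser creates a VariableSubstitution when created and uses it to substitute variable references (e.g. ${var}) in every SQL text before parsing it, unless spark.sql.variable.substitute is disabled.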
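Variable substitution can be observed end to end in a sketch like the one below. It assumes the default spark.sql.variable.substitute=true and uses a hypothetical variable minId, set as a session configuration entry. Because ${minId} is replaced before the text is parsed, parseExpression sees the substitution as well.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()

// Hypothetical session-level variable (any key not reserved by Spark)
spark.conf.set("minId", "2")

// A plain Scala string (no s-interpolator), so ${minId} reaches Spark untouched
// and is replaced with "2" before parsing
val n = spark.sql("SELECT * FROM range(5) WHERE id >= ${minId}").count()

// Substitution happens inside the parser, so direct parser calls see it too
val e = spark.sessionState.sqlParser.parseExpression("${minId} + 1")

println(s"rows: $n, expression: $e")
```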
Logging
Enable ALL logging level for the org.apache.spark.sql.execution.SparkSqlParser logger to see what happens inside.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.sql.execution.SparkSqlParser=ALL
Refer to Logging.
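Spark 3.3 and later use Log4j 2 and read conf/log4j2.properties rather than conf/log4j.properties. As an alternative to editing a file, the level can be raised at runtime, for example from spark-shell. A minimal sketch, assuming log4j-core 2.x is on the classpath (as it is in those Spark distributions):

```scala
import org.apache.logging.log4j.{Level, LogManager}
import org.apache.logging.log4j.core.config.Configurator

val loggerName = "org.apache.spark.sql.execution.SparkSqlParser"

// Raise only this logger to ALL; other loggers keep their configured levels
Configurator.setLevel(loggerName, Level.ALL)

println(LogManager.getLogger(loggerName).getLevel)
```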