
CatalystSqlParser — Parser for DataTypes and StructTypes

CatalystSqlParser is an AbstractSqlParser that uses AstBuilder for parsing SQL statements.

import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.internal.SQLConf
val catalystSqlParser = new CatalystSqlParser(SQLConf.get)
scala> :type catalystSqlParser.astBuilder
org.apache.spark.sql.catalyst.parser.AstBuilder

CatalystSqlParser is used to translate DataTypes and StructTypes from their canonical string representation (e.g. when adding fields to a schema or casting a column to a different data type).

import org.apache.spark.sql.types.StructType
scala> val struct = new StructType().add("a", "int")
struct: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))

scala> val asInt = expr("token = 'hello'").cast("int")
asInt: org.apache.spark.sql.Column = CAST((token = hello) AS INT)
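
The same translation can also be invoked directly on the parser. The following is a minimal sketch, assuming the CatalystSqlParser companion object and its parseDataType and parseTableSchema methods are available in your Spark version (the value names are illustrative only):

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types._

// Parse a single data type from its string representation
val intType: DataType = CatalystSqlParser.parseDataType("int")

// Nested types use the same string syntax
val arrayType: DataType = CatalystSqlParser.parseDataType("array<string>")

// Parse a comma-separated list of column definitions into a StructType
val schema: StructType = CatalystSqlParser.parseTableSchema("a INT, b STRING")
```

This is the kind of call that sits behind StructType.add with a type name and Column.cast with a type name, which is why both accept plain strings such as "int".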

When parsing, you should see INFO messages in the logs:

Parsing command: int

It is also used in HiveClientImpl (when converting columns from Hive to Spark) and in OrcFileOperator (when inferring the schema for ORC files).

Creating Instance

CatalystSqlParser takes a SQLConf to be created (as in the new CatalystSqlParser(SQLConf.get) example above).

CatalystSqlParser is created when:

Accessing CatalystSqlParser

// FIXME:

Logging

Enable ALL logging level for org.apache.spark.sql.catalyst.parser.CatalystSqlParser logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.catalyst.parser.CatalystSqlParser=ALL

Refer to Logging.
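
For an interactive session, the logger level can also be set programmatically. The following is a minimal sketch, assuming the log4j 1.x API (org.apache.log4j) is on the classpath, as it is in Spark versions configured through conf/log4j.properties; the value name parserLogger is illustrative only:

```scala
import org.apache.log4j.{Level, Logger}

// Logger name taken from the configuration line above
val parserLogger = Logger.getLogger("org.apache.spark.sql.catalyst.parser.CatalystSqlParser")

// Same effect as the ALL level in conf/log4j.properties, but only for the current JVM
parserLogger.setLevel(Level.ALL)
```

Unlike the properties file, this change is not persisted and has to be repeated in every new session.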
