AstBuilder — ANTLR-based SQL Parser¶
AstBuilder converts ANTLR ParseTrees into Catalyst entities using visit callbacks.
AstBuilder is the only requirement of the AbstractSqlParser abstraction (it is used by CatalystSqlParser directly, while SparkSqlParser uses SparkSqlAstBuilder instead).
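As a minimal sketch of how the visit callbacks are reached, the following assumes the spark-catalyst module is on the classpath (for example in spark-shell) and goes through CatalystSqlParser, which uses AstBuilder directly. The table and column names are made up for illustration.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// parsePlan turns a SQL statement into an unresolved LogicalPlan
val plan = CatalystSqlParser.parsePlan("SELECT id FROM t WHERE id > 1")
// parseExpression turns a single (optionally aliased) SQL expression into an Expression
val expr = CatalystSqlParser.parseExpression("id + 1")

println(plan.numberedTreeString)
println(expr)
```

Neither result is resolved at this point; names such as t and id are only looked up later by the analyzer.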
SqlBase.g4 — ANTLR Grammar¶
AstBuilder is an ANTLR AbstractParseTreeVisitor (as a SqlBaseBaseVisitor).
SqlBaseBaseVisitor is an ANTLR-specific base class that is generated at build time from the ANTLR grammar of Spark SQL. The grammar is available in the Apache Spark repository at SqlBase.g4.
Visit Callbacks¶
visitAnalyze¶
Creates an AnalyzeColumnStatement or an AnalyzeTableStatement logical operator
ANALYZE TABLE multipartIdentifier partitionSpec? COMPUTE STATISTICS
(identifier | FOR COLUMNS identifierSeq | FOR ALL COLUMNS)?
ANTLR labeled alternative: #analyze
visitDeleteFromTable¶
Creates a DeleteFromTable
ANTLR labeled alternative: #deleteFromTable
visitDescribeRelation¶
Creates a DescribeColumnStatement or DescribeRelation
(DESC | DESCRIBE) TABLE? option=(EXTENDED | FORMATTED)?
multipartIdentifier partitionSpec? describeColName?
ANTLR labeled alternative: #describeRelation
visitExists¶
Creates an Exists expression
ANTLR labeled alternative: #exists
visitExplain¶
Creates an ExplainCommand
ANTLR rule: explain
visitFirst¶
Creates a First aggregate function expression
FIRST '(' expression (IGNORE NULLS)? ')'
ANTLR labeled alternative: #first
visitFromClause¶
Creates a LogicalPlan
Supports multiple comma-separated relations (that all together build a condition-less INNER JOIN) with optional LATERAL VIEW.
A relation can be one of the following or a combination thereof:
- Table identifier
- Inline table using VALUES exprs AS tableIdent
- Table-valued function (currently only range is supported)
ANTLR rule: fromClause
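The condition-less INNER JOIN for comma-separated relations can be observed with a short sketch. It assumes spark-catalyst on the classpath; the relation names a and b are hypothetical.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.Inner
import org.apache.spark.sql.catalyst.plans.logical.Join

// Two comma-separated relations in the FROM clause
val fromPlan = CatalystSqlParser.parsePlan("SELECT * FROM a, b")
// Collect every Join operator in the parsed plan
val fromJoins = fromPlan.collect { case j: Join => j }

fromJoins.foreach(j => println(s"${j.joinType} condition=${j.condition}"))
```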
visitFunctionCall¶
Creates one of the following:
- UnresolvedFunction for a bare function (with no window specification)
- UnresolvedWindowExpression for a function evaluated in a windowed context with a WindowSpecReference
- WindowExpression for a function over a window
ANTLR rule: functionCall
import spark.sessionState.sqlParser
scala> sqlParser.parseExpression("foo()")
res0: org.apache.spark.sql.catalyst.expressions.Expression = 'foo()
scala> sqlParser.parseExpression("foo() OVER windowSpecRef")
res1: org.apache.spark.sql.catalyst.expressions.Expression = unresolvedwindowexpression('foo(), WindowSpecReference(windowSpecRef))
scala> sqlParser.parseExpression("foo() OVER (CLUSTER BY field)")
res2: org.apache.spark.sql.catalyst.expressions.Expression = 'foo() windowspecdefinition('field, UnspecifiedFrame)
visitInlineTable¶
Creates an UnresolvedInlineTable leaf logical operator (as the child of SubqueryAlias for tableAlias)
VALUES expression (',' expression)* tableAlias
expression can be as follows:
- CreateNamedStruct expression for multiple-column tables
- Any Catalyst expression for one-column tables
tableAlias can be specified explicitly or defaults to colN for every column (starting from 1 for N).
ANTLR rule: inlineTable
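The column naming described above can be sketched as follows, assuming spark-catalyst on the classpath. The helper inlineTable and the alias t(id, name) are hypothetical names introduced only for this example.

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedInlineTable
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// Hypothetical helper: the first UnresolvedInlineTable in the parsed plan
def inlineTable(sql: String): UnresolvedInlineTable =
  CatalystSqlParser.parsePlan(sql).collectFirst { case t: UnresolvedInlineTable => t }.get

// Explicit column names given in the table alias
val named = inlineTable("SELECT * FROM VALUES (1, 'a'), (2, 'b') AS t(id, name)")
// No alias: column names fall back to col1, col2, ...
val defaults = inlineTable("SELECT * FROM VALUES (1, 'a'), (2, 'b')")

println(named.names)
println(defaults.names)
```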
visitInsertIntoTable¶
Creates an InsertIntoTable (indirectly)
A 3-element tuple with a TableIdentifier, optional partition keys and the exists flag disabled
INSERT INTO TABLE? tableIdentifier partitionSpec?
ANTLR labeled alternative: #insertIntoTable
Note
insertIntoTable is part of insertInto, which is in turn used only as a helper labeled alternative in the singleInsertQuery and multiInsertQueryBody ANTLR rules.
visitInsertOverwriteTable¶
Creates an InsertIntoTable (indirectly)
A 3-element tuple with a TableIdentifier, optional partition keys and the exists flag
INSERT OVERWRITE TABLE tableIdentifier (partitionSpec (IF NOT EXISTS)?)?
In a way, visitInsertOverwriteTable is simply a more general version of visitInsertIntoTable, with the exists flag on or off depending on whether IF NOT EXISTS is present. The main difference is that dynamic partitions can only be used without IF NOT EXISTS.
ANTLR labeled alternative: #insertOverwriteTable
Note
insertOverwriteTable is part of insertInto, which is in turn used only as a helper labeled alternative in the singleInsertQuery and multiInsertQueryBody ANTLR rules.
visitMergeIntoTable¶
Creates a MergeIntoTable
ANTLR labeled alternative: #mergeIntoTable
visitMultiInsertQuery¶
Creates a logical operator with an InsertIntoTable (and UnresolvedRelation leaf operator)
FROM relation (',' relation)* lateralView*
INSERT OVERWRITE TABLE ...
FROM relation (',' relation)* lateralView*
INSERT INTO TABLE? ...
ANTLR rule: multiInsertQueryBody
visitNamedExpression¶
Creates one of the following Catalyst expressions:
- Alias (for a single alias)
- MultiAlias (for a parenthesis enclosed alias list)
- a bare Expression
ANTLR rule: namedExpression
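The three outcomes can be sketched with parseExpression, which parses a single named expression. This assumes spark-catalyst on the classpath; the column names a and m and the aliases are made up.

```scala
import org.apache.spark.sql.catalyst.analysis.MultiAlias
import org.apache.spark.sql.catalyst.expressions.Alias
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// Single alias
val single = CatalystSqlParser.parseExpression("a + 1 AS b")
// Parenthesis enclosed alias list
val multi = CatalystSqlParser.parseExpression("explode(m) AS (k, v)")
// No alias at all
val bare = CatalystSqlParser.parseExpression("a + 1")

Seq(single, multi, bare).foreach(e => println(e.getClass.getSimpleName))
```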
visitNamedQuery¶
Creates a SubqueryAlias
visitQuerySpecification¶
Creates OneRowRelation or LogicalPlan
OneRowRelation
visitQuerySpecification creates a OneRowRelation for a SELECT without a FROM clause.
val q = sql("select 1")
scala> println(q.queryExecution.logical.numberedTreeString)
00 'Project [unresolvedalias(1, None)]
01 +- OneRowRelation$
ANTLR rule: querySpecification
visitPredicated¶
Creates an Expression
ANTLR rule: predicated
visitRelation¶
Creates a LogicalPlan for a FROM clause.
ANTLR rule: relation
visitRepairTable¶
Creates a RepairTableStatement for the following SQL statement:
MSCK REPAIR TABLE multipartIdentifier
ANTLR labeled alternative: #repairTable
visitShowCurrentNamespace¶
Creates a ShowCurrentNamespaceStatement for the following SQL statement:
SHOW CURRENT NAMESPACE
ANTLR labeled alternative: #showCurrentNamespace
visitShowTables¶
Creates a ShowTables for the following SQL statement:
SHOW TABLES ((FROM | IN) multipartIdentifier)?
(LIKE? pattern=STRING)?
ANTLR labeled alternative: #showTables
visitSingleDataType¶
Creates a DataType
ANTLR rule: singleDataType
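A minimal sketch of parsing a data type from its SQL text, assuming spark-catalyst on the classpath. The nested type used here is only an illustration.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.{ArrayType, DataType, StructType}

// parseDataType parses a standalone data type definition
val dt: DataType = CatalystSqlParser.parseDataType("array<struct<id:int,name:string>>")

dt match {
  case ArrayType(struct: StructType, _) => println(struct.fieldNames.mkString(", "))
  case other => println(s"Unexpected type: $other")
}
```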
visitSingleExpression¶
Creates an Expression
Takes the named expression and relays to visitNamedExpression
ANTLR rule: singleExpression
visitSingleInsertQuery¶
Creates a LogicalPlan with an InsertIntoTable
INSERT INTO TABLE? tableIdentifier partitionSpec? #insertIntoTable
INSERT OVERWRITE TABLE tableIdentifier (partitionSpec (IF NOT EXISTS)?)? #insertOverwriteTable
ANTLR labeled alternative: #singleInsertQuery
visitSortItem¶
Creates a SortOrder unary expression
sortItem
: expression ordering=(ASC | DESC)? (NULLS nullOrder=(LAST | FIRST))?
;
// queryOrganization
ORDER BY order+=sortItem (',' order+=sortItem)*
SORT BY sort+=sortItem (',' sort+=sortItem)*
// windowSpec
((ORDER | SORT) BY sortItem (',' sortItem)*)?
ANTLR rule: sortItem
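How the ordering and NULLS parts of a sortItem end up in a SortOrder can be sketched as follows. This assumes spark-catalyst on the classpath; the table t and column a are hypothetical.

```scala
import org.apache.spark.sql.catalyst.expressions.{Ascending, NullsLast, SortOrder}
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.Sort

val sortPlan = CatalystSqlParser.parsePlan("SELECT * FROM t ORDER BY a ASC NULLS LAST")
// ORDER BY becomes a Sort operator that holds one SortOrder per sortItem
val sortOrders: Seq[SortOrder] = sortPlan.collectFirst { case s: Sort => s.order }.get

sortOrders.foreach(o => println(s"${o.child} ${o.direction} ${o.nullOrdering}"))
```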
visitSingleStatement¶
Creates a LogicalPlan from a single SQL statement
ANTLR rule: singleStatement
visitStar¶
Creates an UnresolvedStar
ANTLR labeled alternative: #star
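A short sketch of the two forms of a star, assuming spark-catalyst on the classpath. The qualifier t is a made-up name.

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedStar
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// Unqualified star: no target
val allColumns = CatalystSqlParser.parseExpression("*")
// Qualified star: the target holds the name parts before ".*"
val qualified = CatalystSqlParser.parseExpression("t.*")

println(allColumns)
println(qualified)
```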
visitSubqueryExpression¶
Creates a ScalarSubquery
ANTLR labeled alternative: #subqueryExpression
visitUse¶
Creates a UseStatement for the following SQL statement:
USE NAMESPACE? multipartIdentifier
ANTLR labeled alternative: #use
visitWindowDef¶
Creates a WindowSpecDefinition
// CLUSTER BY with window frame
'(' CLUSTER BY partition+=expression (',' partition+=expression)* windowFrame? ')'
// PARTITION BY and ORDER BY with window frame
'(' ((PARTITION | DISTRIBUTE) BY partition+=expression (',' partition+=expression)*)?
((ORDER | SORT) BY sortItem (',' sortItem)*)?
windowFrame? ')'
ANTLR labeled alternative: #windowDef
Parsing Handlers¶
withAggregationClause¶
withAggregationClause(
ctx: AggregationClauseContext,
selectExpressions: Seq[NamedExpression],
query: LogicalPlan): LogicalPlan
Adds one of the following logical operators:
- GroupingSets for GROUP BY … GROUPING SETS (…)
- Aggregate for GROUP BY … (WITH CUBE | WITH ROLLUP)?
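The plain GROUP BY case can be sketched as follows, assuming spark-catalyst on the classpath. The table t and column a are hypothetical, and the GROUPING SETS, CUBE and ROLLUP variants are left out because their representation differs between Spark versions.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.Aggregate

val aggPlan = CatalystSqlParser.parsePlan("SELECT a, count(*) FROM t GROUP BY a")
// The aggregation clause adds an Aggregate operator on top of the FROM relation
val aggregate = aggPlan.collectFirst { case a: Aggregate => a }.get

println(aggregate.groupingExpressions)
println(aggregate.aggregateExpressions)
```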
withGenerate¶
Adds a Generate with an UnresolvedGenerator and join flag enabled for LATERAL VIEW (in SELECT or FROM clauses).
withHavingClause¶
withHavingClause(
ctx: HavingClauseContext,
plan: LogicalPlan): LogicalPlan
Creates an UnresolvedHaving
withHints¶
Adds an UnresolvedHint for /*+ hint */ in SELECT queries.
Note
Note the + (plus) that immediately follows the opening /* of the comment.
hint is of the format name or name (param1, param2, ...).
/*+ BROADCAST (table) */
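A minimal sketch of a parsed hint, assuming spark-catalyst on the classpath and a Spark version in which the parsed operator is called UnresolvedHint. The table name t is made up.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.UnresolvedHint

val hintPlan = CatalystSqlParser.parsePlan("SELECT /*+ BROADCAST(t) */ * FROM t")
// The hint comment becomes an UnresolvedHint operator in the parsed plan
val hint = hintPlan.collectFirst { case h: UnresolvedHint => h }.get

println(s"${hint.name} with ${hint.parameters.size} parameter(s)")
```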
withInsertInto¶
Creates one of the following logical operators:
withInsertInto is used for visitMultiInsertQuery and visitSingleInsertQuery
withJoinRelations¶
Adds a Join for a FROM clause and relation alone.
The following join types are supported:
- INNER (default)
- CROSS
- LEFT (with optional OUTER)
- LEFT SEMI
- RIGHT (with optional OUTER)
- FULL (with optional OUTER)
- ANTI (optionally prefixed with LEFT)
The following join criteria are supported:
- ON booleanExpression
- USING '(' identifier (',' identifier)* ')'
Joins can be NATURAL (with no join criteria).
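The mapping from join syntax to join types can be explored with a small sketch. It assumes spark-catalyst on the classpath; joinTypeOf is a hypothetical helper and the relations a and b are made up.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans._
import org.apache.spark.sql.catalyst.plans.logical.Join

// Hypothetical helper: join type of the first Join operator in the parsed plan
def joinTypeOf(sql: String): JoinType =
  CatalystSqlParser.parsePlan(sql).collectFirst { case j: Join => j.joinType }.get

Seq(
  "SELECT * FROM a JOIN b ON a.id = b.id",
  "SELECT * FROM a LEFT JOIN b ON a.id = b.id",
  "SELECT * FROM a LEFT ANTI JOIN b ON a.id = b.id",
  "SELECT * FROM a NATURAL JOIN b",
  "SELECT * FROM a JOIN b USING (id)"
).foreach(sql => println(s"$sql => ${joinTypeOf(sql)}"))
```

NATURAL and USING joins are kept as wrapper join types (NaturalJoin and UsingJoin) at parse time and are only rewritten into regular joins during analysis.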
withQuerySpecification¶
Adds a query specification to a logical operator
For transform SELECT (with TRANSFORM, MAP or REDUCE qualifiers), withQuerySpecification does...FIXME
For regular SELECT (no TRANSFORM, MAP or REDUCE qualifiers), withQuerySpecification adds (in that order):
1. Generate unary logical operators (if used in the parsed SQL text)
2. Filter unary logical operator (if used in the parsed SQL text)
3. GroupingSets or Aggregate unary logical operators (if used in the parsed SQL text)
4. Project and/or Filter unary logical operators
5. WithWindowDefinition unary logical operator (if used in the parsed SQL text)
6. UnresolvedHint unary logical operator (if used in the parsed SQL text)
withPredicate¶
- NOT? IN '(' query ')' adds an In predicate expression with a ListQuery subquery expression
- NOT? IN '(' expression (',' expression)* ')' adds an In predicate expression
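The value-list form can be sketched as follows, assuming spark-catalyst on the classpath. The column name a is made up, and the subquery form is not shown because the expression it produces depends on the Spark version.

```scala
import org.apache.spark.sql.catalyst.expressions.{In, Not}
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// IN with a list of expressions
val inList = CatalystSqlParser.parseExpression("a IN (1, 2)")
// NOT IN wraps the In predicate in a Not expression
val notInList = CatalystSqlParser.parseExpression("a NOT IN (1, 2)")

println(inList)
println(notInList)
```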
withQueryResultClauses¶
Important
This section needs your help
withRepartitionByExpression¶
withRepartitionByExpression(
ctx: QueryOrganizationContext,
expressions: Seq[Expression],
query: LogicalPlan): LogicalPlan
withRepartitionByExpression simply throws a ParseException:
DISTRIBUTE BY is not supported
withRepartitionByExpression is used when AstBuilder is requested to withQueryResultClauses (for DISTRIBUTE BY and CLUSTER BY SQL clauses).
withSample¶
Important
This section needs your help
withSelectQuerySpecification¶
Important
This section needs your help
withWindows¶
Adds a WithWindowDefinition for window aggregates (given WINDOW definitions).
Used for withQueryResultClauses and withQuerySpecification when window definitions are present.
WINDOW identifier AS windowSpec
(',' identifier AS windowSpec)*
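A named window definition can be inspected with a short sketch, assuming spark-catalyst on the classpath. The window name w, the table t and the columns are hypothetical.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.WithWindowDefinition

val windowPlan = CatalystSqlParser.parsePlan(
  "SELECT sum(x) OVER w FROM t WINDOW w AS (PARTITION BY a ORDER BY b)")
// The WINDOW clause becomes a WithWindowDefinition operator keyed by window name
val windowDefs =
  windowPlan.collectFirst { case w: WithWindowDefinition => w.windowDefinitions }.get

windowDefs.foreach { case (name, spec) => println(s"$name -> $spec") }
```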
aliasPlan Method¶
aliasPlan(
alias: ParserRuleContext,
plan: LogicalPlan): LogicalPlan
aliasPlan...FIXME
aliasPlan is used when...FIXME
mayApplyAliasPlan Method¶
mayApplyAliasPlan(
tableAlias: TableAliasContext,
plan: LogicalPlan): LogicalPlan
mayApplyAliasPlan...FIXME
mayApplyAliasPlan is used when...FIXME