
UnsupportedOperationChecker

UnsupportedOperationChecker checks whether the logical plan of a streaming query uses supported operations only.

UnsupportedOperationChecker is used when the internal spark.sql.streaming.unsupportedOperationCheck Spark property is enabled.
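
The property can be inspected and toggled through the session's runtime configuration. The following is a minimal sketch, assuming a local SparkSession created only for illustration (the application name is made up). Disabling the check is shown to demonstrate the switch, not as a recommendation.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session for this illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("unsupported-operation-check-demo")
  .getOrCreate()

val key = "spark.sql.streaming.unsupportedOperationCheck"

// Read the current value (returned as a string)
val checkEnabled = spark.conf.get(key).toBoolean
println(s"$key = $checkEnabled")

// Turn the check off for this session; streaming queries started
// afterwards would not go through UnsupportedOperationChecker
spark.conf.set(key, "false")
```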

Note

UnsupportedOperationChecker comes with two methods, checkForBatch and checkForStreaming, whose names reflect the two flavours of Spark SQL (as of 2.0): batch and streaming, respectively.

The Spark Structured Streaming gitbook focuses solely on the checkForStreaming method.

checkForStreaming Method

checkForStreaming(
  plan: LogicalPlan,
  outputMode: OutputMode): Unit

checkForStreaming asserts that the following requirements hold:

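  1. Only one streaming aggregation is allowed

  2. A streaming aggregation with Append output mode requires a watermark (on the grouping expressions)

  3. Multiple FlatMapGroupsWithState logical operators are only allowed with Append output mode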

checkForStreaming...FIXME

checkForStreaming finds all streaming aggregates (i.e. Aggregate logical operators with streaming sources).

Note

Aggregate logical operator represents Dataset.groupBy and Dataset.groupByKey operators (and SQL's GROUP BY clause) in a logical query plan.
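
The idea of finding streaming aggregates can be approximated from user code by collecting Aggregate operators whose subtree is streaming. This is a sketch of the idea under stated assumptions, not the checker's own code: it assumes a local SparkSession and the built-in rate source, and it relies on Catalyst classes that are internal and may change between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.Aggregate

// Assumption: a throwaway local session for this illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("streaming-aggregates-demo")
  .getOrCreate()

// The rate source produces rows with `timestamp` and `value` columns
val counts = spark.readStream
  .format("rate")
  .load()
  .groupBy("value")
  .count()

// Collect Aggregate operators that have a streaming source underneath
val streamingAggregates = counts.queryExecution.analyzed.collect {
  case a: Aggregate if a.isStreaming => a
}

println(s"Streaming aggregates found: ${streamingAggregates.size}")
```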

checkForStreaming asserts that there is at most one streaming aggregation in a streaming query.

Otherwise, checkForStreaming reports an AnalysisException:

Multiple streaming aggregations are not supported with streaming DataFrames/Datasets
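
The check above can be exercised without starting a query by calling checkForStreaming directly on an analyzed plan. The sketch below assumes a local SparkSession, the built-in rate source, and a Spark version contemporary with this page (newer releases may phrase or scope the restriction differently). The column name `occurrences` is made up, and UnsupportedOperationChecker is an internal Catalyst object rather than a stable public API.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker
import org.apache.spark.sql.streaming.OutputMode

// Assumption: a throwaway local session for this illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("multiple-aggregations-demo")
  .getOrCreate()

val twoAggregations = spark.readStream
  .format("rate")
  .load()
  .groupBy("value").count()                     // first streaming aggregation
  .withColumnRenamed("count", "occurrences")    // hypothetical column name
  .groupBy("occurrences").count()               // second streaming aggregation

// Run the streaming check by hand and capture the outcome as text
val outcome =
  try {
    UnsupportedOperationChecker.checkForStreaming(
      twoAggregations.queryExecution.analyzed,
      OutputMode.Complete())
    "accepted"
  } catch {
    case e: AnalysisException => s"rejected: ${e.getMessage}"
  }

println(outcome)
```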

checkForStreaming asserts that a watermark is defined for a streaming aggregation with Append output mode (on at least one of the grouping expressions).

Otherwise, checkForStreaming reports an AnalysisException:

Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark
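
To make the watermark requirement concrete, the sketch below runs the check on two aggregations in Append output mode: one without a watermark and one that groups by a window over a watermarked event-time column. It assumes a local SparkSession and the built-in rate source. The `check` helper is hypothetical, the delay and window durations are arbitrary, and UnsupportedOperationChecker is an internal Catalyst object.

```scala
import org.apache.spark.sql.{AnalysisException, DataFrame, SparkSession}
import org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker
import org.apache.spark.sql.functions.{col, window}
import org.apache.spark.sql.streaming.OutputMode

// Assumption: a throwaway local session for this illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("append-watermark-demo")
  .getOrCreate()

val events = spark.readStream.format("rate").load()

// No watermark on any grouping expression
val noWatermark = events.groupBy("value").count()

// Watermark on `timestamp`, which is also the basis of the grouping window
val withWatermark = events
  .withWatermark("timestamp", "10 seconds")
  .groupBy(window(col("timestamp"), "5 seconds"))
  .count()

// Hypothetical helper: run the streaming check in Append mode
def check(df: DataFrame): String =
  try {
    UnsupportedOperationChecker.checkForStreaming(
      df.queryExecution.analyzed, OutputMode.Append())
    "accepted"
  } catch {
    case e: AnalysisException => s"rejected: ${e.getMessage}"
  }

println(check(noWatermark))
println(check(withWatermark))
```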

CAUTION: FIXME

checkForStreaming counts all FlatMapGroupsWithState logical operators (on streaming Datasets with isMapGroupsWithState flag disabled).

Note

FlatMapGroupsWithState.isMapGroupsWithState flag is disabled when...FIXME

checkForStreaming asserts that multiple FlatMapGroupsWithState logical operators are used only when both of the following hold:

  • outputMode (of the streaming query) is Append output mode

  • outputMode of every FlatMapGroupsWithState logical operator is also Append output mode

CAUTION: FIXME Reference to an example in flatMapGroupsWithState

Otherwise, checkForStreaming reports an AnalysisException:

Multiple flatMapGroupsWithStates are not supported when they are not all in append mode or the output mode is not append on a streaming DataFrames/Datasets
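
The condition behind this message can be modelled as a small predicate. The following is a plain-Scala sketch with no Spark dependency. `Mode`, `Append`, `Update` and `passesCheck` are hypothetical stand-ins introduced here for illustration and are not Spark classes or methods.

```scala
// Hypothetical stand-ins for Spark's output modes (not Spark classes)
sealed trait Mode
case object Append extends Mode
case object Update extends Mode

// True when a query passes the multiple-FlatMapGroupsWithState check:
// fewer than two operators, or everything (query and operators) in Append mode
def passesCheck(queryMode: Mode, operatorModes: Seq[Mode]): Boolean =
  operatorModes.size < 2 ||
    (queryMode == Append && operatorModes.forall(_ == Append))

println(passesCheck(Append, Seq(Append, Append))) // true
println(passesCheck(Append, Seq(Append, Update))) // false
println(passesCheck(Update, Seq(Append, Append))) // false
println(passesCheck(Update, Seq(Update)))         // true: a single operator is outside this check
```

The model covers only this one requirement; a query with a single FlatMapGroupsWithState operator may still be rejected by other checks.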

CAUTION: FIXME

checkForStreaming is used when StreamingQueryManager is requested to create a StreamingQueryWrapper (for starting a streaming query), but only when the internal spark.sql.streaming.unsupportedOperationCheck configuration property is enabled.


Last update: 2020-11-28