[[operators]] In Spark Structured Streaming, a streaming join is a streaming query that was described (build) using the high-level streaming operators:
Streaming joins can be stateless or <
Joins of a streaming query and a batch query (stream-static joins) are stateless and no state management is required
Joins of two streaming queries (<
>) are stateful and require streaming state (with an optional < >).
Spark Structured Streaming supports stream-stream joins with the following:
Stream-stream equi-joins are planned as <
ShuffleExchangeExec physical operators (per <
=== [[join-state-watermark]] Join State Watermark for State Removal
Stream-stream joins may optionally define Join State Watermark for state removal (cf. <
A join state watermark can be specified on the following:
A join state watermark can be specified on key state, value state or both.
=== [[IncrementalExecution]] IncrementalExecution -- QueryExecution of Streaming Queries
Under the covers, the <
Join logical operators.
TIP: Read up on https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-LogicalPlan-Join.html[Join Logical Operator] in https://bit.ly/spark-sql-internals[The Internals of Spark SQL] online book.
In Spark Structured Streaming IncrementalExecution is responsible for planning streaming queries for execution.
Further Reading Or Watching¶
Stream-stream Joins in the official documentation of Apache Spark for Structured Streaming
Introducing Stream-Stream Joins in Apache Spark 2.3 by Databricks
(video) Deep Dive into Stateful Stream Processing in Structured Streaming by Tathagata Das