From Basic to Advanced Aggregates

Spark SQL & "Spark Streams"

Apache Spark 2.2 @BeeScalaConf

@jaceklaskowski / StackOverflow / GitHub
Books: Mastering Apache Spark / Spark Structured Streaming

https://bit.ly/mastering-apache-spark

©Jacek Laskowski 2017 / @jaceklaskowski / jacek@japila.pl

https://bit.ly/spark-structured-streaming


Mobile formats available (PDF, Mobi, ePub) DRM-free


StackOverflow


Agenda / Spark SQL


  Basic Aggregation

  Windowed Aggregation

  Multi-Dimensional Aggregation

...and 20 mins of our time went away

Agenda / From Batch to Streaming


  Spark Structured Streaming's Internals

...and 25 mins of our time went away

Agenda / Spark Structured Streaming ("Spark Streams")


  Streaming Dataset API for aggregates

  groupBy Operator — Untyped Streaming Aggregation

  Demo: groupBy Streaming Aggregation with Append Output Mode

  groupByKey Operator — Streaming Aggregation

  window Function — Stream Time Windows

  withWatermark Operator — Event Time Watermark

  mapGroupsWithState Operator — Stateful Streaming Aggregation (with Explicit State Logic)

  flatMapGroupsWithState Operator — Arbitrary Stateful Streaming Aggregation (with Explicit State Logic)

...and 40 mins of our time went away

Questions?


  Read the Spark Structured Streaming gitbook

  Read the Mastering Apache Spark 2 gitbook

  Follow @jaceklaskowski on Twitter

  Upvote my questions and answers on StackOverflow