spark-workshop

Apache Spark™ and Scala Workshops

Agendas

  1. Big Data Solutions using Apache Spark (2 days) - Stockholm, Sweden, 2-3 June 2022
  2. Apache Spark™ Advanced for PySpark Developers Workshop (2 days online) — taught online, 25-26 January 2022
  3. Apache Spark™ Advanced for Developers Workshop (5 half-days online) — taught online, 14-18 September 2020
  4. Apache Spark™ Advanced for Developers Workshop (5 days) — taught in Vilnius, Lithuania, 13-17 January 2020
  5. Apache Spark™ Advanced for Developers Workshop (5 days) for Experienced Scala Developers with significant experience in Apache Spark™ — taught in Villeneuve-Loubet, France, 25-29 November 2019
  6. Apache Spark™ with Scala for PySpark Developers Workshop (5 days) — taught in Pittsburgh, PA, US, 15-19 April 2019
  7. Apache Spark™ Advanced for Developers Workshop (5 days) for Experienced Scala Developers with significant experience in Apache Spark™ — taught in Villeneuve-Loubet, France, 8-12 April 2019
  8. Apache Spark™ Advanced for Developers Workshop (5 days) for Experienced Scala Developers with significant experience in Apache Spark™ — taught in Villeneuve-Loubet, France, 19-23 November 2018
  9. Apache Spark™ Developer Certification Workshop (5 days) — taught in Roubaix, France, 5-9 November 2018
  10. Apache Spark for Experienced Oracle and SQL Developers Workshop Agenda (5 days) — taught in Gdansk, Poland
  11. Introduction to Scala Workshop Agenda (5 half-days online)
  12. Advanced Apache Spark for Developers Workshop Agenda (5 days) for Experienced Scala Developers with significant experience in Apache Spark — taught three times in Villeneuve-Loubet, France
  13. Spark Structured Streaming in Apache Spark 2.2 Workshop (1 day) for Software Developers — taught in Ljubljana, Slovenia
  14. Graduate Scala and Spark Workshop Agenda (5 days) for Junior Python and Java Developers — taught in London, UK
  15. Spark SQL 2.2 Workshop Agenda (3 days) for Data Engineers, Business Analysts and Architects - taught in-class once in Warsaw, Poland
  16. Spark SQL / Scala Workshop Agenda (5 days) for Data Engineers, Business Analysts and Architects - taught twice in Collegeville, PA, USA
  17. Packt Live :: Streaming Analytics with Apache Spark - 2-hour webinar about Spark SQL, Spark MLlib, Structured Streaming and Apache Kafka
  18. Apache Spark 2 Workshop Agenda (4 days) - taught in Geneva, Switzerland
    • featuring Spark SQL, Structured Streaming, Spark MLlib, Spark Streaming, Spark Architecture, web UI, Apache Kafka, Scala, sbt, IntelliJ IDEA, Databricks
  19. Apache Spark 2 Workshop Agenda (5 half-days) - taught mostly online and in-class once (in Warsaw, Poland)
    • featuring Spark SQL, Spark MLlib, Spark Structured Streaming, web UI, Apache Kafka, Scala, sbt, IntelliJ IDEA, Databricks
  20. 2-Day Workshop Agenda - taught in London, UK at Apache Spark 2 Workshop
  21. 2-Day Workshop Agenda - held in Ljubljana, Slovenia
  22. 3-Day Workshop Agenda
  23. 4-Day Workshop Agenda - taught in Karlskrona, Sweden and online twice (5 half-days)
  24. 5-Day Spark Ecosystem Workshop Agenda - taught online twice
  25. 5-Day Spark Administration and Monitoring Workshop Agenda - taught in Villeneuve-Loubet, France
  26. 1-Day Spark SQL 2 / Scala Workshop Using Databricks Cloud Agenda for Software Developers and Data Analysts - taught in Toronto, Canada
  27. 4-Day Spark 2 / Scala Workshop Agenda for Software Developers and Data Analysts - taught in Toronto, Canada

Unit 1. Spark SQL for Large-Scale Structured Data Processing

  1. Spark SQL
  2. DataSource API
  3. Columns and Dataset Operators
  4. Standard and User-Defined Functions
  5. Basic Aggregation
  6. Joins
  7. Working with Missing Data
  8. Windowed Aggregation
  9. Multi-Dimensional Aggregation
  10. Caching and Persistence
  11. The Internals of Structured Query Execution
  12. Join Optimization With Bucketing
  13. Developing Custom Data Source
  14. Spark SQL Exercises

Unit 2. Spark Structured Streaming for Large-Scale Stream Processing

  1. Structured Streaming
  2. Fault Tolerance and Checkpointing
  3. Monitoring Streaming Queries
  4. Structured Streaming’s Internals
  5. Stateful Stream Processing

Unit 3. Spark MLlib for Large-Scale Distributed Machine Learning

  1. Machine Learning with Spark MLlib
  2. ML Pipelines

Unit 4. Large-Scale Distributed Data Processing with Apache Spark (aka Spark Core)

  1. Introduction to Apache Spark
  2. The Core of Apache Spark
  3. Into Apache Spark 2 Using spark-shell (and Databricks)
  4. web UI
  5. Spark and Cluster Managers
  6. Spark History Server
  7. Monitoring Spark using SparkListeners
  8. Debugging Spark Applications
  9. Spark Thrift JDBC/ODBC Server
  10. Dynamic Allocation of Executors

Unit 5. Scala Programming Language for Object-Oriented and Functional Programming

  1. Scala — Just Enough to Write Spark Applications
  2. Real-Life Scala Project
  3. Scala Exercises
  4. sbt — Interactive Build Tool for Apache Spark

Conference Talks

  1. From Basic to Advanced Aggregate Operators in Apache Spark 2.2 (SQL and Streams) by Examples @ BeeScala Conference, Nov 24th, 2017, Ljubljana, Slovenia
  2. HackOn(Data) in Toronto, ON - Solutions Review

Exercises

  1. Using TaskCompletionListener, TaskFailureListener, TaskContext

Attic / Deprecated Material

  1. Spark Streaming
  2. Spark Streaming’s Stateful Operators
  3. Agenda — Day 1
  4. Agenda — Day 2
  5. Agenda — Day 3