Apache Spark™ and Scala Workshops
Agendas
- Big Data Solutions using Apache Spark (2 days) — Stockholm, Sweden, 2-3 June 2022
- Apache Spark™ Advanced for PySpark Developers Workshop (2 days online) — taught online, 25-26 January 2022
- Apache Spark™ Advanced for Developers Workshop (5 half-days online) — taught online, 14-18 September 2020
- Apache Spark™ Advanced for Developers Workshop (5 days) — taught in Vilnius, Lithuania, 13-17 January 2020
- Apache Spark™ Advanced for Developers Workshop (5 days) for Experienced Scala Developers with significant experience in Apache Spark™ — taught in Villeneuve-Loubet, France, 25-29 November 2019
- Apache Spark™ with Scala for PySpark Developers Workshop (5 days) — taught in Pittsburgh, PA, US, 15-19 April 2019
- Apache Spark™ Advanced for Developers Workshop (5 days) for Experienced Scala Developers with significant experience in Apache Spark™ — taught in Villeneuve-Loubet, France, 8-12 April 2019
- Apache Spark™ Advanced for Developers Workshop (5 days) for Experienced Scala Developers with significant experience in Apache Spark™ — taught in Villeneuve-Loubet, France, 19-23 November 2018
- Apache Spark™ Developer Certification Workshop (5 days) — taught in Roubaix, France, 5-9 November 2018
- Apache Spark for Experienced Oracle and SQL Developers Workshop Agenda (5 days) — taught in Gdansk, Poland
- Introduction to Scala Workshop Agenda (5 half-days online)
- Advanced Apache Spark for Developers Workshop Agenda (5 days) for Experienced Scala Developers with significant experience in Apache Spark — taught in Villeneuve-Loubet, France three times
- Spark Structured Streaming in Apache Spark 2.2 Workshop (1 day) for Software Developers — taught in Ljubljana, Slovenia
- Graduate Scala and Spark Workshop Agenda (5 days) for Junior Python and Java Developers — taught in London, UK
- Spark SQL 2.2 Workshop Agenda (3 days) for Data Engineers, Business Analysts and Architects - taught in-class once in Warsaw, Poland
- Spark SQL / Scala Workshop Agenda (5 days) for Data Engineers, Business Analysts and Architects - taught twice in Collegeville, PA, USA
- Packt Live :: Streaming Analytics with Apache Spark - 2-hour webinar about Spark SQL, Spark MLlib, Structured Streaming and Apache Kafka
- Apache Spark 2 Workshop Agenda (4 days) - taught in Geneva, Switzerland
  - featuring Spark SQL, Structured Streaming, Spark MLlib, Spark Streaming, Spark Architecture, web UI, Apache Kafka, Scala, sbt, IntelliJ IDEA, Databricks
- Apache Spark 2 Workshop Agenda (5 half-days) - taught mostly online and in-class once (in Warsaw, Poland)
  - featuring Spark SQL, Spark MLlib, Spark Structured Streaming, web UI, Apache Kafka, Scala, sbt, IntelliJ IDEA, Databricks
- 2-Day Workshop Agenda - taught in London, UK at Apache Spark 2 Workshop
- 2-Day Workshop Agenda - held in Ljubljana, Slovenia
- 3-Day Workshop Agenda
- 4-Day Workshop Agenda - taught in Karlskrona, Sweden and online twice (5 half-days)
- 5-Day Spark Ecosystem Workshop Agenda - taught online twice
- 5-Day Spark Administration and Monitoring Workshop Agenda - taught in Villeneuve-Loubet, France
- 1-Day Spark SQL 2 / Scala Workshop Using Databricks Cloud Agenda for Software Developers and Data Analysts - taught in Toronto, Canada
- 4-Day Spark 2 / Scala Workshop Agenda for Software Developers and Data Analysts - taught in Toronto, Canada
Unit 1. Spark SQL for Large-Scale Structured Data Processing
- Spark SQL
- DataSource API
- Columns and Dataset Operators
- Standard and User-Defined Functions
- Basic Aggregation
- Joins
- Working with Missing Data
- Windowed Aggregation
- Multi-Dimensional Aggregation
- Caching and Persistence
- The Internals of Structured Query Execution
- Join Optimization With Bucketing
- Developing Custom Data Source
- Spark SQL Exercises
Unit 2. Spark Structured Streaming for Large-Scale Stream Processing
- Structured Streaming
- Fault Tolerance and Checkpointing
- Monitoring Streaming Queries
- Structured Streaming’s Internals
- Stateful Stream Processing
Unit 3. Spark MLlib for Large-Scale Distributed Machine Learning
- Machine Learning with Spark MLlib
- ML Pipelines
Unit 4. Large-Scale Distributed Data Processing with Apache Spark (aka Spark Core)
- Introduction to Apache Spark
- The Core of Apache Spark
- Intro to Apache Spark 2 Using spark-shell (and Databricks)
- web UI
- Spark and Cluster Managers
- Spark History Server
- Monitoring Spark using SparkListeners
- Debugging Spark Applications
- Spark Thrift JDBC/ODBC Server
- Dynamic Allocation of Executors
Unit 5. Scala Programming Language for Object-Oriented and Functional Programming
- Scala — Just Enough to Write Spark Applications
- Real-Life Scala Project
- Scala Exercises
- sbt — Interactive Build Tool for Apache Spark
Conference Talks
- From Basic to Advanced Aggregate Operators in Apache Spark 2.2 (SQL and Streams) by Examples @ BeeScala Conference, 24 November 2017, Ljubljana, Slovenia
- HackOn(Data) in Toronto, ON - Solutions Review
Exercises
- Using TaskCompletionListener, TaskFailureListener, TaskContext
Attic / Deprecated Material
- Spark Streaming
- Spark Streaming’s Stateful Operators
- Agenda — Day 1
- Agenda — Day 2
- Agenda — Day 3