Spark 2.0 / Scala Workshop Day 3

Jacek Laskowski / @jaceklaskowski / GitHub / Mastering Apache Spark Notes

Agenda - Day 1 (1 of 2)

  • The Elements of Apache Spark (aka Why Spark)
  • Tools Intro - IntelliJ IDEA and Scala Worksheet
  • Introduction to Scala (aka Why Scala)
  • Spark Setup and Your First Spark Application
  • Exercise: My First Scala Application (using IntelliJ IDEA)

Agenda - Day 1 (2 of 2)

  • Introduction to sbt and sbt-assembly
  • Exercise: Running Scala Application from Command Line
  • Introduction to Spark SQL using spark-shell and Spark's web UI
  • Exercise: My First Spark SQL Application (using IntelliJ IDEA)
  • Introduction to spark-submit and run-example
  • Exercise: spark-submit Your Spark App / run-example SparkPi

Agenda - Day 2 (1 of 2)

  • Introduction to Spark SQL
  • SparkSession — The Entry Point
  • DataSource API — Loading and Writing Datasets
  • Exercise: My First Spark SQL Application (using IntelliJ IDEA)
  • Spark Architecture and Cluster Managers
  • Exercise: spark-submit Your Spark App / run-example SparkPi

Agenda - Day 2 (2 of 2)

  • Using Functions and Operators in Spark SQL
  • Exercise: Executing Queries from Command-Line / CSV
  • Web UI
  • Scala functions and UDFs in Spark SQL
  • Exercise: Scala functions and UDFs

Agenda - Day 3 (1 of 2) / Today

  • Spark Properties and Spark History Server
  • Exercise: Executing Queries from Command-Line / CSV
  • Spark SQL Aggregation and Window Operators
  • Exercise: Using Aggregations and Windows with datasets in CSV files

Agenda - Day 3 (2 of 2) / Today

  • Introduction to Spark MLlib
  • ML Pipelines
  • Exercise: Using LogisticRegression in a Standalone Application
  • Spark Standalone Cluster Using an Azure Template

Questions?