Intro to Apache Spark 2

Using spark-shell

...and Databricks

@jaceklaskowski / StackOverflow / GitHub
Notebooks: Mastering Apache Spark / Spark Structured Streaming

Spark Shell

  1. Interactive Spark Development Environment
  2. Scala REPL + a few Spark imports and values
  3. Excellent tool to explore and learn Spark
  4. Run the spark-shell script to start it (see the session sketch below)
  5. Use TAB completion early and often
  6. Switch to Mastering Apache Spark 2
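
What a first spark-shell session can look like (a minimal sketch; it assumes Spark is installed locally and you start the shell from the installation directory):

    // Start the REPL from your Spark installation directory:
    //   ./bin/spark-shell

    // spark-shell predefines these values for you:
    sc                 // org.apache.spark.SparkContext
    spark              // org.apache.spark.sql.SparkSession

    // Type "spark." and press TAB to list the available methods --
    // TAB completion is the quickest way to explore the API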

SparkSession

  1. The entry point to Spark SQL
    • ...and Spark in general these days
  2. spark-shell creates an instance for you, available as spark
  3. spark.version gives you the version of Spark in use
  4. Used to create Datasets/DataFrames
    • spark.range(numberOfRecords) (see the sketch below)
  5. Switch to Mastering Apache Spark 2
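
A minimal sketch of using the spark value in spark-shell, plus how a standalone application would create its own session (the application name is illustrative):

    // In spark-shell, spark is already created for you
    spark.version            // the version of Spark in use
    spark.range(5).show()    // a one-column Dataset with ids 0 to 4

    // In your own application you build the entry point explicitly
    import org.apache.spark.sql.SparkSession
    val mySpark = SparkSession.builder
      .appName("MyFirstSparkApp")   // illustrative name
      .getOrCreate()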

Databricks Cloud

  1. Web-based spark-shell in the cloud
  2. No need to install anything but a web browser
  3. Visit Databricks Cloud Community Edition
  4. Excellent tool for data scientists

Spark Submit

  1. Tool to submit Spark applications
  2. Use the spark-submit shell script
  3. spark-shell is a Spark application submitted for execution using spark-submit
  4. Command-line options control the deployment, e.g. master URL, deploy mode, executor resources (see the sketch below)
  5. Switch to Mastering Apache Spark 2
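
A minimal sketch of an application you could package and submit with spark-submit; the object name, jar path, and master URL are illustrative assumptions:

    import org.apache.spark.sql.SparkSession

    // Hypothetical application -- package it into a jar (e.g. with sbt)
    object CountRecords {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("CountRecords").getOrCreate()
        println(spark.range(1000).count())   // a tiny job: count 1000 generated rows
        spark.stop()
      }
    }

    // Submit it for execution (paths and master URL are examples):
    //   ./bin/spark-submit \
    //     --class CountRecords \
    //     --master local[*] \
    //     target/scala-2.12/countrecords_2.12-0.1.jar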

Tools

  • spark-shell - an interactive shell for Apache Spark
  • spark-submit - a tool to manage Spark applications (i.e. submit, kill, status)
  • spark-sql - an interactive shell for Spark SQL and Hive queries
  • web UI - a web interface to monitor Spark computations
  • Scaladoc - Spark's Scala API documentation

Questions?