Apache Spark 2

Ecosystem 5 Days (beta^2)

@jaceklaskowski / StackOverflow / GitHub / Mastering Apache Spark 2

https://github.com/jaceklaskowski

https://bit.ly/mastering-apache-spark

Among contributors to Apache Spark 1.6

Among contributors to Apache Spark 2

Among contributors to Apache Spark 2.1

Ranked #96 in Spark contributors

http://stackoverflow.com/users/1305344/jacek-laskowski

https://twitter.com/jaceklaskowski

Agenda - Day 1

  1. The Elements of Apache Spark (aka Why Spark)
  2. My First Spark Application
    • IntelliJ IDEA, sbt and spark-submit
  3. Deploying Spark Applications to a Cluster
    • Spark Standalone
    • Hadoop YARN
    • Apache Mesos

Agenda - Days 2-5

  1. Apache Spark 2
    • Spark SQL (incl. Structured Streaming)
    • Spark MLlib
    • Spark Streaming
  2. DC/OS & Apache Mesos
  3. Apache Kafka
  4. PostgreSQL
  5. Spark JobServer
  6. Hadoop YARN

Prerequisites

  1. Some programming experience in a modern programming language (preferably one on the JVM)
    • Scala, Python, Java, F#
  2. Installed
  3. Downloaded
  4. Willingness to ask PLENTY of questions

Questions?