Apache Spark™ and Scala
for Experienced Oracle and SQL Developers
5 Days
@jaceklaskowski / StackOverflow / GitHub
Books: Mastering Apache Spark / Spark Structured Streaming
Jacek Laskowski is an independent consultant
Specializing in Spark, Kafka, Kafka Streams, Scala
Development | Consulting | Training
Among contributors to Spark (since 1.6.0)
Contact me at jacek@japila.pl
Follow @JacekLaskowski on Twitter for more #ApacheSpark
Jacek is best known for his GitBooks:
Mastering Apache Spark
Mastering Spark SQL
Spark Structured Streaming
Mastering Kafka Streams
Apache Kafka Notebook
Agenda
Day 1: Just Enough Scala (with IntelliJ IDEA)
Day 2: Foundations of Spark SQL
Day 3: Aggregations and Joins
Day 4: Advanced Apache Spark and Monitoring
Day 5: Advanced Spark SQL and Spark MLlib
Day 1 — Just Enough Scala
With IntelliJ IDEA
Scala — Just Enough to Develop Spark Applications
Getting familiar with the syntax and Scala REPL
  sbt console
My First Scala Standalone Application
  IntelliJ IDEA, sbt package and spark-submit
Running Scala applications using java -jar
  sbt-assembly plugin
Example: Changing column names in Dataset
  Seq.foldLeft and Dataset.withColumnRenamed
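The column-renaming example can be sketched roughly as follows. This is a minimal sketch, assuming Spark SQL is on the classpath (for instance in an sbt project that depends on spark-sql). The object name RenameColumns, the helper renameAll and the sample column names are hypothetical and used only for illustration. foldLeft starts from the original Dataset and applies withColumnRenamed once per (old, new) pair.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, for illustration only.
object RenameColumns {

  // Fold over the (oldName, newName) pairs: the accumulator is the
  // DataFrame renamed so far, and each step renames one more column.
  def renameAll(df: DataFrame, renames: Seq[(String, String)]): DataFrame =
    renames.foldLeft(df) { case (acc, (from, to)) =>
      acc.withColumnRenamed(from, to)
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for a classroom demo.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rename-columns")
      .getOrCreate()
    import spark.implicits._

    // Sample data with made-up column names.
    val df = Seq((1, "one"), (2, "two")).toDF("id", "name")

    val renamed = renameAll(df, Seq("id" -> "key", "name" -> "label"))
    renamed.printSchema()

    spark.stop()
  }
}
```

Note that withColumnRenamed is a no-op when the schema has no column with the existing name, so a misspelled name in the list is skipped silently rather than raising an error.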
Prerequisites
Be prepared to get the most out of the workshop
Prerequisites / Programming Experience
Some programming experience using a modern programming language, e.g. Scala, Python, Java, F#
Prerequisites / To Be Installed
Java Platform, Standard Edition (Java SE) 8
IntelliJ IDEA Community Edition with Scala plugin
sbt
Prerequisites / To Be Downloaded
The latest version of Apache Spark
In-Class Preparations
Make the Instructor's Life Slightly Easier. Thanks!
Introduce Yourself
First name
What do you expect from the workshop?
Where do you want to be with Spark after 5 days?
Addendum
Write your name on a piece of paper and put it in front of you (or stick it to your laptop)
Is lunch at 12:45pm OK?