Day 2 / May 10 (Tue)¶
Introduction to Hadoop YARN¶
Read the following documents. Get familiar with the basics.
Exercise: Spark on YARN¶
- Read Running Spark on YARN
- Take the Spark SQL application that you created yesterday (the one that loads CSV files from an HDFS directory) and deploy it to your local Hadoop YARN cluster
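As a starting point for the deployment step, here is a minimal sketch that assembles a `spark-submit` command line targeting YARN. The options `--master yarn`, `--deploy-mode` and `--class` are standard `spark-submit` options. The jar path, main class name and HDFS directory are hypothetical placeholders for whatever your own application uses, and the sketch assumes `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points at your cluster's client configuration.

```python
import shlex


def yarn_submit_command(app_jar, main_class, input_dir, deploy_mode="cluster"):
    """Build the spark-submit argument list for a YARN deployment.

    deploy_mode is "cluster" (driver runs inside YARN) or
    "client" (driver runs on the submitting machine).
    """
    if deploy_mode not in ("cluster", "client"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,     # application jar comes after all spark-submit options
        input_dir,   # passed through to the application as its first argument
    ]


# Hypothetical names: replace with your own jar, main class and HDFS path.
cmd = yarn_submit_command(
    app_jar="target/csv-loader.jar",
    main_class="CsvLoader",
    input_dir="hdfs:///user/me/csv",
)

# Print the command so it can be inspected or pasted into a terminal.
print(shlex.join(cmd))
```

To launch it from Python instead of pasting it into a shell, the list can be handed to `subprocess.run(cmd, check=True)`. Building the arguments as a list avoids quoting problems with paths that contain spaces.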
Code Review¶
- https://github.com/JKulczynski/Docker-CommandLine-App
- https://github.com/rafalkac02/directory-traverser
(optional) Exercise: Spark on YARN on Docker¶
- Read Launching Applications Using Docker Containers
- Deploy the Spark SQL application to the Hadoop YARN cluster on Docker
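For the Docker variant, the deployment can be sketched the same way, with extra `--conf` settings that ask YARN to run the application master and the executors in Docker containers. The environment variables `YARN_CONTAINER_RUNTIME_TYPE` and `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE` come from the Hadoop documentation on launching applications in Docker containers. The image name, jar, main class and HDFS path below are hypothetical, and the sketch assumes the NodeManagers are already configured to allow the Docker runtime and to trust the image's registry.

```python
import shlex


def docker_runtime_conf(image):
    """Return --conf arguments that request the YARN Docker runtime
    for both the application master and the executors."""
    args = []
    for prefix in ("spark.yarn.appMasterEnv", "spark.executorEnv"):
        args += ["--conf", f"{prefix}.YARN_CONTAINER_RUNTIME_TYPE=docker"]
        args += ["--conf", f"{prefix}.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={image}"]
    return args


# Hypothetical image, jar, class and path: substitute your own.
docker_cmd = [
    "spark-submit",
    "--master", "yarn",
    "--deploy-mode", "cluster",
    *docker_runtime_conf("example/spark-runtime:latest"),
    "--class", "CsvLoader",
    "target/csv-loader.jar",
    "hdfs:///user/me/csv",
]

print(shlex.join(docker_cmd))
```

Setting the variables through `spark.yarn.appMasterEnv.*` and `spark.executorEnv.*` keeps the choice of runtime per application, so the same cluster can still run other jobs without Docker.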