Day 1 / May 9 (Mon)
Introduction to Apache Hadoop 3.3.2
Read the following documents. Get familiar with the basics.
- Apache Hadoop
- Release 3.3.2 available
- Hadoop Commands Guide
- FileSystem Shell
- The Hadoop FileSystem API Definition
Exercise: Setting Up a Hadoop Cluster
Follow the guide Hadoop: Setting up a Single Node Cluster, which shows you how to set up a single-node Hadoop installation.
We are interested in Pseudo-Distributed Mode.
Please note that you should download a binary distribution (e.g., hadoop-3.3.2.tar.gz).
Code Review
Introduction to HDFS
Read the following documents:
Exercise: Spark SQL and HDFS
Create a Spark SQL application that loads CSV files from an HDFS directory.
- Use an hdfs:// URI
- Review "Load Spark data locally Incomplete HDFS URI" et al.
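To see why the shape of an hdfs:// URI matters, here is a small sketch that only uses java.net.URI (no Hadoop on the classpath) to show which host and port get parsed out of a few URI variants. It assumes that the "Incomplete HDFS URI" error in the question above comes from a URI with no host part, where the client then has to fall back to fs.defaultFS. `HdfsUriCheck` and `nameNode` are hypothetical names, not Hadoop APIs.

```scala
import java.net.URI

// Sketch: shows which host/port java.net.URI extracts from HDFS-style URIs.
// HdfsUriCheck and nameNode are illustrative names, not Hadoop APIs.
object HdfsUriCheck {

  // Some((host, port)) if the URI has a host part, None otherwise.
  // getPort returns -1 when the URI gives no port.
  def nameNode(uri: String): Option[(String, Int)] = {
    val u = new URI(uri)
    Option(u.getHost).map(host => (host, u.getPort))
  }

  def main(args: Array[String]): Unit = {
    val examples = Seq(
      "hdfs://localhost:9000/files/", // host and port given explicitly
      "hdfs:///files/",               // no host: depends on fs.defaultFS
      "hdfs://files/"                 // "files" is parsed as the host name
    )
    examples.foreach { s =>
      nameNode(s) match {
        case Some((host, port)) =>
          val p = if (port == -1) "no port given" else s"port $port"
          println(s"$s -> host '$host', $p")
        case None =>
          println(s"$s -> no host")
      }
    }
  }
}
```

The last case is the easy one to miss: with only two slashes, the directory name silently becomes the NameNode host. Spelling out the full hdfs://localhost:9000/... URI (as in the tips) avoids depending on how fs.defaultFS is configured for Spark.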
Tips
./sbin/start-dfs.sh
./bin/hdfs dfs -mkdir /files
./bin/hdfs dfs -put README.txt /files/
./bin/hdfs dfs -ls /files
spark.read.text("hdfs://localhost:9000/files/").show
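The last tip reads the files as plain text in spark-shell. For the exercise itself, a minimal sketch of a standalone Spark SQL application that loads CSV files might look like the following. It assumes the NameNode listens on hdfs://localhost:9000 (as in the tips) and that the CSV files have a header row. The object name `CsvFromHdfs` and the helper `loadCsv` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch of a Spark SQL application that loads every CSV file in a directory.
// Assumptions: NameNode at localhost:9000, CSV files with a header row.
// CsvFromHdfs and loadCsv are illustrative names.
object CsvFromHdfs {

  // Loads all files under `dir` as CSV into one DataFrame.
  def loadCsv(spark: SparkSession, dir: String): DataFrame =
    spark.read
      .option("header", "true")      // first line holds the column names
      .option("inferSchema", "true") // extra pass over the data to guess types
      .csv(dir)

  def main(args: Array[String]): Unit = {
    // Directory URI; pass another one as the first argument if needed.
    val dir = args.headOption.getOrElse("hdfs://localhost:9000/files/")

    // The master URL is expected to come from spark-submit (--master).
    val spark = SparkSession.builder()
      .appName("CsvFromHdfs")
      .getOrCreate()

    try {
      val df = loadCsv(spark, dir)
      df.printSchema()
      df.show()
    } finally {
      spark.stop()
    }
  }
}
```

Once packaged, it could be submitted with something like `./bin/spark-submit --class CsvFromHdfs --master "local[*]" <your-app>.jar hdfs://localhost:9000/csv/`, where the jar name and the /csv directory are placeholders. Keep the CSV files in a directory of their own: `csv` reads every file in the directory, so the README.txt uploaded to /files/ above would be parsed as CSV too.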