Day 1 / May 9 (Mon)

Introduction to Apache Hadoop 3.3.2

Read the following documents. Get familiar with the basics.

Hadoop: Setting up a Single Node Cluster, which shows you how to set up a single-node Hadoop installation.

Please note that you should download a binary distribution (e.g., hadoop-3.3.2.tar.gz), not the source release (hadoop-3.3.2-src.tar.gz).

Read the following documents:

Create a Spark SQL application that loads CSV files from an HDFS directory

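# Start the HDFS daemons (run from the Hadoop installation directory)
./sbin/start-dfs.sh

# Create a directory in HDFS and upload a sample file
./bin/hdfs dfs -mkdir /files
./bin/hdfs dfs -put README.txt /files/

# Verify the upload
./bin/hdfs dfs -ls /files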

// In spark-shell (Scala): read every file under /files as lines of text
spark.read.text("hdfs://localhost:9000/files/").show
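
The commands above upload README.txt, while the exercise asks for CSV files. The following is a minimal sketch of how a small CSV file might be staged in HDFS for that purpose. The local directory, the file name people.csv, the sample rows, and the HDFS path /files/csv are all hypothetical and chosen only for illustration. The upload step is skipped unless the script is run from the Hadoop installation directory.

```shell
#!/bin/sh
# Sketch only: stage a tiny CSV file for the Spark SQL exercise.
# The local directory, file name, and HDFS path /files/csv are hypothetical.
set -eu

demo_dir="${TMPDIR:-/tmp}/csv-demo"
mkdir -p "$demo_dir"

# One header row followed by two data rows (made-up sample data).
printf 'id,name\n1,alice\n2,bob\n' > "$demo_dir/people.csv"

# Upload only when run from the Hadoop installation directory
# (this assumes HDFS has already been started with ./sbin/start-dfs.sh).
if [ -x ./bin/hdfs ]; then
  ./bin/hdfs dfs -mkdir -p /files/csv
  ./bin/hdfs dfs -put -f "$demo_dir/people.csv" /files/csv/
  ./bin/hdfs dfs -ls /files/csv
else
  echo "bin/hdfs not found; run this from the Hadoop installation directory" >&2
fi
```

In spark-shell, the staged directory could then be read with spark.read.option("header", "true").csv("hdfs://localhost:9000/files/csv/").show, assuming the NameNode listens on localhost:9000 as in the command above.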