
Day 3 / Apr 21 (Thu)

Continuing our journey into Apache Kafka. Today we're using Spark SQL and Spark Structured Streaming to process records from Kafka topics.

Morning Exercise

  1. Exercise: Word Count Per Record
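
A minimal sketch of one reading of the exercise (count the words in every record's value), assuming the broker on :9092 used elsewhere on this page, the topic name in args(0), and a made-up object name WordCountPerRecord:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object WordCountPerRecord extends App {
  val spark = SparkSession.builder
    .appName("Word Count Per Record")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // Read the input topic (name given as the first argument; broker on :9092 assumed)
  val records = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", ":9092")
    .option("subscribe", args(0))
    .load()

  // Per-record word count: split every value on whitespace and count the tokens
  val counts = records
    .select($"value".cast("string") as "line")
    .withColumn("word_count", size(split($"line", "\\s+")))

  counts.writeStream
    .format("console")
    .start()
    .awaitTermination()
}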

Theory

Structured Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)

kafka-console-consumer with options to print record keys and deserialize integer values:

./bin/kafka-console-consumer.sh \
    --property print.key=true \
    --property key.separator=" -> " \
    --bootstrap-server :9092 \
    --topic output \
    --from-beginning \
    --value-deserializer org.apache.kafka.common.serialization.IntegerDeserializer
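
The print.key and key.separator properties make the consumer display keys next to values. The --value-deserializer option matters here because the console consumer renders values as strings by default, so integer-encoded values (for example, word counts) would print as unreadable bytes without org.apache.kafka.common.serialization.IntegerDeserializer.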

Practice

  1. Write a Spark application that loads Kafka records (from a topic given by args(0)) and displays them to the console (a sketch covering both parts follows this list)

    1. Create a brand new project in IntelliJ IDEA
    2. Push the project to GitHub

    Part 1. Spark SQL and show records (using DataFrame.show)

    Part 2. Spark Structured Streaming and show records (using format("console"))

  2. Modify the above Spark application to accept two command-line arguments, topicIn and topicOut, naming the topics to load records from and save them to, respectively. The application should change record values to their UPPERCASE variants (see the second sketch after this list).

    Push the project to GitHub once finished or at the end of the day (whichever happens earlier). Report it on Slack.
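
A minimal sketch of exercise 1, assuming a broker on :9092 and the topic name in args(0); the object name ShowKafkaRecords is made up. Part 1 reads the topic as a batch DataFrame and calls show, while Part 2 runs the same read as a streaming query with the console sink:

import org.apache.spark.sql.SparkSession

object ShowKafkaRecords extends App {
  val topic = args(0)
  val spark = SparkSession.builder
    .appName("Show Kafka Records")
    .master("local[*]")
    .getOrCreate()

  // Part 1: Spark SQL (batch) -- read everything currently in the topic and show it
  spark.read
    .format("kafka")
    .option("kafka.bootstrap.servers", ":9092")
    .option("subscribe", topic)
    .load()
    .show(truncate = false)

  // Part 2: Spark Structured Streaming -- print new records as they arrive
  spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", ":9092")
    .option("subscribe", topic)
    .load()
    .writeStream
    .format("console")
    .start()
    .awaitTermination()
}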
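
And a sketch of exercise 2 under the same assumptions. The Kafka sink needs a string (or binary) value column and, for streaming queries, a checkpoint location; the path below is only an example:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object UppercaseKafkaRecords extends App {
  val Array(topicIn, topicOut) = args
  val spark = SparkSession.builder
    .appName("Uppercase Kafka Records")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", ":9092")
    .option("subscribe", topicIn)
    .load()
    // Kafka values arrive as binary; cast to string before transforming
    .select(upper($"value".cast("string")) as "value")
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", ":9092")
    .option("topic", topicOut)
    .option("checkpointLocation", "/tmp/uppercase-checkpoint") // required by the Kafka sink; example path
    .start()
    .awaitTermination()
}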

Learning Resources

  1. Structured Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)
  2. How to use the console consumer to read non-string primitive keys and values