Day 3 / Apr 21 (Thu)
Continuing our journey into Apache Kafka. Today we're using Spark SQL and Spark Structured Streaming to process records from Kafka topics.
Morning Exercise
Theory
Structured Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)
Run `kafka-console-consumer` with a few options:
./bin/kafka-console-consumer.sh \
--property print.key=true \
--property key.separator=" -> " \
--bootstrap-server :9092 \
--topic output \
--from-beginning \
--value-deserializer org.apache.kafka.common.serialization.IntegerDeserializer
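The consumer above expects Integer-encoded values, so to have something to read you need records produced with a matching serializer. A minimal sketch using the plain Kafka clients API (the topic name `output` matches the command above; the broker address and object name are illustrative, and the `kafka-clients` dependency must be on the classpath):

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ProduceInts extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  // IntegerSerializer matches the IntegerDeserializer used by the console consumer
  props.put("value.serializer", "org.apache.kafka.common.serialization.IntegerSerializer")

  val producer = new KafkaProducer[String, Integer](props)
  (1 to 5).foreach { n =>
    producer.send(new ProducerRecord[String, Integer]("output", s"key-$n", n))
  }
  producer.close()
}
```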
Practice
1. Write a Spark application that loads Kafka records (from a topic given by `args(0)`) and displays them to the console.
    - Create a brand new project in IntelliJ IDEA
    - Push the project to GitHub
    - Part 1. Use Spark SQL and show the records (using `DataFrame.show`)
    - Part 2. Use Spark Structured Streaming and show the records (using `format("console")`)
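One possible shape for the two parts, assuming a local broker at `localhost:9092` (the object name and `local[*]` master are illustrative, and the `spark-sql-kafka-0-10` connector must be on the classpath):

```scala
import org.apache.spark.sql.SparkSession

object KafkaShow extends App {
  val topic = args(0)
  val spark = SparkSession.builder
    .appName("KafkaShow")
    .master("local[*]")
    .getOrCreate()

  // Part 1: Spark SQL (batch) — read what is currently in the topic and show it
  val batch = spark.read
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", topic)
    .load()
  // key and value arrive as binary, so cast them to strings for display
  batch.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show(truncate = false)

  // Part 2: Structured Streaming — print new records to the console as they arrive
  spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", topic)
    .load()
    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .writeStream
    .format("console")
    .start()
    .awaitTermination()
}
```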
2. Modify the above Spark application to accept two command-line arguments, `topicIn` and `topicOut`, to load records from and save them to, respectively. The application should change record values to their UPPERCASE variant.
3. Push the project to GitHub once finished or at the end of the day (whichever comes first). Report it on Slack.