Develop a standalone Spark Structured Streaming application (using IntelliJ IDEA) that uses Kafka as the data source and sink.
Please read KafkaSource to learn about the data source and what is required to have it available in Spark Structured Streaming applications.
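For reference, a minimal sketch of such an application could look as follows. The object name KafkaStreamingApp, the topic names input and output, the localhost:9092 address, and the checkpoint directory are assumptions for illustration, not requirements of the exercise:

import org.apache.spark.sql.SparkSession

object KafkaStreamingApp extends App {

  val spark = SparkSession.builder
    .appName("KafkaStreamingApp")
    .getOrCreate()

  // Read records from the "input" Kafka topic
  val input = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "input")
    .load()

  // The kafka sink expects a "value" column ("key" is optional),
  // so cast the binary columns to strings
  val output = input.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

  // Write records to the "output" Kafka topic;
  // a checkpoint location is mandatory for the kafka sink
  val query = output.writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "output")
    .option("checkpointLocation", "target/checkpoint")
    .start()

  query.awaitTermination()
}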
Use sbt package and spark-submit to run the application.
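Note that the Kafka source and sink live in the external spark-sql-kafka-0-10 module, which is not on Spark's classpath by default. One way to pull it in at submit time is the --packages option. A sketch, assuming the project and class names above; the version numbers are assumptions and should match your Spark installation:

// build.sbt -- spark-sql is Provided because spark-submit supplies it at runtime
name := "spark-streams-kafka"
version := "0.1"
scalaVersion := "2.12.18"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.0" % Provided

./bin/spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 \
  --class KafkaStreamingApp \
  target/scala-2.12/spark-streams-kafka_2.12-0.1.jar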
Module: Spark Structured Streaming
Duration: 45 mins
You will be using Apache Kafka as an external data source. Please download and install it first.
You can run a single-broker Kafka cluster using the following commands (in different consoles):
./bin/zookeeper-server-start.sh config/zookeeper.properties
./bin/kafka-server-start.sh config/server.properties
With the single-broker Kafka cluster up and running, use the Kafka Console Producer and Consumer tools to produce and consume messages from a Kafka topic.
./bin/kafka-console-producer.sh --broker-list :9092 --topic input
./bin/kafka-console-consumer.sh --bootstrap-server :9092 --topic input
The exercise comes in two variants:
kafka streaming source and console streaming sink
kafka streaming source and sink
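For the first variant, a minimal sketch could look like this (reusing the SparkSession from the application above; the input topic matches the console producer):

// Kafka streaming source to console streaming sink
val query = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "input")
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .option("truncate", false) // show full record values in the console
  .start()

query.awaitTermination()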