Develop a standalone Spark Structured Streaming application (using IntelliJ IDEA) that uses Kafka as the data source and sink.
Please read KafkaSource to learn about the data source and what is required to have it available in Spark Structured Streaming applications.
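In short, the Kafka source and sink ship in the external spark-sql-kafka-0-10 module, which is not on Spark's classpath by default, so the application has to declare it as a library dependency. A minimal build.sbt sketch, assuming Spark 3.5 and Scala 2.12 (the versions are assumptions; adjust them to your environment):

name := "kafka-streaming-app"
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.5.0" % Provided,
  // the Kafka streaming source and sink live in this external module
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.5.0" % Provided
)

Marking the Kafka module Provided keeps the jar produced by sbt package thin; the module is then supplied at runtime with spark-submit --packages (see the next step).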
Use sbt package and spark-submit to run the application.
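For example (a sketch; the class name, artifact name, and versions are assumptions matching the build.sbt above — run spark-submit from $SPARK_HOME):

sbt package
./bin/spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 \
  --class KafkaStreamingApp \
  target/scala-2.12/kafka-streaming-app_2.12-0.1.0-SNAPSHOT.jar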
Module: Spark Structured Streaming
Duration: 45 mins
You will be using Apache Kafka as an external data source and sink. Please download and install it first.
You can run a single-broker Kafka cluster using the following commands (in different consoles):
./bin/zookeeper-server-start.sh config/zookeeper.properties
./bin/kafka-server-start.sh config/server.properties
With the single-broker Kafka cluster up and running, use the Kafka Console Producer and Consumer tools to produce and consume messages from a Kafka topic.
./bin/kafka-console-producer.sh --broker-list :9092 --topic input
./bin/kafka-console-consumer.sh --bootstrap-server :9092 --topic input
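With the producer and consumer working, develop the application itself. A minimal sketch of a solution, assuming the local broker on :9092 and the topics input and output (the output topic name and checkpoint directory are assumptions; change them as you see fit):

import org.apache.spark.sql.SparkSession

object KafkaStreamingApp extends App {

  val spark = SparkSession.builder
    .appName("Kafka Streaming Source and Sink")
    .getOrCreate()

  // Kafka streaming source: every record arrives with binary key and value columns
  val input = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", ":9092")
    .option("subscribe", "input")
    .load()

  // the Kafka sink expects a value column (key is optional), so cast the binary columns to strings
  val lines = input.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

  // Kafka streaming sink: a checkpoint location is mandatory for fault tolerance
  val query = lines.writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", ":9092")
    .option("topic", "output")
    .option("checkpointLocation", "target/checkpoint")
    .start()

  query.awaitTermination()
}

Messages typed into the console producer on the input topic should then appear on the output topic, which you can watch with the console consumer (--topic output).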