Kafka Data Source

Kafka Data Source allows Spark SQL (and Spark Structured Streaming) to load data from and write data to topics in Apache Kafka.

Kafka Data Source is available as kafka format.

The entry point is KafkaSourceProvider.

Note

Apache Kafka stores streams of records durably and fault-tolerantly, in a format-independent way.

Learn more about Apache Kafka in the official documentation or in Mastering Apache Kafka.

Kafka Data Source supports options to fine-tune structured queries.
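As an illustration, a batch query can be fine-tuned with options such as startingOffsets and endingOffsets. The sketch below uses the standard Kafka source options; the broker address and topic name are placeholders.

```scala
// A sketch of option usage for a batch query over a Kafka topic.
// "localhost:9092" and "events" are placeholder values.
val tuned = spark
  .read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // required
  .option("subscribe", "events")                       // required (or assign / subscribePattern)
  .option("startingOffsets", "earliest")               // where to start reading
  .option("endingOffsets", "latest")                   // where to stop (batch queries only)
  .load
```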

Reading Data from Kafka Topics

In order to load Kafka records, use kafka as the input data source format. Note that the kafka.bootstrap.servers option and a topic-subscription option (subscribe, subscribePattern or assign) are required; without them the query fails at analysis time.

val records = spark
  .read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topic1")
  .load

Alternatively, use the fully-qualified class name org.apache.spark.sql.kafka010.KafkaSourceProvider (the same required options apply).

val records = spark
  .read
  .format("org.apache.spark.sql.kafka010.KafkaSourceProvider")
  .load
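Records loaded from Kafka arrive with the source's fixed schema, in which key and value are binary columns, so a typical next step is to cast them to the payload type. A minimal sketch for text payloads:

```scala
// key and value come in as BINARY; cast them to STRING for text payloads.
// The remaining columns are Kafka record metadata.
val decoded = records.selectExpr(
  "CAST(key AS STRING)",
  "CAST(value AS STRING)",
  "topic", "partition", "offset", "timestamp")
```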

Writing Data to Kafka Topics

In order to save a DataFrame to Kafka topics, use kafka as the output data source format.

import org.apache.spark.sql.DataFrame
val records: DataFrame = ...
records.write.format("kafka").save
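For the write to succeed, the DataFrame must have a value column (key and topic columns are optional), and the sink needs a broker address plus a target topic (either a topic option or a topic column). A sketch, assuming hypothetical source columns id and payload and placeholder broker/topic names:

```scala
// The Kafka sink expects a value column; key and topic columns are optional.
// "id" and "payload" are assumed input columns; "localhost:9092" and "events"
// are placeholder values.
val out = records.selectExpr(
  "CAST(id AS STRING) AS key",
  "CAST(payload AS STRING) AS value")

out
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "events") // used when the DataFrame has no topic column
  .save
```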

Last update: 2020-11-09