# Kafka Consumers, Consumer Groups, and Partition Rebalancing
@jaceklaskowski / StackOverflow / GitHub / LinkedIn

The "Internals" Books: books.japila.pl
## Agenda

1. [Kafka Consumers](#/kafka-consumers)
1. [Consumer Groups](#/consumer-groups)
1. [Partition Rebalancing](#/partition-rebalancing)
## Kafka Consumers

* **Kafka Consumer** is an independent Kafka client that consumes records from a Kafka cluster
* Uses the [KafkaConsumer](https://kafka.apache.org/27/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html) API
* **Subscribes** to one or many topics
* Handles failures of Kafka brokers and adapts as topic partitions migrate within the cluster
* Maintains TCP connections to the necessary brokers to fetch data
* Allows groups of consumers to load balance consumption using **consumer groups** (discussed next)
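The settings a consumer typically starts from can be sketched with plain `java.util.Properties`. This is a minimal sketch: the broker address, group name, topic name, and the choice of string deserializers are placeholder assumptions, and the class `ConsumerConfigSketch` is hypothetical. The commented lines at the end show roughly how the properties would be handed to `KafkaConsumer` when the `kafka-clients` library is on the classpath.

```java
import java.util.Properties;

// Hypothetical helper class; only the property names come from Kafka itself.
class ConsumerConfigSketch {

    static Properties consumerProperties(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        // Initial brokers used to discover the rest of the cluster
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Consumers that share this value form one consumer group
        props.setProperty("group.id", groupId);
        // How record keys and values are converted from bytes (assumed: strings)
        props.setProperty("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values, not taken from the slides
        Properties props = consumerProperties("localhost:9092", "demo-group");
        props.forEach((key, value) -> System.out.println(key + " = " + value));

        // With kafka-clients available, the next steps would look roughly like:
        //   KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        //   consumer.subscribe(List.of("demo-topic"));
        //   ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    }
}
```

Keeping the configuration in one method makes it easy to start several consumers with the same `group.id`, which is what turns them into a consumer group.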
## Consumer Groups (1 of 2)

* **Consumer Group** is a group of Kafka consumers that divide the work of consuming and processing records among themselves
* Conceptually, a consumer group is a single logical subscriber that happens to be made up of multiple processes (consumers)
* Made up of Kafka consumers configured with the same **group.id**
* Each partition (of subscribed topics) is assigned to **exactly one consumer** in a consumer group
* Provides scalability and fault tolerance for processing
## Consumer Groups (2 of 2)

* Membership in a consumer group is maintained dynamically
* If a member fails or a new consumer joins the group, the partitions are reassigned among the group members (through **partition rebalancing**, discussed next)
* Members can either be running on the same machine or be distributed over many machines
* Any number of consumer groups can subscribe to a given topic (without duplicating data)
* Within a group, semantics are similar to a queue in traditional messaging systems
* Record delivery is balanced among the members of a group
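The difference between delivery within a group and delivery across groups can be illustrated with a small in-memory simulation. It is only a sketch under stated assumptions: no Kafka API is involved, the class `GroupDeliverySketch` and the group and member names are hypothetical, and both the record-to-partition rule and the partition-to-member rule are simple modulo placeholders rather than Kafka's actual partitioner or assignors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; it does not talk to a Kafka cluster.
class GroupDeliverySketch {

    // Returns which member of ONE group processes which record.
    static Map<String, List<String>> deliver(List<String> records, int partitions,
                                             List<String> members) {
        Map<String, List<String>> received = new LinkedHashMap<>();
        for (String member : members) {
            received.put(member, new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            int partition = i % partitions;                          // assumed placement rule
            String owner = members.get(partition % members.size());  // one owner per partition
            received.get(owner).add(records.get(i));
        }
        return received;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            records.add("r" + i);
        }
        // Two independent groups read the same topic (4 partitions)
        System.out.println("billing: " + deliver(records, 4, List.of("b1", "b2")));
        System.out.println("audit:   " + deliver(records, 4, List.of("a1")));
    }
}
```

Each group as a whole sees all eight records, while inside the two-member group every record goes to exactly one member.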
## Partition Rebalancing (1 of 2)

* **Partition Rebalancing** (aka **rebalancing a group**) is the process of balancing the partitions (of subscribed topics) between members of a consumer group
* E.g. with a topic of 4 partitions and a consumer group of 2 consumers, each consumer consumes from 2 partitions
* Also triggered when new partitions are added to a subscribed topic or when a new topic matching a subscribed regex is created
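The 4-partition example can be worked through in a few lines. The sketch below uses a simplified round-robin rule and is not a reproduction of Kafka's built-in assignors; the class `RebalanceSketch` and the member names `c1`, `c2`, `c3` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a rebalance: recompute the assignment for the current members.
class RebalanceSketch {

    static Map<String, List<Integer>> assign(int partitionCount, List<String> members) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            // Simplified round-robin: every partition gets exactly one owner
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 4 partitions, 2 consumers: 2 partitions each
        System.out.println(assign(4, List.of("c1", "c2")));       // {c1=[0, 2], c2=[1, 3]}
        // A third consumer joins: the partitions are spread over 3 members
        System.out.println(assign(4, List.of("c1", "c2", "c3"))); // {c1=[0, 3], c2=[1], c3=[2]}
    }
}
```

The second call shows why a membership change forces a rebalance: the old assignment no longer covers the new member, so ownership has to be recomputed for the whole group.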
## Partition Rebalancing (2 of 2)

* Changes in group membership or topic subscription will automatically be detected through periodic metadata refreshes
* When group reassignment happens automatically, consumers can be notified through a [ConsumerRebalanceListener](https://kafka.apache.org/33/javadoc/org/apache/kafka/clients/consumer/ConsumerRebalanceListener.html) so they can finish necessary application-level logic (e.g. state cleanup, manual offset commits)
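The callback flow can be sketched without the Kafka library. The interface `RebalanceCallbacks` below is a hypothetical stand-in that mirrors the two main callbacks of `ConsumerRebalanceListener` (`onPartitionsRevoked` and `onPartitionsAssigned`), with plain partition numbers in place of `TopicPartition`. The `rebalance` method is also an assumption: it revokes everything a consumer owned and then hands over the new set, which simplifies what a real group does.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class RebalanceListenerSketch {

    // Hypothetical stand-in; the real interface takes Collection<TopicPartition>.
    interface RebalanceCallbacks {
        void onPartitionsRevoked(Collection<Integer> partitions);
        void onPartitionsAssigned(Collection<Integer> partitions);
    }

    // Records what an application would do at each step.
    static class LoggingCallbacks implements RebalanceCallbacks {
        final List<String> log = new ArrayList<>();

        @Override
        public void onPartitionsRevoked(Collection<Integer> partitions) {
            // Last chance to commit offsets and clean up state for these partitions
            log.add("commit offsets and clean up state for " + partitions);
        }

        @Override
        public void onPartitionsAssigned(Collection<Integer> partitions) {
            // Prepare state for the partitions this consumer now owns
            log.add("initialize state for " + partitions);
        }
    }

    // Simplified rebalance as seen by one consumer: revoke the old set, assign the new one.
    static void rebalance(RebalanceCallbacks callbacks,
                          Collection<Integer> before, Collection<Integer> after) {
        callbacks.onPartitionsRevoked(before);
        callbacks.onPartitionsAssigned(after);
    }

    public static void main(String[] args) {
        LoggingCallbacks callbacks = new LoggingCallbacks();
        rebalance(callbacks, List.of(0, 2), List.of(0, 3));
        callbacks.log.forEach(System.out::println);
    }
}
```

With the real client, a listener of this shape would be passed as the second argument to `KafkaConsumer.subscribe`, next to the collection of topic names.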
## Recap

1. [Kafka Consumers](#/kafka-consumers)
1. [Consumer Groups](#/consumer-groups)
1. [Partition Rebalancing](#/partition-rebalancing)
# Questions?

* Read [The Internals of Apache Kafka](https://books.japila.pl/kafka-internals/)
* Read [The Internals of Kafka Streams](https://books.japila.pl/kafka-streams-internals)
* Read [The Internals of ksqlDB](https://books.japila.pl/ksqldb-internals/)
* Follow [@jaceklaskowski](https://twitter.com/jaceklaskowski) on Twitter (DMs open)
* Upvote [my questions and answers on StackOverflow](http://stackoverflow.com/users/1305344/jacek-laskowski)
* Contact me at **jacek@japila.pl**