Apache Kafka

The Essentials

@jaceklaskowski / StackOverflow / GitHub / LinkedIn
The "Internals" Books: books.japila.pl

Records

  1. Record (aka message or event) is the unit of data in Kafka
  2. Array of bytes (in no particular format)
  3. Record has a key and a value
    • Both could be null
  4. Records are categorized into topics
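As a rough illustration of the points above, a record can be modelled as a pair of optional byte strings. The `Record` class below is a hypothetical stand-in, not a type from any Kafka client library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Toy stand-in for a Kafka record: opaque bytes, both parts optional."""
    key: Optional[bytes]
    value: Optional[bytes]


# Kafka does not interpret the bytes; the encoding is the application's choice.
order = Record(key="user-42".encode("utf-8"), value=b'{"total": 9.99}')
keyless = Record(key=None, value=b"heartbeat")   # a record without a key
null_value = Record(key=b"user-42", value=None)  # a record with a null value
```

How the bytes are produced (JSON, Avro, plain strings) is decided by the serializers the application picks.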

Topics

  1. Records are categorized into topics
    • Think of a table or a directory
  2. Producers publish messages to topics while consumers consume them
  3. Topics are partitioned
    • A topic is a namespace for one or more partitions
  4. The kafka-topics shell script manages Kafka topics

Partitions

  1. Topics are partitioned into one or more partitions
  2. Partitions hold zero, one or many records
  3. Ordered (by offsets) immutable sequence of records
  4. A partition is a single ordered log
  5. Stored durably on disk
  6. Records are added to partitions in append-only fashion
  7. Partitions are replicated among brokers as replicas
  8. In-sync replicas (ISRs)
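The append-only, offset-ordered nature of a partition can be sketched with a small in-memory model. `PartitionLog` is a hypothetical name, and the sketch ignores durability, replication and retention entirely.

```python
from typing import List, Optional, Tuple

Entry = Tuple[Optional[bytes], Optional[bytes]]  # (key, value)


class PartitionLog:
    """Toy append-only log: records are only ever added at the end."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def append(self, key: Optional[bytes], value: Optional[bytes]) -> int:
        """Append a record and return the offset it was assigned."""
        offset = len(self._entries)  # next sequential position
        self._entries.append((key, value))
        return offset

    def read(self, offset: int) -> Entry:
        """Return the record stored at the given offset."""
        if offset < 0 or offset >= len(self._entries):
            raise IndexError(f"offset {offset} out of range")
        return self._entries[offset]

    def __len__(self) -> int:
        return len(self._entries)


log = PartitionLog()
first = log.append(None, b"a")    # offset 0
second = log.append(b"k", b"b")   # offset 1
```

There is deliberately no method for updating or removing a record in place: existing entries never change, which is what makes the sequence immutable.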

Replicas and In-Sync Replicas

  1. Replica is a copy of a partition
  2. Replication factor is the number of replicas of each partition of a topic
    • There can be one or many replicas
    • Allows for automatic failover when a broker fails
  3. One replica is the leader while others are followers
    • The leader handles writes from producers (and, by default, reads from consumers), while the followers merely copy the leader's log
  4. In-Sync Replica is a replica that has caught up with the leader closely enough to be eligible for partition leader election
  5. Use kafka-topics --describe to list the details of a topic (incl. replicas and in-sync replicas)
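A minimal sketch of how in-sync replicas relate to failover, under a loud simplification: lag is measured here in records against a hypothetical `max_lag` threshold, whereas a real broker decides ISR membership by time (the `replica.lag.time.max.ms` setting) and leader election is handled by the cluster controller. `ReplicatedPartition` and its fields are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReplicatedPartition:
    leader: int               # broker id of the current leader
    log_end: Dict[int, int]   # broker id -> number of records it holds
    max_lag: int = 2          # hypothetical threshold, in records

    def in_sync_replicas(self) -> List[int]:
        """Replicas whose log is within max_lag records of the leader's."""
        leader_end = self.log_end[self.leader]
        return sorted(
            broker for broker, end in self.log_end.items()
            if leader_end - end <= self.max_lag
        )

    def fail_leader(self) -> int:
        """Drop the failed leader and promote another in-sync replica."""
        candidates = [b for b in self.in_sync_replicas() if b != self.leader]
        if not candidates:
            raise RuntimeError("no in-sync replica available for election")
        del self.log_end[self.leader]
        self.leader = candidates[0]
        return self.leader


# Replication factor 3: broker 1 leads, broker 2 is close, broker 3 lags.
partition = ReplicatedPartition(leader=1, log_end={1: 10, 2: 9, 3: 4})
isr_before = partition.in_sync_replicas()  # brokers 1 and 2
new_leader = partition.fail_leader()       # broker 2 takes over
```

Broker 3 is still a replica, but because it is too far behind it is not a candidate when the leader fails.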

Offsets

  1. Offset is a unique sequential numerical position of a record (in a partition of a topic)
    • A message in a partition has a unique offset
  2. Offsets start from 0
  3. Offsets are unique per partition only
    • Not across partitions
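To make the per-partition uniqueness concrete, the toy model below gives every partition its own offset counter, so a record is only identified by the full (topic, partition, offset) triple. The `Topic` class is a hypothetical illustration, not part of any Kafka API.

```python
from typing import List, Tuple


class Topic:
    """Toy topic: a name plus a fixed number of independent partition logs."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, value: bytes) -> Tuple[str, int, int]:
        """Append to one partition; return (topic, partition, offset)."""
        log = self.partitions[partition]
        log.append(value)
        return (self.name, partition, len(log) - 1)


orders = Topic("orders", num_partitions=2)
a = orders.append(0, b"x")  # ("orders", 0, 0)
b = orders.append(1, b"y")  # ("orders", 1, 0): offset 0 again, other partition
c = orders.append(0, b"z")  # ("orders", 0, 1)
```

Both `a` and `b` carry offset 0, which is why an offset alone says nothing without the partition it belongs to.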

Kafka Topics and Partitions

(distributed commit log)

From Official Documentation of Apache Kafka

Kafka Topics and Partitions (cont'd)

From Kafka: The Definitive Guide

Brokers

  1. Kafka Broker is a Kafka server that manages records
    • Receives messages, assigns offsets, and commits messages to storage on disk
  2. Kafka Cluster consists of one or more brokers
    • Uses ZooKeeper as the source of truth for cluster metadata (newer Kafka releases can use KRaft instead)

Producers

  1. Kafka clients that publish records to a Kafka cluster
  2. Send messages to topics
    • Can optionally specify partitions
  3. KafkaProducer API for Java
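How a record ends up in a partition can be sketched as follows. This is a simplified model, not the KafkaProducer implementation: the function name and parameters are hypothetical, CRC32 stands in for the hash that the real default partitioner applies to keys, and keyless records are spread round-robin for simplicity.

```python
import zlib
from typing import Optional


def choose_partition(
    key: Optional[bytes],
    num_partitions: int,
    explicit: Optional[int] = None,
    counter: int = 0,
) -> int:
    """Pick a partition for a record in a topic with num_partitions."""
    if explicit is not None:
        # The producer named a partition itself: use it as-is.
        if not 0 <= explicit < num_partitions:
            raise ValueError(f"partition {explicit} does not exist")
        return explicit
    if key is not None:
        # Same key -> same hash -> same partition (stand-in hash function).
        return zlib.crc32(key) % num_partitions
    # No key: spread records across partitions using a caller-kept counter.
    return counter % num_partitions


p1 = choose_partition(b"user-42", 3)
p2 = choose_partition(b"user-42", 3)              # same partition as p1
pinned = choose_partition(b"user-42", 3, explicit=2)
spread = choose_partition(None, 3, counter=4)
```

Routing equal keys to the same partition is what lets per-partition ordering translate into per-key ordering.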

Consumers

  1. Kafka clients that consume records from a Kafka cluster
  2. Subscribe to receive messages from topics
  3. Read messages in the order they were produced
    • Per partition only
  4. KafkaConsumer API for Java
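The per-partition ordering guarantee can be illustrated with a toy consumer that keeps one read position per partition. `ToyConsumer` and its `poll` signature are assumptions made for this sketch and do not mirror the KafkaConsumer API.

```python
from typing import Dict, List


class ToyConsumer:
    """Reads each partition in offset order, tracking its own position."""

    def __init__(self, partitions: Dict[int, List[bytes]]) -> None:
        self._partitions = partitions
        self._position = {p: 0 for p in partitions}  # next offset to read

    def poll(self, partition: int, max_records: int = 10) -> List[bytes]:
        """Return up to max_records from the partition and advance."""
        start = self._position[partition]
        batch = self._partitions[partition][start:start + max_records]
        self._position[partition] = start + len(batch)
        return batch


consumer = ToyConsumer({0: [b"a", b"b", b"c"], 1: [b"x"]})
batch1 = consumer.poll(0, max_records=2)  # first two records of partition 0
batch2 = consumer.poll(0, max_records=2)  # the remaining record
batch3 = consumer.poll(0)                 # nothing new yet
other = consumer.poll(1)                  # partition 1 has its own position
```

Records within partition 0 always come back in the order they were appended, but nothing in the model orders partition 0 relative to partition 1.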

Kafka Producers and Consumers

From Apache Kafka: Next Generation Distributed Messaging System

Kafka Producers and Consumers (cont'd)

From Official Documentation of Apache Kafka

Retention

  1. Retention is how long messages are stored in a topic
    • Durable message retention
    • For some period of time, e.g. 7 days
    • Until a topic reaches a certain size in bytes, e.g. 1 gigabyte (the size limit is enforced per partition)
  2. Once either limit is reached, the oldest messages expire and are deleted
  3. Can be configured on a per-topic basis
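The two retention limits can be sketched as a function over a toy partition log. The parameter names echo the topic-level `retention.ms` and `retention.bytes` settings, but the function itself is hypothetical, and it works record by record, whereas a real broker deletes whole log segments.

```python
from typing import List, Tuple

Entry = Tuple[int, bytes]  # (timestamp in ms, value)


def apply_retention(
    log: List[Entry],
    now_ms: int,
    retention_ms: int,
    retention_bytes: int,
) -> List[Entry]:
    """Return the records that survive both retention limits."""
    # Time limit: drop records older than retention_ms.
    kept = [entry for entry in log if now_ms - entry[0] <= retention_ms]
    # Size limit: drop the oldest records until the log fits.
    total = sum(len(value) for _, value in kept)
    while kept and total > retention_bytes:
        total -= len(kept[0][1])
        kept = kept[1:]
    return kept


log = [(0, b"aaaa"), (5000, b"bbbb"), (9000, b"cccc")]
survivors = apply_retention(log, now_ms=10000, retention_ms=6000, retention_bytes=6)
```

In this run the first record is too old, and the second is then dropped to bring the log under the size limit, so only the newest record remains.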

Features of Kafka

  1. Thousands of Producers
  2. Thousands of Consumers
  3. Client Independence
  4. High Throughput
  5. Message Persistence
  6. Disk-based Retention
  7. Scalability
  8. High Performance