Brokers
These are the servers (physical or virtual) that make up a Kafka cluster. They are responsible for storing the partitions (queues) and serving producer and consumer requests
Partitions
The actual queues in Kafka are called partitions. They are ordered, immutable sequences of messages (append-only log files) stored on disk
(w/ the 2D array analogy they are the rows)
Topics
A logical grouping of partitions. While partitions are the physical storage units, topics are the logical categories used in code to organize data (e.g., you publish to and consume from a "topic")
(w/ 2D array analogy, it is the array itself.)
Producers
Services that write to a topic
Consumers
Services that read from a topic.
Message/Record Structure
A message consists of four attributes: a Key (determines ordering/partitioning), a Value (the data), a Timestamp, and Headers (metadata like HTTP headers)
Record Key (Partition Key)
An attribute of a Kafka record (message): a small piece of data (often a string or a number) that Kafka uses to decide which partition the message goes to.
Record Value (Message Value)
The actual payload of the Kafka record/message
Record Timestamp
When the record was sent by the producer, or when the broker wrote it to the log (depends on configuration)
Record Headers
Optional key–value pairs attached to the message, similar to HTTP headers: extra metadata that helps with routing, tracing, and processing without polluting the main payload.
Partition Assignment
If a key is provided, Kafka hashes it (using MurmurHash) and performs a modulo operation against the number of partitions to deterministically assign the message to a specific partition. If no key is provided, messages are spread across partitions (round-robin; newer clients use a "sticky" variant that batches keyless messages per partition)
Offsets (in a partition)
basically the position (index) of a record within a partition’s log. As messages are appended to the log, they are assigned a sequential ID called an offset (0, 1, 2, etc.). Consumers track their progress by committing the offset of the last message they successfully processed
(w/ the 2D array analogy, the offset is the column index)
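A toy append-only log makes the offset mechanics concrete (class and variable names are my own):

```python
# Toy append-only partition log: each appended record gets the next
# sequential offset; a consumer tracks progress via a committed offset.
class Partition:
    def __init__(self):
        self.log = []

    def append(self, record) -> int:
        self.log.append(record)
        return len(self.log) - 1   # the record's offset

p = Partition()
offsets = [p.append(msg) for msg in ("a", "b", "c")]   # sequential: 0, 1, 2

committed = 1                 # consumer has processed up to offset 1
resume_from = committed + 1   # after a restart, resume reading at offset 2
```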
Consumer Group
A way to group consumers so that each event is processed by only one consumer in the group, allowing for parallel processing without duplication
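The "each event processed by only one group member" property comes from assigning each partition to exactly one consumer in the group. A simplified assignment (real Kafka rebalancing strategies are more involved) might look like:

```python
# Sketch of consumer-group semantics: each partition is owned by exactly one
# consumer in the group, so every message is processed once within the group.
def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    assignment = {c: [] for c in consumers}
    for i, part in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(part)
    return assignment

groups = assign(partitions=[0, 1, 2, 3], consumers=["c1", "c2"])
# → {'c1': [0, 2], 'c2': [1, 3]}
```

Note the consequence: parallelism within a group is capped by the partition count, since a partition cannot be split between two consumers of the same group.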
Schema Registry
a separate service where you store and manage the data structure (schema) for your messages (for example, Avro/JSON/Protobuf). Instead of each service guessing the message shape, they all look up the schema in the registry, so producers and consumers agree on the same fields and types, and you can safely evolve the schema over time (add fields, change versions, etc.).
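A toy in-memory registry shows the lookup-by-subject-and-version idea (subject names and structure here are assumptions, not the Confluent Schema Registry API):

```python
# Toy schema registry: producers/consumers look up the agreed schema by
# subject and version instead of guessing the message shape.
registry = {}

def register(subject: str, schema: dict) -> int:
    versions = registry.setdefault(subject, [])
    versions.append(schema)
    return len(versions)            # version numbers start at 1

def lookup(subject: str, version: int = -1) -> dict:
    versions = registry[subject]
    return versions[version - 1] if version > 0 else versions[-1]

register("user-events", {"fields": ["id", "name"]})
v2 = register("user-events", {"fields": ["id", "name", "email"]})  # evolved schema
```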
Replication in Kafka
To ensure durability, partitions have a Leader Replica (handles all reads/writes) and Follower Replicas (passively back up data). If a leader fails, a follower can take over
Active-Active Topology for availability
multiple nodes/instances/sites all serving traffic at the same time. The traffic is load balanced across the instances. If one node dies, others are already active and can keep serving requests → this gives high availability and often better scalability.
When to use Kafka (5 Reasons)
Async Processing (kafka acts as a buffer)
In-Order Processing (When serial execution is required)
Decoupling (when independent scaling is needed)
Stream Processing (processing real time data in large volumes)
Pub/Sub (When a single message needs to be distributed to multiple different consumers simultaneously)
Hot Partition
When one partition receives a disproportionate share of the traffic and gets overwhelmed, usually due to a poorly chosen partition key.
Compound Key (Composite key)
a key where multiple values are combined into one, usually to uniquely identify something (like userId + deviceId).
This can increase the number of distinct keys and help spread load, but it doesn’t automatically guarantee that you’ll avoid hot keys.
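A minimal sketch of building such a key (the field names userId/deviceId come from the example above; the separator is an assumption):

```python
# Compound key: combining userId and deviceId yields more distinct keys
# than userId alone, which can spread load across more partitions.
def compound_key(user_id: str, device_id: str) -> bytes:
    return f"{user_id}:{device_id}".encode()

compound_key("u1", "phone")   # b'u1:phone'
```

Note the trade-off from above: more distinct keys usually spread load better, but if one (user, device) pair dominates the traffic, that key is still hot.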
Consumer Recovery
If a consumer fails, it (or a group member) resumes from the last committed offset. It is crucial to commit offsets after work is completed to avoid data loss
Idempotent Producer
Ensures that, even if the producer has to retry sends because of errors, each message is written to a given partition at most once and in order for the lifetime of that producer instance. The producer tags messages with a producer ID and a per-partition sequence number; the broker remembers the last sequence it accepted and drops any retries that reuse it.
enable.idempotence=true
Batching
Aggregating messages into fewer network requests to increase throughput
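The broker-side dedup behind the idempotent producer (two cards up) can be modeled with a toy broker that tracks the last accepted sequence per (producer ID, partition); this is a sketch of the idea, not Kafka's implementation:

```python
# Toy broker-side dedup: track the last sequence number accepted per
# (producer_id, partition); retries reusing a seen sequence are dropped.
class Broker:
    def __init__(self):
        self.log = []
        self.last_seq = {}   # (producer_id, partition) -> last accepted seq

    def append(self, producer_id, partition, seq, msg) -> bool:
        key = (producer_id, partition)
        if seq <= self.last_seq.get(key, -1):
            return False          # duplicate retry: drop it
        self.last_seq[key] = seq
        self.log.append(msg)
        return True

b = Broker()
b.append("p1", 0, 0, "order-created")
b.append("p1", 0, 0, "order-created")   # retry of seq 0: rejected
assert b.log == ["order-created"]       # written at most once
```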
Consumer Retries
Kafka does not support consumer retries out of the box. The recommended pattern is to use a Retry Topic for failed messages and eventually a Dead Letter Queue (DLQ) if they fail repeatedly
Dead Letter Queue
A special queue (or topic) where "bad" messages go when they can't be processed correctly. In Kafka, a DLQ is usually implemented as a separate topic that receives messages your consumer couldn't handle.
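The retry-topic/DLQ routing described in the last two cards can be sketched like this (topic names, the retries counter, and MAX_RETRIES are all assumptions for illustration):

```python
# Sketch of the retry-topic / DLQ pattern: re-publish failed messages to a
# retry topic a bounded number of times, then park them in a DLQ topic.
MAX_RETRIES = 3

def route_failed(message: dict) -> str:
    """Decide which topic a failed message should be published to next."""
    retries = message.get("retries", 0)
    if retries < MAX_RETRIES:
        message["retries"] = retries + 1
        return "orders.retry"        # re-publish for another attempt
    return "orders.dlq"             # give up: park it for inspection

route_failed({"retries": 2})   # → 'orders.retry'
route_failed({"retries": 3})   # → 'orders.dlq'
```

Parking the message (rather than retrying forever or dropping it) keeps the consumer moving while preserving the bad message for later debugging or replay.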