Kafka For System Design


1

Brokers

These are the servers (physical or virtual) that make up a Kafka cluster. They are responsible for holding the queues (partitions) and for serving producer and consumer requests.

2

Partitions

The actual queues in Kafka are called partitions. They are ordered, immutable sequences of messages (append-only log files) existing on disk 

(w/ the 2D array analogy they are the rows)

3

Topics

A logical grouping of partitions. While partitions are the physical storage units, topics are the logical categories used in code to organize data (e.g., you publish to and consume from a "topic")
(w/ 2D array analogy, it is the array itself.)
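
As an illustration, topics (with their partition and replica counts) can be created programmatically with Kafka's Java AdminClient. A minimal sketch, assuming a broker at localhost:9092 and a hypothetical "orders" topic:

  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.admin.AdminClient;
  import org.apache.kafka.clients.admin.AdminClientConfig;
  import org.apache.kafka.clients.admin.NewTopic;

  public class CreateTopicSketch {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

          try (AdminClient admin = AdminClient.create(props)) {
              // Topic "orders" with 3 partitions, each replicated to 3 brokers (illustrative numbers)
              NewTopic orders = new NewTopic("orders", 3, (short) 3);
              admin.createTopics(Collections.singleton(orders)).all().get();
          }
      }
  }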

4

Producers

Services that write to a topic
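
A minimal producer sketch using Kafka's Java client; the broker address, "orders" topic, and key/value strings are illustrative assumptions:

  import java.util.Properties;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerConfig;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import org.apache.kafka.common.serialization.StringSerializer;

  public class ProducerSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
          props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
          props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

          try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
              // Key "user-42" keeps all of this user's events in the same partition (and therefore in order)
              producer.send(new ProducerRecord<>("orders", "user-42", "order-created"));
              producer.flush();
          }
      }
  }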

5

Consumers

Services that read from a topic.
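
A minimal consumer sketch with the Java client; the broker address, "orders" topic, and "order-processors" group id are illustrative assumptions (the group id ties into the Consumer Group card below):

  import java.time.Duration;
  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerConfig;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.serialization.StringDeserializer;

  public class ConsumerSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
          props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");        // hypothetical consumer group
          props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
          props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

          try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              consumer.subscribe(Collections.singleton("orders"));
              while (true) {
                  ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                  for (ConsumerRecord<String, String> record : records) {
                      System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                              record.partition(), record.offset(), record.key(), record.value());
                  }
              }
          }
      }
  }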

6

Message/Record Structure

A message consists of four attributes: a Key (determines ordering/partitioning), a Value (the data), a Timestamp, and Headers (metadata like HTTP headers)

7

Record Key (Partition Key)

An attribute of a Kafka record (message): a small piece of data (often a string or a number) that Kafka uses to decide which partition the message goes to.

8

Record Value (Message Value)

The actual payload (data) of the Kafka record/message.

9

Record Timestamp

When the Kafka record/message was sent by the producer, or when the broker wrote it (depends on configuration).

10

Record Headers

Optional key–value pairs attached to the message, similar to HTTP headers. They carry extra metadata that helps with routing, tracing, and processing without polluting the main payload.
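
Pulling the four record attributes together, a sketch of building one record with an explicit key, value, timestamp, and header; the topic, key, value, and "trace-id" header are made-up examples:

  import org.apache.kafka.clients.producer.ProducerRecord;

  public class RecordSketch {
      public static void main(String[] args) {
          // ProducerRecord(topic, partition, timestamp, key, value);
          // partition is left null so Kafka picks it from the key (see the next card)
          ProducerRecord<String, String> record = new ProducerRecord<>(
                  "orders",                   // topic
                  null,                       // partition (let Kafka decide)
                  System.currentTimeMillis(), // timestamp
                  "user-42",                  // key
                  "order-created");           // value

          // Headers: extra metadata, e.g. a hypothetical trace id for distributed tracing
          record.headers().add("trace-id", "abc-123".getBytes());

          System.out.println(record);
      }
  }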

11

Partition Assignment

If a key is provided, Kafka hashes it (using MurmurHash) and performs a modulo operation against the number of partitions to deterministically assign the message to a specific partition. If no key is provided, it uses round-robin assignment
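
A rough sketch of that keyed-assignment rule. Kafka's real default partitioner uses MurmurHash2; a generic hash stands in here just to show the "hash, then modulo" step:

  import java.nio.charset.StandardCharsets;
  import java.util.Arrays;

  public class PartitionAssignmentSketch {
      // Stand-in for the default partitioner: hash the key bytes, force the result
      // non-negative, then take it modulo the partition count.
      static int choosePartition(String key, int numPartitions) {
          byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
          int hash = Arrays.hashCode(keyBytes);       // Kafka actually uses murmur2 here
          return (hash & 0x7fffffff) % numPartitions; // deterministic partition for this key
      }

      public static void main(String[] args) {
          // The same key always lands on the same partition
          System.out.println(choosePartition("user-42", 3));
          System.out.println(choosePartition("user-42", 3));
      }
  }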

12

Offsets (in a partition)

Basically the position (index) of a record within a partition's log. As messages are appended to the log, they are assigned a sequential ID called an offset (0, 1, 2, etc.). Consumers track their progress by committing the offset of the last message they successfully processed.
(w/ the 2D array analogy, it is the column index within a row.)

13

Consumer Group

A way to group consumers so that each event is processed by only one consumer in the group, allowing for parallel processing without duplication

14

Schema Registry

A separate service where you store and manage the data structure (schema) for your messages (for example, Avro/JSON/Protobuf). Instead of each service guessing the message shape, they all look up the schema in the registry, so producers and consumers agree on the same fields and types, and you can safely evolve the schema over time (add fields, change versions, etc.).
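
As an illustration only, assuming Confluent's Schema Registry and its Avro serializer (the broker address and registry URL are placeholders), producer configuration typically looks something like this:

  import java.util.Properties;
  import org.apache.kafka.clients.producer.ProducerConfig;

  public class SchemaRegistryConfigSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
          // Serializer that registers/looks up Avro schemas in the registry (Confluent library)
          props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "io.confluent.kafka.serializers.KafkaAvroSerializer");
          props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "io.confluent.kafka.serializers.KafkaAvroSerializer");
          // Where producers and consumers fetch the shared schema from (placeholder URL)
          props.put("schema.registry.url", "http://localhost:8081");
          System.out.println(props);
      }
  }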

15

Replication in Kafka 

To ensure durability, partitions have a Leader Replica (handles all reads/writes) and Follower Replicas (passively back up data). If a leader fails, a follower can take over
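
Replication only improves durability if the producer waits for the replicas to acknowledge the write. A minimal sketch of the usual producer-side setting (broker address is a placeholder):

  import java.util.Properties;
  import org.apache.kafka.clients.producer.ProducerConfig;

  public class DurabilityConfigSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
          // Wait until the leader and all in-sync follower replicas have the write before acking
          props.put(ProducerConfig.ACKS_CONFIG, "all");
          // (On the topic/broker side, min.insync.replicas controls how many replicas must confirm)
          System.out.println(props);
      }
  }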

16

Active-Active Topology for availability

Multiple nodes/instances/sites all serving traffic at the same time, with traffic load balanced across them. If one node dies, the others are already active and can keep serving requests → this gives high availability and often better scalability.

17

When to use Kafka (5 Reasons)

  1. Async Processing (Kafka acts as a buffer)

  2. In-Order Processing (When serial execution is required)

  3. Decoupling (when independent scaling is needed)

  4. Stream Processing (processing real time data in large volumes)

  5. Pub/Sub (When a single message needs to be distributed to multiple different consumers simultaneously)

18

Hot Partition

When one partition receives a disproportionate share of the traffic and becomes overwhelmed, usually due to a poorly chosen partition key.

19

Compound Key (Composite key)

A key where multiple values are combined into one, usually to uniquely identify something (like userId + deviceId).
This can increase the number of distinct keys and help spread load, but it doesn’t automatically guarantee that you’ll avoid hot keys.
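
A tiny sketch of building such a key (the topic, field names, and values are hypothetical):

  import org.apache.kafka.clients.producer.ProducerRecord;

  public class CompoundKeySketch {
      public static void main(String[] args) {
          String userId = "user-42";
          String deviceId = "device-7";

          // Combine two fields into one partition key to get more distinct keys and spread load
          String compoundKey = userId + ":" + deviceId;

          ProducerRecord<String, String> record =
                  new ProducerRecord<>("clickstream", compoundKey, "page-view");
          System.out.println(record.key());
      }
  }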

20

Consumer Recovery 

If a consumer fails, it (or another consumer in its group) resumes from the last committed offset. It is crucial to commit offsets only after the work is completed, to avoid data loss.
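
A sketch of the "commit after the work is done" pattern with auto-commit disabled; the topic, group id, and process() step are placeholders:

  import java.time.Duration;
  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerConfig;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.serialization.StringDeserializer;

  public class ManualCommitSketch {
      static void process(ConsumerRecord<String, String> record) {
          // placeholder for the real work (write to a DB, call another service, etc.)
      }

      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
          props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");
          props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit manually, after processing
          props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
          props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

          try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              consumer.subscribe(Collections.singleton("orders"));
              while (true) {
                  ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                  for (ConsumerRecord<String, String> record : records) {
                      process(record); // do the work first...
                  }
                  consumer.commitSync(); // ...then commit, so a crash mid-batch means reprocessing, not loss
              }
          }
      }
  }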

21

Idempotent Producer

Makes sure that, even if the producer has to retry sends because of errors, each message is written to a given partition at most once and in order for the lifetime of that producer instance. Kafka does this by tagging messages with a producer ID and a per-partition sequence number; the broker remembers the last sequence it accepted and drops any retries that reuse it.

enable.idempotence=true

22

Batching 

Aggregating messages into fewer network requests to increase throughput
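
On the producer this is mainly two knobs; a sketch with illustrative values (not tuning recommendations):

  import java.util.Properties;
  import org.apache.kafka.clients.producer.ProducerConfig;

  public class BatchingConfigSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
          props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);   // target batch size in bytes per partition
          props.put(ProducerConfig.LINGER_MS_CONFIG, 10);           // wait up to 10 ms to fill a batch (latency for throughput)
          props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // compressing whole batches cuts network cost further
          System.out.println(props);
      }
  }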

23

Consumer Retries

Kafka does not support consumer retries out of the box. The recommended pattern is to use a Retry Topic for failed messages and eventually a Dead Letter Queue (DLQ) if they fail repeatedly
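
A sketch of that retry-topic / DLQ pattern on the consumer side; the topic names ("orders.retry", "orders.dlq"), the attempt threshold, and process() are made up for illustration:

  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;

  public class RetryAndDlqSketch {
      static final int MAX_ATTEMPTS = 3; // hypothetical threshold

      static void process(ConsumerRecord<String, String> record) { /* real work goes here */ }

      // Called for each record polled from "orders" (or from "orders.retry" on later attempts)
      static void handle(ConsumerRecord<String, String> record, int attempt,
                         KafkaProducer<String, String> producer) {
          try {
              process(record);
          } catch (Exception e) {
              // Kafka itself won't retry for us: route the failure to a retry topic,
              // and after too many attempts park it in a dead letter topic for inspection.
              String target = attempt < MAX_ATTEMPTS ? "orders.retry" : "orders.dlq";
              ProducerRecord<String, String> failed =
                      new ProducerRecord<>(target, record.key(), record.value());
              failed.headers().add("attempt", Integer.toString(attempt + 1).getBytes());
              producer.send(failed);
          }
      }
  }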

24

Dead Letter Queue

A special queue (or topic) where "bad" messages go when they can't be processed correctly. In Kafka, a DLQ is usually implemented as a separate topic that receives messages your consumer couldn't handle.