Brokers
These are the servers (physical or virtual) that make up a Kafka cluster. They are responsible for storing the partitions (queues) and serving producer and consumer requests
Partitions
The actual queues in Kafka are called partitions. They are ordered, immutable sequences of messages (append-only log files) stored on disk
(w/ the 2D array analogy they are the rows)
Topics
A logical grouping of partitions. While partitions are the physical storage units, topics are the logical categories used in code to organize data (e.g., you publish to and consume from a "topic")
(w/ 2D array analogy, it is the array itself.)
Producers
Services that write to a topic
Consumers
Services that read from a topic.
Message/Record Structure
A message consists of four attributes: a Key (determines ordering/partitioning), a Value (the data), a Timestamp, and Headers (metadata like HTTP headers)
Record Key (Partition Key)
An attribute of a Kafka record (message): a small piece of data (often a string or a number) that Kafka uses to decide which partition the message goes to.
Record Value (Message Value)
The actual payload of the Kafka record/message
Record Timestamp
When the record was sent by the producer, or when the broker wrote it to the log (depends on configuration)
Record Headers
Optional key–value pairs attached to the message, similar to HTTP headers: extra metadata that helps with routing, tracing, and processing without polluting the main payload.
Partition Assignment
If a key is provided, Kafka hashes it (using MurmurHash) and performs a modulo operation against the number of partitions to deterministically assign the message to a specific partition. If no key is provided, messages are spread across partitions (round-robin; newer clients use a "sticky" variant that batches keyless messages per partition)
Offsets (in a partition)
basically the position (index) of a record within a partition’s log. As messages are appended to the log, they are assigned a sequential ID called an offset (0, 1, 2, etc.). Consumers track their progress by committing the offset of the last message they successfully processed
(w/ the 2D array analogy, the offset is the column index)
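A toy append-only log makes the offset mechanics concrete (class and variable names are my own):

```python
# Toy append-only partition log: each appended record gets the next
# sequential offset; a consumer tracks progress via a committed offset.
class Partition:
    def __init__(self):
        self.log = []

    def append(self, record) -> int:
        self.log.append(record)
        return len(self.log) - 1   # the record's offset

p = Partition()
offsets = [p.append(msg) for msg in ("a", "b", "c")]   # sequential: 0, 1, 2

committed = 1                 # consumer has processed up to offset 1
resume_from = committed + 1   # after a restart, resume reading at offset 2
```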
Consumer Group
A way to group consumers so that each event is processed by only one consumer in the group, allowing for parallel processing without duplication
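The "each event processed by only one group member" property comes from assigning each partition to exactly one consumer in the group. A simplified assignment (real Kafka rebalancing strategies are more involved) might look like:

```python
# Sketch of consumer-group semantics: each partition is owned by exactly one
# consumer in the group, so every message is processed once within the group.
def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    assignment = {c: [] for c in consumers}
    for i, part in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(part)
    return assignment

groups = assign(partitions=[0, 1, 2, 3], consumers=["c1", "c2"])
# → {'c1': [0, 2], 'c2': [1, 3]}
```

Note the consequence: parallelism within a group is capped by the partition count, since a partition cannot be split between two consumers of the same group.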
Schema Registry
a separate service where you store and manage the data structure (schema) for your messages (for example, Avro/JSON/Protobuf). Instead of each service guessing the message shape, they all look up the schema in the registry, so producers and consumers agree on the same fields and types, and you can safely evolve the schema over time (add fields, change versions, etc.).
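A toy in-memory registry shows the lookup-by-subject-and-version idea (subject names and structure here are assumptions, not the Confluent Schema Registry API):

```python
# Toy schema registry: producers/consumers look up the agreed schema by
# subject and version instead of guessing the message shape.
registry = {}

def register(subject: str, schema: dict) -> int:
    versions = registry.setdefault(subject, [])
    versions.append(schema)
    return len(versions)            # version numbers start at 1

def lookup(subject: str, version: int = -1) -> dict:
    versions = registry[subject]
    return versions[version - 1] if version > 0 else versions[-1]

register("user-events", {"fields": ["id", "name"]})
v2 = register("user-events", {"fields": ["id", "name", "email"]})  # evolved schema
```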
Replication in Kafka
To ensure durability, partitions have a Leader Replica (handles all reads/writes) and Follower Replicas (passively back up data). If a leader fails, a follower can take over
Active-Active Topology for availability
multiple nodes/instances/sites all serving traffic at the same time. The traffic is load balanced across the instances. If one node dies, others are already active and can keep serving requests → this gives high availability and often better scalability.
When to use Kafka (5 Reasons)
Async Processing (kafka acts as a buffer)
In-Order Processing (When serial execution is required)
Decoupling (when independent scaling is needed)
Stream Processing (processing real time data in large volumes)
Pub/Sub (When a single message needs to be distributed to multiple different consumers simultaneously)
Hot Partition
When one partition receives a disproportionate share of the traffic and gets overwhelmed, usually due to a poorly chosen partition key.
Compound Key (Composite key)
a key where multiple values are combined into one, usually to uniquely identify something (like userId + deviceId).
This can increase the number of distinct keys and help spread load, but it doesn’t automatically guarantee that you’ll avoid hot keys.
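A minimal sketch of building such a key (the field names userId/deviceId come from the example above; the separator is an assumption):

```python
# Compound key: combining userId and deviceId yields more distinct keys
# than userId alone, which can spread load across more partitions.
def compound_key(user_id: str, device_id: str) -> bytes:
    return f"{user_id}:{device_id}".encode()

compound_key("u1", "phone")   # b'u1:phone'
```

Note the trade-off from above: more distinct keys usually spread load better, but if one (user, device) pair dominates the traffic, that key is still hot.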
Consumer Recovery
If a consumer fails, it (or a group member) resumes from the last committed offset. It is crucial to commit offsets after work is completed to avoid data loss
Idempotent Producer
Ensures that, even if the producer has to retry sends because of errors, each message is written to a given partition at most once and in order for the lifetime of that producer instance. The producer tags messages with a producer ID and a per-partition sequence number; the broker remembers the last sequence it accepted and drops any retries that reuse it.
enable.idempotence=true
Batching
Aggregating messages into fewer network requests to increase throughput
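The broker-side dedup behind the idempotent producer (two cards up) can be modeled with a toy broker that tracks the last accepted sequence per (producer ID, partition); this is a sketch of the idea, not Kafka's implementation:

```python
# Toy broker-side dedup: track the last sequence number accepted per
# (producer_id, partition); retries reusing a seen sequence are dropped.
class Broker:
    def __init__(self):
        self.log = []
        self.last_seq = {}   # (producer_id, partition) -> last accepted seq

    def append(self, producer_id, partition, seq, msg) -> bool:
        key = (producer_id, partition)
        if seq <= self.last_seq.get(key, -1):
            return False          # duplicate retry: drop it
        self.last_seq[key] = seq
        self.log.append(msg)
        return True

b = Broker()
b.append("p1", 0, 0, "order-created")
b.append("p1", 0, 0, "order-created")   # retry of seq 0: rejected
assert b.log == ["order-created"]       # written at most once
```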
Consumer Retries
Kafka does not support consumer retries out of the box. The recommended pattern is to use a Retry Topic for failed messages and eventually a Dead Letter Queue (DLQ) if they fail repeatedly
Dead Letter Queue
A special queue (or topic) where "bad" messages go when they can't be processed correctly. In Kafka, a DLQ is usually implemented as a separate topic that receives messages your consumer couldn't handle.
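The retry-topic/DLQ routing described in the last two cards can be sketched like this (topic names, the retries counter, and MAX_RETRIES are all assumptions for illustration):

```python
# Sketch of the retry-topic / DLQ pattern: re-publish failed messages to a
# retry topic a bounded number of times, then park them in a DLQ topic.
MAX_RETRIES = 3

def route_failed(message: dict) -> str:
    """Decide which topic a failed message should be published to next."""
    retries = message.get("retries", 0)
    if retries < MAX_RETRIES:
        message["retries"] = retries + 1
        return "orders.retry"        # re-publish for another attempt
    return "orders.dlq"             # give up: park it for inspection

route_failed({"retries": 2})   # → 'orders.retry'
route_failed({"retries": 3})   # → 'orders.dlq'
```

Parking the message (rather than retrying forever or dropping it) keeps the consumer moving while preserving the bad message for later debugging or replay.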