Keyspace
a top-level container for your tables, similar to a database in an RDBMS. It also defines replication settings (how data is copied across nodes) for all tables inside it.
Table
A table defines the schema (columns and types) and stores the rows of data that follow that schema.
Row
a single record in a table — one set of column values that belong together.
Column
A column is a named field in the table schema: a piece of data labeled by its column name. Each row has a value (or no value) for that column.
Primary Key
The primary key is defined when creating a table and uniquely identifies each row. It is made of:
One or more partition key columns, and
Zero or more clustering columns.
The partition key part is always first; any clustering columns come after it. A primary key can be just one column (simple key) or multiple columns (composite key).
Wide Column / Column Family
a data model where rows in the same partition can have many columns and can vary in width. In practice, Cassandra lets you alter a table later to add new columns without impacting existing rows, which simply treat the new columns as empty.
Partition
the logical group of rows that share the same partition key value. Inside that partition, rows are ordered by the clustering columns. The same partition is stored on one or more nodes as replicas, depending on the replication factor.
Partition Key / Partition column
the column (or set of columns) used to decide which partition the row belongs to and which node(s) store that partition. Cassandra hashes the partition key value to a token; each node owns certain token ranges, so the token decides which nodes hold that partition.
Clustering Key / Clustering column
the part of the primary key that comes after the partition key and controls the order of rows inside a partition. There can be zero, one, or many clustering columns. In other words, these are the columns used to sort the rows within a partition.
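To make the partition key / clustering column split concrete, here is a minimal Python sketch. It is only an in-memory model, not how Cassandra stores data. The table layout (a `sensor_id` partition key and a `reading_time` clustering column) and the helper `group_into_partitions` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows for a table whose primary key is ((sensor_id), reading_time).
rows = [
    {"sensor_id": "s1", "reading_time": 3, "value": 20.5},
    {"sensor_id": "s2", "reading_time": 1, "value": 18.0},
    {"sensor_id": "s1", "reading_time": 1, "value": 19.9},
    {"sensor_id": "s1", "reading_time": 2, "value": 20.1},
]


def group_into_partitions(rows, partition_key, clustering_key):
    """Group rows by partition key value, then sort each group by the clustering column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for partition_rows in partitions.values():
        partition_rows.sort(key=lambda r: r[clustering_key])
    return dict(partitions)


partitions = group_into_partitions(rows, "sensor_id", "reading_time")
for key, partition_rows in partitions.items():
    print(key, [r["reading_time"] for r in partition_rows])
```

Rows with the same `sensor_id` end up in the same group, and inside each group they are ordered by `reading_time`, mirroring what the partition key and clustering column do.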
Partitioner
the hash function Cassandra uses to turn a partition key into a token, which is then used to decide data placement across nodes. The default is the Murmur3 partitioner.
token ring
the ordered circle of all possible tokens. Cassandra hashes each partition key to a token on this ring, and each node owns one or more ranges on the ring. That's how the cluster decides which node stores which partitions. With the Murmur3 partitioner, each partition key maps to a 64-bit token in the range [-2^63, 2^63-1].
consistent hashing
means hashing partition keys onto a ring of tokens where each node owns a token range. Because of this ring setup, when nodes are added or removed, only the data in the affected ranges move, instead of redistributing everything.
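The ring idea can be sketched in a few lines of Python. This is a toy under stated assumptions: MD5 from the standard library stands in for Murmur3 (which is not in the standard library), each node has a single hand-picked token, and the names `token_for` and `TokenRing` are hypothetical.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a key to a signed 64-bit token. MD5 is a stand-in for Murmur3 here."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    def __init__(self, node_tokens):
        # node_tokens maps node name -> token; a node owns the range that ends at its token.
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def owner(self, key: str) -> str:
        # First node whose token is >= the key's token, wrapping around the ring.
        idx = bisect.bisect_left(self._tokens, token_for(key)) % len(self._ring)
        return self._ring[idx][1]


base_tokens = {
    "n1": -6_000_000_000_000_000_000,
    "n2": 0,
    "n3": 6_000_000_000_000_000_000,
}
three_nodes = TokenRing(base_tokens)
four_nodes = TokenRing({**base_tokens, "n4": 3_000_000_000_000_000_000})

keys = [f"user-{i}" for i in range(1000)]
moved = [k for k in keys if three_nodes.owner(k) != four_nodes.owner(k)]
print(f"{len(moved)} of {len(keys)} keys changed owner after adding n4")
```

Adding `n4` splits only the range that `n3` used to own, so every key that moves goes from `n3` to `n4`; keys owned by `n1` and `n2` stay where they were.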
node replication
copies of the data (partitions) are stored on multiple nodes. When one node goes down, the data still exists on the other replicas, depending on the replication factor.
replication factor
controls how many nodes hold a copy of each partition. With a replication factor above 1, if one node goes down, other nodes still have the data.
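One simple way to picture replica placement is to start at the node that owns the token and keep walking clockwise until enough distinct nodes are collected. The sketch below assumes that simple clockwise rule and ignores racks and datacenters; the tiny token values and the function `replicas_for` are hypothetical.

```python
import bisect


def replicas_for(token, ring, rf):
    """Return rf distinct nodes, walking clockwise from the owner of `token`.

    ring is a sorted list of (token, node) pairs.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token)
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == rf:
            break
    return chosen


# Small made-up tokens keep the example readable.
ring = [(-50, "n1"), (0, "n2"), (50, "n3"), (100, "n4")]

replicas = replicas_for(25, ring, rf=3)
print(replicas)

# If the first replica goes down, the remaining replicas still hold the data.
survivors = [node for node in replicas if node != "n3"]
print(survivors)
```

With a replication factor of 3, losing any one of the three replicas still leaves two copies of the partition.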
gossip protocol
peer-to-peer communication system Cassandra uses so nodes can regularly share information about cluster membership and node health (who is up, down, etc.). It supports replication and coordination, but the actual data replication happens via the normal write path, not by gossiping full data.
tunable consistency
means you can choose how many replicas must respond for a read or write to be considered successful. Common levels are:
ONE – wait for one replica
QUORUM – wait for a majority of replicas
ALL – wait for every replica
Waiting for more replicas gives stronger consistency but higher latency. (There are more levels, but these three are the core idea.)
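The arithmetic behind these levels is small enough to sketch in Python. QUORUM is a majority, i.e. floor(RF / 2) + 1 replicas. A commonly cited rule of thumb (not stated in the card above) is that a read is guaranteed to overlap the latest write when read replicas + write replicas > replication factor. The function names here are hypothetical.

```python
def quorum(rf: int) -> int:
    """Majority of rf replicas."""
    return rf // 2 + 1


def required_acks(level: str, rf: int) -> int:
    """How many replicas must respond for the given consistency level."""
    levels = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}
    return levels[level]


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when the read and write replica sets must share at least one replica."""
    return required_acks(read_level, rf) + required_acks(write_level, rf) > rf


rf = 3
for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, read_sees_latest_write(read_level, write_level, rf))
```

With a replication factor of 3, QUORUM reads plus QUORUM writes touch 2 + 2 = 4 replicas, which is more than 3, so they must overlap; ONE plus ONE touches only 2 and gives no such guarantee.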
Commit Log
an append-only log on disk that records every write operation. Its job is durability: if a node crashes, Cassandra can replay the commit log to rebuild what was in memtables but not yet written out to SSTables.
SSTable (Sorted String Table)
is an immutable, on-disk file that stores actual table data in a sorted, efficient format. When a memtable fills up, Cassandra writes it to disk as a new SSTable; reads then combine data from SSTables (and any current memtable) to answer queries.
Memtable
is an in-memory table where Cassandra keeps recent writes. When you insert or update data, it first goes into the commit log (for durability) and into a memtable (for fast access), and later the memtable is flushed to disk as an SSTable.
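The commit log, memtable, and SSTable cards fit together as one write path, which this toy Python model tries to show. It is a deliberately simplified sketch, not Cassandra's implementation: everything lives in memory, the flush threshold is a row count, and the class `ToyStorageEngine` and its methods are hypothetical.

```python
class ToyStorageEngine:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only list standing in for the on-disk log
        self.memtable = {}     # recent writes, kept in memory
        self.sstables = []     # immutable sorted tuples, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value            # then fast in-memory access
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a new sorted, immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # simplification: flushed writes no longer need replay

    def read(self, key):
        # Check the memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None

    def recover(self):
        # After a crash, replay the commit log to rebuild the memtable.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


engine = ToyStorageEngine(memtable_limit=2)
engine.write("b", 1)
engine.write("a", 2)   # memtable is full, so it is flushed to an SSTable
engine.write("a", 3)   # newer value lives only in the commit log and memtable
print(engine.read("a"), engine.read("b"))
```

Wiping `engine.memtable` to simulate a crash and then calling `engine.recover()` brings back the unflushed write from the commit log, which is the durability role the commit log card describes.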