Cassandra For System Design

19 Terms

1
Keyspace

A top-level container for tables, similar to a database in an RDBMS. It also defines the replication settings (how data is copied across nodes) for all tables inside it.

2
Table

A table defines the schema (columns and types) and stores the rows of data that follow that schema.

3
Row

A single record in a table: one set of column values that belong together.

4
Column

A column is a named field in the table schema, that is, a piece of data labeled by its column name. Each row has a value (or no value) for that column.

5
Primary Key

The primary key is defined when creating a table and uniquely identifies each row. It is made of:

  • One or more partition key columns, and

  • Zero or more clustering columns.

The partition key part is always first; any clustering columns come after it. A primary key can be just one column (simple key) or multiple columns (composite key).

6
Wide Column / Column Family

A data model where rows in the same partition can have many columns and can vary in width. In practice, Cassandra lets you alter a table later to add new columns without impacting existing rows; existing rows simply treat the new columns as empty.

7
Partition

The logical group of rows that share the same partition key value. Inside a partition, rows are ordered by the clustering columns. Each partition is stored on one or more nodes as replicas, depending on the replication factor.

8
Partition Key / Partition column

The column (or set of columns) used to decide which partition a row belongs to and which node(s) store that partition. Cassandra hashes the partition key value to a token; each node owns certain token ranges, so the token decides which nodes hold that partition.

9
Clustering Key / Clustering column

The part of the primary key that comes after the partition key and controls the order of rows inside a partition. There can be zero, one, or many clustering columns. In other words, these are the columns used to sort the rows within a partition.
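The relationship between partition key, clustering column, and row can be sketched in a few lines of Python. This is only a toy in-memory model, not how Cassandra stores data; the table shape (partition key `user_id`, clustering column `event_time`) and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical table with PRIMARY KEY ((user_id), event_time):
# user_id is the partition key, event_time is the clustering column.
# partition key -> {clustering value -> row payload}
partitions = defaultdict(dict)

def upsert(user_id, event_time, payload):
    # The full primary key (user_id, event_time) identifies one row,
    # so writing the same key twice overwrites the earlier row.
    partitions[user_id][event_time] = payload

def read_partition(user_id):
    # Rows inside one partition come back ordered by the clustering column.
    return sorted(partitions[user_id].items())

upsert("alice", 30, "logout")
upsert("bob", 10, "login")
upsert("alice", 10, "login")
upsert("alice", 20, "click")
upsert("alice", 20, "scroll")  # same primary key: replaces "click"

alice_rows = read_partition("alice")
print(alice_rows)  # [(10, 'login'), (20, 'scroll'), (30, 'logout')]
```

Reading a whole partition in clustering order is cheap in this model because every row for `alice` lives in one place, which is the intuition behind designing tables around the queries they serve.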

10
Partitioner

The hash function Cassandra uses to turn a partition key into a token, which then determines where the data is placed across nodes. The default is the Murmur3 partitioner.
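A minimal sketch of what a partitioner does, assuming a string partition key. Python's standard library has no Murmur3, so MD5 is used here purely as a stand-in hash; the function name `token` is hypothetical and the values it produces will not match a real cluster.

```python
import hashlib
import struct

def token(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token.

    MD5 is only a stand-in; Cassandra's default partitioner uses Murmur3,
    so real token values will differ.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    (value,) = struct.unpack(">q", digest[:8])  # first 8 bytes as signed int64
    return value

for key in ("alice", "bob", "carol"):
    print(key, token(key))
```

The two properties that matter are visible even with the stand-in: the same key always maps to the same token, and every token falls in the signed 64-bit range from -2^63 to 2^63-1.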

11
Token Ring

The ordered circle of all possible tokens. Cassandra applies a partitioner (e.g., Murmur3) to each partition key to get a 64-bit token in the range [-2^63, 2^63-1], which is a position on this ring. Each node owns one or more token ranges on the ring, and that is how the cluster decides which node stores which partitions.

12
Consistent Hashing

Hashing partition keys onto a ring of tokens where each node owns a token range. Because of this ring setup, when nodes are added or removed, only the data in the affected ranges moves, instead of everything being redistributed.
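The "only the affected ranges move" property can be checked with a small simulation. This is a sketch under stated assumptions: one token per node (no virtual nodes), made-up node names and node tokens, and random integers standing in for hashed partition keys.

```python
import random
from bisect import bisect_left

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1

def owner(ring, tok):
    """ring is a sorted list of (node_token, node_name).

    A node owns the range (previous node's token, its own token];
    tokens past the last node wrap around to the first node.
    """
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, tok)
    return ring[i % len(ring)][1]

# Random tokens stand in for the hashed partition keys of 10,000 partitions.
rng = random.Random(42)
keys = [rng.randint(MIN_TOKEN, MAX_TOKEN) for _ in range(10_000)]

ring = sorted([(-6 * 10**18, "n1"), (0, "n2"), (6 * 10**18, "n3")])
before = {k: owner(ring, k) for k in keys}

# Add a fourth node; it takes over part of the range that n3 used to own.
ring_after = sorted(ring + [(3 * 10**18, "n4")])
after = {k: owner(ring_after, k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} partitions changed owner")
print({(before[k], after[k]) for k in moved})  # every move is n3 -> n4
```

Every partition that changes owner moves from `n3` to `n4`; partitions owned by `n1` and `n2` stay where they are. A naive `hash(key) % node_count` scheme would instead reassign most keys when the node count changes.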

13
Node Replication

Copies of the data (partitions) are stored on multiple nodes. When one node goes down, the data still exists on the other replicas; how many copies there are depends on the replication factor.

14
Replication Factor

Controls how many nodes hold a copy of each partition, so if one node goes down, other nodes still have the data.
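One simple way to picture replica placement is to start at the node that owns the partition's token and walk the ring clockwise until enough distinct nodes are collected. The sketch below assumes that simple placement (it ignores racks and datacenters), uses tiny tokens for readability, and the function and node names are hypothetical.

```python
from bisect import bisect_left

def replicas(ring, tok, replication_factor):
    """Return the nodes holding a partition with token `tok`.

    ring is a sorted list of (node_token, node_name). The first replica is
    the owner of the token; the rest are the next distinct nodes clockwise.
    """
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, tok)
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == replication_factor:
            break
    return chosen

# Small tokens purely for readability; real tokens span the 64-bit range.
ring = [(-6, "n1"), (-2, "n2"), (2, "n3"), (6, "n4")]

holders = replicas(ring, tok=1, replication_factor=3)
print(holders)  # ['n3', 'n4', 'n1']

# If n4 goes down, two replicas of this partition are still available.
down = {"n4"}
print([n for n in holders if n not in down])  # ['n3', 'n1']
```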

15
Gossip Protocol

The peer-to-peer communication system Cassandra uses so nodes can regularly share information about cluster membership and node health (who is up, down, etc.). This metadata supports replication and coordination, but the data itself is replicated through the normal write path, not by gossiping full data.

16
Tunable Consistency

The ability to choose, per read or write, how many replicas must respond for the operation to be considered successful. Common levels are:

  • ONE – wait for one replica

  • QUORUM – wait for a majority of replicas

  • ALL – wait for every replica

More replicas means stronger consistency but higher latency. (There are more levels, but these three capture the core idea.)
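The arithmetic behind these levels is short enough to write out. The sketch below covers only the three levels listed above; the helper names are hypothetical. It also encodes the common rule of thumb that a read is guaranteed to see the latest acknowledged write when read replicas plus write replicas exceed the replication factor, because the two sets must then share at least one replica.

```python
def quorum(replication_factor: int) -> int:
    # A majority of replicas.
    return replication_factor // 2 + 1

def required_acks(level: str, replication_factor: int) -> int:
    # Number of replicas that must respond at each consistency level.
    return {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }[level]

def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    # Read and write replica sets overlap when R + W > RF.
    return required_acks(read_level, rf) + required_acks(write_level, rf) > rf

rf = 3
print(required_acks("QUORUM", rf))                     # 2
print(read_sees_latest_write("QUORUM", "QUORUM", rf))  # True
print(read_sees_latest_write("ONE", "ONE", rf))        # False
print(read_sees_latest_write("ONE", "ALL", rf))        # True
```

With a replication factor of 3, QUORUM reads and writes tolerate one replica being down while still overlapping, which is why that combination is a frequent default in system design discussions.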

17
Commit Log

An append-only log on disk that records every write operation. Its job is durability: if a node crashes, Cassandra can replay the commit log to rebuild what was in memtables but not yet written out to SSTables.

18
SSTable (Sorted Strings Table)

An immutable, on-disk file that stores actual table data in a sorted, efficient format. When a memtable fills up, Cassandra writes it to disk as a new SSTable; reads then combine data from SSTables (and any current memtable) to answer queries.

19
Memtable

An in-memory table where Cassandra keeps recent writes. When you insert or update data, the write goes into the commit log (for durability) and into a memtable (for fast access); later the memtable is flushed to disk as an SSTable.
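The commit log, memtable, and SSTable cards describe one write path, which can be sketched as a toy single-node model. Everything here is a simplification: Python lists and tuples stand in for on-disk files, the flush threshold counts keys rather than bytes, reads take the newest value by position instead of by write timestamp, and the class and method names are hypothetical.

```python
class ToyNode:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stands in for the append-only log on disk
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # immutable sorted tuples, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value            # then the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a new sorted, immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need replaying

    def read(self, key):
        # Check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def recover(self):
        # After a crash the memtable is gone; replay the commit log.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value

node = ToyNode(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # memtable is full: flushed to the first SSTable
node.write("a", 3)   # newer value for "a" lives only in memtable + log

node.memtable = {}   # simulate a crash that loses in-memory state
node.recover()

print(node.read("a"))  # 3, rebuilt from the commit log
print(node.read("b"))  # 2, read from the SSTable
```

Note that the older value of `a` still sits in the first SSTable after the update; because SSTables are immutable, stale values are only discarded later when files are merged.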