ACID
Atomicity, Consistency, Isolation, and Durability
Atomicity
All or nothing, transaction either totally succeeds or is rolled back
Consistency
Database always follows all defined rules and constraints; a transaction is not allowed to complete if it would violate a constraint
Isolation
Isolation levels determine how transactions interact with data being modified by other concurrent transactions. Transactions should not affect each other, but full isolation comes with a performance cost.
Durability
Once a transaction is committed, the data is guaranteed to have been written to disk and synced, protecting against crashes or power failures.
Default Postgres index
B-tree, which works great for (see the sketch after this list):
Exact matches (WHERE email = 'user@example.com')
Range queries (WHERE created_at > '2024-01-01')
Sorting (ORDER BY username if the ORDER BY column match the index columns' order)
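A minimal sketch of those three query shapes with psycopg2, assuming a local Postgres and a hypothetical users(email, created_at) table; the DSN and names are made up for illustration.

import psycopg2

# Assumed DSN and a hypothetical users(email, created_at) table.
conn = psycopg2.connect("dbname=app user=postgres")
cur = conn.cursor()

# A plain CREATE INDEX builds a B-tree by default.
cur.execute("CREATE INDEX IF NOT EXISTS users_email_idx ON users (email)")
cur.execute("CREATE INDEX IF NOT EXISTS users_created_at_idx ON users (created_at)")
conn.commit()

# Exact match: point lookup on the email index.
cur.execute("SELECT id FROM users WHERE email = %s", ("user@example.com",))

# Range query: index range scan on created_at.
cur.execute("SELECT id FROM users WHERE created_at > %s", ("2024-01-01",))

# Sorting: the created_at index can satisfy the ORDER BY without a sort step.
cur.execute("SELECT id FROM users ORDER BY created_at DESC LIMIT 10")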
GIN index
Inverted index, the same core data structure Elasticsearch uses; good for full-text search and jsonb/array containment queries (example below). Study more.
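A minimal sketch of a typical GIN use case, assuming a hypothetical documents(body) table; the DSN and names are illustrative only.

import psycopg2

# Assumed DSN and a hypothetical documents(body) table.
conn = psycopg2.connect("dbname=app user=postgres")
cur = conn.cursor()

# GIN index over a tsvector expression for full-text search.
cur.execute(
    "CREATE INDEX IF NOT EXISTS documents_body_gin "
    "ON documents USING GIN (to_tsvector('english', body))"
)
conn.commit()

# The query must use the same expression so the planner can use the index.
cur.execute(
    "SELECT id FROM documents "
    "WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s)",
    ("database indexing",),
)
print(cur.fetchall())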
How is data written to postgres
1. The write-ahead log (WAL) record is written to disk and flushed
2. The buffer cache is updated in memory
3. Dirty pages are later written back to disk in batches (checkpoints)
How to scale writes in SQL (look up)
Allowing a user to read data they just wrote in a sharded system
Read-your-writes consistency (sketch below)
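One possible sketch of read-your-writes routing: pin a user's reads to the primary for a short window after they write. The window, function names, and targets are hypothetical, not a standard API.

import time

PIN_SECONDS = 5  # assumed upper bound on replication lag
last_write_at: dict[str, float] = {}

def record_write(user_id: str) -> None:
    # Call this after routing a write to the primary.
    last_write_at[user_id] = time.time()

def choose_target(user_id: str) -> str:
    # Pin reads to the primary shortly after the user's last write,
    # so they always see their own writes; otherwise use a read replica.
    recently_wrote = time.time() - last_write_at.get(user_id, 0.0) < PIN_SECONDS
    return "primary" if recently_wrote else "read-replica"

record_write("u42")
print(choose_target("u42"))  # -> primary (just wrote)
print(choose_target("u99"))  # -> read-replica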
Contention in Postgres
The default isolation level is Read Committed: reads only see committed data. This can still lead to non-repeatable reads if a transaction reads a row, another transaction commits a change to it, and the first transaction reads the row again.
Two ways to solve contention:
Row-level locking: the transaction locks the rows it needs (e.g. SELECT ... FOR UPDATE), ensuring no other transaction can modify them until it commits.
Serializable isolation: makes all transactions behave as if they were executed one after another. Comes at a cost to performance, and applications need logic to retry on serialization conflicts.
Row-level locking is preferred when you know exactly which rows need to be locked. Use serializable isolation when the transaction is too complex to reason about which locks are needed (sketch of both below).
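A minimal sketch of both approaches with psycopg2, assuming a hypothetical accounts(id, balance) table and a local Postgres; this shows the shape of the two techniques, not production code.

import psycopg2
import psycopg2.errors

# Assumed DSN and a hypothetical accounts(id, balance) table.
conn = psycopg2.connect("dbname=app user=postgres")

# 1) Row-level locking: lock exactly the row we intend to change.
with conn, conn.cursor() as cur:
    cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (1,))
    (balance,) = cur.fetchone()
    cur.execute("UPDATE accounts SET balance = %s WHERE id = %s", (balance - 100, 1))
# leaving the `with conn` block commits and releases the lock

# 2) Serializable isolation: let Postgres detect conflicts, retry on failure.
conn.set_session(isolation_level="SERIALIZABLE")
for attempt in range(3):
    try:
        with conn, conn.cursor() as cur:
            cur.execute(
                "UPDATE accounts SET balance = balance - 100 WHERE id = %s", (1,)
            )
        break  # committed successfully
    except psycopg2.errors.SerializationFailure:
        continue  # conflicted with a concurrent transaction; try again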
Optimistic Concurrency Control?? need to research and add
When to consider a database other than Postgres
1. Extreme Write Throughput
2. Global Multi-Region Requirements
Postgres can’t really handle simultaneous writes from multiple primaries
3. Simple Key-Value Access Patterns - don't use Postgres when all you need is simple key-value access
How to scale reads (look up)
How much data can a single database store?
Somewhere in the range of 10-50 TiB; instances might be able to handle the higher end of that, but maintenance, backup, and recovery start becoming an issue.
How big can a single sql table be?
Tables start getting unwieldy past 100M rows
What are read/write limits for single postgres db?
~50k reads/sec for light reads on well-indexed tables
~5k reads/sec for heavy reads with joins
~10k writes/sec
Biggest hit to postgres performance
Performance drops significantly when the working set exceeds available RAM (typically 64 GB to 256 GB depending on the instance).
Postgres latency
Reads: 1-5ms for cached data, 5-30ms for disk
Writes: 5-15ms for commit latency
Latency: Reading 1mb sequentially from memory
0.25ms
Latency: Reading 1mb sequentially from SSD
1ms, 4x slower than memory
Latency: Reading 1mb sequentially from HDD
20ms, 20x slower than SSD
Latency: Round trip network latency CA to Netherlands
150ms
Storage: A two-hour movie
1 GB
Storage: A small book of plain text
1 MB
Storage: A high-resolution photo
1 MB
Storage: A medium-resolution image (or a site layout graphic)
100 KB
Storage: 1 page of plain text
2 KB
Storage: 1 character
1 byte
Storage: Size of Wikipedia
~150 GB of text content
How much space do 5 million 1 KB log entries take?
~5 GB (5,000,000 KB ≈ 5,000 MB ≈ 5 GB).
You have 100 million rows, each ~200 bytes. Roughly how big is the table?
~20 GB (100M × 200 B = 20,000,000,000 B)
How many 4 KB rows fit in 64 GB?
64 GB → 64,000 MB (×1000)
64,000 MB → 64,000,000 KB (×1000)
Each row = 4 KB
Rows = 64,000,000 ÷ 4 = 16,000,000
How many 100 B rows fit in 64 GB?
64 GB → 64,000 MB (×1000)
64,000 MB → 64,000,000 KB (×1000)
64,000,000 KB → 64,000,000,000 B (×1000)
Each row = 100 B
Rows = 64,000,000,000 ÷ 100 = 640,000,000
How many 5 MB photos fit in 250 GB?
250 GB → 250,000 MB (×1000)
Each photo = 5 MB
Photos = 250,000 ÷ 5 = 50,000
How many 20 KB log entries fit in 2 TB?
2 TB → 2,000 GB (×1000)
2,000 GB → 2,000,000 MB (×1000)
2,000,000 MB → 2,000,000,000 KB (×1000)
Each log entry = 20 KB
Logs = 2,000,000,000 ÷ 20 = 100,000,000 (100 million)
What does API Gateway do?
Request validation
Applies middleware (auth, rate limiting, SSL offloading, etc.)
Routing
System design latency: a simple action (fetching a list, etc.)
~100ms
System design latency: a complicated action (checkout, etc.)
~1 second
Postgres column that's derived from other data in the table
Generated column (example below)
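A minimal sketch, assuming a hypothetical users(first_name, last_name) table; full_name is computed and kept up to date by Postgres itself.

import psycopg2

# Assumed DSN and a hypothetical users(first_name, last_name) table.
conn = psycopg2.connect("dbname=app user=postgres")
with conn, conn.cursor() as cur:
    cur.execute(
        "ALTER TABLE users ADD COLUMN IF NOT EXISTS full_name text "
        "GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED"
    )
    # full_name is never written directly; Postgres derives it on insert/update.
    cur.execute("SELECT full_name FROM users WHERE id = %s", (1,))
    print(cur.fetchone())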
When to use Elasticsearch over Postgres
More sophisticated relevancy scoring
Faceted search capabilities
Fuzzy matching and "search as you type" features
Distributed search across very large datasets
Advanced analytics and aggregations
read-through cache strategy
The application only talks to the cache; on a miss, the cache layer itself loads the value from the database, stores it, and returns it.
Cache Invalidation and Consistency
Load Balancing
Round robin etc
How many messages can a single Kafka broker handle?
About 1 million per second
Steps in system design answer
Functional Requirements
Non-functional Requirements
Core Entities
API or System Interface
High Level Design
Deep Dives
Rate limit algorithms
Fixed Window Counter - easy, efficient, but suffers from boundary effect
Sliding window log - accurate, memory-intensive
Token Bucket - the bucket refills at a constant rate and requests consume tokens from it; good balance of accuracy and tolerance for bursty traffic (sketch below)
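A minimal token-bucket sketch in Python; the capacity and refill rate are made-up numbers.

import time

class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity          # max burst size
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never exceeding capacity.
        self.tokens = min(
            self.capacity, self.tokens + (now - self.last) * self.refill_per_sec
        )
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=10, refill_per_sec=5)  # burst of 10, 5 req/s sustained
print([bucket.allow() for _ in range(12)])           # first 10 pass, then throttled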
How many reads / writes can redis instance handle?
~100k ops/s
Redis read / write latency
0.5 - 1ms reads, 1 - 2 ms writes
Redis cache single node data storage limit
~1TB
Latency between instances within datacenter
Less than 1ms
Latency for a cross country tcp handshake
~60 - 80 ms
How to reduce latency between microservice applications
Keep TCP connections open, use connection pools
How can you make a service that's often sending small bits of data to another service more efficient?
Request batching (sketch below)
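A minimal batching sketch; send_batch, the batch size, and the wait time are hypothetical stand-ins for whatever downstream call the service actually makes.

import time

def send_batch(items: list[dict]) -> None:
    # Stand-in for the real downstream call (one request instead of many).
    print(f"sending {len(items)} items in one request")

class Batcher:
    def __init__(self, max_size: int = 100, max_wait_s: float = 0.05):
        self.max_size = max_size
        self.max_wait_s = max_wait_s
        self.items: list[dict] = []
        self.first_added = 0.0

    def add(self, item: dict) -> None:
        if not self.items:
            self.first_added = time.monotonic()
        self.items.append(item)
        # Flush when the batch is full or the oldest item has waited long enough.
        if len(self.items) >= self.max_size or (
            time.monotonic() - self.first_added >= self.max_wait_s
        ):
            self.flush()

    def flush(self) -> None:
        if self.items:
            send_batch(self.items)
            self.items = []

b = Batcher(max_size=3)
for i in range(7):
    b.add({"event": i})
b.flush()  # send whatever is left over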
How to handle hot key
request coalescing - combining multiple requests for the same key into a single request.
Cache key fanout spreads a single hot key across multiple cache entries, e.g. 10 keys feed:taylor-swift:1, feed:taylor-swift:2, etc. Invalidation becomes harder and memory usage increases.
probabilistic early refresh
Serving cached data while refreshing it in the background. Cache entries are refreshed before they expire, but not all at once: when an entry is fresh (just created), requests simply use it; as it ages, each request has a growing chance of triggering a background refresh (sketch below).
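A minimal sketch of the idea; the TTL, the probability curve, and load_from_db are all assumptions, and a real implementation would run the refresh asynchronously.

import random
import time

TTL_S = 60.0  # assumed time-to-live for a cached entry
cache = {}    # key -> (value, created_at)

def load_from_db(key: str) -> str:
    # Stand-in for the real (expensive) query behind the cache.
    return f"value-for-{key}"

def get(key: str) -> str:
    value, created_at = cache.get(key, (None, 0.0))
    age = time.time() - created_at
    if value is None or age >= TTL_S:
        # Hard miss or expired: must load synchronously.
        value = load_from_db(key)
        cache[key] = (value, time.time())
        return value
    # Fresh entries almost never refresh; near-expiry entries very likely do.
    refresh_probability = (age / TTL_S) ** 4
    if random.random() < refresh_probability:
        cache[key] = (load_from_db(key), time.time())  # would run async in practice
    return value

print(get("feed:taylor-swift"))
print(get("feed:taylor-swift"))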
What is the number of seconds in a day
~100,000 (actually 86,400)
How many seconds in a year
~30 million (actually ~31.5 million)
Debezium is an example of what
database change data capture (CDC)