System Design


60 Terms

1
New cards

ACID

Atomicity, Consistency, Isolation, and Durability

2
New cards

Atomicity

All or nothing, transaction either totally succeeds or is rolled back

3
New cards

Consistency

The database always follows all defined rules and constraints; a transaction is not allowed to commit if it would violate a constraint.

4
New cards

Isolation

Isolation levels determine how transactions can interact with data that's being modified by other concurrent transactions. Transactions should not affect each other, but full isolation has performance hit.

5
New cards

Durability

Once a transaction is committed, that data is guaranteed to have been written to disk and sync'd, protecting against crashes or power failures.

6
New cards

Default Postgres index

B-tree, works great for

Exact matches (WHERE email = 'user@example.com')

Range queries (WHERE created_at > '2024-01-01')

Sorting (ORDER BY username, if the ORDER BY columns match the index columns' order)

7
New cards

GIN index

Generalized Inverted Index. Maps individual values (words, array elements, JSON keys) to the rows that contain them; good for full-text search, JSONB, and array columns.

8
New cards

Elasticsearch

Distributed search and analytics engine built on Apache Lucene; provides full-text search over inverted indexes.

9
New cards

How is data written to postgres

1. Change is appended to the write-ahead log (WAL) and sync'd to disk

2. The buffer cache is updated in memory

3. Dirty pages are flushed back to disk in batches (checkpoints)
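The three steps above can be sketched as a toy in-memory model (not real Postgres internals; `MiniStore` and its fields are illustrative names):

```python
import json

class MiniStore:
    """Toy sketch of a WAL-then-buffer write path."""

    def __init__(self):
        self.wal = []      # stands in for the on-disk write-ahead log
        self.buffer = {}   # in-memory buffer cache of dirty pages
        self.disk = {}     # stands in for the data files on disk

    def write(self, key, value):
        # 1. Append the change to the WAL and "sync" it before acknowledging.
        self.wal.append(json.dumps({"key": key, "value": value}))
        # 2. Update the buffer cache in memory; data files are not touched yet.
        self.buffer[key] = value

    def checkpoint(self):
        # 3. Batch-flush dirty buffers to the data files, then truncate the WAL.
        self.disk.update(self.buffer)
        self.buffer.clear()
        self.wal.clear()

    def recover(self):
        # After a crash, replay the WAL to rebuild changes that never reached disk.
        for entry in self.wal:
            rec = json.loads(entry)
            self.disk[rec["key"]] = rec["value"]
```

The point of the ordering: because the WAL hits disk before the acknowledgment, a crash before the checkpoint loses nothing — recovery replays the log.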

10
New cards

How to scale SQL writes

Vertical scaling first; then partitioning/sharding across multiple primaries, batching writes, and buffering bursts behind a queue.

11
New cards

Allowing a user to read data they just wrote in a sharded system

read-your-writes consistency

12
New cards

Contention in Postgres

The default isolation level is read committed: reads see only committed data. This can still produce non-repeatable reads: if a transaction reads a row, another transaction commits a change, and the first transaction reads the row again, it sees different data.

Two ways to solve contention: 

Row-level locking: a transaction locks the rows it needs (e.g., SELECT ... FOR UPDATE), ensuring no other transaction can modify them until the lock is released

Serializable isolation: Makes all db transactions behave as if they were executed one after another. Comes at cost to performance, and applications need logic to retry on conflict. 

Row-level locking is preferred when you know exactly which rows need to be locked. Use serializable isolation for cases where the transaction is too complex to reason about which locks are needed.

Optimistic Concurrency Control: don't lock at all; read a version column, and at commit time write only if the version is unchanged, retrying on conflict. Works well when conflicts are rare.
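A minimal sketch of the optimistic pattern, using an in-memory stand-in for a row with a version column (`Store`, `compare_and_write`, and the retry count are illustrative, not a real database API):

```python
class VersionConflict(Exception):
    pass

class Store:
    """In-memory stand-in for a single row with a version column."""
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def compare_and_write(self, new_value, expected_version):
        # Commit only if nobody else wrote since we read (the optimistic check).
        if self.version != expected_version:
            raise VersionConflict
        self.value = new_value
        self.version += 1

def increment_with_retry(store, retries=5):
    for _ in range(retries):
        value, version = store.read()
        try:
            store.compare_and_write(value + 1, version)
            return True
        except VersionConflict:
            continue  # someone else committed first; re-read and retry
    return False
```

In SQL this is typically `UPDATE ... SET value = ..., version = version + 1 WHERE id = ? AND version = ?`, treating zero updated rows as a conflict.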

13
New cards

When to consider database other than postgres.

1. Extreme Write Throughput

2. Global Multi-Region Requirements

Postgres can’t really handle simultaneous writes from multiple primaries

3. Simple Key-Value Access Patterns - don't use it when a simple key-value store would do

14
New cards

How to scale SQL reads

Read replicas, caching (e.g., Redis), and materialized views for expensive queries.

15
New cards

How much data can a single database store?

Somewhere in the range of 10-50 TiB; instances may handle the higher end of that, but maintenance, backups, and recovery start becoming an issue.

16
New cards

How big can a single sql table be? 

Tables start getting unwieldy past 100M rows

17
New cards

What are read/write limits for single postgres db?

~50k reads/sec for light reads on well-indexed tables

~5k reads/sec for join-heavy reads

~10k writes/sec

18
New cards

Biggest hit to postgres performance

Performance drops significantly when the working set exceeds available RAM, normally 64 GB to 256 GB depending on the instance.

19
New cards

Postgres latency

Reads: 1-5ms for cached data, 5-30ms for disk

Writes: 5-15ms for commit latency

20
New cards

Latency: Reading 1mb sequentially from memory

0.25ms

21
New cards

Latency: Reading 1mb sequentially from SSD

1ms, 4x slower than memory

22
New cards

Latency: Reading 1mb sequentially from HDD

20ms, 20x slower than ssd

23
New cards

Latency: Round trip network latency CA to Netherlands

150ms

24
New cards

Storage: A two-hour movie

1 GB

25
New cards

Storage: A small book of plain text

1 MB

26
New cards

Storage: A high-resolution photo

1 MB

27
New cards

Storage: A medium-resolution image (or a site layout graphic)

100 KB

28
New cards

Storage: 1 page of plain text

2 KB

29
New cards

Storage: 1 character

1 byte

30
New cards

Storage: Size of Wikipedia

~150 GB of text content

31
New cards

How much space do 5 million 1 KB log entries take?

~5 GB (5,000,000 KB ≈ 5,000 MB ≈ 5 GB).

32
New cards

You have 100 million rows, each ~200 bytes. Roughly how big is the table?

~20 GB (100M × 200 B = 20,000,000,000 B)

33
New cards

How many 4 KB rows fit in 64 GB?

  • 64 GB → 64,000 MB (×1000)

  • 64,000 MB → 64,000,000 KB (×1000)

  • Each row = 4 KB

  • Rows = 64,000,000 ÷ 4 = 16,000,000

34
New cards

How many 100 B rows fit in 64 GB?

  • 64 GB → 64,000 MB (×1000)

  • 64,000 MB → 64,000,000 KB (×1000)

  • 64,000,000 KB → 64,000,000,000 B (×1000)

  • Each row = 100 B

  • Rows = 64,000,000,000 ÷ 100 = 640,000,000

35
New cards

How many 5 MB photos fit in 250 GB?

  • 250 GB → 250,000 MB (×1000)

  • Each photo = 5 MB

  • Photos = 250,000 ÷ 5 = 50,000

36
New cards

How many 20 KB log entries fit in 2 TB?

  • 2 TB → 2,000 GB (×1000)

  • 2,000 GB → 2,000,000 MB (×1000)

  • 2,000,000 MB → 2,000,000,000 KB (×1000)

  • Each log entry = 20 KB

  • Logs = 2,000,000,000 ÷ 20 = 100,000,000 (100 million)
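The estimation cards above all use the same decimal-unit arithmetic (×1000 per step); a few lines of Python verify each answer:

```python
# Decimal units, ×1000 per step, matching the cards above.
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

def items_that_fit(capacity_bytes, item_bytes):
    """How many fixed-size items fit in a given capacity."""
    return capacity_bytes // item_bytes

# Worked examples matching cards 31-36:
assert 5_000_000 * 1 * KB == 5 * GB                    # 5M 1 KB log entries ≈ 5 GB
assert 100_000_000 * 200 == 20 * GB                    # 100M rows × 200 B ≈ 20 GB
assert items_that_fit(64 * GB, 4 * KB) == 16_000_000   # 4 KB rows in 64 GB
assert items_that_fit(64 * GB, 100) == 640_000_000     # 100 B rows in 64 GB
assert items_that_fit(250 * GB, 5 * MB) == 50_000      # 5 MB photos in 250 GB
assert items_that_fit(2 * TB, 20 * KB) == 100_000_000  # 20 KB logs in 2 TB
```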

37
New cards

What does API Gateway do?

  1. Request validation

  2. API Gateway applies middleware (auth, rate limiting, ssl offloading etc.)

  3. Routing

38
New cards

System design latency, how long for simple action, fetching list etc

~100ms

39
New cards

System design latency, complicated action, checkout etc

~1 second

40
New cards

Postgres column thats derived from data in the table

generated column

41
New cards

When to use elasticsearch over postgres.

  • More sophisticated relevancy scoring

  • Faceted search capabilities

  • Fuzzy matching and "search as you type" features

  • Distributed search across very large datasets

  • Advanced analytics and aggregations

42
New cards

read-through cache strategy

The cache sits in front of the database; on a miss, the cache layer itself fetches the value from the database, stores it (typically with a TTL), and returns it. The application only ever talks to the cache.
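A minimal read-through sketch (`ReadThroughCache` and `load_fn` are illustrative names; a real setup would put Redis or Memcached behind this interface):

```python
import time

class ReadThroughCache:
    """On a hit, serve from the cache; on a miss, load from the
    backing store, cache the value with a TTL, and return it."""

    def __init__(self, load_fn, ttl_seconds=60):
        self.load_fn = load_fn   # called on a cache miss, e.g. a DB query
        self.ttl = ttl_seconds
        self._store = {}         # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]      # cache hit, still fresh
        value = self.load_fn(key)                # miss: read through to the DB
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value
```

The trade-off: the first request after expiry pays full database latency, which is what hot-key techniques like early refresh (card 56) try to smooth out.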

43
New cards

Cache Invalidation and Consistency

Common options: TTL-based expiry (simple, bounded staleness), explicit invalidation or update on write (fresher, more complex), and write-through caching (cache and database updated together).

44
New cards

Load Balancing

Round robin, least connections, weighted round robin, consistent hashing, etc.

45
New cards

How many messages could a single kafka broker handle? 

About 1 million per second

46
New cards

Steps in system design answer

Functional Requirements

Non-functional Requirements

Core Entities

API or System Interface

High Level Design

Deep Dives

47
New cards

Rate limit algorithms 

Fixed Window Counter - easy, efficient, but suffers from boundary effect

Sliding window log - accurate, memory-intensive

Token Bucket - the bucket refills at a constant rate and requests consume tokens from it; a good balance of accuracy and tolerance for bursty traffic
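A minimal token bucket sketch (in-memory and single-process; a real deployment would typically keep bucket state in Redis so all app servers share it):

```python
import time

class TokenBucket:
    """The bucket refills at `refill_rate` tokens/sec up to `capacity`;
    each request spends one token, so short bursts up to `capacity`
    are allowed while the long-run rate stays capped."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.clock = clock       # injectable for testing
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `capacity=2, refill_rate=1`, two requests pass immediately (the burst), a third is rejected, and one more token becomes available each second.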

48
New cards

How many reads / writes can redis instance handle?

~100k ops/s

49
New cards

Redis read / write latency

0.5 - 1ms reads, 1 - 2 ms writes

50
New cards

Redis cache single node data storage limit

~1TB

51
New cards

Latency between instances within datacenter

Less than 1ms

52
New cards

Latency for a cross country tcp handshake

~60 - 80 ms

53
New cards

How to reduce latency between microservice applications

Keep TCP connections open, use connection pools

54
New cards

How can you make a service thats often sending small bits of data to another service more efficient?

Request batching

55
New cards

How to handle hot key

request coalescing - combining multiple requests for the same key into a single request.

Cache key fanout spreads a single hot key across multiple cache entries, e.g., 10 keys: feed:taylor-swift:1, feed:taylor-swift:2, etc. Invalidation becomes harder and memory usage increases.
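A thread-based sketch of request coalescing (`Coalescer` and `load_fn` are illustrative names): the first caller for a key becomes the leader and performs the load; concurrent callers for the same key wait for the leader's result instead of hitting the backend themselves.

```python
import threading

class Coalescer:
    """Coalesce concurrent requests for the same key into one backend load."""

    def __init__(self, load_fn):
        self.load_fn = load_fn
        self._lock = threading.Lock()
        self._inflight = {}   # key -> Event signalling the load is done
        self._results = {}

    def get(self, key):
        with self._lock:
            event = self._inflight.get(key)
            if event is None:
                # First caller for this key: it will perform the load.
                event = threading.Event()
                self._inflight[key] = event
                leader = True
            else:
                leader = False
        if leader:
            self._results[key] = self.load_fn(key)
            event.set()
            with self._lock:
                del self._inflight[key]
        else:
            event.wait()      # followers block until the leader's result is in
        return self._results[key]
```

This is the pattern behind Go's singleflight package; for a hot key it collapses a thundering herd into one database or cache fill per key at a time.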

56
New cards

probabilistic early refresh

Serving cached data while refreshing it in the background. This refreshes cache entries before they expire, but not all at once. When your cache entry is fresh (just created), requests simply use it. But as it gets older, each request has a tiny chance of triggering a background refresh.
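One way to sketch the refresh decision (the quadratic ramp and the `beta` knob here are illustrative choices, loosely in the spirit of XFetch-style early recomputation, not a standard formula):

```python
import random

def should_refresh_early(age_seconds, ttl_seconds, beta=1.0, rng=random.random):
    """Return True if this request should trigger a background refresh.
    The probability ramps from ~0 for a fresh entry toward 1 near the
    TTL, so refreshes are spread out instead of stampeding at expiry."""
    if age_seconds >= ttl_seconds:
        return True               # already expired: must refresh
    # Older entries are increasingly likely to refresh; beta > 1
    # refreshes earlier on average.
    probability = beta * (age_seconds / ttl_seconds) ** 2
    return rng() < probability
```

Each request still serves the cached value; a True result just enqueues an asynchronous refresh, so only one unlucky request per window pays the recompute cost.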

57
New cards

What is the number of seconds in a day 

~100,000 (actually 86,400)

58
New cards
59
New cards

How many seconds in a year

~31.5 million (~30 million for back-of-envelope math)

60
New cards

Debezium is an example of what

database change data capture (CDC)