ACID
Atomicity, Consistency, Isolation, and Durability
Atomicity
All or nothing, transaction either totally succeeds or is rolled back
Consistency
Database always follows all defined rules and constraints; a transaction is not allowed to complete if it would violate a constraint
Isolation
Isolation levels determine how transactions interact with data being modified by other concurrent transactions. Transactions should not affect each other, but full isolation comes with a performance cost.
Durability
Once a transaction is committed, the data is guaranteed to have been written to disk and synced, protecting against crashes or power failures.
Default Postgres index
B-tree, which works great for (see the sketch after this list):
Exact matches (WHERE email = 'user@example.com')
Range queries (WHERE created_at > '2024-01-01')
Sorting (ORDER BY username if the ORDER BY column match the index columns' order)
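A minimal sketch of those three query shapes with psycopg2, assuming a local Postgres and a hypothetical users(email, created_at) table; the DSN and names are made up for illustration.

import psycopg2

# Assumed DSN and a hypothetical users(email, created_at) table.
conn = psycopg2.connect("dbname=app user=postgres")
cur = conn.cursor()

# A plain CREATE INDEX builds a B-tree by default.
cur.execute("CREATE INDEX IF NOT EXISTS users_email_idx ON users (email)")
cur.execute("CREATE INDEX IF NOT EXISTS users_created_at_idx ON users (created_at)")
conn.commit()

# Exact match: point lookup on the email index.
cur.execute("SELECT id FROM users WHERE email = %s", ("user@example.com",))

# Range query: index range scan on created_at.
cur.execute("SELECT id FROM users WHERE created_at > %s", ("2024-01-01",))

# Sorting: the created_at index can satisfy the ORDER BY without a sort step.
cur.execute("SELECT id FROM users ORDER BY created_at DESC LIMIT 10")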
GIN index
Inverted index, the same core data structure Elasticsearch uses; good for full-text search and jsonb/array containment queries (example below). Study more.
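A minimal sketch of a typical GIN use case, assuming a hypothetical documents(body) table; the DSN and names are illustrative only.

import psycopg2

# Assumed DSN and a hypothetical documents(body) table.
conn = psycopg2.connect("dbname=app user=postgres")
cur = conn.cursor()

# GIN index over a tsvector expression for full-text search.
cur.execute(
    "CREATE INDEX IF NOT EXISTS documents_body_gin "
    "ON documents USING GIN (to_tsvector('english', body))"
)
conn.commit()

# The query must use the same expression so the planner can use the index.
cur.execute(
    "SELECT id FROM documents "
    "WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s)",
    ("database indexing",),
)
print(cur.fetchall())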
How is data written to postgres
1. The write-ahead log (WAL) record is written to disk and flushed
2. The buffer cache is updated in memory
3. Dirty pages are later written back to disk in batches (checkpoints)
How to scale writes in SQL (look up)
Allowing a user to read data they just wrote in a sharded system
Read-your-writes consistency (sketch below)
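One possible sketch of read-your-writes routing: pin a user's reads to the primary for a short window after they write. The window, function names, and targets are hypothetical, not a standard API.

import time

PIN_SECONDS = 5  # assumed upper bound on replication lag
last_write_at: dict[str, float] = {}

def record_write(user_id: str) -> None:
    # Call this after routing a write to the primary.
    last_write_at[user_id] = time.time()

def choose_target(user_id: str) -> str:
    # Pin reads to the primary shortly after the user's last write,
    # so they always see their own writes; otherwise use a read replica.
    recently_wrote = time.time() - last_write_at.get(user_id, 0.0) < PIN_SECONDS
    return "primary" if recently_wrote else "read-replica"

record_write("u42")
print(choose_target("u42"))  # -> primary (just wrote)
print(choose_target("u99"))  # -> read-replica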
Contention in Postgres
The default isolation level is Read Committed: reads only see committed data. This can still lead to non-repeatable reads if a transaction reads a row, another transaction commits a change to it, and the first transaction reads the row again.
Two ways to solve contention:
Row-level locking: the transaction locks the rows it needs (e.g. SELECT ... FOR UPDATE), ensuring no other transaction can modify them until it commits.
Serializable isolation: makes all transactions behave as if they were executed one after another. Comes at a cost to performance, and applications need logic to retry on serialization conflicts.
Row-level locking is preferred when you know exactly which rows need to be locked. Use serializable isolation when the transaction is too complex to reason about which locks are needed (sketch of both below).
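A minimal sketch of both approaches with psycopg2, assuming a hypothetical accounts(id, balance) table and a local Postgres; this shows the shape of the two techniques, not production code.

import psycopg2
import psycopg2.errors

# Assumed DSN and a hypothetical accounts(id, balance) table.
conn = psycopg2.connect("dbname=app user=postgres")

# 1) Row-level locking: lock exactly the row we intend to change.
with conn, conn.cursor() as cur:
    cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (1,))
    (balance,) = cur.fetchone()
    cur.execute("UPDATE accounts SET balance = %s WHERE id = %s", (balance - 100, 1))
# leaving the `with conn` block commits and releases the lock

# 2) Serializable isolation: let Postgres detect conflicts, retry on failure.
conn.set_session(isolation_level="SERIALIZABLE")
for attempt in range(3):
    try:
        with conn, conn.cursor() as cur:
            cur.execute(
                "UPDATE accounts SET balance = balance - 100 WHERE id = %s", (1,)
            )
        break  # committed successfully
    except psycopg2.errors.SerializationFailure:
        continue  # conflicted with a concurrent transaction; try again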
Optimistic Concurrency Control?? need to research and add
When to consider a database other than Postgres
1. Extreme Write Throughput
2. Global Multi-Region Requirements
Postgres can’t really handle simultaneous writes from multiple primaries
3. Simple Key-Value Access Patterns - don't use Postgres when all you need is simple key-value access
How to scale reads (look up)
How much data can a single database store?
Somewhere in the range of 10-50 TiB; instances might be able to handle the higher end of that, but maintenance, backup, and recovery start becoming an issue.
How big can a single sql table be?
Tables start getting unwieldy past 100M rows
What are read/write limits for single postgres db?
~50k reads/sec for light reads on well-indexed tables
~5k reads/sec for heavy reads with joins
~10k writes/sec
Biggest hit to postgres performance
Performance drops significantly when the working set exceeds available RAM (typically 64 GB to 256 GB depending on the instance).
Postgres latency
Reads: 1-5ms for cached data, 5-30ms for disk
Writes: 5-15ms for commit latency
Latency: Reading 1mb sequentially from memory
0.25ms
Latency: Reading 1mb sequentially from SSD
1ms, 4x slower than memory
Latency: Reading 1mb sequentially from HDD
20ms, 20x slower than SSD
Latency: Round trip network latency CA to Netherlands
150ms
Storage: A two-hour movie
1 GB
Storage: A small book of plain text
1 MB
Storage: A high-resolution photo
1 MB
Storage: A medium-resolution image (or a site layout graphic)
100 KB
Storage: 1 page of plain text
2 KB
Storage: 1 character
1 byte
Storage: Size of Wikipedia
~150 GB of text content
How much space do 5 million 1 KB log entries take?
~5 GB (5,000,000 KB ≈ 5,000 MB ≈ 5 GB).
You have 100 million rows, each ~200 bytes. Roughly how big is the table?
~20 GB (100M × 200 B = 20,000,000,000 B)
How many 4 KB rows fit in 64 GB?
64 GB → 64,000 MB (×1000)
64,000 MB → 64,000,000 KB (×1000)
Each row = 4 KB
Rows = 64,000,000 ÷ 4 = 16,000,000
How many 100 B rows fit in 64 GB?
64 GB → 64,000 MB (×1000)
64,000 MB → 64,000,000 KB (×1000)
64,000,000 KB → 64,000,000,000 B (×1000)
Each row = 100 B
Rows = 64,000,000,000 ÷ 100 = 640,000,000
How many 5 MB photos fit in 250 GB?
250 GB → 250,000 MB (×1000)
Each photo = 5 MB
Photos = 250,000 ÷ 5 = 50,000
How many 20 KB log entries fit in 2 TB?
2 TB → 2,000 GB (×1000)
2,000 GB → 2,000,000 MB (×1000)
2,000,000 MB → 2,000,000,000 KB (×1000)
Each log entry = 20 KB
Logs = 2,000,000,000 ÷ 20 = 100,000,000 (100 million)
What does API Gateway do?
Request validation
Applies middleware (auth, rate limiting, SSL offloading, etc.)
Routing
System design latency: a simple action (fetching a list, etc.)
~100ms
System design latency: a complicated action (checkout, etc.)
~1 second
Postgres column that's derived from other data in the table
Generated column (example below)
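A minimal sketch, assuming a hypothetical users(first_name, last_name) table; full_name is computed and kept up to date by Postgres itself.

import psycopg2

# Assumed DSN and a hypothetical users(first_name, last_name) table.
conn = psycopg2.connect("dbname=app user=postgres")
with conn, conn.cursor() as cur:
    cur.execute(
        "ALTER TABLE users ADD COLUMN IF NOT EXISTS full_name text "
        "GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED"
    )
    # full_name is never written directly; Postgres derives it on insert/update.
    cur.execute("SELECT full_name FROM users WHERE id = %s", (1,))
    print(cur.fetchone())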
When to use Elasticsearch over Postgres
More sophisticated relevancy scoring
Faceted search capabilities
Fuzzy matching and "search as you type" features
Distributed search across very large datasets
Advanced analytics and aggregations
read-through cache strategy
The application only talks to the cache; on a miss, the cache layer itself loads the value from the database, stores it, and returns it.
Cache Invalidation and Consistency
Load Balancing
Round robin etc
How many messages can a single Kafka broker handle?
About 1 million per second
Steps in system design answer
Functional Requirements
Non-functional Requirements
Core Entities
API or System Interface
High Level Design
Deep Dives
Rate limit algorithms
Fixed Window Counter - easy, efficient, but suffers from boundary effect
Sliding window log - accurate, memory-intensive
Token Bucket - the bucket refills at a constant rate and requests consume tokens from it; good balance of accuracy and tolerance for bursty traffic (sketch below)
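A minimal token-bucket sketch in Python; the capacity and refill rate are made-up numbers.

import time

class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity          # max burst size
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never exceeding capacity.
        self.tokens = min(
            self.capacity, self.tokens + (now - self.last) * self.refill_per_sec
        )
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=10, refill_per_sec=5)  # burst of 10, 5 req/s sustained
print([bucket.allow() for _ in range(12)])           # first 10 pass, then throttled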
How many reads / writes can redis instance handle?
~100k ops/s
Redis read / write latency
0.5 - 1ms reads, 1 - 2 ms writes
Redis cache single node data storage limit
~1TB
Latency between instances within datacenter
Less than 1ms
Latency for a cross country tcp handshake
~60 - 80 ms
How to reduce latency between microservice applications
Keep TCP connections open, use connection pools
How can you make a service that's often sending small bits of data to another service more efficient?
Request batching (sketch below)
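A minimal batching sketch; send_batch, the batch size, and the wait time are hypothetical stand-ins for whatever downstream call the service actually makes.

import time

def send_batch(items: list[dict]) -> None:
    # Stand-in for the real downstream call (one request instead of many).
    print(f"sending {len(items)} items in one request")

class Batcher:
    def __init__(self, max_size: int = 100, max_wait_s: float = 0.05):
        self.max_size = max_size
        self.max_wait_s = max_wait_s
        self.items: list[dict] = []
        self.first_added = 0.0

    def add(self, item: dict) -> None:
        if not self.items:
            self.first_added = time.monotonic()
        self.items.append(item)
        # Flush when the batch is full or the oldest item has waited long enough.
        if len(self.items) >= self.max_size or (
            time.monotonic() - self.first_added >= self.max_wait_s
        ):
            self.flush()

    def flush(self) -> None:
        if self.items:
            send_batch(self.items)
            self.items = []

b = Batcher(max_size=3)
for i in range(7):
    b.add({"event": i})
b.flush()  # send whatever is left over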
How to handle hot key
request coalescing - combining multiple requests for the same key into a single request.
Cache key fanout spreads a single hot key across multiple cache entries, e.g. 10 keys feed:taylor-swift:1, feed:taylor-swift:2, etc. Invalidation becomes harder and memory usage increases.
probabilistic early refresh
Serving cached data while refreshing it in the background. Cache entries are refreshed before they expire, but not all at once: when an entry is fresh (just created), requests simply use it; as it ages, each request has a growing chance of triggering a background refresh (sketch below).
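A minimal sketch of the idea; the TTL, the probability curve, and load_from_db are all assumptions, and a real implementation would run the refresh asynchronously.

import random
import time

TTL_S = 60.0  # assumed time-to-live for a cached entry
cache = {}    # key -> (value, created_at)

def load_from_db(key: str) -> str:
    # Stand-in for the real (expensive) query behind the cache.
    return f"value-for-{key}"

def get(key: str) -> str:
    value, created_at = cache.get(key, (None, 0.0))
    age = time.time() - created_at
    if value is None or age >= TTL_S:
        # Hard miss or expired: must load synchronously.
        value = load_from_db(key)
        cache[key] = (value, time.time())
        return value
    # Fresh entries almost never refresh; near-expiry entries very likely do.
    refresh_probability = (age / TTL_S) ** 4
    if random.random() < refresh_probability:
        cache[key] = (load_from_db(key), time.time())  # would run async in practice
    return value

print(get("feed:taylor-swift"))
print(get("feed:taylor-swift"))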
What is the number of seconds in a day
~100,000 (actually 86,400)
How many seconds in a year
~30 million (actually ~31.5 million)
Debezium is an example of what
database change data capture (CDC)