Kalshi System Design Practice

0.0(0)

Studied by 0 people

Call Kai

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Knowt Play

Card Sorting

1/202

There's no tags or description

Looks like no tags are added yet.

Last updated 9:14 PM on 5/21/26

Name	Mastery	Learn	Test	Matching	Spaced	Call with Kai

No analytics yet

Send a link to your students to track their progress

203 Terms

New cards

What are the 6 steps of the system design interview framework?

1) Requirements (~5 min), 2) Core Entities (~2 min), 3) API/Interface (~5 min), 4) Data Flow (optional, ~5 min), 5) High Level Design (~10-15 min), 6) Deep Dives (~10 min)

New cards

What are functional requirements?

What the system should DO — 'users should be able to...' statements. Example for Twitter: post tweets, follow users, see a feed. Pick the top ~3

New cards

What are non-functional requirements?

How well the system should do it — qualities like latency, availability, scale. Should be quantified. Bad: 'low latency'. Good: 'render feeds in under 200ms.' Pick top 3-5

New cards

What's the most important non-functional question to ask first?

Consistency vs availability (CAP). Then think about: scale (DAU, QPS), latency targets, durability needs, security, regional/geographic needs

New cards

Should you do back-of-the-envelope math upfront?

Usually no — only do math when it directly drives a design decision (e.g. 'can this fit on one machine?'). Tell the interviewer you'll do math when needed instead of computing storage/QPS for show

New cards

What are Core Entities?

A short bulleted list of the nouns/resources your system handles (e.g. for Twitter: User, Tweet, Follow). First draft only — fields come later in High-Level Design

New cards

What is the High-Level Design step?

Drawing boxes and arrows: clients, load balancer, app servers, DB, cache, queues. Build it endpoint-by-endpoint from your API. Keep it simple — add complexity in Deep Dives

New cards

What's the biggest mistake during High-Level Design?

Adding complexity too early and never finishing. Build a simple working design first, note where caches/queues could go later, then add complexity in Deep Dives

New cards

What's the goal of Deep Dives?

Harden the design to meet non-functional requirements, address edge cases, fix bottlenecks. Example: if requirement is 100M DAU, deep dive into horizontal scaling, caching, sharding

New cards

What are the 3 networking layers that matter for interviews?

Layer 3 Network (IP — addressing/routing), Layer 4 Transport (TCP/UDP — reliable vs fast delivery), Layer 7 Application (HTTP, WebSockets, gRPC — what apps speak)

New cards

What's the difference between TCP and UDP?

TCP = reliable, ordered, connection-based. The workhorse of the internet. UDP = fast, connectionless, no guarantees ('spray and pray'). Use UDP for video streaming, gaming, VoIP where speed matters more than reliability

New cards

What's the default protocol you should assume in interviews?

TCP. Interviewers expect it without you mentioning it. Only call out UDP if you specifically need it (real-time media, gaming, telemetry where loss is OK)

New cards

Why does geography matter for latency?

Speed of light is the floor. NY to London = ~80ms minimum through fiber, before any processing. For global low-latency systems you need regional deployments or CDNs

New cards

What is DNS?

Domain Name System — translates human names (google.com) to IP addresses. Also doubles as a basic load balancer by rotating IPs returned to different clients

New cards

What's the difference between HTTP and HTTPS?

HTTPS adds TLS encryption on top of HTTP. Contents are protected in transit. Always use HTTPS for public sites. Note: encryption doesn't mean you can trust the request body — always validate user identity from auth tokens, not body fields

New cards

Why is HTTP described as 'stateless'?

Each request is independent — the server doesn't remember prior ones. Good for system design because stateless servers are easy to scale horizontally (any server can handle any request)

New cards

What are Server-Sent Events (SSE)?

A way for the server to push many messages to a client over a single HTTP connection — one-way, server-to-client. Example: live sports scores, notifications, auction prices. Simpler than WebSockets

New cards

What are WebSockets and when do you use them?

A persistent, two-way TCP-like connection between client and server. Use when you need bidirectional real-time messaging (chat, multiplayer games, collaborative editing). Avoid them when SSE or polling would suffice — they're expensive to maintain at scale

New cards

What's the rule of thumb for real-time updates?

Start with HTTP polling (simplest). Upgrade to SSE if you need server push. Use WebSockets only for true bidirectional needs. Don't reach for WebSockets just because the problem says 'real-time'

New cards

What are the 'two hops' of real-time updates?

Hop 1: Server → Client (which protocol? polling, SSE, WebSockets, WebRTC). Hop 2: Event source → Server (how does the server learn about the update? polling a DB, consistent hashing, or pub/sub). You need to solve both

New cards

What is long polling?

Client makes an HTTP request, server holds it open until new data arrives (or a timeout), then responds. Client immediately opens a new request. Near real-time with zero special infrastructure. Tradeoff: extra latency between consecutive updates because the client must 'call back' each time

New cards

When is long polling a good fit?

When you need near-real-time but updates are infrequent — like waiting for a payment to process. Simple upgrade over basic polling. Also good when SSE/WebSocket infrastructure isn't available

New cards

What is the SSE reconnection mechanism?

SSE includes a 'last event ID'. If the connection drops, the browser's EventSource auto-reconnects and sends the last ID it received. The server uses that ID to replay missed events. Built into the spec

New cards

What's a common pattern for mixing SSE and HTTP for writes?

Use SSE for server-push (receiving updates) and plain HTTP POST/PUT for writes (sending data). You get real-time reads without the complexity of full WebSocket bidirectional connections

New cards

What's a WebSocket reference architecture pattern?

Terminate WebSockets early in a dedicated WebSocket service that handles connection management. The rest of the system stays stateless. The WS service is rarely redeployed (minimizing connection churn) and all other services communicate with it via internal APIs

New cards

What happens to WebSocket connections during server deployments?

Connections get severed. Clients must reconnect (and may miss updates). Solutions: drain connections gracefully before shutdown, use the WebSocket service pattern to minimize redeployments, and always implement client reconnection logic with missed-message recovery

New cards

What is the server-side 'pull' model for updates?

The simplest server-side approach: events are written to a database, and the server simply reads from the DB when the client polls. No special infrastructure. Tradeoff: not truly real-time. Works when you can tolerate seconds of delay

New cards

What is the server-side 'push via consistent hashing' model?

Each user is deterministically assigned to a specific server (via hash of user ID). When an update needs to reach User C, you hash their ID to find their server and send it there. That server forwards to the client's connection

New cards

How does consistent hashing assignment work for real-time connections?

Client connects to any server → server hashes client ID → looks up which server 'owns' that client → redirects client there. A coordination service (ZooKeeper/etcd) tracks server assignments. When scaling, only users in affected ring segments need to move

New cards

When should you use consistent hashing vs pub/sub for server-side push?

Consistent hashing: when each connection has expensive server-side state (e.g. a loaded document in Google Docs). Pub/sub: when connections are lightweight and you just need to forward messages. Pub/sub is simpler and more common

New cards

What is the pub/sub model for server-side push?

Endpoint servers are stateless — clients connect to any one. On connect, the server subscribes to the client's topic in a pub/sub service (Redis, Kafka). Updates are published to topics and the pub/sub service broadcasts to all subscribed endpoint servers, which forward to clients

New cards

What are the advantages of pub/sub for real-time updates?

Easy load balancing (use 'least connections'), no need for specific client→server routing, minimal state on endpoint servers, efficient fan-out. The pub/sub service (e.g. Redis Cluster) handles the hard distribution work

New cards

What's the tradeoff of pub/sub for real-time updates?

The pub/sub service becomes a single point of failure and potential bottleneck. You also don't automatically know when clients connect/disconnect. Scale by sharding subscriptions (Redis Cluster). Latency overhead is minimal (~10ms)

New cards

How do you handle reconnection and missed messages in real-time systems?

Track what each client has received (sequence numbers or last-event-ID). On reconnect, replay everything missed. Often implemented with Redis Streams or a per-user message queue. Heartbeat mechanisms help detect 'zombie' connections faster

New cards

How do you handle the celebrity fan-out problem in real-time?

Don't write the update to millions of individual feeds. Cache it once and distribute through layers: regional servers pull the update and push to their local clients. Hierarchical distribution reduces load on any single component

New cards

How do you maintain message ordering across distributed servers?

For most product interviews: funnel related messages through a single server or partition so you can stamp them with a consistent timestamp. For advanced cases: vector clocks or logical timestamps, but this is usually overkill for product SWE interviews

New cards

What is WebRTC and when is it used?

Peer-to-peer protocol primarily for browser-to-browser audio/video calls and conferencing. Niche — only use it for video calling. Don't try to use it for general collaborative apps in interviews

New cards

What's the difference between long polling and WebSockets?

Long polling = client repeatedly opens HTTP requests that the server holds open until data arrives. Simpler than WebSockets but less efficient. WebSockets keep one persistent connection alive for both sides to push messages freely

New cards

What's the default API style for interviews?

REST. Maps HTTP verbs (GET/POST/PUT/PATCH/DELETE) to resources identified by URLs. Use it unless you have a specific reason for GraphQL or gRPC

New cards

How do REST verbs map to operations?

GET = read (idempotent, no body). POST = create. PUT = full replace (idempotent). PATCH = partial update. DELETE = remove (idempotent). Idempotent means safe to retry

New cards

What is REST resource modeling?

Design around nouns, not actions. Bad: POST /updateUser. Good: PUT /users/{id}. Resources should be plural nouns and map to your Core Entities. Example: /events/{id}/tickets for tickets belonging to an event

New cards

Where should request data live in a REST API?

Path parameter = required identifier (/events/123). Query parameter = optional filter/sort (/events?city=NYC). Body = the payload you're creating/updating (booking details, post content)

New cards

What's the difference between path parameters and query parameters?

Path = the resource doesn't make sense without it (/users/{id}). Query = optional filter, could be omitted (/users?role=admin). If a query needs the field, make it a path param

New cards

What is GraphQL and when should you use it?

A query language where clients ask for exactly the fields they need from one endpoint. Best when clients have very different data needs (mobile vs web) and over/under-fetching is a real problem. Adds complexity — don't default to it

New cards

What is the N+1 problem in GraphQL?

When fetching a list with nested relationships triggers one query for the list plus N more for each item. Example: 1 query for events + 100 queries for each event's venue. Solved with batching/DataLoader patterns

New cards

What is gRPC and when do you use it?

A binary RPC framework using Protocol Buffers over HTTP/2. ~10x faster than JSON-over-HTTP. Use for internal microservice communication where speed and type safety matter. Avoid for public APIs (browsers don't support it well)

New cards

What's a common pattern for mixing API styles?

REST for external/public APIs (any client can use it), gRPC for internal service-to-service calls (faster, type-safe). Best of both worlds

New cards

What's the difference between cursor-based and offset-based pagination?

Offset (?offset=20&limit=10) = simple, supports 'jump to page X', but breaks when items shift. Cursor (?cursor=abc123) = stable for real-time data where new items appear, but no 'jump to page N'. Use cursor for feeds, offset for static lists

New cards

What's the difference between authentication and authorization?

Authentication = WHO you are (verify identity, often via JWT or session). Authorization = WHAT you can do (check permissions, often via roles). 'Auth' usually refers to both together

New cards

When do you use JWTs vs API keys?

JWT (JSON Web Token) for user sessions in web/mobile apps — self-contained, stateless, carries user identity. API key for service-to-service or third-party developers — long random strings, mapped to permissions in your DB

New cards

What is RBAC?

Role-Based Access Control — group permissions into roles (admin, customer, manager) and assign roles to users. Cleaner than granting individual permissions per user. In interviews, just call out which endpoints require which roles

New cards

What is rate limiting and where do you implement it?

Restricting how many requests a client can make per time window. Prevents abuse and accidental overload. Implemented at the API gateway or middleware. Return HTTP 429 when limits exceeded. Common patterns: per-user, per-IP, per-endpoint

New cards

What is idempotency and why does it matter?

An idempotent operation can be repeated safely with the same result. Critical for retries: if a payment request times out and you retry, idempotency prevents double-charging. Use idempotency keys (e.g. request ID) to deduplicate

New cards

What is retry with exponential backoff?

When a request fails, retry after increasing wait times (1s, 2s, 4s, 8s...). Prevents hammering a struggling service. The interview magic phrase. Add 'jitter' (randomness) so all clients don't retry at the exact same moment

New cards

What is a circuit breaker?

A pattern that stops calling a failing service to give it time to recover. After N failures, the 'circuit opens' and requests instantly fail without trying. After a timeout, it tests with one request. Prevents cascading failures across systems

New cards

What is a cascading failure?

When one service's failure causes others to fail in a chain. Classic case: DB goes down, app servers pile up retries, retries pin down the DB so it can't recover. The 'thundering herd' problem. Circuit breakers and backoff help prevent this

New cards

What is horizontal vs vertical scaling?

Vertical = bigger machine (more CPU/RAM). Horizontal = more machines. Vertical is simpler but capped by hardware. Horizontal is more common in interviews. Modern hardware is powerful — don't underestimate vertical scaling

New cards

What does a load balancer do?

Distributes incoming requests across multiple backend servers so no single server is overwhelmed. Also monitors server health and routes around failed ones. Examples: AWS ELB, NGINX, HAProxy

New cards

What's the difference between L4 and L7 load balancers?

L4 (transport): looks only at TCP/IP, fast, maintains persistent connections. Use for WebSockets. L7 (application): inspects HTTP content, can route based on URL/headers/cookies, more flexible. Use for normal HTTP APIs

New cards

What are common load balancing algorithms?

Round Robin (rotate), Random, Least Connections (route to server with fewest active connections — good for WebSockets), Least Response Time, IP Hash (same client → same server, for session stickiness)

New cards

What's the rule for L4 vs L7 in interviews?

If you have WebSockets or other persistent connections, use L4. Otherwise use L7 for the flexibility (routing by URL, header-based routing, etc.)

New cards

What is health checking?

The load balancer periodically pings each backend to check if it's alive. Failed checks → server is removed from the pool until it recovers. Can be TCP-level (is it accepting connections?) or HTTP-level (does /health return 200?)

New cards

What types of databases exist and when do you pick each?

Relational (Postgres) = default, structured data + relationships + ACID. Document (MongoDB) = flexible/evolving schemas. Key-value (Redis, DynamoDB) = fast lookups by exact key. Wide-column (Cassandra) = write-heavy/time-series. Graph (Neo4j) = relationship-heavy traversals (rarely needed)

New cards

What's the safe default DB choice for interviews?

PostgreSQL (relational). Handles 90% of problems well, has JSON support if you need flexibility, scales further than most candidates think (Facebook ran on MySQL). Use NoSQL only with a specific reason

New cards

Why avoid the 'SQL vs NoSQL' debate in interviews?

Broad statements ('I need NoSQL for scale') are yellow flags. They overlap heavily — Postgres can store JSON and scale horizontally, NoSQL can model relationships. Instead, just say 'I'm using X because of feature Y for this specific problem'

New cards

What does ACID stand for?

Atomicity (all-or-nothing), Consistency (data stays valid), Isolation (concurrent transactions don't interfere), Durability (committed data survives crashes). Standard for relational DBs like Postgres/MySQL

New cards

What is a database transaction?

Group multiple operations into a single atomic unit — either all succeed or all fail. Example: deducting inventory AND creating an order together, so you never have an order for an item that wasn't reserved

New cards

What 3 things should drive your schema design?

1) Data volume (will it fit on one machine?), 2) Access patterns (how will it be queried? — most important factor), 3) Consistency requirements (does it need strong consistency or is eventual OK?)

New cards

What's the difference between normalization and denormalization?

Normalized = each piece of data in exactly one place (users table separate from orders table). Avoids inconsistency but requires joins. Denormalized = duplicate data for faster reads (copy username into each order). Faster reads but updates are painful

New cards

What's the safe default approach?

Start normalized. Denormalize specific hot paths only when you've identified a read performance problem. Even then, often better to put a cache in front of the normalized data instead of denormalizing the source of truth

New cards

What are foreign keys and what do they enforce?

A column that references the primary key of another table (posts.user_id → users.id). Enforces referential integrity — prevents orphaned records like a post referencing a deleted user. Some companies drop them at scale for write speed and enforce integrity in app code

New cards

What are the common relationship types?

One-to-many (1 user has many posts), many-to-many (users like many posts, posts liked by many users — usually a join table), one-to-one (rare — often means the tables should just be merged)

New cards

What's a good primary key choice?

A system-generated ID (user_id = 12345), not business data like email. Business data changes; system IDs stay stable forever

New cards

What is a database index and why use it?

A data structure that makes queries fast by avoiding full table scans. Like a book's index. Without one, finding a user by email in 10M rows scans all 10M. With an email index, jump straight to the row in milliseconds

New cards

What is a B-tree index?

The default index type for most relational DBs. Sorted tree structure that supports both exact lookups (find user X) AND range queries (orders between dates A and B). Slower than hash for exact match, but more versatile

New cards

What is a hash index and its tradeoff?

Maps keys to values via a hash function. Faster than B-tree for exact-match lookups, but can't do range queries at all. Less common as the default

New cards

What are specialized indexes for?

Full-text indexes (e.g. Elasticsearch) for keyword search in text. Geospatial indexes (e.g. PostGIS, Redis GEO) for 'find restaurants within 5km'. Use when normal indexes can't answer your query pattern

New cards

What is a composite (multi-column) index?

Index on multiple columns together, e.g. (user_id, created_at). Lets you efficiently query 'user X's recent posts'. Order matters — leftmost columns are the most useful for filtering

New cards

How do indexes affect writes?

They slow writes down because every insert/update has to update the index too. So don't index everything — only the columns you actually query on frequently

New cards

What is a cache and what's the typical speedup?

Stores frequently-accessed data in fast memory (Redis) to skip slow disk lookups. Typical: ~1ms cache hit vs 20-50ms database query. Also reduces load on the DB

New cards

What is the cache-aside pattern?

The default caching pattern. On read: check cache → if hit, return → if miss, query DB, write to cache with TTL, return. Application controls everything. Memorize this one

New cards

What is write-through caching?

Write to cache AND DB at the same time (synchronously). Reads always fresh, but writes are slower. Less common — requires special caching infrastructure

New cards

What is write-behind (write-back) caching?

Write to cache only; cache asynchronously flushes to DB later. Very fast writes, but risk of data loss if cache crashes before flushing. Use for analytics/metrics where loss is acceptable

New cards

What is read-through caching?

The cache itself fetches from DB on miss (app never talks to DB directly). CDNs work this way. Less common for app-level caching — cache-aside is preferred

New cards

What are common cache eviction policies?

LRU (Least Recently Used — default for most caches), LFU (Least Frequently Used — good for trending content), FIFO (first in, first out — rarely used), TTL (just set an expiration time, often combined with LRU)

New cards

What is a cache stampede (thundering herd)?

A popular cache entry expires and thousands of requests simultaneously try to rebuild it, hammering the DB. Fixes: request coalescing (only one request rebuilds, others wait), cache warming (refresh popular keys before they expire)

New cards

What is the hot key problem?

One specific cache entry gets disproportionately huge traffic (e.g. Taylor Swift's profile). That one Redis node gets overloaded while others sit idle. Fixes: replicate hot keys across multiple nodes, use in-process cache for super-hot keys, rate-limit abusive callers

New cards

What is cache invalidation and why is it hard?

Keeping the cache in sync with the DB. When data changes in the DB, the cached copy becomes stale. Strategies: invalidate on writes (delete the cache entry), use short TTLs and accept some staleness, or combine both

New cards

What's a common caching mistake?

Caching everything. Only cache data that's read often and changes rarely. Caching data that changes every request adds latency for no benefit. Profile first, cache the hot paths

New cards

What is a CDN?

Content Delivery Network — distributed cache servers around the world that serve content from edge locations close to users. Best for static media (images, videos, JS). Can also cache API responses and dynamic content if cacheable

New cards

What is in-process caching and when is it useful?

Caching data in the app server's own memory instead of calling Redis. Faster (no network call) but each server has its own copy — no sharing, no consistency. Good for tiny hot data like feature flags, config values, super-hot keys

New cards

What is sharding?

Splitting your data across multiple independent database servers when one machine can't keep up. Each shard holds a subset of the data. Used when you hit limits on storage, write throughput, or read throughput (even with replicas)

New cards

What's the difference between partitioning and sharding?

Partitioning = splitting data within ONE machine (e.g. one big table into smaller partitions). Sharding = splitting data ACROSS multiple machines. People use the terms loosely

New cards

What makes a good shard key?

1) High cardinality (many unique values, not a boolean), 2) Even distribution (avoid 90% in one country), 3) Aligns with query patterns (queries hit one shard, not all). Example: user_id is great for user-centric apps

New cards

What are the 3 sharding strategies?

Hash-based (default — hash the shard key, distributes evenly), Range-based (assign ranges like users 1-1M to shard 1 — good for multi-tenant, bad for time-based writes), Directory-based (lookup table — flexible but adds latency and single point of failure)

New cards

What is the celebrity/hot-shard problem in sharding?

Even with a good shard key, some keys are inherently more active (Taylor Swift's user_id gets 1000x more reads). That shard becomes the bottleneck. Fix: dedicate a shard to celebrities, use compound keys, or shard splitting

New cards

Why are cross-shard queries expensive?

They have to hit every shard, wait for all responses, then merge. 'Top 10 trending posts' might query 64 shards. Mitigations: cache the results, denormalize so related data lives on one shard, precompute via background jobs

New cards

What's the biggest sharding mistake in interviews?

Sharding too early. A tuned single Postgres + read replicas handles way more than people think (tens of TB, tens of thousands of writes/sec). Justify with numbers before proposing sharding

New cards

What problems does sharding create?

Cross-shard transactions are nearly impossible (need 2PC or sagas — both painful), hot spots, expensive resharding (moving data when adding shards), harder operational complexity

100

New cards

What problem does consistent hashing solve?

With naive hash % N: adding one server to a 10-server cluster means ~90% of keys map to a different server — you have to move 90% of data. With consistent hashing, only ~10% moves