A comprehensive set of vocabulary flashcards covering key terms and definitions from the lecture on system-design fundamentals.
System Design
The blueprint for combining databases, APIs, caches, load balancers, and other components to build software that performs, scales, and adapts reliably at large user and data volumes.
Importance of System Design for Developers
Crucial for system-design interviews; demonstrates an engineer’s ability to build robust, scalable systems beyond simply writing code.
Common System-Design Trade-offs
Speed vs. cost, consistency vs. availability, simplicity vs. flexibility (or scalability), and performance vs. durability.
System-Design Interview Soft Skills
Problem-solving, big-picture thinking, trade-off analysis, clear communication, and adaptability to changing constraints.
Functional Requirements
User-visible capabilities expressed as “Users can …” statements that describe what the system must do.
Non-Functional Requirements
Qualities the system must exhibit—performance, availability, scalability, reliability, consistency, durability, security—that support functional needs.
Structured System-Design Approach
1) Gather functional requirements, 2) identify non-functional requirements, 3) design API endpoints, 4) draft high-level architecture, 5) refine for non-functional goals, 6) deep-dive on critical components.
System Design Master Template
Reusable pattern: write path → message queue → DB & cache via async workers; read path → cache first; stateless services behind a load balancer for scalable, resilient systems.
Components (in System Design)
Reusable building blocks—microservices, databases, caches, message queues—that are combined to solve design problems.
Monolith
A single codebase/process that is easy to start with but hard to scale, and in which failures are hard to isolate.
Microservices
Many small, independent services that can deploy and scale separately, isolate faults, and map to domain boundaries.
Advantages of Microservices
Independent deployment, fault isolation, independent scaling, and smaller codebases aligned with business domains.
Key Microservice Characteristics
Service independence, API-based communication (REST/gRPC), database per service, and independent scaling.
Service Discovery
A mechanism that allows services to find each other dynamically at runtime, e.g., via a registry like Consul or Eureka.
Data Consistency Across Microservices
Maintained using two-phase commit, saga, event sourcing, or eventual consistency with asynchronous messages.
Fault Isolation (Microservices)
Containment of failures within one service so they don’t cascade and bring down the entire system.
Microservice Implementation Stacks
Spring Boot (Java), Node.js with Express, and Go with Gin.
Relational Database Use Cases
Chosen when data needs strong integrity, relationships, and ACID transactions, e.g., financial records, user profiles, inventory.
Table (Relational Database)
A structured set of rows (records) and columns (fields) representing an entity.
Primary Key
A column or set of columns whose values uniquely identify each row in a table.
Foreign Key
A column in one table that references the primary key of another, creating a relationship.
Database Relationships
One-to-one, one-to-many, and many-to-many relationships between tables.
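A minimal sketch of a primary key, a foreign key, and a one-to-many relationship, using Python's built-in sqlite3 module. The table and column names (users, orders, user_id) are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL REFERENCES users(id),"  # foreign key: one user, many orders
    " item TEXT NOT NULL)"
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'ada')")
conn.execute("INSERT INTO orders (user_id, item) VALUES (1, 'keyboard')")

def insert_orphan_order():
    """Try to insert an order whose user_id matches no row in users."""
    try:
        conn.execute("INSERT INTO orders (user_id, item) VALUES (99, 'mouse')")
        return True
    except sqlite3.IntegrityError:
        return False  # the foreign-key constraint rejected the row

orphan_accepted = insert_orphan_order()
rows = conn.execute(
    "SELECT users.name, orders.item FROM orders JOIN users ON users.id = orders.user_id"
).fetchall()
```

The join follows the foreign key back to the referenced primary key, which is how the relationship between the two tables is read at query time.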
Popular Relational Databases
PostgreSQL, MySQL, and SQLite.
NoSQL
Databases designed for unstructured data or rapidly changing schemas, large horizontal scale, or workloads with few complex relationships.
NoSQL Categories
Key-value, document, column-family, and graph databases.
Key-Value Store
Stores key–value pairs; ideal for caching, sessions, or user preferences (e.g., Redis).
Document Database
Stores semi-structured JSON/BSON documents allowing nested data; suited for content management or user profiles (e.g., MongoDB).
Column-Family Store
Stores rows with wide, flexible columns grouped into families for analytics or time-series data (e.g., Cassandra).
Graph Database
Represents data as nodes and edges; optimized for traversing relationships like social networks or fraud detection (e.g., Neo4j).
Example NoSQL Implementations
Redis (key-value), MongoDB (document), Cassandra (column-family), Neo4j (graph), DynamoDB (key-value/document).
NoSQL vs. Relational Databases
NoSQL typically offers schema flexibility, horizontal scaling, and often eventual consistency, whereas relational DBs provide fixed schemas, strong consistency, and rich joins.
Object Storage
Flat, distributed storage that keeps data as objects with metadata and unique IDs, optimized for large unstructured files.
Parts of an Object (Object Storage)
Data blob, metadata, and a globally unique identifier.
Object Storage Use Cases
Static asset hosting, backups/archives, and big-data analytics.
Major Object-Storage Services
Amazon S3, Google Cloud Storage, and Azure Blob Storage.
Caching
Technique that reduces latency and backend load by storing frequently accessed data closer to the client or server.
Cache Hit vs. Cache Miss
Hit: data found in cache; Miss: data absent, fetched from source, then cached.
Cache Eviction Policies
Least Recently Used (LRU), First-In-First-Out (FIFO), and Least Frequently Used (LFU).
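As an illustration of one of these policies, here is a small LRU sketch built on collections.OrderedDict. The class name LRUCache and its capacity parameter are assumptions made for this example, not part of the lecture.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.put("c", 3)  # over capacity, so "b" is evicted
```

FIFO would evict by insertion order alone (no move_to_end on reads), and LFU would track an access count per key instead.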
Cache Invalidation Strategies
Time-To-Live (TTL), event-based invalidation on data change, and manual refresh/invalidate calls.
Write-Through Caching
Writes go to cache and database synchronously.
Write-Behind Caching
Writes update cache first, then database asynchronously.
Write-Around Caching
Writes bypass cache, go directly to DB; cache populated on subsequent read.
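The three write strategies above can be contrasted in a short sketch. The Store class is hypothetical; plain dicts stand in for the cache and the database, and flush() stands in for the asynchronous worker a real write-behind cache would run.

```python
class Store:
    """Hypothetical cache in front of a database; both are plain dicts here."""

    def __init__(self):
        self.cache = {}
        self.db = {}
        self.pending = []  # write-behind entries not yet flushed to the database

    def write_through(self, key, value):
        self.cache[key] = value
        self.db[key] = value  # database updated before the call returns

    def write_behind(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))  # database updated later

    def flush(self):
        """Stand-in for the background worker that drains pending writes."""
        while self.pending:
            key, value = self.pending.pop(0)
            self.db[key] = value

    def write_around(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)  # drop any stale cached copy

    def read(self, key):
        if key in self.cache:
            return self.cache[key]  # cache hit
        value = self.db.get(key)    # cache miss: read from the database
        if value is not None:
            self.cache[key] = value
        return value

s = Store()
s.write_through("x", 1)
s.write_behind("y", 2)
y_in_db_before_flush = "y" in s.db
s.flush()
s.write_around("z", 3)
z_cached_before_read = "z" in s.cache
z_value = s.read("z")
```

The trade-off shows up in the intermediate states: write-behind leaves the database briefly behind the cache, while write-around leaves the cache empty until the first read.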
In-Memory vs. Disk-Based Cache
In-memory (e.g., Redis) offers sub-millisecond access but loses data on restart unless persistence is configured; disk-based (e.g., an NGINX proxy cache) survives restarts but is slower.
Client-Side vs. Server-Side Cache
Client-side resides on user device/browser; server-side sits near the backend shared by all users.
Common In-Memory Cache Tools
Redis and Memcached.
CDN (Content Delivery Network)
A distributed network of edge servers that cache and deliver static (and some dynamic) content from locations closer to users.
Latency Reduction via CDN
Routes requests to the nearest edge node; serves cached copies to avoid long round-trips to origin.
CDN Request Flow
Cache hit: edge node returns asset; Cache miss: edge fetches from origin, caches it, then serves to user.
TTL (in CDN Caching)
Time To Live—the duration an asset remains cached before revalidation with the origin.
CDN Providers
Cloudflare, Amazon CloudFront, and Akamai.
Message Queue
A system that decouples producers and consumers, smooths traffic spikes, prevents overload, and retains messages while a consumer is down so they can be delivered once it recovers.
Producer (Message Queue)
A service that sends messages or tasks to the queue.
Consumer (Message Queue)
A service that pulls messages from the queue and processes them.
FIFO Ordering (Queues)
Guarantees that messages are delivered and processed in the order they were produced; many systems scope this guarantee to a partition or message group rather than the whole queue.
Acknowledgment (Message Queue)
Signal sent by a consumer to confirm successful processing so the queue can delete the message.
Dead Letter Queue
A secondary queue that stores messages that repeatedly fail processing for later inspection.
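Acknowledgment, redelivery, and the dead letter queue can be sketched together in one in-process loop. The drain function, the max_attempts limit of 3, and the sample messages are all assumptions for illustration; real brokers expose these ideas through their own APIs.

```python
from collections import deque

def drain(queue, handler, max_attempts=3):
    """Process messages; acknowledge successes, dead-letter repeated failures."""
    processed = []
    dead_letters = []
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
            processed.append(message)  # acknowledged: the message is not re-queued
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append(message)       # park it for later inspection
            else:
                queue.append((message, attempts))  # redeliver later
    return processed, dead_letters

def handler(message):
    if message == "bad":
        raise ValueError("cannot process")

queue = deque([("ok-1", 0), ("bad", 0), ("ok-2", 0)])
processed, dead_letters = drain(queue, handler)
```

The failing message is retried behind the healthy ones, so it does not block them, and it leaves the main queue after the third failed attempt.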
Point-to-Point Messaging
Each message is consumed by a single receiver.
Publish–Subscribe (Pub/Sub)
Messaging pattern where messages are broadcast to multiple subscribers.
Message Queue Tools
RabbitMQ, Apache Kafka, and AWS SQS.
Scalability Benefits of Queues
Buffers traffic spikes and enables asynchronous, parallel processing.
E-commerce Queue Use Case
Order service emits order event; inventory, payment, and shipping services consume concurrently via Pub/Sub.
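A rough in-process sketch of this use case follows. The Broker class and the topic name order.created are hypothetical, and the subscribers here run one after another in a single process, whereas real consumers would be separate services running concurrently.

```python
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Every subscriber of the topic receives its own copy of the event.
        for callback in self._subscribers[topic]:
            callback(event)

log = []
broker = Broker()
broker.subscribe("order.created", lambda e: log.append(("inventory", e["order_id"])))
broker.subscribe("order.created", lambda e: log.append(("payment", e["order_id"])))
broker.subscribe("order.created", lambda e: log.append(("shipping", e["order_id"])))
broker.publish("order.created", {"order_id": 42})
```

The order service only knows the topic name, so a fourth consumer can be added without changing the producer.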
API Gateway
A single entry point that routes, secures, and aggregates client API requests for multiple backend services.
API Gateway Cross-Cutting Concerns
Authentication/authorization, rate limiting, request/response logging, and caching.
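Rate limiting is one of these concerns that is easy to show in code. The sketch below uses a token bucket, which is one common algorithm and not necessarily the one the lecture had in mind; the class name, parameters, and injectable clock are assumptions made so the example is deterministic.

```python
import time

class TokenBucket:
    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated_at = clock()

    def allow(self):
        """Return True if the request may proceed, consuming one token."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

now = [0.0]  # fake clock so the example does not depend on wall time
bucket = TokenBucket(rate_per_second=1, capacity=2, clock=lambda: now[0])
burst = [bucket.allow() for _ in range(3)]
now[0] = 1.0
after_refill = bucket.allow()
```

A gateway would typically keep one bucket per client or API key and reject requests (for example with HTTP 429) when allow() returns False.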
Security via API Gateway
Hides internal services, reduces exposed endpoints, and centralizes authentication and throttling.
Response Aggregation (Gateway)
Gateway fetches data from multiple services and combines it into a single payload before returning it to the client.
API Gateway Implementations
AWS API Gateway, Kong Gateway, and NGINX API Gateway.
Readable API Paths
Intuitive endpoint names (e.g., /tweet) that improve clarity for developers and interviewers.
One-Endpoint-Per-Requirement Rule
Mapping one endpoint to each functional requirement keeps APIs simple and aligned with user actions.
Twitter Functional Requirements (Example)
Post tweets, view tweets, view feed, follow users, like tweets, comment on tweets.
Tweet Post Endpoint Example
POST /tweet with userid & content; returns tweetid and status.
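A framework-free sketch of what a handler for this endpoint might look like. The function name, the in-memory storage, the 201/400 status codes, and the error message field are assumptions; the card only specifies the inputs (userid, content) and outputs (tweetid, status).

```python
import itertools

_tweet_ids = itertools.count(1)  # hypothetical ID generator
_tweets = {}                     # stand-in for the tweet store

def handle_post_tweet(body):
    """Handle POST /tweet; `body` is the already-parsed JSON request body."""
    user_id = body.get("userid")
    content = body.get("content", "")
    if not user_id or not content.strip():
        return 400, {"status": "error", "message": "userid and content are required"}
    tweet_id = next(_tweet_ids)
    _tweets[tweet_id] = {"userid": user_id, "content": content}
    return 201, {"tweetid": tweet_id, "status": "created"}

ok_status, ok_body = handle_post_tweet({"userid": "u1", "content": "hello"})
bad_status, bad_body = handle_post_tweet({"userid": "u1"})
```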
Horizontal Scaling
Adding more instances of a service or database to distribute load rather than upgrading a single machine.
Load Balancer Role
Distributes incoming traffic across instances, performs health checks, and ensures high availability.
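One simple way to picture this role is round-robin selection that skips unhealthy instances. The class and instance names below are hypothetical, and real load balancers run health checks themselves rather than being told via mark_down.

```python
import itertools

class RoundRobinBalancer:
    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._cycle = itertools.cycle(self.instances)

    def mark_down(self, instance):
        self.healthy.discard(instance)  # e.g. after a failed health check

    def mark_up(self, instance):
        if instance in self.instances:
            self.healthy.add(instance)

    def next_instance(self):
        # Try each instance at most once per call, skipping unhealthy ones.
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances")

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")
picks = [lb.next_instance() for _ in range(4)]
```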
Cache in Feed Generation
Precomputes and stores recent tweets so the feed service fetches from memory instead of querying the database every time.
Message Queue for Tweet Ingestion
Decouples the write path, persists tweets if a service is down, smooths spikes, and ensures eventual delivery to feed and DB.
Distributed Database Durability
Replicates data across nodes/regions so it persists even if one server fails; supports automated backups.
Fan-out-on-Write
Pushes a tweet to followers’ feeds at write time.
Fan-out-on-Read
Assembles the feed at read time by pulling tweets from followed accounts on demand; typically used for large-follower accounts to avoid write storms.
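The two fan-out strategies are often combined, and a hybrid can be sketched as follows. The follower graph, the FAN_OUT_ON_READ set, and the function names are all hypothetical, and the sketch ignores ordering by time for brevity.

```python
from collections import defaultdict

followers = {"alice": ["bob", "carol"], "celebrity": ["bob"]}    # author -> followers
following = {"bob": ["alice", "celebrity"], "carol": ["alice"]}  # user -> authors followed
FAN_OUT_ON_READ = {"celebrity"}  # hypothetical set of large-follower accounts

feeds = defaultdict(list)             # precomputed feeds (fan-out-on-write)
tweets_by_author = defaultdict(list)  # source of truth for every tweet

def post_tweet(author, text):
    tweets_by_author[author].append(text)
    if author not in FAN_OUT_ON_READ:
        for follower in followers.get(author, []):
            feeds[follower].append((author, text))  # push at write time

def read_feed(user):
    feed = list(feeds[user])
    for author in following.get(user, []):
        if author in FAN_OUT_ON_READ:
            # Pull at read time instead of writing to every follower's feed.
            feed.extend((author, text) for text in tweets_by_author[author])
    return feed

post_tweet("alice", "hello")
post_tweet("celebrity", "big news")
bob_feed = read_feed("bob")
carol_feed = read_feed("carol")
```

Posting as "celebrity" touches no follower feeds, which is the write storm being avoided; the cost moves to each follower's read.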
Computing Twitter Trends
Uses local sliding-window aggregations per region, then global aggregation; results are cached with TTL.
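A small sketch of the regional-then-global aggregation, assuming hashtags are the unit being counted and a 60-second window. The class, function, and region names are illustrative only, and the TTL caching of results is left out.

```python
from collections import Counter, deque

class SlidingWindowCounter:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self._events = deque()   # (timestamp, hashtag), oldest first
        self._counts = Counter()

    def add(self, timestamp, hashtag):
        self._events.append((timestamp, hashtag))
        self._counts[hashtag] += 1

    def counts(self, now):
        # Expire events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window:
            _, old = self._events.popleft()
            self._counts[old] -= 1
        return +self._counts  # unary plus drops zero and negative counts

def global_top(regional_counters, now, k):
    total = Counter()
    for counter in regional_counters:
        total.update(counter.counts(now))  # sum the per-region counts
    return total.most_common(k)

us = SlidingWindowCounter(60)
eu = SlidingWindowCounter(60)
us.add(0, "#old")
us.add(50, "#ai")
us.add(55, "#ai")
eu.add(52, "#ai")
eu.add(58, "#go")
top = global_top([us, eu], now=70, k=2)
```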
Scalable Tweet Search Architecture
Tweets sent to indexing service via queue, indexed into sharded search engine (Elasticsearch); queries ranked and cached.
Sharding
Partitioning data horizontally across multiple machines based on a key to scale beyond one node.
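A minimal sketch of hash-based sharding, assuming four shards and string keys. Dicts stand in for separate machines, and shard_for, put, and get are hypothetical names.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

shards = [dict() for _ in range(4)]  # each dict stands in for one machine

def put(key, value):
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    return shards[shard_for(key, len(shards))].get(key)

for user_id in ("user-1", "user-2", "user-3"):
    put(user_id, {"id": user_id})
```

A drawback of plain modulo placement is that changing the number of shards remaps most keys, which is the problem consistent hashing addresses.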
Load Balancing for Concurrent Users
Spreads user requests across replicated servers, keeping each within its QPS limits.
Sharding for Data Volume
Distributes large data sets so no single machine must store or process everything.
Asynchronous Writes via Queue
Makes writes fast and responsive by returning immediately after enqueueing; backend processes in background.
Eventual Consistency
Guarantee that, once writes stop, all replicas converge to the same value over time; stale reads are tolerated in the meantime.
Decomposition
Splitting a monolithic application into microservices organized around business capabilities.
Vertical Scaling
Increasing resources (CPU, RAM) of a single machine to handle more load.
Consistent Hashing
Technique mapping keys to a ring of nodes so node additions/removals cause minimal data movement.
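The ring can be sketched with a sorted list and binary search. The HashRing class, the use of 100 virtual nodes per physical node, and the node names are assumptions for illustration; the lookup rebuilds a position list on each call for clarity, which a production implementation would avoid.

```python
import bisect
import hashlib

def _hash(value):
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node, to even out load
        self._ring = []       # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        # First virtual node clockwise from the key's position, wrapping around.
        positions = [pos for pos, _ in self._ring]
        index = bisect.bisect_right(positions, _hash(key)) % len(self._ring)
        return self._ring[index][1]

ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in moved now belongs to node-d; keys that stayed put were never touched, which is the minimal-movement property the card describes.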
Caching for Scalability
Reduces repeated database reads, lowers latency, and saves compute cost.
Message Queue Buffering
Queue absorbs traffic bursts and processes at a rate the database can handle, preventing overload.
Read/Write Separation (Leader-Replica)
Leader handles writes; replicas serve reads, improving read throughput. With asynchronous replication, replicas can lag, so reads from them may be briefly stale.
CQRS
Command Query Responsibility Segregation—separates models/stores for writes (commands) and reads (queries).
Master Template Architecture Elements
Stateless write service → message queue → async workers update DB & cache; read service → cache; DB as source of truth.
Read Path (Master Template)
Client → load balancer → read service → cache → (if miss) DB via cache updater.
Write Path (Master Template)
Client → load balancer → write service → enqueue message → workers update DB and cache.
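The read and write paths of the template can be traced in a single-process sketch. The load balancer is omitted, a deque stands in for the message queue, dicts stand in for the DB and cache, and all function names are hypothetical.

```python
from collections import deque

queue = deque()  # stand-in for the message queue
db = {}          # source of truth
cache = {}       # serves the read path

def write_service(key, value):
    queue.append((key, value))  # enqueue and return immediately
    return "accepted"

def run_worker():
    """Stand-in for the async workers that drain the queue."""
    while queue:
        key, value = queue.popleft()
        db[key] = value
        cache[key] = value  # keep the read path warm

def read_service(key):
    if key in cache:
        return cache[key]   # cache hit
    value = db.get(key)     # cache miss: fall back to the database
    if value is not None:
        cache[key] = value
    return value

status = write_service("tweet:1", "hello")
before_worker = read_service("tweet:1")
run_worker()
after_worker = read_service("tweet:1")
```

The read that happens before the worker runs returns nothing, which is the eventual-consistency window this design accepts in exchange for fast writes.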
Stateless Services
Services that hold no session data, enabling simple horizontal scaling and failover.
ACID Properties
Atomicity, Consistency, Isolation, Durability—transaction guarantees in relational databases.
TTL (Time To Live)
Expiration timer after which cached or stored data is considered stale and eligible for removal.
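A minimal TTL cache sketch, assuming lazy expiry (entries are checked and removed when read). The class name and the injectable clock are assumptions; the fake clock makes the example deterministic.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}  # key -> (value, expires_at)

    def put(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]  # stale: remove and report a miss
            return None
        return value

now = [0.0]  # fake clock controlled by the example
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
cache.put("trends", ["#python"])
fresh = cache.get("trends")
now[0] = 31.0
expired = cache.get("trends")
```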
Versioning (Object Storage)
Storing multiple versions of an object to recover from deletions or overwrite mistakes.
Lifecycle Policies (Object Storage)
Automated rules that transition objects to cheaper tiers or delete them after certain age/access patterns.
Inverted Index
Data structure mapping terms to lists of documents or tweet IDs containing them, enabling full-text search.
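A toy inverted index over a few sample tweets, assuming whitespace tokenization and lowercase matching. The function names and sample data are illustrative; real search engines add stemming, ranking, and compressed posting lists.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # intersect posting sets
    return result

tweets = {1: "System design basics", 2: "Design a cache", 3: "Cache eviction basics"}
index = build_index(tweets)
design_hits = search(index, "design")
cache_basics_hits = search(index, "cache basics")
```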