Flashcards covering key concepts from the ByteByteGo system design notes (Question and Answer format).
What is the difference between Big Endian and Little Endian byte ordering?
Big Endian stores the most significant byte at the lowest memory address; Little Endian stores the least significant byte first. Endianness matters when data moves between systems with different architectures; Big Endian is common in network protocols and some older architectures, while Little Endian is used by Intel x86.
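As a small illustration of the two byte orders, the sketch below packs the same 32-bit integer both ways using Python's standard `struct` module. The value `0x12345678` is an arbitrary example, not something taken from the notes.

```python
import struct
import sys

value = 0x12345678  # arbitrary example value

big = struct.pack(">I", value)     # big endian: most significant byte first
little = struct.pack("<I", value)  # little endian: least significant byte first

print(big.hex())      # 12345678
print(little.hex())   # 78563412
print(sys.byteorder)  # byte order of the machine running this script

# Reading bytes with the wrong byte order silently yields a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))   # 0x78563412
```

This is why network protocols fix a byte order in advance: both sides must agree on it before exchanging multi-byte values.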
What is Event Sourcing and what serves as the source of truth in this paradigm?
Event Sourcing persists an append-only sequence of events in an event store, and that event store (rather than the current state) is the source of truth; current state is derived by replaying the events. Examples include The New York Times article/event history, Change Data Capture (CDC) transforming table changes into events, and microservice event transmission (e.g., cart events) tracking state changes.
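A minimal sketch of the idea, assuming a shopping cart whose state is rebuilt from its events. The `CartEvent` type, the event names, and the `replay` function are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CartEvent:
    kind: str          # "item_added" or "item_removed" (hypothetical names)
    item: str
    quantity: int = 1


def replay(events):
    """Rebuild the cart state by folding over the event log in order."""
    cart = {}
    for event in events:
        if event.kind == "item_added":
            cart[event.item] = cart.get(event.item, 0) + event.quantity
        elif event.kind == "item_removed":
            remaining = cart.get(event.item, 0) - event.quantity
            if remaining > 0:
                cart[event.item] = remaining
            else:
                cart.pop(event.item, None)
    return cart


# The event store is just an ordered list here; events are only appended.
event_store = [
    CartEvent("item_added", "book", 2),
    CartEvent("item_added", "pen", 1),
    CartEvent("item_removed", "book", 1),
]

current_cart = replay(event_store)
print(current_cart)  # {'book': 1, 'pen': 1}
```

Replaying a prefix of the log, for example `replay(event_store[:2])`, gives the state as it was at that point in time.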
What is the Thunder Herd (thundering herd) problem in caching, and how can it be mitigated?
Thunder herd occurs when many cache keys expire at the same time, so the resulting misses flood the database with queries. Mitigations include adding a random offset to expiry times so keys do not expire together, and letting only core data hit the database while requests for non-core data are held back until the cache is restored.
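The randomized-expiry mitigation can be sketched as follows. The base TTL and jitter window are hypothetical values, and `ttl_with_jitter` is an illustrative helper rather than any cache library's API.

```python
import random

BASE_TTL_SECONDS = 600  # hypothetical base expiry
JITTER_SECONDS = 60     # hypothetical maximum random offset


def ttl_with_jitter(base=BASE_TTL_SECONDS, jitter=JITTER_SECONDS):
    """Return the base TTL plus a random offset so keys expire at different times."""
    return base + random.uniform(0, jitter)


# Keys written at the same moment now expire spread over a one-minute window.
ttls = [ttl_with_jitter() for _ in range(5)]
print(ttls)
```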
What is cache penetration and how can you prevent it?
Cache penetration happens when a requested key exists in neither the cache nor the database, so every request for it misses the cache and falls through to the database. Prevention strategies include caching a null value for non-existent keys and using a Bloom filter to check key existence before querying the database.
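Both prevention strategies can be combined in one lookup path. The sketch below uses a deliberately simple, hand-rolled Bloom filter and in-memory dicts standing in for the cache and database; all names (`BloomFilter`, `lookup`, `MISSING`, the sample keys) are hypothetical.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


database = {"user:1": "Ada", "user:2": "Lin"}  # stand-in for a real database
known_keys = BloomFilter()
for existing_key in database:
    known_keys.add(existing_key)

cache = {}
stats = {"db_queries": 0}
MISSING = object()  # marker cached for keys the database does not have


def lookup(key):
    if not known_keys.might_contain(key):
        return None                    # definitely absent: skip cache and database
    if key in cache:
        value = cache[key]
        return None if value is MISSING else value
    stats["db_queries"] += 1
    value = database.get(key, MISSING)
    cache[key] = value                 # cache the null marker as well
    return None if value is MISSING else value


print(lookup("user:1"))    # Ada (one database query)
print(lookup("user:1"))    # Ada (served from the cache)
print(lookup("user:999"))  # None
print(stats["db_queries"])
```

Even if the Bloom filter returns a false positive for a missing key, the cached null marker ensures the database is queried for it at most once.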
What is cache breakdown and how can you mitigate it?
Cache breakdown happens when a hot key expires and many requests hit the database. Mitigations include avoiding expiration for hot keys, using techniques like pre-warming or soft-expiration, and coordinating rebuilds to prevent a burst of DB load.
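One way to coordinate rebuilds is to let a single caller reload the hot key while the others wait, then re-check the cache. This is a single-process sketch using a lock; `load_from_database` and `get_hot_key` are hypothetical stand-ins, and a distributed deployment would need a distributed lock instead.

```python
import threading

cache = {}
rebuild_lock = threading.Lock()
db_loads = []  # records every simulated database hit


def load_from_database(key):
    db_loads.append(key)  # stand-in for an expensive query
    return f"value-for-{key}"


def get_hot_key(key):
    value = cache.get(key)
    if value is not None:
        return value
    with rebuild_lock:          # only one thread rebuilds at a time
        value = cache.get(key)  # re-check: another thread may have rebuilt it
        if value is None:
            value = load_from_database(key)
            cache[key] = value
        return value


# Twenty concurrent requests for an expired hot key cause a single database load.
threads = [threading.Thread(target=get_hot_key, args=("hot",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(db_loads))  # 1
```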
What is a cache crash and how can you handle it?
Cache crash occurs when the cache is down and all requests hit the database. Solutions include circuit breakers and clustering the cache to improve availability and resilience.
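A circuit breaker can be sketched as a small wrapper around the database call: after repeated failures it "opens" and returns a fallback immediately instead of sending more load to the database. The class, thresholds, and `query_database` function below are hypothetical, and production breakers usually track more state than this.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_seconds:
                return fallback()  # open: fail fast without calling func
            self.opened_at = None  # timeout elapsed: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0          # success closes the circuit again
        return result


def query_database():
    raise ConnectionError("database overloaded")  # simulated failure


breaker = CircuitBreaker(failure_threshold=2)
results = [breaker.call(query_database, lambda: "degraded response")
           for _ in range(4)]
print(results)
print(breaker.opened_at is not None)  # True: the circuit is open
```

Only the first two calls actually reach `query_database`; once the threshold is hit, later calls return the fallback without touching the database.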
What is the Linux Filesystem Hierarchy Standard (FHS) and why is it important?
FHS standardizes the Linux filesystem layout to provide a consistent directory structure across distributions. Not all distros follow it strictly, but understanding it helps navigate the system. Practice with commands like cd and ls from the root / and view the standard tree structure.
Name some recommended materials for cracking technical interviews as mentioned in the notes.
LeetCode, Cracking the Coding Interview, NeetCode, System Design Interview books (Alex Xu & Sahn Lam), Grokking the System Design Interview, and Designing Data-Intensive Applications.
What are the four Git storage locations on a machine, and which one does git tag operate on?
Working directory, Staging area, Local repository, Remote repository. The git tag command operates on the local repository (tags live there; you push them to remotes if needed).
List the top use cases for UDP mentioned in the notes.
Live video streaming, DNS queries, Market data multicast, and IoT device communications.
What channels are included in a typical push notification system diagram?
In-app notifications, Email notifications, SMS/OTP notifications, and Social media pushes.
What is a key difference between REST APIs and GraphQL?
REST uses multiple endpoints with standard HTTP methods; GraphQL provides a single endpoint where clients request exactly the data they need, often aggregating data from multiple sources. GraphQL supports mutations and subscriptions but can complicate caching.
What are the five phases of a data pipeline as described in the notes?
Collect, Ingest, Store, Compute, Consume.
How do API and SDK differ?
API defines rules and protocols for how software components interact (endpoints, requests, responses). SDK is a packaged set of tools, libraries, sample code, and documentation to build applications for a platform.
Name the essential monitoring aspects covered in the cloud monitoring cheat sheet.
Data Collection, Data Storage, Data Analysis, Alerting, Visualization, Reporting and Compliance, and Automation.
What are the four GraphQL adoption patterns described?
Client-based GraphQL, GraphQL with Backend-for-Frontend (BFFs), Monolithic GraphQL, and GraphQL Federation.
What are the four main stages of Netflix API architecture evolution mentioned in the notes?
Monolith, Direct Access, Gateway Aggregation Layer, Federated Gateway (GraphQL Federation).
What does HTTP status code 401 represent and how does it relate to authentication vs authorization?
401 Unauthorized indicates the client must authenticate or that authentication failed. It is related to authentication; authorization is a separate concern about access rights.
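The split between the two concerns can be sketched as a small decision function. It assumes the conventional pairing of 401 with failed authentication and 403 Forbidden with failed authorization; the `status_for` helper and the user dictionary shape are hypothetical.

```python
def status_for(request_user, required_role):
    """Return an HTTP status code for a protected resource.

    request_user is None when no valid credentials were presented.
    """
    if request_user is None:
        return 401  # authentication missing or failed
    if required_role not in request_user["roles"]:
        return 403  # authenticated, but not authorized for this resource
    return 200


print(status_for(None, "admin"))                   # 401
print(status_for({"roles": ["viewer"]}, "admin"))  # 403
print(status_for({"roles": ["admin"]}, "admin"))   # 200
```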
What is XSS and how do Reflective and Stored XSS differ?
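XSS is the injection of malicious scripts into web pages viewed by other users. Reflective (reflected) XSS is not stored: the script arrives in a request, typically via a crafted link, and is echoed back in the response, so it runs when the victim follows the link. Stored XSS persists on the server and affects every user who views the infected content until it is cleaned. Mitigations include input validation, output encoding, and Content Security Policy (CSP).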
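Output encoding, one of the mitigations above, can be illustrated with Python's standard `html.escape`. The `render_comment` function and the payload are hypothetical examples; real applications typically rely on a templating engine that escapes by default.

```python
import html


def render_comment(comment):
    """Encode user-supplied text before placing it into an HTML page."""
    return f"<p>{html.escape(comment)}</p>"


payload = '<script>alert("xss")</script>'
print(render_comment(payload))
# <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

The browser displays the encoded payload as text instead of executing it as a script.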
What is Semantic Versioning (SemVer) and what do MAJOR, MINOR, and PATCH represent?
SemVer uses the format MAJOR.MINOR.PATCH. MAJOR is incremented for incompatible API changes; MINOR for backward-compatible new features; PATCH for backward-compatible bug fixes.
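A minimal sketch of how the three numbers change, assuming plain `MAJOR.MINOR.PATCH` strings with no pre-release or build metadata. The `parse` and `bump` helpers are hypothetical.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version, change):
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = parse(version)
    if change == "major":
        return f"{major + 1}.0.0"            # incompatible API change
    if change == "minor":
        return f"{major}.{minor + 1}.0"      # backward-compatible feature
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"  # backward-compatible bug fix
    raise ValueError(f"unknown change type: {change}")


print(bump("1.4.2", "patch"))  # 1.4.3
print(bump("1.4.2", "minor"))  # 1.5.0
print(bump("1.4.2", "major"))  # 2.0.0

# Comparing parsed tuples orders versions numerically, not alphabetically.
print(parse("1.10.0") > parse("1.9.3"))  # True
```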
What are the main steps when an SQL statement is executed (parse, transform, optimize, execute)?
Parse and validate the statement; transform into an internal representation (relational algebra); optimize and create an execution plan using indexes; execute the plan and return results.
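The planning step can be observed with SQLite's `EXPLAIN QUERY PLAN`, shown here through Python's standard `sqlite3` module. SQLite is only an example engine (the notes do not name one), the table and index are hypothetical, and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

query = "SELECT id FROM users WHERE email = ?"

# Ask the engine for its execution plan instead of running the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("a@example.com",)).fetchall()
for row in plan:
    print(row[-1])  # human-readable description of one plan step

# Execute the same statement and return results.
rows = conn.execute(query, ("a@example.com",)).fetchall()
print(rows)  # [(1,)]
```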
What are the three file permission types and the three ownership types in Linux?
Permissions: Read (r), Write (w), Execute (x). Ownership: Owner, Group, Other.
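The nine permission bits (three types for each of the three ownership classes) are often written as an octal mode. The sketch below decodes one with Python's standard `stat` module; the mode `0o754` is an arbitrary example.

```python
import stat

mode = 0o754  # owner: rwx, group: r-x, other: r--

# Render the mode the way `ls -l` would for a regular file.
print(stat.filemode(stat.S_IFREG | mode))  # -rwxr-xr--

owner_can_write = bool(mode & stat.S_IWUSR)
group_can_write = bool(mode & stat.S_IWGRP)
other_can_read = bool(mode & stat.S_IROTH)

print(owner_can_write, group_can_write, other_can_read)  # True False True
```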
What two design choices largely contribute to Kafka's high performance?
Sequential I/O and zero-copy data transfer (minimal data copies between application, kernel, and network).
What is a common way to avoid Kafka message loss on the producer/consumer side?
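Producer side: configure acks and retries so that sends are acknowledged and retried on failure. Consumer side: commit offsets only after messages are processed, using asynchronous commits for throughput and a synchronous commit for the final offset. Also ensure proper replication configuration for durability.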
What is a VPN and what are its four steps in your own words?
A VPN creates a secure, encrypted tunnel over an untrusted network. Steps: establish a secure tunnel between client and VPN server; encrypt transmitted data; mask the client’s IP address; route internet traffic through the VPN server.
Name at least two of the three foundational Kubernetes design patterns mentioned.
Foundational patterns include Health Probe Pattern, Predictable Demands Pattern, and Automated Placement Pattern (others include Init Container and Sidecar patterns).
What are the common HTTP verbs and their typical semantics (GET, POST, PUT, DELETE)?
GET: retrieve a resource (idempotent); POST: create a new resource (not idempotent); PUT: update or create a resource (idempotent); DELETE: remove a resource (idempotent).
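Idempotency means that repeating a request leaves the server in the same state as sending it once. The sketch below models the four verbs against an in-memory dict; the `store` and handler functions are hypothetical and no HTTP is involved.

```python
import itertools

store = {}
new_ids = itertools.count(1)


def get(resource_id):
    return store.get(resource_id)  # reads never change state


def post(body):
    resource_id = next(new_ids)    # each call creates a new resource
    store[resource_id] = body
    return resource_id


def put(resource_id, body):
    store[resource_id] = body      # same input always yields the same state


def delete(resource_id):
    store.pop(resource_id, None)   # deleting twice equals deleting once


put("doc", {"title": "notes"})
put("doc", {"title": "notes"})     # repeated PUT: state unchanged
print(len(store))  # 1

post({"title": "draft"})
post({"title": "draft"})           # repeated POST: two separate resources
print(len(store))  # 3

delete("doc")
delete("doc")                      # repeated DELETE: no further effect
print(len(store))  # 2
```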
What are the common cache eviction strategies listed (at least three)?
LRU (Least Recently Used), MRU (Most Recently Used), LFU (Least Frequently Used), FIFO (First In First Out), TTL (Time-to-Live), Two-Tiered Caching, Random Replacement (RR).
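As one example from this list, LRU can be sketched with `collections.OrderedDict`, which keeps keys in insertion order and lets an entry be moved to the end when it is used. The `LRUCache` class is a hypothetical minimal version, not a production cache.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest (least recently used) entry first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")      # "a" becomes the most recently used key
lru.put("c", 3)   # capacity exceeded: evicts "b"
print(list(lru.items))  # ['a', 'c']
```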
What is sharding and what are three sharding strategies?
Sharding is partitioning data into smaller pieces (shards) across multiple servers. Strategies include Range-based, Hash-based (Key/Hash-based), and Directory-based sharding.
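The three strategies differ only in how a key is mapped to a shard. The sketch below shows one routing function per strategy; the shard count, range boundaries, and tenant directory are hypothetical values.

```python
import bisect
import hashlib

NUM_SHARDS = 4


def hash_shard(key, num_shards=NUM_SHARDS):
    """Hash-based: a stable hash of the key, modulo the shard count."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Range-based: shard 0 holds ids below 1000, shard 1 holds 1000-1999,
# shard 2 holds 2000-2999, and shard 3 holds everything from 3000 up.
RANGE_BOUNDS = [1000, 2000, 3000]


def range_shard(user_id):
    return bisect.bisect_right(RANGE_BOUNDS, user_id)


# Directory-based: an explicit lookup table maps each key to its shard.
DIRECTORY = {"tenant-a": 0, "tenant-b": 2}


def directory_shard(tenant):
    return DIRECTORY[tenant]


print(hash_shard("user:42"))
print(range_shard(999), range_shard(1000), range_shard(5000))  # 0 1 3
print(directory_shard("tenant-b"))  # 2
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()`, which is randomized per process for strings and would route the same key differently across restarts.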
What are the six common system design tradeoffs listed in the notes?
The tradeoffs named explicitly are Cost vs. Performance, Reliability vs. Scalability, Performance vs. Availability, Security vs. Flexibility, and Development Speed vs. Quality; others implied by the framework include REST vs. GraphQL, Batch vs. Stream, and Strong vs. Eventual Consistency.