What is the primary purpose of a Content Delivery Network (CDN)?
A. Encrypt user data during transmission
B. Deliver content to users from the closest server to reduce latency
C. Host dynamic databases
D. Replace the need for an origin server
Correct Answer: B. Deliver content to users from the closest server to reduce latency
Explanation:
A CDN works by caching content on distributed servers (called edge locations) that are geographically closer to users. This reduces the time it takes for content like images, videos, or HTML files to load, which improves user experience and reduces server load.
Which of the following is a correct use case for a CDN?
A. Storing user passwords securely
B. Hosting server-side business logic
C. Caching static assets like images and JavaScript files
D. Running complex database queries
Correct Answer: C. Caching static assets like images and JavaScript files
Explanation:
CDNs are most commonly used to cache and serve static content such as CSS, JS, images, and video. This allows faster delivery since the data doesn’t have to be fetched from the main server each time.
How can CDNs improve API performance?
A. By dynamically generating API responses at the edge
B. By caching frequently accessed API responses
C. By replacing the API server completely
D. By encrypting all API traffic
Correct Answer: B. By caching frequently accessed API responses
Explanation:
Even though APIs often serve dynamic content, many responses (e.g., a blog post, a leaderboard) don’t change frequently. CDNs can cache these responses, reducing the number of requests that hit the origin server and improving response times.
Which of the following best describes how eviction works in a CDN?
A. Data is removed randomly every 24 hours
B. Cached content stays forever unless manually deleted
C. Cached content is removed based on policies like TTL or invalidation rules
D. All content is cleared once a new deployment is made
Correct Answer: C. Cached content is removed based on policies like TTL or invalidation rules
Explanation:
Like regular caches, CDNs use eviction policies to manage what stays in cache. TTL (time-to-live) defines how long content is kept before being refreshed, and cache invalidation allows manual or automated removal when content changes.
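As a rough illustration of TTL expiry and explicit invalidation, here is a minimal single-process sketch. The class name TTLCache and the injectable clock are assumptions made for this example; a real CDN applies the same idea per edge location, usually driven by cache headers and purge APIs.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Toy cache: entries expire after ttl_seconds or on explicit invalidation."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._store: Dict[str, Tuple[Any, float]] = {}

    def put(self, key: str, value: Any) -> None:
        # Remember the value together with the moment it stops being fresh.
        self._store[key] = (value, self.clock() + self.ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def invalidate(self, key: str) -> None:
        # Manual removal, e.g. after the origin content changed.
        self._store.pop(key, None)
```

On a miss (None), the caller would fetch from the origin and call put again.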
Why is using a CDN important for global applications like Instagram?
A. It encrypts passwords and cookies
B. It provides faster database indexing
C. It helps deliver user-generated content (like profile pictures) quickly worldwide
D. It processes business logic closer to the user
Correct Answer: C. It helps deliver user-generated content (like profile pictures) quickly worldwide
Explanation:
For platforms like Instagram, user media (profile pics, videos, thumbnails) can be cached and served from edge servers near the user, minimizing latency and improving load times, especially for users far from the origin server.
What is the primary benefit of using a distributed cache?
A. Encrypting user data
B. Reducing data size
C. Lowering latency and reducing database load
D. Increasing storage capacity
Correct Answer: C. Lowering latency and reducing database load
Explanation:
Distributed caches store frequently used or expensive-to-compute data in memory across multiple servers. This reduces the number of queries to the database and speeds up data retrieval, lowering latency.
Which of the following scenarios is the best fit for a distributed cache?
A. Storing passwords
B. Performing backups
C. Caching results of expensive queries
D. Running cron jobs
Correct Answer: C. Caching results of expensive queries
Explanation:
If a query (like fetching a social media feed) is slow but doesn't change often, caching its result reduces load and speeds up responses.
Which of these is not a common eviction policy for distributed caches?
A. Least Recently Used (LRU)
B. Least Frequently Used (LFU)
C. Most Recently Updated (MRU)
D. First In, First Out (FIFO)
Correct Answer: C. Most Recently Updated (MRU)
Explanation:
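“Most Recently Updated” is not a standard eviction policy. It should not be confused with Most Recently Used, the policy the abbreviation MRU normally refers to, which does exist but is far less common than LRU. Common policies include LRU, LFU, and FIFO, which choose what to evict based on access patterns or insertion order.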
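To make one of these policies concrete, here is a small LRU sketch built on the standard library's OrderedDict. The class name LRUCache is an assumption for illustration; production caches implement eviction internally and often only approximate it.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```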
Which of the following best describes a write-through cache?
A. Data is written to cache first, and the database is updated later
B. Data is only written to the cache
C. Data is written to both cache and database at the same time
D. Data is written directly to the database, skipping the cache
Correct Answer: C. Data is written to both cache and database at the same time
Explanation:
In a write-through strategy, data is written to the cache and database simultaneously, which ensures consistency but may slow down writes slightly.
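A minimal sketch of the write-through idea, assuming a plain dict stands in for the database. The class name WriteThroughCache is hypothetical.

```python
class WriteThroughCache:
    """Every write goes to the database and the cache in the same call."""

    def __init__(self, database: dict):
        self.database = database  # stand-in for the source of truth
        self.cache = {}

    def write(self, key, value):
        self.database[key] = value  # persist first so the cache never runs ahead
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # fill the cache on a miss
        return value
```

The extra write latency mentioned above comes from waiting on both stores before returning.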
When designing a distributed cache system, why is cache invalidation important?
A. To delete unused tables from the database
B. To prevent SQL injection attacks
C. To ensure cached data is up-to-date with the source of truth
D. To back up memory to disk
Correct Answer: C. To ensure cached data is up-to-date with the source of truth
Explanation:
Cache invalidation ensures that when underlying data changes (e.g., a concert venue update), the stale cached version is removed or updated.
What is the main tradeoff of a write-back caching strategy?
A. Slower write speed
B. No consistency issues
C. Risk of data loss if cache is not persisted
D. Always up-to-date data
Correct Answer: C. Risk of data loss if cache is not persisted
Explanation:
Write-back caches write data to cache first and sync to the database later. This is fast but risky—if the cache crashes before syncing, data may be lost.
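The risk can be seen in a small sketch, again assuming a dict as the database. WriteBackCache and its flush method are hypothetical names; anything still marked dirty when the cache process dies never reaches the database.

```python
class WriteBackCache:
    """Writes land in the cache first and reach the database only on flush()."""

    def __init__(self, database: dict):
        self.database = database  # stand-in for the source of truth
        self.cache = {}
        self.dirty = set()  # keys written to the cache but not yet persisted

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self) -> int:
        """Persist all dirty keys and return how many were written."""
        for key in self.dirty:
            self.database[key] = self.cache[key]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed
```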
You are building a leaderboard of top events. What data structure in Redis would you use in the cache?
A. Hash
B. Sorted Set
C. List
D. String
Correct Answer: B. Sorted Set
Explanation:
Redis sorted sets store elements with scores and keep them sorted. Perfect for leaderboards or top-k queries like “top events by popularity.”
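The behavior can be imitated in plain Python to show what a sorted set offers. This sketch does not talk to Redis; the Leaderboard class is a hypothetical stand-in for what commands such as ZADD and ZINCRBY provide on the server.

```python
import heapq


class Leaderboard:
    """In-memory imitation of a scored, ranked set of members."""

    def __init__(self):
        self._scores = {}

    def incr(self, member: str, amount: float = 1.0) -> float:
        # Add to the member's score, creating it at 0 if it is new.
        self._scores[member] = self._scores.get(member, 0.0) + amount
        return self._scores[member]

    def top(self, k: int):
        # Highest scores first, as (member, score) pairs.
        return heapq.nlargest(k, self._scores.items(), key=lambda item: item[1])
```

Redis keeps the set ordered as it is updated, whereas this sketch ranks on each call to top.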
Which of these is a benefit of using Redis over Memcached?
A. Redis supports more complex data structures like lists and sorted sets
B. Redis requires less memory
C. Redis has faster write speed
D. Redis supports SQL joins
Correct Answer: A. Redis supports more complex data structures like lists and sorted sets
Explanation:
Unlike Memcached (which only supports strings), Redis supports advanced data structures such as lists, sets, sorted sets, and more, making it more flexible.
Why might you use a distributed cache to store user session data?
A. To avoid using browser cookies
B. To make the UI faster
C. To reduce load on your database during high user traffic
D. To increase password security
Correct Answer: C. To reduce load on your database during high user traffic
Explanation:
Storing user sessions in a cache allows for quick lookups without hitting the database, which is critical when supporting millions of concurrent users.
When would a write-around cache strategy be preferred?
A. When you want every write to immediately reflect in the cache
B. When you want to avoid filling the cache with rarely-read data
C. When cache data must always be in sync with the database
D. When you want to ensure the cache is the source of truth
Correct Answer: B. When you want to avoid filling the cache with rarely-read data
Explanation:
Write-around skips writing to the cache and only writes to the database. This avoids polluting the cache with data that might never be read again.
What is the primary purpose of a distributed lock in system design?
A. Encrypt user data
B. Prevent multiple systems from accessing the same resource at the same time
C. Reduce server storage costs
D. Speed up database reads
Correct Answer: B. Prevent multiple systems from accessing the same resource at the same time
Explanation:
A distributed lock ensures that only one process or server can act on a resource at a time, which prevents conflicts, race conditions, and inconsistent states.
Which technology is commonly used to implement distributed locks?
A. MySQL
B. Kafka
C. Redis
D. Elasticsearch
Correct Answer: C. Redis
Explanation:
Redis, with its atomic operations and TTL (time-to-live), is a popular choice for distributed locks. It ensures safe, temporary locking using keys.
What does setting an expiration time on a distributed lock help prevent?
A. Slow queries
B. Security breaches
C. Locks getting stuck if the process crashes
D. Cache misses
Correct Answer: C. Locks getting stuck if the process crashes
Explanation:
If a process that holds a lock crashes, the lock can remain forever unless it expires automatically. This helps avoid unintentional deadlocks.
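The effect of an expiration time can be sketched with an in-memory lock table. This is a single-process, non-thread-safe stand-in with hypothetical names; with Redis the usual equivalent is an atomic SET with the NX and PX options plus a token check on release.

```python
import time
import uuid


class ExpiringLockTable:
    """Locks that free themselves once their TTL has passed."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._locks = {}  # name -> (owner_token, expires_at)

    def acquire(self, name: str, ttl_seconds: float):
        """Return an owner token on success, or None if the lock is held."""
        now = self.clock()
        holder = self._locks.get(name)
        if holder is not None and holder[1] > now:
            return None  # still held and not yet expired
        token = uuid.uuid4().hex
        self._locks[name] = (token, now + ttl_seconds)
        return token

    def release(self, name: str, token: str) -> bool:
        """Release only if the caller still owns the lock."""
        holder = self._locks.get(name)
        if holder is not None and holder[0] == token:
            del self._locks[name]
            return True
        return False
```

The token check keeps a slow process from releasing a lock that expired and was since taken by someone else.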
In which scenario would a distributed lock be most appropriate?
A. Sorting a list of products
B. Generating a user profile picture
C. Holding a concert ticket in a shopping cart during checkout
D. Autocomplete search suggestions
Correct Answer: C. Holding a concert ticket in a shopping cart during checkout
Explanation:
This is a classic use case — we want to ensure only one user can hold or buy the ticket at a time. Distributed locks help enforce this behavior.
What problem can occur when two processes are waiting for each other to release a lock?
A. Race condition
B. Memory leak
C. Deadlock
D. Data duplication
Correct Answer: C. Deadlock
Explanation:
A deadlock happens when two or more processes are waiting on each other to release a lock, and none of them can proceed.
What is Redlock?
A. A database sharding technique
B. A Kafka messaging queue
C. A distributed locking algorithm using multiple Redis nodes
D. A hashing algorithm for security
Correct Answer: C. A distributed locking algorithm using multiple Redis nodes
Explanation:
Redlock is a distributed lock algorithm created by Redis' creator. A client holds the lock only if it acquires it on a majority of independent Redis nodes within the lock's validity time, so the lock survives the failure of a single node.
What happens if two servers try to acquire the same distributed lock at the same time?
A. Both get the lock and proceed
B. Neither proceeds
C. The one that succeeds first gets the lock; the other fails
D. They split the work
Correct Answer: C. The one that succeeds first gets the lock; the other fails
Explanation:
Distributed locks rely on atomic operations. Only one process will successfully acquire the lock; others will either retry or fail.
How can distributed locks help prevent duplicated scheduled jobs (cron jobs) across servers?
A. By reducing job priority
B. By locking the task so only one server runs it
C. By using JWTs for authentication
D. By running the job faster
Correct Answer: B. By locking the task so only one server runs it
Explanation:
Distributed locks are useful when multiple servers might run the same job at the same time. Locking ensures only one server runs the job.
What is one best practice to avoid deadlocks when using distributed locks?
A. Always retry failed requests
B. Use locks only in the frontend
C. Acquire all locks in a consistent, pre-defined order
D. Avoid locking resources
Correct Answer: C. Acquire all locks in a consistent, pre-defined order
Explanation:
To avoid deadlocks, always acquire multiple locks in the same order across all processes. Random or nested lock acquisition patterns can cause deadlocks.
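A minimal sketch of ordered acquisition, using local threading locks as a stand-in for distributed ones. The lock names and the with_locks helper are hypothetical; the point is only that every caller sorts the names before locking.

```python
import threading
from contextlib import ExitStack

# Hypothetical resources; in a real system these would be distributed locks.
locks = {"seat:A1": threading.Lock(), "seat:A2": threading.Lock()}


def with_locks(names, action):
    """Acquire the named locks in sorted order, run action, then release them."""
    with ExitStack() as stack:
        for name in sorted(set(names)):  # the same global order for every caller
            stack.enter_context(locks[name])
        return action()
```

Two callers asking for the same seats in opposite orders end up locking seat:A1 first, so neither can hold one lock while waiting for the other.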
Which of the following best describes locking granularity?
A. The speed at which a lock is acquired
B. The size of memory used by the lock
C. The scope of the resource being locked (single item vs group of items)
D. The number of locks a server can handle
Correct Answer: C. The scope of the resource being locked (single item vs group of items)
Explanation:
Locking granularity refers to whether you're locking one item (like a ticket) or a group (like an entire stadium section). Finer granularity gives more concurrency but can be more complex.
What is the main purpose of using a stream in system design?
A. To store large files
B. To process and retain events in real-time for multiple consumers
C. To encrypt user data securely
D. To back up application logs
Correct Answer: B. To process and retain events in real-time for multiple consumers
Explanation:
Streams are designed to ingest, store, and process continuous flows of data in real-time. They are ideal for systems that need to react quickly to events like user actions or financial transactions.
What is event sourcing?
A. A technique for indexing database rows
B. A design pattern that stores every change as a state snapshot
C. A way to store application changes as a sequence of events
D. A method to generate SQL queries from user input
Correct Answer: C. A way to store application changes as a sequence of events
Explanation:
In event sourcing, every change in application state is recorded as an immutable event. These events can be replayed later to rebuild state or perform audits.
Which of the following scenarios is NOT a typical use case for a stream?
A. Replaying historical events to rebuild system state
B. Supporting real-time analytics on user actions
C. Persisting user profile pictures
D. Enabling chat applications to broadcast messages in real-time
Correct Answer: C. Persisting user profile pictures
Explanation:
Streams are optimized for event-based, real-time data, not for storing binary/static assets like images. That’s typically a job for object storage or a CDN.
Which feature allows multiple independent consumers to read the same stream in parallel?
A. Windowing
B. Partitioning
C. Replication
D. Consumer groups
Correct Answer: D. Consumer groups
Explanation:
Consumer groups allow different consumers to read and process the same data independently. Each group maintains its own read position.
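The idea of one retained log with a separate read position per group can be sketched as follows. The Stream class and its method names are assumptions for illustration, not the API of any real broker.

```python
class Stream:
    """Append-only log with an independent read offset per consumer group."""

    def __init__(self):
        self._log = []  # events are retained after being read
        self._offsets = {}  # group name -> next index to read

    def append(self, event) -> int:
        self._log.append(event)
        return len(self._log) - 1  # offset of the new event

    def poll(self, group: str, max_events: int = 10):
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_events]
        self._offsets[group] = start + len(batch)  # advance only this group
        return batch

    def seek(self, group: str, offset: int) -> None:
        # Rewind or skip ahead, e.g. to replay history.
        self._offsets[group] = offset
```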
What is windowing in stream processing used for?
A. Encrypting real-time events
B. Batching events based on time or count
C. Deleting old data
D. Sorting logs alphabetically
Correct Answer: B. Batching events based on time or count
Explanation:
Windowing helps group events that occur within a specific time range or after a certain number of events, enabling operations like hourly averages or rolling counts.
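A time-based (tumbling) window can be sketched in a few lines. The function name and the (timestamp, payload) event shape are assumptions for this example; stream processors also offer sliding and session windows, which are not shown.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds: int):
    """Count events per fixed-size, non-overlapping time window.

    events is an iterable of (timestamp_seconds, payload) pairs.
    Returns {window_start_timestamp: event_count}.
    """
    counts = defaultdict(int)
    for timestamp, _payload in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)
```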
Why is partitioning important in stream processing?
A. It ensures that all events are sorted alphabetically
B. It enables horizontal scaling by distributing workload
C. It allows data encryption across nodes
D. It simplifies event replay
Correct Answer: B. It enables horizontal scaling by distributing workload
Explanation:
Partitioning spreads events across multiple machines, so multiple consumers can process different partitions in parallel, improving scalability.
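One common way to choose a partition is to hash a key, sketched below. The function name and the use of SHA-256 are assumptions for illustration; real brokers use their own hash functions.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition so that the same key always lands in the same place."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then fold it into the partition range.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Because all events for one key go to one partition, their relative order is preserved while different keys are processed in parallel.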
What problem does replication solve in stream architectures?
A. Duplicate event consumption
B. Slow query performance
C. Data loss due to server failure
D. High network latency
Correct Answer: C. Data loss due to server failure
Explanation:
Replication ensures that data is copied across multiple servers. If one fails, another can take over without losing any events, ensuring fault tolerance.
What is the advantage of streams over message queues?
A. Streams are faster for batch processing
B. Streams retain messages and allow re-reading from a specific position
C. Message queues have better real-time performance
D. Streams can only deliver messages once
Correct Answer: B. Streams retain messages and allow re-reading from a specific position
Explanation:
Unlike queues (which typically delete messages after delivery), streams persist messages and let consumers re-read from any point, enabling more flexible processing.
In event sourcing, how do you reconstruct the current state of an application?
A. By reading the current database row
B. By applying a cache invalidation strategy
C. By replaying all the events from the stream
D. By reloading the front-end application
Correct Answer: C. By replaying all the events from the stream
Explanation:
In event sourcing, you reconstruct state by replaying events in the order they occurred. This makes it easy to understand how the system arrived at its current state.
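Replay amounts to folding the event list into a state, as in this minimal sketch. The event names (tickets_released, tickets_sold) and the integer state are hypothetical, chosen only to keep the example small.

```python
from functools import reduce


def apply_event(available: int, event) -> int:
    """Return the new ticket count after applying one (kind, quantity) event."""
    kind, quantity = event
    if kind == "tickets_released":
        return available + quantity
    if kind == "tickets_sold":
        return available - quantity
    raise ValueError(f"unknown event type: {kind}")


def replay(events, initial: int = 0) -> int:
    """Rebuild the current state by applying events in order."""
    return reduce(apply_event, events, initial)
```

Replaying only a prefix of the events gives the state as of that point in time, which is what makes audits straightforward.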
Which technology is best suited for building a stream-based system with event sourcing?
A. MySQL
B. Redis
C. Kafka
D. Memcached
Correct Answer: C. Kafka
Explanation:
Kafka is a high-throughput, distributed streaming platform that supports event sourcing, replay, partitioning, and multiple consumers, making it a top choice for such systems.
What is the primary purpose of using a queue in a system architecture?
A. Encrypt user data for security
B. Speed up front-end rendering
C. Buffer bursty traffic and distribute workloads
D. Store data permanently
Correct Answer: C. Buffer bursty traffic and distribute workloads
Explanation:
Queues absorb traffic spikes and allow background workers to process tasks at their own pace, helping smooth load and distribute tasks across systems.
What happens when a queue is added to a system with tight latency requirements (e.g., < 500ms)?
A. It helps meet the latency target
B. It has no effect
C. It may cause the latency target to be missed
D. It decreases latency significantly
Correct Answer: C. It may cause the latency target to be missed
Explanation:
Queues introduce asynchronous processing, which may delay responses and break strict latency guarantees in real-time systems.
Which of the following best describes a Dead Letter Queue (DLQ)?
A. A queue for old messages
B. A queue that stores messages that failed processing
C. A queue for expired API tokens
D. A queue that stores duplicate messages
Correct Answer: B. A queue that stores messages that failed processing
Explanation:
Dead Letter Queues are used to catch unprocessable messages after all retry attempts fail, allowing developers to inspect and debug issues.
Why is backpressure important in a queuing system?
A. To guarantee FIFO ordering
B. To prevent message loss
C. To throttle message production when the system is overwhelmed
D. To ensure that only one consumer processes a message
Correct Answer: C. To throttle message production when the system is overwhelmed
Explanation:
Backpressure prevents queues from overflowing by signaling producers to slow down or stop until capacity is available, protecting system stability.
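A bounded buffer is the simplest form of this signal. The sketch below uses the standard library's queue.Queue; the try_publish helper is a hypothetical name, and a real producer would back off or shed load when it returns False.

```python
import queue


def try_publish(buffer: queue.Queue, message) -> bool:
    """Enqueue without blocking; False tells the producer to slow down."""
    try:
        buffer.put_nowait(message)
        return True
    except queue.Full:
        return False
```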
What does FIFO stand for and why is it important in queues?
A. Find In Fast Order — Ensures fast processing
B. First In First Out — Ensures ordering of messages
C. Fast Input Fast Output — Ensures high speed
D. Fully Indexed For Output — Ensures consistency
Correct Answer: B. First In First Out — Ensures ordering of messages
Explanation:
FIFO ensures that messages are processed in the order they were received, which is important in many real-time and transactional systems.
Which scenario is NOT an ideal use case for a queue?
A. Buffering photo uploads for background processing
B. Managing peak-hour ride requests in ride-sharing apps
C. Delivering high-frequency stock price updates with low latency
D. Distributing compute-intensive tasks to multiple servers
Correct Answer: C. Delivering high-frequency stock price updates with low latency
Explanation:
Queues introduce latency and are not suitable for real-time, low-latency use cases like live stock tickers. Streams or websockets would be better.
What is a retry mechanism in the context of message queues?
A. A feature that reverses processed messages
B. A way to validate message contents
C. A feature that attempts message delivery again if it fails initially
D. A way to shuffle messages before delivery
Correct Answer: C. A feature that attempts message delivery again if it fails initially
Explanation:
Retry mechanisms help ensure resilience by attempting to reprocess failed messages a configurable number of times before moving them to a DLQ.
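The retry-then-DLQ flow can be sketched as a loop. The function name, the max_attempts default, and the list used as a dead letter queue are assumptions for illustration; managed queues usually configure this on the broker and add delays between attempts.

```python
def process_with_retries(messages, handler, max_attempts: int = 3):
    """Run handler on each message, retrying failures up to max_attempts.

    Returns (processed_results, dead_letters), where dead_letters holds
    (message, error_description) pairs for messages that never succeeded.
    """
    processed, dead_letters = [], []
    for message in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                processed.append(handler(message))
                break  # success: stop retrying this message
            except Exception as error:
                if attempt == max_attempts:
                    # Out of attempts: park the message for later inspection.
                    dead_letters.append((message, repr(error)))
    return processed, dead_letters
```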
What role does partitioning play in scaling a queue system?
A. Ensures encryption of messages
B. Allows duplicate message detection
C. Distributes messages across workers for horizontal scalability
D. Compresses messages for faster delivery
Correct Answer: C. Distributes messages across workers for horizontal scalability
Explanation:
Partitioning breaks a queue into smaller segments, each processed by different consumers, improving throughput and scalability.
How do queues decouple producers and consumers?
A. By encrypting messages end-to-end
B. By directly sending messages to all consumers
C. By allowing producers to send messages without needing consumers to be online
D. By ensuring consumers wait for producers to confirm receipt
Correct Answer: C. By allowing producers to send messages without needing consumers to be online
Explanation:
Queues enable asynchronous communication, so producers can send and forget, while consumers can process at their own pace.
Which queueing technologies are most commonly used in modern distributed systems?
A. Redis and Elasticsearch
B. Kafka and AWS SQS
C. MySQL and MongoDB
D. Apache Spark and Hadoop
Correct Answer: B. Kafka and AWS SQS
Explanation:
Kafka is a distributed log/streaming platform and SQS is a fully managed AWS queue service — both are popular in modern distributed systems.
What is the main purpose of a load balancer in a distributed system?
A. Encrypt incoming traffic
B. Serve static files directly
C. Distribute incoming traffic across multiple servers
D. Store backup copies of user data
Correct Answer: C. Distribute incoming traffic across multiple servers
Explanation:
A load balancer helps evenly distribute requests across multiple servers (horizontal scaling), preventing overload on any single machine and improving system availability and scalability.
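As one possible distribution strategy, here is a round-robin sketch. Round robin is an assumption here, since the explanation does not name an algorithm, and the class and server names are hypothetical; real load balancers also track health and connection counts.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backend servers in a repeating, even rotation."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        return next(self._cycle)
```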
In what scenario would you most likely choose a Layer 4 (L4) load balancer over Layer 7 (L7)?
A. When routing based on URL path
B. When handling persistent WebSocket connections
C. When routing based on cookies
D. When compressing HTTP responses
Correct Answer: B. When handling persistent WebSocket connections
Explanation:
L4 load balancers operate at the transport layer, making them better suited for persistent connections like WebSockets that require low-level control of the TCP connection.
Which of the following statements is true about how to represent load balancers in system design interviews?
A. Always draw a load balancer in front of every service
B. Never mention load balancers
C. Mention or draw them only when necessary, such as in front of entry points or when sticky sessions are needed
D. Replace all database references with load balancers
Correct Answer: C. Mention or draw them only when necessary, such as in front of entry points or when sticky sessions are needed
Explanation:
In interviews, load balancers are often abstracted, and you don’t need to draw one everywhere. Just mention them when it’s important for routing logic, session persistence, or traffic distribution.
What is one key advantage of using a Layer 7 load balancer over a Layer 4 load balancer?
A. It supports higher network throughput
B. It can route traffic based on application-level data like URL or headers
C. It handles TCP-level retries more efficiently
D. It requires less memory on the server
Correct Answer: B. It can route traffic based on application-level data like URL or headers
Explanation:
L7 load balancers work at the application layer, so they can inspect requests and make decisions based on content (e.g., route /api requests to one service and /images to another).
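Path-based routing of this kind can be sketched as a prefix lookup. The backend names and the default are hypothetical, and a real L7 balancer would express this in its own configuration rather than in application code.

```python
# Hypothetical routing table: URL path prefix -> backend name.
ROUTES = [("/api", "api-service"), ("/images", "image-service")]


def route(path: str, default: str = "web-service") -> str:
    """Pick a backend by matching the request path against known prefixes."""
    for prefix, backend in ROUTES:
        # Match the prefix itself or anything beneath it, but not "/apiary".
        if path == prefix or path.startswith(prefix + "/"):
            return backend
    return default
```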
Which of the following is not a commonly used load balancer technology?
A. AWS Elastic Load Balancer
B. NGINX
C. HAProxy
D. MongoDB
Correct Answer: D. MongoDB
Explanation:
MongoDB is a NoSQL database, not a load balancer. The other options (AWS ELB, NGINX, and HAProxy) are all popular software or managed load balancer tools.
What is the primary role of an API gateway in a microservice architecture?
A. Directly storing user data in a database
B. Routing requests to the correct backend service
C. Encrypting all data sent from the client
D. Replacing the need for a load balancer
Correct Answer: B. Routing requests to the correct backend service
Explanation:
An API gateway acts as the front door to your system. It routes incoming client requests (e.g., GET /users/123) to the correct backend service (e.g., the user service). It simplifies client interaction by centralizing and coordinating requests.
Which of the following is a common responsibility of an API gateway besides routing?
A. Hosting frontend code
B. Managing distributed locks
C. Handling authentication and rate limiting
D. Serving as a SQL query engine
Correct Answer: C. Handling authentication and rate limiting
Explanation:
API gateways often handle cross-cutting concerns like authentication, rate limiting, logging, and request transformation, so these responsibilities don’t have to be duplicated across each microservice.
In a system design interview, when is it a good idea to include an API gateway in your design?
A. Only if you're using a NoSQL database
B. Only for frontend-heavy applications
C. In nearly all product design interviews, as the first point of contact
D. Never, it's an implementation detail
Correct Answer: C. In nearly all product design interviews, as the first point of contact
Explanation:
In system design interviews, an API gateway is a strong default choice because it abstracts request routing, enforces policies, and improves maintainability. It shows awareness of microservice best practices and system boundaries.
What is the primary reason to use a search optimized database?
A. To reduce storage costs
B. To handle frequent schema changes
C. To perform fast and relevant full-text search
D. To handle complex joins across multiple tables
Correct Answer: C. To perform fast and relevant full-text search
Explanation:
Search optimized databases are designed specifically for efficient full-text search, allowing users to search through large volumes of text data quickly and effectively.
Which data structure is fundamental to making full-text search efficient in search-optimized databases?
A. B-Tree
B. Hash Table
C. Inverted Index
D. Binary Search Tree
Correct Answer: C. Inverted Index
Explanation:
An inverted index maps each word to a list of documents containing it. This allows quick lookup of documents relevant to a search term, making full-text search efficient.
What does “tokenization” mean in the context of full-text search?
A. Encrypting text before indexing
B. Mapping user tokens to documents
C. Breaking down text into individual searchable units (words)
D. Assigning unique IDs to users
Correct Answer: C. Breaking down text into individual searchable units (words)
Explanation:
Tokenization splits text into individual words or tokens so that they can be stored in the inverted index and searched independently.
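Tokenization and the inverted index fit together as in this minimal sketch. The function names and the lowercase alphanumeric tokenizer are assumptions for illustration; real analyzers also handle stop words, stemming, and language-specific rules.

```python
import re
from collections import defaultdict


def tokenize(text: str):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents: dict):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query: str):
    """Return IDs of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

A lookup touches only the posting sets for the query's tokens, not every document, which is where the speed comes from.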
What is the purpose of “stemming” in a search engine?
A. To remove punctuation from documents
B. To normalize different forms of the same word
C. To encrypt search terms for security
D. To create indexes faster
Correct Answer: B. To normalize different forms of the same word
Explanation:
Stemming reduces words like “running” and “runs” to a root form like “run” so that different variations of a word can match the same index entry. Irregular forms such as “ran” generally need lemmatization rather than simple stemming.
Which of the following best describes “fuzzy search”?
A. Search that ignores capital letters
B. Search that accepts synonyms
C. Search that can tolerate typos or small differences
D. Search that only returns partial matches
Correct Answer: C. Search that can tolerate typos or small differences
Explanation:
Fuzzy search is useful for finding results even when the search term has misspellings or minor variations, often implemented using edit distance algorithms.
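Edit distance can be computed with the classic Levenshtein dynamic program, sketched below. The fuzzy_match helper and its max_distance default are hypothetical; search engines avoid comparing against every term by using specialized index structures.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query: str, candidates, max_distance: int = 1):
    """Keep the candidates within max_distance edits of the query."""
    return [c for c in candidates if edit_distance(query, c) <= max_distance]
```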
When should you choose Elasticsearch over your traditional relational database for search functionality?
A. When you only need to search for exact IDs
B. When you want minimal infrastructure
C. When you need scalable, high-performance full-text search
D. When you need to enforce foreign key constraints
Correct Answer: C. When you need scalable, high-performance full-text search
Explanation:
Elasticsearch is ideal for systems like social media platforms or e-commerce apps that need advanced full-text search at scale.
Which of the following is a limitation of using a traditional SQL database like Postgres for full-text search?
A. It doesn’t support indexing
B. It requires third-party libraries
C. It may be slower or less feature-rich than dedicated search engines
D. It doesn’t store text data
Correct Answer: C. It may be slower or less feature-rich than dedicated search engines
Explanation:
Postgres supports full-text search using GIN indexes, but for large-scale or advanced search features (like fuzzy search), dedicated tools like Elasticsearch perform better.
Which of the following search features is not typically included in search optimized databases?
A. Real-time document indexing
B. Graph traversal across nodes
C. Fuzzy search support
D. Tokenization and stemming
Correct Answer: B. Graph traversal across nodes
Explanation:
Graph traversal is used in graph databases, not in search-optimized databases like Elasticsearch. These databases specialize in full-text search, not relationship traversal.
What does it mean for a search optimized database to “scale horizontally”?
A. Add more CPUs to a single machine
B. Increase disk space
C. Add more machines to the cluster and distribute data
D. Create new indexes for each document
Correct Answer: C. Add more machines to the cluster and distribute data
Explanation:
Horizontal scaling allows a search engine like Elasticsearch to handle more data and requests by adding nodes and partitioning data across them (sharding).
What is the most popular search-optimized database used by companies like Netflix, Uber, and Yelp?
A. MongoDB
B. Redis
C. Elasticsearch
D. SQLite
Correct Answer: C. Elasticsearch
Explanation:
Elasticsearch is a widely used search engine based on Apache Lucene. It supports full-text search, analytics, and scalability, making it the industry standard for many companies.
What is the main reason to use blob storage instead of a traditional database for storing images or videos?
A. Blob storage has stronger encryption
B. Blob storage supports SQL queries
C. Blob storage is more cost-effective and efficient for large unstructured files
D. Blob storage is better for small text data
Correct Answer: C. Blob storage is more cost-effective and efficient for large unstructured files
Explanation:
Blob storage is specifically designed to store large objects like images and videos. It's far more scalable and cheaper than using a relational or NoSQL database for the same purpose.
What is a presigned URL used for in the context of blob storage?
A. To keep files permanently hidden
B. To compress files during upload
C. To grant temporary access to upload or download blobs
D. To prevent any access from clients
Correct Answer: C. To grant temporary access to upload or download blobs
Explanation:
Presigned URLs allow clients to directly upload or download a file without going through the backend, and they expire after a set time for security.
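The mechanism can be sketched with an HMAC over the method, path, and expiry time. This is a simplified illustration with hypothetical names and a hard-coded demo key; it is not the signing scheme of Amazon S3 or any other provider.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # assumption: a key known only to the backend and storage


def _signature(method: str, path: str, expires: int) -> str:
    message = f"{method}\n{path}\n{expires}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(method: str, path: str, ttl_seconds: int, now=None) -> str:
    """Return a URL that authorizes one method on one path until it expires."""
    now = time.time() if now is None else now
    expires = int(now + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _signature(method, path, expires)})
    return f"{path}?{query}"


def verify(method: str, url: str, now=None) -> bool:
    """Storage-side check: the signature matches and the URL has not expired."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now >= expires:
        return False
    return hmac.compare_digest(signature, _signature(method, parsed.path, expires))
```

The client never sees the key, only a URL that stops working once the expiry passes or if the path or method is changed.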
What is the best practice for storing large files like videos in an application like YouTube?
A. Store video and metadata together in a relational DB
B. Store video in blob storage and metadata in a separate database
C. Store everything in-memory for faster access
D. Store video as base64 strings in the database
Correct Answer: B. Store video in blob storage and metadata in a separate database
Explanation:
This approach lets you take advantage of blob storage for large file handling and keep searchable metadata (title, tags, uploader, etc.) in a fast, indexable database.
Which of the following is a key benefit of using blob storage with a CDN?
A. It allows file encryption
B. It helps scale the database
C. It delivers content faster to users worldwide
D. It improves upload speed to the origin server
Correct Answer: C. It delivers content faster to users worldwide
Explanation:
A CDN caches blobs at edge locations, so users around the globe can access them quickly, reducing latency.
Which of the following use cases is least suitable for blob storage?
A. Storing video files
B. Storing user profile pictures
C. Storing database transaction logs
D. Storing large document files like PDFs
Correct Answer: C. Storing database transaction logs
Explanation:
Database transaction logs need fast, sequential appends and recovery semantics. Specialized storage engines or logging services handle that better than blob storage, which is built around reading and writing whole objects.
Which of the following features helps make blob storage highly durable?
A. Tokenization
B. Chunking
C. Replication and erasure coding
D. Indexing and partitioning
Correct Answer: C. Replication and erasure coding
Explanation:
Blob storage services like Amazon S3 store data redundantly, using replication and erasure coding, so that the data can be reconstructed even if a copy or fragment is lost or corrupted.
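To make the erasure-coding idea concrete, here is a deliberately simplified sketch that uses a single XOR parity shard. This is an assumption made for illustration: production services use stronger codes that survive several simultaneous losses, and the `encode` and `recover` helpers are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal shards (zero-padded) and append one parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (marked None) by XOR-ing the survivors."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost shard")
    repaired = list(shards)
    if missing:
        survivors = [shard for shard in shards if shard is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired
```

With k data shards and one parity shard, any single shard can be lost at a storage overhead of 1/k, compared with the 100% overhead of keeping a full second copy.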
Why is chunking used when uploading files to blob storage?
A. To encrypt the file
B. To upload multiple files at once
C. To allow parallel and resumable uploads
D. To create smaller file versions
Correct Answer: C. To allow parallel and resumable uploads
Explanation:
Chunking (e.g., multipart upload) splits a large file into parts, allowing faster, parallel uploads and resume capability if a connection fails.
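The sketch below shows how chunked uploads allow both parallelism and resuming. `FakeBlobStore` is a hypothetical in-memory stand-in for a storage service, and the tiny `PART_SIZE` is an assumption chosen so the demo produces several parts; real multipart APIs use much larger parts.

```python
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 4  # bytes; tiny on purpose so the demo produces several parts


def split_into_parts(data, part_size=PART_SIZE):
    """Map 1-based part numbers to consecutive slices of the file."""
    offsets = range(0, len(data), part_size)
    return {number: data[start:start + part_size]
            for number, start in enumerate(offsets, start=1)}


class FakeBlobStore:
    """Hypothetical in-memory stand-in that remembers which parts it has received."""

    def __init__(self):
        self.parts = {}

    def upload_part(self, number, chunk):
        self.parts[number] = chunk

    def complete(self):
        """Assemble the final object from the parts, in part-number order."""
        return b"".join(self.parts[n] for n in sorted(self.parts))


def upload(store, data, workers=4):
    """Upload only the parts the store does not have yet, several at a time."""
    parts = split_into_parts(data)
    pending = [(n, chunk) for n, chunk in parts.items() if n not in store.parts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda part: store.upload_part(*part), pending))
    return store.complete()
```

If a connection drops partway through, calling `upload` again with the same store sends only the parts that are still missing.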
Which of the following statements about blob storage is true?
A. Blob storage is mainly used for small JSON payloads
B. Blob storage automatically indexes all files
C. Blob storage is ideal for storing large binary files like videos and images
D. Blob storage cannot be accessed directly from the client
Correct Answer: C. Blob storage is ideal for storing large binary files like videos and images
Explanation:
Blob storage is optimized for large, unstructured binary objects, not small structured data or querying purposes.
What role does a traditional database play when paired with blob storage?
A. It stores and serves the blob content directly
B. It indexes and stores references (like URLs) to blobs
C. It compresses blob data
D. It encrypts blob data before upload
Correct Answer: B. It indexes and stores references (like URLs) to blobs
Explanation:
The database stores metadata and pointers (like S3 URLs) so that blobs can be efficiently located and retrieved without storing large files in the DB.
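A minimal sketch of this split, using Python's built-in `sqlite3` module as the metadata database. The `videos` table, its columns, and the `blobs.example.com` URLs are hypothetical; the point is that the database row holds a pointer to the blob, never the bytes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE videos (
        id       INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        uploader TEXT NOT NULL,
        blob_url TEXT NOT NULL  -- pointer to the object in blob storage
    )
    """
)
# Index the searchable metadata so lookups by uploader avoid a full scan.
conn.execute("CREATE INDEX idx_videos_uploader ON videos (uploader)")

conn.executemany(
    "INSERT INTO videos (title, uploader, blob_url) VALUES (?, ?, ?)",
    [
        ("Cat video", "alice", "https://blobs.example.com/videos/1.mp4"),
        ("Dog video", "bob", "https://blobs.example.com/videos/2.mp4"),
        ("Cat sequel", "alice", "https://blobs.example.com/videos/3.mp4"),
    ],
)


def urls_for_uploader(uploader):
    """Query the metadata, then return the blob locations to fetch from."""
    rows = conn.execute(
        "SELECT blob_url FROM videos WHERE uploader = ? ORDER BY id", (uploader,)
    )
    return [row[0] for row in rows]
```

The client or a CDN then fetches each returned URL directly from blob storage.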
Which of the following services is NOT a blob storage provider?
A. Amazon S3
B. Google Cloud Storage
C. Azure Blob Storage
D. Firebase Firestore
Correct Answer: D. Firebase Firestore
Explanation:
Firestore is a NoSQL document database, not designed for large binary file storage. The others are all major blob storage services.
Which of the following is the most appropriate use case for a relational database?
A. Real-time analytics over millions of log events
B. Storing structured data with ACID guarantees, like user profiles and transactions
C. Storing videos and binary files
D. Managing a dynamic schema with flexible documents
Correct Answer: B. Storing structured data with ACID guarantees, like user profiles and transactions
Explanation: Relational databases are ideal for structured, transactional data that benefits from strong consistency and integrity.
What does ACID stand for in the context of relational databases?
A. Availability, Consistency, Independence, Durability
B. Atomicity, Consistency, Isolation, Durability
C. Accuracy, Complexity, Indexing, Durability
D. Automation, Control, Ingestion, Distribution
Correct Answer: B. Atomicity, Consistency, Isolation, Durability
Explanation: ACID properties ensure safe, consistent, and reliable database transactions.
Which type of index lets a relational database quickly find the documents that contain specific words in a full-text search?
A. B-Tree index
B. Inverted index
C. Hash index
D. Spatial index
Correct Answer: B. Inverted index
Explanation: Inverted indexes map words to documents, making them essential for full-text search capabilities.
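The word-to-documents mapping can be sketched in a few lines. This is a toy version, assuming simple lowercase tokenization and AND semantics for multi-word queries; `build_inverted_index` and `search` are hypothetical helper names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

A lookup touches only the entries for the query words, so its cost depends on how many documents match rather than on the total number of documents.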
Why should you be cautious when using SQL joins in high-scale systems?
A. They slow down reads but not writes
B. They require NoSQL databases
C. They can become a major performance bottleneck
D. They are only allowed in PostgreSQL
Correct Answer: C. They can become a major performance bottleneck
Explanation: Joins can be expensive in terms of compute and memory, especially across large tables or unindexed columns.
What is one major difference between relational and NoSQL databases?
A. Relational databases don’t support transactions
B. NoSQL databases cannot be queried
C. Relational databases use fixed schemas, while NoSQL databases can be schema-less
D. NoSQL databases cannot scale horizontally
Correct Answer: C. Relational databases use fixed schemas, while NoSQL databases can be schema-less
Explanation: NoSQL databases are schema-flexible, which is helpful for evolving or irregular data.
When would you prefer DynamoDB over PostgreSQL?
A. When you need multi-table joins
B. When you require strict referential integrity
C. When you need high write throughput and horizontal scalability
D. When you want to use complex stored procedures
Correct Answer: C. When you need high write throughput and horizontal scalability
Explanation: DynamoDB is excellent for write-heavy, scalable applications, especially when data access patterns are well-defined.
Which of the following best describes a transaction in a relational database?
A. A mechanism to roll back a file upload
B. A batch of operations that are always eventually consistent
C. A group of operations that either all succeed or all fail
D. A data structure for storing historical rows
Correct Answer: C. A group of operations that either all succeed or all fail
Explanation: A transaction ensures atomicity, meaning changes are applied fully or not at all.
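The all-or-nothing behavior can be seen with Python's built-in `sqlite3` module. The `accounts` table and `transfer` helper are hypothetical; the credit is applied first on purpose so that a failing debit shows the earlier change being rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft, so the credit was undone too.
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer larger than the source balance returns `False` and leaves both rows exactly as they were.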
What is the main benefit of indexing in any type of database?
A. It compresses the data
B. It ensures transaction safety
C. It makes queries faster by avoiding full scans
D. It increases the available disk space
Correct Answer: C. It makes queries faster by avoiding full scans
Explanation: Indexes act like shortcuts, allowing the DB engine to quickly find relevant rows.
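One way to observe this is to compare the query plan before and after adding an index. The sketch uses SQLite through Python's `sqlite3` module; the `users` table and the `plan` helper are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

QUERY = "SELECT name FROM users WHERE email = ?"


def plan(query):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, ("user500@example.com",))
    return " ".join(row[-1] for row in rows)  # the last column holds the detail text


plan_before = plan(QUERY)  # without an index: a scan over the whole table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = plan(QUERY)   # with the index: a search that names idx_users_email
```

Printing `plan_before` and `plan_after` shows the engine switching from scanning every row to searching through the index.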
What type of NoSQL database is best suited for storing user sessions with fast access using a session ID?
A. Document store
B. Key-value store
C. Column-family store
D. Graph database
Correct Answer: B. Key-value store
Explanation: Key-value stores like Redis or DynamoDB are ideal for simple lookup scenarios like session management.
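The access pattern is simple enough to sketch with a dictionary. `SessionStore` below is a hypothetical in-memory stand-in for a key-value store, with a time-to-live per entry added as an assumption, since sessions usually expire.

```python
import time


class SessionStore:
    """Hypothetical in-memory key-value store keyed by session ID."""

    def __init__(self):
        self._data = {}

    def set(self, session_id, value, ttl_seconds, now=None):
        """Store the session along with the time at which it expires."""
        current = time.time() if now is None else now
        self._data[session_id] = (value, current + ttl_seconds)

    def get(self, session_id, now=None):
        """Single-key lookup; expired sessions are dropped and treated as missing."""
        current = time.time() if now is None else now
        entry = self._data.get(session_id)
        if entry is None:
            return None
        value, expires_at = entry
        if current >= expires_at:
            del self._data[session_id]
            return None
        return value
```

Every operation addresses exactly one key, which is why this workload needs no joins, secondary indexes, or query planner.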
Which NoSQL database is known for its strong consistency model and serverless architecture on AWS?
A. MongoDB
B. Cassandra
C. Redis
D. DynamoDB
Correct Answer: D. DynamoDB
Explanation: DynamoDB is a fully managed, serverless NoSQL database on AWS with on-demand scaling. Its reads are eventually consistent by default, and strongly consistent reads are available as an option.
In a relational database, which structure stores data in rows and columns?
A. JSON documents
B. BLOBs
C. Tables
D. Nodes
Correct Answer: C. Tables
Explanation: Tables are the core structure in relational databases, organizing data into rows and columns.
Which scenario is best suited for a graph database?
A. Recording sensor data from IoT devices
B. Storing video metadata
C. Performing social network friend-of-a-friend queries
D. Logging API requests
Correct Answer: C. Performing social network friend-of-a-friend queries
Explanation: Graph databases like Neo4j are ideal for modeling relationships and running graph traversal queries.
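A friend-of-a-friend query is a two-hop traversal. The sketch below runs it over a plain adjacency map rather than a real graph database; the `friends` data and the `friends_of_friends` helper are hypothetical.

```python
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, person):
    """People exactly two hops away: friends of friends who are not already friends."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}
```

A graph database follows stored edges in the same way, whereas a relational schema would typically express each hop as a self-join on a friendships table.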
What is sharding in the context of NoSQL databases?
A. A method of compressing JSON documents
B. A way to build a database index
C. A technique to partition data across servers
D. A way to enforce ACID transactions
Correct Answer: C. A technique to partition data across servers
Explanation: Sharding distributes data across multiple servers to enable horizontal scaling in NoSQL systems.
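A minimal sketch of hash-based sharding, assuming a fixed number of shards and using in-memory dictionaries in place of servers. `shard_for` and `ShardedStore` are hypothetical names.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard from a stable hash of the key (same key, same shard)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Routes each key to one of several independent partitions."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]  # one dict per "server"

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Plain modulo routing remaps most keys whenever the shard count changes, which is one reason some systems use consistent hashing instead.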
Which of the following is not typically a strength of a NoSQL database?
A. Horizontal scalability
B. Strict enforcement of foreign keys
C. Flexible data models
D. Schema-less design
Correct Answer: B. Strict enforcement of foreign keys
Explanation: NoSQL databases typically do not enforce foreign keys; instead, application logic manages those relationships.
Which of the following statements about choosing a database in a system design interview is best practice?
A. Always compare SQL and NoSQL databases to show depth of knowledge
B. Choose the database you're familiar with and explain how it solves the problem
C. Always pick a NoSQL database for scale
D. Avoid using databases if you’re using blob storage
Correct Answer: B. Choose the database you're familiar with and explain how it solves the problem
Explanation: Interviewers value practicality and clarity. Use what you know well and focus on how its features match the problem requirements.