Hello Interview - Key Technologies

88 Terms

1
New cards

What is the primary purpose of a Content Delivery Network (CDN)?

A. Encrypt user data during transmission
B. Deliver content to users from the closest server to reduce latency
C. Host dynamic databases
D. Replace the need for an origin server

Correct Answer: B. Deliver content to users from the closest server to reduce latency

🧠 Explanation:

A CDN works by caching content on distributed servers (called edge locations) that are geographically closer to users. This reduces the time it takes for content like images, videos, or HTML files to load, which improves user experience and reduces server load.

2
New cards

Which of the following is a correct use case for a CDN?

A. Storing user passwords securely
B. Hosting server-side business logic
C. Caching static assets like images and JavaScript files
D. Running complex database queries

Correct Answer: C. Caching static assets like images and JavaScript files

🧠 Explanation:
CDNs are most commonly used to cache and serve static content such as CSS, JS, images, and video. This allows faster delivery since the data doesn’t have to be fetched from the main server each time.

3
New cards

How can CDNs improve API performance?

A. By dynamically generating API responses at the edge
B. By caching frequently accessed API responses
C. By replacing the API server completely
D. By encrypting all API traffic

Correct Answer: B. By caching frequently accessed API responses

Explanation:
Even though APIs often serve dynamic content, many responses (e.g., a blog post, a leaderboard) don’t change frequently. CDNs can cache these responses, reducing the number of requests that hit the origin server and improving response times.

4
New cards

Which of the following best describes how eviction works in a CDN?

A. Data is removed randomly every 24 hours
B. Cached content stays forever unless manually deleted
C. Cached content is removed based on policies like TTL or invalidation rules
D. All content is cleared once a new deployment is made

Correct Answer: C. Cached content is removed based on policies like TTL or invalidation rules

Explanation:
Like regular caches, CDNs use eviction policies to manage what stays in cache. TTL (time-to-live) defines how long content is kept before being refreshed, and cache invalidation allows manual or automated removal when content changes.

5
New cards

Why is using a CDN important for global applications like Instagram?

A. It encrypts passwords and cookies
B. It provides faster database indexing
C. It helps deliver user-generated content (like profile pictures) quickly worldwide
D. It processes business logic closer to the user

Correct Answer: C. It helps deliver user-generated content (like profile pictures) quickly worldwide

Explanation:
For platforms like Instagram, user media (profile pics, videos, thumbnails) can be cached and served from edge servers near the user, minimizing latency and improving load times, especially for users far from the origin server.

6
New cards

What is the primary benefit of using a distributed cache?

A. Encrypting user data
B. Reducing data size
C. Lowering latency and reducing database load
D. Increasing storage capacity

Correct Answer: C. Lowering latency and reducing database load

Explanation:
Distributed caches store frequently used or expensive-to-compute data in memory across multiple servers. This reduces the number of queries to the database and speeds up data retrieval, lowering latency.

7
New cards

Which of the following scenarios is the best fit for a distributed cache?

A. Storing passwords
B. Performing backups
C. Caching results of expensive queries
D. Running cron jobs

Correct Answer: C. Caching results of expensive queries

Explanation:
If a query (like fetching a social media feed) is slow but doesn't change often, caching its result reduces load and speeds up responses.

8
New cards

Which of these is not a common eviction policy for distributed caches?

A. Least Recently Used (LRU)
B. Least Frequently Used (LFU)
C. Most Recently Updated (MRU)
D. First In, First Out (FIFO)

Correct Answer: C. Most Recently Updated (MRU)

Explanation:
MRU is not a standard eviction policy. Common ones include LRU, LFU, and FIFO, which prioritize eviction based on access patterns or insertion order.

9
New cards

Which of the following best describes a write-through cache?

A. Data is written to cache first, and the database is updated later
B. Data is only written to the cache
C. Data is written to both cache and database at the same time
D. Data is written directly to the database, skipping the cache

Correct Answer: C. Data is written to both cache and database at the same time

Explanation:
In a write-through strategy, data is written to the cache and database simultaneously, which ensures consistency but may slow down writes slightly.
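
As a rough sketch of that flow: in the write path below, every write updates the database and the cache together, so reads never see stale data. The `db` object and the plain dict cache are hypothetical stand-ins, not any particular library.

```python
# Minimal write-through sketch: the cache and the database are updated in the
# same write path, so cache reads never return stale data.

class WriteThroughCache:
    def __init__(self, db):
        self.db = db          # source of truth (e.g., a SQL database client)
        self.cache = {}       # in-memory dict standing in for Redis/Memcached

    def write(self, key, value):
        self.db.save(key, value)      # 1. persist to the database
        self.cache[key] = value       # 2. update the cache in the same operation

    def read(self, key):
        if key in self.cache:         # cache hit: no database round trip
            return self.cache[key]
        value = self.db.load(key)     # cache miss: fall back to the database
        self.cache[key] = value
        return value
```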

10
New cards

When designing a distributed cache system, why is cache invalidation important?

A. To delete unused tables from the database
B. To prevent SQL injection attacks
C. To ensure cached data is up-to-date with the source of truth
D. To back up memory to disk

Correct Answer: C. To ensure cached data is up-to-date with the source of truth

Explanation:
Cache invalidation ensures that when underlying data changes (e.g., a concert venue update), the stale cached version is removed or updated.

11
New cards

What is the main tradeoff of a write-back caching strategy?

A. Slower write speed
B. No consistency issues
C. Risk of data loss if cache is not persisted
D. Always up-to-date data

Correct Answer: C. Risk of data loss if cache is not persisted

Explanation:
Write-back caches write data to cache first and sync to the database later. This is fast but risky—if the cache crashes before syncing, data may be lost.

12
New cards

You are building a leaderboard of top events. What data structure in Redis would you use in the cache?

A. Hash
B. Sorted Set
C. List
D. String

Correct Answer: B. Sorted Set

Explanation:
Redis sorted sets store elements with scores and keep them sorted. Perfect for leaderboards or top-k queries like “top events by popularity.”
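
A minimal leaderboard sketch using a Redis sorted set, assuming a local Redis server and the redis-py client; the key and member names are made up for illustration.

```python
# Leaderboard sketch with a Redis sorted set (redis-py, local Redis assumed).
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Record popularity scores; ZADD keeps members ordered by score.
r.zadd("event_leaderboard", {"event:101": 420, "event:202": 975, "event:303": 310})

# Bump a score atomically when an event gets more views.
r.zincrby("event_leaderboard", 15, "event:303")

# Top 10 events, highest score first, with their scores.
top_events = r.zrevrange("event_leaderboard", 0, 9, withscores=True)
print(top_events)
```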

13
New cards

Which of these is a benefit of using Redis over Memcached?

A. Redis supports more complex data structures like lists and sorted sets
B. Redis requires less memory
C. Redis has faster write speed
D. Redis supports SQL joins

Correct Answer: A. Redis supports more complex data structures like lists and sorted sets

Explanation:
Unlike Memcached (which only supports strings), Redis supports advanced data structures such as lists, sets, sorted sets, and more, making it more flexible.

14
New cards

Why might you use a distributed cache to store user session data?

A. To avoid using browser cookies
B. To make the UI faster
C. To reduce load on your database during high user traffic
D. To increase password security

Correct Answer: C. To reduce load on your database during high user traffic

Explanation:
Storing user sessions in a cache allows for quick lookups without hitting the database, which is critical when supporting millions of concurrent users.

15
New cards

When would a write-around cache strategy be preferred?

A. When you want every write to immediately reflect in the cache
B. When you want to avoid filling the cache with rarely-read data
C. When cache data must always be in sync with the database
D. When you want to ensure the cache is the source of truth

Correct Answer: B. When you want to avoid filling the cache with rarely-read data

Explanation:
Write-around skips writing to the cache and only writes to the database. This avoids polluting the cache with data that might never be read again.

16
New cards

What is the primary purpose of a distributed lock in system design?

A. Encrypt user data
B. Prevent multiple systems from accessing the same resource at the same time
C. Reduce server storage costs
D. Speed up database reads

Correct Answer: B. Prevent multiple systems from accessing the same resource at the same time

Explanation:
A distributed lock ensures that only one process or server can act on a resource at a time, which prevents conflicts, race conditions, and inconsistent states.

17
New cards

Which technology is commonly used to implement distributed locks?

A. MySQL
B. Kafka
C. Redis
D. Elasticsearch

Correct Answer: C. Redis

Explanation:
Redis, with its atomic operations and TTL (time-to-live), is a popular choice for distributed locks. It ensures safe, temporary locking using keys.

18
New cards

What does setting an expiration time on a distributed lock help prevent?

A. Slow queries
B. Security breaches
C. Locks getting stuck if the process crashes
D. Cache misses

Correct Answer: C. Locks getting stuck if the process crashes

Explanation:
If a process that holds a lock crashes, the lock can remain forever unless it expires automatically. This helps avoid unintentional deadlocks.
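
A minimal single-node locking sketch, assuming redis-py and a local Redis server: `SET ... NX PX` acquires the lock only if nobody else holds it and attaches an expiration so a crashed holder cannot leave it stuck. The key name, TTL, and token scheme are illustrative.

```python
# Single-node Redis lock sketch with an expiration (redis-py assumed).
import uuid
import redis

r = redis.Redis()

def acquire_lock(resource, ttl_ms=10_000):
    token = str(uuid.uuid4())                      # identifies this lock holder
    ok = r.set(f"lock:{resource}", token, nx=True, px=ttl_ms)
    return token if ok else None

# Release only if we still hold the lock; the check-and-delete runs atomically
# as a Lua script so another holder's lock is never deleted by mistake.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""

def release_lock(resource, token):
    return r.eval(RELEASE_SCRIPT, 1, f"lock:{resource}", token)

token = acquire_lock("ticket:123")
if token:
    try:
        pass  # do the critical work while holding the lock
    finally:
        release_lock("ticket:123", token)
```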

19
New cards

In which scenario would a distributed lock be most appropriate?

A. Sorting a list of products
B. Generating a user profile picture
C. Holding a concert ticket in a shopping cart during checkout
D. Autocomplete search suggestions

Correct Answer: C. Holding a concert ticket in a shopping cart during checkout

Explanation:
This is a classic use case — we want to ensure only one user can hold or buy the ticket at a time. Distributed locks help enforce this behavior.

20
New cards

What problem can occur when two processes are waiting for each other to release a lock?

A. Race condition
B. Memory leak
C. Deadlock
D. Data duplication

Correct Answer: C. Deadlock

Explanation:
A deadlock happens when two or more processes are waiting on each other to release a lock, and none of them can proceed.

21
New cards

What is Redlock?

A. A database sharding technique
B. A Kafka messaging queue
C. A distributed locking algorithm using multiple Redis nodes
D. A hashing algorithm for security

Correct Answer: C. A distributed locking algorithm using multiple Redis nodes

Explanation:
Redlock is a distributed locking algorithm proposed by the creator of Redis. It acquires the same lock on a majority of independent Redis nodes, so the lock stays safe even if some of those nodes fail.

22
New cards

What happens if two servers try to acquire the same distributed lock at the same time?

A. Both get the lock and proceed
B. Neither proceeds
C. The one that succeeds first gets the lock; the other fails
D. They split the work

Correct Answer: C. The one that succeeds first gets the lock; the other fails

Explanation:
Distributed locks rely on atomic operations. Only one process will successfully acquire the lock; others will either retry or fail.

23
New cards

How can distributed locks help prevent duplicated scheduled jobs (cron jobs) across servers?

A. By reducing job priority
B. By locking the task so only one server runs it
C. By using JWTs for authentication
D. By running the job faster

Correct Answer: B. By locking the task so only one server runs it

Explanation:
Distributed locks are useful when multiple servers might run the same job at the same time. Locking ensures only one server runs the job.

24
New cards

What is one best practice to avoid deadlocks when using distributed locks?

A. Always retry failed requests
B. Use locks only in the frontend
C. Acquire all locks in a consistent, pre-defined order
D. Avoid locking resources

Correct Answer: C. Acquire all locks in a consistent, pre-defined order

Explanation:
To avoid deadlocks, always acquire multiple locks in the same order across all processes. Random or nested lock acquisition patterns can cause deadlocks.
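
A small sketch of that ordering rule; `threading.Lock` stands in for a distributed lock client, and the resource names are invented.

```python
# Ordered lock acquisition: every process sorts the resource names before
# locking, so two processes can never hold locks in opposite orders.
import threading

locks = {name: threading.Lock() for name in ("ticket:1", "ticket:2", "venue:9")}

def with_resources(resource_names, work):
    ordered = sorted(resource_names)          # consistent, pre-defined order
    acquired = []
    try:
        for name in ordered:
            locks[name].acquire()
            acquired.append(name)
        work()
    finally:
        for name in reversed(acquired):       # release in reverse order
            locks[name].release()

with_resources(["venue:9", "ticket:1"], lambda: print("both resources locked"))
```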

25
New cards

Which of the following best describes locking granularity?

A. The speed at which a lock is acquired
B. The size of memory used by the lock
C. The scope of the resource being locked (single item vs group of items)
D. The number of locks a server can handle

Correct Answer: C. The scope of the resource being locked (single item vs group of items)

Explanation:
Locking granularity refers to whether you're locking one item (like a ticket) or a group (like an entire stadium section). Finer granularity gives more concurrency but can be more complex.

26
New cards

What is the main purpose of using a stream in system design?

A. To store large files
B. To process and retain events in real-time for multiple consumers
C. To encrypt user data securely
D. To back up application logs

Correct Answer: B. To process and retain events in real-time for multiple consumers

Explanation:
Streams are designed to ingest, store, and process continuous flows of data in real-time. They are ideal for systems that need to react quickly to events like user actions or financial transactions.

27
New cards

What is event sourcing?

A. A technique for indexing database rows
B. A design pattern that stores every change as a state snapshot
C. A way to store application changes as a sequence of events
D. A method to generate SQL queries from user input

Correct Answer: C. A way to store application changes as a sequence of events

Explanation:
In event sourcing, every change in application state is recorded as an immutable event. These events can be replayed later to rebuild state or perform audits.

28
New cards

Which of the following scenarios is NOT a typical use case for a stream?

A. Replaying historical events to rebuild system state
B. Supporting real-time analytics on user actions
C. Persisting user profile pictures
D. Enabling chat applications to broadcast messages in real-time

Correct Answer: C. Persisting user profile pictures

Explanation:
Streams are optimized for event-based, real-time data, not for storing binary/static assets like images. That’s typically a job for object storage or a CDN.

29
New cards

Which feature allows multiple independent consumers to read the same stream in parallel?

A. Windowing
B. Partitioning
C. Replication
D. Consumer groups

Correct Answer: D. Consumer groups

Explanation:
Consumer groups allow different consumers to read and process the same data independently. Each group maintains its own read position.
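
A hedged sketch with the kafka-python client (assuming a broker at localhost:9092): workers that share a `group_id` split the partitions between them, while a different `group_id` reads its own independent copy of the stream. Topic and group names are illustrative.

```python
# Consumer-group sketch using the kafka-python client.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "user-events",                     # the stream/topic to read
    bootstrap_servers="localhost:9092",
    group_id="analytics-service",      # consumers sharing this id share the work
    auto_offset_reset="earliest",      # start from the beginning if no offset yet
)

for message in consumer:
    print(message.partition, message.offset, message.value)
```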

30
New cards

What is windowing in stream processing used for?

A. Encrypting real-time events
B. Batching events based on time or count
C. Deleting old data
D. Sorting logs alphabetically

Correct Answer: B. Batching events based on time or count

Explanation:
Windowing helps group events that occur within a specific time range or after a certain number of events, enabling operations like hourly averages or rolling counts.
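
A toy tumbling-window counter in plain Python, just to show the idea of bucketing events into fixed time windows; a real stream processor would do this per key and handle late events.

```python
# Group events into fixed 60-second buckets keyed by window start time.
from collections import defaultdict

WINDOW_SECONDS = 60
counts = defaultdict(int)

def record(event_timestamp):
    window_start = int(event_timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
    counts[window_start] += 1

for ts in (100, 115, 161, 190, 250):    # example event timestamps (seconds)
    record(ts)

print(dict(counts))                      # {60: 2, 120: 1, 180: 1, 240: 1}
```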

31
New cards

Why is partitioning important in stream processing?

A. It ensures that all events are sorted alphabetically
B. It enables horizontal scaling by distributing workload
C. It allows data encryption across nodes
D. It simplifies event replay

Correct Answer: B. It enables horizontal scaling by distributing workload

Explanation:
Partitioning spreads events across multiple machines, so multiple consumers can process different partitions in parallel, improving scalability.

32
New cards

What problem does replication solve in stream architectures?

A. Duplicate event consumption
B. Slow query performance
C. Data loss due to server failure
D. High network latency

Correct Answer: C. Data loss due to server failure

Explanation:
Replication ensures that data is copied across multiple servers. If one fails, another can take over without losing any events, ensuring fault tolerance.

33
New cards

What is the advantage of streams over message queues?

A. Streams are faster for batch processing
B. Streams retain messages and allow re-reading from a specific position
C. Message queues have better real-time performance
D. Streams can only deliver messages once

Correct Answer: B. Streams retain messages and allow re-reading from a specific position

Explanation:
Unlike queues (which typically delete messages after delivery), streams persist messages and let consumers re-read from any point, enabling more flexible processing.

34
New cards

In event sourcing, how do you reconstruct the current state of an application?

A. By reading the current database row
B. By applying a cache invalidation strategy
C. By replaying all the events from the stream
D. By reloading the front-end application

Correct Answer: C. By replaying all the events from the stream

Explanation:
In event sourcing, you reconstruct state by replaying events in the order they occurred. This makes it easy to understand how the system arrived at its current state.

35
New cards

Which technology is best suited for building a stream-based system with event sourcing?

A. MySQL
B. Redis
C. Kafka
D. Memcached

Correct Answer: C. Kafka

Explanation:
Kafka is a high-throughput, distributed streaming platform that supports event sourcing, replay, partitioning, and multiple consumers, making it a top choice for such systems.

36
New cards

What is the primary purpose of using a queue in a system architecture?

A. Encrypt user data for security
B. Speed up front-end rendering
C. Buffer bursty traffic and distribute workloads
D. Store data permanently

Correct Answer: C. Buffer bursty traffic and distribute workloads

Explanation:
Queues absorb traffic spikes and allow background workers to process tasks at their own pace, helping smooth load and distribute tasks across systems.

37
New cards

What happens when a queue is added to a system with tight latency requirements (e.g., < 500ms)?

A. It helps meet the latency target
B. It has no effect
C. It may cause the latency target to be missed
D. It decreases latency significantly

Correct Answer: C. It may cause the latency target to be missed

Explanation:
Queues introduce asynchronous processing, which may delay responses and break strict latency guarantees in real-time systems.

38
New cards

Which of the following best describes a Dead Letter Queue (DLQ)?

A. A queue for old messages
B. A queue that stores messages that failed processing
C. A queue for expired API tokens
D. A queue that stores duplicate messages

Correct Answer: B. A queue that stores messages that failed processing

Explanation:
Dead Letter Queues are used to catch unprocessable messages after all retry attempts fail, allowing developers to inspect and debug issues.

39
New cards

Why is backpressure important in a queuing system?

A. To guarantee FIFO ordering
B. To prevent message loss
C. To throttle message production when the system is overwhelmed
D. To ensure that only one consumer processes a message

Correct Answer: C. To throttle message production when the system is overwhelmed

Explanation:
Backpressure prevents queues from overflowing by signaling producers to slow down or stop until capacity is available, protecting system stability.

40
New cards

What does FIFO stand for and why is it important in queues?

A. Find In Fast Order — Ensures fast processing
B. First In First Out — Ensures ordering of messages
C. Fast Input Fast Output — Ensures high speed
D. Fully Indexed For Output — Ensures consistency

Correct Answer: B. First In First Out — Ensures ordering of messages

Explanation:
FIFO ensures that messages are processed in the order they were received, which is important in many real-time and transactional systems.

41
New cards

Which scenario is NOT an ideal use case for a queue?

A. Buffering photo uploads for background processing
B. Managing peak-hour ride requests in ride-sharing apps
C. Delivering high-frequency stock price updates with low latency
D. Distributing compute-intensive tasks to multiple servers

Correct Answer: C. Delivering high-frequency stock price updates with low latency

Explanation:
Queues introduce latency and are not suitable for real-time, low-latency use cases like live stock tickers. Streams or websockets would be better.

42
New cards

What is a retry mechanism in the context of message queues?

A. A feature that reverses processed messages
B. A way to validate message contents
C. A feature that attempts message delivery again if it fails initially
D. A way to shuffle messages before delivery

Correct Answer: C. A feature that attempts message delivery again if it fails initially

Explanation:
Retry mechanisms help ensure resilience by attempting to reprocess failed messages a configurable number of times before moving them to a DLQ.
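
A rough worker-side sketch of retries followed by a dead-letter hand-off; `handle` and the in-memory DLQ list are hypothetical stand-ins for real worker logic and a real queue client.

```python
# Retry a message a few times with backoff, then park it in a DLQ.
import time

MAX_ATTEMPTS = 3
dead_letter_queue = []

def process_with_retries(message, handle):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            handle(message)
            return True
        except Exception as exc:
            if attempt == MAX_ATTEMPTS:
                # All retries exhausted: keep the message for later inspection.
                dead_letter_queue.append({"message": message, "error": str(exc)})
                return False
            time.sleep(2 ** attempt)   # exponential backoff before retrying
```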

43
New cards

What role does partitioning play in scaling a queue system?

A. Ensures encryption of messages
B. Allows duplicate message detection
C. Distributes messages across workers for horizontal scalability
D. Compresses messages for faster delivery

Correct Answer: C. Distributes messages across workers for horizontal scalability

Explanation:
Partitioning breaks a queue into smaller segments, each processed by different consumers, improving throughput and scalability.

44
New cards

How do queues decouple producers and consumers?

A. By encrypting messages end-to-end
B. By directly sending messages to all consumers
C. By allowing producers to send messages without needing consumers to be online
D. By ensuring consumers wait for producers to confirm receipt

Correct Answer: C. By allowing producers to send messages without needing consumers to be online

Explanation:
Queues enable asynchronous communication, so producers can send and forget, while consumers can process at their own pace.

45
New cards

Which queueing technologies are most commonly used in modern distributed systems?

A. Redis and Elasticsearch
B. Kafka and AWS SQS
C. MySQL and MongoDB
D. Apache Spark and Hadoop

Correct Answer: B. Kafka and AWS SQS

Explanation:
Kafka is a distributed log/streaming platform and SQS is a fully managed AWS queue service — both are popular in modern distributed systems.

46
New cards

What is the main purpose of a load balancer in a distributed system?

A. Encrypt incoming traffic
B. Serve static files directly
C. Distribute incoming traffic across multiple servers
D. Store backup copies of user data

Correct Answer: C. Distribute incoming traffic across multiple servers

Explanation:
A load balancer helps evenly distribute requests across multiple servers (horizontal scaling), preventing overload on any single machine and improving system availability and scalability.

47
New cards

In what scenario would you most likely choose a Layer 4 (L4) load balancer over Layer 7 (L7)?

A. When routing based on URL path
B. When handling persistent WebSocket connections
C. When routing based on cookies
D. When compressing HTTP responses

Correct Answer: B. When handling persistent WebSocket connections

Explanation:
L4 load balancers operate at the transport layer, making them better suited for persistent connections like WebSockets that require low-level control of the TCP connection.

48
New cards

Which of the following statements is true about how to represent load balancers in system design interviews?

A. Always draw a load balancer in front of every service
B. Never mention load balancers
C. Mention or draw them only when necessary, such as in front of entry points or when sticky sessions are needed
D. Replace all database references with load balancers

Correct Answer: C. Mention or draw them only when necessary, such as in front of entry points or when sticky sessions are needed

Explanation:
In interviews, load balancers are often abstracted, and you don’t need to draw one everywhere. Just mention them when it’s important for routing logic, session persistence, or traffic distribution.

49
New cards

What is one key advantage of using a Layer 7 load balancer over a Layer 4 load balancer?

A. It supports higher network throughput
B. It can route traffic based on application-level data like URL or headers
C. It handles TCP-level retries more efficiently
D. It requires less memory on the server

Correct Answer: B. It can route traffic based on application-level data like URL or headers

Explanation:
L7 load balancers work at the application layer, so they can inspect requests and make decisions based on content (e.g., route /api requests to one service and /images to another).

50
New cards

Which of the following is not a commonly used load balancer technology?

A. AWS Elastic Load Balancer
B. NGINX
C. HAProxy
D. MongoDB

Correct Answer: D. MongoDB

Explanation:
MongoDB is a NoSQL database, not a load balancer. The other options (AWS ELB, NGINX, and HAProxy) are all popular software or managed load balancer tools.

51
New cards

What is the primary role of an API gateway in a microservice architecture?

A. Directly storing user data in a database
B. Routing requests to the correct backend service
C. Encrypting all data sent from the client
D. Replacing the need for a load balancer

Correct Answer: B. Routing requests to the correct backend service

Explanation:
An API gateway acts as the front door to your system. It routes incoming client requests (e.g., GET /users/123) to the correct backend service (e.g., the user service). It simplifies client interaction by centralizing and coordinating requests.

52
New cards

Which of the following is a common responsibility of an API gateway besides routing?

A. Hosting frontend code
B. Managing distributed locks
C. Handling authentication and rate limiting
D. Serving as a SQL query engine

Correct Answer: C. Handling authentication and rate limiting

Explanation:
API gateways often handle cross-cutting concerns like authentication, rate limiting, logging, and request transformation, so these responsibilities don’t have to be duplicated across each microservice.

53
New cards

In a system design interview, when is it a good idea to include an API gateway in your design?

A. Only if you're using a NoSQL database
B. Only for frontend-heavy applications
C. In nearly all product design interviews, as the first point of contact
D. Never, it's an implementation detail

Correct Answer: C. In nearly all product design interviews, as the first point of contact

Explanation:
In system design interviews, an API gateway is a strong default choice because it abstracts request routing, enforces policies, and improves maintainability. It shows awareness of microservice best practices and system boundaries.

54
New cards

What is the primary reason to use a search-optimized database?

A. To reduce storage costs
B. To handle frequent schema changes
C. To perform fast and relevant full-text search
D. To handle complex joins across multiple tables

Correct Answer: C. To perform fast and relevant full-text search

Explanation:
Search optimized databases are designed specifically for efficient full-text search, allowing users to search through large volumes of text data quickly and effectively.

55
New cards

Which data structure is fundamental to making full-text search efficient in search-optimized databases?

A. B-Tree
B. Hash Table
C. Inverted Index
D. Binary Search Tree

Correct Answer: C. Inverted Index

Explanation:
An inverted index maps each word to a list of documents containing it. This allows quick lookup of documents relevant to a search term, making full-text search efficient.
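
A toy inverted index in a few lines of Python, mapping each lowercase token to the set of document ids that contain it; the documents and tokenizer are deliberately simplistic.

```python
# Build a tiny inverted index: token -> set of document ids containing it.
import re
from collections import defaultdict

docs = {
    1: "Redis is an in-memory data store",
    2: "Elasticsearch is a search engine built on Lucene",
    3: "Search engines rely on an inverted index",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for token in re.findall(r"\w+", text.lower()):   # crude tokenization
        index[token].add(doc_id)

print(index["search"])     # {2, 3}
print(index["redis"])      # {1}
```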

56
New cards

What does “tokenization” mean in the context of full-text search?

A. Encrypting text before indexing
B. Mapping user tokens to documents
C. Breaking down text into individual searchable units (words)
D. Assigning unique IDs to users

Correct Answer: C. Breaking down text into individual searchable units (words)

Explanation:
Tokenization splits text into individual words or tokens so that they can be stored in the inverted index and searched independently.

57
New cards

What is the purpose of “stemming” in a search engine?

A. To remove punctuation from documents
B. To normalize different forms of the same word
C. To encrypt search terms for security
D. To create indexes faster

Correct Answer: B. To normalize different forms of the same word

Explanation:
Stemming reduces words like “running” and “ran” to a root form like “run” so that different variations of a word can match the same index entry.

58
New cards

Which of the following best describes “fuzzy search”?

A. Search that ignores capital letters
B. Search that accepts synonyms
C. Search that can tolerate typos or small differences
D. Search that only returns partial matches

Correct Answer: C. Search that can tolerate typos or small differences

Explanation:
Fuzzy search is useful for finding results even when the search term has misspellings or minor variations, often implemented using edit distance algorithms.

59
New cards

When should you choose Elasticsearch over a traditional relational database for search functionality?

A. When you only need to search for exact IDs
B. When you want minimal infrastructure
C. When you need scalable, high-performance full-text search
D. When you need to enforce foreign key constraints

Correct Answer: C. When you need scalable, high-performance full-text search

Explanation:
Elasticsearch is ideal for systems like social media platforms or e-commerce apps that need advanced full-text search at scale.

60
New cards

Which of the following is a limitation of using a traditional SQL database like Postgres for full-text search?

A. It doesn’t support indexing
B. It requires third-party libraries
C. It may be slower or less feature-rich than dedicated search engines
D. It doesn’t store text data

Correct Answer: C. It may be slower or less feature-rich than dedicated search engines

Explanation:
Postgres supports full-text search using GIN indexes, but for large-scale or advanced search features (like fuzzy search), dedicated tools like Elasticsearch perform better.
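
For scale: a sketch of what Postgres full-text search looks like, using psycopg2 with an illustrative `articles` table. `to_tsvector`, `plainto_tsquery`, and GIN indexes are standard Postgres features; the connection string and schema are assumptions.

```python
# Postgres full-text search sketch with a GIN index (psycopg2 assumed).
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
cur = conn.cursor()

# One-time setup: index the parsed body text so searches avoid re-scanning
# and re-parsing every row.
cur.execute(
    "CREATE INDEX IF NOT EXISTS articles_body_fts "
    "ON articles USING GIN (to_tsvector('english', body))"
)
conn.commit()

# Query: match rows whose body contains all words in the search phrase.
cur.execute(
    "SELECT id, title FROM articles "
    "WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s)",
    ("distributed cache",),
)
print(cur.fetchall())
```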

61
New cards

Which of the following search features is not typically included in search optimized databases?

A. Real-time document indexing
B. Graph traversal across nodes
C. Fuzzy search support
D. Tokenization and stemming

Correct Answer: B. Graph traversal across nodes

Explanation:
Graph traversal is used in graph databases, not in search-optimized databases like Elasticsearch. These databases specialize in full-text search, not relationship traversal.

62
New cards

What does it mean for a search optimized database to “scale horizontally”?

A. Add more CPUs to a single machine
B. Increase disk space
C. Add more machines to the cluster and distribute data
D. Create new indexes for each document

Correct Answer: C. Add more machines to the cluster and distribute data

Explanation:
Horizontal scaling allows a search engine like Elasticsearch to handle more data and requests by adding nodes and partitioning data (sharding).

63
New cards

What is the most popular search-optimized database used by companies like Netflix, Uber, and Yelp?

A. MongoDB
B. Redis
C. Elasticsearch
D. SQLite

Correct Answer: C. Elasticsearch

Explanation:
Elasticsearch is a widely used search engine based on Apache Lucene. It supports full-text search, analytics, and scalability, making it the industry standard for many companies.

64
New cards

What is the main reason to use blob storage instead of a traditional database for storing images or videos?

A. Blob storage has stronger encryption
B. Blob storage supports SQL queries
C. Blob storage is more cost-effective and efficient for large unstructured files
D. Blob storage is better for small text data

Correct Answer: C. Blob storage is more cost-effective and efficient for large unstructured files

Explanation:
Blob storage is specifically designed to store large objects like images and videos. It's far more scalable and cheaper than using a relational or NoSQL database for the same purpose.

65
New cards

What is a presigned URL used for in the context of blob storage?

A. To keep files permanently hidden
B. To compress files during upload
C. To grant temporary access to upload or download blobs
D. To prevent any access from clients

Correct Answer: C. To grant temporary access to upload or download blobs

Explanation:
Presigned URLs allow clients to directly upload or download a file without going through the backend, and they expire after a set time for security.
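
A sketch with boto3 (bucket and object keys are invented): the generated URLs let a client download or upload directly against S3 for a limited time, without routing the bytes through the application servers.

```python
# Generate short-lived presigned URLs for direct download and upload.
import boto3

s3 = boto3.client("s3")

download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-app-uploads", "Key": "videos/cat.mp4"},
    ExpiresIn=3600,            # seconds until the URL stops working
)

# A presigned PUT works the same way for direct client uploads.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-app-uploads", "Key": "videos/new-upload.mp4"},
    ExpiresIn=900,
)
print(download_url, upload_url, sep="\n")
```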

66
New cards

What is the best practice for storing large files like videos in an application like YouTube?

A. Store video and metadata together in a relational DB
B. Store video in blob storage and metadata in a separate database
C. Store everything in-memory for faster access
D. Store video as base64 strings in the database

Correct Answer: B. Store video in blob storage and metadata in a separate database

Explanation:
This approach lets you take advantage of blob storage for large file handling and keep searchable metadata (title, tags, uploader, etc.) in a fast, indexable database.

67
New cards

Which of the following is a key benefit of using blob storage with a CDN?

A. It allows file encryption
B. It helps scale the database
C. It delivers content faster to users worldwide
D. It improves upload speed to the origin server

Correct Answer: C. It delivers content faster to users worldwide

Explanation:
A CDN caches blobs at edge locations, so users around the globe can access them quickly, reducing latency.

68
New cards

Which of the following use cases is least suitable for blob storage?

A. Storing video files
B. Storing user profile pictures
C. Storing database transaction logs
D. Storing large document files like PDFs

Correct Answer: C. Storing database transaction logs

Explanation:
Database transaction logs are better suited for specialized storage engines or logging services that can handle fast, sequential writes and recovery semantics.

69
New cards

Which of the following features helps make blob storage highly durable?

A. Tokenization
B. Chunking
C. Replication and erasure coding
D. Indexing and partitioning

Correct Answer: C. Replication and erasure coding

Explanation:
Blob storage services like Amazon S3 use replication and erasure coding to ensure that even if one copy is lost or corrupted, the data can be reconstructed.

70
New cards

Why is chunking used when uploading files to blob storage?

A. To encrypt the file
B. To upload multiple files at once
C. To allow parallel and resumable uploads
D. To create smaller file versions

Correct Answer: C. To allow parallel and resumable uploads

Explanation:
Chunking (e.g., multipart upload) splits a large file into parts, allowing faster, parallel uploads and resume capability if a connection fails.

71
New cards

Which of the following statements about blob storage is true?

A. Blob storage is mainly used for small JSON payloads
B. Blob storage automatically indexes all files
C. Blob storage is ideal for storing large binary files like videos and images
D. Blob storage cannot be accessed directly from the client

Correct Answer: C. Blob storage is ideal for storing large binary files like videos and images

Explanation:
Blob storage is optimized for large, unstructured binary objects, not small structured data or querying purposes.

72
New cards

What role does a traditional database play when paired with blob storage?

A. It stores and serves the blob content directly
B. It indexes and stores references (like URLs) to blobs
C. It compresses blob data
D. It encrypts blob data before upload

Correct Answer: B. It indexes and stores references (like URLs) to blobs

Explanation:
The database stores metadata and pointers (like S3 URLs) so that blobs can be efficiently located and retrieved without storing large files in the DB.

73
New cards

Which of the following services is NOT a blob storage provider?

A. Amazon S3
B. Google Cloud Storage
C. Azure Blob Storage
D. Firebase Firestore

Correct Answer: D. Firebase Firestore

Explanation:
Firestore is a NoSQL document database, not designed for large binary file storage. The others are all major blob storage services.

74
New cards

Which of the following is the most appropriate use case for a relational database?

A. Real-time analytics over millions of log events
B. Storing structured data with ACID guarantees, like user profiles and transactions
C. Storing videos and binary files
D. Managing a dynamic schema with flexible documents

Correct Answer: B. Storing structured data with ACID guarantees, like user profiles and transactions

Explanation: Relational databases are ideal for structured, transactional data that benefits from strong consistency and integrity.

75
New cards

What does ACID stand for in the context of relational databases?

A. Availability, Consistency, Independence, Durability
B. Atomicity, Consistency, Isolation, Durability
C. Accuracy, Complexity, Indexing, Durability
D. Automation, Control, Ingestion, Distribution

Correct Answer: B. Atomicity, Consistency, Isolation, Durability

Explanation: ACID properties ensure safe, consistent, and reliable database transactions.

76
New cards

Which type of index allows fast lookup of documents that contain specific words in full-text search in a relational DB?

A. B-Tree index
B. Inverted index
C. Hash index
D. Spatial index

Correct Answer: B. Inverted index

Explanation: Inverted indexes map words to documents, making them essential for full-text search capabilities.

77
New cards

Why should you be cautious when using SQL joins in high-scale systems?

A. They slow down reads but not writes
B. They require NoSQL databases
C. They can become a major performance bottleneck
D. They are only allowed in PostgreSQL

Correct Answer: C. They can become a major performance bottleneck

Explanation: Joins can be expensive in terms of compute and memory, especially across large tables or unindexed columns.

78
New cards

What is one major difference between relational and NoSQL databases?

A. Relational databases don’t support transactions
B. NoSQL databases cannot be queried
C. Relational databases use fixed schemas, while NoSQL databases can be schema-less
D. NoSQL databases cannot scale horizontally

Correct Answer: C. Relational databases use fixed schemas, while NoSQL databases can be schema-less

Explanation: NoSQL databases are schema-flexible, which is helpful for evolving or irregular data.

79
New cards

When would you prefer DynamoDB over PostgreSQL?

A. When you need multi-table joins
B. When you require strict referential integrity
C. When you need high write throughput and horizontal scalability
D. When you want to use complex stored procedures

Correct Answer: C. When you need high write throughput and horizontal scalability

Explanation: DynamoDB is excellent for write-heavy, scalable applications, especially when data access patterns are well-defined.

80
New cards

Which of the following best describes a transaction in a relational database?

A. A mechanism to roll back a file upload
B. A batch of operations that are always eventually consistent
C. A group of operations that either all succeed or all fail
D. A data structure for storing historical rows

Correct Answer: C. A group of operations that either all succeed or all fail

Explanation: A transaction ensures atomicity, meaning changes are applied fully or not at all.
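
A self-contained sketch using Python's built-in sqlite3 module: the two balance updates inside the `with conn:` block commit together or roll back together. The table and amounts are illustrative.

```python
# All-or-nothing money transfer using a transaction.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

# The `with conn:` block commits if both statements succeed and rolls back
# if either raises, so the transfer is applied fully or not at all.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")

print(conn.execute("SELECT * FROM accounts").fetchall())   # [(1, 70), (2, 80)]
```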

81
New cards

What is the main benefit of indexing in any type of database?

A. It compresses the data
B. It ensures transaction safety
C. It makes queries faster by avoiding full scans
D. It increases the available disk space

Correct Answer: C. It makes queries faster by avoiding full scans

Explanation: Indexes act like shortcuts, allowing the DB engine to quickly find relevant rows.

82
New cards

What type of NoSQL database is best suited for storing user sessions with fast access using a session ID?

A. Document store
B. Key-value store
C. Column-family store
D. Graph database

Correct Answer: B. Key-value store

Explanation: Key-value stores like Redis or DynamoDB are ideal for simple lookup scenarios like session management.

83
New cards

Which NoSQL database is known for its strong consistency model and serverless architecture on AWS?

A. MongoDB
B. Cassandra
C. Redis
D. DynamoDB

Correct Answer: D. DynamoDB

Explanation: DynamoDB is a fully managed, serverless NoSQL database on AWS that offers on-demand scaling and optionally strongly consistent reads.

84
New cards

In a relational database, which structure stores data in rows and columns?

A. JSON documents
B. BLOBs
C. Tables
D. Nodes

Correct Answer: C. Tables

Explanation: Tables are the core structure in relational databases, organizing data into rows and columns.

85
New cards

Which scenario is best suited for a graph database?

A. Recording sensor data from IoT devices
B. Storing video metadata
C. Performing social network friend-of-a-friend queries
D. Logging API requests

Correct Answer: C. Performing social network friend-of-a-friend queries

Explanation: Graph databases like Neo4j are ideal for modeling relationships and running graph traversal queries.

86
New cards

What is sharding in the context of NoSQL databases?

A. A method of compressing JSON documents
B. A way to build a database index
C. A technique to partition data across servers
D. A way to enforce ACID transactions

Correct Answer: C. A technique to partition data across servers

Explanation: Sharding distributes data across multiple servers to enable horizontal scaling in NoSQL systems.

87
New cards

Which of the following is not typically a strength of a NoSQL database?

A. Horizontal scalability
B. Strict enforcement of foreign keys
C. Flexible data models
D. Schema-less design

Correct Answer: B. Strict enforcement of foreign keys

Explanation: NoSQL databases typically do not enforce foreign keys; instead, application logic manages those relationships.

88
New cards

Which of the following statements about choosing a database in a system design interview is best practice?

A. Always compare SQL and NoSQL databases to show depth of knowledge
B. Choose the database you're familiar with and explain how it solves the problem
C. Always pick a NoSQL database for scale
D. Avoid using databases if you’re using blob storage

Correct Answer: B. Choose the database you're familiar with and explain how it solves the problem

Explanation: Interviewers value practicality and clarity. Use what you know well and focus on how its features match the problem requirements.