System Design Interview

112 Terms

1

What is the initial setup for building a scalable system?

Everything runs on a single server, including the web app, database, and cache.

2

What is the role of the Domain Name System (DNS) in web traffic?

DNS translates domain names into IP addresses for user access.

3

What communication protocol is used between mobile applications and web servers?

HTTP is used for communication between mobile applications and web servers.

4

What are the two main types of databases used in system design?

Relational databases (RDBMS) and non-relational databases (NoSQL).

5

What are some examples of relational databases?

MySQL, Oracle database, PostgreSQL.

6

What are the four categories of non-relational databases?

Key-value stores, graph stores, column stores, and document stores.

7

What is vertical scaling?

Adding more power (CPU, RAM) to existing servers.

8

What is horizontal scaling?

Adding more servers to a resource pool.

9

What is a load balancer's function in a web server setup?

It evenly distributes incoming traffic among multiple web servers.

10

What is the benefit of using private IPs in a load-balanced setup?

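Private IPs are reachable only between servers in the same network, so clients cannot reach the web servers directly; the load balancer communicates with the web servers over private IPs.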

11

What is database replication?

A technique where a master database supports write operations and slave databases support read operations.

12

What are the advantages of database replication?

Better performance, reliability, and high availability.

13

What happens if a slave database goes offline in a replicated setup?

If it is the only slave database, read operations are redirected to the master database temporarily until a new slave database replaces the failed one.

14

Why might developers choose non-relational databases?

For applications requiring low latency, unstructured data, or massive data storage.

15

What is the main limitation of vertical scaling?

It has a hard limit on CPU and memory, and lacks failover and redundancy.

16

How does a load balancer improve system availability?

By routing traffic to healthy servers if one server goes offline.
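
The failover behavior on this card can be sketched in a few lines. This is a minimal illustration that assumes a round-robin policy; the class name `RoundRobinBalancer`, its methods, and the server names are hypothetical, and a real load balancer would detect failures with health checks rather than manual calls.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: rotates over servers and skips unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        # In practice a failed health check would trigger this.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Look at each server at most once per call.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["server1", "server2"])
first_two = [balancer.pick(), balancer.pick()]      # traffic alternates
balancer.mark_down("server1")                       # server1 goes offline
after_failure = [balancer.pick(), balancer.pick()]  # all traffic goes to server2
```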

17

What is the typical ratio of reads to writes in most applications?

Most applications perform far more reads than writes, so a replicated setup usually has more slave databases than master databases.

18

What is the significance of separating web/mobile traffic and database servers?

It allows for independent scaling of web and database tiers.

19

What is the common API response format used in web applications?

JavaScript Object Notation (JSON).

20

What is the impact of high traffic on a single web server?

Users may experience slower response times or fail to connect.

21

What is the purpose of a master database in replication?

To handle all write operations while slave databases handle read operations.

22

What is a common technique to improve the availability of the data tier?

Database replication.

23

What should be done if multiple slave databases are available and one goes offline?

Read operations are redirected to other healthy slave databases.

24

What is the main advantage of horizontal scaling over vertical scaling?

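It avoids the hard CPU and memory ceiling of a single machine: capacity grows by adding servers, and the extra servers also provide redundancy and failover.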

25

What is a common issue with relying solely on vertical scaling?

If the server goes down, the entire application becomes unavailable.

26

What is a typical use case for non-relational databases?

When data is unstructured or when high-speed data access is required.

27

What happens when the master database goes offline?

A slave database is promoted to be the new master, and operations are executed on it temporarily.

28

What is required to update missing data in a promoted slave database?

Data recovery scripts need to be run to update the missing data.

29

What is a cache?

A temporary storage area that stores results of expensive responses or frequently accessed data to serve subsequent requests more quickly.

30

What is the purpose of a cache tier?

To provide a temporary data store layer that is much faster than the database, improving system performance and reducing database workloads.

31

What is a read-through cache?

A caching strategy where the web server checks the cache first. On a hit it returns the cached response; on a miss it queries the database, stores the result in the cache, and returns it.
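
The read-through flow can be sketched as below. This is a minimal illustration, not a production cache: `ReadThroughCache`, `load_user`, and the in-memory dictionary are hypothetical stand-ins for a real cache server and database.

```python
class ReadThroughCache:
    """Check the cache first; on a miss, load from the database and store the result."""

    def __init__(self, load_from_db):
        self._load = load_from_db   # function that queries the database
        self._store = {}            # in-memory cache contents

    def get(self, key):
        if key in self._store:      # cache hit: no database call
            return self._store[key]
        value = self._load(key)     # cache miss: query the database
        self._store[key] = value    # save the response for later requests
        return value


db_calls = []                       # records every database query


def load_user(user_id):
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


users = ReadThroughCache(load_user)
first = users.get(7)    # miss: goes to the database
second = users.get(7)   # hit: served from the cache
```

After both calls, `db_calls` holds a single entry, which is the reduction in database workload the card describes.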

32

What should be considered when deciding to use a cache?

Use a cache when data is read frequently but modified infrequently. Cached data lives in volatile memory, so a cache is not suitable for persisting data.

33

What is an expiration policy in caching?

A policy that removes cached data after a certain time to prevent stale data from being stored permanently.
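
One way to picture an expiration policy is a cache that stamps each entry with an expiry time and drops it on the first read after that time. This is a sketch under assumptions: `ExpiringCache` is a hypothetical name, and the injectable `clock` parameter exists only so the example can simulate time passing.

```python
import time


class ExpiringCache:
    """Cache whose entries expire ttl_seconds after they are written."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}   # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]   # expired: remove so stale data is not served
            return None
        return value


now = [0.0]                                   # fake clock for the demo
cache = ExpiringCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("profile:1", "cached profile")
fresh = cache.get("profile:1")                # still within the TTL
now[0] = 61.0                                 # 61 seconds later
stale = cache.get("profile:1")                # expired, so the caller must reload
```

A short TTL forces frequent reloads from the database, while a long TTL lets data go stale, so the value is a trade-off.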

34

What is cache consistency?

The challenge of keeping the data store and cache in sync, especially during data-modifying operations.

35

What is a single point of failure (SPOF) in a cache system?

A single cache server that, if it fails, can stop the entire system from working.

36

What is cache eviction?

The process of removing existing items from the cache when it becomes full, often using policies like Least Recently Used (LRU).
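
LRU eviction can be sketched with the standard library's `collections.OrderedDict`, which remembers insertion order and can move a key to the end. The class name `LRUCache` and the capacity of 2 are assumptions chosen for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used item once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()   # oldest use first, most recent use last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # reading counts as a use
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # drop the least recently used


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")       # "a" is now the most recently used
lru.put("c", 3)    # cache is full, so "b" is evicted
```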

37

What is a Content Delivery Network (CDN)?

A network of geographically dispersed servers used to deliver static content efficiently to users.

38

How does a CDN improve load time?

By delivering static content from the nearest CDN server to the user, reducing latency.

39

What is the Time-to-Live (TTL) in CDN caching?

The length of time a cached item stays in the CDN before it expires. The origin server can set it with an optional HTTP response header such as Cache-Control: max-age.

40

What should be considered regarding CDN costs?

CDNs are run by third-party providers that charge for data transfer, so caching infrequently used assets brings little benefit; consider moving such assets out of the CDN.

41

What is the importance of setting an appropriate cache expiry in a CDN?

To balance freshness against origin load: if the expiry is too long, content can go stale; if it is too short, files are reloaded from the origin servers too often.

42

How can files be invalidated in a CDN before they expire?

By using APIs provided by CDN vendors or by versioning the object in the URL.
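
Object versioning can be sketched as adding a version parameter to the URL, so a changed file gets a URL the CDN has never cached. Deriving the version from a content hash is an assumption made for this example (a manual counter such as `?v=2` works too), and `versioned_url` and the `cdn.example.com` host are hypothetical.

```python
import hashlib


def versioned_url(base_url, content):
    """Append a version derived from the file's bytes to its URL."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{base_url}?v={digest}"


url_v1 = versioned_url("https://cdn.example.com/image.png", b"old image bytes")
url_v2 = versioned_url("https://cdn.example.com/image.png", b"new image bytes")
# url_v1 and url_v2 differ, so the CDN treats the new file as a cache miss
# and fetches it from the origin instead of serving the old copy.
```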

43

What is the recommended practice for storing user session data in a stateless web tier?

Store session data in persistent storage such as a relational database or NoSQL database.

44

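What is the role of the load balancer in the design that combines web servers with database replication?

The user gets the load balancer's IP address from DNS and connects to it; the load balancer then routes each HTTP request to either Server 1 or Server 2.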

45

What operations are routed to the master database by a web server?

Data-modifying operations such as write, update, and delete.
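
The read/write split can be sketched as a small routing function. This is a simplified illustration: `choose_database` and the database names are hypothetical, the statement is classified only by its first keyword, and a real setup would usually rely on the database driver or a proxy for this.

```python
import random

# Statements that modify data must run on the master.
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}


def choose_database(sql, master, slaves):
    """Return the database that should execute the SQL statement."""
    verb = sql.lstrip().split(None, 1)[0].upper()
    if verb in WRITE_VERBS or not slaves:
        # Writes always go to the master; reads fall back to the master
        # when no slave database is available.
        return master
    return random.choice(slaves)   # spread reads across the slaves


write_target = choose_database("UPDATE users SET name = 'Ada' WHERE id = 1",
                               "master", ["slave1", "slave2"])
read_target = choose_database("SELECT * FROM users WHERE id = 1",
                              "master", ["slave1", "slave2"])
fallback_target = choose_database("SELECT * FROM users", "master", [])
```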

46

What is the benefit of adding a cache layer to a system?

It improves application performance by reducing the number of database calls.

47

What is the challenge of promoting a new master database in production systems?

The data in the slave database may not be up to date, requiring data recovery.

48

What are some complexities of using multi-master and circular replication methods?

These setups are more complicated and are not covered in this book.

49

What happens when a CDN server does not have a requested file in its cache?

The CDN server requests the file from the origin server, caches it, and then returns it to the user.

50

What is the primary function of a web server in relation to a slave database?

To read user data from the slave database.

51

What is the impact of caching on database workloads?

Caching reduces database workloads by serving frequently accessed data from memory.

52

What is the purpose of moving state out of the web tier?

To store session data in persistent storage like a relational database or NoSQL, allowing all web servers to access it.

53

What is a stateless web tier?

A web architecture where HTTP requests can be sent to any server, which fetches state data from a shared data store.
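
A stateless web tier can be sketched with two servers that share one session store. The plain dictionary `session_store` stands in for a shared database or NoSQL store, and the `WebServer` class and its methods are hypothetical names for illustration.

```python
session_store = {}   # stand-in for a shared data store outside the web tier


class WebServer:
    """Web server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, session_id, user):
        self.store[session_id] = {"user": user}   # state goes to the shared store

    def handle(self, session_id):
        session = self.store.get(session_id)      # state is fetched per request
        if session is None:
            return f"{self.name}: please log in"
        return f"{self.name}: hello {session['user']}"


server_a = WebServer("A", session_store)
server_b = WebServer("B", session_store)
server_a.login("s1", "alice")     # the login request lands on server A
reply = server_b.handle("s1")     # the next request lands on server B and still works
```

Because neither server holds the session, either one can be added or removed without logging users out.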

54

What are the advantages of a stateless architecture?

It is simpler, more robust, and scalable, allowing for easier auto-scaling of web servers.

55

What is geoDNS?

A DNS service that resolves domain names to IP addresses based on the user's location.

56

What is the function of sticky sessions?

To route all requests from the same client to the same server, which can add overhead and complicate scaling.

57

What challenges arise in a multi-data center setup?

Traffic redirection, data synchronization, and testing and deployment consistency.

58

What is a message queue?

A durable component that supports asynchronous communication, serving as a buffer for distributing requests.

59

How does a message queue improve application scalability?

It allows producers to send messages even when consumers are unavailable, enabling independent scaling.

60

What are some key metrics to monitor in a large system?

Host level metrics (CPU, Memory), aggregated level metrics (database performance), and key business metrics (daily active users, revenue).

61

What are the drawbacks of vertical scaling?

Hardware limits, risk of single points of failure, and high costs for powerful servers.

62

What is sharding in database scaling?

The practice of separating large databases into smaller, manageable parts called shards, each with unique data.

63

What is the role of logging in a large web application?

To monitor error logs for identifying errors and problems in the system.

64

Why is automation important in large systems?

To improve productivity and ensure consistency in deployment and testing processes.

65

What is the benefit of using a shared data store in a stateless architecture?

It allows multiple servers to access the same state data, simplifying scaling and improving resilience.

66

What happens during a data center outage in a multi-data center setup?

Traffic is redirected to a healthy data center to maintain service availability.

67

What is the significance of asynchronous multi-data center replication?

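It copies data to the other data centers in the background, so that when traffic fails over, the healthy data center already holds the users' data (possibly slightly behind, because replication is asynchronous).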

68

What is continuous integration?

A practice where each code check-in is verified through automation to detect problems early.

69

What is the purpose of monitoring metrics in a web application?

To gain insights into business performance and the health status of the system.

70

How does a message queue handle high processing loads?

By allowing more workers to process messages from the queue as needed, scaling independently of producers.

71

What is the impact of removing session data from web servers?

It simplifies the architecture and allows for easier auto-scaling of the web tier.

72

What are the components of a message queue architecture?

Producers/publishers create messages, and consumers/subscribers connect to the queue to process them.
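
The producer/consumer split can be sketched with Python's standard `queue` and `threading` modules. Note the assumptions: `queue.Queue` is an in-process, non-durable stand-in for a real message queue, and the job names and the `worker` function are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()             # the buffer between producers and consumers
results = []
results_lock = threading.Lock()


def worker():
    """Consumer: takes jobs off the queue until it sees the stop signal."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel value: no more work
            jobs.task_done()
            break
        with results_lock:
            results.append(job.upper())   # stand-in for real processing
        jobs.task_done()


# The producer publishes jobs before any consumer is running.
for name in ["photo1", "photo2", "photo3"]:
    jobs.put(name)

# Consumers start later; adding more workers scales processing independently.
workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
for _ in workers:
    jobs.put(None)               # one stop signal per worker
jobs.join()                      # wait until every job has been processed
for w in workers:
    w.join()
```

The number of workers can change without touching the producer, which is the decoupling that lets each side scale on its own.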

73

What is the role of automation tools in large systems?

To improve productivity and maintain consistency across different data centers.

74

What is the main advantage of using NoSQL for state data storage?

It is easier to scale compared to traditional relational databases.

75

What is sharding in databases?

Sharding is a technique used to scale databases by distributing data across multiple database servers based on a sharding key.

76

How is user data allocated in sharded databases?

User data is allocated to a database server based on user IDs using a hash function.
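
A minimal sketch of this allocation, assuming integer user IDs, four shards, and a simple modulo as the hash function; `NUM_SHARDS` and `shard_for` are hypothetical names.

```python
NUM_SHARDS = 4   # assumed number of database servers


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a user ID to the index of the shard that stores that user."""
    return user_id % num_shards


# Users 0..7 are spread evenly: each shard receives two of them.
placement = {user_id: shard_for(user_id) for user_id in range(8)}
```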

77

What is the purpose of a sharding key?

A sharding key, or partition key, determines how data is distributed across shards, allowing for efficient data retrieval and modification.

78

What are the criteria for choosing a sharding key?

The sharding key should be chosen to ensure even distribution of data across shards.

79

What is resharding in the context of databases?

Resharding is the process of updating the sharding function and moving data around when a single shard can no longer hold more data or when uneven data distribution occurs.
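
To see why resharding is costly, the sketch below counts how many users would change shards if a modulo-based sharding function went from 4 to 5 shards. The modulo scheme, the shard counts, and the `moved_fraction` helper are assumptions for illustration.

```python
def moved_fraction(user_ids, old_shards, new_shards):
    """Fraction of users whose shard changes when the shard count changes."""
    user_ids = list(user_ids)
    moved = sum(1 for uid in user_ids if uid % old_shards != uid % new_shards)
    return moved / len(user_ids)


# Going from 4 to 5 shards relocates most of the data.
fraction = moved_fraction(range(1000), old_shards=4, new_shards=5)
```

Here `fraction` comes out to 0.8, meaning four out of five users have to be moved. Consistent hashing is a commonly used technique for reducing this data movement.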

80

What is the celebrity problem in sharding?

The celebrity problem, or hotspot key problem, occurs when excessive access to a specific shard causes server overload, often requiring special allocation strategies.

81

What is a common workaround for performing join operations in sharded databases?

A common workaround is to de-normalize the database so that queries can be performed in a single table.

82

What are some strategies for scaling a system to support millions of users?

Strategies include keeping the web tier stateless, building redundancy at every tier, caching data, supporting multiple data centers, and sharding the data tier.

83

What is the primary goal of a system design interview?

The primary goal is to assess a candidate's ability to collaborate, work under pressure, and resolve ambiguity constructively.

84

What should candidates avoid in a system design interview?

Candidates should avoid over-engineering, narrow-mindedness, and stubbornness, as these are considered red flags.

85

What is the first step in a system design interview process?

The first step is to understand the problem and establish the design scope.

86

Why is it important to ask questions in a system design interview?

Asking questions helps clarify requirements and assumptions, ensuring a thorough understanding before proposing a solution.

87

What is the significance of making assumptions in a system design interview?

Making assumptions allows candidates to outline their thought process and can guide the design discussion.

88

What does it mean to keep the web tier stateless?

Keeping the web tier stateless means that web servers hold no session data themselves: any server can handle any request because state is fetched from a shared data store.

89

What role does caching play in system design?

Caching helps reduce database load and improve response times by storing frequently accessed data in memory.

90

What is the benefit of splitting tiers into individual services?

Splitting tiers into individual services allows for better scalability, maintainability, and the ability to optimize each service independently.

91

What is the importance of monitoring in system design?

Monitoring is crucial for identifying performance issues, ensuring system reliability, and facilitating proactive maintenance.

92

What is the purpose of using automation tools in system design?

Automation tools help streamline processes, reduce human error, and improve efficiency in system management.

93

What is the first step in a system design interview?

Understand the problem and establish design scope.

94

What should you clarify during the initial phase of a system design interview?

Requirements, user base, anticipated scaling, technology stack, and existing services.

95

What is an example of a question to ask about user interaction?

Is this a mobile app, a web app, or both?

96

What is the purpose of proposing a high-level design?

To develop an initial blueprint and reach agreement with the interviewer.

97

What should you include in your high-level design proposal?

Key components such as clients, APIs, web servers, data stores, cache, and message queues.

98

What is the significance of back-of-the-envelope calculations?

To evaluate if your design fits the scale constraints.
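
A back-of-the-envelope estimate might look like the sketch below. Every input is an assumed, made-up figure chosen only to show the arithmetic; in an interview the numbers come from the clarifying questions.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed inputs (hypothetical, for illustration only).
daily_active_users = 10_000_000
requests_per_user_per_day = 20
peak_factor = 2                  # peak traffic relative to the average
writes_per_user_per_day = 2
bytes_per_write = 1_000

# Average and peak queries per second.
average_qps = daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY
peak_qps = average_qps * peak_factor

# New storage needed per day, in gigabytes.
daily_storage_gb = (daily_active_users * writes_per_user_per_day
                    * bytes_per_write / 1e9)
```

With these inputs the average load is roughly 2,300 queries per second and storage grows by 20 GB per day, which is enough to judge whether a single database or a sharded design is needed.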

99

What are the two main flows in a news feed system design?

Feed publishing and news feed building.

100

What should you do during the design deep dive step?

Identify and prioritize components in the architecture.