IDS CH3

11 Terms

1

Why Spark & Hadoop?

They handle data sets too large for a single machine and process them quickly in parallel

2

Distributed DB?

Data spread across many computers

3

Google Search Problem?

The web has too many pages for one machine to handle, so the work must be split across many machines
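
One common way to split such a workload is to hash each page's URL to a machine number. The sketch below is only an illustration of that idea, not a description of how Google actually shards its index; the function names `assign_machine` and `partition` and the example URLs are made up for this card.

```python
import zlib


def assign_machine(url: str, num_machines: int) -> int:
    """Pick a machine for a URL using a stable hash (CRC32)."""
    return zlib.crc32(url.encode("utf-8")) % num_machines


def partition(urls, num_machines):
    """Group URLs into one list (shard) per machine."""
    shards = [[] for _ in range(num_machines)]
    for url in urls:
        shards[assign_machine(url, num_machines)].append(url)
    return shards


# Hypothetical example pages spread over 3 machines.
pages = ["example.com/a", "example.com/b", "example.com/c", "example.com/d"]
shards = partition(pages, 3)
for machine_id, shard in enumerate(shards):
    print(machine_id, shard)
```

Because the hash is stable, the same URL always lands on the same machine, so any machine can work out where a page lives without a central lookup.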

4

MapReduce?

Map: split the job into pieces and process each piece in parallel. Reduce: combine the partial results into the final answer
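
A word count is the usual teaching example for MapReduce. The sketch below runs on one machine in plain Python, so it only mimics the pattern; the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are illustrative and are not Hadoop APIs.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one input split into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different machine.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data big", "data cluster"]))
# {'big': 2, 'data': 2, 'cluster': 1}
```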

5

Hadoop setup?

Cluster of machines working together

6

HDFS?

Hadoop Distributed File System; splits files into blocks and stores copies of them across the machines in the cluster
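
The block-splitting idea can be sketched in a few lines of Python. This is a toy illustration, not HDFS code: the function name `split_into_blocks` is made up, and the 4-byte block size is chosen only to keep the output readable (real HDFS blocks are far larger, 128 MB by default in recent versions).

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


blocks = split_into_blocks(b"abcdefghij", 4)
print(blocks)
# [b'abcd', b'efgh', b'ij']
```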

7

NameNode?

The HDFS master node; tracks where each block is stored
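
Conceptually, the NameNode is a lookup table from each file's blocks to the machines holding them. The sketch below assumes a simple round-robin placement for illustration; real HDFS placement is rack-aware, and `ToyNameNode`, `add_file`, and `locate` are hypothetical names, not part of Hadoop.

```python
class ToyNameNode:
    """Keeps only metadata: which DataNodes hold each block of each file."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        # Cannot keep more copies than there are machines.
        self.replication = min(replication, len(self.datanodes))
        self.block_map = {}  # filename -> list of replica lists, one per block

    def add_file(self, filename, num_blocks):
        """Record a round-robin placement for every block of the file."""
        locations = []
        for block_index in range(num_blocks):
            replicas = [
                self.datanodes[(block_index + r) % len(self.datanodes)]
                for r in range(self.replication)
            ]
            locations.append(replicas)
        self.block_map[filename] = locations

    def locate(self, filename, block_index):
        """Answer the question a client asks: where is this block?"""
        return self.block_map[filename][block_index]


namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"], replication=2)
namenode.add_file("logs.txt", num_blocks=3)
print(namenode.locate("logs.txt", 2))
# ['dn3', 'dn4']
```

The table stores no file contents, only locations, which is why a single node can track blocks for a whole cluster.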

8

Hadoop Limit?

Older design; slow because it reads and writes to disk between steps, so it is not good for real-time work

9

Spark benefit?

Keeps working data in RAM instead of on disk, so it is much faster

10

Spark uses what?

Can run on HDFS + YARN, like Hadoop, but processes data faster

11

Who uses Hadoop now?

Amazon and other big tech companies, plus some government agencies