IDS CH3


11 Terms

1

Why Spark & Hadoop?

Handle data sets too large for one machine, with fast parallel processing across a cluster

2

Distributed DB?

A database whose data is spread across many computers
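
One common way to spread data across computers is hash partitioning. The card does not say which scheme the course uses, so treat this as a minimal sketch under that assumption; the node names and keys are hypothetical.

```python
import hashlib

def node_for_key(key, nodes):
    """Pick a node for a key by hashing it (hash partitioning, assumed here)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Hypothetical three-machine cluster and a few record keys.
nodes = ["db-1", "db-2", "db-3"]
keys = ["user:1", "user:2", "user:3", "user:4", "user:5"]

# Build a map of node -> keys stored on that node.
placement = {}
for key in keys:
    placement.setdefault(node_for_key(key, nodes), []).append(key)
```

The same key always hashes to the same node, so any machine can work out where a record lives without asking a central server.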

3

Google Search Problem?

Too many web pages for one machine to handle → the work must be split across many machines

4

MapReduce?

Map: process each chunk of input in parallel, emitting key-value pairs. Reduce: combine the values for each key into the final results
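
The classic illustration of map and reduce is a word count. This is a single-machine sketch of the idea, not Hadoop's actual API; the function names and sample text are made up for illustration.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different machine.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

chunks = ["big data big cluster", "data on a cluster"]
counts = word_count(chunks)
```

Here `counts` maps each word to its total across both chunks, for example `counts["big"]` is 2.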

5

Hadoop setup?

Cluster of machines working together

6

HDFS?

Hadoop Distributed File System; splits files into blocks and stores them, with replicas, across the machines in the cluster

7

NameNode?

The HDFS master node; tracks which machines (DataNodes) store each block
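
The HDFS and NameNode cards fit together: files are cut into blocks, and a master keeps the map of where each block lives. The toy model below sketches that bookkeeping. `ToyNameNode`, the round-robin placement, and the tiny block size are illustrative assumptions. Real HDFS uses far larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

class ToyNameNode:
    """Toy stand-in for the NameNode: stores block locations, not the data."""

    def __init__(self, datanodes, replication=2):
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_locations = {}  # (filename, block index) -> list of node names

    def add_file(self, filename, data, block_size):
        blocks = split_into_blocks(data, block_size)
        for index in range(len(blocks)):
            # Simple round-robin placement (an assumption, not HDFS's policy).
            nodes = [self.datanodes[(index + r) % len(self.datanodes)]
                     for r in range(self.replication)]
            self.block_locations[(filename, index)] = nodes
        return len(blocks)

    def locate(self, filename, index):
        return self.block_locations[(filename, index)]

namenode = ToyNameNode(["node-a", "node-b", "node-c"])
num_blocks = namenode.add_file("pages.txt", b"x" * 10, block_size=4)
```

A client that wants block 0 of `pages.txt` would call `namenode.locate("pages.txt", 0)` and then read the data from one of the returned nodes.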

8

Hadoop Limit?

Older design; slow because MapReduce writes intermediate results to disk; not suited to real-time processing

9

Spark benefit?

Keeps working data in memory (RAM) instead of on disk = much faster

10

Spark uses what?

Can run on HDFS (storage) + YARN (cluster resource manager), like Hadoop, but faster

11

Who uses Hadoop now?

Amazon, some government agencies, big tech