Why Spark & Hadoop?
Handle huge data sets, fast processing
Distributed DB?
Data spread across many computers
Google Search Problem?
Too many pages for one machine → Needs splitting
MapReduce?
Map: split the job into chunks processed in parallel; Reduce: combine the partial results
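A minimal sketch of the Map and Reduce steps, using word counting in plain Python on one machine. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample chunks are made up for illustration; this is not Hadoop's API, and a real cluster would run each map call on a different machine.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(chunks):
    pairs = []
    for chunk in chunks:  # stands in for chunks handled by separate machines
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))

# Hypothetical input: three chunks of one larger data set
chunks = ["spark is fast", "hadoop is disk based", "spark uses ram"]
counts = word_count(chunks)
```

The grouping step between Map and Reduce is what lets each key be combined independently, so the reduce work can also be spread across machines.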
Hadoop setup?
Cluster of machines working together
HDFS?
Hadoop Distributed File System; splits files into blocks stored across the cluster
NameNode?
Tracks where each block is stored
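A toy sketch of the two ideas above: a file is cut into fixed-size blocks, and a NameNode-like index records which machine holds each block. `ToyNameNode`, the node names, and the round-robin placement are assumptions for illustration only; real HDFS uses much larger blocks (128 MB by default in recent versions) and stores several copies of each block.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut the data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

class ToyNameNode:
    """Keeps only metadata: which node stores which block of which file."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        self.block_locations = {}  # (filename, block index) -> node name

    def store_file(self, filename, data, block_size):
        blocks = split_into_blocks(data, block_size)
        for index in range(len(blocks)):
            # Assumption: simple round-robin placement, no replication
            node = self.datanodes[index % len(self.datanodes)]
            self.block_locations[(filename, index)] = node
        return len(blocks)

    def locate(self, filename, index):
        """Answer the question 'where is block N of this file?'"""
        return self.block_locations[(filename, index)]

# Hypothetical cluster of three machines and a 10-byte file
namenode = ToyNameNode(["node-a", "node-b", "node-c"])
num_blocks = namenode.store_file("pages.txt", b"x" * 10, block_size=4)
```

Here the 10-byte file becomes three blocks (4, 4, and 2 bytes), and `namenode.locate("pages.txt", 1)` reports the node that holds the second block.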
Hadoop Limit?
Older design; disk-based, so slow; not suited to real-time processing
Spark benefit?
Keeps data in RAM instead of on disk = much faster
Spark uses what?
Can run on HDFS + YARN, like Hadoop, but processes data in memory, so faster
Who uses Hadoop now?
Amazon and some government agencies; mainly big tech