Hadoop & MapReduce


flashcard set


Last updated 11:21 AM on 5/4/26

13 Terms

1
Hadoop
A Java-based framework (not a database) for distributing and processing very large data sets across clusters of computers.
2
Two most important parts of Hadoop
HDFS (Hadoop Distributed File System) and MapReduce.
3
HDFS
A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speed. It is a low-level distributed file system that is used directly for data storage.
4
Four HDFS assumptions
(1) High volume (terabyte+ files), (2) Write-once, read-many (no edits after close), (3) Streaming access (process whole files as a stream), (4) Fault tolerance (replicate data across many machines).
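To make assumptions (1) and (4) concrete, here is a rough sketch of how a large file turns into blocks and replicas. The 128 MB block size and replication factor of 3 are assumptions (common defaults, not stated on the card), and `storage_plan` is a hypothetical helper, not part of Hadoop.

```python
import math

BLOCK_SIZE_MB = 128      # assumption: a common default block size
REPLICATION_FACTOR = 3   # assumption: a common default replication factor

def storage_plan(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                 replication=REPLICATION_FACTOR):
    """Return (blocks, block replicas, raw MB stored across the cluster)."""
    blocks = math.ceil(file_size_mb / block_size_mb)  # last block may be partial
    replicas = blocks * replication                   # each block is copied
    raw_mb = file_size_mb * replication               # total space consumed
    return blocks, replicas, raw_mb

one_tb_mb = 1024 * 1024  # 1 TB expressed in MB
print(storage_plan(one_tb_mb))
```

Under these assumptions a 1 TB file becomes 8,192 blocks and 24,576 block replicas, which is why the replicas have to be spread across many machines.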
5
Client node (HDFS)
A node that makes requests to the file system.
6
Name node (HDFS)
The node that stores metadata about which blocks belong to which files and which data nodes hold them.
7
Data node (HDFS)
A node that stores the actual file data blocks.
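The split between name node and data nodes can be pictured as two lookup tables. This is a simplified sketch, not HDFS code: the file path, block ids, node names, and the `locate` function are all made up for illustration.

```python
# Hypothetical picture of the metadata a name node keeps.
# The name node stores only this bookkeeping; data nodes store the block contents.
file_to_blocks = {
    "/logs/2024.txt": ["blk_1", "blk_2"],
}
block_to_datanodes = {
    "blk_1": ["datanode-a", "datanode-b", "datanode-c"],
    "blk_2": ["datanode-b", "datanode-c", "datanode-d"],
}

def locate(path):
    """Answer a client request: which blocks make up this file, and where are they?"""
    return [(blk, block_to_datanodes[blk]) for blk in file_to_blocks[path]]

for block_id, nodes in locate("/logs/2024.txt"):
    print(block_id, "is held by", nodes)
```

A client node would ask the name node for this list, then read the actual bytes from the data nodes it names.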
8
Block report
A report sent every 6 hours from a data node to the name node listing which blocks it holds.
9
Heartbeat
A signal sent every 3 seconds from a data node to the name node to confirm it is still available.
10
What happens when a name node stops receiving heartbeats from a data node
The name node excludes that data node from future read/write lists and may instruct other data nodes to replicate the blocks it held.
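The heartbeat and block report cards fit together roughly as sketched below. The 3-second interval comes from the Heartbeat card; the `MISSED_BEFORE_DEAD` threshold and the `HeartbeatMonitor` class are hypothetical, chosen only to illustrate the idea, and do not mirror Hadoop's real implementation.

```python
HEARTBEAT_INTERVAL_S = 3   # from the Heartbeat card
MISSED_BEFORE_DEAD = 10    # assumption: illustrative threshold, not Hadoop's value

class HeartbeatMonitor:
    """Toy name-node bookkeeping for data node liveness."""

    def __init__(self):
        self.last_seen = {}  # data node -> time of its last heartbeat (seconds)
        self.blocks = {}     # data node -> set of block ids from its block report

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def block_report(self, node, block_ids):
        self.blocks[node] = set(block_ids)

    def live_nodes(self, now):
        """Nodes that have sent a heartbeat recently enough to stay on read/write lists."""
        timeout = HEARTBEAT_INTERVAL_S * MISSED_BEFORE_DEAD
        return sorted(n for n, t in self.last_seen.items() if now - t <= timeout)

    def blocks_to_replicate(self, now):
        """Blocks held by nodes that are no longer live and may need new copies."""
        live = set(self.live_nodes(now))
        lost = set()
        for node, held in self.blocks.items():
            if node not in live:
                lost |= held
        return sorted(lost)

monitor = HeartbeatMonitor()
monitor.heartbeat("a", now=0)
monitor.heartbeat("b", now=0)
monitor.block_report("a", ["blk_1"])
monitor.block_report("b", ["blk_2"])
monitor.heartbeat("a", now=30)           # "b" goes silent after time 0
print(monitor.live_nodes(now=31))
print(monitor.blocks_to_replicate(now=31))
```

Time is passed in explicitly so the sketch is deterministic; a real system would read a clock.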
11
MapReduce
A divide-and-conquer parallel processing technique: split a large data set into smaller sub-blocks, compute intermediate results on each sub-block in parallel (Map), then combine those results into one final answer (Reduce).
12
Mapper
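A program that performs the Map function: it processes one sub-block of the input and produces intermediate results, typically as key-value pairs.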
13
Reducer
A program that performs the Reduce function: it combines the intermediate results produced by the mappers into the final summarized output.
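The classic way to see a mapper and reducer working together is a word count. The sketch below runs in a single Python process and only imitates the pattern; it does not use the Hadoop API, and the names `mapper`, `reducer`, and `map_reduce` are illustrative.

```python
from collections import defaultdict

def mapper(chunk):
    """Map: turn one chunk of text into intermediate (word, 1) pairs."""
    for word in chunk.lower().split():
        yield (word, 1)

def reducer(word, counts):
    """Reduce: summarize all the counts collected for one word."""
    return (word, sum(counts))

def map_reduce(chunks):
    grouped = defaultdict(list)
    for chunk in chunks:                 # on a cluster, chunks are mapped in parallel
        for key, value in mapper(chunk):
            grouped[key].append(value)   # group intermediate values by key
    return dict(reducer(key, values) for key, values in grouped.items())

print(map_reduce(["big data big", "data nodes"]))
```

On a real cluster each chunk would be mapped on a different node, and the grouping step would move intermediate pairs across the network before the reducers run.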