Hadoop & MapReduce

Tags: CGS

Last updated 11:21 AM on 5/4/26

13 Terms

Hadoop

A Java-based framework (not a database) for distributing and processing very large data sets across clusters of computers.

Two most important parts of Hadoop

HDFS (Hadoop Distributed File System) and MapReduce.

HDFS

A highly distributed, fault-tolerant file system designed to store very large amounts of data and access it at high speed. It is a low-level distributed file system that applications use directly for storage.

Four HDFS assumptions

(1) High volume (terabyte+ files), (2) Write-once, read-many (no edits after close), (3) Streaming access (process whole files as a stream), (4) Fault tolerance (replicate data across many machines).
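
Assumptions (1) and (4) can be illustrated with a small Python sketch: a large file is cut into fixed-size blocks, and each block is copied to several machines so that losing one machine loses no data. The function names are hypothetical, the 4-byte block size is a toy value (HDFS blocks are typically 128 MB by default in recent Hadoop versions), and the round-robin placement is a simplification, not HDFS's real rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file into fixed-size blocks; only the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign every block to `replication` distinct nodes.

    Simplified round-robin placement for illustration only; this is not
    HDFS's real (rack-aware) placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement: dict[int, list[str]] = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        # Consecutive nodes (wrapping around) give each block distinct machines.
        placement[block_id] = [nodes[(start + k) % len(nodes)] for k in range(replication)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    print(blocks)
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

With four data nodes and three replicas per block, any single node can fail and every block still has two surviving copies.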

Client node (HDFS)

A node that makes requests to the file system.

Name node (HDFS)

The node that stores metadata about which blocks belong to which files and which data nodes hold them.

Data node (HDFS)

A node that stores the actual file data blocks.
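
The client, name node, and data node roles fit together in a read request: the client asks the name node which blocks make up a file and where they live, then fetches the bytes from the data nodes, falling back to another replica if one is unavailable. This is a hedged Python sketch; the class and function names (`NameNode`, `DataNode`, `get_block_locations`, `client_read`) are hypothetical and are not Hadoop's real Java client API.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Holds the actual block contents, keyed by block id."""
    name: str
    blocks: dict[str, bytes] = field(default_factory=dict)


@dataclass
class NameNode:
    """Holds only metadata: each file's block list and each block's locations."""
    file_to_blocks: dict[str, list[str]] = field(default_factory=dict)
    block_locations: dict[str, list[str]] = field(default_factory=dict)

    def get_block_locations(self, path: str) -> list[tuple[str, list[str]]]:
        return [(b, self.block_locations.get(b, [])) for b in self.file_to_blocks[path]]


def client_read(path: str, name_node: NameNode, data_nodes: dict[str, DataNode]) -> bytes:
    """Client node: ask the name node for metadata, then read bytes from data nodes."""
    content = bytearray()
    for block_id, locations in name_node.get_block_locations(path):
        for node_name in locations:  # try each replica in order
            node = data_nodes.get(node_name)  # a missing entry models an unreachable node
            if node is not None and block_id in node.blocks:
                content += node.blocks[block_id]
                break
        else:
            raise IOError(f"no available replica for {block_id}")
    return bytes(content)
```

Note that the name node never handles file contents here; it only answers "which blocks, and where", which is why it can stay small while the data nodes carry the bulk of the storage.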

Block report

A report sent every 6 hours from a data node to the name node listing which blocks it holds.
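
A minimal Python sketch of how a name node might use block reports to keep its block-to-node map current. It assumes each report is the node's complete block list, so the node is dropped from any block it no longer lists; the `BlockMap` class and its methods are hypothetical names for illustration.

```python
from collections import defaultdict

BLOCK_REPORT_INTERVAL_S = 6 * 60 * 60  # one report per data node every 6 hours


class BlockMap:
    """Name-node view of which data nodes hold which blocks, built from block reports."""

    def __init__(self) -> None:
        self.locations: dict[str, set[str]] = defaultdict(set)

    def apply_block_report(self, node: str, reported_blocks: set[str]) -> None:
        """Treat the report as the node's complete block list and reconcile with it."""
        # Forget this node for every block it no longer reports.
        for block_id, holders in self.locations.items():
            if block_id not in reported_blocks:
                holders.discard(node)
        # Record this node for every block it does report.
        for block_id in reported_blocks:
            self.locations[block_id].add(node)

    def replica_count(self, block_id: str) -> int:
        return len(self.locations.get(block_id, set()))
```

For example, if `dn1` first reports `{"blk_1", "blk_2"}` and six hours later reports only `{"blk_1"}`, the map stops listing `dn1` as a holder of `blk_2`.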

Heartbeat

A signal sent every 3 seconds from a data node to the name node to confirm it is still available.

What happens when a name node stops receiving heartbeats from a data node?

It stops including that data node in the lists of nodes used for future reads and writes, and it may instruct other data nodes to re-replicate the blocks that were stored on the lost node.
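
The heartbeat and failure-handling cards can be sketched together in Python: the name node records when each data node last checked in, marks silent nodes as dead, keeps only live nodes in its read/write lists, and plans copies to restore missing replicas. The 30-second timeout is a hypothetical value chosen for illustration (real clusters wait for many missed heartbeats before declaring a node dead), and all class and function names are hypothetical. Times are passed in explicitly so the logic is deterministic.

```python
HEARTBEAT_INTERVAL_S = 3   # data nodes send a heartbeat every 3 seconds
DEAD_AFTER_S = 30          # hypothetical timeout (ten missed heartbeats), not Hadoop's default


class HeartbeatMonitor:
    """Name-node side bookkeeping of data-node liveness."""

    def __init__(self) -> None:
        self.last_seen: dict[str, float] = {}
        self.dead: set[str] = set()

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now
        self.dead.discard(node)  # in this sketch, a node that reports in again is live again

    def check(self, now: float) -> set[str]:
        """Mark nodes silent for longer than DEAD_AFTER_S as dead; return the newly dead ones."""
        newly_dead = {
            node for node, seen in self.last_seen.items()
            if node not in self.dead and now - seen > DEAD_AFTER_S
        }
        self.dead |= newly_dead
        return newly_dead

    def live_nodes(self) -> list[str]:
        """Only these nodes appear in future read/write lists."""
        return sorted(n for n in self.last_seen if n not in self.dead)


def plan_re_replication(
    block_locations: dict[str, set[str]], live: list[str], replication: int = 3
) -> list[tuple[str, str, str]]:
    """Return (block, source, target) copy orders that restore each block's replica count."""
    orders = []
    for block_id in sorted(block_locations):
        holders = sorted(n for n in block_locations[block_id] if n in live)
        if not holders:
            continue  # every replica is gone; there is nothing left to copy from
        missing = max(replication - len(holders), 0)
        targets = [n for n in live if n not in holders][:missing]
        orders.extend((block_id, holders[0], t) for t in targets)
    return orders
```

If `dn3` stops sending heartbeats while holding one of three copies of `blk_1`, `check` marks it dead, `live_nodes` drops it, and `plan_re_replication` orders one surviving holder to copy `blk_1` to a live node that does not yet have it.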

MapReduce

A divide-and-conquer parallel processing technique: split a large block of data into sub-blocks, compute intermediate results for the sub-blocks in parallel, then summarize those results into one final answer.
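
The divide-and-conquer pattern can be shown on a single machine with a Python sketch that sums a list: split it into sub-blocks, compute a partial sum for each sub-block in parallel, then combine the partial sums. A thread pool stands in for the cluster here; it shows the structure, not Hadoop's actual scheduling, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data: list[int], num_sub_blocks: int) -> list[list[int]]:
    """Divide: cut the data into roughly equal sub-blocks."""
    size = max(1, -(-len(data) // num_sub_blocks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum(sub_block: list[int]) -> int:
    """Compute one intermediate result for one sub-block."""
    return sum(sub_block)


def divide_and_conquer_sum(data: list[int], workers: int = 4) -> int:
    sub_blocks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        intermediate = list(pool.map(partial_sum, sub_blocks))
    return sum(intermediate)  # summarize the intermediate results into one answer
```

For example, `split(list(range(10)), 4)` yields `[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]`, and `divide_and_conquer_sum(list(range(1, 101)))` returns 5050.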

Mapper

A program that performs the Map function: it processes its portion of the input and emits intermediate key-value pairs.

Reducer

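A program that performs the Reduce function: it receives the intermediate values grouped by key and combines them into a final, summarized result.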
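
The classic illustration of a mapper and reducer working together is word count. Real Hadoop jobs are usually written in Java by subclassing `Mapper` and `Reducer` from `org.apache.hadoop.mapreduce`; the Python functions below (`mapper`, `shuffle`, `reducer`, `word_count`) are hypothetical stand-ins that simulate the map, group-by-key, and reduce steps on one machine.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key (the framework does this between map and reduce)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(intermediate)
    return dict(reducer(k, v) for k, v in sorted(grouped.items()))
```

For the input `["the cat", "The dog", "cat"]`, the mappers emit one `(word, 1)` pair per word, and the reducers produce `{"cat": 2, "dog": 1, "the": 2}`.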