Spark & Hadoop – IDS 200 Lecture Vocabulary

Description and Tags

Twelve vocabulary flashcards summarizing essential terms from the Spark & Hadoop lecture.

12 Terms

Distributed Database

A storage system that spreads large datasets across multiple machines and manages where data is placed, how it is retrieved, and how bottlenecks are avoided.

MapReduce

A two-stage programming model for massive data processing: a Map phase that splits work across nodes and a Reduce phase that merges the partial results.

Map Stage

The first phase of MapReduce in which a problem is broken into smaller pieces, each sent to separate hardware for parallel processing.

Reduce Stage

The second phase of MapReduce that combines the outputs from all map tasks to produce the final, complete result.
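
Example (a minimal sketch, not Hadoop's actual API): the two stages can be imitated on one machine with a word count. The function names map_stage and reduce_stage and the sample chunks are hypothetical, chosen only for illustration; on a real cluster each chunk would sit on a different node and the map calls would run in parallel.

```python
from collections import defaultdict


def map_stage(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_stage(mapped_outputs):
    """Reduce: merge the partial results of every map task into final counts."""
    totals = defaultdict(int)
    for pairs in mapped_outputs:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)


# Each chunk stands in for data held on a separate node (hypothetical data).
chunks = ["spark runs in memory", "hadoop runs on disk", "spark runs on hadoop"]

# On a cluster these map calls would run in parallel; here they run one by one.
mapped = [map_stage(chunk) for chunk in chunks]
word_counts = reduce_stage(mapped)
```

The map step needs no knowledge of the other chunks, which is what makes it easy to spread across hardware; only the reduce step sees all partial results.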

Hadoop

A disk-based, cluster-oriented MapReduce framework that uses HDFS for storage and YARN for scheduling; it is older than Spark and generally slower, because intermediate results are written to disk rather than kept in memory.

Hadoop Distributed File System (HDFS)

Hadoop's disk storage layer, which splits files into fixed-size data blocks and stores them across nodes, with the NameNode managing the metadata.

NameNode

The master node in HDFS that maintains the list of all files and blocks stored on the cluster’s data nodes.
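
Example (a toy sketch under stated assumptions, not HDFS code): the snippet below splits a file into fixed-size blocks and records which data node holds each block, which is the kind of bookkeeping the NameNode does. The helper names, the file name, the node names, the tiny block size, and the round-robin placement are all assumptions for illustration; real HDFS blocks are far larger, and each block is replicated on several nodes.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(filename, blocks, data_nodes):
    """Assign blocks to data nodes round-robin and return NameNode-style metadata.

    Round-robin placement is an assumption made for this sketch only.
    """
    metadata = {}
    for index in range(len(blocks)):
        metadata[(filename, index)] = data_nodes[index % len(data_nodes)]
    return metadata


# Hypothetical 10-byte file, 4-byte blocks, and two data nodes.
blocks = split_into_blocks(b"x" * 10, block_size=4)
name_node = place_blocks("log.txt", blocks, ["node-a", "node-b"])
```

Here name_node maps each (file, block index) pair to the node holding that block, so a lookup answers "where is block 2 of log.txt?" without touching the data itself.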

Yet Another Resource Negotiator (YARN)

Hadoop’s scheduler that assigns computing tasks to nodes and locates needed data via the NameNode.

Spark

A newer big data engine that keeps working data in memory (RAM) for speed; it often runs on top of Hadoop and includes the MLlib machine learning library.

Hadoop Ecosystem

The broader set of technologies that support Hadoop clusters, sometimes used as an umbrella term that even includes Spark.

MLlib

Spark’s built-in machine learning library that supplies algorithms beyond basic MapReduce processing.

Petabyte

A data-size unit of roughly 1,000 terabytes (about 10^15 bytes), commonly cited as the scale of data handled by Hadoop clusters.
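
Example (a quick arithmetic sketch): to get a feel for the scale, the snippet below counts how many disks one petabyte would fill. The 4 TB disk size is a hypothetical figure chosen for illustration, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
TERABYTE = 10**12            # bytes, decimal convention
PETABYTE = 1000 * TERABYTE   # 10**15 bytes

disk_size_tb = 4  # hypothetical disk capacity in terabytes
disks_needed = PETABYTE // (disk_size_tb * TERABYTE)
```

Under these assumptions a single petabyte fills 250 such disks, which is why data at this scale has to be spread across a cluster.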
