
Spark & Hadoop – IDS 200 Lecture Vocabulary

Distributed Databases

  • Data too large for a single machine ➔ split across multiple nodes
  • Key challenges: data placement, fast retrieval, avoiding device-level bottlenecks
  • Real-world scale: Google's web index, Facebook photos, YouTube videos

MapReduce Paradigm

  • Two stages (see the word-count sketch after this list):
    • Map: split the input across nodes; each node applies the same function to its chunk, emitting intermediate key-value pairs
    • Reduce: aggregate the partial outputs, grouped by key, into the final result
  • Original use case: Google search index processing
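A minimal single-machine sketch of the two stages in plain Python (the sample lines are made up). The real framework runs the map function on many nodes in parallel and shuffles pairs by key over the network, but the shape is the same:

    from collections import defaultdict

    # Map stage: each node would run this on its own input chunk,
    # emitting a (word, 1) pair for every word it sees.
    def map_phase(lines):
        for line in lines:
            for word in line.split():
                yield (word.lower(), 1)

    # Shuffle: the framework groups all pairs by key so each reducer
    # sees every partial count for one word.
    def shuffle(pairs):
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        return grouped

    # Reduce stage: aggregate the partial outputs into the final result.
    def reduce_phase(grouped):
        return {word: sum(counts) for word, counts in grouped.items()}

    chunk = ["the quick brown fox", "the lazy dog"]  # stand-in for one input split
    print(reduce_phase(shuffle(map_phase(chunk))))
    # {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}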

Hadoop Ecosystem

  • Cluster of commodity nodes; each stores & processes its own data chunk
  • Scheduler: YARN (Yet Another Resource Negotiator)
  • Automatic replication & node scaling for fault-tolerance / elasticity
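A toy sketch of why replication buys fault tolerance, using HDFS's default replication factor of 3 (the node names are made up, and real HDFS placement is rack-aware rather than uniformly random):

    import random

    NODES = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster
    REPLICATION = 3  # HDFS default

    # Place each block's replicas on distinct nodes.
    def place_replicas(block_ids, nodes=NODES, k=REPLICATION):
        return {b: random.sample(nodes, k) for b in block_ids}

    placement = place_replicas(["blk_001", "blk_002"])

    # Simulate one node failing: every block still has two live copies.
    failed = "node2"
    for block, replicas in placement.items():
        print(block, "survivors:", [n for n in replicas if n != failed])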

Hadoop Distributed File System (HDFS)

  • Data stored on disk in fixed-size blocks (default 64 or 128 MB)
  • NameNode holds metadata (file ➔ blocks ➔ DataNodes); see the sketch after this list
  • Optimized for petabyte-scale, sequential read workloads
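Back-of-the-envelope block math, plus a toy version of the NameNode's file ➔ blocks map, assuming the 128 MB default (the file name and block IDs are made up):

    import math

    BLOCK_SIZE = 128 * 1024 * 1024  # default block size, in bytes

    # Number of fixed-size blocks a file occupies (last block may be partial).
    def block_count(file_size_bytes):
        return math.ceil(file_size_bytes / BLOCK_SIZE)

    print(block_count(1 * 1024**3))  # a 1 GB file -> 8 blocks

    # Toy NameNode metadata: file -> ordered block IDs. Real HDFS also
    # tracks which DataNodes hold each block's replicas.
    namenode = {"/logs/2024-01-01.log": ["blk_1001", "blk_1002", "blk_1003"]}
    print(namenode["/logs/2024-01-01.log"])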

Hadoop Limitations

  • Writes intermediate results to disk between stages ⇒ too slow for real-time and iterative ML workloads
  • Introduced ~20 years ago; many orgs now moving beyond classic MapReduce

Spark Highlights

  • Built to run atop Hadoop (reuses HDFS & YARN) but keeps working data in RAM
  • Supports MapReduce-style jobs plus richer APIs (MLlib, Spark SQL, streaming); see the PySpark sketch after this list
  • Performance: markedly faster; cost: higher RAM requirements
  • Demands greater technical skill for deployment & tuning
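A minimal PySpark sketch, assuming pyspark is installed and running in local mode (the input path is hypothetical). It is the same word count as the MapReduce sketch above, but cache() keeps the intermediate result in RAM so a second action reuses it instead of recomputing from disk:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("wordcount-sketch")
             .master("local[*]")
             .getOrCreate())

    # Hypothetical path; on a cluster this would be an HDFS URI.
    lines = spark.sparkContext.textFile("sample.txt")

    counts = (lines.flatMap(lambda line: line.split())  # Map: emit words
                   .map(lambda w: (w.lower(), 1))       # Map: (word, 1) pairs
                   .reduceByKey(lambda a, b: a + b)     # Reduce: sum per word
                   .cache())                            # keep the result in RAM

    print(counts.take(5))   # first action: computes and caches
    print(counts.count())   # second action: served from memory

    spark.stop()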

Choosing Hadoop vs. Spark

  • Hadoop = lower cost, acceptable for batch or less time-critical jobs
  • Spark = higher speed & flexibility, preferred for iterative analytics / ML
  • Mixed environments common (e.g., Amazon retail, many government systems)

Takeaways for IDS Majors

  • Learn underlying principles, not just current tools
  • Legacy concepts persist in evolved forms ➔ foundation for new tech
  • Demonstrating breadth + growth mindset valued in interviews