What is Big Data?
It refers to datasets that are too large or complex for traditional databases to handle efficiently.
What are the Key Characteristics of Big Data? (5 Vs)
Volume → Huge size (TBs–PBs)
Velocity → Fast data generation (real-time streams)
Variety → Structured + unstructured (text, video, logs)
Veracity → Data quality and trustworthiness (noisy, incomplete, or inconsistent data)
Value → Extracting useful insights
What is an example of a Key-Value database?
Redis (Key → Value)
What is an example of a Document database?
MongoDB (Stores JSON-like documents)
What is an example of a Column Family database?
Apache Cassandra (Stores data in columns grouped into families.)
What is an example of a Graph database?
Neo4j (Nodes = entities, Edges = relationships)
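To compare the four NoSQL models side by side, here is a minimal sketch that expresses one made-up user record in each shape. It uses plain Python structures as stand-ins, not the real Redis, MongoDB, Cassandra, or Neo4j APIs; all names and values (`user:1`, `Asha`, `FOLLOWS`, `followed_by`) are hypothetical.

```python
# One hypothetical "user" record expressed in four NoSQL data models,
# using plain Python structures as stand-ins for the real databases.

# Key-value (Redis-style): a value looked up by a single key.
key_value_store = {"user:1": "Asha"}

# Document (MongoDB-style): a self-contained, JSON-like nested document.
document_store = {
    "users": [
        {"_id": 1, "name": "Asha", "skills": ["sql", "python"]},
    ]
}

# Column family (Cassandra-style): row key -> column family -> columns.
column_family_store = {
    "user:1": {
        "profile": {"name": "Asha"},
        "activity": {"last_login": "2024-01-01"},
    }
}

# Graph (Neo4j-style): nodes are entities, edges are relationships.
nodes = {
    1: {"label": "User", "name": "Asha"},
    2: {"label": "User", "name": "Ben"},
}
edges = [(1, "FOLLOWS", 2)]


def followed_by(user_id):
    """Return the names of users that `user_id` follows by walking the edges."""
    return [
        nodes[dst]["name"]
        for src, rel, dst in edges
        if src == user_id and rel == "FOLLOWS"
    ]


print(key_value_store["user:1"])
print(document_store["users"][0]["skills"])
print(column_family_store["user:1"]["profile"]["name"])
print(followed_by(1))
```

The point of the sketch is the shape of the data: a flat lookup, a nested document, columns grouped under families, and explicit relationships.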
What are the Limitations of Traditional RDBMS?
Mainly vertical scaling (scale-up is expensive and limited by hardware)
Rigid schema (not flexible for changing data)
Poor performance with unstructured data
Join-heavy queries become slow
Not designed for distributed environments
According to the CAP Theorem, a distributed system can guarantee only 2 of these 3 at the same time:
Consistency (C) → All nodes see the same data
Availability (A) → Every request gets a response
Partition Tolerance (P) → System keeps working despite network failures between nodes
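The trade-off shows up when a network partition happens: a node must either refuse requests (keeping consistency) or answer with possibly stale data (keeping availability). The toy model below is only an illustration under that assumption; the `Node` class and the `"CP"`/`"AP"` mode labels are invented for this sketch and do not correspond to any real database API.

```python
class Node:
    """Toy node that picks consistency or availability during a partition."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (labels used only in this sketch)
        self.value = "v1"         # local copy of the data
        self.partitioned = False  # True when cut off from its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Prefer consistency: refuse rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # Prefer availability: answer with the local (possibly stale) copy.
        return self.value


cp_node = Node("CP")
ap_node = Node("AP")
cp_node.partitioned = True
ap_node.partitioned = True

print(ap_node.read())  # still answers, but the value may be stale
try:
    cp_node.read()
except RuntimeError as err:
    print("CP node:", err)
```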
What are the key Distributed Database Concepts?
Sharding → Split data across nodes
Replication → Copy data for fault tolerance
Fault Tolerance → System continues if nodes fail
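Sharding, replication, and fault tolerance can be sketched together in a few lines. This is a simplified illustration, assuming hash-based sharding and replicas placed on the next node in a ring; the node names, `REPLICATION_FACTOR`, and the helper functions are hypothetical, and real systems use more elaborate placement strategies.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
REPLICATION_FACTOR = 2  # assumed for this sketch: each key lives on 2 nodes


def shard_index(key):
    """Sharding: map a key to its primary node using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % len(NODES)


def replicas_for(key):
    """Replication: the primary node plus the next node(s) in the ring."""
    start = shard_index(key)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


def readable_nodes(key, failed):
    """Fault tolerance: replicas that can still serve the key after failures."""
    return [node for node in replicas_for(key) if node not in failed]


replicas = replicas_for("user:42")
print("replicas:", replicas)
# Even if the primary fails, the second replica can still serve the key.
print("after primary failure:", readable_nodes("user:42", {replicas[0]}))
```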
Consistency Models
Strong consistency → Every read returns the latest write
Eventual consistency → Replicas may lag briefly but converge to the same value
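The difference between the two models is easiest to see with a stale read. The sketch below is a toy in-memory model, not how any particular database implements replication; the `ReplicatedRegister` class and its method names are made up for illustration.

```python
class ReplicatedRegister:
    """A single value stored on several replicas (toy in-memory model)."""

    def __init__(self, replica_count=3):
        self.replicas = [None] * replica_count
        self.pending = []  # (replica_index, value) updates not yet applied

    def write_strong(self, value):
        # Strong consistency: update every replica before acknowledging.
        self.pending.clear()
        self.replicas = [value] * len(self.replicas)

    def write_eventual(self, value):
        # Eventual consistency: update one replica now, queue the rest.
        self.replicas[0] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, value))

    def sync(self):
        # Background propagation: apply queued updates in order.
        for i, value in self.pending:
            self.replicas[i] = value
        self.pending.clear()

    def read(self, replica_index):
        return self.replicas[replica_index]


reg = ReplicatedRegister()
reg.write_strong("v1")
reg.write_eventual("v2")
stale = reg.read(2)   # replica 2 has not received "v2" yet
reg.sync()
fresh = reg.read(2)   # after propagation all replicas agree
print(stale, fresh)
```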
The MapReduce Model (a distributed processing model) has two phases:
Map: converts input data into key-value pairs
Reduce: combines the values for each key (e.g., sums word counts)
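The classic illustration of the two phases is a word count. The following is a single-process sketch in plain Python, not Hadoop code: the function names are hypothetical, and the `shuffle` step (grouping pairs by key) stands in for what a real framework does between Map and Reduce across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key (handled by the framework between Map and Reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can run in parallel on different nodes.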
What are the core components of the Hadoop Ecosystem?
HDFS → Distributed storage
MapReduce → Processing engine
YARN → Resource management