ACID
Properties that ensure reliable database transactions: Atomicity, Consistency, Isolation, Durability.
Atomicity
A transaction happens completely or not at all (all or nothing).
Consistency
A transaction moves the database from one valid state to another.
Isolation
Concurrent transactions do not interfere with one another; each behaves as if it ran alone.
Durability
Committed data remains even after a crash.
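A minimal sketch of atomicity using Python's built-in `sqlite3` module, assuming a hypothetical `accounts` table with a `CHECK (balance >= 0)` constraint. When the connection is used as a context manager, it commits if the block succeeds and rolls back if an exception escapes. So a transfer that would overdraw an account leaves both balances unchanged, even though the credit half already ran.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commit the initial rows
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
        )
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failing debit shows the rollback undoing the credit.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected a negative balance
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

For example, after `transfer(db, "alice", "bob", 30)` succeeds, `transfer(db, "bob", "alice", 500)` returns `False` and leaves Alice at 70 and Bob at 80.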
CAP Theorem
A distributed system can guarantee only two of the following three:
Consistency
Availability
Partition Tolerance
OLTP
Online Transaction Processing; processes many small, real-time transactions.
OLAP
Online Analytical Processing; analyzes large amounts of historical data for reporting and decision-making.
SQL Database
A relational database that stores data in tables with fixed schemas.
NoSQL Database
A non-relational database designed for flexible, scalable storage of structured or unstructured data.
Throughput
The amount of work completed per unit of time (e.g., requests per second).
Latency
Waiting time; the time between sending a request and receiving a response.
Horizontal Scaling
Adds more servers to increase capacity. Capacity can keep growing beyond the limits of any single machine, which makes it the foundation of distributed systems and NoSQL databases.
Vertical Scaling
Adds more CPU, RAM, or storage to one server.
Replication
Creates copies of data on multiple servers for reliability and availability.
Sharding
Splits data across multiple servers to improve scalability.
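A minimal sketch of hash-based sharding, assuming string keys and a fixed number of shards; `shard_for` and `partition` are hypothetical helper names. A stable hash (SHA-256 here, because Python's built-in `hash` of strings is randomized per process) sends the same key to the same shard every time.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys: list[str], num_shards: int) -> dict[int, list[str]]:
    """Split keys across shards; each key lands on exactly one shard."""
    shards: dict[int, list[str]] = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

With plain modulo placement, changing `num_shards` moves most keys to a different shard; consistent hashing is a common alternative when shards are added or removed often.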
Hadoop
A framework for storing and processing large datasets across many computers.
HDFS
Hadoop Distributed File System; stores large datasets across multiple machines.
Map
Processes input data into key-value pairs.
Reduce
Groups the intermediate key-value pairs by key and combines each group into the final output.
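The classic word-count example shows how the two phases fit together. This is a single-process sketch with hypothetical function names, not Hadoop's API. The map step emits `(word, 1)` pairs, a shuffle step groups the pairs by key (the framework does this between the phases), and the reduce step sums each group.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in the input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```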
Spark
A distributed computing framework that processes data primarily in memory for high speed.
ETL
Extract, Transform, Load.
Data is transformed before loading.
ELT
Extract, Load, Transform.
Data is transformed after loading.
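The difference is where the transform step runs. A minimal sketch with Python's `sqlite3` standing in for the target system and a hypothetical two-row `RAW` dataset: ETL cleans the rows in application code before loading them, while ELT loads the raw rows first and transforms them with SQL inside the database.

```python
import sqlite3

# Hypothetical raw input: untrimmed, inconsistently cased names and ages stored as text.
RAW = [("  alice ", "42"), ("BOB", "17")]


def etl(conn: sqlite3.Connection) -> None:
    """Extract -> Transform (in Python) -> Load only the cleaned rows."""
    cleaned = [(name.strip().title(), int(age)) for name, age in RAW]
    conn.execute("CREATE TABLE people_etl (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO people_etl VALUES (?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """Extract -> Load the raw rows -> Transform with SQL inside the database."""
    conn.execute("CREATE TABLE people_raw (name TEXT, age TEXT)")
    conn.executemany("INSERT INTO people_raw VALUES (?, ?)", RAW)
    conn.execute(
        "CREATE TABLE people_elt AS SELECT"
        " upper(substr(trim(name), 1, 1)) || lower(substr(trim(name), 2)) AS name,"
        " CAST(age AS INTEGER) AS age"
        " FROM people_raw"
    )
```

Both paths end with the same cleaned rows; in ELT the raw table stays available, so the transformation can be rerun or changed later without extracting the data again.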
Data Warehouse
Stores structured, cleaned data for analytics.
Data Lake
Stores raw structured and unstructured data.
Graph
A data structure made of nodes connected by relationships.
Node
An entity in a graph.
Relationship (Edge)
A connection between two nodes.
Property
Information stored on a node or relationship.
Neo4j
A native graph database.
Cypher
Neo4j's query language.
Index-Free Adjacency
Each node directly references its neighboring nodes, so traversal follows those references instead of looking them up in an index, enabling fast graph traversal.
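A minimal in-memory sketch of the property-graph model and index-free adjacency. The `Node` and `Relationship` classes are hypothetical and do not reflect how Neo4j actually stores data on disk; the point is that each node holds direct references to its relationships, so finding neighbors follows those references instead of searching a global index.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)
class Node:
    label: str
    properties: dict = field(default_factory=dict)
    # Direct references to attached relationships (index-free adjacency).
    relationships: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Relationship:
    rel_type: str
    start: Node = field(repr=False)
    end: Node = field(repr=False)
    properties: dict = field(default_factory=dict)


def connect(a: Node, b: Node, rel_type: str, **props) -> Relationship:
    """Create a directed relationship a -> b and attach it to both nodes."""
    rel = Relationship(rel_type, a, b, props)
    a.relationships.append(rel)
    b.relationships.append(rel)
    return rel


def neighbors(node: Node, rel_type: Optional[str] = None) -> Iterator[Node]:
    """Yield adjacent nodes by following stored references; no index lookup."""
    for rel in node.relationships:
        if rel_type is None or rel.rel_type == rel_type:
            yield rel.end if rel.start is node else rel.start
```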
BFS (Breadth-First Search)
A graph traversal algorithm that explores level by level using a queue and finds shortest paths in unweighted graphs.
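A minimal BFS sketch, assuming the graph is a plain dict that maps each node to a list of neighbors (a hypothetical representation chosen for brevity). `collections.deque` serves as the FIFO queue, and recording each node's parent when it is first discovered lets the shortest path be rebuilt at the end.

```python
from collections import deque
from typing import Hashable, Optional


def bfs_shortest_path(graph: dict, start: Hashable, goal: Hashable) -> Optional[list]:
    """Return a shortest path (fewest edges) from start to goal, or None."""
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])   # FIFO: nodes are expanded level by level
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None
```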
DFS (Depth-First Search)
A graph traversal algorithm that explores as deep as possible using a stack (or recursion).
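The same kind of traversal with a Python list used as a LIFO stack (`append` to push, `pop` to remove the top) gives depth-first order. This sketch assumes the same hypothetical dict-of-lists graph; neighbors are pushed in reverse so the first-listed neighbor is explored first, matching a recursive DFS that scans neighbors in list order.

```python
from typing import Hashable


def dfs_order(graph: dict, start: Hashable) -> list:
    """Return nodes in the order an iterative depth-first search visits them."""
    visited: list = []
    seen: set = set()
    stack = [start]  # LIFO: the most recently pushed node is explored next
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # a node can be pushed more than once before it is visited
        seen.add(node)
        visited.append(node)
        # Push in reverse so the first-listed neighbor ends up on top of the stack.
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in seen:
                stack.append(neighbor)
    return visited
```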
Dijkstra's Algorithm
Finds the shortest path in a weighted graph with non-negative edge weights using a priority queue.
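A minimal sketch of Dijkstra's algorithm using Python's `heapq` module as the priority queue, assuming a hypothetical dict that maps each node to a list of `(neighbor, weight)` pairs and node labels that are comparable (e.g., strings) so heap entries with equal distances can be ordered. Instead of a decrease-key operation, it pushes a new entry whenever a shorter distance is found and skips stale entries when they are popped.

```python
import heapq
from typing import Hashable


def dijkstra(graph: dict, source: Hashable) -> dict:
    """Shortest distances from source; graph maps node -> [(neighbor, weight), ...]."""
    dist = {source: 0}
    heap = [(0, source)]  # priority queue ordered by tentative distance
    while heap:
        d, node = heapq.heappop(heap)  # closest node not yet settled
        if d > dist[node]:
            continue  # stale entry: a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            if weight < 0:
                raise ValueError("Dijkstra's algorithm requires non-negative weights")
            new_dist = d + weight
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return dist
```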
Queue
A FIFO (First-In, First-Out) data structure.
Stack
A LIFO (Last-In, First-Out) data structure.
Priority Queue
A queue that removes the item with the highest priority (lowest distance in Dijkstra).
Data Stream
A continuous, real-time, potentially infinite flow of data.
Event-Time
The time an event actually occurred.
Processing-Time
The time the system processes an event.
Watermark
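A mechanism that tracks event-time progress in a stream: it signals that no events older than a given timestamp are still expected, so windows can be closed and late events identified.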
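One common watermark strategy, sketched here with hypothetical names and integer timestamps, assumes events arrive at most `max_delay` time units out of order. The watermark trails the largest event time seen so far by that bound, and an event whose time is below the current watermark is treated as late. For example, with `max_delay=3` and events at times 10, 12, 9, 15, 11, only the last event is late, because the watermark has reached 12 by then.

```python
from typing import Optional


class WatermarkTracker:
    """Bounded out-of-orderness watermark (hypothetical helper, integer event times)."""

    def __init__(self, max_delay: int) -> None:
        self.max_delay = max_delay
        self.max_event_time: Optional[int] = None

    @property
    def watermark(self) -> Optional[int]:
        """Events with an event time below this value are no longer expected."""
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.max_delay

    def observe(self, event_time: int) -> bool:
        """Process one event; return True if it is late relative to the watermark."""
        current = self.watermark
        is_late = current is not None and event_time < current
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time  # the watermark only moves forward
        return is_late
```

What happens to a late event (dropping it, updating an already-emitted result, or routing it elsewhere) is a policy choice that differs between stream processors.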
Tumbling Window
A non-overlapping fixed-size window.
Sliding Window
An overlapping fixed-size window.
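A minimal sketch of how an event timestamp is assigned to windows, assuming integer timestamps, half-open `[start, end)` windows, and windows aligned to time 0 (the function names are hypothetical). A tumbling window yields exactly one window per event; a sliding window with `slide < size` yields several overlapping ones, `size / slide` of them when `size` is a multiple of `slide`.

```python
def tumbling_window(ts: int, size: int) -> tuple[int, int]:
    """Return the single [start, end) tumbling window containing ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts: int, size: int, slide: int) -> list[tuple[int, int]]:
    """Return every [start, end) sliding window containing ts, earliest first."""
    windows = []
    start = ts - ts % slide  # latest window start at or before ts
    while start > ts - size:  # window [start, start + size) still contains ts
        windows.append((start, start + size))
        start -= slide
    return windows[::-1]
```

For example, an event at time 7 falls in the tumbling window [0, 10) with `size=10`, and in the sliding windows [0, 10) and [5, 15) with `size=10, slide=5`.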
Session Window
A window based on user activity that closes after inactivity.
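A minimal sketch of session windowing for one user's events, assuming integer timestamps and a hypothetical `gap` parameter: events closer together than `gap` join the same session, and a pause of `gap` or more closes the session and starts a new one.

```python
def session_windows(timestamps: list[int], gap: int) -> list[list[int]]:
    """Group event times into sessions; a pause of `gap` or more closes a session."""
    sessions: list[list[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)  # still active: extend the current session
        else:
            sessions.append([ts])  # inactivity gap reached: start a new session
    return sessions
```

Unlike tumbling and sliding windows, session boundaries depend on the data, so each session can have a different length.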
Record-by-Record Processing
Processes each event immediately as it arrives.
Micro-Batching
Processes small batches of events at regular intervals.
Stateful Processing
Maintains information across multiple events.
CEP (Complex Event Processing)
Detects meaningful patterns across multiple events.
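Stateful processing and CEP often go together. This sketch assumes a hypothetical event format of `(user, event_type)` tuples and a hypothetical `"login_failed"` event type. It keeps a per-user counter as state across events and raises an alert when one user produces `threshold` consecutive failed logins, a pattern no single event reveals on its own.

```python
from collections import defaultdict
from typing import Iterable


def detect_repeated_failures(
    events: Iterable[tuple[str, str]], threshold: int = 3
) -> list[str]:
    """Return users who produce `threshold` consecutive 'login_failed' events."""
    consecutive_failures: dict[str, int] = defaultdict(int)  # state kept across events
    alerts: list[str] = []
    for user, event_type in events:
        if event_type == "login_failed":
            consecutive_failures[user] += 1
            if consecutive_failures[user] == threshold:  # alert once per streak
                alerts.append(user)
        else:
            consecutive_failures[user] = 0  # any other event breaks the sequence
    return alerts
```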
Lambda Architecture
A three-layer architecture (batch, speed, and serving layers) that combines batch and stream processing.
Kappa Architecture
An architecture with a single stream-processing pipeline that reprocesses historical data by replaying the event log.
IoT Pipeline
Processes IoT data through collection, preprocessing, processing, and visualization.
IaaS
Infrastructure as a Service; provides virtual hardware while users manage the operating system and applications.
PaaS
Platform as a Service; users deploy applications while the provider manages the platform.
SaaS
Software as a Service; complete software provided over the internet.
FaaS (Serverless)
Function as a Service; executes individual functions on demand without managing servers.
Stateless
Does not retain information between executions.
Ephemeral
Temporary; exists only while running.
Auto-Scaling
Automatically adjusts computing resources based on demand.
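A minimal sketch of a target-tracking scaling rule, using the core proportional formula that the Kubernetes Horizontal Pod Autoscaler documents: desired replicas = ceil(current replicas × current metric / target metric), clamped to configured bounds. The function name and default bounds are hypothetical, and real autoscalers add refinements such as tolerances and cooldowns that are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Scale the replica count in proportion to how far the metric is from its target."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))  # respect configured bounds
```

With 4 replicas at 90% CPU and a 60% target it asks for 6 replicas; at 30% CPU it asks for 2.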
Cold Start
Initialization delay when an inactive serverless function runs again.
Storage Disaggregation
Separates compute resources from storage so each can scale independently.
LLM
A Large Language Model trained on massive text datasets to generate human-like text.
Transformer
The neural network architecture used by modern LLMs.
GPT
Generative Pre-trained Transformer; predicts the next token.
Token
A unit of text (a word, part of a word, or a character) processed by an LLM.
Embedding
A numerical vector representation of data.
Attention
A mechanism that focuses on relevant surrounding tokens to understand context.
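A minimal pure-Python sketch of scaled dot-product attention for a single query, the form used in Transformers: scores are query-key dot products divided by the square root of the key dimension, a softmax turns them into weights, and the output is the weighted average of the value vectors. Real models work on matrices with learned projections and many attention heads; the function names here are hypothetical.

```python
import math
from typing import Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> tuple[list[float], list[float]]:
    """Scaled dot-product attention for one query; returns (output, weights)."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)  # how much the query "attends" to each token
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights
```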
Vector Database
A database that stores and searches embeddings for similarity search.
Semantic Search
Search based on meaning instead of exact keywords.
Cosine Similarity
Measures similarity using the angle between vectors.
Euclidean Distance
Measures straight-line distance between vectors.
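Both measures are short formulas. This sketch uses only the standard library (`math.dist` requires Python 3.8+) and treats vectors as plain lists of floats. The vectors `[1.0, 2.0]` and `[2.0, 4.0]` point in the same direction, so their cosine similarity is 1 even though the Euclidean distance between them is √5, which is why the two measures can rank neighbors differently.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (a . b) / (|a| |b|); 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance: sqrt(sum((a_i - b_i) ** 2))."""
    return math.dist(a, b)
```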
KNN
k-Nearest Neighbors; exact nearest-neighbor search that compares the query against every stored vector.
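Exact k-nearest-neighbor search is a brute-force scan: score the query against every stored vector and keep the top k. A minimal sketch, assuming vectors stored in a hypothetical dict of id to list of floats and cosine similarity as the score; this full scan is the cost that ANN indexes try to avoid.

```python
import heapq
import math
from typing import Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn(query: Sequence[float], vectors: dict[str, Sequence[float]], k: int) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    scores = ((_cosine(query, vec), vid) for vid, vec in vectors.items())  # one per vector
    return [vid for _, vid in heapq.nlargest(k, scores)]
```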
ANN
Approximate Nearest Neighbor search; much faster than exact search but may miss some of the true nearest neighbors.
HNSW
Hierarchical Navigable Small World; a graph-based Approximate Nearest Neighbor (ANN) search algorithm.
Chroma
An open-source vector database that uses ANN search.
Pinecone
A cloud-native vector database.
RAG (Retrieval-Augmented Generation)
A technique that retrieves relevant documents from a vector database before an LLM generates an answer, improving accuracy and reducing hallucinations.
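A minimal sketch of the retrieve-then-augment steps of RAG. The bag-of-words `embed` function is a hypothetical stand-in for a real embedding model, and a sorted list scan stands in for the vector database; the LLM call itself is not shown. The point is that the retrieved text is placed in the prompt, so the model answers from it rather than from memory alone.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (hypothetical); a real system uses an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Retrieval step: rank stored documents by similarity to the question."""
    q = embed(question)
    return sorted(documents, key=lambda d: similarity(q, embed(d)), reverse=True)[:k]


def build_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Augmentation step: put the retrieved text in front of the question."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The resulting prompt would then be sent to an LLM for the generation step.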
Hallucination
An incorrect or unsupported answer generated by an LLM.