Data pipeline
A series of processes that move data from one or more sources to a destination, transforming or processing it along the way
ELT (Extract, Load, Transform)
A data pipeline method where data is loaded first, then transformed
ETL (Extract, Transform, Load)
A data pipeline method where data is transformed before being loaded
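A minimal sketch of the ETL/ELT difference, using Python's built-in `sqlite3` as a stand-in destination. The sample rows and the table names `people` and `raw_people` are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical source data: untrimmed names and ages stored as text.
raw_rows = [("  Alice ", "42"), ("BOB", "17")]

# ETL: transform in the pipeline first, then load the cleaned rows.
etl_db = sqlite3.connect(":memory:")
etl_db.execute("CREATE TABLE people (name TEXT, age INTEGER)")
cleaned = [(name.strip(), int(age)) for name, age in raw_rows]
etl_db.executemany("INSERT INTO people VALUES (?, ?)", cleaned)

# ELT: load the raw rows as-is, then transform inside the destination with SQL.
elt_db = sqlite3.connect(":memory:")
elt_db.execute("CREATE TABLE raw_people (name TEXT, age TEXT)")
elt_db.executemany("INSERT INTO raw_people VALUES (?, ?)", raw_rows)
elt_db.execute(
    "CREATE TABLE people AS "
    "SELECT TRIM(name) AS name, CAST(age AS INTEGER) AS age FROM raw_people"
)

query = "SELECT name, age FROM people ORDER BY age"
etl_result = etl_db.execute(query).fetchall()
elt_result = elt_db.execute(query).fetchall()
```

Both paths end with the same cleaned table; the difference is where the transformation runs and whether the raw copy is kept in the destination.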
Data warehouse
A database system that stores historical information primarily for data analysis, typically populated using ETL
Data lake
A storage system for heterogeneous data that does not fit into a relational database model (e.g., video, audio, images), typically populated using ELT
Data wrangling
The process of cleaning, transforming, and organizing raw data into a usable format
OLAP (Online Analytical Processing)
A database system designed for complex queries and analysis of large volumes of data, used in financial planning and business intelligence
OLTP (Online Transaction Processing)
A database system used for managing day-to-day transactional operations, such as banking and retail
CRUD
An acronym for Create, Read, Update, Delete, which represents basic database operations
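As a rough illustration, the four CRUD operations map onto the SQL statements INSERT, SELECT, UPDATE, and DELETE. The sketch below uses Python's built-in `sqlite3`; the `items` table is a hypothetical example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# Create
db.execute("INSERT INTO items (id, name) VALUES (1, 'pen')")
# Read
created = db.execute("SELECT name FROM items WHERE id = 1").fetchone()
# Update
db.execute("UPDATE items SET name = 'pencil' WHERE id = 1")
updated = db.execute("SELECT name FROM items WHERE id = 1").fetchone()
# Delete
db.execute("DELETE FROM items WHERE id = 1")
remaining = db.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```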
Query Processor
Interprets and executes SQL queries
DBMS (Database Management System)
Acts as an interface between the database and users/applications
Transaction Manager
Ensures transactions are processed reliably, preventing conflicts and ensuring completion
Storage Manager
Translates query processor instructions into low-level file system commands for data retrieval and modification
Log
A file that records all inserts, updates, and deletes processed by a database, used for recovery after failures
Catalog (Data Dictionary)
A directory of tables, columns, indexes, and other database objects
Client
The front-end application that interacts with users and sends requests to the server
Server
The back-end application that manages the database and processes user requests
Database schema
The structure of a database, including tables, columns, data types, constraints, primary keys, and foreign keys
Tuple
A row in a relational database, represented with parentheses ()
ACID
An acronym for Atomicity, Consistency, Isolation, Durability, which ensures reliable database transactions
Atomicity
Ensures a transaction is treated as a single unit, meaning either all of it completes or none of it does
Consistency
Ensures every transaction takes the database from one valid state to another, so that all defined rules (constraints) still hold
Isolation
Ensures concurrent transactions do not interfere with each other
Durability
Ensures that once a transaction is complete, its effects are permanent and will survive system failures
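A small sketch of atomicity and consistency working together, assuming a hypothetical `accounts` table in an in-memory SQLite database. The second update violates a CHECK rule, so the whole transfer is rolled back, including the first update that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

try:
    # Used as a context manager, the connection commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'b'")
        # This would make the balance of 'a' negative, violating the CHECK rule.
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'a'")
except sqlite3.IntegrityError:
    pass  # the transfer failed as a whole

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, both balances are back at their committed values: neither half of the transaction took effect.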
Primary Key
An attribute or set of attributes that uniquely identifies a record in a table and cannot be null
Foreign Key
An attribute (or set of attributes) in one table that references the primary key of another table, establishing a relationship between the two tables
UNIQUE
A constraint that ensures all values in a column are distinct
NOT NULL
A constraint that prevents a column from having NULL values
CHECK
A constraint that ensures all values in a column satisfy a specific condition
DEFAULT
A constraint that provides a default value for a column when an INSERT does not supply one
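The column constraints above can be tried out with a short sketch. It assumes a hypothetical `users` table in an in-memory SQLite database; the helper `rejected` is also an illustrative name, not part of any library.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE users (
        id     INTEGER PRIMARY KEY,
        email  TEXT NOT NULL UNIQUE,
        age    INTEGER CHECK (age >= 0),
        status TEXT DEFAULT 'active'
    )
""")
db.execute("INSERT INTO users (email, age) VALUES ('a@example.com', 30)")

def rejected(sql):
    """Return True if the statement is refused for violating a constraint."""
    try:
        db.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# UNIQUE: the same email cannot appear twice.
duplicate_email = rejected("INSERT INTO users (email, age) VALUES ('a@example.com', 25)")
# NOT NULL: email must be present.
missing_email = rejected("INSERT INTO users (email, age) VALUES (NULL, 25)")
# CHECK: age must satisfy age >= 0.
negative_age = rejected("INSERT INTO users (email, age) VALUES ('b@example.com', -1)")
# DEFAULT: status was never supplied, so it falls back to 'active'.
default_status = db.execute("SELECT status FROM users WHERE id = 1").fetchone()[0]
```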
Composite Key
A primary key consisting of multiple attributes to uniquely identify a record
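A brief sketch of a composite key, assuming a hypothetical `enrollments` table where neither column is unique on its own but the pair is.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE enrollments (
        student_id INTEGER,
        course_id  INTEGER,
        PRIMARY KEY (student_id, course_id)
    )
""")
# The same student may take several courses: each pair is distinct.
db.execute("INSERT INTO enrollments VALUES (1, 101)")
db.execute("INSERT INTO enrollments VALUES (1, 102)")

try:
    # Repeating an existing (student_id, course_id) pair is refused.
    db.execute("INSERT INTO enrollments VALUES (1, 101)")
    duplicate_pair_blocked = False
except sqlite3.IntegrityError:
    duplicate_pair_blocked = True

row_count = db.execute("SELECT COUNT(*) FROM enrollments").fetchone()[0]
```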
Hierarchical Database
A database model organized in a tree-like structure with parent-child relationships (one-to-many)
Network Database
A database model organized in a graph-like structure that supports many-to-many relationships using "links"
Relational Database
A database model that organizes data into tables with rows and columns, using foreign keys to establish relationships
Data independence
The ability to modify the database schema at one level without having to change the schema at higher levels
DDL (Data Definition Language)
A part of SQL that defines and modifies the database schema (CREATE, ALTER, DROP)
DML (Data Manipulation Language)
A part of SQL used to manipulate data in the database (SELECT, INSERT, UPDATE, DELETE)
DCL (Data Control Language)
A part of SQL used to control access to the database (GRANT, REVOKE)
DTL (Data Transaction Language)
A part of SQL used to manage transactions within the database (COMMIT, ROLLBACK, SAVEPOINT); more commonly called TCL (Transaction Control Language)
DQL (Data Query Language)
A part of SQL used to query and retrieve data from the database (SELECT)
CHAR vs VARCHAR
CHAR(n) stores fixed-length strings and pads with spaces, while VARCHAR(n) stores variable-length strings up to n characters
DECIMAL(x, y)
A data type used to store decimal numbers, where x is the total number of digits and y is the number of digits after the decimal point
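To make the digit counts concrete, here is a small sketch that checks whether a finite number fits DECIMAL(x, y) exactly. The function `fits_decimal` is a hypothetical helper; real database systems differ in what they do with values that do not fit (many round extra fractional digits instead of rejecting them).

```python
from decimal import Decimal

def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Check whether a finite value fits DECIMAL(precision, scale) without rounding."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    # Digits after the decimal point (none if the exponent is non-negative).
    frac_digits = max(-exponent, 0)
    # Digits before the decimal point.
    int_digits = max(len(digits) + exponent, 0)
    # DECIMAL(x, y) leaves x - y digits for the integer part.
    return frac_digits <= scale and int_digits <= precision - scale

fits = fits_decimal("123.45", 5, 2)          # 3 integer digits, 2 fractional digits
too_wide = fits_decimal("1234.5", 5, 2)      # 4 integer digits, only 3 allowed
too_precise = fits_decimal("12.345", 5, 2)   # 3 fractional digits, only 2 allowed
```

Under these assumptions, DECIMAL(5, 2) covers values from -999.99 to 999.99.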
NULL Value Interpretations
Unknown Value (exists but unknown), Not Applicable (does not apply), and Missing Value (exists but not recorded)
Referential Integrity
Ensures that relationships between tables are maintained, preventing orphaned records
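A sketch of referential integrity in practice, assuming hypothetical `departments` and `employees` tables. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been run on the connection; other systems enforce them by default.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
db.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("""
    CREATE TABLE employees (
        id      INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES departments(id)
    )
""")
db.execute("INSERT INTO departments VALUES (1, 'Sales')")
db.execute("INSERT INTO employees VALUES (10, 1)")

try:
    # Department 99 does not exist, so this row would be an orphan.
    db.execute("INSERT INTO employees VALUES (11, 99)")
    orphan_insert_blocked = False
except sqlite3.IntegrityError:
    orphan_insert_blocked = True

try:
    # Deleting a department that still has employees would orphan them.
    db.execute("DELETE FROM departments WHERE id = 1")
    parent_delete_blocked = False
except sqlite3.IntegrityError:
    parent_delete_blocked = True
```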
Ted Codd
The father of the relational database model who introduced normalization to reduce redundancy and improve data integrity
Normalization
The process of organizing data in a database to reduce redundancy and improve data integrity
Sorting Numbers vs Strings
Numbers are ordered by magnitude (51 < 200), while strings are sorted lexicographically ('200' < '51')
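The same ordering difference shows up in most languages. A quick Python sketch with made-up sample values:

```python
numbers = [51, 200, 9]
strings = ["51", "200", "9"]

# Numeric comparison orders by magnitude.
sorted_numbers = sorted(numbers)              # [9, 51, 200]
# String comparison goes character by character, so '2' < '5' < '9'.
sorted_strings = sorted(strings)              # ['200', '51', '9']
# Converting each string to a number for comparison restores numeric order.
sorted_as_numbers = sorted(strings, key=int)  # ['9', '51', '200']
```

This is why numeric data stored in a text column can sort in a surprising order.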
AUTO_INCREMENT
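A column attribute (spelled AUTO_INCREMENT in MySQL) that automatically generates a unique numeric value for each new row; other database systems offer equivalents such as identity columns or sequences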
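A short sketch of automatically generated keys. It uses SQLite through Python's built-in `sqlite3`, where the keyword is spelled `AUTOINCREMENT` rather than MySQL's `AUTO_INCREMENT`; the `notes` table is hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)")

# No id is supplied; the database assigns one for each new row.
db.execute("INSERT INTO notes (body) VALUES ('first')")
db.execute("INSERT INTO notes (body) VALUES ('second')")

ids = [row[0] for row in db.execute("SELECT id FROM notes ORDER BY id")]
```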