1/23
Vocabulary flashcards covering key concepts, terms, and definitions from the data warehousing lecture notes.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
Data Warehouse
A repository that stores data from multiple sources, organized, cleansed, and standardized for enterprise use; stores current and historical data for analysis.
Subject-oriented
Data is organized around a theme or topic (e.g., sales, product).
Integrated
Consolidates data from disparate sources to create consistency across the enterprise.
Time-variant
Data is organized in time intervals (e.g., weekly, monthly, quarterly).
Non-volatile
Once in the warehouse, data remains unchanged and is not deleted.
Summarized
Data is often aggregated for optimized reporting.
ETL
Extract, Transform, Load — the process of moving and preparing data from source systems into the data warehouse.
Extract
Phase of ETL where data is pulled from operational or transactional systems.
Transform
Phase of ETL where data is cleaned, formatted, and converted to fit the warehouse schema.
Load
Phase of ETL where transformed data is loaded into the data warehouse for analytics and reporting.
OLAP
Online Analytical Processing; a data processing framework that supports complex analytical queries and decision-making, typically using a dimensional model.
OLTP
Online Transactional Processing; a data processing framework for day-to-day operations, real-time processing, simple queries, and a relational model.
Data Marts
Subsets of a data warehouse focused on a specific department or unit and optimized for its needs.
Data Lakes
A decentralized approach storing raw, untransformed data from operational sources; often uses ELT (loading first, transforming later).
On-Premises
Traditional deployment of data warehouses inside an organization’s local infrastructure.
Cloud
Deployment of data warehouses in the cloud, with upkeep managed by a third party, offering scalability and off-site hosting.
Big Data
Massive amounts of data, often raw and unstructured, coming in various formats.
NoSQL
Non-relational databases capable of handling structured, semi-structured, and unstructured data at large scale.
Data Warehouse vs Database
Data warehouse is designed for analysis and reporting with integrated, time-variant data; database supports day-to-day transactions with simple queries.
ELT
Extract, Load, Transform — data is loaded first and then transformed inside the warehouse or data store.
ETL vs ELT
ETL transforms data before loading; ELT loads raw data first and then transforms it inside the warehouse.
Data quality and consistency
Benefits of DW include improved data quality and consistency due to cleansing and standardization across sources.
Data ownership concerns
Disadvantages of DW include potential governance and ownership issues when integrating data from multiple sources.
Deployment considerations
DW deployments can be traditional on-premises or modern cloud-based, affecting maintenance and focus on business goals.