1/63
Flashcards covering key concepts in Data Warehousing (DWH), ETL, DWH Schemas, Dimension Tables, Fact Tables, and Business Intelligence (BI).
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
What is a Data Warehouse (DWH)?
A repository that stores current and historical data from various sources for decision support, analytics, business intelligence, and data mining.
What is Data Warehousing?
The process of building and using a data warehouse to support analytical reports, ad hoc queries, and decision making.
What is an Ad Hoc Query?
A dynamically constructed query meant to answer a specific question at a particular moment, without a predefined structure.
What is an Ad Hoc Report?
Created for one-time use, shared on the spot, often done by non-technical BI users, and designed for a single specific purpose.
What is OLAP (Online Analytical Processing)?
Software used to perform multidimensional analysis at high speed on large volumes of data from a data warehouse or data mart.
What does it mean for a data warehouse to be Subject-Oriented?
Designed to analyze specific subjects like Products, Customers, Sales, Revenues, giving a complete view of a subject.
What does it mean for a data warehouse to be Integrated?
Combines data from various heterogeneous sources such as flat files (CSV) and relational databases (SQL Server, Oracle, DB2).
What does it mean for a data warehouse to be Time-Variant?
Stores historical data over consistent time intervals (e.g., every 3 or 6 months).
What does it mean for a data warehouse to be Non-Volatile?
Old data is not deleted when new data is added.
What is Information Processing in the context of DWH usage?
Produces statistical analysis and reports in tables, charts, graphs, etc.
What is Analytical Processing (OLAP) in the context of DWH usage?
Supports OLAP operations like Slice & Dice, Drill Down, and Pivoting.
What is Data Mining in the context of DWH usage?
Discovers hidden patterns and associations in the data; used to build analytical models.
What is a DWH Schema?
A blueprint or architecture — a logical structure describing how data is organized and related in a data warehouse.
What is a Star Schema?
Named for its star-like shape with a central Fact Table connected to multiple Dimension Tables.
What is a Fact Table?
Stores measurements, metrics, or facts related to a business process (e.g., revenue, sales).
What is a Dimension Table?
Contains descriptive attributes related to facts (e.g., Product details, Salesperson info, Time, Location).
What is a Snowflake Schema?
An extension of the Star Schema where data is normalized into sub-dimension tables to reduce redundancy.
What is a Fact Constellation Schema (Galaxy Schema)?
A collection of star schemas where multiple fact tables share dimension tables.
What is a Data Lake?
All data types (structured, semi-structured, unstructured); schema is applied after storing (Schema-on-Read); ideal for deep analysis & data scientists; uses ELT.
What is a Data Warehouse in comparison to a Data Lake?
Structured data with schema defined before storing (Schema-on-Write); ideal for operational users & BI analysts; uses ETL.
What is OLTP (Online Transaction Processing)?
Day-to-day operations with small, frequent transactions using traditional DBMS.
What is OLAP (Online Analytical Processing) in comparison to OLTP?
Business analysis and reporting with large volumes of historical data using a Multidimensional DB.
What is a Data Warehouse in terms of scope, size and complexity?
Scope is the entire organization, size is 100GB – 1TB+, and design complexity is complex with build time of 1 month – 1 year.
What is a Data Mart in Comparison to a Data Warehouse?
Scope is a specific department, size is < 100GB, and design complexity is simple with build time of a few weeks – few months.
What does ETL stand for?
Extract, Transform, Load – a core process in Data Warehousing.
What is the Extraction phase in ETL?
Pulling raw data from different sources such as relational databases, NoSQL databases, XML, and flat files.
What is a staging area used for in ETL?
A temporary storage area used to ensure safe transformation and rollback, as raw data may be inconsistent or corrupted.
What is the Transformation phase in ETL?
Applying rules or functions to standardize the data, including filtering, cleaning, standardization, deduplication, joining, splitting, and sorting.
What is the Loading phase in ETL?
Loading transformed data into the final Data Warehouse; can be a Full Load or Incremental Load.
What is a Full Load in ETL?
All records are loaded from scratch every time.
What is an Incremental Load in ETL?
Only new or changed data is loaded; efficient and ideal for ongoing updates.
What is ELT (Extract, Load, Transform)?
Extract, Load, Transform – where data is loaded into the DWH first, then transformed within the warehouse.
What are the key characteristics of ETL?
Transformation occurs before loading, staging area is required, less flexibility, faster for small data, lower complexity, mature and stable tools.
What are the key characteristics of ELT?
Transformation occurs after loading, staging area is not required, high flexibility, better for large-scale raw data, higher setup & processing logic, still evolving tools.
Describe the Business Intelligence (BI) Lifecycle.
Data is collected from internal and external sources, converted into usable information using tools, presented in reports, and monitored to predict future outcomes.
What is Business Intelligence (BI)?
Converting raw data into meaningful insights to support business decisions.
What is the Information Gathering stage of BI?
Data is collected from internal sources (CRM, ERP, financial databases) and external sources (surveys, forms, polls).
What is the Analysis stage of BI?
Raw data is converted into usable information using tools such as spreadsheets, custom queries, and visualization tools.
What is the Reporting stage of BI?
Analyzed data is presented in a human-readable format through tables, graphs, dashboards, and executive summaries.
What is the Monitoring stage of BI?
Real-time observation of key metrics through dashboards and KPIs to catch issues and observe performance.
What is the Prediction stage of BI?
Uses current and historical data to predict future outcomes through data mining and predictive modeling.
What are Degenerate Dimensions?
A dimension key without a corresponding dimension table, existing within the fact table.
What are Junk Dimensions?
Combining low-cardinality fields into a single table to reduce clutter in fact tables.
What are Role-Playing Dimensions?
A single dimension appearing multiple times in a fact table, each with a different meaning.
What are Conformed Dimensions?
A dimension that has the same meaning and structure across multiple fact tables, ensuring consistency in reporting.
What are Outrigger Dimensions?
A dimension that references another dimension, creating chained relationships.
What are Snowflake Dimensions?
A normalized dimension table — split into multiple related tables to reflect hierarchy.
What are Slowly Changing Dimensions (SCDs)?
Tracking historical changes in dimension attributes using various techniques (Type 0 to Type 5+).
What are Fast Changing Dimensions?
Attributes that change very frequently; moved to a separate Mini-Dimension to avoid too many rows.
What are Shrunken Rollup Dimensions?
Summarized versions of existing dimensions used for aggregate fact tables.
What are Multi-Valued Dimensions?
One fact row relates to multiple dimension values.
What are Swappable Dimensions?
A dimension that has alternate versions of itself, selected at query time, each with different structure and attributes.
What are Heterogeneous Dimensions?
Used when a business serves multiple product types with different attributes but a shared customer base.
What is a Fact Table?
Stores quantitative data (facts or measures), describes business events or processes, and contains foreign keys to dimension tables.
What are the steps to design a Fact Table?
Choose the business process, identify the grain, identify the dimension tables, and define the facts.
What is a Transaction Fact Table?
One row per transaction, grows quickly, captures fine-grained data.
What is a Periodic (Aggregated) Fact Table?
One row per time period (e.g., daily, monthly) used for summary reporting.
What is an Accumulated Snapshot Fact Table?
One row represents an entire process lifecycle, updated over time as milestones are completed.
What are Additive Facts?
Can be summed across all dimensions; the most common type.
What are Semi-Additive Facts?
Can be summed across some dimensions but not across time.
What are Non-Additive Facts?
Cannot be summed across any dimension; usually ratios or percentages.
What are Derived Facts (Calculated Facts)?
Created from other facts; may or may not be stored physically.
What are Textual Facts?
Text data such as indicators or flags; should generally be avoided in fact tables.
What are Factless Fact Tables?
No numerical measures, only foreign keys used to record events or associations.