1/21
Flashcards covering the fundamentals of Business Intelligence, Data Warehouse characteristics, architecture evolution, and core data storage concepts.
Name | Mastery | Learn | Test | Matching | Spaced | Call with Kai |
|---|
No analytics yet
Send a link to your students to track their progress
Business Intelligence (BI)
The processes for collecting and analyzing data, the technologies used in these processes, and the information obtained from these processes with the purpose of facilitating corporate decision making.
Data Warehouse (Inmon 1993 definition)
A centralized, subject-oriented, integrated, non-volatile, time-variant collection of data that supports management’s decision-making process.
Subject-oriented
A characteristic of a Data Warehouse where data is organized around major subjects of the enterprise, such as customers, products, and sales, rather than application areas.
Integrated
A characteristic of a Data Warehouse where source data from different sources is made consistent to present a unified view of the data.
Time-variant
A characteristic of a Data Warehouse indicating that the data is accurate and valid only at some point in time or over some time interval.
Non-volatile
A characteristic of a Data Warehouse where data is not updated in real-time but refreshed periodically; new data supplements old data rather than replacing it.
OLTP (Online Transactional Processing)
Systems that support operational processing with real-time data latency, detailed data granularity, and high transaction throughput.
OLAP (Online Analytical Processing)
A technology that organizes large business databases and supports complex analysis and queries without negatively affecting transactional systems.
Data Silos
Large amounts of data generated across multiple disparate, source operational systems that are often isolated from one another.
ETL (Extract, Transform, Load)
A traditional method for structured data where data is extracted from sources, transformed in a pipeline or middleware, and then loaded into the warehouse.
ELT (Extract, Load, Transform)
A modern data integration method, popular with cloud warehouses, where raw data is extracted and loaded into the warehouse first, then transformed inside the warehouse using SQL or MPP engines.
Fact Tables
Tables that contain quantitative data or measurable facts and reference multiple dimension tables via foreign keys.
Dimension Tables
Tables that contain descriptive attributes related to dimensions of facts, used to filter, group, and label facts in reports.
Star Schema
A multi-dimensional schema consisting of a central fact table joined to a single table for each dimension.
Snowflake Schema
A variation of the star schema where dimensional tables are organized as a hierarchy by normalizing them.
Data Mart
A smaller, more manageable, and relevant dataset for a specific business area or department, such as HR or Finance.
Lakehouse
A data management system based on low-cost and directly-accessible storage that provides traditional analytical DBMS features such as ACID transactions, data versioning, auditing, and query optimization.
Business Metadata
Metadata that includes data ownership information, business definitions, and changing policies.
Technical Metadata
Metadata including database system names, table and column names and sizes, data types, and allowed values.
Operational Metadata
Metadata that Includes currency of data, data lineage, refresh times, and the history of migrations and transformations applied.
Data Mining
The process of discovering meaningful new correlations and patterns by mining large amounts of data using statistical, mathematical, and AI techniques.
OLAP Cubes
1990s technology that maintained multi-dimensional arrays as pre-computed aggregations to speed up queries.