CS 585: Lecture 2, Business Intelligence

0.0(0)
studied byStudied by 0 people
0.0(0)
full-widthCall Kai
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
GameKnowt Play
Card Sorting

1/15

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced
Call with Kai

No study sessions yet.

16 Terms

1
New cards

Business Intelligence (BI)

BI = integrated tools and processes to capture, integrate, store and analyze data so managers can makebetterdecisions.Itturnsrawdata→information→knowledge→(ideally)wiser decisions.

2
New cards

BI benefits & framework

BI gives better decisions, a shared architecture, one common data repository (“single version of truth”), and common reporting/analysis tools. Typical flow: operational + external data → ETL → data warehouse + data marts → reporting, analytics, monitoring, dashboards; all used by people within defined processes and governance.

3
New cards

Operational vs decision-support data

Operational data lives in highly normalized OLTP DBs, focused on fast, accurate transaction processing (current, detailed, many updates, simple/narrow queries).
Decision-support data is historical, summarized, subject-oriented and multi-dimensional, focused on fast retrieval and complex, broad queries with drill-down (more detail) and roll-up (more aggregation).

4
New cards

Data warehouse vs operational DB

Operational DBs are siloed by function (orders, payments, inventory), use local formats and semantics, and constantly change.
A data warehouse is integrated (common codes/definitions), subject-oriented (product, region, customer, etc.), time-variant (keeps history with explicit time dimension), and non-volatile (loaded periodically, rarely updated in place).

5
New cards

ETL & schema-on-write

ETL extracts data from operational sources, transforms it (clean, standardize, integrate,
aggregate), then loads it into the warehouse. Because schema is defined before loading, this is
“schema-on-write”.

6
New cards

Star schema – core warehouse design

Star schema maps multi-dimensional data into a relational model.
Fact table: numeric business measures (e.g., sales_amount), with foreign keys to dimensions.
Dimension tables: descriptive attributes (product, customer, time, location, etc.) used to filter,
group, and “slice and dice”.
Attribute hierarchies: ordered levels (e.g., Year → Quarter → Month → Day) used for drill-
down/roll-up.
Cube view: each fact row represents one point in the multi-dimensional cube (e.g., product ×
time × location).

7
New cards

Star schema representation

Fact table in the center, one row per event (sale, shipment, etc.), with a composite primary key ofsignatures: equals(geom), disjoint(geom), intersects(geom), touches(geom), crosses(geom),

all dimension FKs plus numeric measures.
Dimension tables surround it, each with a PK referenced by the fact table (1:M from dimension
to fact).

8
New cards

Snowflake schema – super important

In a snowflake schema, dimension tables are normalized into multiple related tables along their
hierarchies (e.g., Store → City → State → Region instead of one wide Location table).
Pros: cleaner dimension design, avoids repeated region/state values.
Cons: more joins to answer queries (to roll up from store to state or region).
Snowflake is really about how we implement attribute hierarchies: each level gets its own table, so drill-down/roll-up is following these 1:M chains.

9
New cards

Ways to structure a warehouse & multiple facts

Star: fact table → single-level (denormalized) dimension tables.
Snowflake: fact table → lowest-level dimension; that dimension links up a normalized chain. All-in-one variants: put higher-level attributes directly into fact, or have separate fact tables at region/state/city levels to avoid long joins but use more space.
Multiple fact tables (e.g., SALES_REGION, SALES_STATE, SALES_CITY, SALES_LOCATION) can speed queries at different aggregation levels.

10
New cards

Star vs snowflake – trade-offs

Star: fewer joins, simpler queries, more redundancy in dimensions; good for performance and user-friendliness.


Snowflake: more normalized dimensions, less redundancy, but more joins and sometimes more complex queries.

BI design often mixes them and may pre-aggregate into extra fact tables to avoid expensive GROUP BY operations on huge base facts.

11
New cards


Data analytics (inside BI)

Data analytics = mathematical/statistical/modeling methods used on warehouse data to understand what happened (explanatory analytics) and predict what will happen (predictive analytics). This overlaps heavily with “data mining”.

12
New cards

OLAP (online analytical processing)

OLAP = multi-dimensional analysis over a data warehouse to support decision-making and
business modeling.
Architecture: sources → ETL → DW → OLAP engine/server → client tools (Excel, dashboards, GUIs) → reports, cubes, visualizations.
Departments may have local data marts (small, subject-specific subsets of the DW) for fast, focused analysis.

13
New cards

Data marts

Data mart = smaller, single-subject slice of the warehouse (e.g., Sales, Marketing). Cheaper and faster to build than a full DW; tailored to one group’s decision-support needs.

14
New cards

ROLAP vs MOLAP

ROLAP (Relational OLAP): stores star/snowflake schemas in a relational DBMS; uses SQL +
extensions; easy to add dimensions; good for ad hoc queries and very large relational datasets.
MOLAP (Multidimensional OLAP): stores data in multidimensional “cubes” (n-D arrays); often cached in memory; very fast for predefined dimensions and common cube operations; more rigid (dimensions fixed when cube built).
Roughly: ROLAP = flexible schemas on relational DB; MOLAP = pre-built cubes with high performance.

15
New cards

SQL OLAP extensions – ROLLUP & CUBE

ROLLUP: GROUP BY with automatic subtotals along the listed dimensions in order, plus a grand total.

CUBE: GROUP BY with all combinations of the listed dimensions, generating subtotals for
every subset plus grand total. These are used to precompute aggregates for OLAP-style analysis.

16
New cards

Modern architectures: warehouse, lake, lakehouse

Traditional warehouse: ETL, historical data, schema-on-write, mostly structured tables.
Data lake: raw, semi-structured, and streaming data stored first (often in cheap cloud storage);
schema applied later at read time (“schema-on-read”, ELT).
Lakehouse: merges lake + warehouse ideas; uses the lake as storage but layers
warehouse/BI/ML capabilities on top. Concepts like data mesh (domain-owned, federated data)
extend how organizations own and publish data products.