Data Frame
table-like object where each column can have a different type.
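A minimal sketch of a data frame with mixed column types. The column names and values are made up for illustration.

```r
# Hypothetical data: each column holds a different type
df <- data.frame(
  name   = c("Ann", "Bob", "Cy"),   # character
  age    = c(31L, 45L, 28L),        # integer
  member = c(TRUE, FALSE, TRUE),    # logical
  stringsAsFactors = FALSE
)

# Inspect the class of each column
col_types <- sapply(df, class)
print(col_types)
```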
Factor
categorical variable storing labels as numeric codes + levels.
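A small illustration of how a factor pairs labels with integer codes. The labels and their ordering here are assumed for the example.

```r
# Hypothetical labels; levels are given explicitly to control the code order
sizes <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"))

lv    <- levels(sizes)       # the distinct labels, in level order
codes <- as.integer(sizes)   # the underlying numeric codes (positions in lv)

print(lv)
print(codes)
```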
complete.cases()
Returns TRUE for rows with no missing values. !complete.cases() flags rows that have missing values (use for removal or checking).
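A short sketch of both uses, checking and removal, on a made-up data frame:

```r
# Hypothetical data with missing values in rows 2 and 3
df <- data.frame(x = c(1, NA, 3, 4),
                 y = c("a", "b", NA, "d"))

has_na       <- !complete.cases(df)      # TRUE where a row has any NA
rows_with_na <- df[has_na, ]             # checking: inspect incomplete rows
clean        <- df[complete.cases(df), ] # removal: keep complete rows only

print(rows_with_na)
print(clean)
```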
Elbow Method
Run kmeans() for a range of k values and plot the total within-cluster sum of squares against k. The "elbow," where the curve stops dropping sharply, suggests a reasonable number of clusters.
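One way this could look in R, as a sketch. The data is synthetic (three well-separated blobs), and the range of k and the nstart value are arbitrary choices for the example.

```r
set.seed(42)

# Synthetic 2-D data: three blobs of 50 points each (assumed for illustration)
pts <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2),
  matrix(c(rnorm(50, mean = 0, sd = 0.5),
           rnorm(50, mean = 5, sd = 0.5)), ncol = 2)
)

# Total within-cluster sum of squares for each candidate k
ks  <- 1:6
wss <- sapply(ks, function(k) {
  kmeans(pts, centers = k, nstart = 10)$tot.withinss
})

# Draw the elbow plot when running interactively
if (interactive()) {
  plot(ks, wss, type = "b",
       xlab = "Number of clusters k",
       ylab = "Total within-cluster sum of squares")
}
print(round(wss, 1))
```

With data like this, the curve would be expected to flatten around k = 3, but the elbow is read by eye and is not always clear-cut on real data.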
results$cluster
cluster assignment per record from kmeans().
results$withinss
within-cluster sum of squares, one value per cluster, from kmeans().
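A brief sketch of reading these fields from a kmeans() result. The variable name results follows the cards; the data is synthetic.

```r
set.seed(1)

# Synthetic data: two blobs of 20 points each (assumed for illustration)
pts <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
             matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2))

results <- kmeans(pts, centers = 2, nstart = 5)

print(results$cluster)         # one cluster label per record
print(table(results$cluster))  # cluster sizes
print(results$withinss)        # one value per cluster
print(results$tot.withinss)    # sum of the per-cluster values
```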
Business Intelligence (BI)
Process of transforming raw data into meaningful insights to support decision-making.
BI Architecture Components
Data Sources: internal/external systems; ETL (Extract, Transform, Load): cleans and loads data; Data Warehouse: centralized storage; Analytics Tools: OLAP, visualization, reporting.
Data Warehousing
Centralized repository of subject-oriented, integrated, time-variant, nonvolatile data, built to support decision-making.
Subject-oriented
organized by topic (sales, finance).
Integrated
data combined from multiple sources.
Time-variant
contains historical data.
Nonvolatile
data is read-only once loaded; users cannot change or update it.
Data Mart vs Data Warehouse
Data Warehouse is enterprise-wide, while Data Mart is department-level.
Data Warehouse Scope
Enterprise-wide.
Data Mart Scope
Department-level.
Data Warehouse Source
Multiple systems.
Data Mart Source
Subset of DW or single system.
Data Warehouse Type
Centralized.
Data Mart Type
Focused.
Dependent mart
draws data from a central warehouse.
Independent mart
built directly from operational systems.
Metadata
Data about data.
OLTP
Transaction processing.
OLAP
Analytical reporting.
Data Type in OLTP
Real-time, detailed.
Data Type in OLAP
Historical, summarized.
Speed in OLTP
Fast writes.
Speed in OLAP
Fast reads.
Users of OLTP
Operational staff.
Users of OLAP
Analysts/managers.
Example of OLTP
POS system.
Example of OLAP
Data warehouse.
Slice Operation
Fix a single value on one dimension to get a sub-cube (e.g., sales for January only).
Dice Operation
Select specific values on two or more dimensions to get a smaller sub-cube (e.g., product A in the West region).
Roll-up Operation
Summarize data (e.g., daily → monthly).
Drill-down Operation
Increase detail (e.g., region → city).
Pivot Operation
Rotate data to view from different perspectives.
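The five OLAP operations can be imitated on a plain data frame with base R. This is only a sketch on a made-up sales table; real OLAP tools work on multidimensional cubes.

```r
# Hypothetical sales table with dimensions region, city, product, month
sales <- data.frame(
  region  = c("East", "East", "West", "West", "East", "West"),
  city    = c("Boston", "NYC", "LA", "Seattle", "Boston", "LA"),
  product = c("A", "B", "A", "B", "B", "A"),
  month   = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  amount  = c(100, 150, 200, 120, 80, 60)
)

# Slice: fix one value on one dimension
jan <- subset(sales, month == "Jan")

# Dice: restrict two or more dimensions
west_a <- subset(sales, region == "West" & product == "A")

# Roll-up: summarize to a coarser level (city -> region)
by_region <- aggregate(amount ~ region, data = sales, FUN = sum)

# Drill-down: add detail again (region -> city)
by_city <- aggregate(amount ~ region + city, data = sales, FUN = sum)

# Pivot: cross-tabulate, then rotate rows and columns
pivot   <- xtabs(amount ~ region + month, data = sales)
rotated <- t(pivot)

print(by_region)
print(pivot)
print(rotated)
```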
ETL Process
Extract: Pull data from multiple sources; Transform: Clean, standardize, and format data; Load: Insert into target data warehouse.
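A toy sketch of the three steps in base R. The two source tables are invented, and a temporary CSV file stands in for the target warehouse table; a real pipeline would read from and write to actual systems.

```r
# Extract: pull rows from two hypothetical sources
pos <- data.frame(id = c(1, 2, 2),
                  amount = c("10.5", "20", "20"),
                  store = c(" east", "West ", "West "),
                  stringsAsFactors = FALSE)
web <- data.frame(id = c(3, 4),
                  amount = c("7.25", NA),
                  store = c("WEB", "web"),
                  stringsAsFactors = FALSE)
extracted <- rbind(pos, web)

# Transform: drop duplicate rows, fix types, standardize text, drop incomplete rows
transformed <- unique(extracted)
transformed$amount <- as.numeric(transformed$amount)
transformed$store  <- toupper(trimws(transformed$store))
transformed <- transformed[complete.cases(transformed), ]

# Load: write to the target (a temp CSV stands in for the warehouse)
target <- tempfile(fileext = ".csv")
write.csv(transformed, target, row.names = FALSE)

loaded <- read.csv(target)
print(loaded)
```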
Independent Data Mart Architecture
Isolated, small-scale.
Data Mart Bus Architecture
Bottom-up, integrated marts.
Centralized Warehouse
Top-down, enterprise-wide.
Business Performance Management (BPM)
Goal: Monitor and improve performance aligned with strategy.
KPI
Quantifiable performance indicator (e.g., profit margin, churn rate).
Balanced Scorecard
Links financial and non-financial KPIs to strategy.
Six Sigma
Focus on reducing process variation and defects.
Data Mining
The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
CRISP-DM Framework
Cross Industry Standard Process for Data Mining with six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment.
Supervised Learning
Learn from labeled data to predict known outcomes.
Unsupervised Learning
Find hidden structure without labeled data.
Classification
Predict categorical outcomes (Yes/No, High/Low).
Prediction
Predict continuous values (Sales $, Temperature).
Confusion Matrix
Summarizes correct vs incorrect classifications.
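A confusion matrix can be built in R with table(). The actual and predicted labels below are invented for the sketch.

```r
# Hypothetical actual vs predicted class labels for 8 records
lv        <- c("Yes", "No")
actual    <- factor(c("Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No"),
                    levels = lv)
predicted <- factor(c("Yes", "No", "No", "Yes", "No", "Yes", "Yes", "No"),
                    levels = lv)

# Rows are actual classes, columns are predicted classes
cm <- table(Actual = actual, Predicted = predicted)

# Accuracy: correct classifications (the diagonal) over all records
accuracy <- sum(diag(cm)) / sum(cm)

print(cm)
print(accuracy)
```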
Association (Market Basket Analysis)
Goal: Discover items that frequently occur together in transactions.
Support
Proportion of transactions that contain all items in the rule (antecedent and consequent together).
Confidence
Probability of consequent given antecedent.
Lift
Confidence divided by the support of the consequent. Lift > 1 means the items occur together more often than random chance would predict.
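The three measures can be worked out by hand for a single rule. This sketch uses five invented transactions and the rule {bread} -> {milk}; the helper support() is defined here for illustration and is not from any package.

```r
# Hypothetical market baskets
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)

# Illustrative helper: share of transactions containing all given items
support <- function(items) {
  mean(sapply(transactions, function(basket) all(items %in% basket)))
}

# Rule: {bread} -> {milk}
supp_rule <- support(c("bread", "milk"))     # 3 of 5 baskets
conf_rule <- supp_rule / support("bread")    # P(milk | bread)
lift_rule <- conf_rule / support("milk")     # confidence vs baseline P(milk)

print(c(support = supp_rule, confidence = conf_rule, lift = lift_rule))
```

Here the lift is 0.9375, slightly below 1, so in this toy data buying bread does not make milk more likely than its baseline rate.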
Clustering
Group similar records (unsupervised).
Segmentation
Broader business application of clustering (e.g., customer types).