Data Analysis and Business Intelligence: Key Concepts and Techniques

Data Frame

Table-like object in which each column can have a different type; all columns have the same length.

Factor

Categorical variable stored as integer codes plus a set of levels (the category labels).
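
As a rough sketch of that encoding, the Python below mimics an R factor. It assumes R's default behavior of sorting the levels alphabetically and numbering codes from 1; to_factor is a hypothetical helper, not an R or library function.

```python
def to_factor(labels):
    """Encode labels as (codes, levels), mimicking R's default factor():
    levels are the sorted unique labels; codes are 1-based indexes into levels."""
    levels = sorted(set(labels))
    position = {level: i + 1 for i, level in enumerate(levels)}  # 1-based, as in R
    codes = [position[label] for label in labels]
    return codes, levels


codes, levels = to_factor(["High", "Low", "High", "Medium"])
print(codes)   # [1, 2, 1, 3]
print(levels)  # ['High', 'Low', 'Medium']
```

Decoding is just an index lookup: levels[code - 1] recovers each original label.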

complete.cases()

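complete.cases() returns TRUE for each row with no missing values (NA); !complete.cases() therefore flags rows with at least one missing value (use it to check for or remove incomplete rows).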
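
A Python analogue of the same row check, offered as a sketch only: None stands in for R's NA, and complete_cases is a hypothetical helper rather than a library function.

```python
def complete_cases(rows):
    """One boolean per row: True when no field is missing (None plays the role of NA)."""
    return [all(value is not None for value in row) for row in rows]


rows = [
    (1, "a", 2.5),
    (2, None, 3.1),  # missing value
    (3, "c", None),  # missing value
    (4, "d", 0.7),
]
flags = complete_cases(rows)
incomplete = [row for row, ok in zip(rows, flags) if not ok]  # like !complete.cases(df)
clean = [row for row, ok in zip(rows, flags) if ok]           # like df[complete.cases(df), ]
print(flags)  # [True, False, False, True]
```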

Elbow Method

Run kmeans() for a range of k values and plot the total within-cluster sum of squares against k. The "elbow" point, where adding more clusters stops reducing that total substantially, suggests a good number of clusters.
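
The elbow method can be sketched in plain Python as below. This swaps R's kmeans() (which uses random starts) for a small hand-written Lloyd's algorithm with a deterministic farthest-point initialization, so kmeans here is a hypothetical helper, not a library call; the toy points are made up for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization.
    Returns (assignments, total within-cluster sum of squares)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        assign = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    tot_withinss = sum(sq_dist(p, centers[a]) for p, a in zip(points, assign))
    return assign, tot_withinss


points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (1, 8), (1, 9), (2, 8), (2, 9)]
wss = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, value in wss.items():
    print(k, round(value, 2))
```

On this toy data the total falls steeply up to k = 3 (three tight groups) and barely changes after that, so the elbow is at k = 3. Because Lloyd's algorithm only finds a local optimum, R's kmeans() is usually run with several random starts (its nstart argument) rather than one fixed initialization.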

results$cluster

Cluster number assigned to each record by kmeans().

results$withinss

Within-cluster sum of squares from kmeans(), one value per cluster (results$tot.withinss is their sum).
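
To show what those per-cluster values measure, here is a Python sketch that recomputes them from the points and a 1-based cluster vector like results$cluster. The withinss function and the sample points are hypothetical, and the sketch assumes every cluster number from 1 to max(cluster) has at least one member.

```python
def withinss(points, cluster):
    """Per-cluster sum of squared distances from each member to its cluster centroid.
    `cluster` holds 1-based cluster numbers, as in R's results$cluster."""
    totals = []
    for j in range(1, max(cluster) + 1):
        members = [p for p, c in zip(points, cluster) if c == j]
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        totals.append(sum(sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in members))
    return totals


points = [(1, 1), (1, 3), (5, 5), (7, 5), (6, 8)]
cluster = [1, 1, 2, 2, 2]
ss = withinss(points, cluster)
print(ss, sum(ss))  # sum(ss) plays the role of results$tot.withinss
```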

Business Intelligence (BI)

Process of transforming raw data into meaningful insights to support decision-making.

BI Architecture Components

Data Sources: internal/external systems; ETL (Extract, Transform, Load): cleans and loads data; Data Warehouse: centralized storage; Analytics Tools: OLAP, visualization, reporting.

Data Warehousing

Centralized repository of subject-oriented, integrated, time-variant, nonvolatile (historical) data that supports decision-making.

Subject-oriented

Organized by subject area (e.g., sales, finance).

Integrated

Data combined from multiple sources into a consistent format.

Time-variant

Contains historical data, with each record associated with a point or period in time.

Nonvolatile

Data is not updated or deleted once loaded; new data is appended instead.

Data Mart vs Data Warehouse

Data Warehouse is enterprise-wide, while Data Mart is department-level.

Data Warehouse Scope

Enterprise-wide.

Data Mart Scope

Department-level.

Data Warehouse Source

Multiple systems.

Data Mart Source

Subset of the DW or a single source system.

Data Warehouse Type

Centralized.

Data Mart Type

Focused.

Dependent mart

Draws its data from a central data warehouse.

Independent mart

Built directly from operational systems, without a central warehouse.

Metadata

Data about data.

OLTP

Online Transaction Processing: handles day-to-day transactions.

OLAP

Online Analytical Processing: supports analytical queries and reporting.

Data Type in OLTP

Real-time, detailed.

Data Type in OLAP

Historical, summarized.

Speed in OLTP

Fast writes.

Speed in OLAP

Fast reads.

Users of OLTP

Operational staff.

Users of OLAP

Analysts/managers.

Example of OLTP

POS system.

Example of OLAP

Data warehouse.

Slice Operation

Fix one dimension to a single value (e.g., Year = 2024), producing a sub-cube.

Dice Operation

Select specific values on two or more dimensions (e.g., Region = East and Product = Laptop), producing a smaller sub-cube.
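
The slice and dice operations can be sketched over a tiny in-memory fact table. This is only an illustration: real OLAP engines work on cubes, not Python lists, and the sales rows and the helpers slice_cube and dice_cube are hypothetical.

```python
sales = [
    {"region": "East", "product": "Laptop", "year": 2023, "amount": 120},
    {"region": "East", "product": "Phone", "year": 2024, "amount": 80},
    {"region": "West", "product": "Laptop", "year": 2024, "amount": 150},
    {"region": "West", "product": "Phone", "year": 2023, "amount": 60},
    {"region": "East", "product": "Laptop", "year": 2024, "amount": 90},
]


def slice_cube(rows, dim, value):
    """Slice: keep rows where ONE dimension equals a single value."""
    return [r for r in rows if r[dim] == value]


def dice_cube(rows, criteria):
    """Dice: keep rows whose values fall in an allowed set on EACH listed dimension."""
    return [r for r in rows if all(r[d] in allowed for d, allowed in criteria.items())]


year_2024 = slice_cube(sales, "year", 2024)
east_laptops = dice_cube(sales, {"region": {"East"}, "product": {"Laptop"}})
print([r["amount"] for r in year_2024])     # [80, 150, 90]
print([r["amount"] for r in east_laptops])  # [120, 90]
```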

Roll-up Operation

Summarize data (e.g., daily → monthly).

Drill-down Operation

Increase detail (e.g., region → city).
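
Roll-up and drill-down are the same aggregation at coarser or finer grouping keys. A minimal Python sketch, assuming a hypothetical list of daily (date, region, city, amount) facts and a hypothetical total_by helper:

```python
from collections import defaultdict

# Hypothetical daily sales facts: (date, region, city, amount)
facts = [
    ("2024-01-03", "North", "Chicago", 100),
    ("2024-01-17", "North", "Detroit", 50),
    ("2024-02-05", "North", "Chicago", 70),
    ("2024-02-20", "South", "Atlanta", 30),
]


def total_by(rows, key):
    """Sum the amount column, grouped by whatever key(row) returns."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


# Roll-up: daily dates -> months ("YYYY-MM"), i.e. less detail.
monthly = total_by(facts, lambda r: r[0][:7])
# Drill-down: region -> (region, city), i.e. more detail.
by_region = total_by(facts, lambda r: r[1])
by_city = total_by(facts, lambda r: (r[1], r[2]))
print(monthly)    # {'2024-01': 150, '2024-02': 100}
print(by_region)  # {'North': 220, 'South': 30}
```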

Pivot Operation

Rotate the data axes (e.g., swap rows and columns) to view the data from a different perspective.
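
A pivot can be illustrated as a cross-tabulation whose row and column dimensions are swapped. The Python below is a sketch with made-up (region, year, amount) records; pivot is a hypothetical helper, not a library function.

```python
from collections import defaultdict

# Hypothetical facts: (region, year, amount)
sales = [
    ("East", 2023, 120), ("East", 2024, 170),
    ("West", 2023, 60), ("West", 2024, 150),
]


def pivot(rows, row_dim, col_dim):
    """Cross-tabulate amounts: values of row_dim become rows, values of col_dim columns."""
    table = defaultdict(dict)
    for rec in rows:
        r, c = rec[row_dim], rec[col_dim]
        table[r][c] = table[r].get(c, 0) + rec[2]
    return dict(table)


by_region = pivot(sales, 0, 1)  # rows = region, columns = year
by_year = pivot(sales, 1, 0)    # rotated view: rows = year, columns = region
print(by_region)  # {'East': {2023: 120, 2024: 170}, 'West': {2023: 60, 2024: 150}}
print(by_year)    # {2023: {'East': 120, 'West': 60}, 2024: {'East': 170, 'West': 150}}
```

The underlying numbers do not change; only the axes the reader scans along do.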

ETL Process

Extract: Pull data from multiple sources; Transform: Clean, standardize, and format data; Load: Insert into target data warehouse.
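
The three ETL steps can be sketched as a small generator pipeline. Everything here is hypothetical: the two source systems (crm_rows, web_rows), their field names, the cleaning rules, and a plain list standing in for the warehouse table.

```python
# Two hypothetical source systems with inconsistent schemas and formatting.
crm_rows = [{"Customer": " alice ", "Country": "us"}, {"Customer": "BOB", "Country": "US"}]
web_rows = [{"name": "Carol", "country_code": "ca"}, {"name": "", "country_code": "ca"}]


def extract():
    """Extract: pull raw rows from each source, tagged with the source name."""
    for row in crm_rows:
        yield "crm", row
    for row in web_rows:
        yield "web", row


def transform(raw):
    """Transform: map both schemas to one format, clean values, drop unusable rows."""
    for source, row in raw:
        if source == "crm":
            name, country = row["Customer"], row["Country"]
        else:
            name, country = row["name"], row["country_code"]
        name = name.strip().title()
        if not name:
            continue  # skip records with no customer name
        yield {"name": name, "country": country.upper()}


def load(records, warehouse):
    """Load: insert the cleaned records into the target table (here, a list)."""
    warehouse.extend(records)


warehouse = []
load(transform(extract()), warehouse)
print(warehouse)
```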

Independent Data Mart Architecture

Isolated, small-scale.

Data Mart Bus Architecture

Bottom-up, integrated marts.

Centralized Warehouse

Top-down, enterprise-wide.

Business Performance Management (BPM)

Goal: Monitor and improve performance aligned with strategy.

KPI

Key Performance Indicator: a quantifiable measure of performance against a goal (e.g., profit margin, churn rate).
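
The two example KPIs reduce to simple ratios. A Python sketch using simplified textbook definitions and made-up figures (organizations often define these KPIs slightly differently, e.g. net vs. gross margin):

```python
def profit_margin(revenue, costs):
    """Simplified profit margin: (revenue - costs) / revenue."""
    return (revenue - costs) / revenue


def churn_rate(customers_at_start, customers_lost):
    """Churn rate: customers lost during the period / customers at the start of it."""
    return customers_lost / customers_at_start


margin = profit_margin(revenue=500_000, costs=425_000)          # 0.15, i.e. 15%
churn = churn_rate(customers_at_start=2_000, customers_lost=90)  # 0.045, i.e. 4.5%
print(f"Profit margin: {margin:.1%}, churn rate: {churn:.1%}")
```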

Balanced Scorecard

Links financial and non-financial KPIs to strategy.

Six Sigma

Focus on reducing process variation and defects.

Data Mining

The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.

CRISP-DM Framework

Cross Industry Standard Process for Data Mining with six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment.

Supervised Learning

Learn from labeled data (records whose outcomes are known) to predict outcomes for new records.

Unsupervised Learning

Find hidden structure without labeled data.

Classification

Predict categorical outcomes (Yes/No, High/Low).

Prediction

Predict continuous numeric values (e.g., sales in dollars, temperature).

Confusion Matrix

Table of actual vs. predicted classes that summarizes correct and incorrect classifications.
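
A minimal Python sketch of building a confusion matrix from made-up Yes/No predictions. It assumes rows are actual classes and columns are predicted classes; some tools use the transposed layout, so check the convention before reading off counts.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows = actual class, columns = predicted class, in the order given by labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


actual = ["Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes"]
predicted = ["Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes"]
labels = ["Yes", "No"]

cm = confusion_matrix(actual, predicted, labels)
correct = sum(cm[i][i] for i in range(len(labels)))  # the diagonal holds correct predictions
accuracy = correct / len(actual)
print(cm)        # [[3, 1], [1, 3]]
print(accuracy)  # 0.75
```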

Association (Market Basket Analysis)

Goal: Discover items that frequently occur together in transactions.

Support

Proportion of transactions that contain all items in the rule (antecedent and consequent together).

Confidence

Probability of the consequent given the antecedent: support(antecedent and consequent) / support(antecedent).

Lift

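Strength of association beyond random chance: confidence / support(consequent). Lift > 1 means the items occur together more often than expected if they were independent.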
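
Support, confidence, and lift for a single rule can be computed directly from a list of transactions. A Python sketch with five made-up baskets; support and rule_metrics are hypothetical helpers, not an association-rule library.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    sup = support(antecedent | consequent, transactions)
    conf = sup / support(antecedent, transactions)
    lift = conf / support(consequent, transactions)
    return sup, conf, lift


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
sup, conf, lift = rule_metrics({"bread"}, {"milk"}, transactions)
print(round(sup, 4), round(conf, 4), round(lift, 4))
```

Here bread and milk appear together in 3 of 5 baskets (support 0.6) and 3 of the 4 bread baskets contain milk (confidence 0.75), but milk is already in 4 of 5 baskets, so lift is 0.75 / 0.8 = 0.9375: slightly below 1, meaning no association beyond chance.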

Clustering

Group similar records (unsupervised).

Segmentation

Broader business application of clustering (e.g., customer types).