EXAM CRAM


130 Terms

1
New cards

what is business analytics

study of data through statistical and operational analysis, formation of predictive models, application of optimization techniques and communication of these results to customers

2
New cards

purpose of business analytics

turning big data sets into insights

3
New cards

why business analytics matter

helps firms move from intuition-based decisions to data-driven decisions. gives businesses a competitive advantage

4
New cards

DELTA model

Data, Enterprise Orientation, Leadership, Targets, Analysts

5
New cards

D in DELTA Model

Data

accessible, high quality data sets

6
New cards

E in DELTA

Enterprise Orientation

analytics should be used across departments and not siloed

7
New cards

L in DELTA

Leadership

executives champion data-driven decisions

8
New cards

T in DELTA

Targets

clear strategic goals

9
New cards

A in DELTA

Analysts (and technology)

Skilled analysts should be using proper tooling

10
New cards

requirements for successful analytics implementation

high quality data, enterprise wide buy-in, strategic alignment, analytical skills

11
New cards

CRISP-DM

Cross-Industry Standard Process for Data Mining

12
New cards

Step #1 in CRISP-DM

Business Understanding

define the business problem and objectives before touching the data. “what are we trying to improve by using this data?”

13
New cards

Step #2 in CRISP-DM

Data Understanding

get familiar with the data. collect initial data sources, describe the data and explore with visualizations

14
New cards

Step #3 in CRISP-DM

Data Preparation

make the data usable for modeling. clean missing or invalid values - this step usually takes 60-80% of the project time.

15
New cards

Step #4 in CRISP-DM

Modeling

build models that are actually able to describe and predict. select algorithms, set parameters and train/test using data splits.

16
New cards

model

simplified mathematical representation that helps you understand something

17
New cards

descriptive models

summarize what happened or what is happening (eg, a dashboard or summary report)

18
New cards

predictive models

uses historical data to forecast future outcomes

19
New cards

regression models

type of predictive model that tries to predict a numeric value

20
New cards

classification models

type of predictive model that tries to predict a category (eg, fraud or no fraud, churn or no churn, etc)

21
New cards

prescriptive models

uses data + predictions to tell you what to do next

22
New cards

Step #5 in CRISP-DM

Evaluation

assess if the model actually meets the business goal that you were trying to solve.

23
New cards

Step #6 in CRISP-DM

Deployment

implement the insights of the model into business operations. includes monitoring and maintenance over time.

24
New cards

analytically impaired

decisions are made through guess work

25
New cards

localized analytics

isolated within individual teams, no coordination

26
New cards

analytical aspirations

some leadership support

27
New cards

analytical companies

consistent use of analytics in multiple areas

28
New cards

analytical competitors

analytics are embedded in the culture of the organization

29
New cards

data visualization

simplifying complex data to make patterns visible and understandable

30
New cards

histogram

shows data distribution & skewness

31
New cards

box and whisker plot

detects outliers and variability

32
New cards

scatter plots

shows correlation between two numeric variables

33
New cards

line chart

displays change over time

34
New cards

bar chart

compares categories

35
New cards

pie chart

shows part-to-whole relationships

36
New cards

bubble chart

adds a third variable using bubble size

37
New cards

heat map

uses color intensity to represent values in different dimensions

38
New cards

stacked charts

compares multiple relationships

39
New cards

scatter matrix

explores relationships among many variables

40
New cards

data-ink ratio

the ratio of ink used to display data vs. the ink used for decoration. when making data visualizations, remove unnecessary grid lines, shading, 3d effect, etc.

41
New cards

miller’s law

“magic number seven, plus or minus two”

humans can only hold 5-9 pieces of information, so keep dashboards and visualizations simple

42
New cards

performance dashboards

used to monitor business performance in real time. displays KPIs visually, and delivers the right metrics at the right time.

43
New cards

supervised learning

you know the outcome (target variable). used for prediction and classification

44
New cards

classification model

discrete outcome; used with decision trees and logistic regression

45
New cards

prediction and regression model

continuous outcome, used with linear regression

46
New cards

unsupervised learning

no outcome variable to predict, but rather used to find patterns or groups

47
New cards

training data

building the model (fitting rules / patterns) that teaches the algorithms

48
New cards

testing data

evaluating the model (on unseen data), measures how well the model generalizes
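The training/testing split above can be sketched in plain Python. The 70/30 ratio and the seed are illustration choices, not prescribed by the cards:

```python
import random

def train_test_split(records, test_ratio=0.3, seed=42):
    """Shuffle records and split them into a training set and a testing set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(10)), test_ratio=0.3)
print(len(train), len(test))  # 7 3
```

The model is fit only on `train`; performance measured on `test` estimates how well it generalizes to unseen data.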

49
New cards

overfitting a model

occurs when a model learns on noise and random patterns in the training data, rather than true underlying relationships.

50
New cards

classification modelling

assigning a record to a predetermined category based on input variables. learns the patterns that distinguish one class from another.

examples:

  • approve vs. reject loan applications

  • predict churn vs. no churn

  • fraudulent vs. legitimate transactions

  • spam vs. legit email

51
New cards

decision tree model

predict a categorical outcome using a tree-like structure of if-then rules
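A decision tree is literally a chain of if-then rules; each root-to-leaf path is one rule. A toy sketch of the loan-approval example, with made-up fields and thresholds:

```python
def classify_loan(applicant):
    """Toy decision tree: each if-then path is a root-to-leaf rule.
    The fields and thresholds are made-up illustration values."""
    if applicant["income"] > 50000:           # internal node: test on income
        if applicant["debt_ratio"] < 0.4:     # internal node: test on debt ratio
            return "approve"                  # leaf node: predicted class
        return "reject"                       # leaf node
    if applicant["credit_score"] > 700:       # internal node
        return "approve"                      # leaf node
    return "reject"                           # leaf node

print(classify_loan({"income": 60000, "debt_ratio": 0.2, "credit_score": 650}))  # approve
```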

52
New cards

root node of a decision tree

entire data set

53
New cards

internal nodes of a decision tree

tests on attributes

54
New cards

branches of a decision tree

outcomes of the tests

55
New cards

leaf nodes of a decision tree

final, predicted outcomes

56
New cards

induction process

goal is to split records into subsets that are as homogeneous as possible

57
New cards

split rules

used to determine which attribute is best for maximizing purity or information gain
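"Information gain" for a candidate split can be sketched as the parent node's entropy minus the weighted average entropy of the child nodes (assuming the usual log-base-2 entropy; the counts are illustration values):

```python
import math

def entropy(counts):
    """Entropy of a node given its class counts (log base 2)."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """Gain = parent entropy minus the record-weighted entropy of the children."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts_list)
    return entropy(parent_counts) - weighted

# A perfect split of a 50/50 node into two pure children gains 1 full bit:
print(information_gain([10, 10], [[10, 0], [0, 10]]))  # 1.0
```

The attribute whose split maximizes this gain (or, equivalently, maximizes purity) is chosen at each node.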

58
New cards

gini index

measures how often a randomly chosen record would be misclassified if labelled randomly by the node’s distribution

a lower gini index is more pure

59
New cards

formula for gini index

1 − ∑ (pᵢ)²
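The Gini formula is a one-liner over the class proportions of a node (the counts below are illustration values):

```python
def gini_index(class_counts):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions in a node."""
    total = sum(class_counts)
    return 1 - sum((c / total) ** 2 for c in class_counts)

print(gini_index([5, 5]))   # 0.5 (maximally impure for two classes)
print(gini_index([10, 0]))  # 0.0 (perfectly pure node)
```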

60
New cards

entropy

measures disorder or uncertainty of the data

61
New cards

entropy formula

−∑ [pᵢ(t) log₂ pᵢ(t)]
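The entropy formula computed directly from a node's class counts, assuming log base 2 as is standard for decision trees (the counts are illustration values):

```python
import math

def entropy(class_counts):
    """Entropy: -sum(p_i * log2(p_i)); 0 for a pure node, 1 for a 50/50 binary node."""
    total = sum(class_counts)
    return sum(-(c / total) * math.log2(c / total) for c in class_counts if c > 0)

print(entropy([5, 5]))   # 1.0
print(entropy([10, 0]))  # 0.0
```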

62
New cards

misclassification error

a simplified impurity measure.

63
New cards

stopping rules

rules that tell the tree when to stop splitting. without them, trees will keep splitting until every record is perfectly classified, leading to overfitting.

64
New cards

common stopping rules

purity threshold, minimum records per node, maximum tree depth, no improvement in impurity

65
New cards

confusion matrix

compares actual vs. predicted classes

66
New cards

true positive (TP)

predicted positive, actual positive

67
New cards

false positive (FP)

predicted positive, actual negative

68
New cards

false negative (FN)

predicted negative, actual positive

69
New cards

true negative (TN)

predicted negative, actual negative

70
New cards

model accuracy metric 

(TP + TN) / (TP + FP + FN + TN)

71
New cards

model precision metric 

TP / (TP + FP)

72
New cards

model sensitivity metric 

TP / (TP + FN)

73
New cards

model F1 score metric 

2 x (precision x recall) / (precision + recall)
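The four metrics above can be computed straight from the confusion-matrix counts (the counts below are made-up illustration values):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    f1 = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=40, fp=10, fn=20, tn=30)
print(round(acc, 2), round(prec, 2), round(rec, 2), round(f1, 2))  # 0.7 0.8 0.67 0.73
```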

74
New cards

model recall

tells how complete your positive predictions are

75
New cards

model F1 score

balances precision and recall; better than just using accuracy

76
New cards

type I error

when the model predicts positive, when it’s actually negative

77
New cards

type I error formula

FP / (FP + TN)

78
New cards

type II error

when the model predicts “negative” when it’s actually positive

79
New cards

type II error formula

FN / (TP + FN)
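Both error-rate formulas as functions of the confusion-matrix counts (the counts are illustration values):

```python
def type_i_error_rate(fp, tn):
    """False positive rate: FP / (FP + TN)."""
    return fp / (fp + tn)

def type_ii_error_rate(fn, tp):
    """False negative rate: FN / (TP + FN)."""
    return fn / (tp + fn)

print(type_i_error_rate(fp=10, tn=30))   # 0.25
print(type_ii_error_rate(fn=20, tp=40))  # 0.333...
```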

80
New cards

minimize type I errors when…

false alarms are expensive

81
New cards

minimize type II errors when..

false negatives are costly

82
New cards

imbalanced class problem

occurs when one class dominates a data set (eg, 97% no churn vs. 3% churn). can lead to rare but important cases being missed.

83
New cards

remedies for imbalanced classes

stratified sampling, oversampling the minority, down-sampling the majority, using balanced metrics, adjusting the classification threshold

84
New cards

stratified sampling

ensure both classes are proportionally represented in training

85
New cards

oversampling minority

randomly duplicate minority cases so model learns

86
New cards

down-sampling the majority

randomly remove some majority examples to reduce imbalance
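Both resampling remedies can be sketched in a few lines of plain Python, using the 97%/3% churn example (labels and seed are illustration choices):

```python
import random

def oversample_minority(majority, minority, seed=0):
    """Randomly duplicate minority records until the classes are balanced."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

def downsample_majority(majority, minority, seed=0):
    """Randomly keep only as many majority records as there are minority records."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)) + minority

majority = ["no_churn"] * 97
minority = ["churn"] * 3
balanced_up = oversample_minority(majority, minority)
balanced_down = downsample_majority(majority, minority)
print(len(balanced_up), balanced_up.count("churn"))      # 194 97
print(len(balanced_down), balanced_down.count("churn"))  # 6 3
```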

87
New cards

use balanced metrics

use F1 & precision / recall instead of raw accuracy

88
New cards

root node

entire sample before splitting

89
New cards

child node

segments created by splitting variables

90
New cards

predicted class

majority class within that node (eg income > 60%)

91
New cards

% of records

used to see how pure each node became after the split

92
New cards

gain / lift chart

shows how well the model concentrates positives in the top segments

93
New cards

confusion matrix output

reports TP, FP, TN, FN counts for overall accuracy

94
New cards

decision tree algorithms

CHAID, C&R, C5.0

95
New cards

CHAID

uses categorical (nominal) variables; does a chi-square test to find statistically significant relationships between predictor and target.

96
New cards

when to use CHAID

categorical predictors, large sample sizes

97
New cards

limitations of CHAID

doesn’t handle continuous targets, less effective with small data sets

98
New cards

C&R tree

handles categorical OR continuous variables. uses Gini index for classification, or least squares for regression

99
New cards

when to use C&R tree

when you need a robust, general purpose tree - works well for classification & regression and handles missing values well

100
New cards

limitations of the C&R tree

can overfit without pruning; binary splits only