what is business analytics
study of data through statistical and operational analysis, formation of predictive models, application of optimization techniques and communication of these results to customers
purpose of business analytics
turning big data sets into insights
why business analytics matter
helps firms move from intuition-based decisions to data-driven decisions. gives businesses a competitive advantage
DELTA model
Data, Enterprise Orientation, Leadership, Targets, Analysts
D in DELTA Model
Data
accessible, high quality data sets
E in DELTA
Enterprise Orientation
analytics should be used across departments and not siloed
L in DELTA
Leadership
executives champion data-driven decisions
T in DELTA
Targets
clear strategic goals
A in DELTA
Analysts (and technology)
Skilled analysts should be using proper tooling
requirements for successful analytics implementation
high quality data, enterprise wide buy-in, strategic alignment, analytical skills
CRISP-DM
Cross-Industry Standard Process for Data Mining
Step #1 in CRISP-DM
Business Understanding
define the business problem and objectives before touching the data. “what are we trying to improve by using this data?”
Step #2 in CRISP-DM
Data Understanding
get familiar with the data. collect initial data sources, describe the data and explore with visualizations
Step #3 in CRISP-DM
Data Preparation
make the data usable for modeling. clean missing or invalid values - this step usually takes 60-80% of the project time.
Step #4 in CRISP-DM
Modeling
build models that are actually able to describe and predict. select algorithms, set parameters and train/test using data splits.
model
simplified mathematical representation that helps you understand something
descriptive models
summarize what happened or what is happening (eg, a dashboard or summary report)
predictive models
uses historical data to forecast future outcomes
regression models
type of predictive model that predicts a numeric (continuous) value
classification models
type of predictive model that tries to predict a category (eg, fraud or no fraud, churn or no churn, etc)
prescriptive models
uses data + predictions to tell you what to do next
Step #5 in CRISP-DM
Evaluation
assess if the model actually meets the business goal that you were trying to solve.
Step #6 in CRISP-DM
Deployment
implement the insights of the model into business operations. includes monitoring and maintenance over time.
analytically impaired
decisions are made through guesswork
localized analytics
isolated within individual teams, no coordination
analytical aspirations
some leadership support
analytical companies
consistent use of analytics in multiple areas
analytical competitors
analytics are embedded in the culture of the organization
data visualization
simplifying complex data to make patterns visible and understandable
histogram
shows data distribution & skewness
box and whisker plot
detects outliers and variability
scatter plots
shows correlation between two numeric variables
line chart
displays change over time
bar chart
compares categories
pie chart
shows part-to-whole relationships
bubble chart
adds a third variable using bubble size
heat map
uses color intensity to represent values in different dimensions
stacked charts
compares multiple data series within each category (and their total)
scatter matrix
explores relationships among many variables
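a quick sketch of a few of these chart types using matplotlib (assumed available); the numbers below are invented just to show the calls:

import matplotlib.pyplot as plt
import numpy as np

values = np.random.normal(50, 10, 200)        # invented numeric variable
x = np.random.rand(50)
y = 2 * x + np.random.rand(50) * 0.3          # a second, correlated variable

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].hist(values, bins=20)              # histogram: distribution & skewness
axes[0, 1].boxplot(values)                    # box & whisker: outliers & variability
axes[1, 0].scatter(x, y)                      # scatter plot: correlation between two numerics
axes[1, 1].bar(["A", "B", "C"], [3, 7, 5])    # bar chart: compares categories
plt.tight_layout()
plt.show()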
data-ink ratio
the ratio of ink used to display data vs. the ink used for decoration. when making data visualizations, remove unnecessary grid lines, shading, 3D effects, etc.
miller’s law
“magic number seven, plus or minus two”
humans can only hold about 5-9 pieces of information in working memory, so keep dashboards and visualizations simple
performance dashboards
used to monitor business performance in real time. displays KPIs visually, and delivers the right metrics at the right time.
supervised learning
you know the outcome (target variable). used for prediction and classification
classification model
discrete outcome; used with decision trees and logistic regression
prediction and regression model
continuous outcome, used with linear regression
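a minimal sketch of the two supervised tasks with scikit-learn (assumed available); the tiny data sets are invented for illustration:

# regression predicts a number, classification predicts a category
from sklearn.linear_model import LinearRegression, LogisticRegression
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])        # single invented predictor

# regression: continuous outcome (e.g., a sales amount)
reg = LinearRegression().fit(X, np.array([10.0, 20.1, 29.8, 40.2, 50.1]))
print(reg.predict([[6]]))                      # ~60

# classification: discrete outcome (e.g., churn vs. no churn)
clf = LogisticRegression().fit(X, np.array([0, 0, 0, 1, 1]))
print(clf.predict([[6]]))                      # -> class 1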
unsupervised learning
no outcome variable to predict, but rather used to find patterns or groups
training data
used to build the model (fit rules / patterns); this is the data the algorithm learns from
testing data
evaluating the model (on unseen data), measures how well the model generalizes
overfitting a model
occurs when a model learns noise and random patterns in the training data rather than the true underlying relationships.
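a sketch of the train/test workflow and a simple overfitting check, assuming scikit-learn and its built-in breast cancer data set as a stand-in:

# hold out test data and compare train vs. test accuracy to spot overfitting
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)     # stand-in data set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))   # often near 1.0
print("test accuracy:", model.score(X_test, y_test))      # a large gap suggests overfitting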
classification modelling
assigning a record to a predetermined category based on input variables. learns the patterns that distinguish one class from another.
examples:
approve vs. reject loan applications
predict churn vs. no churn
fraudulent vs. legitimate transactions
spam vs. legit email
decision tree model
predict a categorical outcome using a tree-like structure of if-then rules
root node of a decision tree
entire data set
internal nodes of a decision tree
tests on attributes
branches of a decision tree
outcomes of the tests
leaf nodes of a decision tree
final, predicted outcomes
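a minimal sketch of fitting a decision tree and printing its if-then rules, assuming scikit-learn and the built-in iris data set as a stand-in:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()                                           # stand-in data set
tree = DecisionTreeClassifier(max_depth=3, random_state=0)   # max_depth acts as a stopping rule
tree.fit(iris.data, iris.target)

# root = entire data set, internal nodes = attribute tests,
# branches = test outcomes, leaves = predicted classes
print(export_text(tree, feature_names=iris.feature_names))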
induction process
goal is to split records into subsets that are as homogeneous as possible
split rules
used to determine which attribute is best for maximizing purity or information gain
gini index
measures how often a randomly chosen record would be misclassified if labelled randomly by the node’s distribution
a lower gini index is more pure
formula for gini index
1 − ∑(pᵢ)²
entropy
measures disorder or uncertainty of the data
entropy formula
−∑[pᵢ(t) log₂ pᵢ(t)]
misclassification error
a simplified impurity measure: the fraction of records in a node not belonging to the majority class (1 − max pᵢ).
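a small sketch computing the three impurity measures from one node's class proportions (the 70/30 split is invented; log base 2 is assumed for entropy):

import numpy as np

p = np.array([0.7, 0.3])                      # invented class distribution at a node

gini = 1 - np.sum(p ** 2)                     # 1 - sum(p_i^2)           -> 0.42
entropy = -np.sum(p * np.log2(p))             # -sum(p_i * log2(p_i))    -> ~0.881
misclassification = 1 - np.max(p)             # 1 - max(p_i)             -> 0.30
print(gini, entropy, misclassification)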
stopping rules
rules that tell the tree when to stop splitting. without them, trees will keep splitting until every record is perfectly classified, leading to overfitting.
common stopping rules
purity threshold, minimum records per node, maximum tree depth, no improvement in impurity
confusion matrix
compares actual vs. predicted classes
true positive (TP)
predicted positive, actual positive
false positive (FP)
predicted positive, actual negative
false negative (FN)
predicted negative, actual positive
true negative (TN)
predicted negative, actual negative
model accuracy metric
(TP + TN) / (TP + FP + FN + TN)
model precision metric
TP / (TP + FP)
model sensitivity metric
TP / (TP + FN)
model F1 score metric
2 x (precision x recall) / (precision + recall)
model recall
tells how complete your positive predictions are
model F1 score
balances precision and recall; better than using accuracy alone
type I error
when the model predicts positive, when it’s actually negative
type I error formula
FP / (FP + TN)
type II error
when the model predicts “negative” when it’s actually positive
type II error formula
FN / (TP + FN)
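a sketch computing these metrics from invented confusion-matrix counts, following the formulas above:

# classification metrics from invented confusion-matrix counts
TP, FP, FN, TN = 80, 10, 20, 90

accuracy = (TP + TN) / (TP + FP + FN + TN)              # 0.85
precision = TP / (TP + FP)                              # ~0.889
recall = TP / (TP + FN)                                 # sensitivity, 0.80
f1 = 2 * (precision * recall) / (precision + recall)
type_i_rate = FP / (FP + TN)                            # false positive rate, 0.10
type_ii_rate = FN / (TP + FN)                           # false negative rate, 0.20
print(accuracy, precision, recall, f1, type_i_rate, type_ii_rate)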
minimize type I errors when…
false alarms (false positives) are expensive
minimize type II errors when…
misses (false negatives) are costly
imbalanced class problem
occurs when one class dominates a data set (eg, 97% no churn vs. 3% churn). can lead to rare but important cases being missed.
remedies
stratified sampling, oversampling the minority, down-sampling the majority, using balanced metrics, adjusting the classification threshold
stratified sampling
ensures both classes are proportionally represented in the training data
oversampling minority
randomly duplicate minority cases so model learns
down-sampling the majority
randomly remove some majority examples to reduce imbalance
use balanced metrics
use F1 & precision / recall instead of raw accuracy
root node
entire sample before splitting
child node
segments created by splitting variables
predicted class
majority class within that node (eg income > 60%)
% of records
used to see how pure each node became after the split
gain / lift chart
shows how well the model concentrates positives in the top segments
confusion matrix output
reports TP, FP, TN, FN counts for overall accuracy
decision tree algorithms
CHAID, C&R, C5.0
CHAID
uses categorical (nominal) variables; performs a chi-square test to find statistically significant relationships between predictors and the target.
when to use CHAID
categorical predictors, large sample sizes
limitations of CHAID
doesn’t handle continuous targets, less effective with small data sets
C&R tree
handles categorical or continuous targets (classification or regression). uses the Gini index for classification, or least squares for regression
when to use C&R tree
when you need a robust, general purpose tree - works well for classification & regression and handles missing values well
limitations of the C&R tree
can overfit without pruning; splits are binary only
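a minimal sketch of the CHAID-style chi-square test using scipy (assumed available); the cross-tab counts are invented:

# chi-square test between a categorical predictor and the target
from scipy.stats import chi2_contingency

# invented cross-tab: rows = predictor categories, columns = target classes
table = [[30, 10],
         [20, 40]]

chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, p_value)   # a small p-value suggests a statistically significant split candidate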