3/13/25
business analytics
process of transforming data into insights for informed decision-making
descriptive analytics
analyzing historical data to understand what has happened
purpose: provide context and trends
examples: sales reports, web traffic analysis
predictive analytics
using models to predict future outcomes
purpose: anticipate trends and behaviors
examples: demand forecasting, churn prediction
prescriptive analytics
recommending actions based on predictive insights
purpose: optimize decision-making
examples: route optimization, personalized marketing
business intelligence
focuses on historical data and reporting
differences and similarities between BA and BI?
differences:
BI is descriptive
BA includes predictive modeling, forecasting and optimization
similarities:
both leverage data for insights
data science
technical, focused on algorithms
differences and similarities between BA and data science?
differences:
data science is more technical, focused on algorithms
business analytics emphasizes business context
similarities:
both analyze data
data mining
discovering patterns in data
machine learning
algorithms that learn from data
artificial intelligence
broader concept of intelligent systems
big data
large, complex datasets requiring advanced tools
data-driven decision making
using data to guide actions and strategies
algorithm
A sequence of steps or rules followed to solve a problem or perform a computation. In machine learning, _________ process data to create models.
attribute, predictor, or input variable
Variables or features used to predict the outcome. These are independent variables in statistical models.
case, observation, record
a single unit of data in a dataset, often represented as a row in a table
categorical or factor variable
a variable that represents categories or groups
examples: gender, product type, region
confidence
statistical definition: A measure of certainty in an estimate, often represented by a confidence interval.
machine learning definition: The probability assigned to a predicted class or outcome.
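The statistical sense of "confidence" can be made concrete with a confidence interval for a mean. A minimal Python sketch on invented data, using the normal approximation (a small sample would strictly call for a t critical value):

```python
# 95% confidence interval for a mean (statistical sense of "confidence").
# Data values are invented; 1.96 is the normal-approximation critical value.
import math

data = [4.2, 5.1, 4.8, 5.5, 4.9, 5.2, 4.6, 5.0]
n = len(data)
mean = sum(data) / n
sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))  # sample std dev
se = sd / math.sqrt(n)                                        # standard error

lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(round(lower, 3), round(upper, 3))
```

the interval is centered on the sample mean; a wider interval means less certainty in the estimate.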
dependent, response, target, or outcome variable
the variable being predicted or explained in a model
examples: sales revenue, customer churn, temperature
estimation
determining unknown parameters of a model based on data
prediction
using a model to forecast outcomes for new or unseen data
holdout data or set
a subset of data kept separate from training to evaluate a model’s performance on unseen data
inference
statistical definition: drawing conclusions about a population from sample data (e.g., hypothesis tests, confidence intervals)
machine learning definition: applying a trained model to new data to generate predictions
model
a mathematical representation of the relationships between variables
in ML, models are built to make predictions or classifications
conditional probability
the probability of an event occurring given that another event has occurred
denoted as P(A|B)
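The P(A|B) formula can be checked by counting on a small invented example, where A = "customer churned" and B = "customer was on a promotion":

```python
# Conditional probability: P(A|B) = P(A and B) / P(B).
# Each record is (churned, on_promo) -- invented example data.
customers = [
    (True, True), (True, False), (False, True),
    (False, True), (False, False), (True, True),
    (False, False), (False, True),
]

n = len(customers)
p_b = sum(1 for _, promo in customers if promo) / n                    # P(B)
p_a_and_b = sum(1 for ch, promo in customers if ch and promo) / n      # P(A and B)
p_a_given_b = p_a_and_b / p_b                                          # P(A|B)

print(p_a_given_b)  # churn rate among promo customers only
```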
prediction
the outcome or value that a model forecasts for given input data
profile
a description or summary of data for a single case or group of cases, often used for analysis or comparison
sample
a subset of data taken from a larger population, used for analysis
score
a numeric output from a model, often indicating the likelihood of a certain outcome
success class
the outcome of interest in a classification problem. For example, predicting “Yes” for a churn model
supervised learning
a type of machine learning where the model is trained on labeled data (i.e., inputs paired with known outputs)
test data or set
data used to evaluate the final performance of a model after training and validation
training data or set
data used to build and train a model, including identifying relationships and adjusting parameters
unsupervised learning
a type of machine learning where the model learns patterns and structures from unlabeled data
validation data or set
data used during training to tune model parameters and avoid overfitting
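The three data roles above (training / validation / test) come from partitioning one dataset. A minimal sketch, assuming a 60/20/20 split (the split ratio is an illustrative choice, not a rule from these notes):

```python
# Partitioning a dataset into training, validation, and test sets.
# The 60/20/20 split is an assumed example ratio.
import random

rows = list(range(100))          # stand-in for 100 records
random.seed(42)                  # reproducible shuffle
random.shuffle(rows)

n = len(rows)
train = rows[: int(0.6 * n)]                 # fit the model
valid = rows[int(0.6 * n): int(0.8 * n)]     # tune parameters, avoid overfitting
test = rows[int(0.8 * n):]                   # final evaluation only

print(len(train), len(valid), len(test))
```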
variable
a measurable characteristic or attribute in a dataset.
- can be dependent, independent, categorical, or continuous
R
A programming language for statistical computing and graphics.
Open-source and widely used in data analysis and machine learning.
Command-line interface for running code.
RStudio
An integrated development environment (IDE) for R.
Provides a user-friendly interface with features like:
Script editor
Console
Environment pane
Plot viewer
package (in R)
a collection of functions, datasets, and compiled code that extends the functionality of R
literate programming
a programming paradigm introduced by Donald Knuth that combines human-readable text with executable code.
core principle of literate programming
“Code is written for humans to read and only incidentally for machines to execute.”
core ideas of machine learning
classification
prediction
association rules & recommenders
data & dimension reduction
data exploration
visualization
generative AI
classification
Predicting categorical target variables using algorithms like decision trees and logistic regression
prediction
Estimating numerical values using methods like linear regression and neural networks
association rules & recommenders
Identifying relationships between variables to recommend items based on purchasing behavior (e.g., Apriori algorithm)
data & dimension reduction
Simplifying datasets into more manageable forms by reducing or combining variables with techniques like Principal Component Analysis (PCA)
data exploration
summarizing and visualizing data to uncover patterns, identify anomalies, and refine questions for analysis
visualization
conveys information and insights effectively using charts, graphs, and plots
generative AI
involves creating new data instances that resemble the input data
steps for machine learning
understand purpose of project
obtain data
sample (optional)
explore, preprocess, and prepare data
reduce dimensions
partition data (for supervised tasks)
choose machine learning techniques and apply them
interpret and assess results
deploy the model
clustering
common unsupervised learning technique used to group similar data points into clusters based on their features
overfitting
occurs when a model learns not only the underlying patterns in the data but also the random noise or idiosyncrasies
cross-validation
statistical technique used to evaluate a model’s ability to generalize to new data
divide the training data into a chosen number (k) of folds
fit the model on k − 1 folds and calculate holdout R² / RMSE on the remaining fold; repeat for each fold
average the values across folds
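The cross-validation steps above can be sketched in Python. To keep the mechanics visible, the "model" here is deliberately trivial (predict the mean of the training folds); the data values are invented:

```python
# k-fold cross-validation with a trivial model (predict the training mean),
# so the fold mechanics stay visible. Target values are invented.
import math

y = [3.0, 5.0, 4.0, 6.0, 7.0, 5.5, 4.5, 6.5]
k = 4
fold_size = len(y) // k

rmses = []
for i in range(k):
    hold = y[i * fold_size:(i + 1) * fold_size]           # holdout fold
    rest = y[:i * fold_size] + y[(i + 1) * fold_size:]    # remaining folds
    pred = sum(rest) / len(rest)                          # "fit": training mean
    mse = sum((v - pred) ** 2 for v in hold) / len(hold)
    rmses.append(math.sqrt(mse))                          # holdout RMSE, this fold

cv_rmse = sum(rmses) / k                                  # average across folds
print(round(cv_rmse, 3))
```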
Root Mean Squared Error
Represents the square root of the average squared differences between predicted and actual values
A lower value indicates better model performance
R-squared
Indicates the proportion of variance in the target variable that is explained by the model
Higher values (closer to 1) suggest better explanatory power of the model
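Both metrics can be computed from scratch for a set of predictions (the actual/predicted numbers below are invented, just to exercise the formulas):

```python
# RMSE and R-squared computed by hand on invented actual/predicted values.
import math

actual = [10.0, 12.0, 9.0, 15.0, 14.0]
predicted = [11.0, 11.5, 10.0, 14.0, 13.5]

n = len(actual)
sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # sum of squared errors
rmse = math.sqrt(sse / n)                                   # lower is better

mean_y = sum(actual) / n
sst = sum((a - mean_y) ** 2 for a in actual)                # total sum of squares
r_squared = 1 - sse / sst                                   # share of variance explained

print(round(rmse, 3), round(r_squared, 3))
```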
purposes of exploratory data analysis
Exploration and Data Preparation: Investigating the dataset to identify patterns, anomalies, and necessary preprocessing steps for modeling.
Presentation and Storytelling: Using visualizations and summaries to communicate insights, often for stakeholders or reports.
variable encoding in data analysis
converting variables into a format suitable for modeling (e.g., representing a categorical variable as numeric dummy/indicator variables)
simple imputation
replace the missing values with a constant value based on the data
examples: mean, median, mode, constant
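Mean imputation, the most common of these, is a one-liner once the observed values are separated out. A sketch on an invented column with missing entries:

```python
# Simple (single-value) imputation: replace missing values (None)
# with the mean of the observed values. Column values are invented.
values = [4.0, None, 6.0, 8.0, None, 2.0]

observed = [v for v in values if v is not None]
mean = sum(observed) / len(observed)            # constant used for every gap

imputed = [mean if v is None else v for v in values]
print(imputed)
```

the same pattern works for median or mode imputation by swapping the constant.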
time series imputation
replace using the time order of the data
interpolation
estimate missing values from neighboring observed values (e.g., linear interpolation between the surrounding time points)
predictive imputation
use a model to predict the missing value based on other variables in the data set
examples: linear regression, k-nearest neighbors (KNN)
advanced methods of imputation
examples: multiple imputation, maximum likelihood estimation
outlier
data point that is significantly different from other observations in the dataset
dimension (p) of a data set
number of variables (columns) of the data set
principal components analysis (PCA)
method that uses geometry to create a new coordinate system for the data based on the correlation structure
first coordinate contains the most variation (information) in the data; second contains the second most variation, etc.
total variation of PCA
sum of the variances of each variable
covariance
measure of the unscaled linear association between two variables
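PCA and covariance connect directly for two variables: the principal components are the eigenvectors of the 2×2 covariance matrix, and total variation is its trace. A hand-rolled sketch on invented data (the closed-form eigenvalue formula below applies only to the 2×2 case):

```python
# PCA by hand for two variables: build the covariance matrix, find its
# eigenvalues, and check total variation = sum of the variances (trace).
import math

x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]  # invented data
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# sample variances and covariance (divide by n - 1)
var_x = sum((v - mx) ** 2 for v in x) / (n - 1)
var_y = sum((v - my) ** 2 for v in y) / (n - 1)
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

# eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]] (2x2 closed form)
mid = (var_x + var_y) / 2
rad = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
lam1, lam2 = mid + rad, mid - rad        # variation along PC1, PC2

total_variation = var_x + var_y          # trace of the covariance matrix
print(round(lam1 / total_variation, 3))  # share of variation on PC1
```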
evaluating explanatory models
Theoretical Justification: Does the model align with existing theories and concepts?
Model Fit: How well does the model fit the training sample?
Variable Significance: Which predictors are statistically significant?
Model Interpretability: Can the model results be easily understood and communicated?
Hypothesis Testing: Can the model be used to test specific hypotheses or theories?
evaluating predictive models
Prediction Accuracy: How well does the model predict new observations (in validation/test samples)?
Generalization: How well does the model perform on unseen data?
Model Complexity: Is the model simple enough to avoid overfitting?
Business Value: Does the model provide actionable insights and support decision-making?
three uses of predictive models
prediction: estimates a continuous numerical value based on input data
classification: assigns a categorical label to an observation based on input features
propensity or ranking: predicts the likelihood or ranking of outcomes rather than a direct value or label
predictive accuracy
measures how well the model predicts new observations
essential for assessing the model’s generalization to unseen data
naive benchmark
simplest reference for evaluating predictive performance, such as predicting the training-set average for every case; a useful model should outperform it
mean absolute error (MAE)
measures the average magnitude of prediction errors without considering their direction
mean error
captures the average error, retaining the signs of the errors (positive or negative)
mean percentage error (MPE)
measures the average percentage deviation of predictions from actual values, retaining the sign of the errors
mean absolute percentage error (MAPE)
measures the average percentage deviation of predictions from actual values
root mean squared error
similar to the standard error estimate in regression, but is computed on the holdout sample rather than the training sample
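The five error measures above, plus a naive benchmark, can all be computed in a few lines. A sketch on invented holdout actuals and predictions (errors taken as actual − predicted):

```python
# Holdout error metrics: ME, MAE, MPE, MAPE, RMSE, plus a naive
# benchmark that predicts the mean for every case. Values invented.
import math

actual = [100.0, 150.0, 120.0, 130.0]
predicted = [110.0, 140.0, 125.0, 120.0]
n = len(actual)
errors = [a - p for a, p in zip(actual, predicted)]

me = sum(errors) / n                                              # mean error (signed)
mae = sum(abs(e) for e in errors) / n                             # mean absolute error
mpe = sum(e / a for e, a in zip(errors, actual)) / n * 100        # mean % error (signed)
mape = sum(abs(e) / a for e, a in zip(errors, actual)) / n * 100  # mean abs % error
rmse = math.sqrt(sum(e ** 2 for e in errors) / n)

naive = sum(actual) / n                                           # naive benchmark
naive_rmse = math.sqrt(sum((a - naive) ** 2 for a in actual) / n)

print(me, mae, round(mpe, 2), round(mape, 2), round(rmse, 2), round(naive_rmse, 2))
```

note how ME can be small while MAE is large: signed errors cancel, absolute errors do not.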
training data & errors
data: used to train the model
errors: called residuals in regression
holdout data & errors
data: reserved for evaluation to mimic real-world performance
errors: computed by comparing the predicted to actual values on new data
cumulative gains curve
shows the proportion of the actual number of positive cases (y-axis) that are captured by considering a certain proportion of the dataset (x-axis)
useful for evaluating the model’s ability to rank cases from most to least likely to be positive
lift
measures the ratio of the cumulative gains of the model to the cumulative gains of a baseline model
shows how much better the model is at identifying high-value cases compared to random selection
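Both curves come from sorting cases by model score and tallying positives in each top slice. A sketch with invented scores and labels:

```python
# Cumulative gains and lift: sort cases by score (descending) and
# track the fraction of positives captured in each top slice.
# (score, label) pairs are invented example data.
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
          (0.4, 0), (0.3, 1), (0.2, 0), (0.1, 0), (0.05, 0)]
scored.sort(key=lambda t: t[0], reverse=True)

total_pos = sum(label for _, label in scored)
n = len(scored)

cum_gains, lift = [], []
captured = 0
for i, (_, label) in enumerate(scored, start=1):
    captured += label
    gain = captured / total_pos          # fraction of all positives captured
    cum_gains.append(gain)
    lift.append(gain / (i / n))          # ratio vs. random selection

print(cum_gains[2], lift[2])             # after the top 30% of cases
```

a lift above 1 in the top slices means the model concentrates positives better than random selection; lift always falls to 1 at 100% of the data.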
oversampling
sampling method that over-represents the class of interest in the training data; improves the ability to develop a model that predicts that class, especially when it is rare
using MLR for prediction vs explanation
predictive model:
how well the model will perform on new data
measures the fit between the model and new, unseen data
focuses on small prediction errors (we want to minimize the difference between the predicted and actual values of the response variable)
explanatory model:
good model fits the data well and has interpretable coefficients
uses the entire dataset to estimate the best-fit model
measures the fit between the model and the training data
forward selection
adds variables sequentially to the model based on the next best variable
backward selection
removes variables sequentially from the model based on the next worst variable
stepwise selection
combines forward and backward selection; variables are added and possibly removed at each step
regularization or shrinkage
method where we shrink the coefficients toward zero; imposes a penalty on the model fit
two methods of regularization
Lasso and Ridge
ridge regression
the penalty is based on the sum of the squared regression coefficients, ∑_{j=1}^{p} β_j². This is called L2 regularization
lasso regression
the penalty is based on the sum of the absolute values of the regression coefficients, ∑_{j=1}^{p} |β_j|. This is called L1 regularization
ordinary least squares regression (OLS)
we select the estimated coefficients by minimizing the training SSE:
∑_{i=1}^{n} (y_i − ŷ_i)²
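For simple (one-predictor) linear regression, the coefficients that minimize the training SSE have a closed form. A sketch on invented data, using the standard slope and intercept formulas:

```python
# OLS for simple linear regression: closed-form slope and intercept
# that minimize the training SSE. Data values are invented (y ~ 2x).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 6.2, 8.1, 9.9]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))        # Sxy / Sxx
intercept = my - slope * mx                      # line passes through the means

sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
print(round(slope, 3), round(intercept, 3), round(sse, 4))
```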
elastic net
an approach that combines L1 and L2 and has two tuning parameters, α and λ
alpha (α)
a mixing parameter that determines the mix of L1 and L2 regularization
equal to 0: elastic net is equivalent to ridge regression
equal to 1: elastic net is equivalent to lasso regression
between 0 and 1: elastic net is a combination of ridge and lasso regression
λ
tuning parameter that controls the overall amount of shrinkage
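The three penalties can be compared directly for one coefficient vector. A sketch using the common λ·((1 − α)/2 · L2 + α · L1) elastic net parameterization (an assumption; texts vary in the exact scaling):

```python
# Ridge (L2), lasso (L1), and elastic net penalty terms for an
# example coefficient vector. Coefficients and tuning values invented;
# the elastic net scaling follows the lam * ((1-alpha)/2 * L2 + alpha * L1)
# convention, which varies between texts.
betas = [0.5, -1.2, 0.0, 2.0]

l2 = sum(b ** 2 for b in betas)     # ridge penalty term
l1 = sum(abs(b) for b in betas)     # lasso penalty term

lam, alpha = 0.1, 0.5               # lambda: overall shrinkage; alpha: L1/L2 mix
enet = lam * ((1 - alpha) / 2 * l2 + alpha * l1)

print(l2, l1, round(enet, 5))
```

with alpha = 0 only the L2 term survives (ridge); with alpha = 1 only the L1 term survives (lasso), matching the cases above.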