3/13/25
business analytics
process of transforming data into insights for informed decision-making
descriptive analytics
analyzing historical data to understand what has happened
purpose: provide context and trends
examples: sales reports, web traffic analysis
predictive analytics
using models to predict future outcomes
purpose: anticipate trends and behaviors
examples: demand forecasting, churn prediction
prescriptive analytics
recommending actions based on predictive insights
purpose: optimize decision-making
examples: route optimization, personalized marketing
business intelligence
focuses on historical data and reporting
differences and similarities between BA and BI?
differences:
BI is descriptive
BA includes predictive modeling, forecasting and optimization
similarities:
both leverage data for insights
data science
technical, focused on algorithms
differences and similarities between BA and data science?
differences:
data science is more technical, focused on algorithms
business analytics emphasizes business context
similarities:
both analyze data
data mining
discovering patterns in data
machine learning
algorithms that learn from data
artificial intelligence
broader concept of intelligent systems
big data
large, complex datasets requiring advanced tools
data-driven decision making
using data to guide actions and strategies
algorithm
A sequence of steps or rules followed to solve a problem or perform a computation. In machine learning, _________ process data to create models.
attribute, predictor, or input variable
Variables or features used to predict the outcome. These are independent variables in statistical models.
case, observation, record
a single unit of data in a dataset, often represented as a row in a table
categorical or factor variable
a variable that represents categories or groups
examples: gender, product type, region
confidence
statistical definition: A measure of certainty in an estimate, often represented by a confidence interval.
machine learning definition: The probability assigned to a predicted class or outcome.
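The statistical sense of "confidence" can be made concrete with a confidence interval for a mean. A minimal Python sketch on invented data, using the normal approximation (a small sample would strictly call for a t critical value):

```python
# 95% confidence interval for a mean (statistical sense of "confidence").
# Data values are invented; 1.96 is the normal-approximation critical value.
import math

data = [4.2, 5.1, 4.8, 5.5, 4.9, 5.2, 4.6, 5.0]
n = len(data)
mean = sum(data) / n
sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))  # sample std dev
se = sd / math.sqrt(n)                                        # standard error

lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(round(lower, 3), round(upper, 3))
```

the interval is centered on the sample mean; a wider interval means less certainty in the estimate.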
dependent, response, target, or outcome variable
the variable being predicted or explained in a model
examples: sales revenue, customer churn, temperature
estimation
determining unknown parameters of a model based on data
prediction
using a model to forecast outcomes for new or unseen data
holdout data or set
a subset of data kept separate from training to evaluate a model’s performance on unseen data
inference
statistical definition: drawing conclusions about a population from sample data (e.g., hypothesis tests, confidence intervals)
machine learning definition: applying a trained model to new data to generate predictions
model
a mathematical representation of the relationships between variables
in ML, models are built to make predictions or classifications
conditional probability
the probability of an event occurring given that another event has occurred
denoted as P(A|B)
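The P(A|B) formula can be checked by counting on a small invented example, where A = "customer churned" and B = "customer was on a promotion":

```python
# Conditional probability: P(A|B) = P(A and B) / P(B).
# Each record is (churned, on_promo) -- invented example data.
customers = [
    (True, True), (True, False), (False, True),
    (False, True), (False, False), (True, True),
    (False, False), (False, True),
]

n = len(customers)
p_b = sum(1 for _, promo in customers if promo) / n                    # P(B)
p_a_and_b = sum(1 for ch, promo in customers if ch and promo) / n      # P(A and B)
p_a_given_b = p_a_and_b / p_b                                          # P(A|B)

print(p_a_given_b)  # churn rate among promo customers only
```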
prediction
the outcome or value that a model forecasts for given input data
profile
a description or summary of data for a single case or group of cases, often used for analysis or comparison
sample
a subset of data taken from a larger population, used for analysis
score
a numeric output from a model, often indicating the likelihood of a certain outcome
success class
the outcome of interest in a classification problem. For example, predicting “Yes” for a churn model
supervised learning
a type of machine learning where the model is trained on labeled data (i.e., inputs paired with known outputs)
test data or set
data used to evaluate the final performance of a model after training and validation
training data or set
data used to build and train a model, including identifying relationships and adjusting parameters
unsupervised learning
a type of machine learning where the model learns patterns and structures from unlabeled data
validation data or set
data used during training to tune model parameters and avoid overfitting
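The three data roles above (training / validation / test) come from partitioning one dataset. A minimal sketch, assuming a 60/20/20 split (the split ratio is an illustrative choice, not a rule from these notes):

```python
# Partitioning a dataset into training, validation, and test sets.
# The 60/20/20 split is an assumed example ratio.
import random

rows = list(range(100))          # stand-in for 100 records
random.seed(42)                  # reproducible shuffle
random.shuffle(rows)

n = len(rows)
train = rows[: int(0.6 * n)]                 # fit the model
valid = rows[int(0.6 * n): int(0.8 * n)]     # tune parameters, avoid overfitting
test = rows[int(0.8 * n):]                   # final evaluation only

print(len(train), len(valid), len(test))
```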
variable
a measurable characteristic or attribute in a dataset.
- can be dependent, independent, categorical, or continuous
R
A programming language for statistical computing and graphics.
Open-source and widely used in data analysis and machine learning.
Command-line interface for running code.
RStudio
An integrated development environment (IDE) for R.
Provides a user-friendly interface with features like:
Script editor
Console
Environment pane
Plot viewer
package (in R)
a collection of functions, datasets, and compiled code that extends the functionality of R
literate programming
a programming paradigm introduced by Donald Knuth that combines human-readable text with executable code.
core principle of literate programming
“Code is written for humans to read and only incidentally for machines to execute.”
core ideas of machine learning
classification
prediction
association rules & recommenders
data & dimension reduction
data exploration
visualization
generative AI
classification
Predicting categorical target variables using algorithms like decision trees and logistic regression
prediction
Estimating numerical values using methods like linear regression and neural networks
association rules & recommenders
Identifying relationships between variables to recommend items based on purchasing behavior (e.g., Apriori algorithm)
data & dimension reduction
Simplifying datasets into more manageable forms by reducing or combining variables with techniques like Principal Component Analysis (PCA)
data exploration
summarizing and visualizing data to uncover patterns, identify anomalies, and refine questions for analysis
visualization
conveys information and insights effectively using charts, graphs, and plots
generative AI
involves creating new data instances that resemble the input data
steps for machine learning
understand purpose of project
obtain data
sample (optional)
explore, preprocess, and prepare data
reduce dimensions
partition data (for supervised tasks)
choose machine learning techniques and apply them
interpret and assess results
deploy the model
clustering
common unsupervised learning technique used to group similar data points into clusters based on their features
overfitting
occurs when a model learns not only the underlying patterns in the data but also the random noise or idiosyncrasies
cross-validation
statistical technique used to evaluate a model’s ability to generalize to new data
divide the training data into a chosen number (k) of folds
fit the model on k − 1 folds and calculate holdout R² / RMSE on the remaining fold; repeat for each fold
average the values across folds
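The cross-validation steps above can be sketched in Python. To keep the mechanics visible, the "model" here is deliberately trivial (predict the mean of the training folds); the data values are invented:

```python
# k-fold cross-validation with a trivial model (predict the training mean),
# so the fold mechanics stay visible. Target values are invented.
import math

y = [3.0, 5.0, 4.0, 6.0, 7.0, 5.5, 4.5, 6.5]
k = 4
fold_size = len(y) // k

rmses = []
for i in range(k):
    hold = y[i * fold_size:(i + 1) * fold_size]           # holdout fold
    rest = y[:i * fold_size] + y[(i + 1) * fold_size:]    # remaining folds
    pred = sum(rest) / len(rest)                          # "fit": training mean
    mse = sum((v - pred) ** 2 for v in hold) / len(hold)
    rmses.append(math.sqrt(mse))                          # holdout RMSE, this fold

cv_rmse = sum(rmses) / k                                  # average across folds
print(round(cv_rmse, 3))
```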
Root Mean Squared Error
Represents the square root of the average squared differences between predicted and actual values
A lower value indicates better model performance
R-squared
Indicates the proportion of variance in the target variable that is explained by the model
Higher values (closer to 1) suggest better explanatory power of the model
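Both metrics can be computed from scratch for a set of predictions (the actual/predicted numbers below are invented, just to exercise the formulas):

```python
# RMSE and R-squared computed by hand on invented actual/predicted values.
import math

actual = [10.0, 12.0, 9.0, 15.0, 14.0]
predicted = [11.0, 11.5, 10.0, 14.0, 13.5]

n = len(actual)
sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # sum of squared errors
rmse = math.sqrt(sse / n)                                   # lower is better

mean_y = sum(actual) / n
sst = sum((a - mean_y) ** 2 for a in actual)                # total sum of squares
r_squared = 1 - sse / sst                                   # share of variance explained

print(round(rmse, 3), round(r_squared, 3))
```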
purposes of exploratory data analysis
Exploration and Data Preparation: Investigating the dataset to identify patterns, anomalies, and necessary preprocessing steps for modeling.
Presentation and Storytelling: Using visualizations and summaries to communicate insights, often for stakeholders or reports.
variable encoding in data analysis
converting variables into a format suitable for modeling (e.g., representing a categorical variable as numeric dummy/indicator variables)
simple imputation
replace the missing values with a constant value based on the data
examples: mean, median, mode, constant
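Mean imputation, the most common of these, is a one-liner once the observed values are separated out. A sketch on an invented column with missing entries:

```python
# Simple (single-value) imputation: replace missing values (None)
# with the mean of the observed values. Column values are invented.
values = [4.0, None, 6.0, 8.0, None, 2.0]

observed = [v for v in values if v is not None]
mean = sum(observed) / len(observed)            # constant used for every gap

imputed = [mean if v is None else v for v in values]
print(imputed)
```

the same pattern works for median or mode imputation by swapping the constant.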
time series imputation
replace using the time order of the data
interpolation
estimate missing values from neighboring observed values (e.g., linear interpolation between the surrounding time points)
predictive imputation
use a model to predict the missing value based on other variables in the data set
examples: linear regression, k-nearest neighbors (KNN)
advanced methods of imputation
examples: multiple imputation, maximum likelihood estimation
outlier
data point that is significantly different from other observations in the dataset
dimension (p) of a data set
number of variables (columns) of the data set
principal components analysis (PCA)
method that uses geometry to create a new coordinate system for the data based on the correlation structure
first coordinate contains the most variation (information) in the data; second contains the second most variation, etc.
total variation of PCA
sum of the variances of each variable
covariance
measure of the unscaled linear association between two variables
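PCA and covariance connect directly for two variables: the principal components are the eigenvectors of the 2×2 covariance matrix, and total variation is its trace. A hand-rolled sketch on invented data (the closed-form eigenvalue formula below applies only to the 2×2 case):

```python
# PCA by hand for two variables: build the covariance matrix, find its
# eigenvalues, and check total variation = sum of the variances (trace).
import math

x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]  # invented data
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# sample variances and covariance (divide by n - 1)
var_x = sum((v - mx) ** 2 for v in x) / (n - 1)
var_y = sum((v - my) ** 2 for v in y) / (n - 1)
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

# eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]] (2x2 closed form)
mid = (var_x + var_y) / 2
rad = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
lam1, lam2 = mid + rad, mid - rad        # variation along PC1, PC2

total_variation = var_x + var_y          # trace of the covariance matrix
print(round(lam1 / total_variation, 3))  # share of variation on PC1
```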
evaluating explanatory models
Theoretical Justification: Does the model align with existing theories and concepts?
Model Fit: How well does the model fit the training sample?
Variable Significance: Which predictors are statistically significant?
Model Interpretability: Can the model results be easily understood and communicated?
Hypothesis Testing: Can the model be used to test specific hypotheses or theories?
evaluating predictive models
Prediction Accuracy: How well does the model predict new observations (in validation/test samples)?
Generalization: How well does the model perform on unseen data?
Model Complexity: Is the model simple enough to avoid overfitting?
Business Value: Does the model provide actionable insights and support decision-making?
three uses of predictive models
prediction: estimates a continuous numerical value based on input data
classification: assigns a categorical label to an observation based on input features
propensity or ranking: predicts the likelihood or ranking of outcomes rather than a direct value or label
predictive accuracy
measures how well the model predicts new observations
essential for assessing the model’s generalization to unseen data
naive benchmark
simplest reference for evaluating predictive performance, such as predicting the training-set average for every case; a useful model should outperform it
mean absolute error (MAE)
measures the average magnitude of prediction errors without considering their direction
mean error
captures the average error, retaining the signs of the errors (positive or negative)
mean percentage error (MPE)
measures the average percentage deviation of predictions from actual values, retaining the sign of the errors
mean absolute percentage error (MAPE)
measures the average percentage deviation of predictions from actual values
root mean squared error
similar to the standard error estimate in regression, but is computed on the holdout sample rather than the training sample
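The five error measures above, plus a naive benchmark, can all be computed in a few lines. A sketch on invented holdout actuals and predictions (errors taken as actual − predicted):

```python
# Holdout error metrics: ME, MAE, MPE, MAPE, RMSE, plus a naive
# benchmark that predicts the mean for every case. Values invented.
import math

actual = [100.0, 150.0, 120.0, 130.0]
predicted = [110.0, 140.0, 125.0, 120.0]
n = len(actual)
errors = [a - p for a, p in zip(actual, predicted)]

me = sum(errors) / n                                              # mean error (signed)
mae = sum(abs(e) for e in errors) / n                             # mean absolute error
mpe = sum(e / a for e, a in zip(errors, actual)) / n * 100        # mean % error (signed)
mape = sum(abs(e) / a for e, a in zip(errors, actual)) / n * 100  # mean abs % error
rmse = math.sqrt(sum(e ** 2 for e in errors) / n)

naive = sum(actual) / n                                           # naive benchmark
naive_rmse = math.sqrt(sum((a - naive) ** 2 for a in actual) / n)

print(me, mae, round(mpe, 2), round(mape, 2), round(rmse, 2), round(naive_rmse, 2))
```

note how ME can be small while MAE is large: signed errors cancel, absolute errors do not.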
training data & errors
data: used to train the model
errors: called residuals in regression
holdout data & errors
data: reserved for evaluation to mimic real-world performance
errors: computed by comparing the predicted to actual values on new data
cumulative gains curve
shows the proportion of the actual number of positive cases (y-axis) that are captured by considering a certain proportion of the dataset (x-axis)
useful for evaluating the model’s ability to rank cases from most to least likely to be positive
lift
measures the ratio of the cumulative gains of the model to the cumulative gains of a baseline model
shows how much better the model is at identifying high-value cases compared to random selection
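Both curves come from sorting cases by model score and tallying positives in each top slice. A sketch with invented scores and labels:

```python
# Cumulative gains and lift: sort cases by score (descending) and
# track the fraction of positives captured in each top slice.
# (score, label) pairs are invented example data.
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
          (0.4, 0), (0.3, 1), (0.2, 0), (0.1, 0), (0.05, 0)]
scored.sort(key=lambda t: t[0], reverse=True)

total_pos = sum(label for _, label in scored)
n = len(scored)

cum_gains, lift = [], []
captured = 0
for i, (_, label) in enumerate(scored, start=1):
    captured += label
    gain = captured / total_pos          # fraction of all positives captured
    cum_gains.append(gain)
    lift.append(gain / (i / n))          # ratio vs. random selection

print(cum_gains[2], lift[2])             # after the top 30% of cases
```

a lift above 1 in the top slices means the model concentrates positives better than random selection; lift always falls to 1 at 100% of the data.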
oversampling
sampling method that over-represents the class of interest in the training data; improves the ability to develop a model that predicts that class, especially when it is rare
using MLR for prediction vs explanation
predictive model:
how well the model will perform on new data
measures the fit between the model and new, unseen data
focuses on small prediction errors (we want to minimize the difference between the predicted and actual values of the response variable)
explanatory model:
good model fits the data well and has interpretable coefficients
uses the entire dataset to estimate the best-fit model
measures the fit between the model and the training data
forward selection
adds variables sequentially to the model based on the next best variable
backward selection
removes variables sequentially from the model based on the next worst variable
stepwise selection
combines forward and backward selection; variables are added and possibly removed at each step
regularization or shrinkage
method where we shrink the coefficients toward zero; imposes a penalty on the model fit
two methods of regularization
Lasso and Ridge
ridge regression
the penalty is based on the sum of the squared regression coefficients, ∑_{j=1}^{p} β_j². This is called L2 regularization
lasso regression
the penalty is based on the sum of the absolute values of the regression coefficients, ∑_{j=1}^{p} |β_j|. This is called L1 regularization
ordinary least squares regression (OLS)
we select the estimated coefficients by minimizing the training SSE:
∑_{i=1}^{n} (y_i − ŷ_i)²
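For simple (one-predictor) linear regression, the coefficients that minimize the training SSE have a closed form. A sketch on invented data, using the standard slope and intercept formulas:

```python
# OLS for simple linear regression: closed-form slope and intercept
# that minimize the training SSE. Data values are invented (y ~ 2x).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 6.2, 8.1, 9.9]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))        # Sxy / Sxx
intercept = my - slope * mx                      # line passes through the means

sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
print(round(slope, 3), round(intercept, 3), round(sse, 4))
```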
elastic net
an approach that combines L1 and L2 and has two tuning parameters, α and λ
alpha (α)
a mixing parameter that determines the mix of L1 and L2 regularization
equal to 0: elastic net is equivalent to ridge regression
equal to 1: elastic net is equivalent to lasso regression
between 0 and 1: elastic net is a combination of ridge and lasso regression
λ
tuning parameter that controls the overall amount of shrinkage
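The three penalties can be compared directly for one coefficient vector. A sketch using the common λ·((1 − α)/2 · L2 + α · L1) elastic net parameterization (an assumption; texts vary in the exact scaling):

```python
# Ridge (L2), lasso (L1), and elastic net penalty terms for an
# example coefficient vector. Coefficients and tuning values invented;
# the elastic net scaling follows the lam * ((1-alpha)/2 * L2 + alpha * L1)
# convention, which varies between texts.
betas = [0.5, -1.2, 0.0, 2.0]

l2 = sum(b ** 2 for b in betas)     # ridge penalty term
l1 = sum(abs(b) for b in betas)     # lasso penalty term

lam, alpha = 0.1, 0.5               # lambda: overall shrinkage; alpha: L1/L2 mix
enet = lam * ((1 - alpha) / 2 * l2 + alpha * l1)

print(l2, l1, round(enet, 5))
```

with alpha = 0 only the L2 term survives (ridge); with alpha = 1 only the L1 term survives (lasso), matching the cases above.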