What is Business Analytics?
Business Analytics is the process of transforming data into actions through analysis and insights in the context of organizational decision making and problem-solving.
Three Disciplines of Business Analytics
Business Intelligence (collects and manages data), Statistics (analyzes data relationships), and Operations Research/Management Science (provides solutions using models).
Examples of Business Analytics Applications
Pricing, customer segmentation, merchandising, location analysis, supply chain design, staffing, and healthcare optimization.
Importance of Business Analytics
Organizations that effectively use analytics report improved decision-making, efficiency, productivity, and customer satisfaction.
Challenges in Business Analytics
Lack of understanding, competing priorities, insufficient analytical skills, poor data quality, and unclear ROI.
Descriptive Analytics
Analyzes historical data to understand trends and patterns. Methods include descriptive statistics, charts, and probability distributions.
Predictive Analytics
Uses statistical, information system, and operations research methods to predict future outcomes; examples include regression and data mining.
Prescriptive Analytics
Applies decision science and optimization models to recommend actions and allocate resources efficiently.
Business Analytics vs. Predictive Modeling
Business Analytics uses data-driven insights for decisions; Predictive Modeling uses statistical techniques to forecast unknown events.
Common Predictive Models
Decision Trees, Regression Models, Cluster Models, and Time Series Models.
Applications of Predictive Modeling
Forecasting financial performance, predicting consumer behavior, loan defaults, and product life cycles.
Sales-Promotion Decision Model
Predicts sales based on variables like price, coupons, and advertising to guide marketing decisions.
Regression in Predictive Modeling
Models the relationship between dependent and independent variables using equations like Y = f(X) + error.
Regression vs Classification
Regression predicts continuous values; Classification predicts categorical outcomes (e.g., fraud detection, spam filtering).
Elements of Prediction
Prediction involves assigning a value to an unknown target variable (y) based on known predictors (x).
Logic of Predictive Modeling
Original data reveals X-Y relationships; live data applies the model to predict unknown outcomes.
Simple Linear Regression Example
Predict hotel price (Y) from distance to city center (X) using a linear model: Price = β0 + β1 × Distance.
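A minimal sketch of this example in Python with scikit-learn; the distances and prices below are hypothetical values, not data from the source.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: distance to city center (km) and nightly price ($)
distance = np.array([[0.5], [1.2], [2.0], [3.5], [5.0], [7.5]])
price = np.array([220, 195, 180, 150, 130, 105])

model = LinearRegression().fit(distance, price)
print("beta0 (intercept):", model.intercept_)
print("beta1 (slope):", model.coef_[0])

# Predict the price of a hotel 4 km from the center
print("Predicted price at 4 km:", model.predict([[4.0]])[0])
```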
Types of Predictions
Quantitative prediction (numeric values), Probability prediction (likelihood), Classification (categories).
Point vs Interval Prediction
Point gives a single predicted value; Interval gives a range where the true value likely falls.
Forecasting
Predicts future values of a variable, often using time series data.
Steps of Simple Linear Regression
1) Prepare data, 2) Explore data, 3) Fit model, 4) Test significance, 5) Evaluate model, 6) Check assumptions, 7) Interpret coefficients.
Significance Testing in Regression
Includes coefficient t-tests, the ANOVA table, confidence intervals, and the F-statistic to validate model fit.
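A hedged sketch of the significance-testing step using statsmodels; the predictor and response values below are the same hypothetical hotel data as above.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: predictor x (distance) and response y (price)
x = np.array([0.5, 1.2, 2.0, 3.5, 5.0, 7.5])
y = np.array([220, 195, 180, 150, 130, 105])

X = sm.add_constant(x)               # adds the intercept column
results = sm.OLS(y, X).fit()         # ordinary least squares fit

print(results.summary())             # coefficient t-tests, F-statistic, R-squared
print(results.conf_int(alpha=0.05))  # 95% confidence intervals for the coefficients
```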
Prediction Error
The difference between actual and predicted values (ei = yi - ŷi). Measures prediction quality.
Loss Function
Translates prediction errors into a numeric penalty used to compare models; squared-error loss (which leads to MSE) is the most common.
Mean Squared Error (MSE)
Average squared difference between actual and predicted values. Lower MSE means better accuracy.
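A small sketch computing prediction errors and MSE directly; the actual and predicted values are made up for illustration.

```python
import numpy as np

# Hypothetical actual and predicted values
y_actual = np.array([10.0, 12.0, 15.0, 11.0])
y_pred = np.array([9.5, 12.5, 14.0, 11.5])

errors = y_actual - y_pred   # e_i = y_i - y_hat_i
mse = np.mean(errors ** 2)   # mean squared error
print("errors:", errors)
print("MSE:", mse)
```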
Bias in Prediction
Error due to simplifying assumptions; high bias means the model is too simple (underfitting).
Variance in Prediction
Measures sensitivity to data changes; high variance means overfitting.
Overfitting
Model fits training data too well, capturing noise rather than trend.
Underfitting
Model too simple to capture underlying trends in data.
Bias-Variance Tradeoff
As model complexity increases, bias decreases while variance increases; the goal is to balance the two so that total prediction error is minimized.
Model Evaluation Metrics
Use MSE, R-squared, and adjusted R-squared to compare models.
AIC and BIC
Criteria combining model fit and complexity; lower values indicate better balance between accuracy and simplicity.
Training vs Test Set
Training data builds the model; test data evaluates predictive accuracy.
Cross-Validation
Technique that splits data into k folds to estimate model performance more reliably.
Leave-One-Out Cross Validation (LOOCV)
Special case of k-fold where k = n; each observation is used once as a validation sample.
K-Fold vs LOOCV
LOOCV has less bias but higher variance; k-fold is computationally efficient and more stable.
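A sketch comparing k-fold and LOOCV error estimates with scikit-learn; the synthetic regression data is purely illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Synthetic data for illustration
X, y = make_regression(n_samples=50, n_features=3, noise=10.0, random_state=0)
model = LinearRegression()

# 5-fold cross-validation: average MSE over 5 held-out folds
kfold_mse = -cross_val_score(
    model, X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
).mean()

# LOOCV: each observation is held out once (n model fits)
loocv_mse = -cross_val_score(
    model, X, y, cv=LeaveOneOut(), scoring="neg_mean_squared_error"
).mean()

print("5-fold MSE:", kfold_mse)
print("LOOCV MSE:", loocv_mse)
```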
Best Model Selection
The best model minimizes prediction error while avoiding overfitting; uses validation or cross-validation results.
What is the purpose of model building for prediction?
To create a statistical model that best predicts the response variable using relevant explanatory variables while avoiding overfitting.
What is the main issue with high-dimensional data in regression models?
When there are too many predictors relative to observations, it can lead to overfitting, multicollinearity, and difficulty in interpretation.
What does the p-value of an explanatory variable represent in regression?
It indicates the strength of evidence that the variable helps explain variation in the response; lower p-values suggest stronger evidence of a relationship.
When can insignificant variables be removed from a model?
If removing them does not significantly decrease adjusted R² or worsen prediction accuracy.
What are two conflicting goals in variable selection?
Including more predictors to improve model accuracy vs. limiting predictors to reduce variance and improve interpretability.
What is stepwise regression?
A step-by-step method of adding or removing predictor variables to identify the most effective set of predictors for the model.
What is forward selection in stepwise regression?
A bottom-up approach that begins with no predictors and adds the most significant ones sequentially based on criteria like p-value or AIC.
What is backward elimination in stepwise regression?
A top-down approach that starts with all candidate variables and removes the least significant ones until only significant predictors remain.
What is the combined stepwise method?
A hybrid of forward and backward selection that allows adding and removing variables as the model updates dynamically.
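A simplified forward-selection sketch that adds predictors one at a time by AIC, using statsmodels; the function name, the DataFrame `df`, and its column names are assumptions for illustration, not part of the source.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select_by_aic(df, response, candidates):
    """Greedy forward selection: at each step add the predictor that lowers AIC most."""
    selected = []
    best_aic = sm.OLS(df[response], np.ones(len(df))).fit().aic  # intercept-only model
    improved = True
    while improved and candidates:
        improved = False
        scores = []
        for var in candidates:
            X = sm.add_constant(df[selected + [var]])
            scores.append((sm.OLS(df[response], X).fit().aic, var))
        aic, var = min(scores)
        if aic < best_aic:                # keep the variable only if AIC improves
            best_aic, improved = aic, True
            selected.append(var)
            candidates = [c for c in candidates if c != var]
    return selected, best_aic

# Hypothetical usage, assuming df has columns 'y', 'x1', 'x2', 'x3':
# chosen, aic = forward_select_by_aic(df, 'y', ['x1', 'x2', 'x3'])
```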
What is the Akaike Information Criterion (AIC)?
A measure of model quality that balances model fit and complexity; lower AIC indicates a better model.
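For reference, with k estimated parameters and maximized likelihood L̂, AIC is commonly written as:

$$\mathrm{AIC} = 2k - 2\ln\hat{L}$$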
What is Mallows’ Cp statistic used for?
To evaluate model bias and variance; the best model has a Cp value close to the number of predictors plus the intercept.
What is the Best Subset approach?
A method that evaluates all possible combinations of predictors and selects the model with the best statistical criteria like R², AIC, Cp, or BIC.
What are advantages of stepwise regression?
It identifies the most relevant predictors, reduces overfitting, and provides interpretable insights.
What are disadvantages of stepwise regression?
It may overfit, struggle with multicollinearity, introduce selection bias, and assume linear relationships.
What is the Bayesian Information Criterion (BIC)?
Similar to AIC but imposes a larger penalty for model complexity; lower BIC indicates a more parsimonious model.
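For comparison, with n observations, k estimated parameters, and maximized likelihood L̂:

$$\mathrm{BIC} = k\ln(n) - 2\ln\hat{L}$$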
What is dimensionality reduction?
Techniques used to reduce the number of predictors, such as principal component analysis (PCA), factor analysis, or Lasso regression.
What is the omitted variable problem in regression?
It occurs when an important variable is left out of the model, causing bias in the estimated coefficients of included variables.
When does omitted variable bias occur?
When the omitted variable affects the response variable and is correlated with one or more included predictors.
What is an example of omitted variable bias?
Excluding 'education level' when modeling income with 'work experience' can bias results since education influences both income and experience.
How does multiple regression help prevent omitted variable bias?
By including multiple relevant predictors, it accounts for shared variation and isolates each variable’s true effect on the outcome.
What are the two main hypothesis tests in multiple regression?
1) The F-test for overall model significance and 2) t-tests for individual coefficients.
What does the F-test in regression evaluate?
Whether at least one predictor variable in the model has a nonzero coefficient, indicating the model provides explanatory power.
What does a t-test for regression coefficients evaluate?
Whether a specific predictor has a statistically significant effect on the response variable, holding others constant.
What is R² in regression analysis?
The proportion of variance in the dependent variable explained by the independent variables in the model.
What is adjusted R²?
A modified version of R² that accounts for the number of predictors, preventing artificial inflation when adding unnecessary variables.
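The standard formulas, with SSE the sum of squared errors, SST the total sum of squares, n observations, and p predictors:

$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$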
What is the correlation coefficient (r)?
A measure of the linear relationship between two variables ranging from -1 to +1.
What does the regression intercept represent?
The expected value of the response variable when all predictors are zero.
How is the slope coefficient interpreted in multiple regression?
It represents the expected change in the response variable for a one-unit change in that predictor, holding all other variables constant.
Why can the same variable appear to have opposite effects in simple and multiple regression?
Because simple regression ignores other predictors, while multiple regression controls for them, revealing the variable’s true effect.
What are the four key assumptions of multiple regression?
Linearity, Normality of errors, Homoscedasticity (constant variance), and Independence of errors.
What additional assumptions are important in regression?
No significant outliers and correct model specification.
What does linearity mean in regression?
That the relationship between each predictor and the response variable is linear.
How can nonlinearity be detected?
Using scatterplots or residual plots showing curved or systematic patterns.
What are ways to correct nonlinearity?
Add polynomial terms, transform predictors (e.g., log or square root), or include interaction terms.
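A brief sketch of two of these fixes using statsmodels formulas; the data frame and its curved relationship are synthetic, chosen only to illustrate the transformations.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data with a curved x-y relationship
df = pd.DataFrame({"x": np.linspace(1, 10, 30)})
df["y"] = 3 + 2 * df["x"] - 0.15 * df["x"] ** 2 + np.random.normal(0, 0.5, 30)

# Fix 1: add a polynomial (quadratic) term
quad_model = smf.ols("y ~ x + I(x**2)", data=df).fit()

# Fix 2: log-transform the predictor
log_model = smf.ols("y ~ np.log(x)", data=df).fit()

print(quad_model.rsquared, log_model.rsquared)
```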
What is homoscedasticity?
The assumption that the variance of the residuals is constant across all levels of predicted values.
What is heteroscedasticity?
When residual variance changes with fitted values, often forming a funnel shape in residual plots.
How can heteroscedasticity be fixed?
By transforming the dependent variable (e.g., log), using weighted regression, or robust standard errors.
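A hedged sketch of two common remedies with statsmodels; the data below are simulated so that the error variance grows with x.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 100)
y = np.exp(0.5 + 0.2 * x + rng.normal(0, 0.3, 100))  # positive y, spread grows with x
X = sm.add_constant(x)

# Remedy 1: log-transform the dependent variable
log_fit = sm.OLS(np.log(y), X).fit()

# Remedy 2: keep y as-is but report heteroscedasticity-robust (HC3) standard errors
robust_fit = sm.OLS(y, X).fit(cov_type="HC3")

print(log_fit.bse, robust_fit.bse)
```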
What does independence of errors mean?
That residuals are uncorrelated; one observation’s error does not predict another’s.
What causes correlated errors?
Time-series data or clustered samples where measurements are not independent.
How can correlated errors be detected?
Using residual lag plots, Durbin-Watson tests, or autocorrelation plots.
What is the Durbin-Watson test used for?
To detect autocorrelation in residuals; values near 2 indicate no autocorrelation, values well below 2 (e.g., under 1.4) suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.
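A minimal sketch of the test on regression residuals using statsmodels; the time-ordered data are simulated for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Hypothetical time-ordered data
rng = np.random.default_rng(1)
x = np.arange(50, dtype=float)
y = 2 + 0.5 * x + rng.normal(0, 1, 50)

results = sm.OLS(y, sm.add_constant(x)).fit()
dw = durbin_watson(results.resid)  # values near 2 suggest no autocorrelation
print("Durbin-Watson statistic:", dw)
```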
What are outliers in regression?
Observations with extreme response values that do not follow the overall data pattern.
What are leverage points?
Observations with extreme predictor values that can strongly influence the regression line.
How can outliers be detected?
By examining standardized residuals (values beyond ±3), Cook’s distance, or DFFITS values.
What is Cook’s Distance?
A measure of how much a data point influences the fitted regression coefficients; values >1 suggest influential points.
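A sketch of outlier and influence diagnostics with statsmodels; the injected outlier and thresholds follow the rules of thumb above, on simulated data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=40)
y = 1 + 2 * x + rng.normal(0, 1, 40)
y[0] += 10                                    # inject one response outlier for illustration

results = sm.OLS(y, sm.add_constant(x)).fit()
influence = results.get_influence()
cooks_d = influence.cooks_distance[0]         # Cook's distance per observation
std_resid = influence.resid_studentized_internal

print("points with Cook's D > 1:", np.where(cooks_d > 1)[0])
print("points with |standardized residual| > 3:", np.where(np.abs(std_resid) > 3)[0])
```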
What are consequences of fitting a model with outliers?
Biased coefficients, inflated standard errors, and misleading p-values.
How can outliers be handled?
Investigate, remove, or model them separately; retrain the model iteratively to improve fit.
What is multicollinearity?
When two or more predictors are highly correlated, making it difficult to isolate individual effects.
How can multicollinearity be detected?
Using correlation matrices, scatterplots, or the Variance Inflation Factor (VIF).
What is the Variance Inflation Factor (VIF)?
A metric that quantifies how much the variance of a coefficient is inflated due to multicollinearity; VIF > 10 suggests severe collinearity.
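A short VIF sketch with statsmodels; the predictors are simulated so that x3 is nearly a copy of x1, which should produce very high VIF values for both.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
X = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
X["x3"] = X["x1"] + rng.normal(0, 0.05, 100)   # nearly collinear with x1

X_const = sm.add_constant(X)
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(X_const.shape[1])],
    index=X_const.columns,
)
print(vif)  # VIF > 10 for x1 and x3 signals severe collinearity
```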
What are the effects of multicollinearity?
It causes inflated standard errors, unstable coefficients, and unreliable significance tests.
How can multicollinearity be mitigated?
By removing or combining correlated predictors, or using principal component or partial least squares regression.
What is the main goal of Principal Component Analysis (PCA)?
Reduce dimensionality by finding new orthogonal axes (principal components) that capture the most variance.
Name some common uses of PCA.
Reduce dimensions for computation, visualize high-dimensional data, remove noise, find patterns, and identify outliers.
How are principal components ordered?
By decreasing explained variance: PC1 explains the most, PC2 the second-most, etc.
What does a high eigenvalue mean in PCA?
A principal component (eigenvector) with a high eigenvalue explains a large portion of the variance.
Why standardize variables before PCA?
To put variables on the same scale so variance contributions are comparable (especially when units differ).
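A minimal PCA sketch with scikit-learn, using the built-in iris data purely as an example of standardizing before extracting components.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data                        # 4 numeric features

X_std = StandardScaler().fit_transform(X)   # standardize so each variable has unit variance
pca = PCA(n_components=2).fit(X_std)

print("variance explained by PC1, PC2:", pca.explained_variance_ratio_)
scores = pca.transform(X_std)               # data projected onto the first two components
```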
What problem do shrinkage (regularization) methods address?
High variance and overfitting when there are many predictors or multicollinearity.
Write the ridge regression loss function (conceptually).
Least squares loss plus λ times the sum of squared coefficients (L2 penalty).
Write the lasso regression loss function (conceptually).
Least squares loss plus λ times the sum of absolute coefficients (L1 penalty).
How does ridge regression affect coefficients?
Shrinks coefficients towards zero but does not set them exactly to zero.
How does lasso regression affect coefficients?
Can shrink some coefficients exactly to zero, performing variable selection.
What is an elastic net?
A combination of L1 (lasso) and L2 (ridge) penalties; balances shrinkage and variable selection.
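A sketch of the three shrinkage methods in scikit-learn (where the penalty weight λ is called `alpha`); the synthetic data has many predictors but only a few informative ones, so lasso should zero out several coefficients.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Synthetic data with many predictors, only a few of which matter
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)                    # L2: shrinks, never exactly zero
lasso = Lasso(alpha=1.0).fit(X, y)                    # L1: some coefficients become zero
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)  # mix of L1 and L2 penalties

print("lasso coefficients set to zero:", (lasso.coef_ == 0).sum())
print("ridge coefficients set to zero:", (ridge.coef_ == 0).sum())
```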