Course: ISOM 201 Data & Decision Analysis
Instructor: Dr. Yuanxiang John Li
Affiliation: Sawyer Business School, Suffolk University, Boston
4.1 Scatter Diagrams: Visual tool to show relationships between variables
4.2 Simple Linear Regression: Involves one dependent and one independent variable
4.3 Measuring the Fit of the Regression Model: Techniques to evaluate model performance
4.4 Assumptions of the Regression Model: Necessary conditions for valid statistical tests
4.5 Testing the Model for Significance: Assessing the relationship between variables
4.6 Using Computer Software for Regression: Use of software tools for analyses
4.7 Multiple Regression Analysis: Models incorporating multiple independent variables
4.8 Binary or Dummy Variables: Treatment of qualitative data in regression
4.9 Model Building: Strategies for constructing effective models
4.10 Nonlinear Regression: Models addressing non-linear relationships
4.11 Cautions and Pitfalls in Regression Analysis: Important considerations and common mistakes
Purpose of Regression Analysis:
An invaluable tool for managers.
Used to understand relationships between variables.
Used to predict the value of one variable based on the values of others.
Types of Regression Models:
Simple Linear Regression: Contains only two variables (one dependent and one independent).
Multiple Regression Models: Involves more than one independent variable.
Dependent Variable:
Also known as the response variable.
Its value is influenced by the independent variable(s).
Independent Variable(s):
Also known as predictor or explanatory variables; they are used to predict the value of the dependent variable.
Definition: A graphical representation to investigate relationships between variables.
Axes:
Independent variable plotted on the X-axis.
Dependent variable plotted on the Y-axis.
Case Context:
Triple A Construction specializes in home renovation.
The renovation dollar volume is believed to depend on the local area payroll.
Data Representation: A scatter diagram is created by plotting company sales against local area payroll.
Model Structure:
Simple linear regression has one dependent and one independent variable:
Y = β0 + β1X + e
Y: dependent variable (response)
X: independent variable (predictor)
β0: intercept (Y value when X = 0)
β1: slope of the regression line
e: random error.
Estimation:
True values of slope (β1) and intercept (β0) are unknown but estimable from sample data:
Ŷ = b0 + b1X
Ŷ: predicted value of Y
b0: estimate of β0 from sample
b1: estimate of β1 from sample.
Prediction Setup:
Predict sales from area payroll.
Error (residual): actual value minus predicted value, e = Y − Ŷ.
Regression analysis uses the least-squares approach: b0 and b1 are chosen to minimize the Sum of Squared Errors (SSE).
Coefficient Formulas: The estimates for simple linear regression are computed from the sample means of Y (Sales) and X (Payroll) and the deviations about those means, as shown below.
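The standard least-squares formulas, with X̄ and Ȳ denoting the sample means, are:
b1 = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²
b0 = Ȳ − b1X̄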
Regression Calculation Insights:
Computation of the regression coefficients and analysis of the variability in the Y values.
Final Model Output:
Resulting regression model: Sales = 2 + 1.25(Payroll).
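A minimal Python sketch of this calculation. The six payroll/sales pairs are assumed illustrative values (the notes do not list the raw data); they are chosen to be consistent with the fitted model Sales = 2 + 1.25(Payroll).

```python
# Least-squares fit for the Triple A Construction example.
# Data values are assumed for illustration.
payroll = [3, 4, 6, 4, 2, 5]        # X: local area payroll
sales   = [6, 8, 9, 5, 4.5, 9.5]    # Y: renovation sales

n = len(payroll)
x_bar = sum(payroll) / n
y_bar = sum(sales) / n

# b1 = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²,  b0 = Ȳ − b1·X̄
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(payroll, sales))
sxx = sum((x - x_bar) ** 2 for x in payroll)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

print(f"Sales = {b0:.2f} + {b1:.2f}(Payroll)")   # Sales = 2.00 + 1.25(Payroll)
```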
Sum of Squares:
Total Sum of Squares (SST): Total variability around the mean.
Sum of Squares Error (SSE): Variability around the regression line.
Sum of Squares Regression (SSR): Variability explained by the model.
Relationship: SST = SSR + SSE
Together these sums of squares break down the variability in Y into the portion explained by X and the portion left unexplained.
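In symbols, with Ȳ the sample mean and Ŷ the predicted value:
SST = Σ(Y − Ȳ)²
SSE = Σ(Y − Ŷ)²
SSR = Σ(Ŷ − Ȳ)²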
Definition: The coefficient of determination (r²) is the proportion of variability in Y explained by the regression model, r² = SSR / SST.
Range: 0 to 1; higher values indicate a better-fitting model.
Example for Triple A Construction: r² is approximately 0.6944, indicating that about 69% of the variability in sales is explained by payroll.
Definition: The correlation coefficient (r) measures the strength of the linear relationship between X and Y, ranging from −1 to +1.
Example calculation for Triple A Construction yields r = 0.8333.
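Continuing the earlier sketch (same assumed data), the sums of squares, r², and r can be computed directly:

```python
# Sums of squares and goodness of fit, continuing the assumed Triple A data.
y_hat = [b0 + b1 * x for x in payroll]                     # predicted sales

sst = sum((y - y_bar) ** 2 for y in sales)                 # total variability
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))    # unexplained
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)               # explained by the model

r_squared = ssr / sst          # ≈ 0.6944
r = r_squared ** 0.5           # ≈ 0.8333 (positive because the slope is positive)
print(f"SST={sst:.3f} SSE={sse:.3f} SSR={ssr:.3f} r2={r_squared:.4f} r={r:.4f}")
```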
Key Assumptions:
Errors are independent.
Errors are normally distributed.
Errors have a mean of zero.
Errors have constant variance.
Residual plots can highlight violations of these assumptions.
Residual plots (residuals versus predicted values or versus X) are used to check for such violations; see the sketch below.
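A minimal plotting sketch, continuing the assumed data above; a healthy residual plot shows points scattered randomly around zero with no trend or funnel shape.

```python
import matplotlib.pyplot as plt

# Residual plot, continuing the assumed Triple A data above.
residuals = [y - yh for y, yh in zip(sales, y_hat)]

plt.scatter(payroll, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Payroll (X)")
plt.ylabel("Residual (Y − Ŷ)")
plt.title("Residuals vs. independent variable")
plt.show()
```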
The error variance is estimated by the Mean Squared Error (MSE): MSE = SSE / (n − k − 1), where n is the number of observations and k is the number of independent variables.
The model is tested for significance with a hypothesis test on the slope: H0: β1 = 0 (no linear relationship) versus H1: β1 ≠ 0. The test statistic is F = MSR / MSE; if the resulting p-value is below the chosen significance level, reject H0 and conclude the model is significant (see the sketch below).
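A minimal F-test sketch, continuing the assumed Triple A data and the sums of squares computed above:

```python
from scipy import stats

# Overall significance test: H0: β1 = 0 vs. H1: β1 ≠ 0.
k = 1                          # number of independent variables
df_regression = k
df_error = n - k - 1

msr = ssr / df_regression      # mean square regression
mse = sse / df_error           # mean square error (estimate of error variance)

f_stat = msr / mse
p_value = 1 - stats.f.cdf(f_stat, df_regression, df_error)
print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")   # small p-value => reject H0
```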
Multiple regression extends simple linear regression to several independent variables: Y = β0 + β1X1 + β2X2 + … + βkXk + e.
The parameters are estimated from sample data, giving the fitted model Ŷ = b0 + b1X1 + b2X2 + … + bkXk.
Establish a model to suggest a selling price based on house size and age.
Information on properties sold, including selling price, square footage, condition, etc.
Significance is evaluated much as in simple regression: an overall F-test for the model, plus a t-test (p-value) for each coefficient.
Each independent variable is examined individually to see whether it contributes significantly to predicting the selling price (see the sketch below).
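A multiple-regression sketch using statsmodels. The column names and all data values below are assumptions for illustration; they are not the case data from the notes.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical real-estate data: selling price, square footage, and age.
homes = pd.DataFrame({
    "price": [95000, 119000, 124800, 135000, 142000, 145000, 159000, 165000],
    "sq_ft": [1926, 2069, 1720, 1396, 1706, 1847, 1950, 2323],
    "age":   [30, 40, 30, 15, 32, 38, 27, 30],
})

X = sm.add_constant(homes[["sq_ft", "age"]])   # adds the intercept column
model = sm.OLS(homes["price"], X).fit()

print(model.params)                            # b0, b1 (sq_ft), b2 (age)
print(model.pvalues)                           # individual t-test p-values
print(model.rsquared, model.rsquared_adj)      # r² and adjusted r²
```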
Definition: Binary (dummy) variables are created for qualitative data, allowing categorical variables (e.g., house condition) to be included in the regression framework.
Dummy variables for house condition are added to the pricing model; a categorical variable with c levels requires c − 1 dummy variables (see the sketch below).
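A dummy-variable sketch continuing the hypothetical data above; the condition levels are likewise assumed for illustration.

```python
# Encode an assumed categorical "condition" column as 0/1 dummy variables.
# One level is dropped as the baseline, so c categories need c − 1 dummies.
homes["condition"] = ["good", "excellent", "good", "good",
                      "mint", "mint", "excellent", "good"]

dummies = pd.get_dummies(homes["condition"], prefix="cond",
                         drop_first=True, dtype=float)
X2 = sm.add_constant(pd.concat([homes[["sq_ft", "age"]], dummies], axis=1))
model2 = sm.OLS(homes["price"], X2).fit()
print(model2.params)   # coefficients, including the condition dummies
```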
Adjusted r² is preferred over r² for comparing models with different numbers of variables, because r² never decreases when a variable is added; adjusted r² = 1 − (1 − r²)(n − 1)/(n − k − 1) penalizes variables that add little explanatory power.
Stepwise regression builds the model iteratively: forward selection adds the most helpful variable at each step, while backward elimination starts with all candidate variables and removes the least significant one at a time (see the sketch below).
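A forward-selection sketch, continuing the hypothetical house data. Classical stepwise regression adds or removes variables based on p-value thresholds; scikit-learn's SequentialFeatureSelector instead ranks candidates by cross-validated model score, so this is an analogous procedure rather than the textbook algorithm.

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Candidate predictors: square footage, age, and the condition dummies.
features = homes[["sq_ft", "age"]].join(dummies)

selector = SequentialFeatureSelector(LinearRegression(),
                                     n_features_to_select=2,
                                     direction="forward", cv=2)
selector.fit(features, homes["price"])
print(features.columns[selector.get_support()])   # variables chosen first
```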
Nonlinear relationships can often be handled by transforming the variables (for example, adding an X² term or taking logarithms) so that ordinary linear regression can still be used.
Example: analyzing the effect of automobile weight on fuel efficiency (MPG), where the relationship is curved rather than linear (see the sketch below).
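A transformation sketch for the weight-versus-MPG example; the data values are assumed for illustration.

```python
import numpy as np
import statsmodels.api as sm

# Assumed (weight, MPG) observations showing a curved relationship.
weight = np.array([2.2, 2.6, 2.9, 3.2, 3.6, 4.0, 4.4])   # 1000s of pounds
mpg    = np.array([38, 33, 30, 26, 23, 21, 20])

# Add a squared term so a linear model can capture the curvature:
# MPG = b0 + b1*weight + b2*weight².
X_nl = sm.add_constant(np.column_stack([weight, weight ** 2]))
nl_model = sm.OLS(mpg, X_nl).fit()
print(nl_model.params)      # b0, b1, b2
print(nl_model.rsquared)
```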
Cautions: statistical tests are invalid when the error assumptions are violated; correlation does not imply causation; a high r² alone does not guarantee a useful model; and predictions should not be extrapolated beyond the range of the observed data.