Simple Linear Regression
Chapter 11: Simple Linear Regression
Contents
Probabilistic Models
Fitting the Model: The Least Squares Approach
Model Assumptions
Assessing the Utility of the Model: Making Inferences about the Slope β1
The Coefficients of Correlation and Determination
Using the Model for Estimation and Prediction
A Complete Example
Introduction to the Simple Linear Regression Model
Purpose: Relate one quantitative variable to another quantitative variable.
Assessing Model Fit: Determine how well the model represents the sample data.
Key Concepts
1. Probabilistic Models
Definition: A model represents a real-world phenomenon through relationships between variables.
Types:
Deterministic Models: Exact relationships with negligible prediction error (e.g., y = 15x).
Probabilistic Models: Incorporate a random error term (e.g., y = 15x + ε) that accounts for variation in y due to other, unmeasured variables.
2. General Form of Probabilistic Models
Equation: y = Deterministic component + Random error
Assumption: The mean value of the random error (ε) is 0, which leads to E(y) = Deterministic component.
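The difference between the two model types can be sketched with a short simulation. This is a minimal illustration, not part of the notes: the normal error distribution, the standard deviation `sigma=1.0`, the seed, and the function names are all assumptions chosen for the example.

```python
import random

def deterministic(x):
    """Deterministic model from the notes: y = 15x, no error term."""
    return 15 * x

def probabilistic(x, rng, sigma=1.0):
    """Probabilistic model y = 15x + eps.

    Assumption for this sketch: eps is drawn from N(0, sigma^2).
    """
    return 15 * x + rng.gauss(0.0, sigma)

rng = random.Random(0)  # fixed seed so the run is repeatable
x = 2
draws = [probabilistic(x, rng) for _ in range(10_000)]
mean_y = sum(draws) / len(draws)

print(deterministic(x))  # always 30
print(round(mean_y, 2))  # close to 30, because the mean of eps is 0
```

Individual draws scatter around 30, but their average settles near the deterministic component, which is what E(y) = Deterministic component expresses.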
3. First-Order (Straight Line) Model
Equation: y = β0 + β1x + ε
y: Dependent variable (response)
x: Independent variable (predictor)
β0: y-intercept (mean value of y when x = 0)
β1: Slope (change in the mean value of y for every 1-unit increase in x)
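The interpretation of the slope follows from taking the mean of both sides of the model. A short worked step, assuming only that the mean of ε is 0:

```latex
E(y \mid x) = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
E(y \mid x+1) - E(y \mid x)
  = \bigl[\beta_0 + \beta_1 (x+1)\bigr] - \bigl[\beta_0 + \beta_1 x\bigr]
  = \beta_1
```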
Fitting the Model: The Least Squares Approach
1. Scatterplot
Plot of the (xi, yi) pairs, used to judge whether a straight-line model is plausible for the data.
2. Least Squares Line
Properties:
Mean error = 0.
Minimum sum of squared errors (SSE).
Equation for estimates: ŷ = β̂0 + β̂1x
Formulas for estimates:
\( \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \)
\( \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} \)
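The notes use SS_xy and SS_xx without defining them. The standard definitions, together with the SSE that the least squares line minimizes, are:

```latex
SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad
SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```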
3. Example: Advertising-Sales Data
Data collected over 5 months showing advertising expenditure and sales revenue.
Calculated estimates: ŷ = -0.1 + 0.7x (the slope 0.7 estimates the increase in mean sales revenue for each 1-unit increase in advertising expenditure).
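The least squares calculation for this example can be sketched in Python. The five (x, y) pairs below are an assumption: the notes do not list the raw data, so these values were chosen because they reproduce the stated line ŷ = -0.1 + 0.7x. The function and variable names are likewise illustrative.

```python
def least_squares(xs, ys):
    """Return (b0, b1), the least squares estimates of intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx          # slope estimate
    b0 = y_bar - b1 * x_bar     # intercept estimate
    return b0, b1

# Assumed illustrative data (not given in the notes): five months of
# advertising expenditure (x) and sales revenue (y).
advertising = [1, 2, 3, 4, 5]
sales = [1, 1, 2, 2, 4]

b0, b1 = least_squares(advertising, sales)
residuals = [y - (b0 + b1 * x) for x, y in zip(advertising, sales)]
sse = sum(e ** 2 for e in residuals)

print(round(b0, 4), round(b1, 4))  # -0.1 0.7
```

With these values the residuals sum to zero (up to floating-point rounding), which matches the first property of the least squares line listed above.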
Model Assumptions
Mean of probability distribution of ε is 0.
Variance of ε is constant across all x values.
Normal distribution of ε.
Independence of ε values.
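A rough first check of these assumptions starts from the residuals of the fitted line. The sketch below computes the residuals and the usual estimate of the constant error variance, s² = SSE / (n - 2). That estimator is standard but is not stated in the notes, and the data values are assumed for illustration (chosen to be consistent with ŷ = -0.1 + 0.7x).

```python
import math

def estimate_error_variance(xs, ys, b0, b1):
    """Return (residuals, s2), where s2 = SSE / (n - 2) estimates Var(eps)."""
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    s2 = sse / (len(xs) - 2)  # n - 2: two parameters were estimated
    return residuals, s2

# Assumed illustrative data (not given in the notes).
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

# Fitted line taken from the notes: y-hat = -0.1 + 0.7x.
residuals, s2 = estimate_error_variance(xs, ys, b0=-0.1, b1=0.7)
s = math.sqrt(s2)  # estimated standard deviation of eps
mean_residual = sum(residuals) / len(residuals)
```

Plotting `residuals` against `xs` is a common way to look for non-constant spread or patterns that would contradict the constant-variance and independence assumptions; with only five points such a plot is suggestive at best.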
Assessing the Utility of the Model: Inference about Slope β1
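When the assumptions about ε hold, the sampling distribution of the slope estimator β̂1 is normal.
Tests of significance can be conducted (e.g., a test of H0: β1 = 0 against Ha: β1 ≠ 0 to check whether x contributes information for predicting y).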
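A test of the slope can be sketched as follows. The test statistic t = β̂1 / (s / √SS_xx) with n - 2 degrees of freedom is the standard form for this test; it is not written out in the notes. The data values are assumed for illustration, and the critical value 3.182 is the two-sided 5% point of the t distribution with 3 degrees of freedom, taken from a t table.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: beta1 = 0 in simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))   # estimated standard deviation of eps
    se_b1 = s / math.sqrt(ss_xx)   # estimated standard error of the slope
    return b1 / se_b1, n - 2

# Assumed illustrative data (not given in the notes).
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

t_stat, df = slope_t_statistic(xs, ys)

T_CRIT = 3.182  # two-sided 5% critical value for df = 3, from a t table
reject_h0 = abs(t_stat) > T_CRIT
```

For these assumed values the statistic exceeds the critical value, so H0: β1 = 0 would be rejected at the 5% level. A different significance level or sample size needs a different critical value.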
Coefficient of Correlation & Determination
1. Coefficient of Correlation (r)
Range: -1 to +1; measures the strength and direction of the linear relationship between x and y.
Formula:
\( r = \frac{SS_{xy}}{\sqrt{SS_{xx} \cdot SS_{yy}}} \)
2. Coefficient of Determination (r²)
Measures proportion of total variability in y explained by x.
Formula:
\( r^2 = 1 - \frac{SSE}{SS_{yy}} \)
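Both coefficients can be computed directly from the sums of squares. The sketch below uses assumed illustrative data (the notes do not list the raw values) and also confirms that, in simple linear regression, the r² from the SSE formula equals the square of r.

```python
import math

def correlation_and_determination(xs, ys):
    """Return (r, r2) using r = SS_xy / sqrt(SS_xx * SS_yy) and r2 = 1 - SSE / SS_yy."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    # Least squares fit, needed for SSE.
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1 - sse / ss_yy
    return r, r2

# Assumed illustrative data (not given in the notes).
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

r, r2 = correlation_and_determination(xs, ys)
print(round(r, 3), round(r2, 3))
```

Here `r2` is the fraction of the total variation SS_yy that the straight-line model accounts for; the remainder, SSE / SS_yy, is left unexplained.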
Key Concepts Recap
Probabilistic Model: Combines deterministic elements with random error.
Method of Least Squares: Estimates the intercept and slope of the regression line by minimizing the sum of squared errors (SSE).
Utility of Model: Conduct hypothesis tests to evaluate significance of predictor variable.
Correlation Coefficient: Measures the strength and direction of the linear association between two variables; it does not imply causation.
Coefficient of Determination: Provides insight into the effectiveness of the model in explaining variability of the dependent variable.