
Simple Linear Regression

Chapter 11: Simple Linear Regression

Contents

  • Probabilistic Models

  • Fitting the Model: The Least Squares Approach

  • Model Assumptions

  • Assessing the Utility of the Model: Making Inferences about the Slope β1

  • The Coefficients of Correlation and Determination

  • Using the Model for Estimation and Prediction

  • A Complete Example

Introduction to the Simple Linear Regression Model

  • Purpose: Relate one quantitative variable to another quantitative variable.

  • Assessing Model Fit: Determine how well the model represents the sample data.

Key Concepts

1. Probabilistic Models
  • Definition: A model describes a phenomenon through a relationship between variables.

  • Types:

    • Deterministic Models: Hypothesize an exact relationship; appropriate only when prediction error is negligible (e.g., y = 15x).

    • Probabilistic Models: Add a random error term ε that accounts for variation caused by other, unmodeled variables (e.g., y = 15x + ε).

2. General Form of Probabilistic Models
  • Equation: y = Deterministic component + Random error

  • Assumption: The mean value of the random error ε is 0, so E(y) = Deterministic component.
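To see this assumption at work, the sketch below draws many responses at a single x from the probabilistic model y = 15x + ε. Drawing ε from a normal distribution with σ = 5 is an assumption made only for illustration (the notes require just that the mean of ε is 0), and the function name simulate_y is hypothetical. Individual y values scatter widely, but their average settles near the deterministic component 15x.

```python
import random
import statistics


def simulate_y(x, n=100_000, sigma=5.0, seed=1):
    """Draw n responses from the probabilistic model y = 15x + eps.

    Assumption for this sketch: eps ~ Normal(0, sigma). The notes only
    require that the mean of eps is 0.
    """
    rng = random.Random(seed)
    deterministic = 15 * x  # deterministic component
    return [deterministic + rng.gauss(0.0, sigma) for _ in range(n)]


ys = simulate_y(x=2.0)
print(min(ys), max(ys))      # individual responses vary widely
print(statistics.fmean(ys))  # close to 15 * 2 = 30, the deterministic component
```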

3. First-Order (Straight Line) Model
  • Equation: y = β0 + β1x + ε

    • y: Dependent variable (response)

    • x: Independent variable (predictor)

    • β0: y-intercept (the mean value of y when x = 0)

    • β1: Slope (the change in the mean value of y for a 1-unit increase in x)

    • ε: Random error component
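Both interpretations follow from the mean of the model, E(y) = β0 + β1x (since the mean of ε is 0). Setting x = 0 gives the intercept, and comparing two x values one unit apart gives the slope:

```latex
E(y \mid x = 0) = \beta_0 + \beta_1 \cdot 0 = \beta_0

E(y \mid x + 1) - E(y \mid x) = \left[\beta_0 + \beta_1 (x + 1)\right] - \left[\beta_0 + \beta_1 x\right] = \beta_1
```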

Fitting the Model: The Least Squares Approach

1. Scatterplot
  • Plot the (xi, yi) pairs to judge visually whether a straight-line model fits the data.

2. Least Squares Line
  • Properties:

    • The errors (residuals) sum to 0, so their mean is 0.

    • The sum of squared errors (SSE) is smaller than for any other straight line.

  • Fitted line: ( \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x ), where the hats denote the estimates of β0 and β1.

  • Formulas for estimates:

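    • ( \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} ), where ( SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) ) and ( SS_{xx} = \sum (x_i - \bar{x})^2 )

    • ( \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} )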
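A sketch of where the least squares formulas come from, using the standard calculus argument (summarized here, not quoted from the chapter): set both partial derivatives of SSE to zero and solve.

```latex
SSE = \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2

\frac{\partial SSE}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0
  \quad\Longrightarrow\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

\frac{\partial SSE}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} x_i \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0

\text{Substituting } \hat{\beta}_0:\quad \sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})
  \quad\Longrightarrow\quad \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}
```

The last step uses ( \sum x_i (y_i - \bar{y}) = SS_{xy} ) and ( \sum x_i (x_i - \bar{x}) = SS_{xx} ), which hold because ( \sum (y_i - \bar{y}) = \sum (x_i - \bar{x}) = 0 ). The first equation is also why the residuals of the least squares line sum to zero.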

3. Example: Advertising-Sales Data
  • Advertising expenditure (x) and sales revenue (y) were recorded for 5 months.

  • Fitted line: ŷ = -0.1 + 0.7x. The estimated slope of 0.7 means that mean sales revenue is estimated to increase by 0.7 revenue units for each 1-unit increase in advertising expenditure.
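A minimal sketch of the least squares computation for this example. The data values are an assumption: the notes do not list them, but x = 1, 2, 3, 4, 5 with y = 1, 1, 2, 2, 4 reproduce the fitted line ŷ = -0.1 + 0.7x. The helper name least_squares is hypothetical.

```python
def least_squares(xs, ys):
    """Return (b0, b1), the least squares estimates of beta0 and beta1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx       # slope estimate: SS_xy / SS_xx
    b0 = y_bar - b1 * x_bar  # intercept estimate: y_bar - b1 * x_bar
    return b0, b1


# Assumed data (not given in the notes), chosen to reproduce the fitted line
advertising = [1, 2, 3, 4, 5]  # x: advertising expenditure
revenue = [1, 1, 2, 2, 4]      # y: sales revenue

b0, b1 = least_squares(advertising, revenue)
residuals = [y - (b0 + b1 * x) for x, y in zip(advertising, revenue)]
sse = sum(e ** 2 for e in residuals)

print(f"y-hat = {b0:.1f} + {b1:.1f}x")  # y-hat = -0.1 + 0.7x
print(f"sum of residuals = {sum(residuals):.6f}")
print(f"SSE = {sse:.2f}")                # SSE = 1.10
```

The residuals sum to zero (up to floating-point rounding), matching the first least squares property listed above.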

Model Assumptions

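  1. The mean of the probability distribution of ε is 0.

  2. The variance of the probability distribution of ε is constant for all values of x.

  3. The probability distribution of ε is normal.

  4. The errors associated with different observations are independent.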
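The four assumptions can be made concrete by generating data that satisfy them and checking that least squares recovers the parameters. In this sketch the true values β0 = 2, β1 = 0.5, σ = 1, the x grid, the seed, and the helper names simulate_sample and least_squares are all assumptions for illustration.

```python
import random


def simulate_sample(xs, beta0, beta1, sigma, rng):
    """y = beta0 + beta1*x + eps, with eps drawn so the four assumptions hold:
    mean 0, the same variance sigma**2 at every x, normal, independent draws."""
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


def least_squares(xs, ys):
    """Estimates (b0, b1) from b1 = SS_xy / SS_xx and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1


rng = random.Random(42)
xs = [i / 50 for i in range(1000)]  # x from 0.0 to 19.98
ys = simulate_sample(xs, beta0=2.0, beta1=0.5, sigma=1.0, rng=rng)
b0, b1 = least_squares(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # estimates should land near 2 and 0.5
```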

Assessing the Utility of the Model: Making Inferences about the Slope β1

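  • If the four assumptions hold, the sampling distribution of the least squares slope estimator ( \hat{\beta}_1 ) is normal, with mean β1 and standard deviation ( \sigma / \sqrt{SS_{xx}} ).

  • To test whether x contributes information for the linear prediction of y, test H0: β1 = 0 against Ha: β1 ≠ 0 using ( t = \hat{\beta}_1 / (s / \sqrt{SS_{xx}}) ), where ( s = \sqrt{SSE / (n - 2)} ), with n - 2 degrees of freedom.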
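A minimal sketch of the t-test of H0: β1 = 0, applied to assumed advertising-sales data (x = 1, 2, 3, 4, 5 and y = 1, 1, 2, 2, 4; these values are not listed in the notes but reproduce ŷ = -0.1 + 0.7x). The helper name slope_t_statistic is hypothetical.

```python
import math


def slope_t_statistic(xs, ys):
    """t statistic for H0: beta1 = 0 in y = beta0 + beta1*x + eps; returns (t, df)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # estimate of sigma, the standard deviation of eps
    s_b1 = s / math.sqrt(ss_xx)   # estimated standard error of b1
    return b1 / s_b1, n - 2


t, df = slope_t_statistic([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])
print(f"t = {t:.2f}, df = {df}")  # t = 3.66, df = 3
```

For a two-tailed test at α = .05, the critical value t.025 with 3 degrees of freedom is 3.182; since t ≈ 3.66 exceeds it, H0: β1 = 0 would be rejected for these data.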

Coefficient of Correlation & Determination

1. Coefficient of Correlation (r)
  • Range: -1 to +1; measures the strength and direction of the linear relationship between x and y.

  • Formula:
    ( r = \frac{SS_{xy}}{\sqrt{SS_{xx} \cdot SS_{yy}}} ), where ( SS_{yy} = \sum (y_i - \bar{y})^2 )

2. Coefficient of Determination (r²)
  • Measures the proportion of the total sample variability in y that is explained by the linear relationship with x.

  • Formula:
    ( r^2 = 1 - \frac{SSE}{SS_{yy}} )
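Both coefficients can be computed from the same sums of squares. This sketch again uses assumed advertising-sales data (x = 1, ..., 5 and y = 1, 1, 2, 2, 4, not listed in the notes) and also checks that, in simple linear regression, r² computed from SSE equals the square of r.

```python
import math

# Assumed advertising-sales data (not given in the notes)
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_yy = sum((y - y_bar) ** 2 for y in ys)

# Coefficient of correlation
r = ss_xy / math.sqrt(ss_xx * ss_yy)

# Coefficient of determination via SSE of the least squares line
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - sse / ss_yy

print(f"r = {r:.3f}")            # r = 0.904
print(f"r^2 = {r_squared:.3f}")  # r^2 = 0.817
print(f"r * r = {r * r:.3f}")    # r * r = 0.817
```

For these assumed data, about 82% of the sample variability in y is explained by the straight-line relationship with x.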

Key Concepts Recap

  • Probabilistic Model: Combines deterministic elements with random error.

  • Method of Least Squares: Estimates the intercept and slope of the regression line by minimizing the sum of squared errors (SSE).

  • Utility of Model: A test of H0: β1 = 0 evaluates whether x is a useful linear predictor of y.

  • Coefficient of Correlation: Measures the strength and direction of the linear association between two variables; a strong correlation does not imply causation.

  • Coefficient of Determination: Measures the proportion of the variability in the dependent variable that the model explains.