STATS 762 Notes on Linear Regression Models

Essentials of Linear Regression Models

Course Overview

  • STATS 762: Graduate course equivalent to STATS 330.
  • Focuses on applied regression models with rigorous theoretical underpinnings.
  • Main software: R and R Markdown for assignments and in-class tutorials.

Administrative Details

  • Lecturers: Alain Vandal (first half) and Kate Lee (second half).
  • Schedule:
    • Lectures: Monday 9:00–10:00, Wednesday 14:00–16:00.
    • Tutorials: Thursday 9:00–10:00 or 10:00–11:00.
    • Office Hours: Thursdays 13:00–14:00 (Alain), others TBA.

Models Covered in Previous Courses

  • Ordinary (Normal) Regression: Continuous response.
  • Logistic Regression: Binary response.
  • Poisson Regression: Count data.
  • Aim of STATS 762: Deepen understanding of these models and others.

Course Outline

  1. Review linear models and generalized linear models (GLMs).
    • Topics: estimation, diagnostics, inference.
  2. Purposes of fitting regression models:
    • Prediction, understanding, control.
  3. Model choice for prediction and causal inference.
  4. Introduction to modern regression methods (e.g., lasso, quantile regression).

The Linear Model

  • Most commonly used statistical model; encompasses various types (ANOVA, ANCOVA, etc.).
  • Generalized Linear Model (GLM) is an extension of the linear model.

Linear Regression Basics

  • General equation:

    Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i

    • Where:
      • Y_i : response variable.
      • X_{1i}, X_{2i} : explanatory variables.
      • \epsilon_i : error term (normally distributed).
  • Objective: Estimate the coefficients \beta_0, \beta_1, \beta_2 and the error variance \sigma^2.
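The estimation objective above can be sketched in code. In the course this is what R's lm() does; the following is a stdlib-only Python illustration that solves the normal equations (X'X)\beta = X'y directly (production code would use QR factorization for numerical stability). The toy data are invented and noise-free, so the true coefficients are recovered exactly.

```python
# Least-squares estimation of beta in y = b0 + b1*x1 + b2*x2 + e,
# via the normal equations (X'X) beta = X'y. Illustrative sketch only;
# R's lm() (or numpy.linalg.lstsq) solves the same problem more stably.

def ols(X, y):
    """Solve (X'X) beta = X'y by Gaussian elimination with partial pivoting."""
    n, p = len(X), len(X[0])
    # Form X'X and X'y.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         for i in range(p)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    # Forward elimination.
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        b[i], b[piv] = b[piv], b[i]
        for r in range(i + 1, p):
            f = A[r][i] / A[i][i]
            for c in range(i, p):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][c] * beta[c] for c in range(i + 1, p))) / A[i][i]
    return beta

# Invented, noise-free data: y = 1 + 2*x1 + 3*x2, so OLS recovers (1, 2, 3).
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
X = [[1.0, a, b] for a, b in zip(x1, x2)]   # design matrix with intercept column
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
beta = ols(X, y)
print([round(v, 6) for v in beta])  # -> [1.0, 2.0, 3.0]
```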

Key Characteristics of the Linear Model

  1. Responses Y_i are normally distributed.
  2. Mean \mu_i is a linear combination of predictors.
  3. Variance \sigma^2 is constant across observations.
  4. Independence of observations.

Generalized Linear Models (GLMs)

  • Include normal, binomial, and Poisson regressions as well as others (gamma, inverse-Gaussian, etc.).
  • Structure:
    • A probability distribution from a natural exponential family.
    • A link function that relates response to predictors.

Key Mathematical Functions

  • Logit Transform:

    \text{logit}(p) = \log \left( \frac{p}{1 - p} \right)

  • Inverse Logit (Logistic Function):

    p = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}
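The two transforms above are inverses of each other, which a short sketch makes concrete (the coefficient values below are invented for illustration):

```python
import math

def logit(p):
    """Log-odds: logit(p) = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Logistic function: maps any real x back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-x))

# Round-trip check: the logistic function undoes the logit transform.
print(round(inv_logit(logit(0.3)), 10))  # -> 0.3

# A linear predictor on the logit scale yields a probability on the data scale.
b0, b1, X = -1.0, 0.5, 2.0   # illustrative values, not from the notes
print(round(inv_logit(b0 + b1 * X), 4))  # -> 0.5
```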

Concepts of Causation

  • Regression used for prediction and causal inference.
  • Confounding: A third variable influences both the exposure and the outcome, potentially obscuring or distorting their true relationship.
  • Stratification: Adjusting for confounding by separating data into subgroups.
  • Adjustments must consider all common causes of exposure and outcome to avoid biases.
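The effect of stratification can be seen in a small invented dataset: within each stratum of a confounder Z the exposure has no effect on Y, yet the crude (unstratified) comparison suggests one, because Z drives both exposure and outcome.

```python
# Toy confounding illustration with invented data.
# Records are (exposed, z, y): in stratum z=0 everyone has y = 10;
# in stratum z=1 everyone has y = 20. Exposure is concentrated in z=1.
records = (
    [(0, 0, 10)] * 8 + [(1, 0, 10)] * 2 +   # z = 0: mostly unexposed
    [(0, 1, 20)] * 2 + [(1, 1, 20)] * 8     # z = 1: mostly exposed
)

def mean(vals):
    return sum(vals) / len(vals)

def effect(recs):
    """Difference in mean outcome, exposed minus unexposed."""
    return (mean([y for e, z, y in recs if e == 1])
            - mean([y for e, z, y in recs if e == 0]))

print("crude effect:", effect(records))                              # -> 6.0
print("effect in z=0:", effect([r for r in records if r[1] == 0]))   # -> 0.0
print("effect in z=1:", effect([r for r in records if r[1] == 1]))   # -> 0.0
```

The crude comparison shows a difference of 6, entirely attributable to Z; stratifying on Z recovers the true (null) effect in each subgroup.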

Statistical Inference and Model Coefficients

  • Inference focused on:
    • Estimating model coefficients.
    • Understanding response values given predictors.
  • Standard inference methods include confidence intervals and hypothesis testing based on estimated coefficients.
  • Added-variable F-test: Compares nested models to evaluate the contribution of additional variables.
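The added-variable F-test can be computed by hand from the residual sums of squares of two nested models, which is what R's anova() reports when given two lm() fits. The sketch below compares the intercept-only model against intercept + x on invented data; q is the number of added parameters and p the number of parameters in the larger model.

```python
# Added-variable F-test on invented data:
#   F = ((RSS0 - RSS1) / q) / (RSS1 / (n - p))
# where RSS0 is from the smaller (intercept-only) model and RSS1 from
# the larger (intercept + x) model. Mirrors R's anova(fit0, fit1).

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Closed-form simple-regression fit for the larger model.
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

rss0 = sum((yi - ybar) ** 2 for yi in y)                          # intercept only
rss1 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))    # intercept + x

q, p = 1, 2
F = ((rss0 - rss1) / q) / (rss1 / (n - p))
print("F statistic:", round(F, 1))   # large F: x contributes strongly
```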

Practical Example: Catheter Length

  • Data includes patient height and weight as predictors for catheter length (response).
  • Model fitted using R's lm() function; interpreting the coefficients means assessing what the slopes and the intercept represent in context.

Predictions and Confidence Intervals

  • Use predict() to estimate expected values and provide confidence/prediction intervals.
  • Important to consider variability and relationship strength among variables when interpreting results.
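The distinction between the two intervals can be made concrete. The following stdlib-only sketch (of what R's predict(fit, interval = "confidence") and interval = "prediction" return for simple regression) uses the same invented data as earlier; the prediction interval adds an extra 1 inside the square root to account for the variability of a single new observation, so it is always wider than the confidence interval for the mean response.

```python
import math

# Invented data; fit y = b0 + b1*x by least squares.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(rss / (n - 2))   # residual standard error
t = 3.182                      # t quantile at 0.975, df = n - 2 = 3

x0 = 3.5                       # new predictor value
yhat = b0 + b1 * x0
se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)       # for the mean response
se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)   # for a new observation

print("CI:", (round(yhat - t * se_mean, 2), round(yhat + t * se_mean, 2)))
print("PI:", (round(yhat - t * se_pred, 2), round(yhat + t * se_pred, 2)))
```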

Conclusion

  • Understanding linear regression models is crucial for making accurate predictions and causal conclusions in statistics and research.